Submitted by tensai on
I was welcomed to work this morning by a wonderful little failure. Our experience was pretty much like the story explains, only worse. After our two clusters built up queues of 42,000 and 46,000 messages, we got the new antivirus definition installed. For one system that has antivirus on the back-end mail server, we disabled AV scanning completely and that sped things up drastically. Still, considering that it had to handle its normal load plus the backlog, it took 4 hours to clear out completely. The other system took a good 7. "Service was fully operational about two hours later", my eye.
Sounds like fun, no? The only thing that makes it even better is that they did the same thing to us yesterday with a new feature they pushed out. Apparently it computes a checksum of each image and then looks that up in DNS. Sounds like a pretty good tactic, but something about it overwhelmed the system with DNS lookups.
Wish me luck for whatever bug I get to fight tomorrow.