Last week (January 8th and 9th) we received a dozen reports of messages that simply vanished in the ExchangeDefender system. Upon investigation it turned out that one of the antivirus engines was producing false positives: marking messages with certain PDF attachments as infected when in fact there was no infection. The reported “infection” was simply the detection of an exploit, one that can easily and inadvertently be created by older versions of Acrobat. We have removed the antivirus engine from the rotation (don’t worry, everything is still being scanned by several other scanners).
While the problem in the definition files was already addressed (Exploit.PDF-9669) and widely blogged and discussed, we need a way to deal with false positives. Prior to this we had never had a reported false positive from an antivirus engine, but as more antivirus vendors get into the business of detecting not just viruses and worms but also exploits and other dangerous content, our reporting will have to get better as well.
The bigger question here is: “Why was I not notified? If this happened here, it would also explain why I never received any of the other messages.” Allow me to address that in two ways:
1) Almost all of our “missing messages” tickets are related to the messages being quarantined as SPAM and not coming into LiveArchive. At the present time there is no way to get a SPAM message into LiveArchive, even after it’s released from the Quarantine. Because our replication is done at scan time, we have to move the copying process elsewhere to support post-release and SPAM content.
2) We have never before seen a false positive from an antivirus engine. We’ve seen it crash, we’ve seen it fail to detect a real infection, we’ve seen it bring the scanning node to a crawl and just about everything you’d expect from a piece of security software: just never a false reading. Consequently, we never wrote a process to monitor for the false positives and we never bothered to present the infection logs because so many contained meaningless junk. Several years ago, after countless alerts for Sober and Nimda and so on, we disabled end user reports for antivirus and it was eventually dropped from the product completely.
IMPORTANT: While these messages appeared to be lost forever, we do have them stored on our servers. Reported messages are being released (by hand) by our support teams, so if you know the message sender, recipient, subject and the date the message was sent, we can retrieve the message and deliver it. -Vlad
We’ve had reports from a few partners that RPC over HTTPs is not connecting on HUEY. We are rebooting the server to rectify the issue. Update 10:12 AM Eastern: Service has been restored and we have confirmed RPC over HTTP is functioning properly.
We are currently working on livearchive.exchangedefender.com and the server is currently offline. The IIS application pool keeps crashing since a Windows Update from last night. We expect to have the server up shortly. Update 9:00 AM Eastern: We’ve resolved the issue with livearchive OWA.
All of our BES servers are currently offline as we move the virtual disks to the new RAID set added yesterday. All servers are expected to be online in the next 45 minutes.
The livearchive database for some ExchangeDefender users is starting to show mail routing issues. We’ve disabled this database and temporarily put up a blank database. Over the weekend we will attempt to diagnose the issue with the database and remount the affected database. Update 12-06-09: After many attempts to restore the database, the decision was made to leave the database dismounted in preparation for LiveArchive 3.0 (due for release in Feb 2010). The current running database is >6 TB in size and direct repairs would take at least a month, leaving customers without the ability to utilize LiveArchive. All users currently have new mailboxes and we plan to migrate the >6 TB database into the new LA 3.0 database.
The SharePoint server that services users on DEWEY (LUDWIG) will be rebooted at 9PM Eastern tonight to finalize the installation of software updates. LUDWIG is expected to be online no more than 10 minutes after the reboot. Update 9:12 PM Eastern: The reboot process for LUDWIG has begun. Update 9:25 PM Eastern: The reboot has completed and LUDWIG is back online.
The backup74 OBS service will be offline until 3pm Eastern today. During this period we will be performing upgrades to the server and moving users around to relieve resource usage on the server. Update 1:35 PM Eastern: Service has been restored to backup74.
As mentioned last week, we are now deferring all mail from servers listed on the popular SPAM blacklists at SpamCop and SpamHaus. It is important to stress that we are not blocking or rejecting mail from these servers, merely temporarily deferring acceptance of their messages. This subtle difference is what separates spammers from legitimate senders. Legitimate mail server operators will immediately notice they are on an RBL and will address the issue and remove themselves from it. Our choice of SpamCop and SpamHaus came after years of use, peer reviews and our own statistical models indicating that they rarely make mistakes. We are not using third party reputation lists or greylisting, which would delay mail delivery; we are just making sure that the mail arriving to you is from a legitimate source and a secure mail server.
Important Notice: tempfail effect on SureSPAM
Nearly all the messages in the SureSPAM quarantine were from servers listed by SpamHaus and SpamCop. As a result of us tempfailing mail from these known SPAM sources, you will see a significant decrease in SPAM and junk mail report stats as well. If you have clients that you have not yet migrated from SPAM reports to our Outlook and Desktop software, we recommend sending them the following alert:
We are closely monitoring the network during this change and will update the NOC blog if there are any issues. We do not expect anything unusual to come as a result of this implementation.
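For partners curious how an RBL-based deferral like the one described above works in principle, here is a minimal Python sketch. It is not our production code: the zone names `zen.spamhaus.org` and `bl.spamcop.net`, the helper names and the exact SMTP reply text are assumptions chosen for illustration. The idea is to reverse the octets of the connecting IP, look that name up under the blacklist zone, and answer with a temporary 4xx reply (rather than a permanent 5xx rejection) if the address is listed.

```python
import socket
from typing import Callable, Iterable

# Assumed public DNSBL zones; the zones actually used are not stated in the post.
DNSBL_ZONES = ("zen.spamhaus.org", "bl.spamcop.net")


def dnsbl_query_name(ipv4: str, zone: str) -> str:
    """Build the DNSBL lookup name: reversed IPv4 octets followed by the zone."""
    octets = ipv4.split(".")
    if len(octets) != 4:
        raise ValueError(f"not a dotted-quad IPv4 address: {ipv4!r}")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(name: str) -> bool:
    """Return True if the DNSBL name resolves (any A record is treated as 'listed').

    Simplification: a real deployment should also inspect the returned
    127.x.x.x code, since some lists use special codes to signal query errors.
    """
    try:
        socket.gethostbyname(name)
        return True
    except socket.gaierror:
        return False


def smtp_reply(ipv4: str,
               zones: Iterable[str] = DNSBL_ZONES,
               lookup: Callable[[str], bool] = is_listed) -> str:
    """Pick an SMTP reply: temporary failure (4xx) for listed senders, else accept."""
    for zone in zones:
        if lookup(dnsbl_query_name(ipv4, zone)):
            # 4xx means "try again later": the sender keeps the message queued
            # and it is delivered once the server is removed from the list.
            return f"451 4.7.1 Sender listed on {zone}; please try again later"
    return "250 OK"
```

Because the reply is a temporary failure, a legitimate server that ends up on a list by mistake retries automatically and its mail goes through after delisting, while most spam sources never retry.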
We will be starting maintenance on backup74 shortly. During this maintenance window we will be moving users around to new volumes on the server. Service is expected to be restored by 12:00PM Eastern. Update 1:06 PM Eastern: The backup service has been restarted and the user move has been completed.
We’ve had a couple reports of users unable to connect to HUEY through RPC. We’ve restarted the IIS service but some users are still reporting issues. The HUEY server will be going offline in a couple minutes for a service reboot. The server is estimated to be down for 15 minutes during the reboot. Update 12:04 PM Eastern: The server is now going down for the reboot. Update 12:16 PM Eastern: The server has returned from the reboot and service has been restored.
We’ve received reports from a couple of partners that they’ve received an email titled “your mailbox has been deactivated” with an executable attachment that gets stripped by ExchangeDefender. This seems to be a blind attack from the outside and we’ve already implemented checks to block these messages from coming through ExchangeDefender. Just as a reminder, we will never email end users about issues with ExchangeDefender; we only contact our registered partners.
We are in the process of rebooting the livearchive server. Our alerting software showed periods of inaccessibility which we believe will be resolved with a reboot. Update 8:30 AM Eastern: The livearchive server has been rebooted and is back online. Services are running 100%.
We are about to replace the SSL certificate on our 2003 server Daisy. During the replacement RPC over HTTPs may be unavailable but will be restored shortly. Update 10:39 PM Eastern: The SSL certificate has been replaced on Daisy and service has been restored 100%.
In preparation for the release of ExchangeDefender 5.0 we’ve installed 4 new servers to process outbound mail for ExchangeDefender clients. This transition was seamless and shouldn’t require any work from our partners; however, any clients who are using SPF records will need to add the following IP address: ip4:65.99.255.234
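As a rough illustration of the SPF change, the Python sketch below adds an `ip4:` mechanism to an existing SPF TXT record, placing it before the final `all` term so it is evaluated first. The function name and the sample record `v=spf1 mx -all` are hypothetical; your own record will differ, and the change still has to be published through your DNS provider.

```python
def add_ip4_to_spf(record: str, address: str) -> str:
    """Return the SPF record with ip4:<address> added before the 'all' term."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record (must start with v=spf1)")

    mechanism = f"ip4:{address}"
    if mechanism in terms:
        return record  # already present, nothing to change

    for i, term in enumerate(terms):
        # 'all' may carry a qualifier prefix: +all, -all, ~all or ?all
        if term.lstrip("+-~?").lower() == "all":
            terms.insert(i, mechanism)
            break
    else:
        # No 'all' term found: append the mechanism at the end.
        terms.append(mechanism)
    return " ".join(terms)


# Hypothetical existing record, shown only as an example.
print(add_ip4_to_spf("v=spf1 mx -all", "65.99.255.234"))
# prints: v=spf1 mx ip4:65.99.255.234 -all
```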
At 7:10 PM Eastern we will begin stress test maintenance on HUEY that is scheduled to last until 10PM Eastern. We are performing click tests and hard drive upgrades. There will be periods of inaccessibility through OWA and Outlook. Update 9:37 PM: The hard drives are in place; however, the copy is taking longer than expected. New mail is being queued so there will be no lost emails during this maintenance. Update 2:07 AM: Maintenance has completed and the database has been moved to the new drive. Service has been restored 100%.
The Australian exchange server is currently offline for network upgrades from 00:00 – 04:00 GMT +10. Due to increased network activity over the past several months in the data center, Servers Australia technicians will be replacing several Fast Ethernet interfaces with Gigabit Ethernet and multimode fiber interfaces in the Sydney core routers. The maintenance upgrades to our entire data center network within Equinix Mascot and the SAU Data Center at Tuggerah are intended to provide a more reliable and responsive experience.
We have received a few complaints about Outlook not updating on HUEY. We will be restarting the services momentarily and if a reboot is required, this post will be updated.
At 10:30 AM Eastern we will be shutting off the Ahsay OBS service on Backup74 to install the latest updates to the OBS platform. Service is expected to be down for less than 15 minutes. Update 10:31 AM Eastern: The service has been shut off for the update. Update 10:35 AM Eastern: The update has been completed. Service has been restored to Backup74.
Over the weekend, the Tomcat web service on backup74 stopped answering new requests. We’ve opened a ticket with Ahsay and were provided with a hotfix to install. We are going to try making changes to the Tomcat config files to restore service before installing the hotfix patch. Service is expected to be functional within the hour. Update 10:50pm Eastern: Service to backup74 has been restored without installing the hotfix. We are planning to install a stable patch upgrade to the server later in the week.
The Ahsay OBS service has been stopped on Backup74 while we move users around on the volumes.



