We’ve had reports from a few clients that they are having issues accessing our Exchange 2007 server DEWEY. We are currently investigating this issue. Update 11:20 AM: Service on DEWEY has been running at 100% for over an hour. It seems that close to 10 PM Eastern on 2/4/2010, IIS had issues compiling the metabase, which ultimately did not throw any errors in the Windows logs or in any of our monitoring software. We ran a quick repair on the metabase soon after being notified of the service outage and restarted IIS, which resolved the issue.
Backup74 will be going offline from 6 PM Eastern until 10 PM Eastern today (2/2/10) for extended maintenance. We will be installing 4TB of additional storage in the server. Update 8:20 PM Eastern: After installing the new controller, the server was unable to boot properly. We’ve removed the new controller and ordered a replacement.
We have received reports from a few partners that ExchangeDefender users are not receiving mail released from Quarantine. We are investigating this issue and can assure you that no mail has been lost. If you experience issues releasing mail from Quarantine, please open a support ticket at support.ownwebnow.com including the sender’s email address, the recipient’s email address, and the timestamp. Update Noon EST: We have resolved the issue that caused the release mechanism to fail, and as of about 11:30 full functionality has been restored. Now a word from Vlad Mazek, CEO:
Throughout the day the ExchangeDefender outbound grid has been fighting extremely large mail queues and hour-long delivery delays. The source has been identified as a DDoS attack and we’ve taken all measures to remove the offending mail. Legitimate mail that hasn’t been delivered yet will be delivered over the next couple of hours. We sincerely apologize, and we are making modifications to the outbound grid to prevent flooding.
We will be holding an extended maintenance window this weekend affecting admin.exchangedefender.com systems:
During this window, access to admin.exchangedefender.com will be intermittent as we undergo a major networking and hardware update to handle the expansion and additional services. We will start posting updates to this blog during the 4 AM – 7 AM window.
Our second US OBS server, backup74, will be going offline for the next three hours as we begin to migrate users between volumes. Service is expected to be restored before 5 PM Eastern.
The Offsite Backup Server, backup74, will be going offline at 3:00 PM Eastern to install hotfixes to OBS. The server is expected to be offline for 30 minutes and service should be restored by 3:30 PM Eastern. Update 3:01 PM Eastern: Service has been disabled on backup74 as we begin patching. Update 3:11 PM Eastern: Service has been restored on backup74 and the server is now running version 5.5.5.5-2.
At 4:10 PM Eastern we will be rebooting DEWEY for feature enhancements and more analytical monitoring. The reboot is expected to last no longer than 10 minutes and service should be 100% functional by 4:20 PM Eastern.
We are beginning a maintenance schedule for backup74.ownwebnow.com. During the maintenance schedule, access to backup74.ownwebnow.com will be interrupted; however, service is expected to be restored by 12:00 PM Eastern. Updated 1:30 PM Eastern: User migration is taking a bit longer than expected. The final user move is in progress and the server should be online before 3 PM Eastern. Updated 2:50 PM Eastern: User migration has completed and service has been restored to backup74.
We are about to reboot huey.exchangedefender.com due to user accessibility complaints. Service on HUEY is expected to be unavailable for about 15 minutes while the reboot completes. Update 3:21 PM Eastern: The server is back online from the reboot and service has been restored.
We are in the process of replacing the certificate for HUEY in our Exchange 2007 cluster. We sincerely apologize for the inconvenience; service should be restored before 3 PM Eastern. Updated 2:42 PM Eastern: The certificate has been successfully replaced on HUEY. Service on HUEY is now 100% operational.
Last week (January 8th and 9th) we received a dozen reports of messages that simply vanished in the ExchangeDefender system. Upon investigation it turned out that one of the antivirus engines was producing false positives: it marked messages with certain PDF attachments as infected when in fact there was no infection. The reported "infection" was simply the detection of an exploit pattern, one that can easily and inadvertently be created by older versions of Acrobat. We have removed that antivirus engine from the rotation (don’t worry, everything is still being scanned by several other scanners).
While the problem in the definition files (Exploit.PDF-9669) has already been addressed and was widely blogged and discussed, we need a way to deal with false positives. Prior to this we had never had a reported false positive from an antivirus engine, but as more antivirus vendors get into the business of detecting not just viruses and worms but also exploits and other dangerous content, our reporting will have to get better as well.
The bigger question here is: "Why was I not notified? If this happened here, it would also explain why I never received any of the other messages." Allow me to address that in two ways:
1) Almost all of our “missing messages” tickets are related to messages being quarantined as SPAM and not coming into LiveArchive. At the present time there is no way to get a SPAM message into LiveArchive, even after it’s released from the Quarantine. Because our replication is done at scan time, we have to move the copying step elsewhere to support released and SPAM content.
2) We have never before seen a false positive from an antivirus engine. We’ve seen it crash, we’ve seen it fail to detect a real infection, we’ve seen it bring the scanning node to a crawl and just about everything you’d expect from a piece of security software: just never a false reading. Consequently, we never wrote a process to monitor for the false positives and we never bothered to present the infection logs because so many contained meaningless junk. Several years ago, after countless alerts for Sober and Nimda and so on, we disabled end user reports for antivirus and it was eventually dropped from the product completely.
IMPORTANT: While these messages appeared to be lost forever, we do have them stored on our servers. Reported messages are being released (by hand) by our support teams, so if you know the message sender, recipient, subject, and the date the message was sent, we can retrieve the message and deliver it. -Vlad
We’ve had reports from a few partners that RPC over HTTPS is not connecting on HUEY. We are rebooting the server to rectify the issue. Update 10:12 AM Eastern: Service has been restored and we have confirmed RPC over HTTPS is functioning properly.
We are currently working on livearchive.exchangedefender.com and the server is currently offline. The IIS application pool keeps crashing since a Windows Update from last night. We expect to have the server up shortly. Update 9:00 AM Eastern: We’ve resolved the issue with livearchive OWA.
All of our BES servers are currently offline as we move the virtual disks to the new RAID set added yesterday. All servers are expected to be online in the next 45 minutes.
The LiveArchive database for some ExchangeDefender users is starting to show mail routing issues. We’ve disabled this database and temporarily put up a blank database. Over the weekend we will attempt to diagnose the issue and remount the affected database. Update 12-06-09: After many attempts to restore the database, the decision was made to leave it dismounted in preparation for LiveArchive 3.0 (due for release in Feb 2010). The current database is >6 TB in size and direct repairs would take at least a month, leaving customers without the ability to use LiveArchive. All users currently have new mailboxes and we plan to migrate the >6 TB database into the new LA 3.0 database.
The SharePoint server that services users on DEWEY (LUDWIG) will be rebooted at 9 PM Eastern tonight to finalize the installation of software updates. LUDWIG is expected to be back online no more than 10 minutes after the reboot. Update 9:12 PM Eastern: The reboot process for LUDWIG has begun. Update 9:25 PM Eastern: The reboot has completed and LUDWIG is back online.
The backup74 OBS service will be offline until 3 PM Eastern today. During this period we will be performing upgrades to the server and moving users around to relieve resource usage on the server. Update 1:35 PM Eastern: Service has been restored to backup74.
As mentioned last week, we are now deferring all mail from senders listed on the popular SPAM blacklists at SpamCop and SpamHaus. It is important to stress that we are not blocking or rejecting mail from these senders, merely temporarily deferring acceptance of their messages. This subtle difference is what separates spammers from legitimate senders: legitimate mail server operators will quickly notice they are on an RBL, address the issue, and get themselves removed from it, after which their deferred mail is accepted on retry. Our choice of SpamCop and SpamHaus came after years of use, peer reviews, and our own statistical models indicating that they rarely make mistakes. We are not using third-party reputation lists or greylisting, which would delay mail delivery; we are just making sure that the mail arriving to you is from a legitimate source and a secure mail server.
Important Notice: tempfail effect on SureSPAM
Nearly all the messages in the SureSPAM quarantine were from senders listed by SpamHaus and SpamCop. As a result of us tempfailing mail from these known SPAM sources, you will see a significant decrease in SPAM and in junk mail report stats as well. If you have clients that you have not yet migrated from SPAM reports to our Outlook and Desktop software, we recommend sending them the following alert:
We are closely monitoring the network during this change and will update the NOC blog if there are any issues. We do not expect anything unusual to come as a result of this implementation.
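For partners curious how deferring on an RBL differs from rejecting, here is a minimal Python sketch of the general technique; it is not ExchangeDefender's actual implementation. It builds the standard DNSBL query name (reversed IPv4 octets plus the list's zone) and answers with a temporary 4xx SMTP reply instead of a permanent 5xx one. The zone names, the function names, and the reply text are assumptions chosen for illustration; the post above names only the providers.

```python
import ipaddress
import socket
from typing import Callable, Iterable

# Assumed zones for illustration only; the post does not say which zones are queried.
DEFAULT_ZONES = ("zen.spamhaus.org", "bl.spamcop.net")


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL lookup name: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError if not a valid IPv4 address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(name: str) -> bool:
    """Treat any successful resolution as 'listed' and a failed lookup as 'not listed'.

    Simplification: a production check would also inspect the returned
    address, since lists can use special return codes.
    """
    try:
        socket.gethostbyname(name)
        return True
    except socket.gaierror:
        return False


def smtp_reply(
    ip: str,
    zones: Iterable[str] = DEFAULT_ZONES,
    lookup: Callable[[str], bool] = is_listed,
) -> str:
    """Return a temporary failure (451) for listed senders, otherwise accept."""
    for zone in zones:
        if lookup(dnsbl_query_name(ip, zone)):
            # 4xx means "try again later": a legitimate server will retry
            # and get through once it has been delisted.
            return f"451 4.7.1 Temporarily deferred: sender listed on {zone}"
    return "250 OK"
```

Because the reply is a 4xx rather than a 5xx, a legitimate sending server keeps the message queued and retries, while most spam sources never retry. The `lookup` parameter is injectable so the decision logic can be exercised without real DNS queries.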
We will be starting maintenance on backup74 shortly. During this maintenance window we will be moving users around to new volumes on the server. Service is expected to be restored by 12:00 PM Eastern. Update 1:06 PM Eastern: The backup service has been restarted and the user move has been completed.


