Dear Telnexus Customer:
I am writing to personally apologize to you for the Telnexus system outage last week. Please allow me to explain what happened and let you know what we are doing to prevent it in the future.
On the morning of Thursday, March 13, our hosting partner experienced an “upstream outage” that prevented our telephone switch from receiving calls, and prevented most of your phones from making outbound calls. The outage was partial: most web traffic was still moving, but voice and video traffic was blocked. The partial nature of the outage made discovering and rectifying the broken part of the network harder than usual.
Our first emergency procedure was to contact every customer by voice and forward customer numbers to designated cell phones. Next, we needed to get a backup switch online and connected to your system. We already had a backup switch online in a cloud-based public network, but we needed to move several accounts and settings to that switch on an emergency basis. By 12:30pm on March 13 we had moved most of the impacted customers to our new switch. Two customers who were on a separate server at our hosting partner remained down until around 5:30pm.
Overnight our engineers worked to secure our new network, and we continued to move all of our customers over to other servers. On the morning of Friday, March 14, some customers experienced a repeat of Thursday’s issues, where inbound calls were not being routed to their PBX systems. We believe that this outage was due to lingering configurations and other issues that were introduced by the move to a new switch. Two customers whose server was hosted at our partner continued to have problems until near the end of the day on Friday. As of noon today, March 17, we believe that all services have been restored.
Obviously, this is completely unacceptable and we assume full responsibility for providing you reliable, uninterrupted telephone service. Bottom line, the outage was an infrastructure issue that we can correct. Telnexus is a growing company, but fortunately we have the resources to take some steps to beef up our infrastructure and get ahead of the problems that typically impact companies like Telnexus.
We are moving out of our current hosting provider to a new, high-end facility in downtown Oakland. This will be our new primary Network Operations Center (NOC) and we will have full control over all of the hardware components at the Oakland facility. This new facility is certified by the credit card industry association as worthy of guarding your credit card and payment information. That is needed since we will start accepting electronic payments soon.
We are also spreading our network out beyond the Bay Area. Our current cloud-based backup system is in Chicago, and we are planning on duplicating the Telnexus-owned hardware setup in Northern Virginia and another Western United States location. Once these new facilities are online we will have automatic “failover” to keep you up and running even after a regional disaster here in the Bay Area.
We know that you count on Telnexus to run your business. For several hours last week we failed you, and I am sincerely sorry that happened to you and Telnexus. We have an industry-standard plan to address this failure, and this incident will accelerate our already-existing plans to make that happen. We are also doing some things I didn’t detail here to increase the security of our network.
You should also know I just hired a new key person here at Telnexus. Greg Merriweather is our new Senior Telecom Engineer. He comes to us from Yahoo!, where he was responsible for keeping 17,000 phones online. Greg was in the process of hardening our infrastructure when this incident happened, so thankfully we had the people and resources to get our backup systems running.
If you have any concerns about your service or my response to this incident, you may call me at any time to discuss. I will do my best to make sure you are pleased with our response.