By Kristen Mosbrucker – Reporter, San Antonio Business Journal
Sep 6, 2018, 6:50am CDT Updated Sep 6, 2018, 3:31pm
Lightning strikes and extreme weather from real clouds this week brought down the internet cloud that Microsoft Corp. hosts in San Antonio, resulting in what appears to have been a difficult outage for Azure customers connected to its south central region data centers.
Microsoft’s email server associated with Office 365, the cloud-based suite of office applications, requires access to Azure, so some users had sluggish email connectivity. What to many appeared to be a slow internet connection was a symptom of issues affecting the back-end system of Azure’s cloud.
Extreme weather “resulted in a power voltage increase that impacted cooling systems. Mitigation efforts are still ongoing. … Engineers have recovered a majority of the impacted network devices,” said an update shared on the Azure Support Twitter account.
Microsoft declined a request to be interviewed for this article.
“We’re working to resolve connectivity issues for some customers with resources hosted in South Central United States,” according to an emailed statement from a Microsoft representative.
Engineers in the data centers were working to mitigate the issue and posted updates on the company’s support Twitter account. Meanwhile, it was a tense time for businesses that resell Azure to customers that run critical tools in the cloud.
Many businesses that rely on hosting web applications on Azure reported having no access to databases in the cloud for hours after the initial issue on Tuesday morning.
For EmergeNext ADNM LLC, a technology consulting company in the Chicago region with a joint venture in Montreal, Canada, that is a gold enterprise resource planning partner with Microsoft, the issue meant that thousands of its customers were unable to access the cloud to do business after the Labor Day weekend. The situation was amplified as companies along the coast sought to prepare for hurricanes heading toward them.
Doyle Dettro, a partner at EmergeNext, told the Business Journal that it was a stressful situation.
“We’ve got customers without access to their system, so they can’t ship. Their warehouses are down. They can’t do point of sale either,” Dettro said in the early afternoon on Sept. 5. “We’ve never been through a real live failover since we’ve been on Azure, so we were completely locked out. Last night, some of those storage units came back online, and [customers] were able to do business after hours, but today they were having networking issues.”
Some of the company’s customers had bought what is known as “geographically redundant” data center storage, meaning their data was available in another data center as backup.
“We have some customers who are geo-redundant, and we were able to get them back up and running within the hour” of the initial issues, Dettro said.
But after more than a day without access to data, EmergeNext was migrating data for customers to another data center when it got word from Microsoft that the issue was close to being resolved late Wednesday afternoon.
More than a decade ago, the company would have had to trek to the data center, as it did in the mid-2000s for an oil and gas industry customer in Nigeria whose on-premises data center was struck by lightning.
“We don’t really know where these data centers exist anymore. In real life, when things happen it’s kind of strange,” Dettro said. “All we can do is look at these status pages and keep refreshing them.”
The issue, which affected companies to varying degrees depending on their service agreements, lasted more than 24 hours. That is a big deal in the cloud hosting industry, where companies promise 99.9 percent uptime, meaning that data is accessible so web applications can function.
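To put the 99.9 percent figure in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the 99.9 percent commitment and the 24-hour lower bound on the outage come from the reporting above, while the measurement windows (a calendar year and a 30-day month) are assumptions, since providers define and measure uptime differently in their service agreements.

```python
# Downtime permitted by an uptime commitment over a given period.
# The measurement windows below are illustrative assumptions, not
# the terms of any specific provider's service agreement.

def allowed_downtime_hours(uptime_percent: float, period_hours: float) -> float:
    """Return the hours of downtime permitted over the period."""
    return period_hours * (1 - uptime_percent / 100)

HOURS_PER_YEAR = 365 * 24          # 8,760 hours
HOURS_PER_30_DAY_MONTH = 30 * 24   # 720 hours

yearly = allowed_downtime_hours(99.9, HOURS_PER_YEAR)
monthly = allowed_downtime_hours(99.9, HOURS_PER_30_DAY_MONTH)

print(f"{yearly:.2f} hours per year")                  # 8.76 hours per year
print(f"{monthly * 60:.1f} minutes per 30-day month")  # 43.2 minutes per 30-day month

# The outage lasted more than 24 hours, so 24 is a lower bound.
outage_hours = 24
print(f"at least {outage_hours / yearly:.1f} years of downtime allowance")  # at least 2.7
```

Under these assumptions, a single outage of more than 24 hours uses up more than two and a half years of the downtime that a 99.9 percent commitment would allow.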
When an outage persists, cloud hosting providers often issue credits to the accounts that were inaccessible.
What does EmergeNext plan to do after the situation is normalized? Require its customers to buy geographically redundant data center services, even small businesses that may be looking to trim monthly cloud hosting bills.
Meanwhile, Dettro said he still has faith in the cloud, including Microsoft’s. And since it was a rough lesson to learn, Microsoft’s cloud might be stronger now.
“We will essentially make [geo-redundancy] mandatory because it does work. We had only tested it before,” he said. “We’ve had no data loss so far, but I think the holiday saved us there because nobody was doing business on Monday.”
By 4 p.m. Central time on Sept. 5, Dettro told the Business Journal that EmergeNext’s customers were 100 percent back online.