Microsoft Azure DevOps, a suite of application lifecycle services, stopped working in the South Brazil region for about ten hours on Wednesday due to a basic code error.
On Friday Eric Mattingly, principal software engineering manager, offered an apology for the disruption and revealed the cause of the outage: a simple typo that deleted seventeen production databases.
Mattingly explained that Azure DevOps engineers occasionally take snapshots of production databases to investigate reported problems or test performance improvements, and rely on a background job that runs daily and deletes snapshots older than a set retention period.
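Microsoft hasn’t published that job’s code, but a minimal sketch of such a retention job – written in Python with entirely made-up names for the client and its methods – might look like this:

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=14)  # hypothetical retention window

def cleanup_old_snapshots(client, now=None):
    """Delete snapshot databases older than the retention window.

    `client` is a hypothetical wrapper exposing list_snapshots() and
    delete_database(); the real job's interface isn't public.
    """
    now = now or datetime.now(timezone.utc)
    for snapshot in client.list_snapshots():
        if now - snapshot.created_at > RETENTION:
            # The intent: drop only the snapshot *database*, nothing else.
            client.delete_database(snapshot.server_name, snapshot.database_name)
```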
During a recent sprint – a time-boxed development iteration, in Agile jargon – Azure DevOps engineers performed a code upgrade, replacing deprecated Microsoft.Azure.Management.* packages with supported Azure.ResourceManager.* NuGet packages.
The result was a large pull request – a set of code changes that has to be reviewed before being merged into the project – swapping API calls in the old packages for their equivalents in the newer ones. The typo was buried in that pull request, and it caused the background snapshot deletion job to delete the entire server rather than a single database.
“Hidden within this pull request was a typo bug in the snapshot deletion job which swapped out a call to delete the Azure SQL Database to one that deletes the Azure SQL Server that hosts the database,” said Mattingly.
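Mattingly’s post doesn’t include the offending code, but the shape of the mistake is easy to picture. As a hypothetical illustration – using the Python azure-mgmt-sql client as a stand-in for the .NET Azure.ResourceManager.* packages the team actually moved to – the difference between deleting a snapshot database and deleting the server that hosts it can come down to a single call:

```python
# Not Microsoft's code: a hypothetical illustration using azure-mgmt-sql and
# azure-identity as stand-ins for the .NET resource manager packages.
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient

client = SqlManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Intended call: remove a single snapshot database from the server.
client.databases.begin_delete(
    "<resource-group>", "<server-name>", "<snapshot-db-name>"
).result()

# The kind of slip described in the post: delete the Azure SQL Server itself,
# taking every database hosted on it down with it.
client.servers.begin_delete("<resource-group>", "<server-name>").result()
```

In a pull request that touches hundreds of similar call sites, a swap like that is easy for reviewers to miss.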
Azure DevOps has tests to catch such issues but, according to Mattingly, the errant code only runs under certain conditions and thus isn’t well covered by existing tests. Those conditions, presumably, require the presence of a database snapshot old enough to be caught by the deletion script.
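One way to close that kind of gap – sketched here as a hypothetical pytest against the cleanup function outlined above, not a description of Azure DevOps’ actual test suite – is to seed a snapshot old enough to reach the deletion branch and assert that only the database, never the server, gets removed:

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace

# Assumes the hypothetical cleanup_old_snapshots sketch above is importable.

def test_cleanup_deletes_only_the_old_snapshot_database():
    old_snapshot = SimpleNamespace(
        server_name="sql-server-1",
        database_name="snapshot-db-1",
        created_at=datetime.now(timezone.utc) - timedelta(days=30),
    )

    deleted_databases, deleted_servers = [], []
    fake_client = SimpleNamespace(
        list_snapshots=lambda: [old_snapshot],
        delete_database=lambda server, db: deleted_databases.append((server, db)),
        delete_server=lambda server: deleted_servers.append(server),
    )

    cleanup_old_snapshots(fake_client)

    assert deleted_databases == [("sql-server-1", "snapshot-db-1")]
    assert deleted_servers == []  # the typo would have tripped this check
```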
Mattingly said Sprint 222 was deployed internally (Ring 0) without incident due to the absence of any snapshot databases. Several days later, the software changes were deployed to the customer environment (Ring 1) for the South Brazil scale unit (a cluster of servers for a specific role). That environment had a snapshot database old enough to trigger the bug, which led the background job to delete the “entire Azure SQL Server and all seventeen production databases” for the scale unit.
The data has all been recovered, but it took more than ten hours. There are several reasons for that, said Mattingly.
One is that since customers can’t revive Azure SQL Servers themselves, on-call Azure engineers had to handle the restore, a process that took about an hour for many of the databases.
Another reason is that the databases had different backup configurations: some were configured for Zone-redundant backup and others were set up for the more recent Geo-zone-redundant backup. Reconciling this mismatch added many hours to the recovery process.
“Finally,” said Mattingly, “even after databases began coming back online, the entire scale unit remained inaccessible even to customers whose data was in those databases due to a complex set of issues with our web servers.”
These issues arose from a server warmup task that iterated through the list of available databases with a test call. Databases still being recovered threw an error that led the warmup test “to perform an exponential backoff retry resulting in warmup taking ninety minutes on average, versus sub-second in a normal situation.”
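Exponential backoff is a sensible default against a transient failure, but against databases that take hours to restore it compounds badly. A rough sketch, with made-up retry parameters rather than Azure DevOps’ real ones, shows how quickly the waiting stacks up:

```python
import time

def probe_with_backoff(check_database, base_delay=1.0, max_attempts=12):
    """Hypothetical warmup probe: retry a failing test call with
    exponential backoff (1s, 2s, 4s, ... between attempts)."""
    for attempt in range(max_attempts):
        try:
            return check_database()
        except Exception:
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("database still unavailable after retries")

# A healthy database answers the first call and the probe returns in well
# under a second. One still being restored fails every attempt, so the sleeps
# alone total 1 + 2 + ... + 2048 seconds, roughly 68 minutes. Repeat that
# across a scale unit's databases and warmup balloons from sub-second to the
# ninety minutes Mattingly describes.
```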
Further complicating matters, this recovery process was staggered, and once one or two of the servers started taking customer traffic again they’d get overloaded and go down. Ultimately, restoring service required blocking all traffic to the South Brazil scale unit until everything was sufficiently ready to rejoin the load balancer and handle traffic.
Various fixes and reconfigurations have been put in place to prevent the issue from recurring.
“Once again, we apologize to all the customers impacted by this outage,” said Mattingly. ®