Why Was the Recent Crippling AWS Outage So Difficult to Fix?

Did a Shocking Hidden Flaw Cause Your Favorite Apps to Crash?

On October 20, 2025, many popular websites and applications suddenly stopped working. The widespread failure stemmed from a problem within Amazon Web Services (AWS), the cloud computing platform that supports a large portion of the internet. The disruption began early in the morning and lasted several hours before it was resolved.

The problem originated in US-EAST-1, a major AWS region located in Northern Virginia. The event highlighted how much of the digital world relies on a small number of large cloud providers.

What Caused the Outage?

The main cause of the disruption was a failure in Domain Name System (DNS) resolution for DynamoDB, a critical AWS database service. DNS acts like the internet’s phone book, translating easy-to-remember names into the numerical IP addresses that computers use to find each other.
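As a rough illustration of the lookup step described above, the Python sketch below asks the operating system's resolver for the IP addresses behind a hostname and returns an empty list when the name cannot be resolved. The function name `resolve_addresses` and its injectable `resolver` parameter are assumptions made for this example and are not part of any AWS tooling.

```python
import socket


def resolve_addresses(hostname, port=443, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses reported for hostname.

    An empty list means the name could not be resolved, which is the
    kind of failure a client sees when DNS stops answering for a service.
    """
    try:
        # Each result is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] holds the IP address as a string.
        results = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Raised when the name does not resolve or no resolver is reachable.
        return []
    return sorted({sockaddr[0] for *_, sockaddr in results})
```

Calling `resolve_addresses("example.com")` on a machine with working DNS should return a list of address strings; the exact values depend on the network, so none are shown here. When the lookup fails, the caller gets an empty list and has no address to connect to.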

When this lookup failed, applications could no longer find the correct address to connect to the DynamoDB service, and the errors spread to every service that depended on it. Amazon later explained that a hidden flaw, or “latent defect,” in an automated DNS management system triggered the failure.
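The chain reaction can be pictured as a walk through a dependency graph: once one component fails, everything that relies on it, directly or indirectly, is affected. The sketch below is a minimal model of that idea, not a description of AWS internals; the service names in `DEPENDS_ON` and the function `affected_services` are hypothetical.

```python
from collections import deque


def affected_services(failed, depends_on):
    """Return the services that depend, directly or indirectly, on `failed`."""
    # Invert the map so we can walk from a dependency to its dependents.
    dependents = {}
    for service, dependencies in depends_on.items():
        for dependency in dependencies:
            dependents.setdefault(dependency, set()).add(service)

    # Breadth-first search outward from the failed component.
    seen = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, ()):
            if service not in seen:
                seen.add(service)
                queue.append(service)
    return seen - {failed}


# Hypothetical dependency map, for illustration only.
DEPENDS_ON = {
    "database-dns": [],
    "database": ["database-dns"],
    "login": ["database"],
    "checkout": ["login", "database"],
    "static-pages": [],
}

print(sorted(affected_services("database-dns", DEPENDS_ON)))
# prints ['checkout', 'database', 'login']
```

In this toy model a single DNS failure takes down three of the four other services, while `static-pages`, which has no dependency on the database, is untouched.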

Impact of the Disruption

The outage had a significant and immediate impact on thousands of businesses and their customers globally.

  • Popular Services Down: Many well-known platforms, including Snapchat, Reddit, Fortnite, and Venmo, experienced connectivity issues or went completely offline.
  • Business Operations Halted: Companies that rely on AWS for their daily operations, from online banking to e-commerce, faced service interruptions.
  • Global Reach: Although the issue started in a single U.S. region, users from London to Tokyo were affected, demonstrating how interconnected modern internet services are.

AWS engineers identified the source of the problem and worked to redirect network traffic. Most services were restored by midday on October 20, though some slowness continued as systems recovered. The incident is a reminder of how complex the infrastructure behind the internet is, and of how widely the consequences spread when a core component fails.