An outage of the US Federal Aviation Administration’s (FAA’s) Notice to Air Missions System (NOTAMS) caused the agency to ground domestic flight departures earlier this week. The FAA permitted air traffic to resume after 9 am on Wednesday. The FAA says that the problem appears to have been a damaged database file. NOTAMS, which operates separately from the FAA’s air traffic control system, is used to notify pilots of potential hazards.
Note
- Every so often you need an incident to get attention (and funding) to fix broken systems. Let’s hope that this was all it took to get this system moved out of the 20th century. Some news reports suggest that the outage was due to procedures not being followed. But there are often reasons people do not follow procedures, for example if the procedures are impractical or if staff simply do not have the time or staffing required to follow them.
- Ah, self-inflicted wounds: squirrels chewing through wires and untrimmed tree branches shorting out electricity distribution lines have been the cause of some of the largest power outages. Bad router or switch updates have been the cause of the biggest telecommunications outages. But I can’t remember a single large outage being blamed on a security patch pushed out too quickly.
- One hopes this incident will receive regulatory review similar to that of the Southwest issue earlier this year. Both underscore the need to have adequate staffing and updated applications/services, with automated failover, and ideally environments for regression testing and dynamic scaling. We’ve all been there when a “simple” change causes an unexpected outage. This would be a good time to check that critical systems are not only properly resourced but also have appropriate lifecycle plans which factor in current workloads and demands.
- While attention will be on the aging infrastructure used by the FAA, one has to ask how the file(s) got corrupted in the first place and found their way to both the primary and backup NOTAMS. A review of, and changes to, the procedures for updating, testing, and pushing these system files to the operational network are warranted.
- It was disappointing to see the number of people, many of them in the cybersecurity field, who jumped to the conclusion that this outage was the result of a cyberattack. This type of overhyping only undermines the credibility of the cybersecurity industry. We need to do better in providing commentary on issues; not all IT incidents are cyberattacks.
- One would like to know whether the decision to ground the fleet in the event of the failure of this application was planned or (more likely) ad hoc. Had there been a plan, it would surely have included a remedy that was cheaper, both economically and politically.
Read more in
- FAA NOTAM Statement
- FAA permits flights to resume as computer system outage resolved
- What is the FAA’s Notice to Air Missions System and why does it matter?
- FAA preliminary investigation traces NOTAMS outage to damaged database file
- A corrupt file led to the FAA ground stoppage. It was also found in the backup system
- FAA outage that grounded flights blamed on old tech and damaged database file
- US Flights Resume After Reported Computer Glitch Resolved
- FAA Blames Damaged File for Outage, Sees No Attack Evidence