Anatomy of a Mishap Investigation
By Rick Obenschain
On February 24, 2009, a Taurus XL launch vehicle carrying the Orbiting Carbon Observatory satellite lifted off from Vandenberg Air Force Base in California. The satellite was designed to measure atmospheric carbon dioxide to provide precise information about human and natural carbon-emission sources. The spacecraft failed to reach orbit and instead plunged into the ocean near Antarctica.
The likely source of that failure quickly became apparent: the fairing—the clamshell-shaped cover that protects the satellite during the early stages of the flight—had not separated as expected from the upper stage of the Taurus XL, and the extra mass of the still-attached component prevented the launch vehicle from reaching orbital altitude and speed. But the reason for that malfunction was far from clear.
The day after the accident, I was asked to lead the Mishap Investigation Board (MIB) that would try to understand why the fairing failed to separate and recommend design and process improvements to prevent similar problems in the future. NASA Headquarters challenged the board to get from day one to a final report in sixty days—a dramatically shorter span than most past mishap investigations. We did it in eighty-four days, which is still remarkably fast given the amount of work that needed to be done.
The MIB Team
Most of the credit for that efficiency goes to our down-to-earth, focused, dedicated team members, who often worked literally seven days a week. Some other important factors contributed. One was my decision to keep the team as small as possible, given our managerial and technical needs. There were fifteen of us: six board members and seven advisors (technical experts plus legal, public affairs, and external relations specialists), along with two consultants we brought in toward the end of the process to deal with specific technical issues.
We also worked hard to be in close and constant contact. Team members from various locations got together at Goddard Space Flight Center to start the process, and we met frequently in person at Goddard and other sites during the whole course of our investigation. All in all, members met for fifty days at Goddard and twenty-five days elsewhere. In addition, we had daily "tag-ups" and other teleconferences to share information and ideas. A central online repository of documents helped us work together over the distances among our locations.
We were further helped by the openness of Orbital Sciences Corporation, the supplier of the Taurus launch vehicle, and the Kennedy Space Center Launch Services Program. They shared information from their own investigations and cooperated fully with ours. They were as determined as we were to discover and correct the cause of the failure.
Looking for the Root Cause
Our job was to try to discover both the intermediate cause or causes of the fairing separation malfunction (the particular component or components that failed to function as expected) and the root cause of those failures: the organizational behaviors, conditions, or practices that ultimately led to the production and acceptance of what proved to be faulty mechanisms. If you find and fix the intermediate, technical problems but ignore the underlying sources of those problems, they are likely to persist and lead to other failures, so identifying the root cause is important.
In the first three weeks, we conducted more than seventy interviews to collect as much data and information about the mishap as possible. Then we used NASA's Root-Cause Analysis tool to look for that fundamental cause. I admit to starting out with some skepticism about the tool, which requires adherence to demanding, detailed analytical processes. Having worked as an engineer earlier in my NASA career, I have always been concerned that some formal processes supposedly designed to support the work may actually get in the way of developing the product. In practice, though, what initially looked like a process that might be too rigid turned out to be usefully rigorous. Had we not gone through all the steps required by the Root-Cause Analysis tool, we could easily have missed possible contributors to the launch failure. In situations as complex and ambiguous as this one, relying on an informal sense of where the fault probably lies just doesn't work. We ultimately offered a few suggestions for improving the tool, but they were ways to make it more user-friendly; in general, it proved its power and usefulness.
Using root-cause analysis, we ended up with a fault tree that had 133 branches, each a factor we needed to evaluate with the tool. That process eliminated 129 of them, leaving four possible causes of the fairing-separation failure. Although some of those four seemed qualitatively more likely than others, none could be ruled out.
The chief reason we could not identify the cause was that we had no access to the failed hardware, which probably would have given a definitive answer. It was at the bottom of the ocean near Antarctica. Not having that clear answer, we were not able to determine a root cause either.
The MIB Report
Our report detailed the four factors that could not be discounted as possible intermediate causes of the mishap. Along with our description of these possible causes, we offered recommendations for how to ensure that they would not pose a risk on future missions. Briefly, these are the possible causes the board identified and our recommendations for improvement.
Frangible-joint base ring may not have fractured as required.
An incomplete fracture of the frangible-joint base ring that holds the fairing halves together and attaches them to the upper stage of the rocket could have prevented fairing separation. We could not discount this possibility because Orbital Sciences did not have complete information on the characteristics of the aluminum used in this component. We recommended that future aluminum extrusions for this component have a traceable "pedigree" to aluminum lots that have been appropriately and thoroughly tested.
Electrical subsystem may have failed.
The responsible subsystem might not have supplied enough electricity to fire the explosive devices that released the fairing. This remained a possibility because telemetry sent from the launch vehicle was not designed to measure and report the amount of current needed. We recommended changing the telemetry so that it would provide this information.
Pneumatic system may not have provided enough pressure to separate fairing.
We could not prove that the pneumatic system (a hot-gas generator, thrusters, and pneumatic tubing) supplied enough pressure to separate the fairings. We recommended design modifications to, and improved testing of, the hot-gas generator system that provides pressure to the thrusters. If those changes proved impractical or impossible, we recommended using an alternate system.
Flexible, confined detonating cord could have snagged on part of frangible joint.
This seemed an unlikely failure cause, but we could not rule it out. We recommended rerouting the cord or adding a physical barrier if further analysis and testing could not eliminate the possibility.
In the days since we presented our report, continuing efforts of the Kennedy Launch Services Program and Orbital Sciences have shown that electrical system malfunction and detonating cord snagging were not contributing factors to the failure. The specific recommendations made by the MIB are being incorporated to ensure that these potential failure modes are prevented in the future.
A Valuable Investigation
All the skill and hard work of the board members and the many others who helped us did not get us to the clear-cut intermediate and root causes we had hoped to find. Instead, we "surrounded" the actual cause by identifying multiple possibilities. A few people have suggested this means that the Orbiting Carbon Observatory MIB "failed." I don't agree. The detailed and extensive testing and analysis that allowed us to identify the four potential intermediate causes should go a long way toward ensuring that the fairing problem will not recur. And our recommendations, although they do not get at a definitive root cause, do speak to small but meaningful shortfalls in testing, inspection, quality control, and manufacturing that will help guide the recovery activities.
One general conclusion that our work supports is the importance of rigorously adhering to the procedures designed to eliminate or minimize as much risk as possible. This is especially true when the project team has only sporadic experience with a particular vehicle, as was the case with the Taurus XL used to launch the Orbiting Carbon Observatory satellite. Only eight Taurus rockets have been launched, typically with several years separating launches. Many of the people involved with launching the Orbiting Carbon Observatory had little or no experience with this launch vehicle. The less often you launch, the more attention you should pay to the formal procedures that embody much of the information and knowledge past practitioners have acquired about how to launch successfully.
About the Author
Rick Obenschain has worked at NASA for more than forty years in positions ranging from discipline engineer to project manager (five times) to director of engineering and director of flight projects. He is currently the deputy center director at Goddard Space Flight Center.