Failure Is Not an Option
Do-It-Yourself Podcast Failure Prevention Videos Transcript
(Bolden) As a former astronaut and the current NASA Administrator, I'm here to tell you that American leadership in space will continue for at least the next half-century because we have laid the foundation for success -- and, for NASA, failure is not an option.
Meet Torey Long
(Long) My name is Torey Long, and I'm a materials engineer at NASA's Kennedy Space Center in the Materials Failure Analysis Laboratory. My job is to investigate failure, which just means that I figure out why things break. We like to think of ourselves as NASA's version of a detective, but instead of investigating crime, we investigate failure.
What Is Failure?
(Long) Failure occurs when a product doesn't do what it was designed to do. Even if a product works but not as well as intended, then this could be called a failure. Failure can be due to something obvious, like a crack, or it can be more subtle. Let's take a computer as an example. A computer could fail due to a hardware problem, which you could test or inspect for, or it could be due to a software problem, which you would need someone who's familiar with programming to go through the code and find out the problem. Failures can also, many times, be due to a lot of different reasons that come together all at once to cause a failure. And these failures can be a lot more difficult to investigate.
Failure in the Design Process
(Long) Failure plays an important role in the engineering design process. Many times, the design process is initiated because of an error in a previous design that had led to a failure and needs to be fixed. When the engineers are coming up with new ideas, they look at the lessons that have been learned in the past from failure in order to come up with better ideas. Once the engineers select the design, they then build a prototype, and then they test it. If the prototype were to fail during testing, they can investigate the cause of that failure and use the lessons that they learned during that investigation to better refine their design.
To Prevent Failure
(Long) NASA has processes in place for new rockets, payloads and equipment that's used on the ground in order to prevent failure. During the design process, NASA engineers review the drawings at different stages in order to prevent failure before the hardware is even built. While the hardware is being built, NASA ensures that the quality is top-notch. NASA then, after the hardware is built, puts the hardware through extensive testing, which just means that we put the hardware through conditions that (are) worse than what it would normally see. If the hardware can withstand these tough conditions, then we know that the hardware will be OK to use for flight.
Tested Before Launch
(Long) One of the tests that any hardware that is being launched has to go through is vibration testing. Rockets tend to shake quite a bit on the way up, so NASA has to test the hardware to make sure that it's not going to break before it gets to orbit. NASA also has to test the hardware to make sure that it's not going to cause a fire, or that it won't emit toxic gases and also that it won't interfere with the electronics of any of the hardware that's used around it. NASA also performs functional tests, which just means that we test the hardware to make sure that it does what it's designed to do in the environment and place where it's supposed to do it.
In Case of Failure
(Long) NASA plans for failure by having engineers like me who can investigate failures to figure out what happened. By having laboratories at different NASA centers, we can respond quickly to failures so that we don't delay rocket launches. NASA also has safety organizations who have their own process for investigating failures that are larger in scale or affect more people.
NASA Deals With Failure
(Long) The only way to prevent failure is to learn lessons from our past so that we don't repeat the same mistakes. That is why NASA is interested in failure. For NASA, a failure can have very bad consequences because the safety of the astronauts and the public is at stake. That's why we say that for NASA, failure is not an option. We have to do everything in our power to prevent a failure from occurring so that we can continue to fly safely.
Everyday Life in the Failure Lab
(Long) We typically start out an investigation by going out in the field and looking at the broken hardware. The broken hardware can be anything from a rocket part to something that is used on the ground to get the vehicles ready for launch. We interview the technicians and engineers who work on the hardware every day. Once we're back in the lab, we take pictures of the hardware, and then we use microscopes to look for evidence. We can use a light microscope, or we can use something called a scanning electron microscope, which uses electrons instead of light to see things. The magnifications that it can see are 100,000 times better than the human eye. Once we're done looking at the outside of a part, we can look at the inside of a part by cutting it and polishing it. And then, we can find more evidence. Once we're done with the investigation, we collect all this evidence together and write a report telling what happened to the part, and then we make recommendations so that the failure doesn't happen again.
(Long) One of the most significant outcomes to a NASA failure is the lessons that we learn from it so that we don't ever repeat the failure. NASA has a website so that engineers can go and put in their lessons that they've learned from the programs that they've worked on, and a lot of times the lessons that they've learned are due to failures. NASA engineers are also required to search that website when they're coming up with new designs for rockets and experiments that go into space so that they can make their ideas even better based on past experiences with failure.
(Alleyne) So, one of the things when you're building space vehicles is being able to test it on the ground like you are going to fly it in space. The test program is really designed around every phase that that vehicle is going to see, from launch until re-entry. And so the test program is designed around that. During a launch environment, there's high vibrations, and, so, on the ground you want to put that vehicle through that kind of environment, putting it on a shaker table and having it being subjected to not just the loads -- the vibrational loads it will see during that launch event -- but giving it a little bit of margin. You want to test. You want to build your spacecraft so that there's margin, design margin, in it so that it can withstand exactly what it's going to see. But if there is any anomaly above that, that it could also withstand that. You're pushing it to the limits of what you think it could withstand or endure. That's really what a test program for a space vehicle is based on.
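(Editor's illustration) The margin idea described above can be sketched in a few lines of Python. This is only a rough sketch, not NASA's method: the function names, the 1.25 margin factor, and the sample numbers are all hypothetical and chosen for illustration.

```python
def shaker_test_level(expected_load, margin_factor=1.25):
    """Return the load to apply on the shaker table.

    expected_load: load the vehicle is predicted to see during launch.
    margin_factor: hypothetical multiplier (>= 1) that adds design margin.
    """
    if margin_factor < 1.0:
        raise ValueError("margin_factor must be at least 1.0")
    return expected_load * margin_factor


def passes_test(rated_capability, expected_load, margin_factor=1.25):
    """True if the hardware's rated capability covers the test level."""
    return rated_capability >= shaker_test_level(expected_load, margin_factor)
```

With these made-up numbers, hardware expected to see a load of 8.0 would be tested at 10.0, so a part rated for 9.0 would survive the predicted launch load but still fail the test.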
Pushing the Boundaries
(Alleyne) You want to test the corner -- what we call the corners -- of the envelope, where it is going to operate, where you know the vehicle needs to operate. You want to test the boundaries to make sure that those are met, or that the vehicle could survive those extreme conditions. But you also test a nominal performance, or how it would normally perform going through that environment. Then, again, we have to think about, when we're designing a test program, about the cost, because it's not just about the design of the vehicle and its testing, but we all have a cost bucket or a cost profile with which to stay within. So, sometimes our test plan is constrained by our cost. But even within that, we still try to test to the extremes.
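(Editor's illustration) One way to picture "the corners of the envelope" is to list every combination of the lowest and highest value of each operating condition, plus one nominal case. The Python sketch below assumes a hypothetical envelope with two made-up parameters. The names and numbers are not taken from any real test plan.

```python
from itertools import product


def envelope_test_cases(limits):
    """Build test points from {parameter: (low, nominal, high)}.

    Returns a list of corner points (every low/high combination)
    and a single nominal point.
    """
    names = list(limits)
    extremes = [(limits[n][0], limits[n][2]) for n in names]
    corners = [dict(zip(names, combo)) for combo in product(*extremes)]
    nominal = {n: limits[n][1] for n in names}
    return corners, nominal


# Hypothetical operating envelope; values are invented for illustration.
limits = {
    "temperature_c": (-40, 20, 60),
    "vibration_g": (0.5, 4.0, 9.0),
}
corners, nominal = envelope_test_cases(limits)
```

The number of corners doubles with every added parameter, which is one reason a test plan can run into the cost limits mentioned above.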
Back to the Drawing Board
(Alleyne) You go back to the drawing board, and you redesign. Sometimes, it takes redesigning. Sometimes, you may think that the material that you used would withstand the environment, and you find in testing that it doesn't. That's why we test, so that we are prepared to have success during the real event. And so when your test fails, you do an analysis -- a root cause analysis -- of "Why did it fail?" Was it something to do with the materials or the system? Was it something inherent in the design, and would we need to redesign that part or that system in order to have success?
How Astronauts Prepare in Case of Failure
(Alleyne) They do extensive training both in understanding the space station systems. They do (survival) training. But one of the things they do is anomaly training. And, really, if there are emergencies on board, how do you deal with those anomalies on board the space station. So, preparing for failures doesn't mean you're hoping that it happens, but it means in the event that it happens, you know what to do.
Astronauts Respond to Failure
(Interviewer) What is a failure that occurred during one of your spaceflights, and how did the crew handle it?
(Michael Barratt) Well, that's a really good question. You know, I've got a little over 200 days in space, and let me tell you that failures happen. The one thing I learned is that the best way to react to a failure is to be ready for it in the first place. And everything we do up here involves a lot of planning and having backup systems and methods to handle things. When I first arrived at the space station on the Soyuz, our automatic docking system failed, and we were trained and equipped to bring it in manually, and we did that. And more recently, here, when I had Steve Bowen out on the end of the robotic arm, and he had a big, heavy piece of very expensive hardware, and our robotic workstation crashed. And that's not a good place to be. But we had a backup system, and the space station crew -- in a very calm, very cool-headed but very quick response -- brought up that secondary workstation. And within a few minutes we were back in business and good to go, and Steve was moving again.
(Steve Bowen) Seemed like forever to me. I’m not sure.
(Barratt) Seemed a long time to Steve, but obviously being prepared is the biggest thing. It’s great to see that, when everything is in order and the crew reacts to their training, and cool heads ... and you're back in business really fast.
The Electron Microscope in Failure Analysis
(Long) This is a digital microscope, and I'm going to show you how we use this instrument to investigate the debris that was brought back from the International Space Station. The debris was collected on tapes like these, and what the astronauts did is they wrapped the tape around their glove of their spacesuits, and they collected the debris. When they brought it back -- they have to reuse everything on the space station -- so they actually brought it back in this bag that was originally used to take up a sweater for Peggy Whitson, who was a commander on the space station. When they brought it back, we used this instrument to take a look at it, and as you can see on this screen, there's a lot of different sizes and shapes of fragments. We can use this instrument to go up to 200x magnification. So we can zoom in on the fragment and get evidence from them so that we can figure out what happened. We were able to determine that the fragments came off of the joint that rotates 360 degrees so that the solar arrays are always facing the sun. We were also able to determine, using pictures like this … (This is from a scanning electron microscope. And like I was telling you before, it uses electrons instead of light to see things. And we can use magnifications 100,000 times better than the human eye.) We were able to see the features on here and determine why the debris was coming off. And that's how we use pictures like this in a failure investigation.
Investigating With Metallography
(Long) One of the techniques that we use in a failure investigation is called metallography. And we use that to determine the quality of the material. What we do is we take a piece of metal, like this, and then we mount it into plastic, like this. We then use sandpaper to grind a flat surface, and these are some examples of the sandpapers that we use. We start off with really rough-grit sandpaper, and then we get finer and finer until we get to a polishing step. For polishing, we use a diamond paste, and then we use very fine minerals to get a mirror finish, like this. We then use acids to etch away the very top layer of the metal so that we can see the structure of the material. And, that way, we can determine the quality. Here's an example of one of the pieces of debris that was brought back from the International Space Station. As you can see, it's a very small piece of metal, and we're still able to do this process and get information from it. We can zoom in on magnification. So, right now, we're looking at magnification that's 200 times better than the human eye. And we can see that we find evidence, and we use that evidence to determine why the debris was coming off of the joint that rotates and keeps the solar arrays so that they're always facing the sun.
Picking Up Debris With the EVA Wipe
(Wright) Now what I have here is called an EVA wipe. EVA stands for extravehicular activity. Let me open this up. EVA is when an astronaut will go out and do a spacewalk. In this particular case, an astronaut used this while we were investigating the SARJ, the solar alpha rotary joint. The astronaut went out there, and they used this for a cleaning procedure -- put it over their glove, they're out in space -- and you can see there's dark on this glove. They used this to clean out that joint because there was all that debris that had accumulated over time. And they had to make sure that it was clean enough so that we can keep on using it. And this is what they used. If you were able to look at this up close, it looks almost like a bath towel. It's terry. We use a lot of those kinds of different types of materials when we are dealing with space. And that was able to grab some of that debris. They also used some grease, added the grease and cleaned up, and that's how they were able to remove a lot of that debris.
Atlas Explosion (1961)
Spectacular disaster as an Atlas missile blows up on its pad.
An advanced model of the famed Atlas, the intercontinental missile was being launched for the first time from an underground coffin launching pad when a malfunction blasted it to bits. Fiery pieces of the rocket fall back into the pad, completely destroying the experimental two-million-dollar installation. That, coupled with the cost of the Atlas itself, makes the loss a staggering multimillion-dollar chapter in our space story. Miraculously … no injuries.
Atlas Failure and Success (1961)
In one of the last tests before a man is launched into space by the United States, an Atlas missile is poised on the launching pad in Cape Canaveral, Florida. With an astronaut capsule in its nose, the huge rocket quivers as the tremendous energy builds up and the countdown approaches zero. The Mercury capsule carries a breathing dummy so that scientists can make last minute studies of how a human might react. Now the critical moment approaches.
The Atlas soars aloft with majestic grace as its groundling creators tick off the long seconds that it takes to rise on course. A shoot today will provide answers (they hope) to some of the vexing problems of safely orbiting a man. So far it's running true -- right on course. But many things can happen.
A malfunction! The capsule is separated and the safety officer takes over.
The missile is destroyed but all attention is now focused on the Mercury capsule that was separated from the Atlas. The separation is a success. The capsule that might have been carrying a man floats back to Earth.
Scientists are elated over this phase of the shoot. They say a man would have survived without injury for the robot pilot continued to function normally. They foresee no delays in putting a man into space and then into orbit.
Centaur Rocket Explosion (1962)
After nearly a year's delay posed by its liquid hydrogen fuel, the mighty Centaur is on its pad at Cape Canaveral for a maiden flight. It is to be boosted into space by an Atlas for a 15-minute flight -- a flight scheduled to study the performance of the temperamental hydrogen fuel. The Centaur is designed to put a payload of more than a ton on the moon, or a thousand pounds in the vicinity of Mars or Venus. The first few seconds of the shoot go without incident. The Centaur climbs to 30,000 feet. Then ... malfunction!
The immediate cause of the explosion is not known, but if it happened in the Atlas booster, it means a probable delay for the next U.S. orbital flight by astronaut Malcolm Carpenter. Long-lens cameras capture pieces of the wrecked missile falling into the sea. Falling like a wounded bird. However, in the race for space, scientists find progress in every failure.