Do-It-Yourself Podcast Failure Prevention Transcript
1-a Failure Is Not an Option
(Bolden) As a former astronaut and the current NASA Administrator, I'm here to tell you that American leadership in space will continue for at least the next half-century because we have laid the foundation for success -- and, for NASA, failure is not an option.

2-a Meet Torey Long
(Long) My name is Torey Long, and I'm a materials engineer at NASA's Kennedy Space Center in the Materials Failure Analysis Laboratory. My job is to investigate failure, which just means that I figure out why things break. We like to think of ourselves as NASA's version of a detective, but instead of investigating crime, we investigate failure.

3-a What Is Failure?
(Long) Failure occurs when a product doesn't do what it was designed to do. Even if a product works but not as well as intended, then this could be called a failure. Failure can be due to something obvious, like a crack, or it can be more subtle. Let's take a computer as an example. A computer could fail due to a hardware problem, which you could test or inspect for, or it could be due to a software problem, which you would need someone who's familiar with programming to go through the code and find out the problem. Failures can also, many times, be due to a lot of different reasons that come together all at once to cause a failure. And these failures can be a lot more difficult to investigate.

4-a Failure in the Design Process
(Long) Failure plays an important role in the engineering design process. Many times, the design process is initiated because of an error in a previous design that had led to a failure and needs to be fixed. When the engineers are coming up with new ideas, they look at the lessons that have been learned in the past from failure in order to come up with better ideas. Once the engineers select the design, they then build a prototype, and then they test it. If the prototype were to fail during testing, they can investigate the cause of that failure and use the lessons that they learned during that investigation to better refine their design.

5-a To Prevent Failure
(Long) NASA has processes in place for new rockets, payloads and equipment that's used on the ground in order to prevent failure. During the design process, NASA engineers review the drawings at different stages in order to prevent failure before the hardware is even built. While the hardware is being built, NASA ensures that the quality is top-notch. NASA then, after the hardware is built, puts the hardware through extensive testing, which just means that we put the hardware through conditions that (are) worse than what it would normally see. If the hardware can withstand these tough conditions, then we know that the hardware will be OK to use for flight.

6-a Tested Before Launch
(Long) One of the tests that any hardware that is being launched has to go through is vibration testing. Rockets tend to shake quite a bit on the way up, so NASA has to test the hardware to make sure that it's not going to break before it gets to orbit. NASA also has to test the hardware to make sure that it's not going to cause a fire, or that it won't emit toxic gases and also that it won't interfere with the electronics of any of the hardware that's used around it. NASA also performs functional tests, which just means that we test the hardware to make sure that it does what it's designed to do in the environment and place where it's supposed to do it.

7-a In Case of Failure
(Long) NASA plans for failure by having engineers like me who can investigate failures to figure out what happened. By having laboratories at different NASA centers, we can respond quickly to failures so that we don't delay rocket launches. NASA also has safety organizations who have their own process for investigating failures that are larger in scale or affect more people.

8-a NASA Deals With Failure
(Long) The only way to prevent failure is to learn lessons from our past so that we don't repeat the same mistakes. That is why NASA is interested in failure. For NASA, a failure can have very bad consequences because the safety of the astronauts and the public is at stake. That's why we say that for NASA, failure is not an option. We have to do everything in our power to prevent a failure from occurring so that we can continue to fly safely.

9-a Everyday Life in the Failure Lab
(Long) We typically start out an investigation by going out in the field and looking at the broken hardware. The broken hardware can be anything from a rocket part to something that is used on the ground to get the vehicles ready for launch. We interview the technicians and engineers who work on the hardware every day. Once we're back in the lab, we take pictures of the hardware, and then we use microscopes to look for evidence. We can use a light microscope, or we can use something called a scanning electron microscope, which uses electrons instead of light to see things. The magnifications that it can see are 100,000 times better than the human eye. Once we're done looking at the outside of a part, we can look at the inside of a part by cutting it and polishing it. And then, we can find more evidence. Once we're done with the investigation, we collect all this evidence together and write a report telling what happened to the part, and then we make recommendations so that the failure doesn't happen again.

10-a Lessons Learned
(Long) One of the most significant outcomes to a NASA failure is the lessons that we learn from it so that we don't ever repeat the failure. NASA has a website so that engineers can go and put in their lessons that they've learned from the programs that they've worked on, and a lot of times the lessons that they've learned are due to failures. NASA engineers are also required to search that website when they're coming up with new designs for rockets and experiments that go into space so that they can make their ideas even better based on past experiences with failure.

11-a Test It!
(Alleyne) So, one of the things when you're building space vehicles is being able to test it on the ground like you are going to fly it in space. The test program is really designed around every phase that that vehicle is going to see, from launch until re-entry. And so the test program is designed around that. During a launch environment, there's high vibrations, and, so, on the ground you want to put that vehicle through that kind of environment, putting it on a shaker table and having it being subjected to not just the loads -- the vibrational loads it will see during that launch event -- but giving it a little bit of margin. You want to test. You want to build your spacecraft so that there's margin, design margin, in it so that it can withstand exactly what it's going to see. But if there is any anomaly above that, that it could also withstand that. You're pushing it to the limits of what you think it could withstand or endure. That's really what a test program for a space vehicle is based on.

12-a Pushing the Boundaries
(Alleyne) You want to test the corner -- what we call the corners -- of the envelope, where it is going to operate, where you know the vehicle needs to operate. You want to test the boundaries to make sure that those are met, or that the vehicle could survive those extreme conditions. But you also test a nominal performance, or how it would normally perform going through that environment. Then, again, we have to think about, when we're designing a test program, about the cost, because it's not just about the design of the vehicle and its testing, but we all have a cost bucket or a cost profile with which to stay within. So, sometimes our test plan is constrained by our cost. But even within that, we still try to test to the extremes.

13-a Back to the Drawing Board
(Alleyne) You go back to the drawing board, and you redesign. Sometimes, it takes redesigning. Sometimes, you may think that the material that you used would withstand the environment, and you find in testing that it doesn't. That's why we test, so that we are prepared to have success during the real event. And so when your test fails, you do an analysis -- a root cause analysis -- of "Why did it fail?" Was it something to do with the materials or the system? Was it something inherent in the design, and would we need to redesign that part or that system in order to have success?

14-a How Astronauts Prepare in Case of Failure
(Alleyne) They do extensive training both in understanding the space station systems. They do (survival) training. But one of the things they do is anomaly training. And, really, if there are emergencies on board, how do you deal with those anomalies on board the space station. So, preparing for failures doesn't mean you're hoping that it happens, but it means in the event that it happens, you know what to do.

15-a Astronauts Respond to Failure
(Michael Barratt) You know, I've got a little over 200 days in space, and let me tell you that failures happen. The one thing I learned is that the best way to react to a failure is to be ready for it in the first place. And everything we do up here involves a lot of planning and having backup systems and methods to handle things. When I first arrived at the space station on the Soyuz, our automatic docking system failed, and we were trained and equipped to bring it in manually, and we did that. And more recently, here, when I had Steve Bowen out on the end of the robotic arm, and he had a big heavy piece of very expensive hardware and our robotic workstation crashed. And that's not a good place to be. But we had a backup system, and the space station crew -- in a very calm, very cool-headed but very quick response -- brought up that secondary workstation. And within a few minutes we were back in business and good to go, and Steve was moving again.

(Steve Bowen) Seemed like forever to me. I'm not sure.

(Barratt) Seemed a long time to Steve, but obviously being prepared is the biggest thing. It's great to see that, when everything is in order and the crew reacts to their training, and cool heads ... and you're back in business really fast.

16-a Apollo 13
OK, Houston, we've had a problem here.

This is Houston. Say again, please.

Ah, Houston, we've had a problem. We've had a MAIN B BUS UNDERVOLT.

Roger. MAIN B UNDERVOLT. OK, stand by, 13. We're looking at it.
