When an airplane flies, hundreds of data streams flow from it every second: control positions, instrument positions, and warning modes, on top of the pilot reports and incident reports that airlines collect.
But there's so much data that it has been nearly impossible for airlines to do anything other than look back for the cause of something that has already happened.
Enter the data mining detectives from NASA.
Data mining is the art of digging through mountains of data when you don't know what you're looking for or what you might find. Popular search engines like Google™ do this every second.
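To make "digging when you don't know what you're looking for" a little more concrete, here is a minimal Python sketch of one very simple way to flag readings that stand out without defining in advance what an unusual reading looks like. It applies the modified z-score (median and median absolute deviation) to a single parameter. The function name and the sample approach speeds are hypothetical illustrations, not NASA data and not a description of NASA's own algorithms.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    The modified z-score (Iglewicz and Hoaglin) uses the median and the
    median absolute deviation (MAD) instead of the mean and standard
    deviation, so a single extreme reading cannot hide itself by
    inflating the spread.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the readings equal the median, so MAD is zero
        # and this score cannot rank them.
        return []
    # 0.6745 makes the score comparable to an ordinary z-score
    # when the data are roughly normally distributed.
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


# Hypothetical approach speeds (knots) from eight flights; one stands out.
speeds = [138, 140, 139, 141, 137, 140, 165, 139]
print(flag_outliers(speeds))  # [6]
```

Nothing in the code says "165 knots is bad"; the reading is flagged only because it sits far from the rest of the data.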
NASA is mining terabytes of aviation data to find issues before they become incidents. Ashok Srivastava will talk to us about what computer tools NASA is building to do the digging.
Jeff Hamlett will talk about how Southwest Airlines is already using data mining "gold" to update their flight operations.
1. How is NASA figuring out how to find the needle in a haystack when we don't know what either looks like?
2. What's an "algorithm"? What's an "anomaly"? What's a "precursor" and why do data miners use those words all the time?
3. What has Southwest changed in its practices thanks to data mining?
4. How is our data mining different from Google's or Amazon's? How is it the same?
On Wednesday, March 23, Ashok and Jeff answered your questions. Read the transcript below.
More About Ashok and Jeff
Ashok is a data wizard. He is project manager for the System-wide Safety and Assurance Technologies project for the Aviation Safety Program at NASA, which works on using advanced machine learning and data mining algorithms to improve aviation safety. He founded DASH Link, a social networking site for people in systems health management, data mining, and related fields, and has advised numerous companies regarding analytics and their impact on strategic investments.
Jeff served in the United States Air Force and flew the O-2A, OT-37, F-15C, and T-38 aircraft, which, in his case, makes him "Captain Jeff Hamlett." He joined Southwest Airlines as a first officer in 1999 and then began working in their Safety Department. Currently he is Southwest Airlines' director of flight safety.
Karen (Moderator): Our chat is now underway. And here's a question from some folks at NASA. What are some examples of how data mining is used in other places in life right now?
Ashok: Data mining is found all over the place: Amazon, Netflix, and Macy's use it, for example, and so do the finance and medical industries. Data mining is also used in the sciences, in biology for example, where it's a very important element in studying biological systems and genetics.
Karen (Moderator): A question for Jeff: how did Southwest start working with NASA on data mining?
Jeff: We've had a long relationship with NASA that goes back into the '90s. From 2002 to 2004 we worked extensively with NASA on a major rewrite of our flight operations procedures; that was a human factors project. Then we worked closely with the Aviation Safety Reporting System (ASRS) on producing information for the airline when we were going into an airport for the first time. More recently, in about 2008, we began working with NASA Ames Research Center and Ashok on data mining in particular.
Perlpilot: What programming languages were used to implement NASA's data mining tools? Are there any "data mining toolboxes" to get someone started on their own data mining project?
Ashok: We prototype many algorithms in MATLAB and then transition them to C++, Java, or sometimes Python for larger-scale applications. If you want to start applying some data mining techniques, a good open source package is Weka. Many people use it, and it provides a good introduction to many algorithms.
Footoocool: What new knowledge have you mined out from SWA flight data so far, and what more sort of things do you feel you can mine out going forward?
Jeff: Ashok's going to answer the first part of your question. (Ashok) NASA has an ongoing partnership with MITRE; we work with them closely and have transitioned some of our algorithms to them. We're currently transferring one of our algorithms that we recently published. (NOTE: Ashok's response was actually to a different question, from uma_ferrell, about whether NASA collaborates with MITRE. See more in the response to the next question.) And this is Jeff for the second part of your question: we're currently looking at RNAV (RNP) procedures and studying fuel efficiency and track-mile savings. In the future, we hope to be able to monitor the health of the aircraft and get early indications of issues that could result in service disruptions.
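At its core, a track-mile comparison is a distance calculation. The Python sketch below assumes, hypothetically, that each flight's path is available as a sequence of (latitude, longitude) points in degrees; it treats the Earth as a sphere and sums great-circle (haversine) leg lengths in nautical miles. The function names and sample points are illustrative only and do not come from Southwest's or NASA's tools. Fuel efficiency would need further data, such as fuel-flow records, which this sketch ignores.

```python
import math

# Mean Earth radius (6371 km) in nautical miles; spherical-Earth assumption.
EARTH_RADIUS_NM = 6371.0 / 1.852


def great_circle_nm(p, q):
    """Haversine distance in nautical miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def track_miles(points):
    """Total length of a flown track: the sum of its consecutive legs."""
    return sum(great_circle_nm(a, b) for a, b in zip(points, points[1:]))


def track_mile_savings(baseline, candidate):
    """Positive result means the candidate path is shorter than the baseline."""
    return track_miles(baseline) - track_miles(candidate)


# Hypothetical example: a vectored dogleg versus a more direct procedure.
vectored = [(33.0, -112.0), (33.5, -111.5), (33.0, -111.0)]
direct = [(33.0, -112.0), (33.0, -111.0)]
print(round(track_mile_savings(vectored, direct), 1))
```

In practice the comparison would be made across many flights flown each way rather than a single pair of invented tracks.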
uma_ferrell: There are other companies out there doing similar research, for example MITRE. How is your project different? Do you collaborate with these other studies?
Ashok: Yes, following up from the previous answer, we do collaborate with MITRE. NASA investigates discovery methods and develops tools so that people at MITRE, the FAA, and other entities can take advantage of our research.
Perlpilot: Could you explain the data mining process? Do you start with a mound of data and some software that teases meaning from it, do you ask specific questions and look to the data for answers, or is it some combination?
Ashok: We actually create new algorithms (computer programs) that apply to the types of problems we hear about through our partnerships with airline carriers, the FAA, and other agencies. Sometimes, for specific problems, we can run existing software; most of the time we develop our own. (Jeff): The hard challenge is understanding the specific issue. We may be made aware of an issue through a pilot report and then have to turn to the flight data to understand exactly what's happening with the airplane and where it's happening in the system. Once we have a better view of the extent of the issue, we can come up with ways to communicate to our pilots and work with Air Traffic Control to solve the problem. But, yes, it is mounds of data. We'll start with a specific issue, and then we have to search through the data and discover the breadth and depth of the issue.
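Jeff's workflow, starting from a reported issue and then measuring how widespread it is in the flight data, can be sketched as a simple group-and-count. The Python below assumes a hypothetical record layout in which each flight has an arrival airport and a set of event names already detected in its flight data. The field names, the event name, and the placeholder airport codes are all invented for illustration.

```python
from collections import Counter


def event_rates_by_airport(flights, event):
    """Return {airport: (event_count, flight_count, rate)} for one event type.

    Each flight is a dict with a hypothetical "arrival_airport" key and an
    "events" set naming the events detected on that flight.
    """
    flights_per_airport = Counter(f["arrival_airport"] for f in flights)
    events_per_airport = Counter(
        f["arrival_airport"] for f in flights if event in f["events"]
    )
    # Counter returns 0 for airports with no matching events.
    return {
        airport: (events_per_airport[airport], n, events_per_airport[airport] / n)
        for airport, n in flights_per_airport.items()
    }


def rank_by_rate(rates):
    """Airports ordered from the highest event rate to the lowest."""
    return sorted(rates, key=lambda airport: rates[airport][2], reverse=True)


# Invented records: does the reported issue cluster at one arrival airport?
flights = [
    {"arrival_airport": "AAA", "events": {"unstable_approach"}},
    {"arrival_airport": "AAA", "events": set()},
    {"arrival_airport": "AAA", "events": {"unstable_approach"}},
    {"arrival_airport": "BBB", "events": set()},
    {"arrival_airport": "BBB", "events": set()},
]
rates = event_rates_by_airport(flights, "unstable_approach")
print(rank_by_rate(rates))  # ['AAA', 'BBB']
```

Reporting a rate alongside the raw count matters here: a busy airport can show more events simply because it sees more flights.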
Footoocool: If you start with a goal in mind, e.g., health of aircraft, wouldn't it be best to perform traditional time series analysis, statistical pattern recognition, etc. to develop or "hard-code" solutions, as opposed to mining data (in the narrow sense) and discovering?
Ashok: I'm not quite sure what you're asking, but in some cases we do include time series analysis and pattern recognition techniques. Once these patterns are discovered, then they can be implemented by the carrier and used to detect new anomalies.
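Ashok's point, that patterns discovered in the data can then be put to work by the carrier to catch new anomalies, has a two-step shape: learn what normal looks like from past flights, then check each new flight against it. The Python sketch below uses about the simplest version of that idea, a per-parameter band of the mean plus or minus k standard deviations. The parameter names, the choice of k, and the sample numbers are hypothetical, and this is not one of NASA's algorithms.

```python
from statistics import mean, stdev


def learn_envelope(history, k=3.0):
    """Learn a (low, high) band for each parameter from past flights.

    history is a list of dicts mapping parameter name to a summary value
    for one flight; every flight must have the same parameters, and at
    least two flights are needed because stdev() requires two data points.
    """
    envelope = {}
    for name in history[0]:
        values = [flight[name] for flight in history]
        m, s = mean(values), stdev(values)
        envelope[name] = (m - k * s, m + k * s)
    return envelope


def out_of_envelope(flight, envelope):
    """Names of parameters on a new flight that fall outside the learned band."""
    return sorted(
        name for name, (low, high) in envelope.items()
        if not low <= flight[name] <= high
    )


# Hypothetical per-flight summaries from earlier, uneventful flights.
history = [
    {"approach_speed_kt": 138, "descent_rate_fpm": 700},
    {"approach_speed_kt": 140, "descent_rate_fpm": 750},
    {"approach_speed_kt": 142, "descent_rate_fpm": 800},
    {"approach_speed_kt": 140, "descent_rate_fpm": 750},
]
envelope = learn_envelope(history)
print(out_of_envelope({"approach_speed_kt": 141, "descent_rate_fpm": 1100}, envelope))
# ['descent_rate_fpm']
```

Each parameter is checked on its own here. Real flight parameters are correlated and change over time, which is why techniques such as the time series analysis and pattern recognition mentioned above go well beyond a fixed band.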
Swapnil: Cool .... also, are you harnessing NASA's Nebula cloud for this high performance computing?
Ashok: Actually at the moment some people on my team are exploring using very high-performance computing engines to analyze about one year's worth of airline data.
RBRuss: Are there any of your data, either your processed results or the raw data themselves, that could feasibly be viewed or explored by the public? If so, are there any plans to open these data for public use?
Ashok: We have data from flight simulators available already on DASHLink and are exploring the possibility of releasing other types of related data. At this time we don't have data from commercial aircraft available on DASHLink (dashlink.arc.nasa.gov). (Jeff): There is the Aviation Safety Reporting System (ASRS) that gives de-identified searchable data from a broad range of aviation professionals in the United States (www.asrs.arc.nasa.gov).
Swapnil_Tamse: Oh, you said exploring. :) Just out of curiosity, isn't the Nebula cloud available to NASA agencies to perform HPC? Or do you have more reliable traditional ways?
Ashok: We have our own cluster and we're in discussions to potentially use some of NASA's other HPC resources.
Footoocool: Ashok, Jeff said you would answer this part, and I haven't seen it answered: What new knowledge have you mined out from SWA flight data so far?
Ashok: Southwest analyzes its own data using our algorithms. If you'd like to see some of the results of their work, you can find a poster on DASHLink.
Footoocool: The resources involved on NASA's side come out of NASA's budget, right? And the results benefit not only SWA but also the whole industry via the FAA, right?
Ashok: The work that NASA does in data mining is published for broad distribution to the public. We put as much of our software through the open source process as possible so that companies like Southwest Airlines, other carriers and other communities can benefit from our work. And again, those other communities can be health, science, automotive, space science ... a lot of people can benefit from the work.
Karen (Moderator): A question for Jeff ... can you give an example of something Southwest learned from data mining that helped you?
Jeff: The issues that pilots report allow us to mine our flight data (from FDAP) to get a better idea of where those problems may be occurring and to what extent. We have been successful in communicating with air traffic control in Las Vegas and in Denver about arrival issues and how their instructions impact our operations.
borrowedhour.com: The quality of the conclusions reached is directly proportional to the quality of the questions asked. How do you ensure correct conclusions? In other words, how do you avoid drawing wrong conclusions?
Ashok: We work very closely with domain experts (e.g., pilots) from the carriers, FAA, and other agencies to make sure our research results are useful and correct and relevant. We also publish our algorithms in journals and conferences around the world so that the computer science community can evaluate the effectiveness of our approach through the peer review process.
Karen (Moderator): Hi everyone. We've got just under 10 minutes to go in the chat, so get those questions in. Very smart folks out there today.
Ravinder: Sir, what's the scope of data mining in aeronautics? Also, is SAS used here?
Ashok: We have used SAS in the past, but our research is primarily done in the development of new algorithms. I'd like to point out that these algorithms are one facet of a very broad set of activities that are going on within a carrier and the FAA in order to ensure safety. There are entire departments within the airlines that focus on safety and maintenance and we're just trying to add additional tools for their use.
uma_ferrell: Have you examined Concorde incidents and accidents as an example case to check your data mining algorithms?
Ashok: NASA is developing tools and technologies for others to analyze this type of data. We have not analyzed data from Concorde so far as I know.
uma_ferrell: Do you break up the data specifically by avionics - for example look at all data from an Air Data Computer or do you always look at all data within the bigger context?
Ashok: Sometimes we focus on particular subsystems but often we look at all the data we have available. It really depends on the problem we're trying to address. At NASA we're really involved with the development of new algorithms and we validate them on real-world data sets. It's really the safety professionals at other organizations that find useful solutions to safety issues.
Perlpilot: Jeff, has data mining been used primarily to effect change in process, or have there been instances of aircraft design modification that have come from data mining?
Jeff: Our primary focus is on the policies, procedures, guidelines that affect our operation. That's where the immediate benefit of these programs is felt. Our focus is on our current operation rather than aircraft design modifications.