
Data Sciences Group

Overview

The Data Sciences (DS) Group is a collaboration of scientists from a variety of fields, including Integrated System Health Management, Aeronautics, Space Exploration, Earth Sciences, and Space Sciences. DS conducts fundamental research in machine learning, knowledge discovery, and related areas. The group creates tools and methods that answer pressing scientific questions and that aid in the assimilation and understanding of scientific and engineering data, advancing NASA’s missions.

The aggregation, synthesis, and analysis of large quantities of data pose significant challenges for scientists in fields ranging from astronomy to aviation safety. DS has several active projects and teams working on data mining methods. One example is the Discovery of Precursors to Safety Incidents (DPSI) task, which mines data to discover sequences of events that have a higher-than-normal probability of leading to an adverse event.

When real-time data are not accessible, DS teams develop models to accurately and effectively simulate or predict the behavior of natural processes. Our work on Virtual Sensors provides methods to “fill in” the gaps of missing data by first creating a model that predicts the behavior of existing data and then extrapolating that model to the spatial or temporal period under study.

In other cases, computer models of a system are too complex to run for real-time monitoring. The Inductive Monitoring System (IMS) software produces health monitoring knowledge bases for systems that are either difficult to model (simulate) with a computer or that require computer models too complex for real-time monitoring.

See below for more information about these projects as well as many other past and current DS group projects.

Technical Overviews

Orca: A Program for Mining Distance-Based Outliers
The Inductive Monitoring System (PDF)
Recurring Anomaly Detection System (PDF)
Using sequenceMiner to Discover Anomalous Flights (PDF)
Intelligent Data Understanding for Earth and Space Science (PDF)
Probability Collectives for Science Engineering (PDF)
Intelligent Data Understanding for Integrated Systems Health Management (PDF)
SIAM Text Mining Contest

Open Source Projects

ACCEPT (Rodney Martin, Bryan Matthews, Santanu Das, Vijay Janakiraman, Nikunj Oza, Ashok Srivastava, Richard Watson, Stefan Hosein)
Block-GP (Kamalika Das, Ashok Srivastava)
Kalman Filter Code Augmentation (Rodney Martin)
MKAD (Santanu Das, Bryan Matthews, Ashok Srivastava and Nikunj Oza)
Optimal Alarm System Design and Implementation (Rodney Martin)
ROC Curve Code Augmentation (Rodney Martin, John Stutz)
sequenceMiner (Suratna Budalakoti, Ashok Srivastava, and Matthew Otey)

* Project links and more of Code TI’s open source projects to be listed soon.

As part of NASA’s commitment to the Open Government Initiative, all NASA open source data sets are available in the NASA.gov data catalog.

Project List

Active Projects

DASHlink

DASHlink is an informal sharing network for scientists and engineers to disseminate results and collaborate on research problems in health management technologies for aeronautics systems.

Inductive Monitoring System
Project Lead: David L. Iverson

The Inductive Monitoring System (IMS) software was developed to automatically produce health monitoring knowledge bases for systems that are either difficult to model (simulate) with a computer or that require computer models too complex to use for real-time monitoring. IMS uses a data mining technique called clustering to extract models of normal system operation from archived data.
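The clustering idea above can be sketched as follows. This is a simplified illustration, not the IMS implementation. It assumes that each cluster is stored as an axis-aligned box (per-parameter minimum and maximum) grown from nominal training vectors, and that a new vector's deviation is its distance to the nearest box. The function names and the `max_dist` parameter are hypothetical. A real system would also normalize each parameter before clustering.

```python
import math

def box_distance(point, box):
    """Euclidean distance from a point to an axis-aligned box (0.0 if inside)."""
    lo, hi = box
    total = 0.0
    for x, a, b in zip(point, lo, hi):
        if x < a:
            total += (a - x) ** 2
        elif x > b:
            total += (x - b) ** 2
    return math.sqrt(total)

def build_clusters(training, max_dist):
    """Group nominal training vectors into box-shaped clusters.

    Each vector either stretches the nearest box (if it lies within max_dist
    of it) or starts a new box of zero extent.
    """
    clusters = []  # each cluster is [lo_list, hi_list]
    for p in training:
        nearest = min(clusters, key=lambda c: box_distance(p, c), default=None)
        if nearest is not None and box_distance(p, nearest) <= max_dist:
            lo, hi = nearest
            nearest[0] = [min(a, x) for a, x in zip(lo, p)]
            nearest[1] = [max(b, x) for b, x in zip(hi, p)]
        else:
            clusters.append([list(p), list(p)])
    return clusters

def deviation(point, clusters):
    """Distance to the nearest nominal cluster; 0.0 means within the learned envelope."""
    return min(box_distance(point, c) for c in clusters)

# Hypothetical archived nominal data: (pressure, temperature) pairs in arbitrary units.
nominal = [(1.0, 10.0), (1.1, 10.2), (0.9, 9.9), (5.0, 50.0), (5.2, 50.5)]
clusters = build_clusters(nominal, max_dist=1.0)
print(len(clusters))                       # 2
print(deviation((1.0, 10.1), clusters))    # 0.0
print(deviation((3.0, 30.0), clusters) > 1.0)  # True
```

Monitoring then reduces to comparing the deviation against an alarm threshold. Values near zero resemble archived normal operation.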

Reducing the Environmental Impact of Aviation: A Data Mining Approach to Instantaneous Estimation of Fuel Consumption
Project Lead: Nikunj C. Oza, Ph.D.

Every year there are nearly 6 million commercial flights in the U.S. alone, prompting numerous policy and procedural efforts to develop safe and affordable technologies that reduce the environmental impact of aviation. For this project, the DS group is developing several Virtual Sensors (VS) methods that estimate an aircraft’s expected fuel consumption from other measurements. These methods also identify situations in which actual fuel consumption is significantly greater than predicted. Identifying these situations will enable operators to eliminate or mitigate them, resulting in cost and environmental benefits.
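The predict-then-compare logic can be illustrated with a minimal sketch. It assumes a single hypothetical predictor (a thrust-setting proxy) and an ordinary least-squares line. Actual Virtual Sensors methods learn nonlinear relationships from many measurements, so only the residual-flagging step carries over. The names `fit_line`, `flag_excess`, and the `threshold` value are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y ≈ intercept + slope * x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def flag_excess(xs, ys, intercept, slope, threshold):
    """Indices where actual fuel flow exceeds the virtual-sensor estimate by more than threshold."""
    return [i for i, (x, y) in enumerate(zip(xs, ys))
            if y - (intercept + slope * x) > threshold]

# Hypothetical nominal training data: thrust proxy vs. fuel flow (arbitrary units).
thrust = [1, 2, 3, 4, 5]
fuel = [2.1, 3.9, 6.0, 8.1, 9.9]
a, b = fit_line(thrust, fuel)

# A new flight segment: the middle sample burns more fuel than the model expects.
print(flag_excess([2, 3, 4], [4.0, 7.5, 8.0], a, b, threshold=0.5))  # [1]
```

Only positive residuals are flagged because the project targets cases where actual consumption exceeds the prediction.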

Discovery of Precursors to Safety Incidents
Project Lead: Nikunj C. Oza, Ph.D.

The Aviation Safety Program’s System-wide Safety and Assurance Technologies (SSAT) project aims to transform data produced by aircraft, associated systems, and people into actionable knowledge that will aid in the detection, precursor identification, and prediction of aviation anomalies. These anomalies may range from the aircraft level to the fleet level, or ultimately to the level of the national airspace. DS group members develop methods to efficiently use the vast variety and quantity of available heterogeneous aviation-related data. The goal is to glean information about sequences of states that may end with an operationally significant anomaly or, in rare cases, an incident or accident. These sequences can occur over time scales that range from a few minutes (e.g., within a single flight) to many years. For example, the precursor may be the implementation of a policy decision, and the effect may be a change in a high-level metric such as the frequency of hard landings. Automated methods for mining this mass of data leverage human expertise without placing an unnecessary burden on the experts.
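The notion of a precursor with a "higher than normal probability" of leading to an adverse event can be made concrete with a simple frequency screen. This sketch is not the DPSI methods, which are not described here. It assumes that each flight is summarized as a set of discrete events plus an adverse-outcome flag. It scores each event by its lift, P(adverse | event) / P(adverse), and ignores event order, which the sequence-based methods above take into account. The event names and `precursor_lift` are hypothetical.

```python
from collections import Counter

def precursor_lift(flights):
    """flights: list of (events, adverse) pairs, one per flight.

    Returns {event: lift}, where lift = P(adverse | event occurred) / P(adverse).
    A lift well above 1 marks the event as a candidate precursor.
    """
    base = sum(1 for _, adverse in flights if adverse) / len(flights)
    if base == 0:
        return {}
    seen, seen_adverse = Counter(), Counter()
    for events, adverse in flights:
        for event in set(events):  # count each event once per flight; order is ignored
            seen[event] += 1
            if adverse:
                seen_adverse[event] += 1
    return {e: (seen_adverse[e] / seen[e]) / base for e in seen}

# Hypothetical flights: observed events and whether an adverse outcome followed.
flights = [
    (["gear_down_late", "high_speed"], True),
    (["high_speed"], False),
    (["gear_down_late"], True),
    (["flaps_normal"], False),
]
lifts = precursor_lift(flights)
print(lifts["gear_down_late"])  # 2.0
print(lifts["high_speed"])      # 1.0
```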

Virtual Sensors for Earth Science
Project Lead: Nikunj C. Oza, Ph.D.

The Virtual Sensors (VS) project is focused on developing methods for predicting missing data from existing datasets by exploiting the direct and indirect dependencies between different measurements. For example, Virtual Sensors for Earth science uses existing models trained on spectrally rich, high-resolution data to first estimate equivalent instrument channels. It then extrapolates back into the past to extract additional information from spectrally poor data that had been collected previously, perhaps by an older or less sophisticated instrument model. Identifying the analogous instrument channels allows scientists to more easily translate data from one instrument model to the other. This method has exciting applications for Earth science, especially within the atmospheric science and remote sensing communities. The DS group is also using VS to identify situations in which the typical consistency between measurements changes. For example, VS can estimate what an aircraft’s instantaneous fuel consumption should be and identify situations in which actual fuel consumption is significantly greater. Identifying these situations and removing or mitigating them will lead to significant cost savings and environmental benefits.

Detecting Anomalies in Air Traffic
Project Lead: Nikunj C. Oza, Ph.D.

The National Airspace System (NAS) is a complex and ever-changing engineering system. As the Next Generation Air Transportation System (NextGen) is developed, there will be increased emphasis on safety as well as on operational and environmental efficiency. Current operations in the NAS are monitored using a variety of data sources, including flight recorder data, radar track data, weather data, and other massive data collection systems. Although numerous technologies exist to monitor the frequency of known but undesirable behaviors in the NAS, few methods currently exist that can analyze these large repositories to discover new and previously unknown events. Monitoring the frequency of known events can only provide mitigations for already established problems. A tool that discovers both safety-relevant events and incidents of operational importance increases the community’s awareness of such scenarios and helps improve the overall safety of the NAS. The DS group is using radar track data to develop approaches for discovering operationally significant events in the NAS that are currently not monitored and have potential safety and/or efficiency implications.

Integrating Parallel and Distributed Data Mining Algorithms into the NASA Earth Exchange (NEX)
Project Lead: Nikunj C. Oza, Ph.D.

As stated by the NASA Earth Exchange (NEX), “NEX represents a new platform for the Earth science community that provides a mechanism for scientific collaboration and knowledge sharing. NEX combines state-of-the-art supercomputing, Earth system modeling, workflow management, NASA remote sensing data feeds, and a knowledge sharing platform to deliver a complete work environment in which users can explore and analyze large datasets, run modeling codes, collaborate on new or existing projects, and quickly share results among the Earth Science communities.” NEX currently offers a limited number of data mining algorithms, which have been deployed for several NASA applications. This project is implementing several algorithms for inclusion in NEX, as well as a framework under which others can incorporate their own data mining algorithms into NEX.

Sustainability Base
Project Lead: Rodney Martin, Ph.D.

Sustainability Base (SB) is a high-performance, LEED Platinum-certified building at NASA Ames that combines NASA’s innovative technologies with “green” architecture and renewable energy systems. Its unique design maximizes daylight and natural ventilation. SB will also eventually incorporate several key NASA technologies that evolved from the Data Sciences group, as well as technologies developed through collaborative partnerships. These may include the Inductive Monitoring System (IMS), which would learn how the building normally operates in order to establish baseline operations for its various systems and identify any anomalous patterns. ACCEPT (Adverse Condition & Critical Event Prediction Toolbox) may also be used. ACCEPT is an architectural framework designed to compare and contrast the performance of a variety of machine learning and early warning algorithms. It tests how robustly these algorithms predict the onset of adverse events in any of the time-series-generating systems or processes housed at Sustainability Base.

Anomaly Detection in Dynamic Graphs
Project Lead: Kamalika Das, Ph.D.

Every day, governments and other large organizations face risks to the security and safety of their employees. Sometimes these risks originate externally, and sometimes they come from within. Even when a risky event occurs unexpectedly or with severe consequences, there is often a pattern or trail of signs or behaviors that led to it. In this DARPA-funded task, which is part of the ADAMS (Anomaly Detection at Multiple Scales) project, we approach this problem from a graph perspective. Given social interaction data for employees in an organization, we create dynamic social graphs representing employee interactions over time. The task of spotting potentially malicious insiders then becomes one of identifying anomalous graph transitions that cause major structural changes in the graph. We have developed an efficient, scalable algorithm for identifying such anomalous changes in the graph’s shape and structure. The algorithm also identifies the nodes responsible for these changes. This work has been done in collaboration with the Palo Alto Research Center.
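The group’s algorithm is not described here. The following sketch only illustrates the framing of scoring transitions between graph snapshots and attributing a change to nodes. It swaps in two simple, commonly used measures: one minus the Jaccard similarity of consecutive edge sets as the transition score, and a per-node count of added or removed edges as the attribution. The snapshot data and function names are hypothetical.

```python
def edge_set(edges):
    """Normalize undirected edges so (a, b) and (b, a) compare equal."""
    return {tuple(sorted(e)) for e in edges}

def transition_score(g1, g2):
    """1 - Jaccard similarity of two edge sets: 0.0 = identical, 1.0 = disjoint."""
    union = g1 | g2
    if not union:
        return 0.0
    return 1.0 - len(g1 & g2) / len(union)

def node_blame(g1, g2):
    """Count the edges added or removed at each node between two snapshots."""
    blame = {}
    for a, b in g1 ^ g2:  # symmetric difference = changed edges
        blame[a] = blame.get(a, 0) + 1
        blame[b] = blame.get(b, 0) + 1
    return blame

# Hypothetical weekly interaction snapshots.
week1 = edge_set([("ann", "bob"), ("bob", "cy"), ("cy", "dee")])
week2 = edge_set([("bob", "ann"), ("bob", "cy"), ("cy", "dee")])
week3 = edge_set([("eve", "ann"), ("eve", "bob"), ("eve", "cy"), ("eve", "dee")])

print(transition_score(week1, week2))  # 0.0
print(transition_score(week2, week3))  # 1.0
blame = node_blame(week2, week3)
print(max(blame, key=blame.get))       # eve
```

A real detector would compare each transition score against the history of scores rather than a fixed cutoff.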

Past Projects

Flow Control Valve Fault Detection
Project Lead: Bryan Matthews

The DS group applied two algorithms, Virtual Sensors (VS) and the Inductive Monitoring System (IMS), to detect anomalies related to the fuel flow control valve in the main propulsion system of the space shuttle’s main engine. Each method examines a multidimensional time series of temperature, pressure, and control signal data. It can identify anomalies online or analyze historical data to determine when a known anomaly or fault was likely to have occurred. Detecting faults in the fuel flow control valve is critical because the valve controls the flow of hydrogen gas between the space shuttle’s main engine and the external fuel tank. A faulty valve can lead to a potentially catastrophic hydrogen gas leak. The group turned its attention to these types of anomalies after one of the space shuttle’s three fuel flow control valves cracked during launch in November 2008. The group applied its algorithms to historical data to determine whether a precursor to the failure was detectable. It also used them to monitor data from Space Shuttle Discovery’s March 2009 launch for a similar anomaly, and the models indicated nominal behavior.

Detecting Recurring Anomalies in Text Reports
Project Lead: Dawn McIntosh

Commissioned by the NASA Engineering and Safety Center, the Recurring Anomaly Detection System (ReADS) team is developing a family of novel methods to mine text documents and identify recurring anomalies across reports. ReADS analyzes text reports, such as aviation reports and problem or maintenance records. It uses text clustering algorithms to group loosely related reports and documents, and it identifies interconnected reports. The tool provides a visualization of the clusters and recurring anomalies, and it has been integrated into a secure web-based search platform that allows users to perform their own text mining.
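The text does not specify ReADS’s clustering algorithms. A standard way to decide whether two reports are related is TF-IDF weighting with cosine similarity, and the sketch below uses that approach as a stand-in. It links every pair of reports whose similarity meets a threshold, and a clustering step could then group the linked reports. The report strings, the `threshold` value, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

def tfidf_vectors(docs):
    """One sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    return [{t: c * math.log(n / df[t]) for t, c in Counter(toks).items()}
            for toks in tokenized]

def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def related_pairs(docs, threshold):
    """Index pairs of documents whose TF-IDF cosine similarity is at least threshold."""
    vecs = tfidf_vectors(docs)
    return [(i, j) for i in range(len(docs)) for j in range(i + 1, len(docs))
            if cosine(vecs[i], vecs[j]) >= threshold]

# Hypothetical maintenance reports.
reports = [
    "hydraulic pump pressure low during taxi",
    "low hydraulic pressure from pump after landing",
    "cabin lighting flicker in aft galley",
]
print(related_pairs(reports, 0.1))  # [(0, 1)]
```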

Hubble Space Telescope Project
Project Lead: Hamed Valizadegan, Ph.D.

As Hubble Space Telescope (HST) observatory operations have continued into an extended mission phase in the post-Space Shuttle Program (SSP) servicing mission era, the HST Project has remained attentive to the observatory’s component, subsystem, system, and mission reliability as a function of time. The HST Project requested an independent evaluation of the current reliability model methodology, along with recommendations on whether changes may be warranted to improve reliability predictions. The DS group helped the NASA Engineering and Safety Center (NESC) perform data-driven reliability analyses of some HST components. The group also developed two data-driven tools to model the reliability and predict the lifetime of the Fine Guidance Sensor units inside the Hubble Space Telescope.

Liquid Propulsion System Health Management
Project Lead: Ashok N. Srivastava, Ph.D.

Data mining researchers in the DS group are working with rocket propulsion experts at other NASA centers and at Pratt & Whitney Rocketdyne to apply data mining algorithms to historical data from the Space Shuttle Main Engine (SSME) for real-time prognostics and diagnostics. These methods aim to detect and predict failures in liquid-fueled rocket engines, in particular the SSME.

Mixture Density Mercer Kernels
Project Lead: Ashok N. Srivastava, Ph.D.

The team is developing a method to generate Mercer Kernels from an ensemble of probabilistic mixture models, where each mixture model is generated from a Bayesian mixture density estimate. The team converts the ensemble estimates into a Mercer Kernel and describes the properties of this new kernel function. It also gives examples of the kernel’s performance on unsupervised clustering of synthetic data and in unsupervised multispectral image understanding. The Mixture Density Mercer Kernel algorithm can be applied to real-world image segmentation problems, specifically differentiating cloud cover from snow and ice in satellite imagery.
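The paragraph does not spell out the construction. One way to turn an ensemble of mixture models into a valid (positive semidefinite) kernel, consistent with the description, uses cluster-membership posteriors. Each point is mapped to the concatenated posteriors P(component | x) from every mixture model, and the kernel is the normalized inner product of these feature vectors. Because it is an inner product, the Mercer condition holds. The sketch assumes fixed 1-D Gaussian mixtures with hypothetical parameters in place of Bayesian fits, and the function names are illustrative.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def posteriors(x, mixture):
    """P(component | x) for a 1-D Gaussian mixture given as (weight, mean, std) tuples."""
    dens = [w * gaussian_pdf(x, m, s) for w, m, s in mixture]
    total = sum(dens)
    return [d / total for d in dens]

def feature_map(x, ensemble):
    """Concatenate the posterior vectors of every mixture model in the ensemble."""
    return [p for mixture in ensemble for p in posteriors(x, mixture)]

def mixture_density_kernel(x, y, ensemble):
    """Normalized inner product of posterior feature vectors (equals 1.0 when x == y)."""
    fx, fy = feature_map(x, ensemble), feature_map(y, ensemble)
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    return dot(fx, fy) / math.sqrt(dot(fx, fx) * dot(fy, fy))

# Two hypothetical 1-D mixtures, e.g. fits to different bootstrap samples of the same data.
ensemble = [
    [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)],
    [(0.4, 0.2, 1.2), (0.6, 4.8, 0.9)],
]
print(mixture_density_kernel(0.1, 0.3, ensemble) > 0.99)  # True
print(mixture_density_kernel(0.1, 5.0, ensemble) < 0.05)  # True
```

Points that every model assigns to the same components receive a kernel value near 1. This is what makes such a kernel useful for clustering and segmentation.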

Modeling Spatial and Temporal Covariability Using Machine Learning
Project Lead: Ashok N. Srivastava, Ph.D.

This project uses self-organizing map (SOM) neural networks to identify year-to-year variability of terrestrial ecosystems associated with fluctuations in global circulation and climate.
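The following minimal sketch shows how a self-organizing map groups similar years. It is a textbook 1-D SOM, not the project’s code. Each data vector pulls its best-matching unit and, more weakly, that unit’s neighbors along the chain, so nearby units come to represent similar conditions. The input vectors (pairs of normalized climate and vegetation anomalies), the linear initialization, and the decay schedules are all assumptions for illustration.

```python
import math

def best_unit(x, weights):
    """Index of the unit whose weight vector is closest (squared Euclidean) to x."""
    return min(range(len(weights)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(x, weights[j])))

def train_som(data, n_units=4, epochs=50, lr0=0.5, sigma0=1.0):
    """Train a 1-D self-organizing map (a chain of n_units >= 2 units)."""
    dim = len(data[0])
    lo = [min(x[d] for x in data) for d in range(dim)]
    hi = [max(x[d] for x in data) for d in range(dim)]
    # Linear initialization: units evenly spaced between the data's min and max corners.
    weights = [[lo[d] + (hi[d] - lo[d]) * j / (n_units - 1) for d in range(dim)]
               for j in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                  # learning rate decays toward 0
        sigma = max(sigma0 * (1 - frac), 0.3)  # neighborhood width shrinks over time
        for x in data:
            bmu = best_unit(x, weights)
            for j, w in enumerate(weights):
                h = math.exp(-((j - bmu) ** 2) / (2 * sigma ** 2))  # Gaussian neighborhood
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights

# Hypothetical yearly (climate-index anomaly, vegetation anomaly) pairs, normalized to [0, 1].
years = [(0.1, 0.2), (0.15, 0.1), (0.2, 0.15), (0.9, 0.8), (0.85, 0.95), (0.8, 0.9)]
w = train_som(years)
print(best_unit((0.1, 0.2), w) < best_unit((0.9, 0.8), w))  # True
```

After training, years that map to the same or adjacent units share similar anomaly patterns. Inspecting those groups is how a SOM exposes recurring modes of variability.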

Orca: A Program for Mining Distance-Based Outliers
Project Lead: Mark Schwabacher, Ph.D.

Orca mines distance-based outliers. That is, Orca uses the distance from a given example to its nearest neighbors to determine how unusual the example is. The intuition is that if other examples are close to the candidate in the feature space, then the candidate is probably not an outlier. If the nearest examples are substantially different, then the candidate is likely to be an outlier. Probabilistically, distance-based outlier detection can be viewed as identifying candidates that lie at points where the nearest-neighbor density estimate is small.
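A simplified sketch of this kind of search follows, using a randomized nested loop with pruning. It is not Orca’s code. The sketch assumes that an example’s score is its average distance to its k nearest neighbors and processes one candidate at a time. The pruning rule is the key idea. While neighbors are scanned, the running k-nearest average can only decrease, so once it drops below the weakest current top-n score the candidate cannot be a top outlier and is skipped. The function name and parameters are hypothetical, and k must be smaller than the number of examples.

```python
import heapq
import math
import random

def top_outliers(data, k=2, n=1, seed=0):
    """Top-n distance-based outliers, scored by mean distance to the k nearest neighbors."""
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)  # random order makes early pruning likely
    top = []       # min-heap of (score, index) for the current best n outliers
    cutoff = 0.0   # weakest score in `top` once it holds n entries
    for i in order:
        neighbors = []  # negated distances: a max-heap of the k smallest distances so far
        pruned = False
        for j in order:
            if i == j:
                continue
            d = math.dist(data[i], data[j])
            if len(neighbors) < k:
                heapq.heappush(neighbors, -d)
            elif d < -neighbors[0]:
                heapq.heapreplace(neighbors, -d)
            # The running average only shrinks as closer neighbors appear, so prune early.
            if len(neighbors) == k and -sum(neighbors) / k < cutoff:
                pruned = True
                break
        if pruned:
            continue
        score = -sum(neighbors) / k
        if len(top) < n:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
        if len(top) == n:
            cutoff = top[0][0]
    return sorted(top, reverse=True)

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
result = top_outliers(points, k=2, n=1)
print(result[0][1])  # 4
```

The result does not depend on the shuffle, because pruning discards only candidates that provably cannot enter the top n. Random ordering just makes the cutoff rise quickly on average.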

Probability Collectives
Project Lead: David Wolpert, Ph.D.

Collaborators at Ames, Oxford, Stanford, Berkeley, Los Alamos, GE, and BAE Systems have already demonstrated that game theory and statistical physics are identical when cast in terms of information theory, an associated formalism they refer to as Probability Collectives (PC). PC opens many new lines of research, and provides new approaches to problems in distributed control and distributed optimization. These collaborators continue to investigate the extremely rich theory arising from this hybridization, with applications in areas such as distributed optimization and control of multi-agent systems.

Using sequenceMiner to Discover Anomalous Flights
Project Lead: Ashok N. Srivastava, Ph.D.

The sequenceMiner algorithm was developed to address the problem of detecting and describing anomalies in large data sets of switch sensor recordings from the cockpits of commercial airliners. The algorithm performs autonomous clustering (grouping) of similar sequences and uses the clusters to produce a detailed analysis of outliers, which are flagged as anomalies. The tool also includes new outlier analysis algorithms that provide comprehensible indicators of why a particular sequence was deemed an outlier. These indicators give analysts a coherent description of the anomalies identified in the sequence and of how they differ from more “normal” sequences.
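The following sketch shows how similarity between discrete switch sequences and a human-readable explanation might be computed. It assumes a normalized longest common subsequence (LCS) similarity, |LCS(a, b)| / sqrt(|a|·|b|), which is a common choice for symbol sequences. The "explanation" lists the symbols in a flight that the LCS with a reference sequence (for example, a cluster medoid) leaves unmatched. The switch names and function names are hypothetical, and sequenceMiner’s actual clustering and explanation algorithms are more elaborate.

```python
import math

def lcs(a, b):
    """Longest common subsequence of two symbol sequences (dynamic programming)."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    # Walk back through the table to recover one LCS.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

def similarity(a, b):
    """Normalized LCS similarity in [0, 1]."""
    return len(lcs(a, b)) / math.sqrt(len(a) * len(b)) if a and b else 0.0

def explain(seq, reference):
    """Symbols of seq that are not matched by its LCS with the reference sequence."""
    common = lcs(seq, reference)
    extra, k = [], 0
    for s in seq:
        if k < len(common) and s == common[k]:
            k += 1
        else:
            extra.append(s)
    return extra

# Hypothetical cluster medoid and a flight with an unusual flap sequence.
medoid = ["gear_down", "flaps_30", "spoilers_arm", "autopilot_off"]
flight = ["gear_down", "flaps_30", "flaps_15", "flaps_30", "autopilot_off"]
print(round(similarity(flight, medoid), 3))  # 0.671
print(explain(flight, medoid))               # ['flaps_15', 'flaps_30']
```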

Virtual Sensors for Space Science
Project Lead: Ashok N. Srivastava, Ph.D.

Virtual Sensors predict the value of one sensor measurement by learning the nonlinear correlations between its values and potentially hundreds of other sensor measurements. In space science, virtual sensors can be used to explore the problem of estimating redshifts of galaxies from broadband photometric measurements. Virtual sensors allow for the estimation of unmeasured spectral phenomena based on learning the potentially nonlinear correlations between observed sets of spectral measurements. In the case of estimating redshifts, virtual sensors can establish a nonlinear correlation between two techniques of measuring redshifts, spectroscopy and broadband colors. Statistically speaking, this amounts to building a regression model to estimate the photometric redshift, although it’s somewhat more complicated than that.
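A regression from broadband colors to spectroscopic redshift can be sketched with k-nearest-neighbor averaging. This is a simple nonlinear method chosen here for illustration and is not necessarily what Virtual Sensors uses. The color pairs and redshifts below are made-up toy values rather than real measurements, and `knn_regress` is a hypothetical name.

```python
import math

def knn_regress(train_x, train_y, query, k=3):
    """Predict a target (e.g. redshift) as the mean target of the k nearest training examples."""
    ranked = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = ranked[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

# Toy training set: (u-g, g-r) colors with spectroscopically measured redshifts.
colors = [(1.2, 0.4), (1.3, 0.5), (1.1, 0.45), (0.6, 1.0), (0.7, 1.1), (0.65, 0.9)]
redshift = [0.10, 0.12, 0.11, 0.45, 0.50, 0.47]

# Estimate a photometric redshift for a galaxy that has colors but no spectrum.
print(round(knn_regress(colors, redshift, (1.25, 0.45)), 2))  # 0.11
```

The spectroscopic sample plays the role of the "real" sensor, and the fitted mapping acts as a virtual sensor for the far larger set of galaxies that have only photometry.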

Data Sciences Group Awards

2015
NASA Software Initial Award, Kamalika Das, Scalable Gaussian Process Regression

2014
NASA Ames Contractor Council Certificate of Excellence, Machine Learning and Data Mining Team
NASA Exceptional Public Service Medal, Kamalika Das

2013
NASA Software Initial Award, Bryan Matthews, iOrca

2011
NASA Ames Contractor Council Certificate of Excellence, Kamalika Das

2010
NASA Associate Administrator Awards for Technology and Innovation Group Award, IVHM Data Mining Group
NASA Ames Contractor Council Certificate of Excellence, Bryan Matthews, NASA Earth Exchange Website Development Team

2009
NASA Ames Contractor Council Certificate of Excellence, Bryan Matthews, ISRDS ARES I-X Ground Diagnostic Prototype Team

2008
NASA Certificate of Recognition, Bryan Matthews, Inductive Monitoring System

2005
AeroTech Congress & Exhibition World Aerospace Congress Participation, Nikunj Oza

Team

Group Lead
Nikunj Oza, Ph.D.

Group Members
Kamalika Das, Ph.D.
Dave Iverson
Vijay Janakiraman, Ph.D.
Claire Little
Rodney Martin, Ph.D.
Bryan Matthews
John Stutz
Hamed Valizadegan, Ph.D.

Current Affiliates
Rama Nemani, Ph.D.

Past Affiliates
Ram Akella, Ph.D. – UCSC, UARC
Mike Berry, Ph.D.
Kanishka Bhaduri, Ph.D.
Peter Brende – SIVD – UCSC, UARC
Suratna Budalakoti – RIACS
Aditi Chattopadhyay, Ph.D.
Santanu Das, Ph.D. – UARC
Robert Delgadillo – FCCD Internship
Vesselin Diev – UCSC, UARC
Gregory Dorais, Ph.D.
Elizabeth Foughty
Darren Galaviz – FCCD Internship
Paul Gazis, Ph.D.
Michelle Ho – SHARP Internship
Upender Kaul, Ph.D. – NASA
Rebekah Kochavi – QSS
Sakthi Preethi Kumaresan – UCSC, UARC
Alex Lotch – Boston University
Bill Macready, Ph.D. – UARC
Amy Mai – SGT
Marianne Mosher, Ph.D. – NASA
Manos Pontikakis – UCSC, UARC
Avik Sarkar – Open University, U.K.
Smadar Shiffman, Ph.D. – QSS
Ashok Srivastava, Ph.D. – NASA
David Thompson, Ph.D. – NASA
Len Trejo, Ph.D. – NASA
Eugene Turkov
Richard Watson
David Wolpert, Ph.D.
Bing Xu – UCSC, UARC
Brett Zane-Ulman – CSC
Yi Zhang, Ph.D. – UCSC, UARC