Feature

Data From Space, Filled Out for the Masses
02.17.12
 
By: Jim Hodges

The place would look like a locker room except for the 1,000 or so tiny green lights that show brightly through grillwork over the stacks upon stacks of computers in the steel cabinets in the server room of the Atmospheric Science Data Center (ASDC) in NASA Langley's Building 1268.

Data pours in from space -- from the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations (CALIPSO) spacecraft, whose laser pulses offer a picture of aerosols in Earth's atmosphere; and now from the Clouds and the Earth's Radiant Energy System Flight Model 5 (CERES FM5) instrument, which launched on October 28, 2011, and opened its protective covers to send first-light data on Thursday, January 26, 2012.

Langley's Data Center.

John Kusterer, head of Langley's Atmospheric Science Data Center (ASDC), stands in a narrow row of one of the three centers that hold 16 petabytes of data on tapes. Credit: NASA/Sean Smith.

And from orbiting instruments that measure aerosols and tropospheric chemistry -- the Multi-angle Imaging SpectroRadiometer (MISR), the Measurements Of Pollution In The Troposphere (MOPITT) instrument and the Tropospheric Emission Spectrometer (TES). CERES FM5 is the 43rd project sending data to Langley -- sometimes the raw instrument data, mostly routed through Goddard Space Flight Center, sometimes more refined data that has been washed through various scientists' algorithms.

Data in amounts that challenge the imagination while stretching the vernacular. At the ASDC, about 1,000 computer cores gather and process that data; about 2 petabytes of it are stored.

A petabyte is about 13.3 years of HD-TV video. About 50 Libraries of Congress.
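That video comparison holds up to rough arithmetic. As a back-of-the-envelope sketch (the bitrate is an assumption, not from the article -- broadcast HD at the ATSC rate of about 19.4 megabits per second):

```python
# Rough check of the "13.3 years of HD-TV video per petabyte" comparison.
# Assumes broadcast HD at ~19.4 Mbit/s (ATSC rate); the real figure depends
# on the encoding, so treat this as an order-of-magnitude sketch.
HD_BITRATE_BPS = 19.4e6        # bits per second (assumption)
PETABYTE_BITS = 8 * 10**15     # 1 petabyte = 10^15 bytes

seconds = PETABYTE_BITS / HD_BITRATE_BPS
years = seconds / (365.25 * 24 * 3600)
print(f"{years:.1f} years of HD video per petabyte")
```

A slightly lower assumed bitrate gives the article's 13.3-year figure; either way, the order of magnitude checks out.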

That's just the data in. The ASDC gathers lower level data and processes it with codes provided by mission scientists to make higher level data products, reprocesses existing data with new codes provided by the mission scientists to make newer products, and takes already processed data from the missions themselves. Then the ASDC stores the information. From that storage, requests for information are filled for doctoral candidates, teachers, industry, government and just about anybody else who needs it.
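The cycle described above -- ingest lower-level data, run the science team's code, store the product, fill requests from storage -- can be sketched in a few lines. This is an illustrative outline only; the names and structure are hypothetical, not the ASDC's actual software:

```python
# Illustrative sketch of the ingest-process-store-distribute cycle;
# all names here are hypothetical.
archive = {}

def ingest(product_id, granule, science_team_code=None):
    """Store a data product, first applying the mission science team's
    processing code if the data arrives at a lower level."""
    if science_team_code is not None:
        granule = science_team_code(granule)   # make a higher-level product
    archive[product_id] = granule

def fill_request(product_id):
    """Fill a request from storage -- for students, industry, government."""
    return archive[product_id]

# Toy example: raw radiances in, a calibrated product out.
ingest("toy_product", [1.0, 2.0], science_team_code=lambda g: [x * 0.5 for x in g])
```

Already-processed data from a mission would simply skip the `science_team_code` step and go straight into the archive.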

About 766 terabytes went out to more than 130,000 customers in 160 countries in 2010.

A terabyte would hold 1,000 sets of the Encyclopaedia Britannica.

It goes out in small snippets that students download from a website to prove a hypothesis for their paper. It goes out in huge lots, such as the discs upon discs it took to get the complete data history of CALIPSO to the Navy's Sea Systems Command in San Diego.

The discs kept the Navy's request from potentially jamming up the ASDC data delivery system for days.

It's a never-ending data-in-data-out cycle that keeps over 65 people working.

"Our main charter here is data integrity," said John Kusterer, who is in charge of the ASDC, one of 13 Distributed Active Archive Centers (DAACs) in NASA’s Earth Observation Systems Data Information System. "We have to make sure what we're giving out is just exactly what the science team wants out there."

Make no mistake, the scientists want their data out.

"Scientists are always trying to make their data more useable and useful to the science community," said Kusterer. "The more it's used, the more valuable it is."

For all of the seeming sameness of 1's and 0's that stream through the ASDC, the information is more flexible than you would think. For one thing, data is like vocabulary to those who use it. Like two writers who can turn the same words into two different stories, scientists and researchers can use the same data to come up with more than one conclusion.

For another thing, a new look at today's data can change what tomorrow will bring.

"It's called reprocessing," said Kusterer. New algorithms applied to existing data can yield new conclusions, but only after the data is washed with the new idea.

"We've done that -- reprocessed 10 years of data because they've come out with a new processing code to apply some new scientific knowledge to the existing data," Kusterer said.

That brings up another issue: the value of old data products. "We're both forward and back," Kusterer said. "We get new data, but we mostly don't get rid of old data, and getting rid of it is a very deliberate process. Old data is often important to keep to enable assessments of how the new algorithms impact scientific conclusions derived using the older data." And so data lingers and is refreshed, added to, stored ... and used, over and over again.

Sue Sorlie likens data requests to a trip to the grocery store.

"We have some novice users who look at the web interface to see what we have," said Sorlie, who is in charge of data distribution through both the web and a help desk. "Others know what we have and want it. They have a list."

Customers' capability to receive data ranges from the fast computers of industry and academia to the older, slower devices of emerging countries, which are increasingly interested in solar data provided by the Surface Solar Energy program.

"It's the most popular data that we distribute," said Sorlie. "People are interested because they want to see solar input so they can determine solar needs."

Data emphasis may be changing, with scientists struggling to find funding for space instruments and with such setbacks as the failures of the Orbiting Carbon Observatory (OCO) and Glory to achieve orbit. Cheaper airplane missions to sense the atmosphere are gaining favor, and that involves another look at data collection and storage.

The science data collectors have been involved from the planning stage on decade-long instrument programs, but they have been something of an afterthought on many airplane missions, which frequently are aimed more at testing instruments and data calibration than at gathering long-term science data.

And so there are tests for the airplane-collected data to get into the archive. "How does aircraft data relate to the long-term climate records that we're capturing from orbiting instruments, if at all?" said Kusterer. "Understanding that is just in its infancy at the agency level. Can data captured from airborne field campaigns be used to fill voids created by the future reduction in the number of space-borne instruments? How can we manage this airborne field campaign data so that it provides maximum value to the science community? We're kind of getting an understanding about it now."

So far, Langley has been receptive to airplane data, even that data sent on an ad hoc basis.

It helps that the ASDC keeps ahead of the technology curve. Where once 12 racks of computers crunched numbers from a mission, a single rack of eight machines might do it now. For one instrument's data processing, four "data days" of data could be processed in one day a few years ago; now 48 "data days" can be processed in one day.

"Over the past couple of years, we have architected and implemented a system that is flexible enough to handle the various types of products we're dealing with," said Chris Harris, who is in charge of information technology at the ASDC Center. "That means when a new project comes in, we don't have to reinvent the wheel."

It's more a process of simply adding a wheel at the ASDC, which just added its 43rd wheel with the successful launch and commissioning of CERES FM5.


The Researcher News
NASA Langley Research Center
Editor & Curator: Denise Lineberry
Managing Editor: Jim Hodges
Executive Editor & Responsible NASA Official: Rob Wyman