NASA Center for Climate Simulation: Data Supporting Science
Debuting in spring 2010, the NASA Center for Climate Simulation (NCCS) is the new name for a Goddard Space Flight Center organization that has provided supercomputing resources to NASA scientists and engineers for over 25 years.
"Computation here at Goddard is primarily to create datasets and make them available for science researchers around the world," said Phil Webster, chief of Goddard's Computational and Information Sciences and Technology Office, which includes NCCS. With climate and weather modeling representing the bulk of NCCS computing, the new name reflects "our mission to support NASA Earth science."
This science is carried out by hundreds of NCCS users from Goddard, other NASA centers, laboratories, and universities across the U.S. The two largest user groups are Goddard's Global Modeling and Assimilation Office (GMAO), headed by Michele Rienecker, and the Goddard Institute for Space Studies (GISS), directed by Jim Hansen. NCCS-hosted simulations span time scales from days (weather prediction) to seasons and years (short-term climate prediction) to decades and centuries (climate change projection).
At any time scale, NASA climate simulations use and produce vast amounts of data. "The unique thing about NASA is that we are the source of most of the research satellite observational data of the atmosphere, land, and ocean," Webster said. Add data from the National Oceanic and Atmospheric Administration (NOAA) and other sources, and GMAO needs to process as many as 8 million observations per day from satellites and other platforms before assimilating them into models.
Data assimilation and other techniques create the right starting conditions for simulating physical processes around the Earth. In predicting future conditions, climate models generate data much like the observations: temperature, humidity, wind speed and direction, precipitation, and other values. Data processing requirements can be considerable. The largest project run at NCCS to date -- GMAO's Modern-Era Retrospective analysis for Research and Applications (MERRA) -- ingests more than 50 billion observations spanning the Earth Observing System satellite era. MERRA will eventually produce more than 150 terabytes (tera = trillion) of value-added Earth science data.
Today's climate science is "data-centric," as Webster describes it. "Everything we do supports the creation, utilization, and exploitation of Earth science model data," he said. The new NCCS is expanding its services to meet NASA's growing climate data needs.
The heart of the new NCCS is the "Discover" supercomputer. In 2009, NCCS added more than 8,000 computer processors to Discover, for a total of nearly 15,000 processors. The new processors are from Intel's latest Xeon 5500 series, which uses the Nehalem architecture introduced in spring 2009. Nehalem is well suited to climate studies, offering greater speed, larger memory, and faster memory access than processors installed just one year before. Significant augmentations to Discover will occur in summer 2010.
"With the new augmentations of Discover we probably have a 3 to 4x increase in the amount of work that we can push through the computer in a day," Webster said. "You can run more simulations at the same resolutions you've had, but the thing that really excites us is that we can run much higher resolution simulations."
Using Discover's new Nehalem processors, a "cubed-sphere" version of GMAO's flagship Goddard Earth Observing System Model, Version 5 (GEOS-5) ran at resolutions including 3.5 kilometers -- equaling the highest resolution to date for a global climate model. Most striking is the formation of numerous cloud types at unprecedented fidelity. "When you hold that up against pictures taken from satellites, it's almost impossible to tell the difference between the simulation and the pictures," Webster said.
Working with Data
In addition to powerful computers, NCCS has long had a massive data archive for researchers to store, and later retrieve, model output and other data. The archive's current capacity is 17.5 petabytes (peta = 1,000 trillion). A new data management system (DMS) will reduce dataset duplication and keep the most heavily used datasets online for faster access. DMS software tools will help users to more easily locate and access the data they need.
NCCS is also expanding its data analysis and visualization capabilities. Webster explained that it is very difficult to analyze terabytes of data on a standard workstation, which might have a few hundred gigabytes of disk and perhaps eight gigabytes of memory. The NCCS' "Dali" analysis system offers "a machine comparable to the size of the data that is being generated by the computing center," Webster said. It is "specifically designed to allow a scientist to use that data as quickly as possible." Dali's capabilities include data visualization, scientific workflow management, and diagnostics for model evaluation and comparison. For visualization at room size, a 17- by 6-foot multi-screen visualization wall is engaging visitors and scientists with high-definition movies of simulation results.
Over the last few years, NCCS has distributed simulation data to users and non-users alike through its Data Portal. Especially to support distribution of data from NASA's simulations for the Intergovernmental Panel on Climate Change (IPCC), NCCS is deploying a node on the Earth System Grid (ESG). ESG integrates supercomputers with large-scale data and analysis servers at national laboratories and research centers, with the goal of "turning climate datasets into community resources."
The IPCC's Fifth Assessment Report, due to be completed in 2014, will include input from climate modeling groups worldwide. NASA contributions will come from GISS and GMAO, which are running the latest versions of their models on Discover. GISS ModelE will perform simulations going back a full millennium and forward to 2100. GMAO will focus on the years 1960 to 2035 and perform decadal prediction simulations using GEOS-5 and atmospheric chemistry-climate simulations using the GEOS Chemistry Climate Model. Employing ESG and its common data format, NCCS expects to distribute more than 50 terabytes of data from IPCC simulations to the climate research community.
Within that community, Webster sees Goddard and NCCS as particularly equipped to make contributions. "We have a tremendous amount of observational data, which is captured by our satellites," he said. "We have probably the largest collection of Earth scientists anywhere in the world, and we have this new state-of-the-art computing center. The combination of the data, the scientists, and the computing puts us in a unique position to enable advances in weather and climate research."
NASA Center for Climate Simulation – http://www.nccs.nasa.gov/
Global Modeling and Assimilation Office – http://gmao.gsfc.nasa.gov/
Goddard Institute for Space Studies – http://www.giss.nasa.gov/
NASA High-End Computing Program – http://www.hec.nasa.gov/
Multimedia Resources – http://svs.gsfc.nasa.gov/Gallery/NCCS.html
Flickr Gallery – http://www.flickr.com/photos/gsfc/sets/72157624141755552/
Goddard Release on NCCS – http://www.nasa.gov/centers/goddard/news/releases/2010/10-051.html
NASA's Goddard Space Flight Center