Small Steps, Giant Leaps: Episode 5, Big Data

Episode 5, Mar 5, 2019

John Sprague, Deputy Chief Information Officer for Technology, Data and Innovation at NASA Headquarters, discusses big data.

John Sprague: NASA has always had big data.

Some folks out there might think, oh, it’s big data. It’s very complex and these tools are really complex and it’s hard. Try it. Get involved. Find the data. You might find that that’s your niche and you really love it.

Data becomes information. Information becomes knowledge. Then knowledge becomes wisdom. That’s what you’re shooting for.

Deana Nunley (Host): You’re listening to Small Steps, Giant Leaps – a NASA APPEL Knowledge Services podcast featuring interviews and stories, tapping into project experiences in order to unravel lessons learned, identify best practices and discover novel ideas.

I’m Deana Nunley.

NASA gathers millions of data points – also known as big data – from dozens of active missions. Every hour, hundreds of terabytes of data are collected. NASA’s big data is used for everything from weather forecasting to searching for distant galaxies.

Our guest today, John Sprague, has been with NASA for about 10 years and currently serves as the Deputy Chief Information Officer for Technology, Data and Innovation at NASA headquarters.

Thanks for taking time to talk with us. Let’s start by defining the term. John, what’s a good working definition of big data?

Sprague: It’s funny. When you ask and when you look online, there are tons of definitions out there, but two Ames Research Center scientists coined the term back in the ’90s. I only knew that, not because I worked at NASA, but because I looked it up on the Internet, and I was blown away. Two NASA folks created the term. They were out at Ames Research Center. The data was too big for the desktop computer they had at their desk at the time. It was just too big and they couldn’t run it on that, so they said, “Oh, it’s big data.” They put it into a paper, and that was one of the first documented uses of the term.

But it can also be defined by another person, Doug Laney, from Gartner. He didn’t work at Gartner at the time, but he coined the three Vs of big data: volume, velocity, and variety. Volume is “Oh, it’s just too big, too much.” But data can also come at you too fast, faster than you can process it. That would be the velocity side. Then there’s variety. You can get data in audio recordings now. You can get it on video. You can get it in all these different formats, dot-docs and dot-this and dot-that. That makes it really hard to put all that together, scrunch it and get the information that you’re looking for.

And if you think about it, take two big spreadsheets, gigantic spreadsheets. They both have their own formats, and this one has columns and rows that are different than that one. How do you put that together? That’s the variety, and that makes it very difficult, but there are tools out there to use. If you ever want to see them, just search for Big Data Landscape. There’s a slide that will pop up and it’s mind-blowing. There are so many different companies out there doing all the little parts and pieces. There’s no one company that does it all. There’s no one standard that does it all. So you just go and find what you need.

A lot of the NASA folks like to take the tools that are out there, and then bring them in and use them with other tools and tweak them. So it’s not that that tool just doesn’t do it. They’ve got to make some changes to it, and then it does what they want it to do.
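To make the “variety” problem concrete, here is a minimal Python sketch of merging two spreadsheets whose columns are named differently. The column names, mapping tables, and sample rows are hypothetical, and real pipelines usually rely on the dedicated tools Sprague mentions rather than hand-written code; the point is only that each source needs an explicit translation into a shared schema before its rows can be combined.

```python
"""Harmonize two tables whose columns are named differently (the "variety" problem).

Illustrative only: the column names and sample rows below are made up.
"""

# Each source spreadsheet names the same kinds of fields differently.
# These hypothetical mappings translate source column names to one shared schema.
SOURCE_A_MAP = {"PartNo": "part_id", "Mfr": "manufacturer", "LastService": "last_service"}
SOURCE_B_MAP = {"part_number": "part_id", "spec_sheet": "spec", "inspector": "last_touched_by"}


def normalize(rows, column_map):
    """Rename each row's columns to the shared schema, dropping unmapped columns."""
    return [
        {column_map[key]: value for key, value in row.items() if key in column_map}
        for row in rows
    ]


def merge_on(key, *tables):
    """Combine rows from several normalized tables that share the same key value."""
    merged = {}
    for table in tables:
        for row in table:
            # Start a new record the first time a key is seen, then add fields to it.
            merged.setdefault(row[key], {}).update(row)
    return merged


if __name__ == "__main__":
    sheet_a = [{"PartNo": "H-101", "Mfr": "Acme", "LastService": "2018-11-02", "Notes": "ok"}]
    sheet_b = [{"part_number": "H-101", "spec_sheet": "SPEC-7", "inspector": "J. Doe"}]

    combined = merge_on(
        "part_id",
        normalize(sheet_a, SOURCE_A_MAP),
        normalize(sheet_b, SOURCE_B_MAP),
    )
    print(combined["H-101"])
```

Running it prints one combined record for part H-101 with fields from both sheets. The unmapped “Notes” column is silently dropped, a design choice worth revisiting if every field matters.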

Host: What are some of the sources of big data in NASA?

Sprague: We have data that’s all over the place. We have data that’s out in the DAACs they call it, Distributed Active Archive Centers. They’re scattered all over the United States and that’s where all the science data goes. Each one has a specialty for whatever they hold, but it’s really cool that we’ve got a repository.

But that’s just the science data. We have data all over the place. You’ve got mission data in other areas. You’ve got data coming down from the satellites, and it might sit on a server somewhere for a little bit, but not long, because NASA shares its data, a lot of data. There are all kinds of sites out there you can go to. Data.Nasa.gov has a lot of information. But when you get there, you’re like, “How do I access that data? How can I use it best?” Then you might go to Api.Nasa.gov and use an application programming interface to be able to access that data.

Since our charter back in 1958 came out and said NASA needed to share data with the world, we’ve been doing that. I think we’re one of the bigger agencies that do a lot of that sharing of data out there.

Host: And how does big data fit into the NASA landscape?

Sprague: There’s a lot of different places out there that have big data. You can think of Department of Energy. You can think of Google and Amazon and all those others. But NASA has always had big data. We’ve always had all this data coming from satellites and probes, even ground instruments. So we’ve just always had it, and these NASA missions are made to gather data.

It’s funny. I always say and it’s even on the bottom of my e-mail, that data becomes information. Information becomes knowledge. Then knowledge becomes wisdom. That’s what you’re shooting for. The data is great and you want the data. Then you want to turn it into something else and make decisions on it. That’s the big point.

As a matter of fact, one of our agency data scientists, Dr. Brian Thomas, when he first got here, said, “Oh, I’m going to map the NASA data universe.” So he started looking at all the places we have our data and how it all fit together. After about a month or two, he had a big map up and it was amazing. We have data that’s sitting on our servers, our data farms, those kinds of datacenters, places like that. We have people that have them on their own machines. We have them in contractors’ databases and their servers and their centers. Then there’s also researchers from a lot of the universities, so our university partners. They have data, some of our data. So it’s scattered all over the place, but that’s what makes NASA unique, I think, and makes it work for us.

Host: How would you describe where NASA currently stands with big data?

Sprague: I’d describe the current state as we’ve got data. We’re doing very well with it, but we’re also gearing up. I say that because some of the new probes and the satellites that are going up right now are going to be giving us a lot more data. We have the DAACs out there and they’re gearing up, too. Should they stay where they’re at and everything is great, and just maybe make them bigger? Do you supplement it with cloud? Do you move it all to the cloud? I mean there are all kinds of different things that could happen in the future.

So that’s kind of where we’re at. We’ve got a lot of data and there’s way more coming. Some of the new tools out there might be able to process more of the data out on the probe, out on the satellite. That way, you’re not bringing as much down, although we love data. That data that you don’t collect, and you let go and you delete, might be the one that gives you the nugget of truth that you were looking for the whole time.

Host: What kind of impact has big data had on NASA programs and projects?

Sprague: I think the development of all the tools that are out there has made it easier for us to collect the data to start with, and then to process it. That’s usually the hardest part. I saw a bunch of studies out there, and they’re always saying that data scientists spend 60 or 70 percent of their time just getting the data, wrangling it, and putting it all together. That’s a big problem. It’s very painful that you’re spending that much time just messing with the data instead of actually getting the information that you want out of it.

Host: Do you have a couple specific examples that you could share with us where big data has been used, and where NASA has had success because of it?

Sprague: I’ve got a really good one that I want to share, and this one we call EVA, Extravehicular Activities. That’s when an astronaut is up on the Space Station or somewhere else, and they’re moving around outside the spacecraft. Well, back in 2013, an astronaut went out on a mission. He was walking around out there, doing his thing, and all of a sudden he felt a little bit of moisture on the back of his neck. He reported it. Then he felt it going up towards the top of his head. He reported it and they said, “All right. Get back in.”

So he started making his way back in and, as he got closer, the water started to cover his eyes and spread all around his head, and all he had left was the space where he was breathing. It came that close to him breathing in water.

So what would you think would happen after that? They’d do an investigation, of course. Where did it come from? What’s going on? Why? In the past, they might have thought it was sweat or something like that. This was way too much water.

So when they did the investigation, it took six contractors two weeks to pull all the data together, because it was in file cabinets. It was in the margins in the papers in the file cabinets. It was in the contractor’s database. It was in the NASA database. It was all over the place. So right after that, when they were done with their investigation, they said, “Help.” They came to my experts and said, “Hey, can you help?”

One of our talented folks out at Ames Research Center, Sandeep Shetye, pulled a team together and said, “We can help you,” and he did it. EVA, the Extravehicular Activities team that’s based out of Johnson, liked it so much that they funded the whole thing. So none of it came out of my budget, which was a great story for me, but he helped them solve their data problem.

What he did is create a 3D model of the spacesuit, not just the helmet but the whole spacesuit, that you could spin around. You could click on any part of it or do a quick search on anything, and it popped the data up instantly. That’s something they didn’t have before.

By the way, that same exact model then got picked up by the team working on the new spacesuit right now. So this model has spread. The next ones who heard about it and said, “Wait. Show me the demo,” were the Orion team, for the Orion spacecraft that’s going to go on top of the rocket that will take us out there. They said, “We’ve got tons of data with every part number, piece, whatever.” So if you were to click on a screw on the side of a helmet or any other part, it would give you who created it, the manufacturer, its specs, who last touched it, when it’s due for maintenance, everything, all in one place.

So you’ve got a lot of data there. That’s another problem with data. He and his team solved it, and it’s now the Johnson engineering folks who are looking at it, some Marshall engineering folks who are looking at it. The Stennis Space Center down in Mississippi, they saw it and loved it, and they’re using it for their safety data, to be able to make it more accessible and easy to get to, and maybe find, “Wait a minute. These three different unrelated incidents all point to one thing.” That’s what you really want.

You can look at the spacesuit, turn it around, blow it up and find a part you’re looking for or something like that. So it just was a great tool. It’s a platform. We call it Insight and it’s just been phenomenal. That’s one instance.

Another one that I love to talk about, another use case that we’ve done is Exploration Medical Capabilities, ExMC. We all call it something a little shorter. We call it “Doc in a Box.” So it’s a doctor in a box. Now, you’ve got all the astronaut data. You put that all in there. Then you hook them up with probes and everything when they’re out there. They’re usually hooked up with probes anyway.

So you take all that data, and then if something happens and you’re way out in the middle of nowhere, like maybe on your way to Mars or your way back or something like that, if something happens, you don’t have time to try to get a flight surgeon on terra firma here, to be able to tell those people what to do, especially if it’s step-by-step. This Doc in a Box can do that kind of thing. It’s an amazing tool. I love it. I love to mention it, another one done with our folks, with a different team, out at Ames Research Center. So, really cool. That’s just two of the ones I can think of right off the top of my head that I like to talk about.

Host: John, if a program manager or project manager says, “Hey, I’m in. I want to use this in my program or project. We need to use big data.” What are some recommendations that you have for them to get started?

Sprague: I would tell those program managers – by the way, I am a program manager. I got my PMP, Project Management Professional certification, a while back. They have a hard job. So when it comes to big data, I would tell them to think about data. Where will it be stored? Who has access to it?

Most of the data is open. You do it. You play with it. You do your thing with it. You turn it in and then you share it out with the world. Some data you can’t, astronaut data, like their medical data. If you had enough of that medical data, you could tell which astronaut it is. So that’s not fair to them, right. So you’ve got to be careful of that kind of thing, too, you know, who has access.

For contracts, at least thinking about NASA contracts, do we have the right clauses in there? We have an internal site that’s only internal to NASA, that folks at NASA can go to and pull the clause that’s a good one. We think it is. It’s like a four-pager. But basically it just says when the contract is done, the data is still NASA’s data. So that kind of a thing is very important to get that in there right. I know a lot of the mission folks, when they’re doing their contracts, now pull that clause and go use it. I think everybody should use it. The site for the NASA folks is NIAM.NASA.gov. NIAM.NASA.gov has the clause on it.

Host: We’ll post that link to our podcast page at APPEL.NASA.gov/podcast. What are some of the newer trends with big data? How do you see them impacting NASA?

Sprague: I mentioned more big data is coming, so here’s what I mean. A few years ago, we had a pulsed laser beam set a record in data transmission. It transmitted between the moon and here, almost 239,000 miles, at a record-breaking rate of 622 megabits per second. That’s a lot. That’s a lot of data in a second, and it’s using pulsed lasers to do that.

When that happens in the future, and we’re starting to use lasers more rather than radio waves, which are a lot slower and transmit less data, that’s going to be a gigantic influx of data, good data that we really want. So that’s going to be a problem. That’s a future trend that’s going to be great for NASA, but it’s also going to be hard because of all the extra data coming.
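The laser-link figures quoted above lend themselves to a quick back-of-the-envelope calculation. This Python sketch assumes decimal units, a constant 622 Mbps rate over the roughly 239,000 miles mentioned, and no protocol overhead, coding, or link outages, so it gives an idealized upper bound rather than real mission throughput.

```python
"""Back-of-the-envelope numbers for a 622 Mbps laser link over about 239,000 miles.

Assumes decimal units (1 Mbps = 1,000,000 bits/s; 1 MB = 10**6 bytes; 1 GB = 10**9 bytes)
and ignores protocol overhead, error-correction coding, and link outages.
"""

MILES_TO_KM = 1.609344            # exact definition of the international mile
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum, km/s


def one_way_delay_s(distance_miles: float) -> float:
    """Time for light to cover the distance, in seconds."""
    return distance_miles * MILES_TO_KM / SPEED_OF_LIGHT_KM_S


def bytes_transferred(rate_bps: float, seconds: float) -> float:
    """Raw volume sent at a constant bit rate, in bytes (8 bits per byte)."""
    return rate_bps * seconds / 8


if __name__ == "__main__":
    rate_bps = 622e6  # 622 megabits per second
    print(f"one-way light time: {one_way_delay_s(239_000):.2f} s")
    print(f"per second: {bytes_transferred(rate_bps, 1) / 1e6:.2f} MB")
    print(f"per hour: {bytes_transferred(rate_bps, 3600) / 1e9:.1f} GB")
```

Under those assumptions it reports a one-way light time of about 1.28 s, 77.75 MB per second, and about 279.9 GB per hour of continuous transmission, which gives a sense of the “gigantic influx” described above.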

Two other areas that are very similar to that are artificial intelligence and machine learning. Artificial intelligence is all about pulling in the data and then doing something with it. Machine learning is the same thing. You train the machine learning algorithm on data so that it can pull in new data and then process it. That’s a trend that’s just going crazy right now, not just at NASA, but all across the world. You’re starting to see more and more and more companies and agencies and others using artificial intelligence and machine learning. So that’s a trend that’s not going to go away.

We’re getting ready to do a course called R Shiny here at NASA headquarters for some of our folks, to get them up to speed on one of the tools that’s out there. It’s more or less an open source tool, and we love open source tools, as you might imagine. As a matter of fact, that’s another trend that’s come out just in the last five or even ten years that has become super-popular, going and getting the code that’s free, that then people can go work on and change and make it useful for them, for whatever they’re trying to process.

Host: And with open source you’re able to leverage the knowledge, skills and capabilities of a multitude of different people?

Sprague: Yeah. That’s taking advantage of citizen science, scientists out there, right. There are all these folks out there and maybe one or two of them around the world have a little more time and can really delve in deep, where others don’t have the time and can’t do it right now or don’t have the skills either, and be able to help that.

That reminds me. In OCIO, we used to run something called the International Space Apps Challenge, a little competition. We ran it for many years and it just kept getting bigger and bigger and bigger. One of the quotes in the news about it a long time ago was that it was the biggest collaboration event in the world at that one moment in time. I keep hearing that the International Space Apps Challenge keeps getting bigger every year. I love participating in it and helping when it comes up every year.

Really, it’s just one weekend that you’re doing that. They generate a lot of data. As a matter of fact, a lot of companies get created out of that, from a group of people that got together, never knew each other before, some of them were even virtual, and they all helped each other with this project. When they were done, they were like, “I don’t want this to end.” Then they’d go create a company. The challenge does great things with data, creating applications.

Host: Could you suggest NASA resources where people can get involved or get more information?

Sprague: There are a lot of resources, so let me just go through a few that I can think of right now. One is, if you’re within NASA and you’re listening to this, you can join the big data working group meeting that we have once a month, the second Thursday of every month from 2:30 to 4 p.m. Eastern time. We get all the subject matter experts with data, and we talk about all kinds of things: data visualization, data analytics, anything data that they’re interested in. Those folks come to this group. We have about 160 people, I think, right now that come to the meeting. So that’s just a once-a-month meeting where we get together and talk about data challenges, and what’s going on, and the latest tools, and, “I’m having a problem with this.” “Well, hey, I just solved that problem last week. Here’s the solution.” So a lot of sharing goes on there.

And we do have a once-a-year meeting where we rotate through different centers. We’ve already gone to Langley and had our annual big data meeting. Big Think, we call it. We’ve gone to Goddard. We’ve gone to Johnson. We’ve gone to Kennedy. We went to Ames Research Center. We went to JPL. Then this last one, we just went to Glenn Research Center out in Ohio. Phenomenal meeting, to have all the experts come together face-to-face, where you can really get into the nitty-gritty with what’s going on and what we can work on. As a matter of fact, during that meeting, we came up with a list of challenges that we need to work on. It was pretty cool.

Also, below the big data working group, we have two COIs or Communities of Interest. One is the Artificial Intelligence Community of Interest, and then another is the Data Management Community of Interest. The AI one is run by James McClellan down at Johnson Space Center, who works for me. Then the other one is Yuri Gawdiak, who works for Aero. He runs the data analytics one. They do really phenomenal work. They meet once a month, too, and they get a lot of really good things done. But those are like splinter groups that go off and do more things.

Host: Do you have any closing thoughts?

Sprague: Some folks out there might think, oh, it’s big data. It’s very complex and these tools are really complex and it’s hard. Try it. Get involved. Find the data. There are a bunch of different agencies and companies that share data out there. Go play with the tools. You might find that that’s your niche and you really love it. I’ll tell you what. A lot of different places that I see and deal with are all going towards that. They are all looking for those kinds of skills in people, to be able to do stuff with the data. That would be my final thought. Get out there and do it.

Host: Thanks to John Sprague for sharing his thoughts and expertise on big data. If you’re interested in “getting involved” and want more information, go to appel.nasa.gov/podcast. You’ll find links to sites and topics mentioned on the show today along with John’s bio and a transcript.

NASA relies heavily on big data to help the agency with artificial intelligence, and we’ll talk about AI on our next episode. We invite you to join us.

And if you haven’t already, here’s a quick reminder to subscribe to the podcast so you won’t miss an episode of Small Steps, Giant Leaps.

Thanks for listening.

Angelo Conner, Millennium Engineering and Integration Company, contributed to the development of this episode.