To learn more about big data and what it can do, we talked with Melinda Thielbar. Melinda holds a PhD in statistics and has spent most of her career working as a software developer, making programs to aid in statistical analysis. She currently works in the people analytics team for Fidelity Investments.
Can you explain what big data is in your own words?
So I’ll give you the technical explanation for it first. When someone like me says big data, they mean three things. They mean there’s a lot of data, that the data’s coming in really fast, and that there are a lot of different kinds of information in the data. They call these the volume, the velocity, and the variety.
So if I have a million observations sitting on my hard drive and I use that same data over and over, it’s not technically big data. I call that large data.
Big data is something like a self-driving car that’s taking images from a camera, information from a satellite, and information from a sensory system and trying to figure out what to do. So that’s tons and tons of information that’s coming into that one small chip that’s on your car that’s trying to make decisions. It has to parse the information down really quickly and decide whether it should be stopping, slowing down, or what other kind of action it should take.
Are there any unusual places big data is used that people might not think of?
Netflix actually is a great example of people who use big data very well. Netflix streams better than anybody else in the business. And part of the reason they do that is because they have a very sophisticated way of detecting, well, Melinda’s feed might slow down, so we’re going to spool her content a little bit differently and a little bit better, but that other person is on very fast Internet and it’s 3 a.m. over there, so we don’t have to worry about doing anything special for them.
So when you watch a movie on Netflix, there’s probably some big data decisions happening in the background just to get your content as quickly and seamlessly as possible.
How do you go from studying statistics to working with big data?
So, there are definitely a lot of different paths into a big data type of job. So my main job was doing the numerical algorithms and coming up with the decision rules. When the data points come in, the computer has to have a mathematical way to crunch all of that buried information into a single decision or a single action. All of that is some fairly complicated math, and it has to happen fast.
So what you get from a PhD in statistics is (1) an understanding of all of those operations that happen in the computer that you need to do in order to turn many, many, many pieces of information into a yes, no, stop, go, up, down, whatever you need, and (2) an understanding of the probability of being wrong. What’s an acceptable probability of being wrong? What’s the tiniest probability of being wrong that you can get? Those are the types of math that statistics focuses on.
What do you mean by “an acceptable probability of being wrong”?
When I swipe my credit card at the gas station, occasionally it’ll come back rejected even though it’s me. Maybe I got gas twice because I didn’t fill up enough the first time, or maybe I got gas in my husband’s car and now less than an hour later I’m putting gas in mine. And so the computer comes back and it says, Hey, Melinda, you can’t use your card here. If that happens too often I’m going to switch to another company, right? That’s too inconvenient. And so, there’s a whole lot of error in that process. There are acceptable reasons for things to happen that the computer might determine as fraud.
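The trade-off described above can be sketched in a few lines of Python. This is only an illustration, not how any card issuer actually works: the fraud scores, the `error_rates` helper, and the thresholds are all made up. It assumes the computer assigns each purchase a score between 0 and 1 and declines anything at or above a chosen threshold.

```python
# Hypothetical fraud scores (0 = looks normal, 1 = looks fraudulent).
# All numbers are invented for illustration.
legit_scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.15, 0.25, 0.70]
fraud_scores = [0.45, 0.65, 0.80, 0.90, 0.95]


def error_rates(threshold):
    """Return (false_decline_rate, missed_fraud_rate) for a decline threshold.

    A purchase is declined when its score is at or above the threshold.
    """
    # Legitimate purchases that get declined anyway (the annoyed customer).
    false_declines = sum(s >= threshold for s in legit_scores)
    # Fraudulent purchases that slip through.
    missed_fraud = sum(s < threshold for s in fraud_scores)
    return false_declines / len(legit_scores), missed_fraud / len(fraud_scores)


for threshold in (0.3, 0.5, 0.75):
    false_decline_rate, missed_fraud_rate = error_rates(threshold)
    print(
        f"threshold={threshold:.2f}  "
        f"false declines={false_decline_rate:.0%}  "
        f"missed fraud={missed_fraud_rate:.0%}"
    )
```

With these invented numbers, a strict threshold catches every fraudulent purchase but rejects half of the legitimate ones, while a lenient threshold never bothers a real customer but lets more fraud through. Choosing "an acceptable probability of being wrong" amounts to picking a point between those two extremes.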
The self-driving car is a useful example because it’s been in the news a lot. All of the measurements that that car is taking—on the road, on the other cars that are in front of it—all of those measurements, they’re not perfect. There’s a little bit of error in each one. When you design an algorithm to turn all those imperfect measurements into a yes, no, up, down, stop, go, there’s a certain amount of error that is just part of the measurement process that you can’t get rid of.
And the mathematics of determining what that error is are actually kind of interesting and kind of fun. But it’s the stuff you learn how to do when you get an advanced degree in statistics.
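As a rough sketch of that kind of math, the Python below combines several imperfect distance measurements into one estimate using inverse-variance weighting, a standard statistical technique. It is not a description of how any real self-driving car works. The readings, the error sizes, and the `combine` helper are hypothetical, and the sketch assumes the sensors' errors are independent.

```python
import math

# Hypothetical readings of the distance (in meters) to the car ahead.
# Each pair is (reading, standard deviation of that sensor's error).
readings = [(24.8, 1.0), (25.6, 2.0), (25.1, 0.5)]


def combine(readings):
    """Combine noisy readings with inverse-variance weighting.

    Returns (estimate, combined_sd). Assumes independent sensor errors.
    """
    # More precise sensors (smaller standard deviation) get more weight.
    weights = [1.0 / sd**2 for _, sd in readings]
    total = sum(weights)
    estimate = sum(w * x for w, (x, _) in zip(weights, readings)) / total
    # The combined error is smaller than any single sensor's, but never zero.
    combined_sd = math.sqrt(1.0 / total)
    return estimate, combined_sd


estimate, combined_sd = combine(readings)
print(f"estimate = {estimate:.2f} m, remaining error (sd) = {combined_sd:.2f} m")
```

The combined estimate is more precise than the best individual sensor, yet some error always remains. That leftover error is the part of the measurement process that no algorithm can remove.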
How will big data change the world?
One of the things I think big data’s doing on a human level is making technology easier to use. When I was in college and the Internet was first out there, search engine technology was just terrible. Your chances of getting something useful were tiny. And so a lot of us would keep these massive lists of websites that were useful. And a lot of those lists were lists of websites that tended to store good links to good content that we needed for whatever we were doing. A lot of people’s time was spent curating the web by hand. And then Google came along.
And now, if I get an error message in one of my programs that I don’t recognize, I copy it, I paste it into a Google search bar, and two or three links down I have the solution to my problem. That’s an incredible piece of technology. That is definitely big data. And it makes something as huge and complex as the Internet usable to anybody who can type into that search bar.
Do you have any concerns about big data?
I have a myriad of concerns. It’s the we-can-but-should-we on a lot of things.
Facebook has been in the news a lot lately. We’ve made it possible for anyone to share any content at any time, and we’ve built algorithms so that the more something gets shared, the more people see it. I sincerely believe everyone on the Facebook team who says they had no idea this would have negative consequences and that it seemed really reasonable at the time. Again, we’re talking about the computer getting ahead of our human understanding.
We’re also thinking about people who figure out how the system works and game the computer under the covers, so that it gives results that we never intended and that are, you could argue, bad in some measurable way: bad for society, bad for a certain person, bad for the country.
There is also a sincere worry about work. In the United States our whole society is built on work. You work for a living, and a lot of our social ties are built on work. And it really is true that computers are getting to a place where they can replace us. If you’ve gone into a Panera Bread recently, there are way more computers taking orders than there are people. If we don’t need every able-bodied adult working for pay, what does that mean? What should we do?