To learn more about big data and what it can do, we talked with Melinda Thielbar. Melinda holds a PhD in statistics and has spent most of her career working as a software developer, making programs to aid in statistical analysis. She currently works in the people analytics team for Fidelity Investments.
Can you explain what big data is in your own words?
So I’ll give you the technical explanation for it first. When someone like me says big data, they mean three things. They mean there’s a lot of data, that the data’s coming in really fast, and that there are a lot of different kinds of information in the data. They call it the velocity, the variety, and the volume.
So if I have a million observations sitting on my hard drive and I use that same data over and over, it’s not technically big data. I call that large data.
Big data is something like a self-driving car that’s taking images from a camera, information from a satellite, and information from a sensory system and trying to figure out what to do. So that’s tons and tons of information that’s coming into that one small chip that’s on your car that’s trying to make decisions. It has to parse the information down really quickly and decide whether it should be stopping, slowing down, or what other kind of action it should take.
What big data projects is Fidelity involved in?
Right now I mostly work with large data, and that’s because I’m working with workforce data. And a workforce at a place like Fidelity doesn’t change that much. We might hire ten people in a week, so it’s not really a big data sort of situation.
People in other parts of the company definitely have big data. They use it to understand our customers’ needs and to do their jobs better. Fidelity has also released some big data products in the last few years, but I don’t work on those teams, so I really can’t comment on what they do or how they do it.
Are there any unusual places big data is used that people might not think of?
There’s a lot of behind-the-scenes fraud detection happening when you swipe your credit card. There’s a big data situation happening in the background.
Netflix actually is a great example of people who use big data very well. Netflix streams better than anybody else in the business. And part of the reason they do that is because they have a very sophisticated way of detecting, well, Melinda’s feed might slow down, so we’re going to spool her content a little bit differently and a little bit better, but that other person is on very fast Internet and it’s 3 a.m. over there, so we don’t have to worry about doing anything special for them.
So when you watch a movie on Netflix, there’s probably some big data decisions happening in the background just to get your content as quickly and seamlessly as possible.
How do you go from studying statistics to working with big data?
So, there are definitely a lot of different paths into a big data type of job. So my main job was doing the numerical algorithms and coming up with the decision rules. When the data points come in, the computer has to have a mathematical way to crunch all of that buried information into a single decision or a single action. All of that is some fairly complicated math, and it has to happen fast.
So what you get from a PhD in statistics is (1) an understanding of all of those operations that happen in the computer that you need to do in order to turn many, many, many pieces of information into a yes, no, stop, go, up, down, whatever you need, and (2) an understanding of the probability of being wrong. What’s an acceptable probability of being wrong? What’s the tiniest probability of being wrong that you can get? Those are the types of math that statistics focuses on.
What do you mean by “an acceptable probability of being wrong”?
When I swipe my credit card at the gas station, occasionally it’ll come back rejected even though it’s me. Maybe I got gas twice because I didn’t fill up enough the first time, or maybe I got gas in my husband’s car and now less than an hour later I’m putting gas in mine. And so the computer comes back and it says, Hey, Melinda, you can’t use your card here. If that happens too often I’m going to switch to another company, right? That’s too inconvenient. And so, there’s a whole lot of error in that process. There are acceptable reasons for things to happen that the computer might determine as fraud.
The self-driving car is a useful example because it’s been in the news a lot. All of the measurements that that car is taking—on the road, on the other cars that are in front of it—all of those measurements, they’re not perfect. There’s a little bit of error in each one. When you design an algorithm to turn all those imperfect measurements into a yes, no, up, down, stop, go, there’s a certain amount of error that is just part of the measurement process that you can’t get rid of.
And the mathematics of determining what that error is are actually kind of interesting and kind of fun. But it’s the stuff you learn how to do when you get an advanced degree in statistics.
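Editor's sketch: the credit card example above describes a trade-off between rejecting legitimate purchases and letting fraud through. The short Python example below is one way to make that trade-off concrete. Everything in it is an illustrative assumption and not something from the interview: the `Transaction` type, the `risk_score` field, the `error_rates` helper, and the made-up sample data are all hypothetical, and real fraud systems are far more involved.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    # Hypothetical score in [0, 1]; higher means "looks more like fraud".
    risk_score: float
    # Ground truth, known only after the fact.
    is_fraud: bool


def error_rates(transactions, threshold):
    """Return (false_positive_rate, false_negative_rate) for a rule that
    rejects every transaction whose risk_score is >= threshold.

    Assumes the sample contains at least one legitimate and one
    fraudulent transaction.
    """
    legit = [t for t in transactions if not t.is_fraud]
    fraud = [t for t in transactions if t.is_fraud]
    # Legitimate purchases that get rejected (the annoyed customer).
    false_pos = sum(1 for t in legit if t.risk_score >= threshold)
    # Fraudulent purchases that get approved.
    false_neg = sum(1 for t in fraud if t.risk_score < threshold)
    return false_pos / len(legit), false_neg / len(fraud)


# Made-up sample data, for illustration only.
SAMPLE = [
    Transaction(0.05, False),
    Transaction(0.10, False),
    Transaction(0.35, False),  # e.g. two gas purchases within an hour
    Transaction(0.60, False),
    Transaction(0.40, True),
    Transaction(0.65, True),
    Transaction(0.90, True),
    Transaction(0.95, True),
]

for threshold in (0.3, 0.5, 0.7):
    fpr, fnr = error_rates(SAMPLE, threshold)
    print(f"threshold={threshold:.1f}  legit rejected={fpr:.0%}  fraud missed={fnr:.0%}")
```

On this toy data, raising the threshold rejects fewer legitimate purchases but misses more fraud, and lowering it does the reverse. Picking the threshold is one way of deciding what "an acceptable probability of being wrong" means for each kind of mistake.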
What are other paths into big data careers?
People who have degrees in computer science are more focused on making the programs work quickly—so writing really good, really efficient code. Sometimes they’ll come up with a programming language that only works on that system.
And then there are people with degrees in mathematics who will come up with what we call pure math, pure algorithms, pure theory, that the rest of us will then kind of use to try to make it work.
And so there are a lot of paths into big data. I’m just talking about the technical ones. There are certainly people in business, people in communications who work with the technology in a different way.
How will big data change the world?
One of the things I think big data’s doing on a human level is making technology easier to use. When I was in college and the Internet was first out there, search engine technology was just terrible. Your chances of getting something useful were tiny. And so a lot of us would keep these massive lists of websites that were useful. And a lot of those lists were lists of websites that tended to store good links to good content that we needed for whatever we were doing. A lot of people’s time was spent curating the web by hand. And then Google came along.
And now, if I get an error message in one of my programs that I don’t recognize, I copy it, I paste it into a Google search bar, and two or three links down I have the solution to my problem. That’s an incredible piece of technology. That is definitely big data. And it makes something as huge and complex as the Internet usable to anybody who can type into that search bar.
What are the current issues that are holding big data from doing even more?
I’d say we’re low on humans who can teach the computer in an effective way. There have been all these huge leaps within the last two years in the computer’s ability to take a whole lot of unstructured information and spit out an answer. But you still need a person, and now that person has to be trained in some pretty sophisticated mathematics to figure out what the computer is doing and why. There are very few of those people.
I mean, when I went to get my PhD ten years ago, people who were in technology said, Melinda, this is a waste of time, why don’t you just work? And it’s a good thing I didn’t listen to them. Because it turns out that those skills that were considered kind of useless and ridiculous are exactly what you need for what the labor market is now.
The other limitation—and it’s not exactly a limitation—is that the computer is doing things that we never expected it to do. How do we use that? What do we do about it? There are a lot of people studying the ethics of a machine learning algorithm and the ethics of artificial intelligence because the computer now has gotten to a point where it can spit out an answer without us really understanding how it got there. It could have unconscious bias—like there could be bias actually coded into the data. The computer might be able to predict something, but it can’t teach me why it’s predicting that. So the computer’s ahead of a trained person, who should be able to at least follow the computer’s logic.
So I would say the biggest limitation right now is our human understanding of what big data can and cannot do.
Do you have any concerns about big data?
I have a myriad of concerns. It’s the we-can-but-should-we on a lot of things.
Facebook has been in the news a lot right now. We’ve made it possible for anyone to share any content at any time, and we’ve built algorithms so that the more something gets shared, the more people see it. I sincerely believe everyone in the Facebook team who says they had no idea this would have negative consequences and that it seemed really reasonable at the time. Again, we’re talking about the computer getting ahead of our human understanding.
We’re also thinking about people who are able to figure out the game, being able to game the computer under the covers in a way that the computer gives results that we never intended and that are, you could argue, bad in some measurable way—bad for society, bad for a certain person, bad for the country.
There is also a sincere worry that in the United States our whole society is built on work. You work for a living, and a lot of our social ties are built on work. And it really is true that computers are getting to a place where they can replace us. If you’ve ever gone into a Panera Bread recently, there are way more computers taking orders than there are people. If we don’t need every able-bodied adult working for pay, what does that mean? What should we do?
Do you have any advice for our students?
This is a fascinating field. But I got into it because I enjoy it. I feel like a lot of young people are being shoved toward this field because people think it’s going to be a source of income for them. I don’t think anybody should be working in this field who doesn’t love it.
So even if we’ve got more jobs than we can fill right now, people who do this are passionate about what they do. That doesn’t mean someone who is passionate about art, or music, or caring for children, or caring for elderly people isn’t equally valid, or that it’s not an equally good thing for someone to be doing.
So while I certainly think this is fun, and there are a lot of people who think this kind of job is fun, I don’t want anybody to ever confuse fun with making a lot of money.
Michael says
I just attended a camp at NCSU sponsored by ID Tech regarding AI and machine learning.
I find AI and machine learning a fascinating field, but do not know where to go to receive more insight or training. I like the coding aspect but would like to explore the big picture aspect of big data as well.
What other fields might be explored besides coding and statistics please?
Also instead of studying for 10 years to obtain an advanced degree, is there a compressed degree?
It seems that after 10 years mathematics, statistics, etc. would already be obsolete if AI is moving so quickly.
Who are some contacts that I might be able to reach out to discuss the information with please?
I appreciate any insight that you can offer.
I look forward to hearing from you soon.
Thank you,
Michael G
Matt Hartman says
Hi Michael. Unfortunately, I don’t think I’m qualified to answer all your questions! But as Melinda mentioned in the interview, there are a number of other fields–from the hardware side of computer science, to the philosophical questions about the ethics of AI, to the kinds of things you’d study in business school. Many of those paths also do not require advanced degrees. One thing you could do to learn more is talk to your teachers to find out what other programs are available in your area. If you’re interested in what kind of training is required, you could also reach out to those departments at nearby colleges.