Alok Gupta, a Data Science Manager at Airbnb, is a model for applied knowledge. After completing advanced degrees in mathematics, finance and statistics at some of England’s most prestigious universities, Gupta quickly made a name for himself in quant trading.
For someone with his ability to extract insight from data, however, the possibility of working in tech as a data scientist proved irresistible. To find out more about Gupta, Airbnb and data science, the team at Learnerbly sat down with him to discuss his journey so far.
How did you become the Data Science Manager at Airbnb?
I started my professional career in London, having completed an undergraduate degree in Maths at Cambridge, a Master’s at Imperial College London, and a PhD at Oxford. In 2010, I successfully defended my PhD, before taking up a Junior Research Fellowship in Statistical Finance.
I trained in derivative pricing, but by the time I finished up at Oxford the markets had collapsed, so no one wanted to touch a derivative. When I took the plunge from academia to finance I had to be fairly flexible, so I ended up looking at quantitative trading, in particular high-frequency algorithmic trading.
“I worked as a high-frequency trader for Deutsche Bank for two years, first in London, then in New York, and it was in New York, in 2013, that my eyes were really opened to the possibilities of tech and data science.”
I was intrigued by the prospect of applying my trade to another industry, so I began to go to lots of tech meetups, such as the monthly Data Driven NYC meetup at the Bloomberg offices. When the offer came up at Airbnb it was a no-brainer really: it was an exciting time for the company and for tech more generally. I started there in April 2014, with the risk team, and worked on things like fraud prediction and detecting fake listings.
Because of my background in finance, I actually started fairly low down on the food chain. I was an unknown quantity for them, so it was a matter of starting out slow and proving my worth. It’s gone really well since then, and six months into the job I was promoted to a manager.
What skills helped you get this position?
Having strong technical skills definitely helped. If you want to get into this industry then you simply have to be technically competent. Can you get clean data, load it into Python or R, play with it, and come up with interesting findings? That’s a really basic skillset that you need. But soft skills are equally important. Can you manage your time, for instance? Knowing how much time you have and which directions to focus on for the greatest impact can really set you apart from the crowd.
How would someone go about developing these skills?
The best thing to do is to get real data and do real problems, so that you can start to familiarise yourself with the main programming languages. Kaggle competitions are fantastic. Kaggle provides you with huge amounts of data, really good problems, lots of interactive platforms, and a community forum. It’s simple: take the data, upload it, and start playing with it.
Who are your heroes in the industry?
DJ Patil is a big name in data science. He was recently drafted by the US government to act as the country’s first Chief Data Scientist, and he’s trying to bring a more quantitative approach to the US government. The other data scientist that I admire is Hilary Mason, who is a huge “women in data science” advocate. She used to work at Bitly and has started up a number of ventures herself. She also advises an organisation called DataKind, a data-science-for-social-good platform. She’s very good at advocating in this space, she runs a blog, and she presents well – definitely one to follow. My boss, Riley Newman, who was employee number seven at Airbnb, helped build all the data structures at Airbnb, such as data ingestion, data pipelines and data analysis. He’s certainly another person to follow.
Do you teach or present at any industry events?
I go to conferences every few months to talk about data science at Airbnb. I also do joint research at Stanford with people in the Social Sciences. Teaching-wise, I hope to do some more work at Imperial College London. The really interesting conferences that I haven’t been to yet are probably SXSW, which is a huge annual conference in Austin, and the Hadoop conference in New York, which gathers a lot of data science people and machine learning professionals. There are more details of conferences I have participated in on my data science blog.
What’s your best learning experience been?
Joining Airbnb and switching industries was interesting. I found the first month very, very difficult, to the point that I nearly regretted my decision. It was the first time I had switched from Windows to Mac, so I couldn’t find my right click! I also had to switch from MATLAB to Python, and I had never used Python before. I started to use the command line a lot, and I started to use a version control platform called GitHub. This stuff was all very new to me, so it was very tough; it was like a baptism of fire. I just had to get on with it and learn quickly.
Where do your team members go for training?
A couple of people have recently done Udacity courses. But other than that it’s more peer-to-peer learning. We set up an internal knowledge-sharing platform whereby if we find something interesting, whether that’s in the product, or a piece of analysis or code, or a new method or technique, we write it up formally and check it in to our internal knowledge database, and then people can read it and share it. All the content is tagged, so if someone wants to learn something about Python then they can look at the tags and find out where to go.
What’s your view on traditional university learning versus vocational training?
I think that university is great for developing your problem-solving skills. That being said, I think there are a number of ways to go about picking up what you need. You certainly don’t have to do maths or statistics at university. At Airbnb, for instance, we hire physicists, economists, chemists, really anyone who is adept with numbers and quantitative analysis. So there are many routes into the industry. There are also a number of data science bootcamps that are starting to proliferate, such as the Insight bootcamp and Zipfian bootcamp.
Is there anything that you believe data scientists should be aware of?
One thing that I took for granted in banking was that the end goal was always very clear. With high-frequency trading, you’re just trying to make more profit: it’s simple when you look at it like that. If we can predict the market with more accuracy, so we make the right bet, then we make more money. At a place like Airbnb, it’s less clear what the objective function is. Are we trying to get more people to make a booking? Or are we trying to get more people to make a good booking? I underestimated how much time and thought goes into shaping the objective function, which is where a data scientist is really looked to as a thought leader in a company like Airbnb. What’s the metric we should all be driving towards? Because once you have a metric that everybody can track, and that everyone understands and agrees on, it’s pretty much plain sailing. You decide which project is likely to move the metric, you A/B test it, and you’re done.
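As an illustration of the last step in that answer, here is a minimal sketch of how an A/B test on a conversion-style metric might be read. It assumes the agreed metric is a simple booking rate and uses a standard two-proportion z-test; the function name and the numbers are hypothetical and are not taken from Airbnb.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a: conversions and visitors in the control group (A).
    conv_b / n_b: conversions and visitors in the treatment group (B).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 1,000 visitors per arm, made-up booking counts.
z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The statistics are the easy part. As the answer stresses, the harder work is agreeing on which metric the test should measure in the first place.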
What trends do you think we’ll see in 2016 and beyond?
Everyone’s getting really excited about how data can be gathered from offline sources. We’re great at getting data from the online world. As soon as you log into your laptop, companies such as Google and Facebook can measure every click, every mouse movement, how many times someone goes to a certain page. But because Airbnb has this huge offline component, as soon as a guest knocks on the front door and is in the host’s home, between then and when they check out and send in a review, it’s like a big black hole. Are they having a good time or are they having a bad time? Or are they anxious? This overlaps with what other companies call the “Internet of Things.” I think you’ll start to see more developments in offline data service providers because you can’t do anything without the data. As soon as you find data in these offline spaces you can start to make interventions in real time.
What advice would you give to someone looking to enter your industry?
To get an interview with the big tech companies you need to have a CV that ticks all the right boxes. Having a data science master’s or bootcamp behind you probably helps, as does any experience working with data. If you demonstrate participation in online competitions, that can also make you stand out. Achieving a high rank in Kaggle competitions, for example, is a great way to prepare. It’s not good enough to just read books: you’ve got to get your hands on data. When you start trying to solve data problems like the ones on Kaggle, you’ll notice that the same sorts of issues keep cropping up, such as missing values and zeroes when you were expecting a non-zero number. The more practice you get, the easier it will be to impress when you’re under interview pressure.
Online courses on platforms such as Udemy and Coursera are also good, as they give you a good overview of the key concepts and tools that you need to know. They’re a great way of keeping in touch with the latest developments. I wouldn’t worry about learning every language under the sun; instead, I would focus on learning a really useful language such as Python or R very deeply. Another thing that can help is to start blogging. Having a blog that shows how you have solved certain problems can really make you stand out as a candidate. If you can talk about how you scraped a website because you were interested in a particular problem, for instance, that would be very impressive. It’s almost like a portfolio for an artist; it’s a demonstration of your ability to find interesting data sets from across the net and run analyses.
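The recurring issues mentioned in this answer, missing values and zeroes where a non-zero number was expected, are the kind of thing a quick audit can surface before any analysis starts. The sketch below uses only Python's standard library; the column names, the sample rows and the `audit_column` helper are all made up for illustration.

```python
import csv
import io


def audit_column(rows, column):
    """Count missing, zero, non-numeric and usable values in one column."""
    counts = {"missing": 0, "zero": 0, "non_numeric": 0, "ok": 0}
    for row in rows:
        raw = (row.get(column) or "").strip()
        # Treat blanks and common placeholder strings as missing.
        if raw == "" or raw.lower() in {"na", "nan", "null"}:
            counts["missing"] += 1
            continue
        try:
            value = float(raw)
        except ValueError:
            counts["non_numeric"] += 1
            continue
        if value == 0:
            counts["zero"] += 1  # suspicious if a non-zero value was expected
        else:
            counts["ok"] += 1
    return counts


# Hypothetical sample data standing in for a downloaded competition file.
SAMPLE = """listing_id,price
1,120
2,
3,0
4,NA
5,85.5
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = audit_column(rows, "price")
print(report)
```

Running a check like this on each column of a new dataset is one simple way to find out what cleaning is needed before tackling the actual problem.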
Are there any books, resources or events that you would recommend to readers wanting to find out more?
Python for Data Analysis is a staple bible for data scientists. DJ Patil just co-authored a book with Hilary Mason on what data science is. That’s another bible for data scientists. I also went through the Coursera course by a guy called Andrew Ng. But the best thing to do is to go to meetups. Go to meetup.com and search for data science – there were tons of really useful meetups in New York when I was just starting out. If you go to them you’ll meet people and get exposure to the big tech players. After a few months of going to these events, you’ll start to notice the same people there, and you’ll feel far more connected to the data science community.
What attributes do you look for in a candidate?
People who take responsibility for the data they work on, who really take ownership of it, are hugely impressive. I’m always looking for people who are willing to go really deep into the data. A willingness to say, “I looked at this data and there were some problems, so I had to do some cleaning before I tackled the heart of the problem” is the sort of mindset that we’re after. And then the softer side to that is linking it to whatever objectives you have. These soft skills are really important. Can you communicate your findings well? And can you articulate the assumptions you have made? This stuff is really important if you want to be successful.
A deep academic grounding in mathematics and statistics underpins Alok Gupta’s approach to data science. Yet far from having knowledge for knowledge’s sake, Gupta has consistently demonstrated the value of applied learning. His ability to engage critically with data in a rigorous fashion is in this sense not only applicable to data scientists. On the contrary, it’s relevant for anyone interested in modelling disruption.