Stepping into Data Science


The hype started in 2014 for me. I was a year out of university. My bachelor's training in Computer Science had landed me a role as a Software Engineer at the Singapore office of a major US multinational firm. It started with a town hall meeting where they told us about Hadoop. We knew the company was very interested in the subject, but the concept was too new for most of the employees to contextualize. Over the next three years, we saw a whole department emerge around Big Data and Data Science.

I can't say that I was drawn to the idea from the very start, mainly because none of it seemed new or groundbreaking to me. Most of the technologies had been around for a while, and it all seemed like a big marketing stunt. I was to learn later that the "big deal" about big data is not that the data is big now (it has always been big relative to its time), or that we've suddenly figured out the mathematics behind it (of course not). The big deal is the processing power we have gained over the last 10 years; that is the real superpower.

Even while developing software applications, I was dabbling in high-level data analytics from the get-go. Collecting data from various sources, cleaning it, transforming it, writing ETL scripts to automate the flow, running simple statistics and finally building dashboards so that management could make sense of it were all part of my job. But none of it involved sophisticated machine learning, and that was pretty much a deal breaker when it came to looking for a Data Science role. The irony was that most of my "Data Scientist" friends (the first batch, at least) picked up the skills on the job. Yet employers somehow expected a postgraduate degree and years of experience in a field that had only just hit the limelight, along with a background in mathematics, computer science and the relevant industry domain.

Deciding to make some sense of things, I reached out to all the Data Scientists I could connect with on LinkedIn. Some had backgrounds similar to mine; others held technical PhDs. I managed to talk to a few of them over the phone to really understand what they did and which core skills came in handy. One of the things I learnt was that the way people portray themselves publicly, even on a professional network, is often far from reality. A few of the Data Scientists confessed that their companies required them to hold the title, but what they were actually doing was not exactly Machine Learning. This is important to know, especially for someone who is looking to start out in the field and feels intimidated by the flood of seemingly great profiles out there. Having spoken with a lot of people both within and outside the field of Data Science over the years, I can confidently say that a good proportion of them have no idea how things really work. They pick up buzzwords from the news and learn to regurgitate them as required, which is a skill in itself, I'd say. That being said, we're blessed with technologies that keep us so connected, and there are some really smart people out there to connect with. I got a tonne of great advice from them as I was starting out (some of which I'll try to dole out in this blog).

Luckily for us, the field of Data Analytics and AI is largely driven by the open source community, so the tools are all out there for anybody to try out. There's no excuse not to. You don't need a postgraduate degree; you need a core understanding of the concepts and a working knowledge of the tools. There are plenty of online communities, such as Kaggle, where you can build up your profile from scratch and set yourself apart from the rest. That's what employers are after: the go-getters. In addition, I see two new trends. The first is the demand for technical people who can communicate and really explain what they're doing and how it affects the business. Instead of having a separate team of consultants talk to the client and bring back a list of questions for the techies, firms want the techies to talk to the client directly, cutting out the middlemen. This especially applies to Data Scientists. The second trend is a shift towards Data Scientists who understand the full breadth and depth of the deployment process, not just model creation. This is common among firms trying to be cost-effective by having a single position fill the roles of both Data Scientist and Data Engineer.

This blog is an attempt at clearing the smoke and explaining this world bit by bit, starting with the absolute basics, the way I taught myself. It is targeted at folks looking to get their feet wet who feel overwhelmed by the amount of material out there. Hopefully I can bring the heart rate down a notch and give you a greater appreciation for the field and how far we've come.
