Over the past 18 months we have seen a significant increase in the use of the term Data Scientist. Maybe it is because HBR and many other publications have been promoting it.
Yes, the areas of statistics and predictive analytics have evolved to include a lot more techniques and technologies.
Unfortunately the term Data Scientist has been overused and a lot of people have joined in with the marketing hype. There are reports of organisations hiring a data scientist only to fire them within a few months because they did not deliver anything useful. Data Science is not a silver bullet for an organization's problems, and a data science project may not deliver anything useful, but in the vast majority of cases it will.
One thing that has been emerging over the past few weeks is that there seem to be two main types of Data Scientist. There are the Data Scientists who perform certain tasks or are focused on specific technologies. Then there are the Data Scientists who are not as technical as the previous group but are focused on how they can use the technologies to deliver business benefit. I like to call these Type I and Type II Data Scientists.
The Type I Data Scientist
This is perhaps the most common type of Data Scientist we see around, or at least the most common type of person who calls themselves a Data Scientist. These are people who know a lot about, and are really good at, a technique or technology that is associated with Data Science. Some of these would be the “old school” type of people and include:
- Data Miners
- Predictive Modellers
- Machine Learning
- Data Warehousing
- Business Intelligence & Visualization
- Big Data
- R / Oracle / SAS / SPSS / etc.
The people in each of these have a deep knowledge of their topic and can tell/show you lots of detail about how best to explore data in their given field.
Yes, you don’t have to have a Stats background to call yourself a Data Scientist, but some knowledge of Stats would be useful (you don’t need a PhD or Master's).
The Type II Data Scientist
A Type II Data Scientist is a slightly different breed of person. They would have a little bit of knowledge of some or all of the areas listed under the Type I Data Scientist, but would not have the depth of knowledge of a topic that a Type I Data Scientist would have.
The Type II Data Scientist approaches the types of problems that organisations are facing in a different way. They will concentrate on the business goals and business problems that the organisation is facing. Based on these they will identify what the data science project will focus on, ensuring that there is a measurable outcome and business goal. The Type II Data Scientist will be a good communicator, being able to translate between the business problem and the technical environment necessary to deliver what is needed. During the project the data science team will discover various insights about the data. The Type II Data Scientist will prioritise these and feed them back to the various business units. These insights can range from something entirely new, to verification of existing business beliefs, to areas where better data capture is needed, to improvements in applications, etc.
The Type II Data Scientist would be the Data Science team leader within the organisation, managing the Type I Data Scientists and keeping them focused on the key goal of delivering measurable business benefits.
I really like the following phrase that I have come across recently:
“We haven't learned how to handle small data well, let alone throw big data on there.”
Data Science is not about Big Data. There is much more an organization can do with Data Science without having to get involved with Big Data. This is where the skills of the Type II Data Scientist are important, as they can direct the managers of an organization to focus on their real data problems and not get carried away with some of the marketing hype. When the time is right they will look at incorporating typical big data problems within their existing analytical environment.
One thing is for sure. The definition of “what is a” Data Scientist is still evolving. But there does seem to be some consensus that corresponds to the separation of the Type I and Type II Data Scientist roles.