I’m not a data scientist, but I’ve worked on Big Data projects, so I have tangential knowledge of the field. I can’t tell you what it is like to be a data scientist day to day, but I can answer why data science is becoming important.
There is a saying: “When there is too much information, the curator becomes king.” We can see examples of this throughout history. Whenever we invent a new skill, the growth of that skill goes through phases. First, it stays in the hands of the truly talented. It’s a new skill. Although the skill has been invented, we haven’t figured out how to teach it. So the people who have an aptitude for that skill get to use it, because they can simply pick it up naturally. Over time, we get better at teaching that skill, and then there is an explosion of skilled workers. This is the “renaissance” phase. A lot of skilled workers start using their skills, producing a good or providing a service. Obviously, not all skilled workers are equally skilled, so the average consumer is inundated with choices. How does the consumer select between several goods or providers when s/he doesn’t have the skill or knowledge to choose between them? This creates a demand for the curator. A curator is a person who has the skill but devotes his or her effort to sorting through the work products of others. When there is too much of a skill, there is too much “noise”, and the curator is able to handle the noise. S/he has the ability to say “Picasso is a maestro”. Eventually, people start trusting the curator, and you get to a point where the curator has all the power.
You see this happening time and again throughout history. The invention of the printing press led to the rise of libraries. The growth of textile technology in the East is what led to traders picking the best work products and shipping them to the West, which is what led to the Silk Road. The Renaissance of the arts in Europe led to museums.
The same thing is happening with computing right now. Computing is a skill. Being able to write software that can generate and handle data is a skill. Over the past 20 years, we have seen an explosion of skilled workers who can create that software. Lots of programmers create lots of software. Lots of software creates lots of data. That’s the biggest problem of the 21st century: too much data, and not all of it is good or organized in a way that makes it useful beyond a small scope.
Put rather simply, a Data Scientist is a curator of data. A Data Scientist is able to look at this whole mess of data and sort out the signal from the noise. A Data Scientist is able to organize the data in a form that can be used for lots of purposes. Right now, we are still figuring out what the tools of the Data Scientist should be. The curators haven’t figured out how to curate yet. Or to put it another way, it’s like being a librarian without the Dewey Decimal System. It can be exciting, or it can be stressful.
If you are training to be a data scientist right now, you just might be the next Dewey, or you might spend your life buried under a pile of data.