Data scientist is one of the most promising jobs of 2019 and the years ahead. The volume of data grows every second, and someone has to manage it and make sense of it. That is why the data science industry is booming.
A data scientist is a better statistician than any software engineer and a better software engineer than any statistician.
1. What are the Roles and Responsibilities of a Data Scientist?
Data scientists are big data wranglers. They take huge amounts of messy data points, both structured and unstructured, and clean, massage, and organize them using their formidable skills in math, statistics, and programming. They then apply their analytical skills to uncover hidden solutions to business challenges and present those findings to the business. In other words, data scientists use their knowledge of statistics and modelling to turn data into actionable insights about everything from product development to customer retention to new business opportunities.
Data scientists need both technical and non-technical skills to do their job effectively. Technical skills come into play at 3 stages of data science. They include:
- Data Capture & pre-processing
- Data Analysis & pattern recognition
- Presentation & visualization
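As a rough illustration, the three stages above can be chained into one small pipeline. This is a minimal sketch that uses only the Python standard library. The CSV data, the `region` and `sales` columns, and the function names are all hypothetical. In a real project, each stage would use the dedicated tools described in the sections that follow.

```python
import csv
import io
from statistics import mean

# Hypothetical raw input: CSV text containing one malformed value and one missing value.
RAW = """region,sales
north,120
south,80
north,not_a_number
south,100
east,
"""


def capture_and_preprocess(raw_text):
    """Stage 1: parse raw records and drop rows that fail validation."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_text)):
        try:
            rows.append((record["region"], float(record["sales"])))
        except (ValueError, TypeError):
            continue  # skip malformed or missing sales values
    return rows


def analyze(rows):
    """Stage 2: a simple pattern, the average sales per region."""
    by_region = {}
    for region, sales in rows:
        by_region.setdefault(region, []).append(sales)
    return {region: mean(values) for region, values in by_region.items()}


def present(summary, scale=10):
    """Stage 3: render a plain-text bar chart (one '#' per `scale` units)."""
    lines = []
    for region in sorted(summary):
        value = summary[region]
        lines.append(f"{region:<6} {'#' * int(value // scale)} {value:.1f}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(present(analyze(capture_and_preprocess(RAW))))
```

Keeping each stage in its own function means any one of them can be replaced with a heavier tool later without touching the other two.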
Each of these 3 stages needs its own category of tools: tools for pulling data, tools for analyzing the data, and tools for presenting the results. The most widely used tools in each category are listed below.
2. Tools for data pulling & pre-processing
a. SQL
SQL is a must-have skill for every data scientist, whether you work with structured or unstructured data. Companies use modern SQL engines such as Apache Hive, Spark SQL, Flink SQL, and Impala.
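Engines such as Hive, Spark SQL, Flink SQL, and Impala need a cluster to run. As a local stand-in, the sketch below uses Python's built-in `sqlite3` module to show the core pulling pattern: filter and aggregate inside the SQL engine, and bring back only the result. The `orders` table, its columns, and the rows are hypothetical. The SQL dialects of the engines above differ in detail from SQLite's.

```python
import sqlite3

# In-memory SQLite database, used here only as a local stand-in for a SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", 30.0, "paid"),
        ("bob", 12.5, "refunded"),
        ("alice", 20.0, "paid"),
        ("carol", 45.0, "paid"),
    ],
)

# Pull only the rows needed and aggregate them in the engine rather than in Python.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE status = ?
    GROUP BY customer
    ORDER BY total DESC
"""
totals = conn.execute(query, ("paid",)).fetchall()
conn.close()
print(totals)
```

The `?` placeholder passes the filter value as a parameter instead of pasting it into the query string. This is the usual way to avoid SQL injection.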
b. Big Data Technologies
This is a must-have skill for every data scientist. A data scientist needs to know the different generations of big data technologies: first-generation technologies such as Apache Hadoop and its ecosystem (Hive, Pig, Flume, etc.), and next-generation engines such as Apache Spark and Apache Flink. Flink is increasingly adopted alongside or instead of Spark because it is a general-purpose big data engine built around real-time stream processing that also handles batch workloads.
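Hadoop popularized the MapReduce programming model. The sketch below is a single-process illustration of that model's three phases (map, shuffle, reduce) using a word count. It does not use the Hadoop API. On a real cluster, map and reduce tasks run in parallel across machines, and the shuffle moves data over the network. Spark and Flink let you express similar pipelines with higher-level operators. The function names here are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the grouped values for each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    docs = ["Spark and Flink", "Hadoop and Spark"]
    print(word_count(docs))
```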
c. Unix
Most raw data sits on a UNIX or Linux server before it is loaded into a data store. Being able to read that raw data directly, without depending on a database, is therefore valuable, so Unix knowledge is useful for data scientists.
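On the server itself, this kind of work is usually done with shell tools such as `grep`, `awk`, `sort`, and `uniq`. The sketch below shows the same idea in Python: it streams a raw log file line by line and counts entries per log level, without loading anything into a database. The log format (`<date> <time> <LEVEL> <message>`) and the function name are assumptions made for illustration.

```python
import os
import tempfile
from collections import Counter


def count_levels(path):
    """Count log entries per level in a raw text log (hypothetical format).

    Roughly what `awk '{print $3}' app.log | sort | uniq -c` does in a shell,
    except that lines with fewer than three fields are skipped.
    """
    counts = Counter()
    with open(path, encoding="utf-8") as handle:
        for line in handle:  # streams one line at a time, so the whole file never has to fit in memory
            fields = line.split()
            if len(fields) >= 3:
                counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "2019-01-01 10:00:00 INFO started\n"
        "2019-01-01 10:00:05 ERROR disk full\n"
        "2019-01-01 10:00:09 INFO retrying\n"
    )
    # Write the sample to a temporary file so the example is self-contained.
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False, encoding="utf-8") as tmp:
        tmp.write(sample)
    try:
        print(count_levels(tmp.name))
    finally:
        os.remove(tmp.name)
```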
d. Python
Python is the most popular language for data scientists. It is an interpreted, object-oriented, high-level programming language with dynamic typing and dynamic binding.
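To make "dynamic typing and dynamic binding" concrete, the short sketch below rebinds one name to objects of different types. It then calls a function that works with any object that supports `len()`, because `len()` looks up `__len__` on the object's type at runtime. The `describe` function and the `Basket` class are hypothetical examples.

```python
def describe(value):
    """len() dispatches to the object's __len__ at runtime (dynamic binding)."""
    return f"{type(value).__name__}: {len(value)} items"


class Basket:
    """Any object that defines __len__ works with describe(); no declared interface is needed."""

    def __init__(self, *items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)


# Dynamic typing: the same name can be rebound to objects of different types.
data = [3, 1, 2]
print(describe(data))  # prints "list: 3 items"
data = "abc"
print(describe(data))  # prints "str: 3 items"
data = Basket("apple", "pear")
print(describe(data))  # prints "Basket: 2 items"
```

This flexibility is part of why Python is quick for exploratory analysis. The trade-off is that type errors surface at runtime rather than before the program runs.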
3. Tools for Data Analysis & pattern matching
The right tool depends on your level of statistical knowledge: some tools suit advanced statistics, and others suit more basic statistics.
a. SAS
Many companies use SAS, so a basic understanding of SAS is valuable. It makes manipulating equations easy.
b. R
R is the most popular language in the statistics world. It is an open-source, object-oriented language and environment, so anyone can use it freely. Many data scientists choose it first because a very wide range of statistical methods is already implemented in R.
c. Machine Learning
Machine learning is one of the most in-demand and useful skills a data scientist can have. Machine learning algorithms are used for advanced analytics, predictive analytics, and advanced pattern matching. Many machine learning tools are available, such as Weka and NLTK. However, tools that run on top of big data technologies are attracting the most industry attention: Mahout (on top of Hadoop), MLlib (on top of Spark), and FlinkML (on top of Flink).
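Mahout, MLlib, and FlinkML distribute their algorithms across a cluster, but the underlying idea of predictive analytics can be shown on a single machine. The sketch below fits one of the simplest predictive models, an ordinary least-squares line $y = \text{slope} \cdot x + \text{intercept}$, in pure Python. It does not use the API of any of the libraries named above. The ad-spend and sales figures are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with a single feature."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical data: monthly ad spend (x) versus sales (y).
    spend = [1.0, 2.0, 3.0, 4.0]
    sales = [3.0, 5.0, 7.0, 9.0]
    model = fit_line(spend, sales)
    print(model, predict(model, 5.0))
```

The model is "trained" on past observations and then used to predict an unseen value. Distributed libraries apply the same fit-then-predict pattern to much larger datasets and richer models.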
4. Tools for Visualization
It is a popular tool, especially in Silicon Valley.
b. JMP (SAS subsidiary)
JMP has good built-in visualization features.
c. R
R also has strong visualization support through packages such as ggplot2, lattice, rCharts, and Google Charts, plus Shiny for web apps and Slidify for presentations.
Apart from the tools mentioned above, JasperSoft, SAP BI, QlikView, and MicroStrategy are also popular.
5. Non-Technical Skills
a. Business Acumen
A data scientist needs a solid understanding of the industry they work in and of the issues the organization faces. They should be able to tell which problems are critical and which are not, and identify new ways in which the data can be used as leverage.
b. Communication Skills
Companies look for data scientists who can clearly and confidently explain their insights from the data to their teammates. A good data scientist equips those teammates with quantified insights.
c. Analytical Problem-Solving
Analytical problem-solving skills are in high demand for data scientists, because they make it possible to choose the right approach and get the maximum output from the available time and resources.