
SELF STARTER WAY
For a self-starting novice, here is an outline to begin with (reproduced from my blog post,
How to acquire the "Essential Skill Set"? - the Self Starter way). The idea is to pick one or two resources (links) from each subgroup and work through them.

0. Basic Pre-requisites:

1. Acquire & Scrub Data:

2. Filter & Mine data:

Data Mining Map, Coursera - Machine Learning, Stanford - Statistical Learning, MITx: The Analytics Edge, STATS 202 Data Mining & Analysis, Learning From Data - CalTech, Coursera - Web Intelligence & Big Data

3. Represent & Refine Data: Tableau-Training & Tutorials, Data visualisation in R with ggplot2 and plyr, Predictive Analytics: Overview and Data visualization, Flowing Data-Tutorials, UC Berkeley-Data Visualization, D3.js Tutorial

4. Domain Knowledge: This skill is developed through experience working in an industry. Each dataset is different and comes with certain assumptions and industry knowledge. For example, a data analyst specializing in stock market data would need time to develop knowledge in analyzing transactional data for restaurants.
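
As a rough illustration of how steps 1 to 3 fit together, here is a minimal sketch in plain Python. It is not taken from any of the linked resources. The restaurant figures, the column names and the three function names are all made up for the example, and a real project would more likely use pandas, scikit-learn and a plotting library.

```python
import csv
import io
import statistics

# Hypothetical raw data (invented for this sketch): covers served and revenue
# per day for a restaurant, with two deliberately messy rows.
RAW_CSV = """day,covers,revenue
Mon,40,820
Tue,35,700
Wed,,650
Thu,50,1010
Fri,n/a,1500
Sat,90,1790
"""


def acquire_and_scrub(text):
    """Step 1: parse the CSV and drop rows with missing or malformed numbers."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        try:
            rows.append((rec["day"], float(rec["covers"]), float(rec["revenue"])))
        except (TypeError, ValueError):
            continue  # skip rows such as "Wed,,650" or "Fri,n/a,1500"
    return rows


def mine(rows):
    """Step 2: fit revenue = slope * covers + intercept by ordinary least squares."""
    xs = [covers for _, covers, _ in rows]
    ys = [revenue for _, _, revenue in rows]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def represent(rows, width=40):
    """Step 3: render revenue per day as a text bar chart scaled to the maximum."""
    top = max(revenue for _, _, revenue in rows)
    return [
        f"{day:<3} {'#' * round(width * revenue / top)} {revenue:.0f}"
        for day, _, revenue in rows
    ]


if __name__ == "__main__":
    clean = acquire_and_scrub(RAW_CSV)
    slope, intercept = mine(clean)
    print(f"revenue ~ {slope:.2f} * covers + {intercept:.2f}")
    print("\n".join(represent(clean)))
```

Step 4 is what tells you whether dropping the two messy rows was reasonable, or whether a busy Friday with an unrecorded cover count should have been recovered some other way.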

Combining all the above:
Data Literacy Course -- IAP
Coursera - Introduction to Data Science
Coursera - Data Science Specialization

For the business person:

Machine Learning for Business Professionals | Coursera

AI For Everyone | Coursera

Books:
Elements of Statistical Learning
Python Machine Learning

Apply the knowledge:
Harvard Data Science Course Homework
Kaggle: The Home of Data Science
Analyzing Big Data with Twitter
Analyzing Twitter Data with Apache Hadoop

FORMAL WAY
For a more formal route to becoming a data scientist, one can look into this post (reproduced below):
How to acquire the "Essential Skill Set"? - the Formal way.
The Essential Skill Set comprises the fundamental skills that every data scientist is expected to have. Traditionally, these are acquired through a computer science or statistics degree from an institution. The Stanford Computer Science courses & Statistics courses provide a good reference list of courses to undertake. Some of these courses are relevant, while many others are not. For example, in Computer Science one would do well to learn about large-scale distributed databases and algorithms, but there is no need to learn HCI and UX, or pure-play storage, operating systems, networking, etc. Similarly, some statistics courses focus too much on, let's say, "old school statistics", including thousands of ways of hypothesis testing, rather than on machine learning (clustering, regression, classification, etc.). So both streams contain nice-to-have courses as well as must-have courses for a data scientist (I dare to claim that at present the percentage of must-have courses is greater in a traditional Statistics stream than in a Computer Science stream). As such, one needs to pick courses wisely.

Alternatively, one can look into the new Data Science programs that some universities now offer, which build on the points mentioned above. They combine the must-have courses from both traditional statistics and computer science programs to impart the 4 Essential Skills, and also include courses that develop the Differentiator Skills in students. The MS in Data Science at NYU and the MS in Analytics at USF are good examples of such an amalgamation of the requisite courses. A complete list of such programs is presented here: Colleges with Data Science Degrees.

The right program obviously depends on the individual's goal. A recent O'Reilly publication titled 'Analyzing the Analyzers' does a very good job of grouping the various data scientist roles into 4 main categories according to their skills. An individual may therefore select a program according to the category of data scientist they most identify with, as described below.

  • Data Businesspeople are the product and profit-focused data scientists. They're leaders, managers, and entrepreneurs, but with a technical bent. A common educational path is an engineering degree paired with an MBA or the new Data Science programs as mentioned above.
  • Data Creatives are eclectic jacks-of-all-trades, able to work with a broad range of data and tools. They may think of themselves as artists or hackers, and excel at visualization and open source technologies. They are expected to have an engineering degree (mostly in statistics or economics) but not much in the way of business skills.
  • Data Developers are focused on writing software to do analytic, statistical, and machine learning tasks, often in production environments. They often have computer science degrees, and often work with so-called "big data".
  • Data Researchers apply their scientific training, and the tools and techniques they learned in academia, to organizational data. They may have an MS or PhD in statistics, economics, physics, etc., and their creative applications of mathematical tools yield valuable insights and products.

The skills associated with the 4 main categories, which justify the above mentioned program recommendation, are as below:
