Yes, you can become a self-taught data scientist. I'm assuming you have a full-time job and want to teach yourself to become a data scientist. Many other great responses here have given you a ton of material to go through. That's great, but the problem with that approach is: when do you stop? There are years upon years of content, and in theory you could study data science forever. Here’s what it takes to become a self-taught data scientist:

Mindsets You Must Internalize:

  • Always Be Learning: The reality of this field is that new packages, libraries, and algorithms are created all the time. You must always be willing to learn new tools and new methodologies. Many things you do today may be outdated in a few years.
  • Figure Things Out on Your Own: Many times you will hit bugs or problems with nobody around to answer your questions. You must get good at figuring things out on your own. This means reading Stack Overflow and blog posts and watching videos to teach yourself new concepts.
  • Handle Frustration: You must be able to withstand frustration when you are doing a lot of work and seemingly making no progress. You must be comfortable running lots of failed experiments and spending hours debugging code.

Once you understand those mindsets, here's the progression I'd recommend:

1. Pick an interesting data problem that you're excited about: The purpose of data science is to solve problems. Learning data science is a long and difficult process. The way to stay motivated enough to push through the obstacles is to work on a problem you’re genuinely interested in. Perhaps it's composing music using deep learning, predicting the price of bitcoin, or visualizing basketball shot charts. Start with an interesting problem and look for interesting projects that people have already done on it.

2. Find someone’s GitHub who has built a project you’re excited about: Reading someone else’s open-sourced code gives you direct feedback on where you are in terms of skill level. It also gives you a solid “goal” to aim for with your own project. Don’t worry about understanding the code yet. For now, we just need a goal.

3. Break down the project into bite-sized chunks and then find resources that fill these chunks of knowledge: I like using the CRISP-DM methodology (Cross-Industry Standard Process for Data Mining) when building data science projects. The principle here is that we learn just enough to move forward to the next stage of CRISP-DM. For each chunk, pick one main resource and use the rest as supplements. Don’t drown in information. Pick the resources that resonate best with your learning style.

a. Programming: You'll need to build your project in some language, so you'll need programming. Either R or Python will do:

- Zed Shaw's Learn Python the Hard Way

- Google's Python Class (Google Developers)

b. Data Acquisition: To get your data, you can either download a ready-made dataset from one of these sites or scrape it yourself:

- Kaggle

- data.world

- 100+ Interesting Data Sets for Statistics - rs.io

or....

- Building your own web scraper: https://www.dataquest.io/course/apis-and-scraping
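To give a feel for what the scraping route involves, here is a minimal sketch that pulls the rows out of an HTML table using only Python's standard library. The HTML snippet, the column names, and the `TableParser` and `scrape_table` names are all made up for illustration. A real scraper would first download the page over HTTP, and you would likely reach for a dedicated parsing library rather than writing this by hand.

```python
from html.parser import HTMLParser

# Hypothetical page content. A real scraper would download this over HTTP.
SAMPLE_HTML = """
<table>
  <tr><th>Player</th><th>Points</th></tr>
  <tr><td>Player A</td><td>31</td></tr>
  <tr><td>Player B</td><td>24</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # the row currently being read, if any
        self._in_cell = False   # True while inside a <th> or <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell and self._row:
            self._row[-1] += data.strip()


def scrape_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(SAMPLE_HTML)
for record in records:
    print(record)
```

Note that every value comes back as a string, including the numbers. Turning scraped text into usable types is part of the cleaning step further down.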

c. SQL: You won't necessarily need SQL to build your own projects. However, SQL is EXTREMELY IMPORTANT if you want to work as a data scientist at any company, and you should expect to be tested on it in interviews.
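If you've never written SQL, here is a small sketch of the kind of aggregation query that's worth practicing. It runs against an in-memory SQLite database through Python's built-in `sqlite3` module, so there is nothing to install. The `orders` table and its rows are invented for the example.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("alice", 20.0),
        ("bob", 5.0),
        ("alice", 30.0),
        ("bob", 15.0),
        ("carol", 8.0),
    ],
)

# Count orders and total spend per customer, keep only customers who
# spent more than 10 in total, and list the biggest spenders first.
query = """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 10
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

for customer, n_orders, total in rows:
    print(customer, n_orders, total)
```

The piece that tends to trip people up is the difference between `WHERE` (filters rows before grouping) and `HAVING` (filters groups after aggregation), so it's worth being able to explain why `HAVING` is used here.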

d. Data Cleaning/Transformation: So you know how to code and you have data. How do you actually start manipulating the dataset? If you chose Python, you'll need to learn pandas and NumPy. If you're using R, data frames are built into the language:

- NumPy & pandas: 10 Minutes to pandas
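To make "cleaning" concrete, here is a rough sketch of the usual chores: dropping rows with missing or unparseable values, converting text to numbers, and normalizing inconsistent labels. I've written it with only the standard library so it runs anywhere. The CSV contents and the `clean_rows` name are made up, and in practice pandas would express the same steps much more compactly (for example with `dropna` and `str.strip`).

```python
import csv
import io

# Hypothetical messy input: a missing age, a non-numeric age,
# stray whitespace, and inconsistent capitalization in the city column.
RAW_CSV = """\
name,age,city
Ann,34,  Boston
Bob,,chicago
Cy,n/a,Boston
Dee,29,CHICAGO
"""


def clean_rows(text):
    """Parse CSV text and return only the rows with a usable age."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        age_text = row["age"].strip()
        if not age_text.isdigit():
            continue  # drop rows where age is missing or not a number
        cleaned.append(
            {
                "name": row["name"].strip(),
                "age": int(age_text),                # text -> integer
                "city": row["city"].strip().title(),  # "CHICAGO" -> "Chicago"
            }
        )
    return cleaned


rows = clean_rows(RAW_CSV)
for row in rows:
    print(row)
```

Dropping bad rows is only one possible choice. Depending on the project you might fill in missing values instead, and that decision is worth writing down in your README.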

e. Data Visualization: Viz and cleaning/transformation go together iteratively. You transform the data to get a certain visualization, then transform it again to get another one. Great viz resources:

- R: Hadley Wickham's R for Data Science: http://r4ds.had.co.nz/data-visualisation.html#the-layered-grammar-of-graphics

- Matplotlib: Data Visualization With Matplotlib Course

f. Statistics: Once you create histograms, boxplots, etc., it'll be important to be able to understand these diagrams. To do this, you'll need statistics. Khan Academy is great for these concepts.
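As an example of the statistics behind a boxplot, this sketch computes the median, the quartiles, and the common 1.5 × IQR rule for flagging outliers, using Python's built-in `statistics` module. The numbers are invented, and different tools compute quartiles in slightly different ways, so treat the exact cut points as one convention among several.

```python
import statistics

# Made-up sample with one suspiciously large value.
data = [2, 4, 4, 5, 7, 9, 30]

mean = statistics.mean(data)
median = statistics.median(data)

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # interquartile range: the height of the box in a boxplot

# Points beyond 1.5 * IQR from the box are usually drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("mean:", round(mean, 2), "median:", median)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("outliers:", outliers)
```

Notice how the single large value drags the mean well above the median. That gap is exactly the kind of thing a histogram or boxplot is showing you.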

g. Linear Regression/Logistic Regression:

- Read the Linear & Logistic Regression sections of ISLR: http://www-bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.pdf
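Before (or while) reading those chapters, it can help to fit a line by hand once. This is a minimal sketch of simple linear regression with one predictor, using the closed-form least-squares formulas: the slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept follows from the means. The data points and the `fit_line` name are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that roughly follows y = 1 + 2x with a little noise.
xs = [1, 2, 3, 4, 5]
ys = [2.9, 5.1, 7.0, 9.2, 10.8]

slope, intercept = fit_line(xs, ys)
prediction = intercept + slope * 6  # predicted y for a new x of 6

print("slope:", round(slope, 2), "intercept:", round(intercept, 2))
print("prediction at x=6:", round(prediction, 2))
```

Once this makes sense, the library versions of linear regression stop feeling like a black box. Logistic regression has no closed-form solution like this one and is fit iteratively instead.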

h. Machine Learning: Use arXiv to find research papers on a variety of algorithms. Because you found someone who has already built your project, you already know which algorithms they used.

Grab a pen and notepad and really dig into the research papers. It’s likely that you won’t understand much the first time you read a research paper. Don’t give up. I often re-read a paper 10+ times to make sure I understand how the algorithm works.

4. Make your project public on GitHub or a blog, and write a good README: You want this project to be part of your portfolio, which is essentially proof that you can build data science projects. Write a good README explaining your thought process and why you chose certain algorithms. Articulating this also prepares you well for interviews, as companies will ask you about it.

5. Repeat: Do this multiple times so you build out your portfolio.

6. Networking/Job-Hunting: Do this concurrently with Step 5. This means going to meetup events, using LinkedIn to connect with people, and asking for intros. This step is just as important for becoming a data scientist, but it would require another post.

Data science interviews are a separate beast to tackle, with whiteboarding, coding challenges, and take-homes. This, too, would require another post.

All in all, I’d say becoming a self-taught data scientist will require at least 500–700 hours of learning upfront. Whether you do this in 3 months, a year, or two years depends on your situation. After you finish these hours, you should know just enough to get an entry-level data scientist position. Once you’ve got a solid portfolio set up and your skillset honed, you should split your time 50/50 between studying for interviews and applying for jobs.

If you love my content, visit my website at www.jefflichronicles.com.
