A friend of mine recently asked me to share some of my experiences in making the transition from a biophysics Ph.D. student to data scientist. I realized there are probably a lot of people interested in making a similar transition who could benefit from my experience.
A year and a half before I finished my Ph.D., I started wondering how I was going to keep a roof over my head after graduation. I had two main goals:
- Stay in the Bay Area.
- Never touch another pipette again as long as I live.
It was at this time that I decided data science was where I would focus my future efforts. While many companies are hiring quantitative science Ph.D.s, I realized that if I wanted one of the best, most interesting data science jobs, I was going to have to put in a lot of time learning and practicing. Below are the things that I found improved my skills the most.
Establish a data science portfolio
Have a personal website (e.g. frankcleary.com). I learned a lot about the internet and about using remote Linux machines by building my personal website. It also motivated me to work on other projects because I had a place to share them. My site now gets 10-20 users per day from searches, mostly landing on pages where I've posted tutorials.
Do data-related projects in your spare time (and then post them on your website!). I've worked on a variety of fun side projects, ranging from interactive D3.js graphs to tutorials on matrix decomposition, that made me a better programmer and a better data scientist. Having these projects on your website also gives you a portfolio that will help you stand out by showing commitment to learning and (hopefully) skill.
Study books and videos to learn more about computer science and data science
Check out my recommended books page for the best books I've read.
Many great talks from conferences and meetups are available for free online. Besides searching directly on YouTube, other sites like pyvideo.org have great video libraries. Take a look at my recommended videos for some of my favorites.
Take a free online class. I took Introduction to Databases with Prof. Jennifer Widom and Machine Learning with Prof. Andrew Ng. Both were engaging and covered essential knowledge that comes in handy both when doing data science and when interviewing for data scientist jobs. Of the two, Introduction to Databases was more polished and more rigorous.
Do an internship
- A summer internship is a great way to get firsthand experience doing data science in industry. I did one the summer before I graduated and it was very valuable. To find opportunities, look into career fairs on campus (how I found mine), job posting websites, and company websites directly.
Learn to use Git for version control
Git is great when writing software solo, and essential when working on a team (which you will be in a job). The more you learn about Git, the more efficient your code development process will be.
Create an account on GitHub and post your portfolio projects to it. Link to the account on your resume to make sure your projects are visible.