Hello and welcome to another episode of AI Buzz. Thank you so much for tuning in today.
This is going to be a series that steps through things you can do right now to start a career in data, whether that is data science, data analytics, or data engineering. There are a lot of different terms for a career in data, and very often there is no clear boundary between them. All of the positions I just mentioned should have a working knowledge of the others, since the roles are (or at least should be) very intertwined.
This guide covers the first steps you can take to make that move into data. I would say the first thing to start with is learning the end-to-end process of solving a data problem: learn what it means to create a data product.
- 3 portfolio projects you should attempt
I would say there are three main ways to do this. If you’re just starting out, the first is the simplest, but it still makes a great project to show off. Your goal here is to find an open-source dataset and try to derive some insights from it.
You might need to make API calls to get the data, which will get you familiar with the Python requests library, or you can download a dataset from the UCI Machine Learning Repository. Work on cleaning the dataset with the Python pandas library. What does that mean? Find ways to check the integrity of the dataset. How many blank values are in each column of the DataFrame? How will those affect the final outcome of your analysis? Does one column have an extremely high variance? Do you know what variance measures? Do this analysis, let your mind ask those questions, and then answer them. To answer some of them you will likely need visualizations, so create those with Matplotlib or seaborn. Which features are correlated with each other? Create some heatmaps; I love heatmaps. Your final product will be a Jupyter notebook, ideally with lots of notes on your questions and answers, in which you pull in a dataset, clean it, and add value to it through your analysis. Back up your value proposition with visualizations and facts.
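Here is a minimal sketch of those cleaning questions in pandas, assuming a small made-up DataFrame in place of a real UCI or API dataset; the `profile` helper name is hypothetical. It counts blank values per column, computes the variance of each numeric column, and builds the correlation matrix you would feed into a heatmap.

```python
import pandas as pd


def profile(df: pd.DataFrame) -> dict:
    """Return a few basic data-quality checks for a DataFrame (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")
    return {
        # Number of blank (NaN/None) values in each column
        "missing_per_column": df.isna().sum(),
        # Sample variance of each numeric column (pandas uses ddof=1, skipping NaN)
        "variance": numeric.var(),
        # Pairwise Pearson correlation between numeric columns (NaN pairs excluded)
        "correlation": numeric.corr(),
    }


# A tiny made-up DataFrame standing in for a real dataset
df = pd.DataFrame({
    "age": [25, 32, None, 51, 46],
    "income": [40_000, 52_000, 61_000, None, 88_000],
    "city": ["Austin", "Boston", None, "Denver", "Austin"],
})

report = profile(df)
print(report["missing_per_column"])
print(report["variance"])
print(report["correlation"])
```

With seaborn installed, `seaborn.heatmap(report["correlation"], annot=True)` draws the kind of correlation heatmap described above. Because `var()` and `corr()` silently skip missing values, it is worth checking the blank-value counts first so you know how many rows each statistic is really based on.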
OK, the next project idea is more advanced but, in my opinion, an even better showcase of your skills: build a small application centered around data. A small Flask application is GREAT for this. Flask is a Python web framework, and if you do it right it lets you open your product up to real users. I always got fired up when I thought real people would be using my stuff, and that drove me to make it better. Here is an example. A few years back I built a web application that ended up being a good talking point in interviews: a dashboard where users could see some of the latest news stories and say whether or not they liked each one. I then used some natural language processing to suggest other stories similar to their preferences. It was similar to what Google News does, except much worse. It was an incredible learning experience, though, and I had a blast making it. I learned so much about web frameworks, databases, calling APIs, and machine learning. You’ll learn about databases too, and you get bonus points if you can host the app on an AWS EC2 instance and use a distributed computing process to run a job. Comment down below if you want me to show you how to do this.
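The post doesn’t say which natural language processing technique the news app used, so the following is only a sketch of one simple content-based approach, not the author’s method: represent each headline as a bag of words, merge the stories a user liked into a profile, and rank new stories by cosine similarity to that profile. The helper names and headlines are made up for illustration, and only the Python standard library is used.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count each word (a deliberately crude tokenizer)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 = no shared words)."""
    dot = sum(count * b[word] for word, count in a.items())  # b[missing] is 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(liked: list[str], candidates: list[str], top_n: int = 2) -> list[str]:
    """Rank candidate headlines by similarity to everything the user liked."""
    profile = Counter()
    for story in liked:
        profile.update(bag_of_words(story))
    scored = [(cosine_similarity(profile, bag_of_words(c)), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [story for _, story in scored[:top_n]]


# Made-up example: the user liked two tech/finance headlines
liked = [
    "Stock market rallies as tech shares climb",
    "Tech giants report record quarterly earnings",
]
candidates = [
    "Local team wins the championship game",
    "Tech shares slide after earnings miss",
    "New recipe ideas for the holidays",
]
print(recommend(liked, candidates, top_n=1))  # ['Tech shares slide after earnings miss']
```

In a Flask app, a function like `recommend` would sit behind a route that reads the user’s likes from your database. A natural next step is swapping raw word counts for TF-IDF weights, so that very common words count for less.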
The last suggestion I have for building working knowledge in the data field is to enter some Kaggle competitions and get your feet wet. You don’t need to place well (the people who do really well are pretty advanced data scientists), but competing will help you get familiar with the process. Learn from the more experienced users in the forums and the kernels. Ask questions, and ideally even team up with another user who is more experienced than you. Tell them you’re trying to learn but are willing to work hard to contribute value if they can share some of what they know.
In the end, you will ideally have a solid understanding of what it means to solve a data science problem end to end: cleaning the data, mining it for insights, and trying to make predictions.
All three of these project ideas feed into my next point for starting a data career: building a portfolio and a presence on GitHub. You must be doing this, and by completing one of these projects you will have original code centered on solving a data problem that you can showcase there. Portfolios are extremely important because they show technical interviewers that you actually know your stuff. Make sure the code you commit to Git is clean and commented, and that you really understand it; it’s not unheard of for interviewers to ask why you coded something a certain way. A portfolio will absolutely help you get interviews, and if you can speak coherently about the projects you did, it will help get you jobs in the field.
That concludes the first part of this guide, where I covered three ways to get data project experience that produces code you can add to your portfolio. Ideally, those projects will get you revved up to dive even deeper into the field. Remember: if the field is right for you, this type of challenge should leave you energized rather than exhausted.
- Books, courses, and certifications
We just discussed getting excited by working on a project; now try to learn some of the theory behind your application. Most importantly, try to find the holes in your theoretical understanding while you build. The way I work is that I get excited about projects; I love building things most of all. But to build them I have to gain some theoretical understanding, so I am dying to go learn more theory that will power my application. It would be hard for me to go the other way, even though that’s how most college students, including me, are forced to learn. Watching hours of lectures is not fun. Get students excited about building and creating things, then supplement that with theory.
Anyway, as you work your way through the projects I covered in the first section, supplement your understanding with the resources in this section. For a book, I would get The Hundred-Page Machine Learning Book by Andriy Burkov first of all. It is short enough to read all the way through, and it will be a great reference while you prepare for interviews. I also did a YouTube video on other books if you’re interested; I’ll link it in the description for you to check out.
I would highly recommend taking a data class through a platform like Coursera. The first courses I’ll talk about are introductory crash courses, not intended for deep study.
I would recommend AI For Everyone by Andrew Ng, which is absolutely an institution at this point. A huge number of people have taken this class and love it. Professor Ng is a legend and is arguably the most renowned AI teacher in the world. I absolutely loved taking this class and would highly recommend it. It is also free to enroll, which is incredible.
If you want something more intense, he also teaches the Deep Learning Specialization. I have not gotten around to taking it yet, but it is on my list, and if it is anything like AI For Everyone, I will love it. It is a real treat to be taught by Professor Ng. In case you don’t know, he directed the Stanford AI Lab, co-founded and led Google Brain, which pioneered large-scale deep learning at Google, co-founded Coursera, and later served as chief scientist at Baidu. So yeah, he could literally be the most qualified person on the planet to teach this stuff.
The next set of classes I’ll talk about are nanodegrees. These are a little like bootcamps and give you a tangible credential that companies are respecting more and more. One thing a program like this enables you to do is build the portfolio I discussed in the previous section. They are more expensive than the classes above, but they are much more involved. They are for people who need the structure of a class to work through a project and stay motivated. I was fortunate to be able to stay motivated and build a data portfolio through self-study without taking classes, but I realize that is not the case for everyone. I have not taken any nanodegree programs myself, but I have reviewed the curricula of a bunch of them, and I really like one offered by Udacity; if I were going to take one, it would be this one. It’s designed as a 3-month nanodegree, and if you finish it sooner you can skip the remaining payments.
Topics covered include Jupyter, NumPy, Anaconda, pandas, and Matplotlib. It also has lots of hands-on projects that will help you build that portfolio. It has a heck of a teaching team, including Grant Sanderson, the creator of 3Blue1Brown, one of my favorite YouTube channels. The main instructors all hold Ph.D.s in computer science or computer engineering, so you will learn from some very impressive people. One downside is that it’s reasonably pricey, so make sure you are willing to stick with it all the way through. At some point I do want to complete a class like this and document the whole thing for you guys; that could be really cool.
These are all fantastic classes, and both Coursera and Udacity are incredible learning platforms. When you complete a class, make sure you get your certificate of completion; with it, you can confidently list the course on your resume.
- Preparing your resume
This is the final part of this guide, and everything covered so far leads to my last point. If you are following the other steps in this post, you will be building a nice foundation in data by both completing projects and advancing your theoretical understanding. Now you should start putting together your resume, which will help you identify any gaps in your application that you still need to work on.
You want a self-started project: completing one successfully shows initiative and follow-through. List it under your experience section, since it is probably some of the best experience you can get. This project should also be showcased on your GitHub profile.
You want to show that you know a tech stack, and you should highlight it in the skills section. If you have completed that project, you will already have some of those competencies. At a bare minimum, I would say your tech stack should include Python as well as a query language such as SQL, because most of the time you’ll need SQL to actually pull the data you work with from a database server. SQL is pretty easy to pick up if you’re already learning Python.
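Here is a small sketch of that workflow using Python’s built-in `sqlite3` module, with an in-memory SQLite database standing in for a real database server; the `orders` table and its rows are made up. The SQL query does the filtering and aggregation, and Python receives only the result.

```python
import sqlite3

# An in-memory SQLite database stands in for a real database server here
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("east", 120.0), ("west", 80.0), ("east", 30.0), ("north", 55.5)],
)

# SQL filters and aggregates before the data ever reaches Python;
# the ? placeholder passes the threshold safely as a parameter
query = """
    SELECT region, SUM(amount) AS total, COUNT(*) AS n_orders
    FROM orders
    WHERE amount >= ?
    GROUP BY region
    ORDER BY total DESC
"""
rows = conn.execute(query, (50,)).fetchall()
print(rows)  # [('east', 120.0, 1), ('west', 80.0, 1), ('north', 55.5, 1)]
conn.close()
```

Against a production database you would connect with that database’s own Python driver instead of `sqlite3`, and `pandas.read_sql_query(query, conn, params=(50,))` can load the same result straight into a DataFrame.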
You will also want to show some certifications and completed courses. Like I said, check out Andrew Ng’s courses on Coursera, and think about taking other courses on the platform as well.
In your experience section, ideally you will have some data-related job experience, but if you don’t, that’s OK; a good portfolio can definitely take its place. For each position you’ve held, write three bullet points: first, a high-level description of the role; second, the highest-impact project you worked on; and third, a metric that quantifies that project’s impact. Remember, you are applying to data jobs, so numbers matter.
You also need a theme for your resume that matches the domain you’re going for. If you’re trying to go into healthcare data, for example, you should have some familiarity with health data, so tell a story with your resume and center your personal projects on health data. Theme is very important because it guides the interviewer toward seeing you as right for the position.
Doing this well is a bit of an art, and I can help show you how. Go check out my resume and cover letter template at the location here. It is modeled after the resume I used to get a job working in data at a Fortune 50 company, and it has all the sections you need to land a similar job; you just need to fill those slots by following the advice in the rest of this guide. I will also likely cap sales of those materials, since I don’t want tons of copies of my resume floating around; at some point I might need to apply to another job and will need it. So go check it out if you want.
Also, if you want help tailoring your resume or cover letter, email me at email@example.com. It’s not free, since I will want to spend a significant amount of time with you to actually add value. If that’s something you’re interested in, email me.