Machine Learning Process

Hi Codecademy team,

I have some questions regarding the general process of implementing Machine Learning. Just to recap, here are the steps of implementing ML:

  1. Formulating a question
  2. Finding and understanding the data
  3. Cleaning the data and feature engineering
  4. Choosing a model
  5. Tuning and Evaluating
  6. Using the model and presenting the results

The questions that I have regarding these non-linear implementation steps:

  1. From step no. 5 to step no. 6, once we have achieved the success metrics for our Machine Learning model and have successfully presented the results to the organisation, what do we do next? Do we stop training our model? Or do we keep training our model using new training data?
    Let’s say we use a Linear Regression model and our coefficient of determination (R-squared) is 0.9 (assuming this is the highest R-squared we can achieve with the features in our training data). What’s next?

  2. In terms of Machine Learning platforms, is Jupyter Notebook the most common place to implement a Machine Learning model? Or is there another platform that most organisations use these days?
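
To make question 1 concrete, here is a minimal sketch of the two options (leave the model as is, or keep training on new data). It is only an illustration under stated assumptions: a single-feature linear regression written in plain Python, made-up numbers, and a hypothetical R-squared threshold of 0.8 for deciding when to refit. The function names are invented for this example and do not come from any library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(model, xs, ys):
    """Coefficient of determination of a fitted (slope, intercept) on given data."""
    slope, intercept = model
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def maybe_retrain(model, old_data, new_data, threshold=0.8):
    """Keep the model if it still scores well on new data, else refit on all data.

    The threshold is a hypothetical value chosen for illustration.
    Returns (model, retrained_flag).
    """
    new_xs, new_ys = new_data
    if r_squared(model, new_xs, new_ys) >= threshold:
        return model, False
    old_xs, old_ys = old_data
    return fit_line(old_xs + new_xs, old_ys + new_ys), True


# Made-up training data, roughly y = 2x.
train_x = [1, 2, 3, 4, 5]
train_y = [2.1, 3.9, 6.2, 7.8, 10.1]
model = fit_line(train_x, train_y)

# New data that follows the same pattern: the model is kept as is.
stable = ([6, 7, 8], [12.0, 14.1, 15.9])
kept_model, retrained_on_stable = maybe_retrain(model, (train_x, train_y), stable)

# New data where the pattern has changed: the model is refit.
drifted = ([6, 7, 8], [9.0, 8.0, 7.0])
new_model, retrained_on_drift = maybe_retrain(model, (train_x, train_y), drifted)

print("Training R-squared:", round(r_squared(model, train_x, train_y), 3))
print("Retrained on stable data?", retrained_on_stable)
print("Retrained on drifted data?", retrained_on_drift)
```

In this sketch the model is only refit when its score on new data falls below the threshold, so "stop training" and "keep training" become two outcomes of the same check.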

It would be great if you could shed some light on these matters.

Thank you in advance,
Jimmy

Good questions!
I’m not sure that I can answer that on my own, as I’m just at the beginning stages of ML. But, perhaps this article will help?

https://towardsdatascience.com/the-7-steps-of-machine-learning-2877d7e5548e

As far as platforms go, this could be useful:
https://www.eteam.io/blog/best-platforms-for-data-science-and-machine-learning

Hey Lisa,

Thanks so much for the links you sent. I just finished reading them. The first link doesn’t exactly give me the answer I am looking for. It pretty much reiterates my understanding of the ML process. It is still missing what happens after we make and present the predictions to our organisation. But I am assuming that most data scientists will leave their model as is after it has achieved their desired metrics.

With regard to the ML platforms, I am truly concerned, as I have never used any of the platforms listed on that website. I am not sure whether companies will make a big deal about this when it comes to hiring a junior data scientist or analyst such as myself.

By the way, Lisa, do you have a data science portfolio? If yes, can I please have a look? I need to start planning my own and am not sure how and where to start. I know some people have suggested using GitHub to present data science projects, but good lord, that platform is so full of clutter.

Sincerely,
Jimmy

I am going to guess that yes, the models are left as is. It depends on the company and its work environment. If it’s Agile, for example, the team moves on to the next project and figures out how to address business goals, etc.

I think it’s a good idea to follow data people on blogs like Medium (Towards Data Science is a good one to follow) or dev.to for example. As for developing a portfolio, you know that data is everywhere (data.gov, Census, Kaggle, etc*). My suggestion is to find something that interests you and go from there. Is there a data question that has been bugging you? Seek out data/resources, download it, clean it, test, analyze, visualize etc. :slight_smile:

And, yeah, GitHub can be confusing and the documentation is dense. But it’s good to go through. Did you upload any of your Codecademy projects to GitHub (create your repository)? That’s also a good thing to do.

*Also, most cities have open data portals and that’s a great resource to grab data (in csv or json formats).
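
As a rough idea of what the "download it, clean it, analyze" loop can look like for CSV and JSON files, here is a minimal sketch using only Python's standard library. Everything in it is an assumption for illustration: the column names and numbers are made up, and the data lives in strings rather than in files downloaded from a real portal.

```python
import csv
import io
import json

# Made-up stand-ins for files downloaded from an open data portal.
CSV_TEXT = """neighborhood,year,trees_planted
Downtown,2021,120
Riverside,2021,
Uptown,2021,85
"""
JSON_TEXT = '[{"neighborhood": "Downtown", "year": 2022, "trees_planted": 140}]'


def load_csv_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def load_json_rows(text):
    """Parse a JSON array of records into a list of dicts."""
    return json.loads(text)


def clean(rows):
    """Drop rows with a missing count and convert numeric fields to int."""
    cleaned = []
    for row in rows:
        value = row.get("trees_planted")
        if value in (None, ""):
            continue  # skip records with no count
        cleaned.append({
            "neighborhood": row["neighborhood"].strip(),
            "year": int(row["year"]),
            "trees_planted": int(value),
        })
    return cleaned


# Combine both sources, clean them, then do a simple aggregation.
rows = clean(load_csv_rows(CSV_TEXT) + load_json_rows(JSON_TEXT))
total_by_area = {}
for row in rows:
    area = row["neighborhood"]
    total_by_area[area] = total_by_area.get(area, 0) + row["trees_planted"]

print(total_by_area)
```

For a real project, the strings would be replaced by files read from disk, and a library such as pandas would likely make the cleaning and visualizing steps shorter.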

I always worry about this for myself too. BUT, you cannot know how to use every platform and every technology. Just be aware of them and have some understanding of what the functionality is. I know that some job posts seem to be looking for a unicorn candidate (that obv doesn’t exist!), but, if you see something that you think you can do then apply for it. Follow the company and make sure their goals match yours too.

Hey Lisa,

Thank you so much for your advice. I am at 97% of the data science path, so this information has been very helpful. Thank you again. I would love to stay in touch with you for more advice and feedback on my data science portfolio. What’s the best way to connect with you without interrupting your work-life balance?

P.S.: I really do wish that I could also get in touch with the data science instructor from Codecademy after completing the DS path :frowning: Do you know how?

Jimmy

You’re welcome!
There are opportunities in the DS path where you can post your projects here for peer project feedback and also push them to GitHub at the same time. It gets easier after your first post to GitHub. I know the documentation can be confusing!

I do not know how to contact the DS course creator at CC. SU’s are only volunteers here.