Is there a set process when performing data science?


In the context of this lesson, is there a set process of steps when performing data science?


Yes, this process is known as the Data Science Process. You will learn more about it in a later lesson, but a summary of the steps is as follows.

The first step is asking a question, where the aim is to discover what the scientific goal is, and what you are trying to predict or estimate.

Next is determining the necessary data, where we figure out what sample size is needed and what data we actually need to support or refute the initial hypothesis.

After this is getting the data, during which we discover things such as how the data was sampled, which of the data is relevant and whether there are privacy issues.

The next step is cleaning and organizing the data, where we determine whether the data is readable and remove unnecessary values, such as outliers.
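To make the cleaning step a bit more concrete, here is a minimal sketch using only the Python standard library. The `readings` list and the `clean_values` helper are made up for illustration, and the 1.5 × IQR rule is just one common convention for flagging outliers, not something the lesson prescribes. The sketch returns the outliers separately instead of silently discarding them, so you can decide what to do with them.

```python
import statistics

def clean_values(raw):
    """Drop missing entries, then separate outliers from the values we keep."""
    # Step 1: remove missing entries (represented here as None).
    present = [v for v in raw if v is not None]

    # Step 2: compute the quartiles and the interquartile range (IQR).
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1

    # Step 3: anything beyond 1.5 * IQR from the quartiles counts as an outlier.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    kept = [v for v in present if low <= v <= high]
    outliers = [v for v in present if not (low <= v <= high)]
    return kept, outliers

# Hypothetical sensor readings with two missing values and one extreme value.
readings = [10.0, 12.0, None, 11.0, 13.0, 250.0, 12.5, None, 11.5]
cleaned, outliers = clean_values(readings)
print("kept:", cleaned)
print("outliers:", outliers)
```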

Next, we explore the data, during which we plot it and look for anomalies or patterns that might be important.
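As a rough illustration of exploring, the sketch below prints summary statistics and a text-only histogram. The `ages` data and both helper functions are hypothetical, and the text histogram is only a stand-in for a real plot so that the example runs with nothing but the standard library.

```python
import statistics
from collections import Counter

def summarize(values):
    """Return a few basic summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

def text_histogram(values, bucket_width):
    """Group integer values into buckets and draw one '#' per value."""
    buckets = Counter((v // bucket_width) * bucket_width for v in values)
    return [f"{start:>4} | {'#' * buckets[start]}" for start in sorted(buckets)]

# Hypothetical ages; the single value in the 70s stands apart from the rest.
ages = [22, 25, 27, 31, 33, 34, 38, 41, 45, 79]
print(summarize(ages))
for line in text_histogram(ages, bucket_width=10):
    print(line)
```

A gap between buckets, or a mean that sits far from the median, is the kind of thing worth a closer look before modeling.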

Next is modeling the data, which includes building, fitting, and then validating the accuracy of the model.
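Building, fitting, and validating can be sketched with a very small example. Everything below is an assumption chosen for illustration: the data is invented, the model is a straight line fitted by ordinary least squares, and validation is done by holding out every fourth point and measuring the mean absolute error on it. Real projects usually pick the model and the validation scheme to suit the question.

```python
def split_holdout(pairs, every=4):
    """Hold out every `every`-th pair for validation; train on the rest."""
    train = [p for i, p in enumerate(pairs) if i % every != 0]
    test = [p for i, p in enumerate(pairs) if i % every == 0]
    return train, test

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mean_absolute_error(xs, ys, slope, intercept):
    """Average absolute gap between the true values and the line's predictions."""
    return sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: hours studied versus exam score.
data = [(1, 52), (2, 55), (3, 61), (4, 64), (5, 70), (6, 74), (7, 79), (8, 83)]
train, test = split_holdout(data)

slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
error = mean_absolute_error([x for x, _ in test], [y for _, y in test], slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} held-out MAE={error:.2f}")
```

The important part is that the error is measured on points the model never saw during fitting.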

And last, but not least, is communicating your findings, where we share what we learned, determine whether the results make sense, and see whether we can tell a story from the data.


Yes, there is. Before you can do a deep analysis, you have to get the data (e.g. from Kaggle), then check whether it contains null values and decide how to handle them. You should also remove any information in your dataset that is not useful. Once the data is clean, you can start exploring it and looking for insights. At this stage you can carry out your analysis and prepare a presentation that shows everything you have discovered.
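For anyone who wants to see the null check in code, here is a small sketch with the standard `csv` module. The `RAW_CSV` text, its column names, and the helper functions are all invented for the example; a downloaded file would be opened with `open(...)` instead of `io.StringIO`. It treats empty cells as missing, counts them per column, and drops a column that is mostly empty.

```python
import csv
import io

# Hypothetical dataset: empty cells stand for missing values.
RAW_CSV = """id,age,city,notes
1,34,Lisbon,
2,,Porto,n/a
3,29,,
4,41,Lisbon,
"""

def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

def count_missing(rows):
    """Count empty or absent cells in each column."""
    counts = {column: 0 for column in rows[0]} if rows else {}
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() == "":
                counts[column] = counts.get(column, 0) + 1
    return counts

def drop_columns(rows, columns):
    """Return copies of the rows without the named columns."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]

rows = load_rows(RAW_CSV)
missing = count_missing(rows)
print(missing)

# The notes column is mostly empty, so this sketch simply drops it.
trimmed = drop_columns(rows, {"notes"})
print(trimmed[0])
```

Whether to drop a column, drop the affected rows, or fill in a value depends on the dataset; dropping is just the simplest choice to show.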

I know the Data Science process is complex, but I tried to summarize it in a simple comment.