In the context of this lesson, is there a set process of steps when performing data science?
Yes, this process is known as the Data Science Process. You will learn more about it in a later lesson, and a summary of its steps is as follows.
The first step is asking a question, where the aim is to discover what the scientific goal is, and what you are trying to predict or estimate.
Next is determining the necessary data, where we figure out what sample size is needed, and what data we actually need to support or refute the initial hypothesis.
After this is getting the data, during which we discover things such as how the data was sampled, which of the data is relevant and whether there are privacy issues.
The next step is cleaning and organizing the data, where we determine if the data is readable, and also get rid of values that do not belong, such as erroneous entries or outliers.
Next, we explore the data, during which we plot the data, and check things such as whether there are anomalies or patterns that might be important.
Next is modeling the data, which includes building and fitting the model and then validating its accuracy.
And last, but not least, is communicating your findings, where we share what we learned, determine if the results make sense, and see if we can tell a story from the data.
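To make the middle steps of this process a little more concrete, here is a minimal Python sketch of cleaning, exploring, modeling, and validating. Everything in it is an assumption made for illustration and not part of the lesson: the toy data (hours studied versus exam score), the function names, the choice to clean by dropping rows with missing values, and the use of a simple straight-line fit as the model.

```python
import statistics

# Hypothetical toy data: (hours_studied, exam_score) pairs.
# None marks a missing value that the cleaning step has to deal with.
raw = [(1, 52), (2, 54), (None, 60), (3, 58), (4, 61), (5, 65),
       (6, 66), (11, None), (7, 70), (8, 74), (9, 75), (10, 79)]


def clean(rows):
    """Cleaning step (illustrative): drop rows with a missing value."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def explore(rows):
    """Exploring step (illustrative): summary statistics in place of a plot."""
    ys = [y for _, y in rows]
    return {"n": len(rows),
            "mean_score": statistics.mean(ys),
            "stdev_score": statistics.stdev(ys)}


def fit_line(rows):
    """Modeling step (illustrative): least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, rows):
    """Validation step (illustrative): average absolute prediction error."""
    slope, intercept = model
    return statistics.mean(abs(y - (slope * x + intercept)) for x, y in rows)


cleaned = clean(raw)
summary = explore(cleaned)

# Hold out every third row for validation; fit on the rest.
test_rows = cleaned[::3]
train_rows = [row for i, row in enumerate(cleaned) if i % 3 != 0]

model = fit_line(train_rows)
error = mean_abs_error(model, test_rows)

# Communicating step: report what was found in plain terms.
print(f"Rows kept after cleaning: {summary['n']}")
print(f"Fitted line: score = {model[0]:.2f} * hours + {model[1]:.2f}")
print(f"Mean absolute error on held-out rows: {error:.2f}")
```

The validation error is measured on rows the model never saw during fitting, which is what makes it a check on the model and not just a restatement of the fit. In a real project each of these functions would be replaced by far more careful work, and the earlier steps (asking the question, deciding what data is needed, and getting it) would come first.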