Do the Data Science Portfolio Projects use real data?

Do the Portfolio Projects in the Data Science Path use real data, or is the data made up?
If it’s real data, is it an independent random sample or a biased subset?

I ask because I am building a portfolio to present, and I want to know whether I should label these projects as “homework/coursework assignments given by Codecademy” or whether I can treat them as real analyses and try to draw real-world conclusions and inferences.

I think at least one or two projects (not necessarily portfolio projects) are based on datasets from Kaggle or something similar. It is true that without the background it is quite difficult to make any serious inferences. Perhaps that’s intentional; it’s hard to bring your own bias to a dataset if you don’t know the background :slightly_smiling_face:.

Is there a specific portfolio project you’re asking about or just in general?

In most cases it is real data and the source is specified in the lessons. Some comes from Kaggle and I think some from the UC Irvine (UCI) Machine Learning Repository.

Also, maybe you don’t need to make that distinction for the projects in your portfolio, because work is work whether it’s paid or not.


Regarding our portfolio projects hosted on GitHub, should we also post or link the raw data (CSV, JSON, etc.) that Codecademy gave us? Are we allowed to, especially if the data could be real?