Practice learned concepts for Data Scientist Path

Hello, everyone!

I wanted to know if the community has any ideas as to where one could get some practice on the newly learned concepts from the Data Scientist path.

E.g., I’m learning how to manipulate and clean data with Pandas. After working through the modules, projects, and practice packs on Codecademy, where could I start applying these newly learned concepts?

I think connecting the concepts to real-world applications would be an amazing jump-start for someone new. I started working with some datasets from Kaggle, but I wasn’t sure whether what I was doing was correct or what outcome I should be looking for in all that manipulation.

You’re on the right track! :slight_smile:
Download datasets, load them into Jupyter (or Colab), clean them up, do some EDA, maybe some hypothesis testing, build a model, etc. All of that is putting your skills to good use. :slight_smile: If you get stuck, consult your notes or search online for answers. All of this is real-world stuff.
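As a rough sketch of that load, clean, explore loop: the tiny CSV and its column names (city, age, income) below are made up for illustration and stand in for whatever file you download. It only assumes pandas is installed.

```python
import io
import pandas as pd

# Hypothetical stand-in for a downloaded CSV; the columns are invented.
raw_csv = io.StringIO(
    "city,age,income\n"
    "NYC,34,72000\n"
    "NYC,34,72000\n"
    "Boston,,58000\n"
    "Chicago,45,\n"
    "Boston,29,61000\n"
)

# Load: with a real file this would be pd.read_csv("your_file.csv").
df = pd.read_csv(raw_csv)

# Clean: drop exact duplicate rows, then rows missing a value we need.
clean = df.drop_duplicates().dropna(subset=["age", "income"])

# Explore: average income per city on the cleaned rows.
mean_income = clean.groupby("city")["income"].mean()

print(clean.shape)
print(mean_income)
```

Dropping rows with missing values is only one option; depending on the question you are asking, filling them in or keeping them may make more sense.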

There are many, many places to get data sets from. You could also build a web scraper and get data that way, or use an API. (Just read each site’s rules about using their data first.)

Happy coding!

Thank you for the answer, @lisalisaj. If you know of any pre-made models or examples of data cleaning, it’d be a great help if you could point me towards them. Then maybe I could get a better idea of how to approach the datasets that I’ve downloaded.

Do you know Pandas? You can clean a lot of data using Pandas.
After loading the CSV into my notebook, I start with the usual stuff just to get a feel for the data:
df.info()
df.head()
df.shape
df.columns
df.isnull().sum() or df.isnull().sum().sum() to see whether there are any nulls in each column and the total number of nulls in the df.

df["col name"].describe() (for a numerical column; gives you min, max, mean, std, etc.)
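Here is a minimal sketch of those first-look calls on a toy DataFrame. The data and column names are hypothetical; in practice df would come from pd.read_csv. Note that shape and columns are attributes, so they take no parentheses, while isnull is a method and needs them.

```python
import pandas as pd

# Hypothetical toy data; a real df would come from pd.read_csv(...).
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "score": [10.0, None, 30.0, 40.0],
    "group": ["x", "y", None, "x"],
})

df.info()            # prints column dtypes and non-null counts
print(df.head())     # first rows (up to 5 by default)
print(df.shape)      # attribute: (rows, columns)
print(df.columns)    # attribute: the column labels

nulls_per_column = df.isnull().sum()     # nulls in each column
total_nulls = df.isnull().sum().sum()    # nulls in the whole DataFrame
print(nulls_per_column)
print(total_nulls)

# Summary statistics for one numerical column.
summary = df["score"].describe()
print(summary)
```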

I’m not sure how long ago you finished the DS Path, but, they did add a bunch of new materials. There’s an article in the “Data Wrangling” part of the DS path that’s helpful:

https://www.codecademy.com/paths/data-science/tracks/dscp-data-wrangling-and-tidying/modules/dscp-fundamentals-of-data-wrangling-and-tidying/articles/intro-data-wrangling-and-tidying

Oh, sorry, I miscommunicated a bit, it seems. What I meant is that I’ve completed the Pandas portion of the newly updated path up to Data Wrangling and Data Tidying, and I’m not very confident applying these new skills to the real-world datasets that I downloaded from Kaggle. I want to practice these concepts to get a better grip on them, but since I’m working on my own with no experienced people to review my work, I can’t tell whether what I’m doing is actually helping me answer the question I wanted the dataset to answer.

But…this is where you apply what you’ve learned & gain confidence. There is no right or wrong way to answer here. You are the one who reviews your work. :slight_smile: (Unless you have a meetup that you go to where others can chime in on your analysis.) Are you part of a CC chapter where you can share what you’ve been working on? Maybe gathering insights from others would help with your analysis?

You might get a crappy data set and realize that you can’t do much with it or gain any insights from it, even after you’ve cleaned it up. This happened to me when I got some data from NYC’s Open Data (public data sets) portal. I downloaded the results of some survey data for participatory budgeting, and the data was in an awful state because the survey questions were badly written (ex: too many age categories, and the ages overlapped). So there wasn’t much I could do with it (statistical testing, vizzes) except just view the data.

Honestly, sometimes you just gotta jump. Trust yourself and what you know.

Maybe this is the train of thought I was looking for to just dive in. Thank you for the advice, @lisalisaj! I am part of CC chapters, and I’m also a chapter leader. We’ve met before, during the Debugging 101 event of the NYC chapter. This is Dharma. :laughing:

Ah ha! :smile: hi! :wave:t2:

I think we may have a project discussion/roundtable/presentation event planned for the future. Stay tuned! :slight_smile:

Awesome! Looking forward to seeing you organize these sessions. See you later today!