Do you know any sites for datasets? 💻

Aside from Kaggle, are there any other sites for obtaining datasets?

If anyone knows, please tell us.

Hi Vishal,
Yep, there are tons of sites!
Many cities (worldwide, I would assume) have open data portals where you can grab city data (or state data, in the case of the US) as a .csv or JSON file.
Ex: NYC Open Data
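Once you've downloaded a file from a portal like that, loading it in Python only takes the standard library. Here's a minimal sketch, assuming the export is either a CSV with a header row or a JSON array of records. The column names and values below are made up for illustration; real portal exports have their own schemas.

```python
import csv
import io
import json

# Hypothetical sample exports (made-up columns/values, not a real portal schema).
CSV_TEXT = """borough,complaint_type,count
Manhattan,Noise,120
Brooklyn,Noise,95
"""

JSON_TEXT = """[
  {"borough": "Manhattan", "complaint_type": "Noise", "count": "120"},
  {"borough": "Brooklyn", "complaint_type": "Noise", "count": "95"}
]"""


def load_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def load_json(text):
    """Parse a JSON array of records into a list of dicts."""
    return json.loads(text)


if __name__ == "__main__":
    rows_csv = load_csv(CSV_TEXT)
    rows_json = load_json(JSON_TEXT)
    # Both formats end up in the same list-of-dicts shape.
    print(rows_csv == rows_json)
    print(rows_csv[0]["borough"])
```

For a real file, you'd open it with `open(path, newline="")` and pass the file object to `csv.DictReader` (or `json.load`) instead of wrapping a string in `io.StringIO`. Note that CSV values come back as strings, so numeric columns still need converting.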

The U.S. Census also has a TON of data.

I’ve downloaded data from both. There’s a lot of cleanup involved with Census data, but I just do that in Excel or with a tool like OpenRefine.
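If you'd rather script the cleanup than do it by hand in a spreadsheet, here's a minimal sketch of the kind of steps involved: trimming stray whitespace, turning numbers like "1,234" into integers, and dropping footnote rows that aren't data. The input below is a hypothetical messy export, not the actual Census file layout, and the `County`/`Population` columns are assumptions for illustration.

```python
import csv
import io

# Hypothetical messy export: padded names, thousands separators,
# and a trailing footnote row that is not data.
RAW = '''County,Population
 Adams County ,"1,234"
Baker County ,"56,789"
Note: estimates are preliminary,
'''


def to_int(value):
    """Convert strings like ' 1,234 ' to 1234; return None if not a whole number."""
    cleaned = value.strip().replace(",", "")
    return int(cleaned) if cleaned.isdigit() else None


def clean(text):
    """Return rows with trimmed names and integer populations, skipping non-data rows."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Trim whitespace from every header and cell (missing cells become "").
        row = {key.strip(): (val or "").strip() for key, val in row.items()}
        population = to_int(row["Population"])
        if population is None:
            continue  # footnotes and other non-numeric rows are dropped
        rows.append({"County": row["County"], "Population": population})
    return rows


if __name__ == "__main__":
    for record in clean(RAW):
        print(record)
```

This is roughly what OpenRefine's "trim whitespace" and "to number" transforms do interactively; the script version just makes the steps repeatable when you download a fresh copy of the data.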

Whatever you’re interested in (government, art, sports, etc.), there are probably online repositories for that data. Because I’m a baseball nerd, I’ve also gotten data from Baseball-Reference.com. (They’re an example of a site that forbids scraping and will ban your IP if you do scrape. All their data is available to copy and paste as a CSV file.)

Oh, and I recently realized that the Metropolitan Museum of Art (the Met) in New York allows people to scrape data from its site. I want to try that soon.

If you know how to build a web scraper, you can grab data from websites that way too. But first read each site’s guidelines/rules about scraping. Many sites don’t allow it, and others have very stringent rules. Here are some guidelines.
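One machine-readable place to check a site's rules is its `robots.txt` file, and Python's standard library can parse it. A minimal sketch follows; the robots.txt content, the `my-scraper` user agent, and the example.com URLs are all hypothetical. Keep in mind robots.txt is only one signal: a site's terms of use can forbid scraping outright even if robots.txt doesn't block you.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. For a real site you would instead call
# parser.set_url("https://<site>/robots.txt") followed by parser.read().
ROBOTS_TXT = """User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_text):
    """Build a RobotFileParser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # Is this user agent allowed to fetch each URL?
    print(rp.can_fetch("my-scraper", "https://example.com/data/page1"))
    print(rp.can_fetch("my-scraper", "https://example.com/private/report"))
    # Seconds the site asks crawlers to wait between requests (None if unset).
    print(rp.crawl_delay("my-scraper"))
```

If `can_fetch` returns `False`, skip that URL; if a crawl delay is set, sleep at least that long between requests.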

Maybe these will be of some interest too?


Reddit, Discord, and IRC are surprisingly deep in terms of resources. It’s worth keeping an eye on them to see what they have to offer.

And why not make your own dataset!

Have you heard of this?

You can download the Science Journal app (available for both Android and iOS), collect data from your phone’s microphone (decibel levels) or its gyroscope and accelerometer, and then export those data points for analysis!

Example project: How about collecting noise levels at different places (your living room, a subway station, a restaurant, etc.) and using K-Means to try to identify each location?
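To get a feel for that project, here's a minimal sketch of K-Means (plain Lloyd's algorithm) on one-dimensional decibel readings, written without any libraries. The readings are made-up numbers standing in for samples pooled from three places; I'm not assuming anything about the app's actual export format. K-Means only groups the readings, so you'd still match each cluster back to the place you recorded.

```python
# Made-up decibel readings for illustration, pooled from three hypothetical places.
READINGS = [38, 40, 41, 39, 72, 75, 70, 74, 58, 60, 61, 57]


def seed_centroids(values, k):
    """Deterministically pick k starting centroids spread across the sorted data."""
    ordered = sorted(values)
    if k == 1:
        return [sum(ordered) / len(ordered)]
    step = (len(ordered) - 1) / (k - 1)
    return [float(ordered[round(i * step)]) for i in range(k)]


def kmeans_1d(values, k, iterations=100):
    """Lloyd's-algorithm K-Means on 1-D data. Returns (centroids, labels)."""
    centroids = seed_centroids(values, k)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each reading joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its assigned readings.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # Keep the previous centroid if a cluster ends up empty.
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    centroids, labels = kmeans_1d(READINGS, k=3)
    print("cluster centers (dB):", centroids)
    print("cluster per reading:", labels)
```

With real recordings, clustering raw individual readings may be too noisy; summarizing short time windows (e.g. mean and spread of dB per window) and clustering those features, or using scikit-learn's `KMeans`, would likely work better. Either way, the idea is the same: assign points to the nearest center, recompute the centers, and repeat until nothing changes.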


