Where to find data? IPUMS CPS

A recurring question: “Where can I find data for off-platform projects?”

Here, let me google that for you. :wink:

IPUMS CPS is an excellent resource for data people! (Thanks to the University of Minnesota & the Minnesota Population Center.)
You can create a data extract or use their online tools for analysis. If you get stuck, they have training materials and video tutorials as well.
Check it out:


Thanks for sharing! There are more places to find data for off-platform projects, too.
I did a bit of research and found these sources:

  1. Publicly available datasets: Websites such as Kaggle, the UCI Machine Learning Repository, and data.world, along with government portals such as data.gov, offer a vast array of publicly available datasets that you can use for your off-platform projects.
  2. APIs: Many companies offer APIs that provide access to their data, such as Twitter’s API, which provides access to tweets, and Google’s Maps API, which provides access to maps data.
  3. Web scraping: You can use tools like BeautifulSoup or Scrapy to scrape data from websites and use it for your off-platform projects, or for tasks such as CRM data enrichment.
  4. Surveys and questionnaires: You can collect data by conducting surveys or questionnaires, either online or in person.
  5. Commercial databases: There are several commercial databases that you can purchase data from, such as Experian, D&B, and InfoUSA.
  6. Open data initiatives: Many cities and governments have open data initiatives, where they make data available to the public. You can search for open data initiatives in your area to find relevant data for your off-platform projects.
  7. Social media: You can collect data from social media platforms such as Facebook, Twitter, and Instagram to use for your off-platform projects.
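
To make the web scraping item a bit more concrete, here is a minimal sketch of pulling a table out of a page. To keep it dependency-free I swapped in Python's built-in `html.parser` rather than BeautifulSoup or Scrapy, so treat it as an illustration of the idea, not how those libraries work. `SAMPLE_HTML`, `TableParser`, and `scrape_table` are all made-up names, and the table contents are placeholder data.

```python
from html.parser import HTMLParser

# Hypothetical page snippet with placeholder data, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Springfield</td><td>1000</td></tr>
  <tr><td>Shelbyville</td><td>2000</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collects the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._row = None       # the row currently being built, if any
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table in `html` as a list of rows (lists of strings)."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = scrape_table(SAMPLE_HTML)
header, records = rows[0], rows[1:]
print(header)
for record in records:
    print(record)
```

In a real project the HTML would come from a downloaded page rather than a string, and a library like BeautifulSoup would handle messy markup far better than this sketch does.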

Yep! There are a ton out there. I just wanted to highlight IPUMS. It’s a topic that we’ve discussed a lot here on the forums. Back when we had a NYC CC chapter I did a presentation on this very subject (along with an intro to Colab). Your reply sounds exactly like that presentation and the Medium article I wrote. :sweat_smile:

It all starts with whatever one is interested in—population demographics, sports data, voting data, etc. I think that we learn best when we pick something that really interests us.

As far as Twitter goes—I thought they were going to make people pay for access to their API? I use IFTTT and it might affect how I use some of their applets and their API going forward. I’m still waiting for a more definitive answer from IFTTT on that.

Also, if you are going to build a web scraper to get data from a site, always read the site’s documentation on scraping first to see whether they even allow it. Always abide by the rules of scraping etiquette as well.
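
One machine-readable place many sites state their crawling rules is a `robots.txt` file. Here is a rough sketch of checking it with Python's standard-library `urllib.robotparser` before fetching anything. The rules in `SAMPLE_ROBOTS_TXT`, the `my-bot` user agent, and the `example.com` URLs are all made up for illustration, and passing this check does not replace reading a site's terms of use.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network call."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# "my-bot" is a made-up user agent name; the URLs are placeholders.
allowed_public = parser.can_fetch("my-bot", "https://example.com/data/table.html")
allowed_private = parser.can_fetch("my-bot", "https://example.com/private/report.html")
delay = parser.crawl_delay("my-bot")  # seconds to wait between requests, if stated

print(allowed_public, allowed_private, delay)
```

Against a live site you would call `set_url(...)` and `read()` on the parser instead of feeding it a string, and then pause for at least the stated crawl delay between requests.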

Happy coding!