The code for this can actually be super simple, depending on how ambitious a project you have in mind. If hiring someone makes the most sense, I don’t have time to do a freelance contract FOR you …at least not FOR free.
Doing this for fun or a learning exercise?
I think this is a great choice for a fun project! …and there are lots of google-able tutorials on building a web scraper. The toughest part of the technical design will be deciding how specific you want to get about which web technologies a site is using, and what defining criteria you’ll use to deduce that. However, since this is for fun & learning, I wouldn’t get too caught up in designing a precise end-product.
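To give a feel for what “defining criteria” could look like, here’s a minimal sketch, assuming you already have a page’s HTML as a string. The `SIGNATURES` table and the `detect_technologies` function are hypothetical names I made up for illustration, and the fingerprints are just commonly seen substrings, not an authoritative list.

```python
# Hypothetical fingerprints: substrings that often show up in a page's HTML.
# These are illustrative assumptions, not an exhaustive or authoritative list.
SIGNATURES = {
    "Shopify": ["cdn.shopify.com"],
    "WordPress": ["/wp-content/", "/wp-includes/"],
}


def detect_technologies(html):
    """Return the sorted names of technologies whose fingerprints appear in html."""
    lowered = html.lower()
    found = set()
    for name, needles in SIGNATURES.items():
        # One matching substring is enough to flag the technology.
        if any(needle in lowered for needle in needles):
            found.add(name)
    return sorted(found)


if __name__ == "__main__":
    sample = '<script src="https://cdn.shopify.com/s/files/theme.js"></script>'
    print(detect_technologies(sample))
```

To try it on a live page, you could fetch the HTML with something like `urllib.request.urlopen` first. Substring matching is crude and will miss plenty of sites, which is fine for a learning project: it’s easy to add fingerprints as you discover them.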
Doing this for commercial work?
We need to get way more specific about your deliverables (a.k.a. what precisely do they want this to help them do at the end of the day …and what do they assume that will look like?). Since you haven’t provided much of that information, I’m willing to wager that you are getting way too deep into this problem for your expertise and there’s likely a better way.
Disclaimer: I want way more people to learn how to code, but if you’re under a deadline already …you can’t afford to invest in building your knowledge for a project this wide in scope.
However, there is hope! Again, I can only run with how little you’ve shared. You want a more specific answer? Try asking a more specific question.
I’m willing to wager that you’re working on more of a data science project that can just as easily be solved without “reinventing the wheel” yourself. If you’re just trying to help your clients make more data-informed decisions, then I wouldn’t invest a ton in computing resources to crawl “the entire internet” doing your own primary research. Why not leverage the data others have already collected on this problem?
There are many teams out there that have already answered these types of questions and have data widely available for you to leverage. You might be better off preparing a dataset (a.k.a. spreadsheet) with this data, which you can gather by hand. That way, you can answer questions like “How many sites in the world are using Shopify? How many sites in the Top 1000 most visited sites are using Shopify?” and on and on.
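If the spreadsheet route appeals to you, answering those questions takes only a few lines. Here’s a rough sketch, assuming a CSV export with `rank`, `domain`, and `platform` columns. The column names, the `count_platform` function, and the sample rows are all placeholders I invented, not real data.

```python
import csv
import io

# Placeholder rows standing in for a spreadsheet export; this is NOT real data.
SAMPLE_CSV = """rank,domain,platform
1,example-one.test,Custom
2,example-two.test,Shopify
3,example-three.test,WordPress
4,example-four.test,Shopify
"""


def count_platform(csv_text, platform, top_n=None):
    """Count rows whose platform matches, optionally limited to the top_n ranks."""
    reader = csv.DictReader(io.StringIO(csv_text))
    count = 0
    for row in reader:
        # Skip rows ranked below the cutoff when a cutoff is given.
        if top_n is not None and int(row["rank"]) > top_n:
            continue
        if row["platform"].strip().lower() == platform.lower():
            count += 1
    return count


if __name__ == "__main__":
    print(count_platform(SAMPLE_CSV, "Shopify"))
    print(count_platform(SAMPLE_CSV, "Shopify", top_n=3))
```

Swap `SAMPLE_CSV` for the contents of your own file and the same function covers both the “whole world” and the “Top 1000” flavor of the question.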
What people who don’t code often don’t realize is that most technologies “stand on the shoulders of giants.” Every hand-coded solution is likely leveraging many libraries & frameworks where other people have already done the heavy lifting. Here’s an extreme example: I didn’t have to build a crawler to get a “good enough” answer to that first question. It took one Google search:
How many sites in the world are using Shopify? 1,661,942