Good morning, I have some questions based on the text that I read:
“Typically companies prefer not to be scraped because it can lead to a misuse of their data, revenue loss, slowed user experience, and additional web infrastructure costs”
- How can anyone working in Data Science know that the work we are doing does not infringe the law? If I am working for a company and I am asked to scrape a website, is there a way to know that we follow the laws of all 50 states?
- How can scraping data from the web compromise a company’s revenue? Through a lawsuit?
- How does scraping data slow the user experience? Does the use of many cookies slow down user interaction?
Seems pretty legal:
Web scraping is now legal. Here’s what that means for Data… | by Tom Waterman | Medium
At least, for publicly available data. Or you can always consult the developers behind the website you intend to scrape and tell them what you intend to do with the data. Scraping tickets (for a concert, for example) and then reselling them could land you in trouble. But I think that is common sense?
Loss in advertisement revenue, if people use your service instead of visiting the original site.
Computers can make requests a lot faster than humans, so all those scrapers and crawlers take up an awful amount of processing power. That means you need more servers, which cost extra.
Only half (!!) of internet traffic is human.
Also, very important: always, always read a website’s policies and documentation before you attempt to scrape any data from it.
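One machine-readable part of a site's policy is often its robots.txt file. The snippet below is only a rough sketch of how you might check a URL against such rules with Python's standard urllib.robotparser module. The helper name is_allowed, the sample rules, the "my-bot" user agent, and the example.com URLs are all made up for illustration. It does not replace reading the terms of service, and passing this check says nothing about whether scraping is legal.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url.

    is_allowed is a hypothetical helper written for this example.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Sample rules (an assumption for this example, not from any real site).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""
```

For a real site you would first download its robots.txt (for example with RobotFileParser.set_url() followed by read()) and then call can_fetch() before requesting each page.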
If you plan on writing a program that scrapes many pages on the same site, you’ll want to incorporate time.sleep() into your program as well (I’m assuming you’re using Python based on your quote above). If you make your script mimic a human user by pausing for a few seconds between each page request, you’re less likely to overload their servers or get your IP address blacklisted by the company.
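Here is a minimal sketch of that idea, assuming plain Python with only the standard library. The function names fetch and scrape_politely, the 2 to 5 second delay range, and the 10 second timeout are my own choices for illustration, not anything a particular site requires.

```python
import random
import time
from urllib.request import urlopen


def fetch(url):
    """Download one page and return its body as bytes."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def scrape_politely(urls, fetch_page=fetch, min_delay=2.0, max_delay=5.0,
                    sleep=time.sleep):
    """Fetch each URL in turn, pausing a random few seconds between requests.

    fetch_page and sleep are parameters so the loop can be tried out
    without touching the network or actually waiting.
    """
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            sleep(random.uniform(min_delay, max_delay))
        pages[url] = fetch_page(url)
    return pages
```

You would call it with a list of page URLs, e.g. scrape_politely(list_of_urls). The random delay is just one way to avoid hitting the server at a fixed rhythm; a constant time.sleep(3) between requests follows the same principle.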