I am trying to follow the example, but it seems that web scraping Yelp is blocked(?).
Not a big problem for day 96, but a functional example would be nice.
If I simply add the example code as below, I get the result in the screenshot below. This has happened with every Yelp link I’ve tried so far.
If I try a link from another site, I do get proper results.
After some googling, my understanding is that CloudFront might be what is blocking web crawler access.
Is there any way to get around this?
import requests
from bs4 import BeautifulSoup

url = "https://www.yelp.co.uk/search?find_desc=Restaurants&find_loc=San+Francisco%2C+CA%2C+United+States"
response = requests.get(url)
html = response.text
print(html)
If this is directly part of the course, I would be very, very confused, as it is against Replit’s own ToS to scrape websites that lack APIs.
Some websites now use CloudFront to prevent scraping (for example, I used to have a similar scraper that aggregated info from fortnitetracker, and the same thing happened).
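If it helps anyone check whether a CDN is what is refusing the request, here is a minimal sketch that looks only at the status code and response headers. The function name `describe_response` and the example headers are made up for illustration; the header names it checks (`Server`, `Via`, `X-Amz-Cf-Id`, `CF-RAY`) are ones CloudFront and Cloudflare commonly send, but a site can be configured differently, so treat the result as a hint rather than proof.

```python
def describe_response(status_code, headers):
    """Guess, from status code and headers alone, whether a CDN refused the request."""
    # Header names are case-insensitive, so normalise before looking anything up.
    lowered = {name.lower(): str(value).lower() for name, value in headers.items()}

    # Anything below 400 is treated as a successful fetch in this sketch.
    if status_code < 400:
        return "ok"

    server = lowered.get("server", "")
    via = lowered.get("via", "")
    if "cloudfront" in server or "cloudfront" in via or "x-amz-cf-id" in lowered:
        return f"HTTP {status_code}, response came from CloudFront"
    if "cloudflare" in server or "cf-ray" in lowered:
        return f"HTTP {status_code}, response came from Cloudflare"
    return f"HTTP {status_code}, no CDN headers recognised"


# Hypothetical headers, only to show the shape of the input.
print(describe_response(403, {"Server": "CloudFront", "X-Cache": "Error from cloudfront"}))
```

With `requests`, you would pass `response.status_code` and `response.headers` from the snippet in the first post.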
Hi @bigminiboss do you mean this?
As this applies to replit.
Looks like they’ve cracked down on this since the video! I’d suggest trying a different site for reviews if you can; the process is the same.
Yes, that is what I meant, sorry for the confusion.
No worries @bigminiboss ! I had to check it myself just in case I’d misunderstood!
I found this thread because I am on the same course and, curiously, I didn’t have any Cloudflare-related blocking issues, but I couldn’t figure out where to locate the silly
<a> tags after inspecting the element. I can inspect the ‘card result’ element all right, but I only see
The simplest answer is that I’m probably missing something. However, I just went ahead and blindly pasted the code from the tutorial along with the parser instructions, and I successfully got the expected output, soooo… win?
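For anyone else hunting for the `<a>` tags, here is a rough sketch of pulling links out of result cards with BeautifulSoup. The HTML sample and the `card-result` class name are invented for illustration and are not Yelp's actual markup, so the selector would need adjusting to whatever the inspector shows on the real page.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for a search results page.
SAMPLE_HTML = """
<div class="card-result">
  <h3><a href="/biz/example-one">Example One</a></h3>
</div>
<div class="card-result">
  <h3><a href="/biz/example-two">Example Two</a></h3>
</div>
"""


def extract_links(html):
    """Return (link text, href) pairs for every <a> nested inside a result card."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for card in soup.find_all("div", class_="card-result"):
        # find_all searches all descendants, so nested anchors are found too.
        for anchor in card.find_all("a", href=True):
            links.append((anchor.get_text(strip=True), anchor["href"]))
    return links


print(extract_links(SAMPLE_HTML))
```

Because `find_all` searches every descendant of the card, the anchor does not have to be a direct child of the element you inspected.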
Just use the .com domain instead of .co.uk and it runs…
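A small sketch of that swap, in case it is useful. The helper name `use_dot_com` is made up, and whether the .com site actually responds may change over time, so this only shows how to rewrite the URL from the first post.

```python
from urllib.parse import urlsplit, urlunsplit


def use_dot_com(url):
    """Rewrite a .co.uk host to .com, leaving path and query string untouched."""
    parts = urlsplit(url)
    host = parts.netloc
    if host.endswith(".co.uk"):
        host = host[: -len(".co.uk")] + ".com"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


url = "https://www.yelp.co.uk/search?find_desc=Restaurants&find_loc=San+Francisco%2C+CA%2C+United+States"
print(use_dot_com(url))
```

The rewritten URL can then be passed to `requests.get` exactly as in the original snippet.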