If you have any questions, comments, or issues with this project please post them here!
Hi everyone,
I am trying to follow the example, but it seems web scraping Yelp is prevented(?).
Not a big problem for day 96, but a functional example would be nice.
If I simply add the example code as below, I get the result in the screenshot below. This has happened with every Yelp link I've tried so far.
If I try a link from another site, I do get proper results.
After some googling, my understanding is that CloudFront might be what is blocking web crawler access.
Any possibility to get around this?
import requests
from bs4 import BeautifulSoup
url = "https://www.yelp.co.uk/search?find_desc=Restaurants&find_loc=San+Francisco%2C+CA%2C+United+States"
response = requests.get(url)
html = response.text
print(html)
If this is part of the course directly, I would be very, very confused, as it is against Replit's own ToS to scrape websites that lack APIs.
Thanks @Monco-Carser, I'll pass this on to @DavidAtReplit to take a look!
Some of the websites now use CloudFront to prevent scraping (I used to have a similar scraper which aggregated info from fortnitetracker for example and the same thing happened).
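For anyone who wants to confirm they are getting a block page rather than the real HTML, here is a rough sketch of a check. The helper names `looks_blocked` and `fetch_and_check` are made up for illustration, and the status codes and phrases it looks for are my assumptions about what a block page typically contains, not anything Yelp or CloudFront documents.

```python
def looks_blocked(status_code: int, body: str) -> bool:
    """Heuristic guess at whether a response is a block page instead of real content."""
    # 403, 429 and 503 are status codes commonly returned to rejected clients.
    if status_code in (403, 429, 503):
        return True
    # Assumption: block pages tend to contain one of these phrases.
    markers = ("access denied", "request blocked", "captcha")
    lowered = body.lower()
    return any(marker in lowered for marker in markers)


def fetch_and_check(url: str) -> bool:
    """Fetch a URL with requests and report whether the response looks blocked."""
    import requests  # third-party; imported here so looks_blocked has no dependencies

    response = requests.get(url, timeout=10)
    print(response.status_code)
    return looks_blocked(response.status_code, response.text)
```

Calling `fetch_and_check(url)` with the Yelp URL from the first post should show whether the request is being rejected. A site can still serve a block page that none of these checks catch, so treat a `False` as "probably fine" only.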
Looks like they've cracked down on this since the video! I'd suggest trying a different site for reviews if you can; the process is the same.
Yes, that is what I meant, sorry for the confusion.
I found this thread because I am on the same course and, curiously, I didn't have any Cloudflare-related blocking issues, but I couldn't figure out where to locate the silly <a> tags after inspecting the element. I can inspect the "card result" element alright, but I only see <div>s!
The simplest answer is that I'm probably missing something. However, I just went ahead and blind-pasted the code from the tutorial, along with the parser instructions, and I successfully got the expected output, soooo… win?
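On the <a> tags hiding inside the <div>s: as far as I can tell, BeautifulSoup's `find_all` searches all descendants, so links nested several levels down still turn up even if the inspector only shows <div>s at first glance. A small sketch using made-up markup (the `card-result` class name and the `/biz/...` links are placeholders, not Yelp's real structure):

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a results page; class names and URLs are invented.
sample_html = """
<div class="card-result">
  <div><div><a href="/biz/example-one">Example One</a></div></div>
</div>
<div class="card-result">
  <div><div><a href="/biz/example-two">Example Two</a></div></div>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find_all walks every descendant, so anchors buried inside nested divs are included.
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]
print(links)
```

On a real page you would pass `response.text` instead of `sample_html` and probably narrow the search with a class or attribute filter, since a full page has many unrelated links.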
Just use the .com instead of .co.uk and it runs…