Day 096 - Project 96: Let's get scraping

If you have any questions, comments, or issues with this project please post them here!

Hi everyone,
I am trying to follow the example, but it seems that scraping Yelp is blocked(?).
Not a big problem for day 96, but a working example would be nice :see_no_evil:

If I simply run the example code below, I get the result shown in the screenshot. This has happened with every Yelp link I’ve tried so far.
If I try a link from another site, I do get proper results.

After some googling, I understand that CloudFront might be what is blocking web crawler access.
Is there any way to get around this?


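```python
import requests
from bs4 import BeautifulSoup  # used for the parsing step later in the lesson

url = "https://www.yelp.co.uk/search?find_desc=Restaurants&find_loc=San+Francisco%2C+CA%2C+United+States"

# Fetch the search results page; the server may answer with a block page instead
response = requests.get(url)
html = response.text

# Print the raw HTML to check what actually came back
print(html)
```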
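The script above prints whatever comes back, so a block page looks just like a successful run. A minimal sketch of a sanity check, assuming that a block shows up either as an HTTP 403/429/503 status or as a page containing phrases like those on CloudFront's error page; `looks_blocked` and both marker lists are hypothetical, so adjust them to what you actually see.

```python
# Hypothetical helper: guess whether a scrape attempt hit a block page.
# The status codes and marker phrases are assumptions, not a documented
# list from Yelp or CloudFront.
BLOCK_STATUS_CODES = {403, 429, 503}
BLOCK_MARKERS = ("the request could not be satisfied", "request blocked", "captcha")


def looks_blocked(status_code: int, html: str) -> bool:
    """Return True if the response looks like a block page rather than search results."""
    if status_code in BLOCK_STATUS_CODES:
        return True
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


# Example with made-up responses (no network access needed):
print(looks_blocked(403, ""))  # True
print(looks_blocked(200, "<h2>The request could not be satisfied.</h2>"))  # True
print(looks_blocked(200, "<a href='/biz/example'>Example</a>"))  # False
```

With `requests`, you would call it as `looks_blocked(response.status_code, response.text)` before handing the HTML to BeautifulSoup, so a blocked run fails loudly instead of printing a wall of error-page HTML.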

If this is directly part of the course, I would be very, very confused, as it is against Replit’s own ToS to scrape websites that lack APIs.

Thanks, @Monco-Carser! I’ll pass this on to @DavidAtReplit to take a look.

Some websites now use CloudFront to prevent scraping. (For example, I used to have a similar scraper that aggregated info from fortnitetracker, and the same thing happened.)

Hi @bigminiboss, do you mean this, as it applies to Replit?

It looks like they’ve cracked down on this since the video! I’d suggest trying a different site for reviews if you can; the scraping process is the same.

Yes, that is what I meant. Sorry for the confusion.

No worries, @bigminiboss! I had to check it myself just in case I’d misunderstood!

I found this thread because I am on the same course. Curiously, I didn’t have any Cloudflare-related blocking issues, but after inspecting the element I couldn’t figure out where to find the silly <a> tags. I can inspect the ‘card result’ element alright, but I only see <div>s!

The simplest answer is that I’m probably missing something. However, I just went ahead and blind-pasted the code from the tutorial, along with the parser instructions, and I successfully got the expected output soooo… win?
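The <a> tags usually sit nested somewhere inside those <div> cards rather than being the card itself, so the search has to look inside each card. A minimal sketch using Python's standard-library `html.parser` (swapped in here so it runs without BeautifulSoup); `CardLinkParser` and the `card-result` class name are hypothetical, not Yelp's real markup. In BeautifulSoup the same idea would be something like `soup.select("div.card-result a")`.

```python
from html.parser import HTMLParser


class CardLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside <div class="... card_class ...">."""

    def __init__(self, card_class: str) -> None:
        super().__init__()
        self.card_class = card_class
        self.div_depth = 0       # how many <div>s we are currently inside
        self.card_depths = []    # div depths at which a matching card opened
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "div":
            self.div_depth += 1
            classes = (attributes.get("class") or "").split()
            if self.card_class in classes:
                self.card_depths.append(self.div_depth)
        elif tag == "a" and self.card_depths and attributes.get("href"):
            # Only keep links that appear while we are inside some card
            self.links.append(attributes["href"])

    def handle_endtag(self, tag):
        if tag == "div":
            # Leaving the div that opened the innermost card closes that card
            if self.card_depths and self.card_depths[-1] == self.div_depth:
                self.card_depths.pop()
            self.div_depth -= 1


# Made-up sample markup, loosely shaped like a results page
sample = """
<div class="container">
  <div class="card-result"><h3><a href="/biz/example-one">Example One</a></h3></div>
  <div class="ad"><a href="/ads/1">Ad</a></div>
  <div class="card-result"><div><a href="/biz/example-two">Example Two</a></div></div>
</div>
"""
parser = CardLinkParser("card-result")
parser.feed(sample)
print(parser.links)  # ['/biz/example-one', '/biz/example-two']
```

Tracking the div depth is what lets a link buried in an extra wrapper <div> (the second card) still count, while links outside any card (the ad) are skipped.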

Just use the .com domain instead of .co.uk and it runs…
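Following that tip, a small sketch that rebuilds the search URL from the original post for either domain, so switching between .com and .co.uk is a one-argument change. `yelp_search_url` is a hypothetical helper, and there is no guarantee Yelp will keep serving either domain to scripts; it only reproduces the query-string encoding seen in the original link.

```python
from urllib.parse import urlencode


def yelp_search_url(description: str, location: str, domain: str = "www.yelp.com") -> str:
    """Build a search URL like the one in the original post (hypothetical helper)."""
    # urlencode uses quote_plus, so spaces become "+" and commas become "%2C"
    query = urlencode({"find_desc": description, "find_loc": location})
    return f"https://{domain}/search?{query}"


print(yelp_search_url("Restaurants", "San Francisco, CA, United States"))
# https://www.yelp.com/search?find_desc=Restaurants&find_loc=San+Francisco%2C+CA%2C+United+States
```

Passing `domain="www.yelp.co.uk"` gives back exactly the URL from the first post.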
