Scraping = Can't use Repl?

Is using Beautiful Soup, headless Selenium, and/or scraping in general against the TOS?

If not, I can’t fathom why I keep getting the browser message about not being able to use this Repl.

In a bit of a fit I started setting up brew (macOS), VS Code, and its various extensions and tools on my local machine, but Replit is far easier to get up and running.

If I’m not breaking the TOS, what is happening?

Can we see your code and try to identify the problem? (by the way, this would be a better fit in Replit Help rather than General)

Hi @SteveMallett !
What website are you trying to scrape?
If you are trying to scrape secure sites such as government sites, I think they could be blocked by Replit.

It’s a reddit sub, and I’m not scheduling it or hitting it as much as I would reddit itself.

Should/Can I move it to that topic?

I’ve already moved the topic to Replit Help. I also notified staff BTW, so hopefully they respond soon.

I used Beautiful Soup once to scrape my own website (for testing purposes). I got no such error. To my knowledge (which could be wrong), web scrapers are not inherently against the Replit ToS.

If there is a visible error, can you please show it? If you’re just asking about the nature of scraping, it should be allowed. Just remember to not scrape websites that Replit banned or your repl will go down.

IIRC (probably wrong) scraping is in the 100 days of Python thing? IDK, just remember seeing it somewhere on Replit.

Could we have your repl link?

The Repl: https://replit.com/@SteveMallett/UFO3-reddit

I’ve just finished the 100 Days of Python course, so:

  1. Primarily, I’m looking for the reason why the code would trigger the “not being able to use this Repl… TOS…” message in the browser.
  2. Secondarily, newb Python coder tips are welcome.

TY

The link you have provided is a 404.

try now. private → public

We do not do this, please do not suggest things without evidence :+1:

Hello :wave: I took a quick look at your Repl and it seems to run fine, but I do have a theory on why you would be running into errors :slight_smile:

We have a tool that tracks outgoing network requests from the machine you are using. Attempting to scrape Reddit (especially if you run the Repl multiple times) sends a lot of requests, and to us that looks like the pattern of a DoS attack, which results in temporary cooldowns.

My suggestion would be to use the Reddit API, but that doesn’t seem likely given the recent controversy :sweat_smile: Alternatively, try loading the webpage with JavaScript disabled or through a web caching service like Wayback? :slight_smile: Sorry you are running into this issue, we are actively tuning our anti-abuse tools :+1:
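
Since the cooldowns seem tied to the number of outgoing requests, one way to reduce them might be to cache pages and space out whatever requests remain. This is only a rough sketch and not anything Replit prescribes: `PoliteFetcher`, its `min_interval` default of 5 seconds, and the injected `fetch` callable are all hypothetical names and values chosen for illustration.

```python
import time
from typing import Callable, Dict, Optional


class PoliteFetcher:
    """Hypothetical helper: caches pages and spaces out cache-miss requests."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        min_interval: float = 5.0,  # assumed value, tune to taste
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._cache: Dict[str, str] = {}
        self._last_request: Optional[float] = None

    def get(self, url: str) -> str:
        # Re-running the script for the same page costs no new request.
        if url in self._cache:
            return self._cache[url]
        # Wait until at least min_interval has passed since the last request.
        if self._last_request is not None:
            wait = self._min_interval - (self._clock() - self._last_request)
            if wait > 0:
                self._sleep(wait)
        self._last_request = self._clock()
        body = self._fetch(url)
        self._cache[url] = body
        return body
```

With the `requests` library this could be wired up as `PoliteFetcher(lambda u: requests.get(u, timeout=10).text)`. The cache here lives in memory only, so it helps within a single run; saving pages to a file would be needed to avoid new requests each time the Repl is restarted.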

also https://old.reddit.com is more lightweight, especially with JS disabled
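
If you want to try that without editing every URL by hand, a small helper could swap the host. A minimal sketch, assuming the scraper builds ordinary `reddit.com` links; `to_old_reddit` is a made-up name and `r/example` is a placeholder, not the actual sub:

```python
from urllib.parse import urlsplit, urlunsplit

# Hosts assumed to serve the regular (heavier) Reddit front end.
REDDIT_HOSTS = ("reddit.com", "www.reddit.com", "new.reddit.com")


def to_old_reddit(url: str) -> str:
    """Point a Reddit URL at old.reddit.com; leave other URLs unchanged."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host in REDDIT_HOSTS:
        # Only the host changes; path and query string are kept as-is.
        parts = parts._replace(netloc="old.reddit.com")
    return urlunsplit(parts)


print(to_old_reddit("https://www.reddit.com/r/example/?sort=new"))
```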
