Can't get the chromedriver to work

I’ve narrowed down job listings in Google Jobs based on what I do. I’m trying to use Selenium to click each job and then, when the job details load on the right side of the page, copy all the href links showing where to apply (e.g. if the job is “Manager” and you can apply on LinkedIn, Indeed, and Monster, it will copy the links for each possible application location). The issue is that I cannot get chromedriver working. I’m also a total noob, so that’s probably another issue…

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from urllib.parse import quote_plus
import time

options = webdriver.ChromeOptions()
options.add_argument('--disable-extensions')
options.add_argument('--headless')
options.add_argument('--disable-gpu')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

# specify the path to your Chrome webdriver
driver_path = '.files/chromedriver'
service = Service(executable_path=driver_path)
driver = webdriver.Chrome(service=service, options=options)

# load the webpage (the search terms have to be part of an actual URL)
query = '("Manager of continuous improvement" OR "director of continuous improvement" OR "continuous improvement consultant" OR "business process consultant" OR "improvement consultant" OR "change management consultant" OR "process improvement director" OR "process improvement manager" OR "change management manager" OR "change management director" OR "Powerapps Professional" OR "Business Analyst" OR "Manager of Implementation" OR "Director of Implementation")'
driver.get('https://www.google.com/search?q=' + quote_plus(query))

# wait for the page to load
time.sleep(5)

# get all the job list items in the unordered list
job_list_items = driver.find_elements(By.XPATH, '//*[@id="VoQFxe"]/div/div/ul/li')

# loop through each list item and click on it
for item in job_list_items:
    item.click()
    time.sleep(2)  # wait for the new page to load

    # get all the application links from the middle left of the page
    app_links = driver.find_elements(By.XPATH, '//*[@id="_nJxCZPDUA-aC0PEPyLuCyA8_3"]/div[1]/div/a')

    # print the links to the console
    for link in app_links:
        print(link.get_attribute('href'))

# close the web browser
driver.quit()
```

Sorry, not sure if that’s formatted correctly… Not sure how to put all the code in a code block.

Put a ``` before and after your code. That will format it for you.

Can I not edit it after posting? Trying to but I don’t see an edit option.

You can’t edit at TL0; try reading around a bit to increase your TL (trust level).

This is a webdriver, which is against the Replit ToS if you use it for school purposes.

Doesn’t seem to be school-related in this case, but I know where you’re coming from, BMB.

I’m using it just for my own purposes, to look for jobs based on specific requirements. I’m hoping to use Selenium to run through each job application type based on the website and auto-apply for me. That’s the end goal.

Yeah, but for some reason the output terminals for VNC-related stuff haven’t been working well recently (IIRC), so I’d just try pinging @ShaneAtReplit.

Job application? What platform lol

Trying to focus on LinkedIn, Indeed, and Monster at the start.
There’s an app called “LazyApply” that someone made for Chrome, but I wanted to make my own version.

Ah, I see… I haven’t fiddled much with those apps, but I’m sure they have an API you can try deconstructing with network logs…
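
One way to watch those requests from Selenium itself is Chrome’s performance log — rough sketch only, and the page URL here is a placeholder:

```python
import json
from selenium import webdriver

options = webdriver.ChromeOptions()
# ask ChromeDriver to record DevTools events, including network traffic
options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})
driver = webdriver.Chrome(options=options)

driver.get('https://example.com')  # placeholder page

# each log entry's 'message' field is a JSON-encoded DevTools event
for entry in driver.get_log('performance'):
    event = json.loads(entry['message'])['message']
    if event['method'] == 'Network.responseReceived':
        print(event['params']['response']['url'])

driver.quit()
```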

Google Jobs is what I’m using to aggregate all of the data. The data is all there; it’s just that the hrefs load dynamically when each job is clicked on the left. Because of that, I can’t use a library like BeautifulSoup to parse the HTML at the start. Unless I’m mistaken, I need to use Selenium or a similar library to select each job from the unordered list, then target the element that contains the hrefs, pull the HTML, and parse it with BeautifulSoup, looping to get all the jobs. Am I wrong? Also, is there another online IDE that I can use Selenium in? Or can you recommend an alternative to Selenium that could accomplish this task?
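
Something like this is the loop I have in mind — just a sketch, with made-up XPaths, and an explicit wait instead of fixed sleeps:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get('https://www.google.com/search?q=jobs+manager')  # placeholder query
wait = WebDriverWait(driver, 10)

# placeholder XPaths -- the real ones come from inspecting the page
for job in driver.find_elements(By.XPATH, '//ul/li'):
    job.click()
    # wait until the detail panel actually exists instead of sleeping
    panel = wait.until(EC.presence_of_element_located((By.XPATH, '//div[@id="detail-panel"]')))
    soup = BeautifulSoup(panel.get_attribute('innerHTML'), 'html.parser')
    for a in soup.find_all('a', href=True):
        print(a['href'])

driver.quit()
```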

I’m not a webdriver expert, nor do I fiddle a lot in VMs (for that I’m pinging @9pfs1).

Web browsers don’t usually work well in VMs, but I think someone got Selenium working in a repl (maybe @LuisAFK, can’t remember who it was).

I got Selenium working by installing a bunch of Nix packages:
https://replit.com/@GrimSteel/seleniumtest
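
For anyone trying the same thing, a replit.nix along these lines is the usual starting point — the exact package list is in the repl above, but chromium and chromedriver are the important ones:

```nix
{ pkgs }: {
  deps = [
    pkgs.chromium      # the browser itself
    pkgs.chromedriver  # the matching webdriver binary
  ];
}
```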

I actually got it working by forking another user’s code. Now the issue I’m encountering is that my code finds the jobs, URLs, etc. fine, but it isn’t getting the URLs for the first job in the ul. Can someone have a look and tell me why?

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

chrome_options = Options()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome(options=chrome_options)

# Pointing to Google Search and navigating there
url = 'https://www.google.com/search?q=jobs+"director"+OR+"consultant"+OR+"analyst"+AND+"improvement"+OR+"change"+OR+"innovation"+OR+"power+platform"+OR+"implementation"+AND+Calgary'
driver.get(url)

# Find and click the "Jobs" area
location_button = driver.find_element(By.XPATH, '//*[@id="fMGJ3e"]/a/g-tray-header/div[1]')
location_button.click()

# Find the unordered list of jobs on the left side of the screen
unordered_list = driver.find_element(By.XPATH, '//*[@id="immersive_desktop_root"]/div/div[3]/div[1]/div[1]/div[3]/ul')

# Find all the list items within the unordered list
list_items = unordered_list.find_elements(By.TAG_NAME, 'li')

# Loop over each list item
for item in list_items:
    # Get the HTML source of the list item
    html_source = item.get_attribute("innerHTML")

    # Use Beautiful Soup to parse the HTML and extract the href attributes
    soup = BeautifulSoup(html_source, "html.parser")
    links = soup.find_all("a", href=True)

    # Loop over the a tags and print their href attributes
    for link in links:
        href = link.get("href")
        if href and href.startswith("http"):
            print(href)

    # Click the item so its details load before the next iteration
    item.click()
```
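
One thing I’m going to try next: clicking each item first and waiting before reading its HTML, in case the apply links only get rendered after the click — something like:

```python
# continuing from the code above (driver, list_items, BeautifulSoup already set up)
import time

for item in list_items:
    item.click()
    time.sleep(2)  # crude wait; WebDriverWait would be cleaner

    # parse the item only after its detail panel has had a chance to load
    soup = BeautifulSoup(item.get_attribute("innerHTML"), "html.parser")
    for link in soup.find_all("a", href=True):
        href = link.get("href")
        if href and href.startswith("http"):
            print(href)
```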

Yeah I had a template.

I think it was your template I used. It worked and the project is much further along. I’ve posted a new issue I’ve encountered if you’re interested in helping out 🙂
