Question:
I'm trying to print a list using pprint so that all of the items are separated from each other, but I have to use pprint.pprint so that I can format it with the Title above the Link. I will show the output I want, the output I'm getting, and the code I have so far. I will not be back until Monday, as I'm taking my dad on vacation for his birthday, so if someone can help me that would be nice.
Output I want:
[{'Title': 'What is Web Scraping and How to Use It? - GeeksforGeeks',
'Link': 'https://www.geeksforgeeks.org/what-is-web-scraping-and-how-to-use-it/'},
{'Title': 'Web scraping - Wikipedia',
'Link': 'https://en.wikipedia.org/wiki/Web_scraping'},
{'Title': "What Is Web Scraping? A Complete Beginner's Guide",
'Link': 'https://careerfoundry.com/en/blog/data-analytics/web-scraping-guide/'},
{'Title': 'What Is Scraping | About Price & Web Scraping Tools | Imperva',
'Link': 'https://www.imperva.com/learn/application-security/web-scraping-attack/'},
{'Title': 'What is Web Scraping and What is it Used For? | ParseHub',
'Link': 'https://www.parsehub.com/blog/what-is-web-scraping/'},
{'Title': 'A Practical Introduction to Web Scraping in Python',
'Link': 'https://realpython.com/python-web-scraping-practical-introduction/'},
{'Title': 'Web Scraping with Python: Everything you need to know (2022)',
'Link': 'https://www.scrapingbee.com/blog/web-scraping-101-with-python/'},
{'Title': 'Web Scraper - The #1 web scraping extension',
'Link': 'https://webscraper.io/'},
{'Title': 'What Is Web Scraping? - Zyte',
'Link': 'https://www.zyte.com/learn/what-is-web-scraping/'}]
Output I'm getting:
[{'Title': 'What is Web Scraping and How to Use It? - GeeksforGeeks',
'Link': 'https://www.geeksforgeeks.org/what-is-web-scraping-and-how-to-use-it/'},
{'Title': 'Web scraping - Wikipedia',
'Link': 'https://en.wikipedia.org/wiki/Web_scraping'},
{'Title': "What Is Web Scraping? A Complete Beginner's Guide",
'Link': 'https://careerfoundry.com/en/blog/data-analytics/web-scraping-guide/'},
{'Title': 'What Is Scraping | About Price & Web Scraping Tools | Imperva',
'Link': 'https://www.imperva.com/learn/application-security/web-scraping-attack/'},
{'Title': 'What is Web Scraping and What is it Used For? | ParseHub',
'Link': 'https://www.parsehub.com/blog/what-is-web-scraping/'},
{'Title': 'A Practical Introduction to Web Scraping in Python',
'Link': 'https://realpython.com/python-web-scraping-practical-introduction/'},
{'Title': 'Web Scraping with Python: Everything you need to know (2022)',
'Link': 'https://www.scrapingbee.com/blog/web-scraping-101-with-python/'},
{'Title': 'Web Scraper - The #1 web scraping extension',
'Link': 'https://webscraper.io/'},
{'Title': 'What Is Web Scraping? - Zyte',
'Link': 'https://www.zyte.com/learn/what-is-web-scraping/'}]
Code I'm using:
import requests
import urllib.parse  # needed explicitly for urllib.parse.quote_plus
import pandas as pd
import pprint
from requests_html import HTML
from requests_html import HTMLSession


def get_source(url):
    """Return the source code for the provided URL.

    Args:
        url (string): URL of the page to scrape.

    Returns:
        response (object): HTTP response object from requests_html.
    """
    try:
        session = HTMLSession()
        response = session.get(url)
        return response
    except requests.exceptions.RequestException as e:
        print(e)


def get_results(query):
    # URL-encode the query before appending it to the search URL.
    query = urllib.parse.quote_plus(query)
    response = get_source("https://www.google.co.uk/search?q=" + query)
    return response


def parse_results(response):
    css_identifier_result = ".tF2Cxc"
    css_identifier_title = "h3"
    css_identifier_link = ".yuRUbf a"
    css_identifier_text = ".VwiC3b"

    results = response.html.find(css_identifier_result)
    output = []
    for result in results:
        item = {
            'Title': result.find(css_identifier_title, first=True).text,
            'Link': result.find(css_identifier_link, first=True).attrs['href']
        }
        output.append(item)

    # Print the whole list at once; sort_dicts=False keeps Title before Link.
    pprint.pprint(output, stream=None, indent=0, width=80, depth=None, compact=False, sort_dicts=False)
    return output


def google_search(query):
    response = get_results(query)
    return parse_results(response)


google_search("web scraping")
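One possible direction, sketched under the assumption that "separated" means a blank line between entries: pretty-print each dict on its own with pprint.pformat and join the pieces with an empty line. The helper name format_entries and the two sample entries are only for illustration and are not part of the code above; sort_dicts needs Python 3.8 or newer.

```python
import pprint


def format_entries(entries, width=80):
    """Pretty-print each dict separately and join the blocks with a blank line."""
    blocks = [
        # Same pprint options as in the question, applied per entry
        # instead of to the whole list.
        pprint.pformat(entry, indent=0, width=width, sort_dicts=False)
        for entry in entries
    ]
    return "\n\n".join(blocks)


# Hypothetical sample data taken from the output shown in the question.
sample = [
    {'Title': 'Web scraping - Wikipedia',
     'Link': 'https://en.wikipedia.org/wiki/Web_scraping'},
    {'Title': 'What Is Web Scraping? - Zyte',
     'Link': 'https://www.zyte.com/learn/what-is-web-scraping/'},
]

print(format_entries(sample))
```

Note that pprint only moves 'Link' onto its own line when the entry does not fit within width, so a very short Title and Link pair may still end up on a single line.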