Web Scraper Improved by Sarthacker · Pull Request #516 · wasmerio/Python-Scripts
Closes Issue #499
PR Title
Upgraded Existing Web Scraper Using Custom Search Engine ID and Google API
Summary
This PR replaces the existing scraper with a fully functional tool that fetches search results dynamically through the Google Custom Search API, logs all key actions, and performs a basic analysis of the gathered data by generating a summary with Google Gemini.
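To illustrate the "logs all key actions" part, a minimal logging setup might look like the sketch below. It assumes the `data/logs/` folder mentioned in the description. The function name `setup_logging`, the log file name, and the record format are hypothetical and are not taken from the PR's code.

```python
import logging
from pathlib import Path


def setup_logging(log_dir: str = "data/logs", name: str = "scraper") -> logging.Logger:
    """Hypothetical helper: return a logger that writes to <log_dir>/<name>.log."""
    path = Path(log_dir)
    path.mkdir(parents=True, exist_ok=True)  # create data/logs/ if it is missing

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)

    # Attach the file handler only once, even if this function is called again.
    if not logger.handlers:
        handler = logging.FileHandler(path / f"{name}.log", encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

A call such as `setup_logging().info("Query received: %s", query)` would then append a timestamped line to `data/logs/scraper.log`.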
Description
The changes are as follows:
- Scope change: The old script scrapes using `requests` + `BeautifulSoup`. The new script performs Google Custom Search queries, saves the search results (Title/URL/Snippet), and then produces a generated summary using Google Gemini (Generative AI).
- The old script scrapes `<h2 class="blog-title">` elements, prints them, and writes them to `blog_titles.txt`.
- The new script:
  - Accepts a search `query` via a command-line argument.
  - Queries the Google Custom Search API to fetch search results (Title, URL, Snippet).
  - Summarizes the snippets using Google Gemini (Generative AI).
  - Adds structured logging into `data/logs/`.
  - Saves the results in a structured CSV file.
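The fetch-and-summarize flow described above can be sketched roughly as follows. This is a minimal sketch, not the PR's actual code: it calls the public Custom Search JSON API endpoint with the standard library, keeps only the Title/URL/Snippet fields, and summarizes the snippets with the `google-generativeai` package. The function names, the `num` default, the prompt wording, and the `gemini-1.5-flash` model name are assumptions.

```python
import json
import urllib.parse
import urllib.request

# Public endpoint of the Google Custom Search JSON API.
SEARCH_URL = "https://www.googleapis.com/customsearch/v1"


def parse_results(payload: dict) -> list[dict]:
    """Keep only Title, URL and Snippet from a Custom Search JSON response."""
    # "items" is absent from the response when the query has no results.
    return [
        {
            "Title": item.get("title", ""),
            "URL": item.get("link", ""),
            "Snippet": item.get("snippet", ""),
        }
        for item in payload.get("items", [])
    ]


def search(query: str, api_key: str, engine_id: str, num: int = 10) -> list[dict]:
    """Run one Custom Search request; the API returns at most 10 results per call."""
    params = urllib.parse.urlencode(
        {"key": api_key, "cx": engine_id, "q": query, "num": min(num, 10)}
    )
    with urllib.request.urlopen(f"{SEARCH_URL}?{params}", timeout=10) as resp:
        return parse_results(json.load(resp))


def build_prompt(results: list[dict]) -> str:
    """Join the snippets into one prompt (hypothetical wording)."""
    snippets = "\n".join(f"- {r['Snippet']}" for r in results)
    return f"Summarize these search results:\n{snippets}"


def summarize(results: list[dict], api_key: str, model_name: str = "gemini-1.5-flash") -> str:
    """Ask Gemini for a summary; model_name is an assumed default."""
    # Third-party package (pip install google-generativeai), imported lazily.
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(model_name)
    return model.generate_content(build_prompt(results)).text
```

Usage might look like `results = search("python web scraping", API_KEY, ENGINE_ID)` followed by `summary = summarize(results, GEMINI_KEY)`, where the three keys are placeholders for the credentials the script uses; how the PR actually loads them is not described here.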
Screenshots
- The user enters their query through the CLI as shown; the results are stored in a CSV file, and a summary of those results is stored in a text file.
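The CLI-to-files flow shown in the screenshot could be sketched as below. This is a minimal illustration under assumptions: the positional `query` argument, the `--out-dir` flag, and the `results.csv` / `summary.txt` file names are hypothetical and may not match the PR.

```python
import argparse
import csv
from pathlib import Path

# Column order of the CSV, matching the Title/URL/Snippet fields in the PR.
FIELDS = ["Title", "URL", "Snippet"]


def parse_args(argv=None) -> argparse.Namespace:
    """Hypothetical CLI: one positional query plus an optional output folder."""
    parser = argparse.ArgumentParser(description="Search the web and summarize the results.")
    parser.add_argument("query", help="search query text")
    parser.add_argument("--out-dir", default="data", help="folder for the CSV and summary files")
    return parser.parse_args(argv)


def save_outputs(results: list[dict], summary: str, out_dir: str) -> tuple[Path, Path]:
    """Write results to results.csv and the summary to summary.txt (assumed names)."""
    path = Path(out_dir)
    path.mkdir(parents=True, exist_ok=True)

    csv_path = path / "results.csv"
    # newline="" lets the csv module control line endings, as its docs recommend.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(results)

    txt_path = path / "summary.txt"
    txt_path.write_text(summary, encoding="utf-8")
    return csv_path, txt_path
```

With a script name chosen for illustration only, an invocation would be `python web_scraper.py "python web scraping"`, after which `args.query` holds the query and the two files land in `data/`.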
Checks
In the repository
- Made no changes that degrade the functioning of the repository
- Gave each commit a descriptive title (rather than a generic one such as "updated README.md")
In the PR
- Followed the format of the pull_request_template
- Kept the pull request small (for the creator's welfare)
- Tested the changes you made
Thank You,
Sarthak


