Web Scraper Improved by Sarthacker · Pull Request #516 · wasmerio/Python-Scripts

Closes Issue #499

PR Title

Upgraded Existing Web Scraper Using Custom Search Engine ID and Google API

Summary

This PR replaces the existing web scraper with a tool that fetches search results for a user-supplied query through the Google Custom Search API, logs all key actions, and generates a summary of the gathered results using Google Gemini.

Description

The changes are as follows:

Scope change: the old script uses requests + BeautifulSoup to scrape <h2 class="blog-title"> elements, printing them and writing them to blog_titles.txt. The new script performs Google Custom Search queries, saves the search results (Title/URL/Snippet), and then generates a summary of them using Google Gemini (generative AI).
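As a rough illustration of the new data source, the sketch below builds a request to the Google Custom Search JSON API (endpoint https://www.googleapis.com/customsearch/v1 with the key, cx, q and num parameters) and turns the response's items into Title/URL/Snippet rows. The helper names build_search_url, parse_results and fetch_results are hypothetical, and the sketch uses only the standard library rather than whatever HTTP client the PR's script actually uses.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint of the Google Custom Search JSON API.
SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key: str, cx: str, query: str, num: int = 10) -> str:
    """Build a Custom Search request URL (hypothetical helper, not the PR's code)."""
    if not 1 <= num <= 10:
        # The API returns at most 10 results per request.
        raise ValueError("num must be between 1 and 10")
    params = {"key": api_key, "cx": cx, "q": query, "num": num}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def parse_results(payload: dict) -> list[dict]:
    """Extract Title/URL/Snippet rows from a decoded JSON response."""
    rows = []
    # "items" is absent from the response when the query has no results.
    for item in payload.get("items", []):
        rows.append({
            "Title": item.get("title", ""),
            "URL": item.get("link", ""),
            "Snippet": item.get("snippet", ""),
        })
    return rows


def fetch_results(api_key: str, cx: str, query: str, num: int = 10) -> list[dict]:
    """Perform the HTTP request and parse the JSON body (needs network access)."""
    with urlopen(build_search_url(api_key, cx, query, num), timeout=10) as resp:
        return parse_results(json.load(resp))
```

Keeping the parsing separate from the network call means parse_results can be checked against a saved response without an API key.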
The new script:
- Accepts a search query via command-line argument.
- Queries Google Custom Search API to fetch search results (Title, URL, Snippet).
- Summarizes snippets using Google Gemini (Generative AI).
- Writes structured logs to data/logs/.
- Saves the results to a structured CSV file.
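A minimal sketch of the command-line, logging and CSV pieces listed above, assuming the query is passed as positional words, logs go to a timestamped file under data/logs/, and the CSV columns are Title, URL and Snippet. The function names (parse_args, setup_logging, save_csv) and the log file naming scheme are hypothetical; the PR's script may organize this differently.

```python
import argparse
import csv
import logging
from datetime import datetime
from pathlib import Path

# Assumed CSV column order, matching the Title/URL/Snippet fields.
FIELDS = ["Title", "URL", "Snippet"]


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Search the web and summarize the results.")
    # All positional words are joined into one query, so quoting is optional.
    parser.add_argument("query", nargs="+", help="search query")
    args = parser.parse_args(argv)
    args.query = " ".join(args.query)
    return args


def setup_logging(log_dir: Path = Path("data/logs")) -> Path:
    """Send INFO-level logs to a timestamped file (hypothetical naming scheme)."""
    log_dir.mkdir(parents=True, exist_ok=True)
    log_file = log_dir / f"scraper_{datetime.now():%Y%m%d_%H%M%S}.log"
    logging.basicConfig(
        filename=log_file,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    return log_file


def save_csv(rows: list[dict], path: Path) -> None:
    """Write result rows to a CSV file with a header line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    logging.info("Saved %d results to %s", len(rows), path)
```

Passing argv explicitly to parse_args makes it easy to exercise without touching sys.argv, e.g. parse_args(["python", "web", "scraping"]).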

Screenshots

The user enters their query on the command line as shown; the results are stored in a CSV file, and a summary of those results is stored in a text file.
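The summary step can be sketched as follows, assuming the google-generativeai package, a GEMINI_API_KEY environment variable (a hypothetical name) and the gemini-1.5-flash model (also an assumption; the PR does not say which model it uses). The summarizer is passed in as a callable, so the prompt building and file writing can be exercised offline with a stub.

```python
import os
from pathlib import Path
from typing import Callable


def build_prompt(query: str, rows: list[dict]) -> str:
    """Join the result snippets into a single summarization prompt."""
    lines = [f"Summarize the following search results for the query: {query!r}", ""]
    for i, row in enumerate(rows, start=1):
        lines.append(f"{i}. {row['Title']}: {row['Snippet']}")
    return "\n".join(lines)


def gemini_summarize(prompt: str, model_name: str = "gemini-1.5-flash") -> str:
    """Call Gemini through google-generativeai (assumed installed and configured)."""
    # Imported lazily so the rest of the module works without the package.
    import google.generativeai as genai

    # GEMINI_API_KEY is a hypothetical environment variable name.
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel(model_name)
    return model.generate_content(prompt).text


def save_summary(
    query: str,
    rows: list[dict],
    path: Path,
    summarize: Callable[[str], str] = gemini_summarize,
) -> str:
    """Summarize the rows and write the summary to a text file."""
    summary = summarize(build_prompt(query, rows))
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(summary, encoding="utf-8")
    return summary
```

Injecting the summarizer keeps the API call at the edge of the program, which also makes it straightforward to swap models or retry on failure.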

Checks

In the repository

Made no changes that degrade the functioning of the repository
Gave each commit a descriptive title (unlike "updated README.md")

In the PR

Followed the format of the pull_request_template
Kept the pull request small (for the creator's welfare)
Tested the changes you made

Thank You,
Sarthak