Web Scraper Improved by Sarthacker · Pull Request #516 · wasmerio/Python-Scripts

Closes Issue #499

PR Title

Upgraded Existing Web Scraper Using Custom Search Engine ID and Google API

Summary

This PR introduces a fully functional web scraping tool that extracts search results dynamically, logs all key actions, and performs basic data analysis on the gathered data.

Description

The changes are as follows:

  • Scope change: the old script scrapes a page using requests + BeautifulSoup. The new script runs Google Custom Search queries, saves the search results (Title/URL/Snippet), and then generates a summary of them using Google Gemini (Generative AI).
  • The old script scrapes <h2 class="blog-title"> elements, prints them, and writes them to blog_titles.txt.
  • The new script:
    • Accepts a search query via command-line argument.
    • Queries Google Custom Search API to fetch search results (Title, URL, Snippet).
    • Summarizes snippets using Google Gemini (Generative AI).
    • Adds structured logging into data/logs/.
    • Saves the results to a structured CSV file.
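
The search-and-save steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names (build_search_url, extract_rows, fetch_results, save_csv) are hypothetical, and it uses the standard library's urllib instead of whatever HTTP client the script actually uses. It assumes the Custom Search JSON API's `key`, `cx`, and `q` query parameters and the `title`, `link`, and `snippet` fields of each entry in the response's `items` list.

```python
import csv
import json
import urllib.parse
import urllib.request

# Endpoint of the Google Custom Search JSON API.
SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key, cse_id):
    """Build the request URL from the API key, search engine ID, and query."""
    params = {"key": api_key, "cx": cse_id, "q": query}
    return SEARCH_ENDPOINT + "?" + urllib.parse.urlencode(params)


def extract_rows(response_json):
    """Reduce a decoded API response to Title/URL/Snippet rows."""
    rows = []
    # "items" is absent when the search returns no results.
    for item in response_json.get("items", []):
        rows.append({
            "Title": item.get("title", ""),
            "URL": item.get("link", ""),
            "Snippet": item.get("snippet", ""),
        })
    return rows


def fetch_results(query, api_key, cse_id):
    """Call the API once and return the extracted rows (needs network access)."""
    url = build_search_url(query, api_key, cse_id)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return extract_rows(json.load(resp))


def save_csv(rows, path):
    """Write the rows to a CSV file with a fixed header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["Title", "URL", "Snippet"])
        writer.writeheader()
        writer.writerows(rows)
```

Keeping URL construction and response parsing separate from the network call means both can be checked without an API key; only fetch_results needs real credentials.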

Screenshots

[Screenshots: logs, csv_file, data_analysis]

  • The user enters their query through the CLI as shown. The results are stored in a CSV file, and a summary of those results is stored as a text file.
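
As a rough sketch of the workflow described above (CLI query in, log file and summary text file out), the pieces might look like the following. Everything here is an assumption made for illustration rather than the PR's actual code: the function names, the logger name "web_scraper", the log file name scraper.log, the prompt wording, and the Gemini model name are all hypothetical. The Gemini call assumes the google-generativeai package is installed and is imported lazily so the rest of the sketch runs without it.

```python
import argparse
import logging
from pathlib import Path


def parse_args(argv=None):
    """Read the search query from the command line."""
    parser = argparse.ArgumentParser(description="Search, save results, summarize.")
    parser.add_argument("query", help="search query to send to the search API")
    return parser.parse_args(argv)


def setup_logging(log_dir="data/logs"):
    """Create the log directory and return a logger that writes into it."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("web_scraper")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        handler = logging.FileHandler(Path(log_dir) / "scraper.log", encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def build_summary_prompt(snippets):
    """Join the non-empty snippets into one prompt (wording is illustrative)."""
    joined = "\n".join(f"- {s}" for s in snippets if s)
    return "Summarize these search result snippets in a short paragraph:\n" + joined


def summarize_with_gemini(snippets, api_key, model_name="gemini-1.5-flash"):
    """Send the prompt to Gemini; model name is an assumption, needs network."""
    import google.generativeai as genai  # third-party, imported only when used

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(model_name)
    return model.generate_content(build_summary_prompt(snippets)).text


def save_summary(text, path):
    """Store the generated summary as a plain text file."""
    Path(path).write_text(text, encoding="utf-8")
```

A run would then amount to calling parse_args(), logging the query with the logger from setup_logging(), and passing the collected snippets to summarize_with_gemini() before save_summary().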

Checks

in the repository

  • Made no changes that degrade the functioning of the repository
  • Gave each commit a better title (unlike "updated README.md")

in the PR

  • Followed the format of the pull_request_template
  • Kept the Pull Request small (for the creator's welfare)
  • Tested the changes you made

Thank You,
Sarthak