Official Python SDK for the ScrapeGraph AI API - Intelligent web scraping and search powered by AI. Extract structured data from any webpage or perform AI-powered web searches with natural language prompts.
Get your API key!

Features
- 🤖 SmartScraper: Extract structured data from webpages using natural language prompts
- 🔍 SearchScraper: AI-powered web search with structured results and reference URLs
- 📝 Markdownify: Convert any webpage into clean, formatted markdown
- 🕷️ SmartCrawler: Intelligently crawl and extract data from multiple pages
- 🤖 AgenticScraper: Perform automated browser actions with AI-powered session management
- 📄 Scrape: Convert webpages to HTML with JavaScript rendering and custom headers
- ⏰ Scheduled Jobs: Create and manage automated scraping workflows with cron scheduling
- 💳 Credits Management: Monitor API usage and credit balance
- 💬 Feedback System: Provide ratings and feedback to improve service quality
🚀 Quick Links
ScrapeGraphAI integrates seamlessly with popular frameworks and tools to enhance your scraping capabilities. Whether you're building with Python, using LLM frameworks, or working with no-code platforms, our integration options have you covered.
You can find more information at the links below.
Integrations:
- API: Documentation
- SDK: Python
- LLM Frameworks: Langchain, Llama Index, Crew.ai, CamelAI
- Low-code Frameworks: Pipedream, Bubble, Zapier, n8n, LangFlow
- MCP server: Link
📦 Installation
```bash
pip install scrapegraph-py
```
🎯 Core Features
- 🤖 AI-Powered Extraction & Search: Use natural language to extract data or search the web
- 📊 Structured Output: Get clean, structured data with optional schema validation
- 🔄 Multiple Formats: Extract data as JSON, Markdown, or custom schemas
- ⚡ High Performance: Concurrent processing and automatic retries
- 🔒 Enterprise Ready: Production-grade security and rate limiting
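The SDK handles retries internally; as a rough illustration of the automatic-retry pattern, here is a minimal, hypothetical wrapper (not part of the SDK API) that retries a callable with exponential backoff:

```python
import time

def with_retries(fn, attempts=3, base_delay=0.5):
    """Call fn(), retrying on exceptions with exponential backoff.

    Hypothetical helper for illustration only; the SDK applies its own
    retry policy internally.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...

# Example: a flaky callable that fails twice, then succeeds
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(with_retries(flaky, base_delay=0.01))  # ok
```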
🛠️ Available Endpoints
🤖 SmartScraper
Extract structured data from any webpage or HTML content using natural language prompts.
Example Usage:
```python
from scrapegraph_py import Client
import os
from dotenv import load_dotenv

load_dotenv()

# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Extract data from a webpage
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading, description, and summary of the webpage",
)

print(f"Request ID: {response['request_id']}")
print(f"Result: {response['result']}")

client.close()
```
🔍 SearchScraper
Perform AI-powered web searches with structured results and reference URLs.
Example Usage:
```python
from scrapegraph_py import Client
import os
from dotenv import load_dotenv

load_dotenv()

# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Perform AI-powered web search
response = client.searchscraper(
    user_prompt="What is the latest version of Python and what are its main features?",
    num_results=3,  # Number of websites to search (default: 3)
)

print(f"Result: {response['result']}")

print("\nReference URLs:")
for url in response["reference_urls"]:
    print(f"- {url}")

client.close()
```
📝 Markdownify
Convert any webpage into clean, formatted markdown.
Example Usage:
```python
from scrapegraph_py import Client
import os
from dotenv import load_dotenv

load_dotenv()

# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Convert webpage to markdown
response = client.markdownify(
    website_url="https://example.com",
)

print(f"Request ID: {response['request_id']}")
print(f"Markdown: {response['result']}")

client.close()
```
🕷️ SmartCrawler
Intelligently crawl and extract data from multiple pages with configurable depth and batch processing.
Example Usage:
```python
from scrapegraph_py import Client
import os
import time
from dotenv import load_dotenv

load_dotenv()

# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Start crawl job
crawl_response = client.crawl(
    url="https://example.com",
    prompt="Extract page titles and main headings",
    data_schema={
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "headings": {"type": "array", "items": {"type": "string"}},
        },
    },
    depth=2,
    max_pages=5,
    same_domain_only=True,
)

crawl_id = crawl_response.get("id") or crawl_response.get("task_id")

# Poll for results
if crawl_id:
    for _ in range(10):
        time.sleep(5)
        result = client.get_crawl(crawl_id)
        if result.get("status") == "success":
            print("Crawl completed:", result["result"]["llm_result"])
            break

client.close()
```
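The polling loop above can be factored into a small reusable helper. A sketch of the pattern (`poll_until` and its arguments are illustrative, not part of the scrapegraph-py API):

```python
import time

def poll_until(fetch, is_done, interval=5, max_attempts=10):
    """Repeatedly call fetch() until is_done(result) is truthy.

    Illustrative helper mirroring the crawl polling loop; not part of
    the SDK. Returns the final result, or None if attempts run out.
    """
    for _ in range(max_attempts):
        result = fetch()
        if is_done(result):
            return result
        time.sleep(interval)
    return None

# Example with a stub that "completes" on the third fetch
states = iter([{"status": "processing"}, {"status": "processing"}, {"status": "success"}])
final = poll_until(lambda: next(states), lambda r: r["status"] == "success", interval=0)
print(final)  # {'status': 'success'}
```

With the real client, the fetch callable would be something like `lambda: client.get_crawl(crawl_id)`.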
🤖 AgenticScraper
Perform automated browser actions on webpages using AI-powered agentic scraping with session management.
Example Usage:
```python
from scrapegraph_py import Client
import os
from dotenv import load_dotenv

load_dotenv()

# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Perform automated browser actions
response = client.agenticscraper(
    url="https://example.com",
    use_session=True,
    steps=[
        "Type email@gmail.com in email input box",
        "Type password123 in password input box",
        "Click on login",
    ],
    ai_extraction=False,  # Set to True for AI extraction
)

print(f"Request ID: {response['request_id']}")
print(f"Status: {response.get('status')}")

# Get results
result = client.get_agenticscraper(response['request_id'])
print(f"Result: {result.get('result')}")

client.close()
```
📄 Scrape
Convert webpages into HTML format with optional JavaScript rendering and custom headers.
Example Usage:
```python
from scrapegraph_py import Client
import os
from dotenv import load_dotenv

load_dotenv()

# Initialize the client
client = Client(api_key=os.getenv("SGAI_API_KEY"))

# Get HTML content from webpage
response = client.scrape(
    website_url="https://example.com",
    render_heavy_js=False,  # Set to True for JavaScript-heavy sites
)

print(f"Request ID: {response['request_id']}")
print(f"HTML length: {len(response.get('html', ''))} characters")

client.close()
```
⏰ Scheduled Jobs
Create, manage, and monitor scheduled scraping jobs with cron expressions and execution history.
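Schedules use standard five-field cron expressions (minute, hour, day of month, month, day of week). As a rough sketch of how an expression like `"0 */6 * * *"` (every six hours, on the hour) is interpreted, here is a tiny, illustrative matcher; it is not the scheduler the service actually uses and supports only a subset of cron syntax:

```python
def field_matches(field, value):
    """Check one cron field ('*', '*/n', or a comma list of numbers) against a value."""
    if field == "*":
        return True
    if field.startswith("*/"):
        return value % int(field[2:]) == 0
    return value in {int(part) for part in field.split(",")}

def cron_matches(expr, minute, hour, day, month, weekday):
    """True if the five-field cron expression fires at the given time."""
    fields = expr.split()
    values = (minute, hour, day, month, weekday)
    return all(field_matches(f, v) for f, v in zip(fields, values))

# "0 */6 * * *" fires at minute 0 of hours 0, 6, 12, 18
print(cron_matches("0 */6 * * *", minute=0, hour=6, day=15, month=3, weekday=4))   # True
print(cron_matches("0 */6 * * *", minute=30, hour=6, day=15, month=3, weekday=4))  # False
```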
💳 Credits
Check your API credit balance and usage.
💬 Feedback
Send feedback and ratings for scraping requests to help improve the service.
🌟 Key Benefits
- 📝 Natural Language Queries: No complex selectors or XPath needed
- 🎯 Precise Extraction: AI understands context and structure
- 🔄 Adaptive Processing: Works with both web content and direct HTML
- 📊 Schema Validation: Ensure data consistency with Pydantic
- ⚡ Async Support: Handle multiple requests efficiently
- 🔍 Source Attribution: Get reference URLs for search results
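The SDK's schema validation is built on Pydantic models; as a dependency-free sketch of the same idea (the `PageSummary` shape and `validate` helper are illustrative, not SDK code), validating an extraction result against an expected shape might look like:

```python
from dataclasses import dataclass, fields

@dataclass
class PageSummary:
    """Illustrative result shape; with the SDK you would declare a Pydantic model."""
    title: str
    description: str

def validate(data, cls):
    """Check that data contains every field cls declares, then build an instance."""
    expected = {f.name for f in fields(cls)}
    missing = expected - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return cls(**{name: data[name] for name in expected})

result = validate({"title": "Example", "description": "A demo page"}, PageSummary)
print(result.title)  # Example
```

A Pydantic model adds type coercion and richer error reporting on top of this shape check.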
💡 Use Cases
- 🏢 Business Intelligence: Extract company information and contacts
- 📊 Market Research: Gather product data and pricing
- 📰 Content Aggregation: Convert articles to structured formats
- 🔍 Data Mining: Extract specific information from multiple sources
- 📱 App Integration: Feed clean data into your applications
- 🌐 Web Research: Perform AI-powered searches with structured results
📖 Documentation
For detailed documentation and examples, visit:
💬 Support & Feedback
- 📧 Email: support@scrapegraphai.com
- 💻 GitHub Issues: Create an issue
- 🌟 Feature Requests: Request a feature
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
Made with ❤️ by ScrapeGraph AI