Substack2Markdown is a Python tool for downloading free and premium Substack posts and saving them as both Markdown and HTML files. It also includes a simple HTML interface for browsing and sorting the posts. Paid content is saved as long as you're subscribed to that Substack.
🆕 @Firevvork has built a web version of this tool at Substack Reader - no installation required! (Works for free Substacks only.)
Once you run the script, it creates a folder named after the Substack in substack_md_files/ and then scrapes the posts at the Substack URL, converting each one into a Markdown file. The script automatically downloads all images referenced in the articles and saves them locally in an images/ subdirectory, replacing the original URLs with local paths for offline viewing. Once all the posts have been saved, it generates an HTML file in the substack_html_pages/ directory that lets you browse the posts.
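The image step described above boils down to finding each Markdown image link, saving the file under a short unique name, and rewriting the link. Here is a minimal sketch of that idea, not the script's actual code: the function names, the hash-of-URL naming scheme (suggested by filenames like 3abb814d.png) and the callback-style downloader are all assumptions.

```python
import hashlib
import re
import urllib.request
from pathlib import Path, PurePosixPath
from typing import Callable
from urllib.parse import urlparse

# Matches Markdown image syntax: ![alt text](http(s)://...)
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\((https?://[^)\s]+)\)")


def local_image_name(url: str) -> str:
    """Hypothetical naming scheme: first 8 hex chars of the URL's SHA-256 plus its extension."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:8]
    # Take the extension from the URL path (ignoring any query string); fall back to .png.
    suffix = PurePosixPath(urlparse(url).path).suffix or ".png"
    return f"{digest}{suffix}"


def localize_images(markdown: str, download: Callable[[str, str], None]) -> str:
    """Download each remote image and point the Markdown link at images/<filename>."""
    def replace(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        filename = local_image_name(url)
        download(url, filename)  # the caller decides where the bytes are written
        return f"![{alt}](images/{filename})"

    return IMAGE_PATTERN.sub(replace, markdown)


def make_downloader(images_dir: Path) -> Callable[[str, str], None]:
    """Hypothetical downloader that saves into images_dir and skips files already present."""
    images_dir.mkdir(parents=True, exist_ok=True)

    def download(url: str, filename: str) -> None:
        target = images_dir / filename
        if not target.exists():
            urllib.request.urlretrieve(url, target)

    return download
```

Passing the downloader in as a callback keeps the link rewriting testable offline; make_downloader is one hypothetical way to wire it to urllib.request.urlretrieve while skipping images that were already saved.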
You can either hardcode the Substack URL and the number of posts you'd like to save at the top of substack_scraper.py, or specify them as command-line arguments.
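Supporting both hardcoded values and command-line arguments usually means using the hardcoded values as argparse defaults. The sketch below shows that pattern with the flags documented under Usage (--url, --directory, --number, --premium); the constant names and their default values are hypothetical, and the real script may wire this up differently.

```python
import argparse
from typing import List, Optional

# Hypothetical values you would edit at the top of substack_scraper.py.
BASE_SUBSTACK_URL = "https://example.substack.com"
BASE_MD_DIR = "substack_md_files"
NUM_POSTS_TO_SCRAPE = None  # None stands for "all posts" in this sketch


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Command-line flags override the hardcoded defaults above."""
    parser = argparse.ArgumentParser(description="Save Substack posts as Markdown and HTML.")
    parser.add_argument("--url", default=BASE_SUBSTACK_URL, help="Substack URL to scrape")
    parser.add_argument("--directory", default=BASE_MD_DIR, help="where to save the posts")
    parser.add_argument("--number", type=int, default=NUM_POSTS_TO_SCRAPE,
                        help="number of posts to save (default: all)")
    parser.add_argument("--premium", action="store_true",
                        help="log in with your credentials to save paid posts")
    return parser.parse_args(argv)
```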
File Structure
The script creates the following directory structure:
substack_md_files/
└── author_name/
    ├── 2024-10-01_article-title.md
    ├── 2024-10-02_another-article.md
    └── images/
        ├── 3abb814d.png
        ├── cdba8659.jpeg
        └── ...
- Markdown files: Saved with date prefixes for easy sorting
- Images directory: Contains all downloaded images with unique filenames
- Self-contained: Markdown files reference images using relative paths (images/filename.png)
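The date prefix is what makes the files sort correctly: ISO dates (YYYY-MM-DD) compare alphabetically in the same order as chronologically, so any file browser lists the posts in publication order. A minimal sketch of building such a name follows; post_filename and its slug rules are assumptions, not necessarily the exact rules the script applies.

```python
import re
import unicodedata
from datetime import date


def post_filename(published: date, title: str) -> str:
    """Return a name like '2024-10-01_article-title.md' (hypothetical slug rules)."""
    # Strip accents so that e.g. "Café" becomes "Cafe" instead of being dropped.
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Lowercase, then collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-") or "untitled"
    return f"{published.isoformat()}_{slug}.md"
```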
Features
- Converts Substack posts into Markdown files.
- Automatically downloads and localizes images from articles for offline viewing.
- Generates an HTML file to browse Markdown files.
- Supports free and premium content (with subscription).
- The HTML interface allows sorting essays by date or likes.
- Creates self-contained Markdown files with local image references.
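Sorting by date or likes only needs each post's publication date and like count. The sketch below expresses that logic in Python for illustration; the generated HTML page may implement it differently (for example in the browser), and the PostSummary record and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from operator import attrgetter
from typing import List


@dataclass
class PostSummary:
    """Hypothetical per-post metadata collected while scraping."""
    title: str
    published: date
    likes: int
    file: str


# Map each supported sort option to the field it orders by.
SORT_KEYS = {"date": attrgetter("published"), "likes": attrgetter("likes")}


def sort_posts(posts: List[PostSummary], by: str = "date", descending: bool = True) -> List[PostSummary]:
    """Return a new list ordered by publication date or like count."""
    if by not in SORT_KEYS:
        raise ValueError(f"unsupported sort key: {by!r}")
    return sorted(posts, key=SORT_KEYS[by], reverse=descending)
```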
Installation
Clone the repo and install the dependencies:
git clone https://github.com/yourusername/Substack2Markdown
cd Substack2Markdown

# Optionally create a virtual environment
# python -m venv venv

# Activate the virtual environment
# .\venv\Scripts\activate    # Windows
# source venv/bin/activate   # Linux

pip install -r requirements.txt
For the premium scraper, create a .env file in the root directory with your configuration:
# Substack credentials
EMAIL=your-email@domain.com
PASSWORD=your-password

# Remote server configuration (optional)
REMOTE_SERVER=192.168.104.209
REMOTE_USER=ubuntu
REMOTE_BASE_DIR=/home/ubuntu/substacks
REMOTE_HTML_DIR=/home/ubuntu/substacks/html

# SSH key path: customize for your system
# For macOS/Linux with default SSH key:
SSH_KEY_PATH=~/.ssh/id_rsa
# For systems with ed25519 keys:
# SSH_KEY_PATH=~/.ssh/id_ed25519
# For custom key names:
# SSH_KEY_PATH=~/.ssh/id_ed25519_bazzite
Alternatively, you can update the config.py file directly with your values.
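Both routes end up in the same place: a set of named settings the scraper reads at startup. Here is a minimal standard-library sketch of that lookup, assuming environment variables take precedence over the .env file and a hardcoded default comes last. parse_env, load_env_file, get_setting and ssh_key_path are hypothetical names, and the project itself may rely on a library such as python-dotenv instead.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Minimal .env parser: KEY=VALUE per line; blank lines and full-line # comments are skipped.

    Inline comments and multi-line values are deliberately not supported in this sketch.
    """
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. PASSWORD="secret"
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: Path) -> Dict[str, str]:
    """Read a .env file if it exists; a missing file simply yields no settings."""
    return parse_env(path.read_text(encoding="utf-8")) if path.exists() else {}


def get_setting(name: str, env_file: Dict[str, str], default: str = "") -> str:
    """Precedence (assumed): real environment variable, then .env file, then default."""
    return os.environ.get(name) or env_file.get(name) or default


def ssh_key_path(env_file: Dict[str, str]) -> str:
    """Expand ~ so the key path can be handed to ssh/scp as a real file path."""
    return os.path.expanduser(get_setting("SSH_KEY_PATH", env_file, "~/.ssh/id_rsa"))
```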
You'll also need Brave Browser installed for the Selenium webdriver (ChromeDriver will be automatically managed).
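Brave is Chromium-based, which is why ChromeDriver can control it; Selenium just needs to be told where the Brave executable lives. The helper below checks a few common install locations. These paths and the find_brave name are assumptions, so adjust them if Brave is installed elsewhere on your system.

```python
import os
import platform
from typing import Dict, List, Optional

# Common default install locations for Brave (assumed; adjust for your system).
BRAVE_PATHS: Dict[str, List[str]] = {
    "Darwin": ["/Applications/Brave Browser.app/Contents/MacOS/Brave Browser"],
    "Windows": [
        r"C:\Program Files\BraveSoftware\Brave-Browser\Application\brave.exe",
        r"C:\Program Files (x86)\BraveSoftware\Brave-Browser\Application\brave.exe",
    ],
    "Linux": ["/usr/bin/brave-browser", "/usr/bin/brave", "/snap/bin/brave"],
}


def find_brave(system: Optional[str] = None) -> Optional[str]:
    """Return the first existing Brave binary for this OS, or None if none is found."""
    system = system or platform.system()
    for candidate in BRAVE_PATHS.get(system, []):
        if os.path.exists(candidate):
            return candidate
    return None
```

With Selenium, the returned path would typically be assigned to options.binary_location on a webdriver.ChromeOptions() object before the driver is created.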
Usage
You can hardcode your desired Substack URL and the number of posts you'd like to save at the top of the file and run:
python substack_scraper.py
Alternatively, specify the Substack URL and the directory to save the posts to as command-line arguments. For free Substack sites:
python substack_scraper.py --url https://example.substack.com --directory /path/to/save/posts
For premium Substack sites:
python substack_scraper.py --url https://example.substack.com --directory /path/to/save/posts --premium
Note: For premium content, you may need to complete a captcha or handle popups manually in the browser window that opens.
To scrape a specific number of posts:
python substack_scraper.py --url https://example.substack.com --directory /path/to/save/posts --number 5
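If you want to archive several Substacks in one go, a small wrapper can call the script once per URL with the flags shown above. This is a hypothetical helper, not part of the repository; it assumes it runs from the repository root, where substack_scraper.py lives.

```python
import subprocess
import sys
from typing import List, Optional


def build_command(url: str, directory: str, number: Optional[int] = None,
                  premium: bool = False) -> List[str]:
    """Assemble the documented CLI flags into an argument list."""
    cmd = [sys.executable, "substack_scraper.py", "--url", url, "--directory", directory]
    if number is not None:
        cmd += ["--number", str(number)]
    if premium:
        cmd.append("--premium")
    return cmd


def scrape_many(urls: List[str], directory: str, number: Optional[int] = None) -> None:
    """Run the scraper once per Substack; check=True stops at the first failing run."""
    for url in urls:
        subprocess.run(build_command(url, directory, number), check=True)
```

Building the argument list as a Python list (rather than one shell string) avoids quoting problems with paths that contain spaces.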
Online Version
For a hassle-free experience without any local setup:
- Visit Substack Reader
- Enter the Substack URL you want to read or export
- Click "Go" to instantly view the content or "Export" to download Markdown files
This online version provides a user-friendly web interface for reading and exporting free Substack articles, with no installation required. However, please note that the online version currently does not support exporting premium content. For full functionality, including premium content export, please use the local script as described above. Built by @Firevvork.
Viewing Markdown Files in Browser
To read the Markdown files in your browser, install the Markdown Viewer browser extension. Note that the posts are also saved as HTML for easy viewing: just set the toggle to HTML on the author homepage.
Or you can use our Substack Reader online tool, which lets you read and export free Substack articles directly in your browser. (Note: premium content export is currently only available in the local script version.)
