A Python package for working with SEC filings at scale.
Related packages:
- datamule-data: Contains datasets for use with datamule-python
- datamule-indicators: Create economic indicators from SEC filings
- txt2dataset: Create datasets from unstructured text
- secsgml: Parse SEC filings in SGML format
- doc2dict: Convert documents to dictionaries. Not ready for public use.
Features
- Download SEC filings quickly and efficiently
- Monitor EDGAR for new filings in real-time
- Parse filings at scale
- Access comprehensive datasets (10-Ks, SIC codes, etc.)
Quick Start
Basic Installation
Install the package from PyPI with `pip install datamule`.
Basic Usage
Portfolio
```python
from datamule import Portfolio

# Create a Portfolio object
portfolio = Portfolio('output_dir')  # can be an existing directory or a new one

# Download submissions
portfolio.download_submissions(
    filing_date=('2023-01-01', '2023-01-03'),
    submission_type=['10-K']
)

# Iterate through documents by document type
for ten_k in portfolio.document_type('10-K'):
    ten_k.parse()
    print(ten_k.data['document']['part2']['item7'])

# Iterate through documents by what strings they contain
for document in portfolio.contains_string('United States'):
    print(document.path)

# You can also use regex patterns
for document in portfolio.contains_string(r'(?i)covid-19'):
    print(document.type)

# For faster operations, you can take advantage of built-in threading with a callback function
def callback(submission):
    print(submission.path)

submission_results = portfolio.process_submissions(callback)
```
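The regex search and callback threading above rest on two standard Python behaviors. The sketch below is illustrative only and does not reproduce datamule's internals. It assumes `contains_string` follows Python `re` semantics, where the inline `(?i)` flag makes the whole pattern case-insensitive, and it shows one common way to apply a callback across many items with a thread pool. `matches_pattern` and `process_concurrently` are hypothetical helpers, not part of datamule.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def matches_pattern(text, pattern):
    """Return True if the regex `pattern` occurs anywhere in `text`.

    The inline flag (?i) at the start of a pattern turns on
    case-insensitive matching, so r'(?i)covid-19' matches 'COVID-19'.
    """
    return re.search(pattern, text) is not None


def process_concurrently(items, callback, max_workers=4):
    """Apply `callback` to every item using a pool of worker threads.

    Hypothetical helper: executor.map runs the calls concurrently but
    returns the results in the same order as `items`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(callback, items))


if __name__ == "__main__":
    text = "Impact of the COVID-19 pandemic"
    print(matches_pattern(text, r"(?i)covid-19"))  # True: case-insensitive
    print(matches_pattern(text, r"covid-19"))      # False: case-sensitive
    paths = ["filing_a.txt", "filing_b.txt", "filing_c.txt"]
    print(process_concurrently(paths, str.upper))
```

Threads help most when the callback is I/O-bound (reading files, network calls); for CPU-heavy parsing in pure Python, the global interpreter lock limits the speedup.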
Sheet
```python
from datamule import Sheet

sheet = Sheet('apple')
sheet.download_xbrl(ticker='AAPL')
```
Index
```python
from datamule import Index

index = Index()
results = index.search_submissions(
    text_query='tariff NOT canada',
    submission_type="10-K",
    start_date="2023-01-01",
    end_date="2023-01-31",
    quiet=False,
    requests_per_second=3
)
```
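The `requests_per_second` argument caps how fast requests are sent, which matters because the SEC asks automated clients to stay at or below 10 requests per second. The sketch below shows one simple way such a cap can work: spacing calls at least `1 / requests_per_second` seconds apart. The `Throttle` class is a hypothetical illustration, not datamule's implementation.

```python
import time


class Throttle:
    """Space out calls so that at most `requests_per_second` start per second.

    Hypothetical helper for illustration only.
    """

    def __init__(self, requests_per_second):
        if requests_per_second <= 0:
            raise ValueError("requests_per_second must be positive")
        self.min_interval = 1.0 / requests_per_second
        self._last_call = None

    def wait(self):
        """Block until at least `min_interval` seconds have passed since the previous call."""
        now = time.monotonic()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()


if __name__ == "__main__":
    throttle = Throttle(requests_per_second=3)
    start = time.monotonic()
    for i in range(4):
        throttle.wait()
        # A real client would send one HTTP request here.
        print(f"request {i} at {time.monotonic() - start:.2f}s")
```

Fixed spacing is the simplest policy; a token bucket would allow short bursts while keeping the same average rate.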
Examples (Out of Date - Will be updated soon)
Create a Discord bot, use insider trading disclosures to map relationships in Silicon Valley, and more in the examples.
Data Provider
The default data provider is the SEC, but for faster downloads you can use datamule instead.
```python
from datamule import Config

config = Config()
config.set_default_source("datamule")  # set default source to datamule; can also be "sec"
print(f"Default source: {config.get_default_source()}")
```
To use datamule as a provider, you need an API key.
Articles
- How to host the SEC Archive for $20/month
- Creating Structured Datasets from SEC filings
- Deploy a Financial Chatbot in 5 Minutes