refactor!: Refactor HttpCrawler, BeautifulSoupCrawler, ParselCrawler inheritance by Pijukatel · Pull Request #746 · apify/crawlee-python
added 12 commits November 21, 2024 09:41
Nov 26, 2024
Pijukatel changed the title from "refactor: Refactor HttpCrawler, BeautifulSoupCrawler, ParselCrawler inheritance" to "refactor!: Refactor HttpCrawler, BeautifulSoupCrawler, ParselCrawler inheritance"
Pijukatel deleted the new-class-hier-current-middleware branch
Pijukatel added a commit that referenced this pull request
Dec 10, 2024: This should have been part of #746
Mantisus pushed a commit to Mantisus/crawlee-python that referenced this pull request
Dec 10, 2024: …inheritance (apify#746)

Reworked the inheritance of the HTTP-based crawlers. StaticContentCrawler is the parent of BeautifulSoupCrawler, ParselCrawler and HttpCrawler. StaticContentCrawler is generic; the specific versions depend on the type of parser used for parsing the HTTP response.

**Breaking change:** Renamed BeautifulSoupParser to BeautifulSoupParserType (it is just a string literal used to configure BeautifulSoup properly). The name BeautifulSoupParser is now used for a new class, the parser used by BeautifulSoupCrawler.

- Closes: Reconsider crawler inheritance apify#350

Co-authored-by: Jan Buchar <jan.buchar@apify.com>
Co-authored-by: Vlada Dusek <v.dusek96@gmail.com>
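To make the hierarchy described in the commit message easier to picture, here is a minimal, standard-library-only sketch of a generic crawler base class parameterized by its parser's result type. It is not the crawlee code: every `Toy…` name, the `parse` and `handle_response` methods, and the crude title extraction are hypothetical stand-ins, and the literal values listed for `BeautifulSoupParserType` are an assumption about which backend names are accepted.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Generic, Literal, TypeVar

TParseResult = TypeVar("TParseResult")

# After the rename, this name denotes only a string literal that selects the
# BeautifulSoup backend. The listed values are assumed for illustration.
BeautifulSoupParserType = Literal["html.parser", "lxml", "xml", "html5lib"]


class ToyParser(ABC, Generic[TParseResult]):
    """Hypothetical parser interface: raw HTTP body in, parsed result out."""

    @abstractmethod
    def parse(self, body: bytes) -> TParseResult: ...


class ToyNoParser(ToyParser[bytes]):
    """Returns the body untouched, as a plain HTTP crawler might."""

    def parse(self, body: bytes) -> bytes:
        return body


class ToyTitleParser(ToyParser[str]):
    """Stand-in for an HTML parser; it only extracts the <title> text."""

    def __init__(self, parser_type: BeautifulSoupParserType = "html.parser") -> None:
        # Stored only to show where the string literal would be used.
        self.parser_type = parser_type

    def parse(self, body: bytes) -> str:
        text = body.decode("utf-8", errors="replace")
        start = text.find("<title>")
        end = text.find("</title>")
        if start == -1 or end == -1:
            return ""
        return text[start + len("<title>"):end]


class ToyStaticContentCrawler(Generic[TParseResult]):
    """Generic parent: shared logic lives here, the parser decides the result type."""

    def __init__(self, parser: ToyParser[TParseResult]) -> None:
        self._parser = parser

    def handle_response(self, body: bytes) -> TParseResult:
        return self._parser.parse(body)


class ToyHttpCrawler(ToyStaticContentCrawler[bytes]):
    def __init__(self) -> None:
        super().__init__(ToyNoParser())


class ToyTitleCrawler(ToyStaticContentCrawler[str]):
    def __init__(self, parser_type: BeautifulSoupParserType = "html.parser") -> None:
        super().__init__(ToyTitleParser(parser_type))


if __name__ == "__main__":
    page = b"<html><title>Example</title></html>"
    print(ToyHttpCrawler().handle_response(page))
    print(ToyTitleCrawler().handle_response(page))
```

In this sketch the subclasses differ only in the parser they pass to the generic parent, which is the design idea the commit message describes. The real classes may be structured differently.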