refactor!: Refactor HttpCrawler, BeautifulSoupCrawler, ParselCrawler inheritance by Pijukatel · Pull Request #746 · apify/crawlee-python

added 12 commits

November 21, 2024 09:41
UTs working.

Generics stretched to limits, probably not worth it to keep BScrawlingcontext
Solved middleware issues.
HttpCrawler made generic.
BeautifulSoup and Parsel crawlers inherit from this new generic.

@Pijukatel added the t-tooling (issues with this label are in the ownership of the tooling team) and debt (code quality improvement or decrease of technical debt) labels

Nov 26, 2024

@Pijukatel changed the title from "refactor: Refactor HttpCrawler, BeautifulSoupCrawler, ParselCrawler inheritance" to "refactor!: Refactor HttpCrawler, BeautifulSoupCrawler, ParselCrawler inheritance"

Dec 3, 2024
Co-authored-by: Jan Buchar <jan.buchar@apify.com>

Co-authored-by: Vlada Dusek <v.dusek96@gmail.com>

Co-authored-by: Vlada Dusek <v.dusek96@gmail.com>

Co-authored-by: Vlada Dusek <v.dusek96@gmail.com>
Co-authored-by: Vlada Dusek <v.dusek96@gmail.com>

@Pijukatel deleted the new-class-hier-current-middleware branch

December 6, 2024 12:10

Pijukatel added a commit that referenced this pull request

Dec 10, 2024
This should have been part of #746

Mantisus pushed a commit to Mantisus/crawlee-python that referenced this pull request

Dec 10, 2024
…inheritance (apify#746)

Reworked the inheritance of the HTTP-based crawlers.
StaticContentCrawler is the parent of BeautifulSoupCrawler, ParselCrawler
and HttpCrawler.

StaticContentCrawler is generic. Specific versions depend on the type of
parser used for parsing the HTTP response.
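
A minimal, standalone sketch of the hierarchy described above, not the actual crawlee code: a generic parent crawler is parameterized by the result type of its parser, and concrete crawlers only choose a parser. The parser interface, the `handle_response` method, `NoParser`, `TitleParser` and `TitleCrawler` are hypothetical names introduced for illustration; no HTTP request is made.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Generic, TypeVar

TParseResult = TypeVar("TParseResult")


class StaticContentParser(ABC, Generic[TParseResult]):
    """Hypothetical parser interface: turns a raw HTTP body into a parsed result."""

    @abstractmethod
    def parse(self, body: bytes) -> TParseResult:
        ...


class StaticContentCrawler(Generic[TParseResult]):
    """Illustrative generic parent: shared logic lives here, parsing is delegated."""

    def __init__(self, parser: StaticContentParser[TParseResult]) -> None:
        self._parser = parser

    def handle_response(self, body: bytes) -> TParseResult:
        # A real crawler would fetch the body itself; here it is passed in directly.
        return self._parser.parse(body)


class NoParser(StaticContentParser[bytes]):
    """Hypothetical pass-through parser: returns the body unchanged."""

    def parse(self, body: bytes) -> bytes:
        return body


class TitleParser(StaticContentParser[str]):
    """Hypothetical stand-in for an HTML parser: extracts the <title> text."""

    def parse(self, body: bytes) -> str:
        text = body.decode("utf-8")
        start = text.find("<title>")
        end = text.find("</title>")
        if start == -1 or end == -1:
            return ""
        return text[start + len("<title>"):end]


class HttpCrawler(StaticContentCrawler[bytes]):
    """Specific version of the generic parent that yields the raw body."""

    def __init__(self) -> None:
        super().__init__(NoParser())


class TitleCrawler(StaticContentCrawler[str]):
    """Specific version of the generic parent that yields a parsed string."""

    def __init__(self) -> None:
        super().__init__(TitleParser())
```

In this sketch the subclasses differ only in the parser they pass to the parent, which is the point of making the parent generic: a type checker can infer that `HttpCrawler().handle_response(...)` returns `bytes` while `TitleCrawler().handle_response(...)` returns `str`.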

**Breaking change:**
Renamed BeautifulSoupParser to BeautifulSoupParserType (it is just a
string literal used to properly configure BeautifulSoup).
The name BeautifulSoupParser is now used for a new class, the parser used by
BeautifulSoupCrawler.
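
To illustrate the rename, here is a hedged sketch of how the two names could relate after this change. The set of literal values, the constructor signature, the default and the `parser_type` attribute are assumptions made for illustration and are not taken from the pull request.

```python
from typing import Literal, get_args

# Assumed values: the parser names that BeautifulSoup itself accepts.
BeautifulSoupParserType = Literal["html.parser", "lxml", "xml", "html5lib"]


class BeautifulSoupParser:
    """Hypothetical parser class that is configured with a parser type string."""

    def __init__(self, parser: BeautifulSoupParserType = "lxml") -> None:
        # Literal types are not enforced at runtime, so validate explicitly.
        if parser not in get_args(BeautifulSoupParserType):
            raise ValueError(f"Unsupported parser type: {parser!r}")
        self.parser_type = parser
```

Code that previously used BeautifulSoupParser as a type annotation for the string value would, under this reading, need to switch to BeautifulSoupParserType.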

- Closes: [Reconsider crawler inheritance apify#350](apify#350)

---------

Co-authored-by: Jan Buchar <jan.buchar@apify.com>
Co-authored-by: Vlada Dusek <v.dusek96@gmail.com>