feat: add `on_skipped_request` decorator, to process links skipped according to `robots.txt` rules by Mantisus · Pull Request #1166 · apify/crawlee-python

added 12 commits

April 17, 2025 15:43

@Mantisus

Co-authored-by: Vlada Dusek <v.dusek96@gmail.com>

### Description

Update `UnprocessedRequest` to match the actual data.
Add a test.

### Issues

- Closes: apify#1150

… and the handler is executed for `PlaywrightCrawler` (apify#1163)

### Description

- For `PlaywrightCrawler`, cookies should be saved to the session
store only after the handler has fully executed, because the browser
may continue to set cookies while the handler is running.

### Testing

- Add a test that simulates a cookie being set in the browser
while the `default_handler` is executing
- Update the `test_isolation_cookies` test

### Description
Adds retries for unprocessed requests in `add_requests_batched` calls.
Each retry recursively calls `_process_batch`, which first works on the
full request batch and then on batches of unprocessed requests until the
retry limit is reached or all requests are processed. Each retry runs
after a delay that increases linearly with the attempt number.

Unprocessed requests are not counted in `request_queue.get_total_count`.
Add a test.
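
The retry flow described above can be sketched roughly as follows. This is an illustrative sketch only, not the code from this PR: `FakeQueueClient`, `process_batch`, `base_delay`, and `max_attempts` are hypothetical names, and the real `_process_batch` may differ in signature and behavior.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeQueueClient:
    """Hypothetical stand-in for a storage client.

    Each URL listed in `failures_left` is rejected that many times before it is accepted.
    """

    failures_left: dict[str, int] = field(default_factory=dict)
    stored: list[str] = field(default_factory=list)

    async def batch_add(self, urls: list[str]) -> list[str]:
        """Store what can be stored and return the URLs that were not processed."""
        unprocessed = []
        for url in urls:
            if self.failures_left.get(url, 0) > 0:
                self.failures_left[url] -= 1
                unprocessed.append(url)
            else:
                self.stored.append(url)
        return unprocessed


async def process_batch(client, urls, *, base_delay=0.01, max_attempts=5, attempt=1):
    """Add `urls`, then recursively retry only the unprocessed ones.

    Returns the URLs that are still unprocessed once the retry limit is reached.
    """
    unprocessed = await client.batch_add(urls)
    if not unprocessed or attempt >= max_attempts:
        return unprocessed
    # Linearly increasing delay: base_delay, 2 * base_delay, 3 * base_delay, ...
    await asyncio.sleep(base_delay * attempt)
    return await process_batch(
        client,
        unprocessed,
        base_delay=base_delay,
        max_attempts=max_attempts,
        attempt=attempt + 1,
    )
```

For example, `asyncio.run(process_batch(FakeQueueClient(failures_left={"b": 2}), ["a", "b", "c"]))` retries only `"b"` on the second and third attempts and returns an empty list once it is accepted.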

### Issues

- Closes: [Handle unprocessed requests in batch_add_requests](apify/apify-sdk-python#456)

Apify Release Bot and others added 6 commits

April 24, 2025 14:38
…tation count exceeds maximum (apify#1147)

- Call `failed_request_handler` for `SessionError` when session rotation
count exceeds maximum
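
A minimal sketch of the behavior this commit describes, under stated assumptions: `SessionError` here is a local stand-in class rather than the library's own exception, and `run_with_session_rotation` and `max_session_rotations` are hypothetical names used only for illustration. The crawler's actual control flow is not shown in this PR description and may differ.

```python
import asyncio


class SessionError(Exception):
    """Local stand-in for a session-related error, such as a blocked session."""


async def run_with_session_rotation(handler, failed_request_handler, url, *, max_session_rotations=3):
    """Call `handler`, rotating the session on `SessionError`.

    Once the rotation count exceeds `max_session_rotations`, the
    `failed_request_handler` is invoked instead of retrying again.
    """
    rotations = 0
    while True:
        try:
            return await handler(url)
        except SessionError as error:
            rotations += 1
            if rotations > max_session_rotations:
                # Rotation budget exhausted: report the request as failed.
                await failed_request_handler(url, error)
                return None
            # Otherwise loop again; a real crawler would pick a fresh session here.
```

With `max_session_rotations=3`, a handler that always raises `SessionError` is called four times before the failed-request handler runs once.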

@Mantisus Mantisus marked this pull request as ready for review

April 24, 2025 15:56
