feat: Add `always_enqueue` option to bypass URL deduplication by Rutam21 · Pull Request #621 · apify/crawlee-python

@Rutam21

vdusek

Co-authored-by: Vlada Dusek <v.dusek96@gmail.com>

janbuchar

Pijukatel added a commit to apify/apify-sdk-python that referenced this pull request

Nov 28, 2025
#677)

### Description

- Make sure that the storage from `ApifyFileSystemStorageClient` does not
get purged twice because the storage from `FileSystemStorageClient` points
to the same location. (Both storage clients have the same cache key, so
only one of them can exist.)
- Ensure that `Actor` opens the key-value store (KVS) containing the input
during initialization, so that the input-aware storage client is used.
- Support any pre-existing input file for whatever key is defined through
`Configuration.input_key`. Input files are handled differently based on
their suffix:
  - ".json" is parsed as JSON.
  - ".txt" is opened as plain text.
  - A file without an extension is parsed as JSON first, falling back to
    bytes.
  - Everything else is opened as bytes.
- Create a metadata file for a valid pre-existing input file without
modifying the input file itself (otherwise, the CLI might detect a change to
the input, which would be a false positive).
- Raise an error if two valid pre-existing input files exist in the
expected storage directory.
- The CLI does not yet respect environment variables that set the input
key. TODO: apify/apify-cli#960
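
The first bullet depends on both file-system clients mapping to the same cache entry. Below is a minimal sketch of that idea under stated assumptions: `StorageCache`, `OpenedStorage` and the `(kind, resolved path)` cache key are hypothetical and are not the SDK's or Crawlee's actual API; purging is simulated by recording the path.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Tuple


@dataclass
class OpenedStorage:
    kind: str
    path: Path
    opened_by: str


@dataclass
class StorageCache:
    """Hypothetical cache holding one storage per (kind, resolved path)."""

    storages: Dict[Tuple[str, Path], OpenedStorage] = field(default_factory=dict)
    purged: List[Path] = field(default_factory=list)

    def open(self, client_name: str, kind: str, path: Path) -> OpenedStorage:
        # The client class is deliberately left out of the key, so two clients
        # pointing at the same directory resolve to the same cache entry.
        key = (kind, path.resolve())
        cached = self.storages.get(key)
        if cached is not None:
            return cached  # already opened (and purged) once: no second purge
        # Stand-in for purging stale records on first open; here we only record it.
        self.purged.append(key[1])
        storage = OpenedStorage(kind=kind, path=path, opened_by=client_name)
        self.storages[key] = storage
        return storage
```

Because a second `open()` call for the same directory returns the cached entry, the purge step runs only once per location, regardless of which client asks first.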

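The suffix rules and the duplicate-file check above can be illustrated with a minimal sketch. It is not the SDK's implementation: `find_input_file`, `load_input` and the matching rule (a file belongs to the key when its name is the key itself or the key plus one suffix, e.g. `INPUT` or `INPUT.json`) are assumptions for illustration.

```python
import json
from pathlib import Path
from typing import Any, Optional


def find_input_file(kvs_dir: Path, input_key: str) -> Optional[Path]:
    """Return the only file in kvs_dir that belongs to input_key, or None.

    Hypothetical matching rule: the file name equals the key ("INPUT")
    or the key plus a single suffix ("INPUT.json", "INPUT.txt", ...).
    """
    candidates = sorted(
        p for p in kvs_dir.iterdir()
        if p.is_file() and (p.name == input_key or p.stem == input_key)
    )
    if len(candidates) > 1:
        names = ", ".join(p.name for p in candidates)
        raise ValueError(f"Multiple input files for key {input_key!r}: {names}")
    return candidates[0] if candidates else None


def load_input(path: Path) -> Any:
    """Load an input file according to its suffix."""
    if path.suffix == ".json":
        return json.loads(path.read_text(encoding="utf-8"))
    if path.suffix == ".txt":
        return path.read_text(encoding="utf-8")
    if path.suffix == "":
        raw = path.read_bytes()
        try:
            # No extension: try JSON first (json.loads accepts bytes) ...
            return json.loads(raw)
        except (UnicodeDecodeError, json.JSONDecodeError):
            # ... and fall back to the raw bytes if that fails.
            return raw
    # Any other suffix is returned as raw bytes.
    return path.read_bytes()
```

Raising on multiple candidates, rather than silently picking one, matches the description's choice to fail when two valid pre-existing input files exist.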
### Issues

Closes: apify/crawlee-python#621 
Related to: [#INPUT.json Automatically Deleted on Each Run (Python SDK Local Storage Issue)](#686)

### Testing

- Added unit tests.
- Manually tested with
[apify-cli@1.1.2-beta.20](https://www.npmjs.com/package/apify-cli/v/1.1.2-beta.20):
  - `npx apify-cli@1.1.2-beta.20 run -i {\"a\":\"c\"}` with a pre-existing
    input file or without input, multiple times in a row
  - `npx apify-cli@1.1.2-beta.20 run` with a pre-existing input file or
    without input, multiple times in a row