fix: fix handling of loading empty `metadata` file for queue by Mantisus · Pull Request #1042 · apify/crawlee-python
Yes, I suppose the empty files come from the combination of `wb` mode and the fact that we work with files asynchronously.
Opening the file in `wb` mode deletes its contents right away, and then the asynchronous context switches before anything is written. If the crawler is interrupted at this point, the file remains empty.
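To make the failure mode concrete, here is a minimal synchronous sketch. It only simulates the interruption by opening the file and writing nothing; the file name `metadata.json` and the helper `size_after_open_without_write` are made up for illustration and are not taken from the crawlee code.

```python
import tempfile
from pathlib import Path

ORIGINAL = b'{"id": "abc"}'


def size_after_open_without_write(path: Path, mode: str) -> int:
    """Open `path` in `mode`, write nothing, and return the resulting file size."""
    with open(path, mode):
        # Stand-in for the crawler being interrupted between open and write.
        pass
    return path.stat().st_size


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "metadata.json"  # hypothetical file name

    path.write_bytes(ORIGINAL)
    wb_size = size_after_open_without_write(path, "wb")  # truncated on open

    path.write_bytes(ORIGINAL)
    rb_size = size_after_open_without_write(path, "r+b")  # contents untouched
```

With `wb` the file is already empty once `open` returns, while `r+b` leaves the previous contents in place until something is actually written.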
I settled on `r+b` mode because we always write formatted JSON files. These are also metadata files, so the set of fields in them does not change. So I think it should work, since we will be overwriting the same number of lines each time.
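A rough sketch of what an in-place overwrite with `r+b` could look like. The helper name `overwrite_in_place` and the sample fields are hypothetical, and the trailing `truncate()` call is an extra safeguard I am assuming here, not something confirmed by the PR: it only matters if a new payload ever turns out shorter than the old one.

```python
import json
import tempfile
from pathlib import Path


def overwrite_in_place(path: Path, data: dict) -> None:
    """Rewrite an existing JSON file without truncating it on open."""
    payload = json.dumps(data, indent=2).encode("utf-8")
    with open(path, "r+b") as f:
        f.seek(0)
        f.write(payload)
        # Assumed safeguard: cut off leftover bytes if the payload shrank.
        f.truncate()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "metadata.json"  # hypothetical file name
    path.write_bytes(json.dumps({"id": "abc", "pending": 100000}, indent=2).encode("utf-8"))

    overwrite_in_place(path, {"id": "abc", "pending": 7})
    result = json.loads(path.read_bytes())
```

Note that `r+b` requires the file to exist already, so the first write of a new metadata file would still need a different mode.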