Concurrent push_to_hub by lhoestq · Pull Request #7708 · huggingface/datasets

Retry the step that downloads, updates, and uploads the README.md, using create_commit(..., parent_commit=...), if another commit landed in the meantime. This should enable concurrent push_to_hub() calls, since they will no longer overwrite each other's README.md metadata.
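
A minimal sketch of the retry idea, not the code in this PR. It assumes an in-memory stand-in for the repo so that it runs without network access; `FakeRepo`, `ParentCommitMismatch`, and `update_readme_with_retry` are hypothetical names that exist in neither datasets nor huggingface_hub. In the real code the commit step would be create_commit(..., parent_commit=...), and how a rejected parent is reported by the Hub is not shown here.

```python
from typing import Callable


class ParentCommitMismatch(Exception):
    """Hypothetical error: the branch head moved past the expected parent."""


class FakeRepo:
    """In-memory stand-in for a Hub repo, used only to illustrate the loop."""

    def __init__(self, readme: str = "") -> None:
        self.readme = readme
        self.head = 0  # stand-in for a commit sha

    def commit(self, new_readme: str, parent_commit: int) -> None:
        # Reject the commit if someone else committed since `parent_commit`.
        if parent_commit != self.head:
            raise ParentCommitMismatch(
                f"expected parent {parent_commit}, head is {self.head}"
            )
        self.readme = new_readme
        self.head += 1


def update_readme_with_retry(
    repo: FakeRepo, update: Callable[[str], str], max_retries: int = 5
) -> int:
    """Download + update + upload the README, retrying if the head moved.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_retries + 1):
        parent = repo.head                # remember the commit we read from
        new_readme = update(repo.readme)  # apply our change to that version
        try:
            # Only succeeds if `parent` is still the head of the branch.
            repo.commit(new_readme, parent_commit=parent)
            return attempt
        except ParentCommitMismatch:
            # Another commit landed in the meantime: re-read and try again.
            continue
    raise RuntimeError(f"gave up after {max_retries} attempts")
```

Because each attempt re-reads the README from the current head, a concurrent writer's metadata is merged in on the retry rather than overwritten.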

Note: we fixed an issue server side to make this work:

Details

DO NOT MERGE FOR NOW since it seems there is one bug that prevents this logic from working:

I'm using parent_commit as part of a retry mechanism to enable concurrent push_to_hub() in datasets, but I keep running into a weird situation.
Sometimes create_commit(..., parent_commit=...) returns a 500 error, but the commit did happen on the Hub side without respecting parent_commit.

e.g. request id

huggingface_hub.errors.HfHubHTTPError: 500 Server Error: Internal Server Error for url: https://huggingface.co/api/datasets/lhoestq/tmp/commit/main (Request ID: Root=1-6888d8af-2ce517bc60c69cb378b51526;d1b17993-c5d0-4ccd-9926-060c45f9ed61)

A fix is coming internally.

close #7600