fix: Instrument creation race condition by herin049 · Pull Request #4913 · open-telemetry/opentelemetry-python
Description
Fixes an instrument creation race condition between the API and the SDK.
Fixes #4892
The original issue in #4892 occurs because of a race condition between a call to _register_instrument in the API Meter and the actual update to the _instrument_id_instrument map in the SDK Meter. The issue surfaces in the following scenario:
Thread A registers an instrument in the API, but before it can update the SDK map, Thread B calls the same method. When Thread B calls the register instrument function, it sees that the instrument is "already registered" and immediately attempts to fetch the instrument from the SDK map, which raises a KeyError because Thread A has not yet populated the map.
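The interleaving above can be reproduced deterministically with a simplified model. This is only a sketch: `ToyMeter`, its attributes, and the `pause` hook are hypothetical stand-ins, not the actual API or SDK classes, and the two `threading.Event` objects exist solely to force Thread A to stall between the API registration and the SDK map update.

```python
import threading


class ToyMeter:
    """Hypothetical stand-in for the API/SDK meter pair (not the real classes)."""

    def __init__(self):
        self._api_lock = threading.Lock()
        self._registered_ids = set()   # API-side bookkeeping
        self._sdk_lock = threading.Lock()
        self._instruments = {}         # SDK-side map: instrument id -> instrument
        # Test hooks, used only to force the problematic interleaving.
        self.registered = threading.Event()
        self.resume = threading.Event()

    def _register(self, instrument_id):
        """Record the id; return True if it was already registered."""
        with self._api_lock:
            already = instrument_id in self._registered_ids
            self._registered_ids.add(instrument_id)
            return already

    def create_counter(self, name, pause=False):
        if self._register(name):
            # "Already registered": assumes the SDK map already has the entry.
            with self._sdk_lock:
                return self._instruments[name]  # KeyError if not stored yet
        if pause:
            # Stall between API registration and the SDK map update.
            self.registered.set()
            self.resume.wait()
        with self._sdk_lock:
            self._instruments[name] = object()
            return self._instruments[name]


def reproduce_race():
    """Return the exceptions seen by the second caller (Thread B)."""
    meter = ToyMeter()
    errors = []
    thread_a = threading.Thread(
        target=meter.create_counter, args=("requests", True)
    )
    thread_a.start()
    meter.registered.wait()  # Thread A has registered but not yet stored
    try:
        meter.create_counter("requests")  # the caller plays Thread B
    except KeyError as exc:
        errors.append(exc)
    meter.resume.set()
    thread_a.join()
    return errors
```

In this model, `reproduce_race()` returns a list containing the single `KeyError` that Thread B hits, because the registration check and the map update are guarded by different locks and are not atomic with respect to each other.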
The fix in this PR addresses the original issue by acquiring the lock on the SDK map while registering the instrument, so that registration and the update to the SDK map are atomic.
In the original scenario above, if Thread B were to arrive while Thread A is still creating the instrument, it would block on the SDK lock until the lock is released. By the time Thread B acquires the lock, the SDK map will be populated.
Given that the lock ordering is always SDK Lock -> API Lock, there is no risk of deadlock. Furthermore, to keep the critical section as short as possible, the conflict logging has been moved outside the critical section.
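A minimal sketch of the locking pattern described above, under stated assumptions: `FixedToyMeter`, `conflicts`, and `hammer` are hypothetical names for illustration and do not mirror the real SDK code. The SDK lock is taken first and held across both the API registration (which takes the API lock internally) and the SDK map update, and the conflict is recorded only after the lock is released.

```python
import threading


class FixedToyMeter:
    """Hypothetical model of the fixed locking pattern (not the real classes)."""

    def __init__(self):
        self._api_lock = threading.Lock()
        self._registered_ids = set()   # API-side bookkeeping
        self._sdk_lock = threading.Lock()
        self._instruments = {}         # SDK-side map: instrument id -> instrument
        self.conflicts = []            # stand-in for conflict logging

    def _register(self, instrument_id):
        """Record the id; return True if it was already registered."""
        with self._api_lock:
            already = instrument_id in self._registered_ids
            self._registered_ids.add(instrument_id)
            return already

    def create_counter(self, name):
        # Lock order is always SDK lock -> API lock (inside _register).
        with self._sdk_lock:
            already = self._register(name)
            if not already:
                self._instruments[name] = object()
            instrument = self._instruments[name]
        # Conflict reporting happens outside the critical section.
        if already:
            self.conflicts.append(name)
        return instrument


def hammer(meter, n_threads=16):
    """Create the same instrument from many threads at once."""
    results, errors = [], []

    def worker():
        try:
            results.append(meter.create_counter("requests"))
        except KeyError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, errors
```

Because a thread can only observe "already registered" while holding the SDK lock, and the first registrant stores the instrument before releasing that lock, every caller in this model gets the same instrument and none sees a `KeyError`.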
Type of change
Please delete options that are not relevant.
- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing functionality to not work as expected)
- This change requires a documentation update
How Has This Been Tested?
Unit tests have been added that fail without the changes made in this PR.
Does This PR Require a Contrib Repo Change?
- Yes. - Link to PR:
- No.
Checklist:
- Followed the style guidelines of this project
- Changelogs have been updated
- Unit tests have been added