Intuitive and extendable checksumming for Python objects
Goals
- Provide a checksumming toolkit for Python with out-of-the-box support for common types
- Architect a framework for implementing customized checksumming logic
- Produce high quality checksums with extraordinarily low collision rates
- Build a toolkit for using and manipulating checksums
- Test it all with 100% coverage and support Python 3.7, 3.8, 3.9, 3.10, 3.11, 3.12, 3.13 and 3.14
Where to get it
Source code is available on GitHub: https://github.com/QCoding
Install with conda:
# from conda forge: https://anaconda.org/conda-forge/qsum
conda install qsum
Install with pip:
# from PyPI: https://pypi.org/project/qsum/
pip install qsum
How to use it
# Functional Interface
from qsum import checksum
checksum('abc')
# Class Interface
from qsum import Checksum
Checksum('abc').checksum_bytes
Design
- QSUM CHECKSUM = TYPE PREFIX + DATA CHECKSUM
- The first two bytes of every checksum represent the type and will be referred to as the 'type prefix'
- The rest of the checksum is a digest of the byte representation of the object and will be referred to as the 'data checksum'
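The layout described above can be sketched with the standard library alone. This is a minimal illustration, not qsum's actual implementation: the prefix table `TYPE_PREFIXES`, the helpers `to_bytes` and `sketch_checksum`, and the choice of SHA-256 are all assumptions made up for the example.

```python
import hashlib

# Hypothetical two-byte type prefixes; qsum's real registry may differ.
TYPE_PREFIXES = {str: b"\x00\x01", bytes: b"\x00\x02", int: b"\x00\x03"}


def to_bytes(obj):
    """Return a byte representation for the few types this sketch supports."""
    if isinstance(obj, bytes):
        return obj
    if isinstance(obj, str):
        return obj.encode("utf-8")
    if isinstance(obj, int):
        return str(obj).encode("ascii")
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def sketch_checksum(obj):
    """Concatenate a 2-byte type prefix with a digest of the object's bytes."""
    prefix = TYPE_PREFIXES[type(obj)]  # KeyError for unregistered types
    data_checksum = hashlib.sha256(to_bytes(obj)).digest()
    return prefix + data_checksum


result = sketch_checksum("abc")
print(result[:2].hex(), result[2:].hex())  # type prefix, then data checksum
```

In this sketch `'abc'` and `b'abc'` share the same data checksum but differ in the type prefix, so the full checksums do not collide.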
Relationship to __hash__
- Respect the same contract as __hash__ with regard to: 'The only required property is that objects which compare equal have the same hash value'
- Do not salt hash values (unless requested) and keep checksums stable across Python sessions, Python versions and releases of this package
- PYTHONHASHSEED should have no effect on checksums
- Provide significantly longer checksums than __hash__, which 'is typically 8 bytes on 64-bit builds and 4 bytes on 32-bit builds'
- Represent all checksums as bytes but provide a toolkit to view more human-readable formats like hexdigests
- Base checksums on object contents and permit the calculation of checksums on mutable objects
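A few of these properties can be illustrated with `hashlib`. The helper `content_digest` below is hypothetical and handles only lists of strings; it is a sketch of content-based digests in general, not of how qsum computes them.

```python
import hashlib
import sys


def content_digest(items):
    """Digest a list of strings by content; a stand-in for a real checksum."""
    hasher = hashlib.sha256()
    for item in items:
        encoded = item.encode("utf-8")
        # Length-prefix each item so ["ab", "c"] and ["a", "bc"] differ.
        hasher.update(len(encoded).to_bytes(8, "big"))
        hasher.update(encoded)
    return hasher.digest()


names = ["a", "b"]
before = content_digest(names)   # lists are unhashable, yet digestible
names.append("c")
after = content_digest(names)    # the digest follows the new contents

print(len(before))               # 32 bytes for SHA-256
print(sys.hash_info.width // 8)  # width of hash() in bytes on this build
print(before.hex())              # human-readable hexdigest view
```

Because the digest depends only on the encoded contents, it is unaffected by PYTHONHASHSEED, unlike `hash()` on strings.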
Adding Salt
- By default the environment is not included in the checksum, but individual package versions can be included by passing the package name via the depends_on argument
- To include the entire Python environment in the checksum:
from qsum import checksum, DependsOn
checksum('abc', depends_on=DependsOn.PythonEnv)
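To show what salting with environment information means in principle, here is a stdlib-only sketch. The function `salted_digest` and the use of `sys.version` as the salt are assumptions for illustration; qsum's own `depends_on` handling may work differently.

```python
import hashlib
import sys


def salted_digest(data, salt=b""):
    """Digest data, optionally mixing in a salt such as version info."""
    hasher = hashlib.sha256()
    # Length-prefix the salt so salt and data cannot blur into each other.
    hasher.update(len(salt).to_bytes(8, "big"))
    hasher.update(salt)
    hasher.update(data)
    return hasher.digest()


# Assumed salt: the interpreter version string stands in for "the environment".
python_salt = sys.version.encode("utf-8")

unsalted = salted_digest(b"abc")
salted = salted_digest(b"abc", salt=python_salt)
print(unsalted.hex())
print(salted.hex())
```

The salted digest changes whenever the salt changes, which is the intended trade-off: stability is given up in exchange for sensitivity to the environment.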
Type Support
- The great majority of built-in types, including collections, are checksummable
- bool, int, float, complex, str, bytes, tuple, list, dict, set, deque, etc.
- Common types have registered type prefixes which can be used to recover the type from the checksum
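Recovering a type from a prefix amounts to a reverse lookup on the first two bytes. The registry `PREFIX_TO_TYPE` and its byte values below are invented for this sketch and are not qsum's registered prefixes.

```python
# Hypothetical registry mapping two-byte prefixes back to types.
PREFIX_TO_TYPE = {b"\x00\x01": str, b"\x00\x02": bytes, b"\x00\x03": int}


def recover_type(checksum_bytes):
    """Look up the type encoded in the first two bytes, or None if unknown."""
    return PREFIX_TO_TYPE.get(checksum_bytes[:2])


# Made-up checksum: the str prefix followed by a dummy 32-byte digest.
example = b"\x00\x01" + bytes(32)
print(recover_type(example))
```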
Custom Containers
- Custom container classes that inherit from common Python containers (e.g. tuple, list, set, dict) are checksummable
- The class name is not recoverable from the type prefix but will be added as salt to the data checksum to prevent collisions
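The idea of salting with the class name can be sketched as follows. The `Inventory` class and the `container_digest` helper are hypothetical, and the sketch handles only lists of strings; it does not reproduce qsum's internals.

```python
import hashlib


class Inventory(list):
    """Hypothetical custom container used only for this illustration."""


def container_digest(container):
    """Digest a list of strings, salting with the class name for subclasses."""
    hasher = hashlib.sha256()
    if type(container) is not list:
        # Salt with the subclass name so equal contents in a subclass
        # yield a different digest than a plain list.
        hasher.update(type(container).__name__.encode("utf-8"))
    for item in container:
        encoded = item.encode("utf-8")
        hasher.update(len(encoded).to_bytes(8, "big"))
        hasher.update(encoded)
    return hasher.digest()


plain = container_digest(["a", "b"])
custom = container_digest(Inventory(["a", "b"]))
print(plain.hex())
print(custom.hex())
```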
Functions and Modules
- Functions are checksummed based on a combination of their source code, attributes and module location
- Modules are checksummed simply based on the hash of their source code
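One plausible way to combine a function's source code, attributes and module location is sketched below. The helper `function_digest`, the bytecode fallback and the way the three parts are serialized are assumptions for illustration, not qsum's algorithm.

```python
import hashlib
import inspect


def function_digest(func):
    """Digest a function from its source, attributes and module location."""
    try:
        body = inspect.getsource(func).encode("utf-8")
    except OSError:
        # Source is unavailable (e.g. in a REPL); fall back to bytecode.
        body = func.__code__.co_code
    parts = (
        body,
        repr(sorted(func.__dict__.items())).encode("utf-8"),
        f"{func.__module__}.{func.__qualname__}".encode("utf-8"),
    )
    hasher = hashlib.sha256()
    for part in parts:
        # Length-prefix each part so the boundaries stay unambiguous.
        hasher.update(len(part).to_bytes(8, "big"))
        hasher.update(part)
    return hasher.digest()


def add(a, b):
    return a + b


def sub(a, b):
    return a - b


print(function_digest(add).hex())
print(function_digest(sub).hex())
```

Setting an attribute on a function (for example `add.version = 2`) changes its digest in this sketch, since the function's `__dict__` is part of the input.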
Files
- When passed an open file handle, qsum will include all the bytes of the file in the checksum calculation
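Including every byte of a file in a digest is typically done by reading in chunks. The sketch below shows that general technique with `hashlib`; the `file_digest` helper and its `chunk_size` parameter are hypothetical, not part of qsum's API.

```python
import hashlib
import io


def file_digest(handle, chunk_size=65536):
    """Digest every byte readable from an open binary file handle."""
    hasher = hashlib.sha256()
    # Read fixed-size chunks so large files never sit fully in memory.
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.digest()


# io.BytesIO stands in for a file opened with open(path, "rb").
handle = io.BytesIO(b"hello world")
print(file_digest(handle).hex())
```

Note that the handle is read from its current position, so a handle that has already been partially consumed yields a digest of the remaining bytes only.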