BUG? sha-moduel returns same crc for different files
Tim Peters
tim_one at email.msn.com
Mon Sep 18 12:24:13 EDT 2000
More information about the Python-list mailing list
Mon Sep 18 12:24:13 EDT 2000
- Previous message (by thread): BUG? sha-moduel returns same crc for different files
- Next message (by thread): BUG? sha-moduel returns same crc for different files
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
[Treutwein Guido] >> While the probabilty of a file, to have a certain hash code is >> 2^(-hash_bitlength), the probability of finding two files with >> the same hash value is MUCH bigger; this is the so-called birthday >> paradox. (due to the fact, that having 23 persons in a room, the >> probabilty of having to with the same birthday is better than 50%; >> for a 32bit-CRC the corresponding limit is about 77000 files for >> a 50% chance). [Erno Kuusela] > 2^32 is much smaller than 2^160 (2^128 times smaller infact). how many > files would be needed for there to be a 50% change of a sha-1 hash > collision? (how is it calculated?) 2^160 is > 1461501637330902918203684832716283019655932542976... To a first approximation, which is more than adequate for this level of analysis, if there are N birthdays then you can expect at least one duplicate after roughly sqrt(N) random samples are drawn. To a second approximation, sqrt(pi*N/2) is closer, but it makes no practical difference with numbers this large. So about 2**80 for SHA (160-bit digest), 2**64 for MD5 (128-bit digest), 2**16 for CRC32. That's under assumptions of "true randomness", of course, but they're not truly random, so it may be prudent to be paranoid. For example, it's very easy to construct any number of distinct files with a given CRC32 hash; it's just *believed* to be intractably difficult to do the same with MD5 or SHA. ... >> If you don't have the time to write a hash function as >> C extension package consider to use a combination of >> sha-1, md-5, file size and crc32 > since there are 4294967296 times more possible values for sha-1 > than for md5, methinks this would not make much difference. Then it depends on how valuable a small difference is to your application. If you're betting someone's life on it, it's a good idea to combine a variety of methods with different underpinnings (to guard against currently-unknown systematic weakness in any one of them). If you're just trying to save a few of bytes of disk storage, a plain CRC32 is much cheaper and probably adequate.
- Previous message (by thread): BUG? sha-moduel returns same crc for different files
- Next message (by thread): BUG? sha-moduel returns same crc for different files
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Python-list mailing list