Issue33784
Created on 2018-06-06 15:29 by Francois Schneider, last changed 2022-04-11 14:59 by admin. This issue is now closed.
| Messages (6) | |||
|---|---|---|---|
| msg318835 - (view) | Author: Francois Schneider (Francois Schneider) | Date: 2018-06-06 15:29 | |
>>> import ipaddress >>> hash(ipaddress.ip_network(u'20.0.2.3/32')) == hash(ipaddress.ip_network(u'20.0.2.0/30')) True |
|||
| msg318901 - (view) | Author: Mark Dickinson (mark.dickinson) * ![]() |
Date: 2018-06-07 06:58 | |
This shouldn't be a problem: there's no rule that says that different objects should have different hashes. Indeed, with a countable infinity of possible different hashable inputs, a deterministic hashing algorithm, and only finitely many outputs, such a rule would be a mathematical impossibility. For example:
>>> hash(-1) == hash(-2)
True
Are these hash collisions causing real issues in your code? While a single hash collision like this shouldn't be an issue, if there are many collisions within a single (non-artificial) dataset, that _can_ lead to performance issues.
Looking at the code, we could probably do a better job of making the hash collisions less predictable. The current code looks like:
def __hash__(self):
return hash(int(self.network_address) ^ int(self.netmask))
I'd propose hashing a tuple instead of using the xor. For example:
def __hash__(self):
return hash((int(self.network_address), int(self.netmask)))
Hash collisions would almost certainly still occur with this scheme, but they'd be a tiny bit less obvious and harder to find.
|
|||
| msg318904 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2018-06-07 07:07 | |
Concur with Mark. |
|||
| msg318905 - (view) | Author: Mark Dickinson (mark.dickinson) * ![]() |
Date: 2018-06-07 07:09 | |
> I'd propose hashing a tuple instead of using the xor. To be clear; I'd propose this _only_ if there's evidence that the current hashing scheme is unsatisfactory for real-world use-cases. Otherwise, on an "if it's not broken, don't fix it" basis, I don't think there's any change to be made here. |
|||
| msg318976 - (view) | Author: Francois Schneider (Francois Schneider) | Date: 2018-06-07 22:17 | |
Thanks for the analysis, I agree completely. Actually the problem was coming from my code where one of the __eq__ method was implemented like this: >>> def __eq__(self, other): >>> return hash(self) == hash(other) so 2 instances with only a slight difference in their ip_network attribute (ip_network(u'20.0.2.3/32') and ip_network(u'20.0.2.0/30')) were having the same hash and being equal -> they could not be inserted both in the same collection. I will just rewrite my __eq__ method properly. |
|||
| msg319030 - (view) | Author: Mark Dickinson (mark.dickinson) * ![]() |
Date: 2018-06-08 07:49 | |
Thanks, Francois. Indeed, comparing hashes isn't a good way to implement equality. :-) Closing here. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:01 | admin | set | github: 77965 |
| 2018-06-08 07:49:35 | mark.dickinson | set | status: open -> closed messages:
+ msg319030 |
| 2018-06-07 22:17:51 | Francois Schneider | set | status: pending -> open messages: + msg318976 |
| 2018-06-07 07:20:14 | mark.dickinson | set | status: open -> pending resolution: not a bug |
| 2018-06-07 07:10:44 | mark.dickinson | set | nosy:
+ serhiy.storchaka |
| 2018-06-07 07:09:34 | mark.dickinson | set | nosy:
- serhiy.storchaka messages: + msg318905 |
| 2018-06-07 07:07:43 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages: + msg318904 |
| 2018-06-07 06:58:40 | mark.dickinson | set | nosy:
+ mark.dickinson messages: + msg318901 |
| 2018-06-06 15:32:09 | Francois Schneider | set | versions: + Python 3.4, Python 3.5 |
| 2018-06-06 15:29:28 | Francois Schneider | create | |
