a pure python locality senstive hashing implementation
Installation
lsh is packaged with setuptools so it can be easily installed with pip like this:
$ cd lsh/
$ [sudo] pip install -e .
Usage
from lsh import LSHCache cache = LSHCache() docs = [ "lipstick on a pig", "you can put lipstick on a pig", "you can put lipstick on a pig but it's still a pig", "you can put lipstick on a pig it's still a pig", "i think they put some lipstick on a pig but it's still a pig", "putting lipstick on a pig", "you know you can put lipstick on a pig", "they were going to send us binders full of women", "they were going to send us binders of women", "a b c d e f", "a b c d f"] dups = {} for i, doc in enumerate(docs): dups[i] = cache.insert(doc.split(), i) ...
Roadmap
- add more tests
- add
save()andfrom_file()methods - rewrite with redis backend?