Mass Text Indexing Tools
jerry_spicklemire at my-deja.com
Tue Oct 17 12:10:50 EDT 2000
More information about the Python-list mailing list
In article <qrUG5.8500$Qf5.153919 at newsread1.prod.itd.earthlink.net>,
"Ender" <kthangavelu at earthlink.net> wrote:

> Does anyone know of some good mass text indexing/searching tools
> (preferably open source) that are accessible from Python? I've tried
> using popen2 calls to grep, but it starts to flag around 50 MB. The text
> material consists of around a hundred thousand small files (emails).

Check out:

http://ransacker.sourceforge.net/

"Ransacker is a scriptable, incrementally-double-indexed search engine
written in python.

It's scriptable in that you can index any text with any key. This makes
it easy to index content ("pages") stored in databases, file systems,
the web, etc.

It can index incrementally. This means you can add content or update the
entry for a particular page without touching the rest of the index.

It's double-indexed, meaning that not only does every word have a list
of pages, every page has a list of words. This is used for the
incremental indexer, but also allows you to determine which pages have
the most in common. This will allow ransacker to produce "what's
related" pages."
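To make the "double-indexed" idea in that description concrete, here is a rough sketch of how such an index could work. This is not Ransacker's code or API; the class `DoubleIndex`, its method names, and the simple tokenizer are all made up for illustration, and everything lives in memory rather than on disk.

```python
import re
from collections import defaultdict


class DoubleIndex:
    """Toy double index (hypothetical names, NOT Ransacker's actual API)."""

    def __init__(self):
        self.pages_by_word = defaultdict(set)  # word -> keys of pages containing it
        self.words_by_page = {}                # page key -> set of words on that page

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase alphanumeric runs are good enough as "words".
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def index(self, key, text):
        """Add or update one page without touching other pages' entries."""
        new_words = self.tokenize(text)
        old_words = self.words_by_page.get(key, set())
        # The page -> words list tells us exactly which postings to retract.
        for word in old_words - new_words:
            self.pages_by_word[word].discard(key)
            if not self.pages_by_word[word]:
                del self.pages_by_word[word]
        for word in new_words - old_words:
            self.pages_by_word[word].add(key)
        self.words_by_page[key] = new_words

    def search(self, query):
        """Return the keys of pages that contain every word in the query."""
        words = self.tokenize(query)
        if not words:
            return set()
        postings = [self.pages_by_word.get(word, set()) for word in words]
        return set.intersection(*postings)

    def related(self, key):
        """Rank other pages by the number of words they share with `key`."""
        shared = defaultdict(int)
        for word in self.words_by_page.get(key, set()):
            for other in self.pages_by_word[word]:
                if other != key:
                    shared[other] += 1
        return sorted(shared, key=lambda k: (-shared[k], k))


if __name__ == "__main__":
    idx = DoubleIndex()
    idx.index("mail1", "Python indexing tools")
    idx.index("mail2", "Python search tools for email")
    print(idx.search("python tools"))
    idx.index("mail1", "grep replacement")  # incremental update of one page
    print(idx.search("python"))
```

The page-to-words mapping is what makes the update cheap: re-indexing a key only visits the words that page gained or lost, and the same mapping gives a crude "what's related" ranking by counting shared words.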