Python library to break text into words
Abdur-Rahmaan Janhangeer
arj.python at gmail.com
Thu May 31 23:29:27 EDT 2018
More information about the Python-list mailing list
1-> Search the dictionary and identify every word contained in the string.
    Example: meaningsofoffers
    Identified words: me an mean in meaning meanings so of of offer offers

2-> Next, filter out duplicates (e.g. the repeated "of" above) into a new list; the original list is kept as the chronological reference.

3-> Next, choose the words whose lengths add up to the length of the string.

4-> If there are several solutions, choose the ones that are non-overlapping and chronologically sound.

5-> Unused letters are treated as words. Where non-natural words are included, this can be problematic if sub-words are found inside them, and point 7 might be the way to go.

6-> Where non-regular words are included, the program returns the best solutions for the user to choose from.

I have branded the above 6 points the Arj.mu Algorithm of Word Extraction in Connected Letters.

7-> If machine learning is used, point 6 above serves as training (in an everyday-usage app), or the model can be trained directly on predefined examples.

8-> If titles are assumed to contain typos, assume the title has the corrected words and run a new search on this assumed title. The results are then added to those of the uncorrected version, and point 6 above is executed.

8.1-> For the assumptions in 8, natural language modules might be used.

9-> Titles can contain numbers, dates, author names and other items, which are not covered by the points above.

Abdur-Rahmaan Janhangeer
https://github.com/Abdur-rahmaanJ
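
For illustration, points 1 to 4 could be sketched in Python roughly as below. This is only a minimal sketch under assumptions, not a finished implementation: the tiny WORDS set stands in for a real dictionary, and the function names (find_words, unique_words, segmentations) are made up for this example. Points 5 to 9 (unused letters, typos, numbers, machine learning) are not handled.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy dictionary; a real run would load a full word list.
WORDS: Set[str] = {"me", "an", "mean", "in", "meaning", "meanings",
                   "so", "of", "offer", "offers"}


def find_words(text: str, words: Set[str]) -> List[Tuple[int, str]]:
    """Point 1: list every dictionary word in text as (start, word).

    The text is read left to right and a word is reported when its last
    letter is reached, so the list doubles as the chronological reference.
    """
    hits = []
    for end in range(1, len(text) + 1):
        for start in range(end - 1, -1, -1):
            if text[start:end] in words:
                hits.append((start, text[start:end]))
    return hits


def unique_words(hits: List[Tuple[int, str]]) -> List[str]:
    """Point 2: de-duplicated words in a new list; hits is left untouched."""
    seen: List[str] = []
    for _, word in hits:
        if word not in seen:
            seen.append(word)
    return seen


def segmentations(text: str, hits: List[Tuple[int, str]]) -> List[List[str]]:
    """Points 3-4: chains of non-overlapping hits, in order of position,
    whose lengths add up exactly to len(text)."""
    by_start: Dict[int, List[str]] = {}
    for start, word in hits:
        by_start.setdefault(start, []).append(word)

    results: List[List[str]] = []

    def extend(pos: int, chain: List[str]) -> None:
        if pos == len(text):          # every letter is covered
            results.append(list(chain))
            return
        for word in by_start.get(pos, []):
            chain.append(word)
            extend(pos + len(word), chain)
            chain.pop()               # backtrack and try the next word

    extend(0, [])
    return results


text = "meaningsofoffers"
hits = find_words(text, WORDS)
print([w for _, w in hits])
print(unique_words(hits))
print(segmentations(text, hits))  # [['meanings', 'of', 'offers']]
```

With this toy dictionary, the only chain that covers all 16 letters is meanings + of + offers. When a larger dictionary gives several chains, all of them are returned, which is where the choice in point 6 would come in.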