Message 305153 - Python tracker

Message305153

Author	Siltaar
Recipients	Siltaar
Date	2017-10-28.10:03:32
Message-id	<1509185013.27.0.213398074469.issue31889@psf.upfronthosting.co.za>
In-reply-to

Content

Hi, this is my first post here. I'm a French computer-science engineer with 10 years of experience and a manager at the Acoeuro.com SSLL company. I suggested better regexp integration on python-ideas a few months ago, but failed to convince anyone to get things done.

Despite issue https://bugs.python.org/issue25391, closed in 2010, nothing visible on https://docs.python.org/3/library/difflib.html helps users anticipate that two strings matching very well at 199 characters will not match at all at 200 characters. This inconsistent behavior looks like a bug.

#!/usr/bin/env python3

from difflib import SequenceMatcher

a = 'ab' * 400
b = 'ba' * 400

# Compare ever-longer prefixes of the two strings and print each ratio,
# ten results per line.
for i in range(1, 400):
    diff_ratio = SequenceMatcher(None, a=a[:i], b=b[:i]).ratio()
    print('%3d %.2f' % (i, diff_ratio), end=' ')
    if i % 10 == 0:
        print()

At 199 characters I get a ratio of 0.99, and at 200 characters a ratio of 0.00. The results are nearly the same with strings built from emails, as in https://github.com/Siltaar/drop_alternatives, especially when comparing strings like:

"performantsetdvelopperducontenusimilairepourvosprochainsTweets.Suiveznous@TwitterBusinesspourdautresinfosetactus.TesterlesPublicitsTwitterbusiness.twitter.com|@TwitterBusiness|Aide|SedsinscrireLemailsadresse@gggTwitter,Inc.MarketStreet,SuiteSanFrancisco,CA"

"rducontenusimilairepourvosprochainsTweets.Suiveznous@TwitterBusinesspourprofiterdautresinfosetactus.TesterlesPublicitésTwitterbusiness.twitter.com@TwitterBusinessAideSedésinscrireTwitterInternationalCompanyOneCumberlandPlace,FenianStreetDublin,DAXIRELAND"

Fortunately, I didn't experience the problem when using quick_ratio().

The documentation is not clear about the differences between ratio(), quick_ratio() and real_quick_ratio(), and the results look unstable. Combined with this inconsistent behavior, it looks worth fixing.
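
To make the comparison concrete, here is a minimal sketch that prints all three measures at the 199/200 character boundary. The helper name ratios() is made up for this illustration. The sketch also tries the autojunk=False constructor argument of SequenceMatcher; my assumption is that the automatic junk heuristic, which the difflib documentation describes for sequences of at least 200 items, is what makes ratio() collapse here.

```python
from difflib import SequenceMatcher


def ratios(n, autojunk=True):
    """Return (ratio, quick_ratio, real_quick_ratio) for n-character prefixes.

    ratios() is a hypothetical helper for this illustration, not part of difflib.
    """
    a = ('ab' * 400)[:n]
    b = ('ba' * 400)[:n]
    matcher = SequenceMatcher(None, a, b, autojunk=autojunk)
    return matcher.ratio(), matcher.quick_ratio(), matcher.real_quick_ratio()


# Print the three measures on each side of the 200-character boundary,
# with the automatic junk heuristic switched on and off.
for n in (199, 200):
    for autojunk in (True, False):
        r, q, rq = ratios(n, autojunk)
        print('n=%d autojunk=%-5s ratio=%.2f quick=%.2f real_quick=%.2f'
              % (n, autojunk, r, q, rq))
```

If the assumption holds, only ratio() with the default autojunk=True should fall to 0.00 at 200 characters, while quick_ratio() and real_quick_ratio(), which are upper bounds on ratio(), stay high in every case.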

History
Date	User	Action	Args
2017-10-28 10:03:33	Siltaar	set	recipients: + Siltaar
2017-10-28 10:03:33	Siltaar	set	messageid: <1509185013.27.0.213398074469.issue31889@psf.upfronthosting.co.za>
2017-10-28 10:03:33	Siltaar	link	issue31889 messages
2017-10-28 10:03:32	Siltaar	create