bpo-44149: Add `key` argument to difflib.get_close_matches() by mustafaquraish · Pull Request #26170 · python/cpython

mustafaquraish

https://bugs.python.org/issue44149

This feature lets you specify a key function that extracts the string to compare from each element when finding close matches. Currently, the only way to do this without re-implementing the function is to extract all the strings into a list, find the close matches, and then go through the objects a second time to find the corresponding ones.

A simple test case has also been added, and documentation within the file updated as well.
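The two-pass workaround described above can be sketched with the existing `difflib` API. This is only an illustration: the `FRUITS` records and the `close_records` helper are hypothetical names that do not appear in the PR, and the sketch assumes every name is unique.

```python
from difflib import get_close_matches

# Hypothetical records for illustration; the PR does not include this data.
FRUITS = [
    {"id": 1, "name": "ape"},
    {"id": 2, "name": "apple"},
    {"id": 3, "name": "peach"},
    {"id": 4, "name": "puppy"},
]


def close_records(word, records, n=3, cutoff=0.6):
    """Two-pass workaround that uses only the existing difflib API."""
    # Pass 1: pull the strings out of the objects and match on them.
    names = [record["name"] for record in records]
    matches = get_close_matches(word, names, n=n, cutoff=cutoff)
    # Pass 2: go through the objects again to recover the matching ones.
    # Assumes names are unique; duplicates would need extra bookkeeping.
    by_name = {record["name"]: record for record in records}
    return [by_name[name] for name in matches]


print([record["name"] for record in close_records("appel", FRUITS)])
# prints ['apple', 'ape']
```

A `key` argument would presumably fold both passes into a single call, which is the motivation given in the description.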


@mustafaquraish

@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

Recognized GitHub username

We couldn't find a bugs.python.org (b.p.o) account corresponding to the following GitHub usernames:

@mustafaquraish

This might be simply due to a missing "GitHub Name" entry in one's b.p.o account settings. This is necessary for legal reasons before we can look at this contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@blurb-it

DimitrisJim

@@ -710,11 +710,18 @@ def get_close_matches(word, possibilities, n=3, cutoff=0.6):
Optional arg cutoff (default 0.6) is a float in [0, 1]. Possibilities
that don't score at least that similar to word are ignored.

Optional arg key specifies a function of one argument that is used to


I'm under the impression that the trailing whitespace on this line and the following one causes Travis to complain.


I see. Thanks for pointing it out. I'll update this along with the other changes.

@DimitrisJim

The docs should definitely be updated for this. I also see that a test for this specific function does not exist yet, so this seems like a good chance to add one?

@mustafaquraish

> The docs should definitely be updated for this. I also see a test for this specific function does not exist yet, this seems like a good chance to add one?

@DimitrisJim, thanks for the feedback. I'm still new to the project and would like to help out with this, so can you point me to how to go about it? From what I see, the documentation for this function looks like it has been auto-generated from the docstrings, and the tests (which I can run with `./python.exe -m test test_difflib`) just use the examples from each function's docstring. I have updated those to reflect the new argument and added a test for the key feature.

Is there a more official place to put these? I'm not 100% confident about the internal algorithms of difflib, so I'm not sure I would be the best person to write a full test suite for it (I wouldn't know how to properly check the edge cases with the cutoff parameter, for instance). My development process for this PR was just finding every occurrence of the function inside the source and updating it to the new version.
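For readers unfamiliar with the matching logic mentioned above, here is a minimal sketch of how a key-aware variant could work. It is not the PR's actual diff: `get_close_matches_with_key` is a hypothetical name, and the body is an assumption built only on the public `SequenceMatcher` API and `heapq.nlargest`.

```python
from difflib import SequenceMatcher
from heapq import nlargest


def get_close_matches_with_key(word, possibilities, n=3, cutoff=0.6, key=None):
    """Hypothetical key-aware variant; a sketch, not the PR's implementation."""
    if not n > 0:
        raise ValueError("n must be > 0: %r" % (n,))
    if not 0.0 <= cutoff <= 1.0:
        raise ValueError("cutoff must be in [0.0, 1.0]: %r" % (cutoff,))
    matcher = SequenceMatcher()
    matcher.set_seq2(word)
    scored = []
    for item in possibilities:
        # Compare against the extracted string, but remember the original item.
        text = item if key is None else key(item)
        matcher.set_seq1(text)
        # Cheap upper bounds first; the exact ratio is computed only if needed.
        if (matcher.real_quick_ratio() >= cutoff
                and matcher.quick_ratio() >= cutoff
                and matcher.ratio() >= cutoff):
            scored.append((matcher.ratio(), item))
    # Rank on the score alone so items never need to be comparable themselves.
    best = nlargest(n, scored, key=lambda pair: pair[0])
    return [item for _score, item in best]


pairs = [("ape", 1), ("apple", 2), ("peach", 3), ("puppy", 4)]
print(get_close_matches_with_key("appel", pairs, key=lambda pair: pair[0]))
# prints [('apple', 2), ('ape', 1)]
```

Ranking on the score alone is a design choice of this sketch: it avoids comparing arbitrary objects when two scores tie, at the cost that tie-breaking may differ from the standard function.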

@mustafaquraish

@DimitrisJim

Of course! Docs are relatively straightforward to add. From what I remember, they aren't generated automatically from the docstrings.

The Devguide section on this explains most of what you'll need to know. It basically boils down to opening the corresponding file for difflib in the `Doc/library` folder (`difflib.rst`) and adding the docs you've added in the function docstring. Looking at `difflib.rst`, it does contain exactly the same text as the function docstring, so I see why it looks as if it had been auto-generated.

For tests, you'd usually add a unit test to `Lib/test/test_difflib.py`, but there may be a reason this function hasn't been explicitly tested yet (maybe the doctests suffice). The reviewer, who maintains the module, will probably have more to say on this.
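As a rough idea of what such a unit test could look like, here is a minimal sketch that exercises the existing four-argument `get_close_matches`, including the `n` and `cutoff` edge cases. The class name `GetCloseMatchesSketchTest` is hypothetical, and the word list is borrowed from the standard library documentation example; a test for `key` would follow the same pattern once the argument exists.

```python
import unittest
from difflib import get_close_matches

WORDS = ["ape", "apple", "peach", "puppy"]


class GetCloseMatchesSketchTest(unittest.TestCase):
    """Hypothetical test case; not part of Lib/test/test_difflib.py."""

    def test_default_arguments(self):
        # Best matches come first.
        self.assertEqual(get_close_matches("appel", WORDS), ["apple", "ape"])

    def test_n_limits_result_count(self):
        self.assertEqual(get_close_matches("appel", WORDS, n=1), ["apple"])

    def test_cutoff_filters_matches(self):
        # Nothing in WORDS is at least 90% similar to "appel".
        self.assertEqual(get_close_matches("appel", WORDS, cutoff=0.9), [])
        # A cutoff of 0.0 lets every candidate through.
        self.assertEqual(len(get_close_matches("appel", WORDS, n=10, cutoff=0.0)), 4)

    def test_invalid_arguments(self):
        with self.assertRaises(ValueError):
            get_close_matches("appel", WORDS, n=0)
        with self.assertRaises(ValueError):
            get_close_matches("appel", WORDS, cutoff=1.5)
        with self.assertRaises(ValueError):
            get_close_matches("appel", WORDS, cutoff=-0.1)
```

Saved as a module, this can be run with `python -m unittest`.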

@mustafaquraish

@github-actions

This PR is stale because it has been open for 30 days with no activity.

@mustafaquraish

@tim-one any chance you could review this anytime soon? It's a pretty minor change.

@MaxwellDupre

Please update the docs as well, since the documented signature `get_close_matches(word, possibilities, n=3, cutoff=0.6)` needs changing. Also, please add a "version changed" note: 3.12 is the earliest version this could land in.