Issue 13121: collections.Counter's += copies the entire object
Created on 2011-10-07 08:50 by lars, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| cpython-counter-iadd.diff | lars, 2011-10-07 08:50 | Patch that adds __iadd__ to collections.Counter | review | |
| counter.diff | rhettinger, 2011-10-19 05:06 | Patch for in-place multiset operations | review | |
| Messages (5) | |||
|---|---|---|---|
| msg145063 - (view) | Author: Lars (lars) | Date: 2011-10-07 08:50 | |
I've found some counterintuitive behavior in collections.Counter while hacking on the scikit-learn project [1]. I wanted to use a bunch of Counters to do some simple term counting in a set of documents, roughly as follows:
count_total = Counter()
for doc in documents:
count_current = Counter(analyze(doc))
count_total += count_current
count_per_doc.append(count_current)
Performance was horrible. After some digging, I found out that Counter [2] does not have __iadd__ and += copies the entire left-hand side in __add__. I've attached a patch that fixes the issue (for += only, and I've not patched the testsuite.)
[1] https://github.com/scikit-learn/scikit-learn/commit/de6e93094499e4d81b8e3b15fc66b6b9252945af
|
|||
| msg145065 - (view) | Author: Petri Lehtinen (petri.lehtinen) * ![]() |
Date: 2011-10-07 08:58 | |
This is slightly backwards incompatible, as some people may depend on the old behavior. Otherwise sounds like a good idea to me. |
|||
| msg145067 - (view) | Author: Lars (lars) | Date: 2011-10-07 09:17 | |
If this is not implemented because it is backwards incompat, then it might be useful to add a note to update's docstring explaining that it is much more efficient than +=. I was very surprised that it took *minutes* to add a few thousand moderate-sized Counters. |
|||
| msg145071 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2011-10-07 11:49 | |
I'll add the in-place methods including __iadd__, __isub__, __iand__, and __ior__. If speed is your issue, you should continue to use the update() method which will always be faster because it doesn't have a step to strip zeros and negative values from the existing Counter. Also, update() has a fast-path written in C. |
|||
| msg145959 - (view) | Author: Roundup Robot (python-dev) ![]() |
Date: 2011-10-19 20:40 | |
New changeset 5cced40374df by Raymond Hettinger in branch 'default': Issue #13121: Support in-place math operators for collections.Counter(). http://hg.python.org/cpython/rev/5cced40374df |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:22 | admin | set | github: 57330 |
| 2011-10-19 20:41:20 | rhettinger | set | status: open -> closed resolution: fixed |
| 2011-10-19 20:40:49 | python-dev | set | nosy:
+ python-dev messages: + msg145959 |
| 2011-10-19 05:06:59 | rhettinger | set | files: + counter.diff |
| 2011-10-07 11:49:28 | rhettinger | set | priority: normal -> low messages: + msg145071 |
| 2011-10-07 09:17:07 | lars | set | messages: + msg145067 |
| 2011-10-07 09:11:35 | mark.dickinson | set | type: behavior -> enhancement |
| 2011-10-07 09:11:18 | mark.dickinson | set | assignee: rhettinger |
| 2011-10-07 08:58:13 | petri.lehtinen | set | versions:
+ Python 3.3, - Python 3.4 nosy: + rhettinger, petri.lehtinen messages: + msg145065 stage: patch review |
| 2011-10-07 08:50:33 | lars | create | |

