Issue37624
Created on 2019-07-18 20:51 by Ted Whalen, last changed 2022-04-11 14:59 by admin. This issue is now closed.
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 14854 | closed | aldwinaldwin, 2019-07-19 03:36 | |
| PR 14855 | merged | rhettinger, 2019-07-19 07:47 | |
| PR 14858 | merged | miss-islington, 2019-07-19 08:57 | |
| Messages (6) | |||
|---|---|---|---|
| msg348128 - (view) | Author: Ted Whalen (Ted Whalen) | Date: 2019-07-18 20:51 | |
The behavior of random.choices when negative weights are provided is unexpected. While giving a negative weight for one value is probably bad, it's really unfortunate that providing a negative weight for one value affects the probability of selecting an adjacent value. Throwing a ValueError exception when there was a use of negative weights was considered in #31689, but at that time, there wasn't an example provided that failed silently. Note below that providing a weight of -1 for 'c' causes both 'c' and 'd' to drop out of the results. Python 3.7.2 (default, Jan 13 2019, 12:50:01) [Clang 10.0.0 (clang-1000.11.45.5)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> from collections import Counter >>> from random import choices >>> Counter(choices("abcdefg", weights=(1,1,-1,1,1,1,1), k=10000)) Counter({'f': 2040, 'a': 2019, 'e': 2017, 'g': 2009, 'b': 1915}) |
|||
| msg348140 - (view) | Author: Aldwin Pollefeyt (aldwinaldwin) * | Date: 2019-07-19 03:18 | |
This is what happens with your weights: >>> list(itertools.accumulate(weights)) [1, 2, 1, 2, 3, 3, 4] using bisect.bisect certain amount of times, will distribute on this: a<1<b<2<c<1<d<2<e<3<f<4<g random numbers between 1 and 2 will go to 1<'b'<2 and not 'd' that is also 1<'d'<2 As discussed in issue31689, if no negative weights should be used, then a ValueError exception should be re-concidered. |
|||
| msg348144 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2019-07-19 06:51 | |
We can add a note the the docs that weights are assumed to be non-negative, but I don't want to add an extra O(n) step to check for unusual inputs with undefined meaning -- that would just impair the normal use cases for near zero benefit. |
|||
| msg348154 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2019-07-19 08:56 | |
New changeset 8dbe563aa6bd18b8c581951537dfb406d36e8063 by Raymond Hettinger in branch 'master': bpo-37624: Document weight assumptions for random.choices() (GH-14855) https://github.com/python/cpython/commit/8dbe563aa6bd18b8c581951537dfb406d36e8063 |
|||
| msg348156 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2019-07-19 09:17 | |
New changeset a50a6225a06e5a83ce2a880a7eb4496043fdbb55 by Raymond Hettinger (Miss Islington (bot)) in branch '3.8': bpo-37624: Document weight assumptions for random.choices() (GH-14855) (GH-14858) https://github.com/python/cpython/commit/a50a6225a06e5a83ce2a880a7eb4496043fdbb55 |
|||
| msg348158 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2019-07-19 09:27 | |
Misc side notes: * There is no expected behavior for negative a negative weight. Arguably, the only reasonable interpretation is what it already does (reduce the cumulative total). * Running simulations is a primary use case for choices(). Generally, running time there is important. * During the design phase, none of the other implementations studied had incorporated a scan of the inputs for negative weights. * bisect() does not check to make sure its inputs are sorted. The results for unsorted data are undefined. It is a documented precondition. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:18 | admin | set | github: 81805 |
| 2019-07-19 09:27:50 | rhettinger | set | status: open -> closed resolution: fixed messages: + msg348158 stage: patch review -> resolved |
| 2019-07-19 09:17:38 | rhettinger | set | messages: + msg348156 |
| 2019-07-19 08:57:01 | miss-islington | set | pull_requests: + pull_request14648 |
| 2019-07-19 08:56:56 | rhettinger | set | messages: + msg348154 |
| 2019-07-19 07:47:33 | rhettinger | set | pull_requests: + pull_request14645 |
| 2019-07-19 06:51:49 | rhettinger | set | priority: normal -> low nosy:
+ docs@python assignee: docs@python |
| 2019-07-19 03:36:12 | aldwinaldwin | set | keywords:
+ patch stage: patch review pull_requests: + pull_request14644 |
| 2019-07-19 03:19:23 | xtreak | set | nosy:
+ rhettinger, mark.dickinson |
| 2019-07-19 03:18:54 | aldwinaldwin | set | nosy:
+ aldwinaldwin messages: + msg348140 |
| 2019-07-18 20:51:27 | Ted Whalen | create | |
