Issue24068
Created on 2015-04-28 08:53 by wolma, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Files

| File name | Uploaded |
|---|---|
| statistics._sum.patch | wolma, 2015-04-28 08:56 |
| statistics._sum.v2.patch | wolma, 2015-05-02 20:05 |
| Messages (6) | |||
|---|---|---|---|
| msg242169 - (view) | Author: Wolfgang Maier (wolma) * | Date: 2015-04-28 08:53 | |
The mean function in the statistics module gives nonsensical results when the input contains boolean values, e.g.:

    >>> mean([True, True, False, False])
    0.25
    >>> mean([True, 1027])
    0.5

This is an issue with the module's internal _sum function, which mean relies on. Other functions relying on _sum are affected more subtly, e.g.:

    >>> variance([1, 1027, 0])
    351234.3333333333
    >>> variance([True, 1027, 0])
    351234.3333333334

The problem with _sum is that it tries to coerce its result to any non-int type found in the input (bool in these examples), but bool(1028) is just True, so information gets lost.

I've attached a patch that prevents the type cast when it would be to bool. I don't have time to write a separate test, though, so if somebody wants to take over ... :) |
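The failure mode described above can be reproduced with a small stand-alone sketch. This is not the actual statistics._sum code or the attached patch: `naive_typed_sum` and `guarded_typed_sum` are hypothetical names, and the logic is a deliberate simplification that only assumes the total is cast back to a non-int type seen in the input.

```python
def naive_typed_sum(data):
    """Hypothetical stand-in: cast the total to any non-int type found in the input."""
    total = 0
    result_type = int
    for x in data:
        if type(x) is not int:
            result_type = type(x)  # e.g. bool for True/False
        total += x
    return result_type(total)


def guarded_typed_sum(data):
    """Same sketch, but bool is treated like int, so the total is never cast to bool."""
    total = 0
    result_type = int
    for x in data:
        if type(x) not in (int, bool):
            result_type = type(x)
        total += x
    return result_type(total)


data = [True, 1027]
print(naive_typed_sum(data) / len(data))    # bool(1028) is True, so this is True / 2
print(guarded_typed_sum(data) / len(data))  # the total stays 1028, so this is 1028 / 2
```

In the naive version the total 1028 collapses to True before the division, which is how a mean of 0.5 can come out of [True, 1027].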
|||
| msg242362 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2015-05-02 01:00 | |
I wonder if it would be better to reject Bool data in this context? Bool is only a numeric type for historical reasons. |
|||
| msg242370 - (view) | Author: Steven D'Aprano (steven.daprano) * | Date: 2015-05-02 02:20 | |
The patch seems simple and straightforward enough. It just needs some tests, and a Round Tuit. |
|||
| msg242428 - (view) | Author: Wolfgang Maier (wolma) * | Date: 2015-05-02 20:05 | |
Uploading an alternate, possibly slightly clearer, version of the patch. |
|||
| msg242451 - (view) | Author: Mark Dickinson (mark.dickinson) * | Date: 2015-05-03 06:09 | |
> I wonder if it would be better to reject Bool data in this context?

It's not uncommon (and quite useful) in the NumPy world to compute basic statistics on arrays of boolean dtype: the sum of such an array gives a count of the `True`s, and the mean gives the proportion of `True` entries. I think it would be handy to allow the statistics module to work with lists of bools, if possible. |
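As a rough illustration of the counting and proportion idiom mentioned above, here is a minimal sketch using only built-ins on a plain list of bools. The variable names are illustrative, and the final call assumes an interpreter that already includes the fix for this issue.

```python
import statistics

flags = [True, True, False, False]

count_true = sum(flags)               # bools add up as 1 and 0
proportion = count_true / len(flags)  # fraction of True entries

print(count_true)
print(proportion)

# Assumption: on a Python version with the fix, this should agree with `proportion`.
print(statistics.mean(flags))
```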
|||
| msg315095 - (view) | Author: Wolfgang Maier (wolma) * | Date: 2018-04-08 20:03 | |
Fixed as part of resolving issue 25177. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:16 | admin | set | github: 68256 |
| 2018-04-08 20:03:14 | wolma | set | status: open -> closed resolution: fixed messages: + msg315095 stage: test needed -> resolved |
| 2016-05-02 21:41:14 | r.david.murray | link | issue26913 superseder |
| 2015-05-20 12:48:52 | della | set | nosy: + della |
| 2015-05-11 06:25:01 | rhettinger | set | nosy: + rhettinger |
| 2015-05-03 06:09:27 | mark.dickinson | set | nosy: + mark.dickinson messages: + msg242451 |
| 2015-05-02 20:05:30 | wolma | set | files: + statistics._sum.v2.patch messages: + msg242428 |
| 2015-05-02 02:20:49 | steven.daprano | set | stage: test needed |
| 2015-05-02 02:20:27 | steven.daprano | set | assignee: steven.daprano messages: + msg242370 |
| 2015-05-02 01:00:15 | r.david.murray | set | nosy: + r.david.murray messages: + msg242362 |
| 2015-04-28 08:56:15 | wolma | set | files: + statistics._sum.patch |
| 2015-04-28 08:54:54 | wolma | set | files: - statistics._sum.patch |
| 2015-04-28 08:53:26 | wolma | create | |
