btw, this was my suggestion. Steven opened the issue on my behalf (I'm new).
1) Documentation change.
I would suggest that, to this paragraph:
"If neither weights nor cum_weights are specified, selections are made with equal probability. If a weights sequence is supplied, it must be the same length as the population sequence. It is a TypeError to specify both weights and cum_weights."
The following sentence be added:
"A cum_weights sequence, if supplied, must be in strictly-ascending order, else incorrect results will be (silently) returned."
[BTW, in the current documentation, when I read this sentence:
:For example, the relative weights [10, 5, 30, 5] are equivalent to the cumulative weights [10, 15, 45, 50]," it wasn't clear to me that this was the format of the cum_weights *argument*. I thought that this conversion happened internally. So, I'd prefer that something more explicit be stated (especially the part about silently giving bad results).]
2) I'm giving up on suggesting a code change. However, I'll just
respond that
a) I believe that the big win of the cum_weights option is for people who already have the sequence in that form, rather than that they will save the O(n) cost of having the list built.
b) If I have big list, but also an other-than-tiny k value (eg, k=100), then the time (with the change) would be 400 time the O(log n) plus one times O(n), so this may or may not be significant.
c) I agree that, if someone did, eg, 400 *separate* calls, each with k=1, the cost would be higher. This seems unlikely to me but...
thanks
Paul Czyzewski |