Message 316727 - Python tracker

Message316727

Author	PaulSFO
Recipients	PaulSFO, mark.dickinson, miss-islington, paul.moore, rhettinger, serhiy.storchaka, steven.daprano
Date	2018-05-15.20:56:58
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1526417818.58.0.682650639539.issue33494@psf.upfronthosting.co.za>
In-reply-to

Content
btw, this was my suggestion. Steven opened the issue on my behalf (I'm new). 1) Documentation change. I would suggest that, to this paragraph: "If neither weights nor cum_weights are specified, selections are made with equal probability. If a weights sequence is supplied, it must be the same length as the population sequence. It is a TypeError to specify both weights and cum_weights." The following sentence be added: "A cum_weights sequence, if supplied, must be in strictly-ascending order, else incorrect results will be (silently) returned." [BTW, in the current documentation, when I read this sentence: :For example, the relative weights [10, 5, 30, 5] are equivalent to the cumulative weights [10, 15, 45, 50]," it wasn't clear to me that this was the format of the cum_weights argument. I thought that this conversion happened internally. So, I'd prefer that something more explicit be stated (especially the part about silently giving bad results).] 2) I'm giving up on suggesting a code change. However, I'll just respond that a) I believe that the big win of the cum_weights option is for people who already have the sequence in that form, rather than that they will save the O(n) cost of having the list built. b) If I have big list, but also an other-than-tiny k value (eg, k=100), then the time (with the change) would be 400 time the O(log n) plus one times O(n), so this may or may not be significant. c) I agree that, if someone did, eg, 400 separate calls, each with k=1, the cost would be higher. This seems unlikely to me but... thanks Paul Czyzewski

Content

btw, this was my suggestion. Steven opened the issue on my behalf (I'm new).

1) Documentation change.
 I would suggest that, to this paragraph:
"If neither weights nor cum_weights are specified, selections are made with equal probability. If a weights sequence is supplied, it must be the same length as the population sequence. It is a TypeError to specify both weights and cum_weights."

The following sentence be added:
"A cum_weights sequence, if supplied, must be in strictly-ascending order, else incorrect results will be (silently) returned."

[BTW, in the current documentation, when I read this sentence:
:For example, the relative weights [10, 5, 30, 5] are equivalent to the cumulative weights [10, 15, 45, 50]," it wasn't clear to me that this was the format of the cum_weights *argument*.  I thought that this conversion happened internally.  So, I'd prefer that something more explicit be stated (especially the part about silently giving bad results).]

2) I'm giving up on suggesting a code change.  However, I'll just 
respond that
  a) I believe that the big win of the cum_weights option is for people who already have the sequence in that form, rather than that they will save the O(n) cost of having the list built.  
  b) If I have big list, but also an other-than-tiny k value (eg, k=100), then the time (with the change) would be 400 time the O(log n) plus one times O(n), so this may or may not be significant.
  c) I agree that, if someone did, eg, 400 *separate* calls, each with k=1, the cost would be higher.  This seems unlikely to me but...

thanks
  Paul Czyzewski

History
Date	User	Action	Args
2018-05-15 20:56:58	PaulSFO	set	recipients: + PaulSFO, rhettinger, paul.moore, mark.dickinson, steven.daprano, serhiy.storchaka, miss-islington
2018-05-15 20:56:58	PaulSFO	set	messageid: <1526417818.58.0.682650639539.issue33494@psf.upfronthosting.co.za>
2018-05-15 20:56:58	PaulSFO	link	issue33494 messages
2018-05-15 20:56:58	PaulSFO	create