selecting a random item from a set
Alexander Schmolck
a.schmolck at gmx.net
Tue Jan 6 23:23:02 EST 2004
More information about the Python-list mailing list
Tue Jan 6 23:23:02 EST 2004
- Previous message (by thread): My Bad!
- Next message (by thread): selecting a random item from a set
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
[sorry for the late reply the server here had been down a couple of days and then I was busy] "Tim Peters" <tim.one at comcast.net> writes: > [Tim, asks which algorithms require random selection, as opposed > to arbitrary selection, from a set] > > [Alexander Schmolck] > > Feature selection schemes, for example? > > Eh? That's like "numerical methods" <wink/frown>. IOW, too vague. Well, pretty much any scheme where the feature space is to large to explore exhaustively, IOW from forward-backward feature selection to markov chain monte carlo. If that's still to vague, I'd be happy to provide more detail. > You can do it in O(1) space now at essentially full C speed via, e.g., > > i = random.randrange(len(some_set)) > random_elt = itertools.islice(some_set, i, None).next() > some_set.remove(random_elt) > > That tricks it into finding the starting point with a C loop, and then the > iterator is discarded after extracting its first element. Yes, this is a much better (more readable and efficient) > But (of course) that still takes average time proportional to the # of > elements in the set. Well, that wishes away a detail: this can actually be > arbitrarily expensive, as there's no bound on how sparse a Python dict can > get, and the C dict iteration code has to visit every slot, occupied or not, > to find "the next" occupied slot. Sure. > If the sets are small, you're not going to beat random.choice(tuple(s)). Yep, I had a vague feeling this might be the case and because it's also simple and clear, so that's in fact what I used (almost -- I did list(s), which ought to be only marginally slower I would think). > If the sets are large, low-level optimization of an approach with the wrong > O() behavior is just a monumentally bad idea. Unlikely, for the current application they should be < 100 elements. So I guess I shall stick to the tuple-cast for the time being. Thanks -- this has been quite informative (and the other suggestions you made might still come in handy at some point). 'as
- Previous message (by thread): My Bad!
- Next message (by thread): selecting a random item from a set
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Python-list mailing list