programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
• Campbell Ritchie
• Devaka Cooray
• Ron McLeod
• Paul Clapham
• Liutauras Vilda
Sheriffs:
• paul wheaton
• Jeanne Boyarsky
• Tim Cooke
Saloon Keepers:
• Stephan van Hulst
• Tim Holloway
• Tim Moores
• Mikalai Zaikin
• Carey Brown
Bartenders:

# Picking random items from a large list to create a smaller list

Sheriff
Posts: 17633
300
• Number of slices to send:
Optional 'thank-you' note:
I was working on something today and ended yak shaving my way to this thread in SO: https://stackoverflow.com/questions/31703226/how-to-get-random-objects-from-a-stream

This problem is inspired by that thread. I added my own stipulations to the spec to eliminate some ambiguity.

Goal: Create a new list of specified size by randomly selecting items from another list.

1. The new list will have a specified length, N.
2. The source list must have at least N elements.
3. The new list elements must not be in the same order as they are in the source list (unless they got that way by some astronomically improbable coincidence, in the same way the Bogosort algorithm could eventually produce a sorted list)
4. If the source list has duplicates elements, the new list may contain at most the same number of duplicates; it may have none of the duplicated element.

Method:

Marshal
Posts: 78698
374
• Number of slices to send:
Optional 'thank-you' note:
Sounds a bit like a shuffling algorithm. Suggestion:-Lines 5 and 6 can readily be combined.

Marshal
Posts: 8831
631
• Number of slices to send:
Optional 'thank-you' note:

Junilu Lacar wrote:2. The source list must have at least N elements.

Isn't this redundant? N could be 0.

The source list must have at least 0 elements.

Ok, these are natural numbers.

Liutauras Vilda
Marshal
Posts: 8831
631
• Number of slices to send:
Optional 'thank-you' note:

Picking random items

Junilu Lacar wrote:3. The new list elements must not be in the same order as they are in the source list.

Hm.. This requirement also makes me stop for a while. Because now we kind of limiting randomization if I could say like that, sort of have to influence it in some sense.

Marshal
Posts: 28009
94
• 1
• Number of slices to send:
Optional 'thank-you' note:
Requirement 3 is a bit sketchy given the possibility of duplicates. Or N=1. But how about this:

1. Copy the original list into a new list.
2. Shuffle that new list randomly.
3. Truncate it to the first N elements.

But I must be missing something because the SO posters were fighting with much more complicated solutions.

Campbell Ritchie
Marshal
Posts: 78698
374
• Number of slices to send:
Optional 'thank-you' note:
If you want a non‑small list, such that N! > 2⁶⁴, you will run into the problem that you won't be able randomly to choose all combinations with a Random object, which has a theoretical limit set by the size of the long datatype. I don't know whether there are any classes in this package that will overcome this problem. You could consider a new Random object for each run of the loop, but that will become cumbersome quite quickly for large loops.

My JShell wrote:jshell> long factorial(int i)
...> {
...>     return i == 0 ? 1 : i * factorial(i - 1);
...> }
|  created method factorial(int)

jshell> factorial(21)
\$2 ==> -4249290049419214848

jshell> factorial(20)
\$3 ==> 2432902008176640000

Note overflow error at 21. I should have written 1L rather than 1.

Junilu Lacar
Sheriff
Posts: 17633
300
• Number of slices to send:
Optional 'thank-you' note:
For this one, think identity equality. That is, for any element I in the new list, there can be only 1 element in source list where == holds true.

On second thought, this might not work for immutable cached objects like Strings that are in the String pool. In that case, if the source list were

List<String> sourceList = List.of("Apple", "Banana", "Cat", "Dog", "Apple")

the new list could have at most two "Apple" in it and depending on its size, it could have no "Apple" in it, say if you specified 3 as the new list size, you could conceivably get

{"Dog", "Banana", "Cat"} as the new list

but never

{"Apple", "Apple", "Apple"}

because the source list only has two "Apple" in it.
Does that clarify things?

Campbell Ritchie
Marshal
Posts: 78698
374
• Number of slices to send:
Optional 'thank-you' note:
Do we have such a constraint? My suggestion would only allow duplicates in the destination if the source contains duplicates, and no more of them than in the sourtce. But my idea of a List is that it implicitly allows duplicates.

Junilu Lacar
Sheriff
Posts: 17633
300
• Number of slices to send:
Optional 'thank-you' note:

Campbell Ritchie wrote:only allow duplicates in the destination if the source contains duplicates, and no more of them than in the sourtce.

I thought that's exactly what I said or at least meant to say.  Basically, pickRandom(5, List.of(1, 2, 3, 4, 1, 2, 5)) could return {4, 1, 5, 2, 1} where the first 1 is the 5th element of the source list and the second 1 is the first element of the source array. The 2 in the new list could be the second 2 in the source list.

What I'm really getting at is that the picking algorithm should never choose the same index from the source list more than once.

EDIT: (fixed typo)

Junilu Lacar
Sheriff
Posts: 17633
300
• Number of slices to send:
Optional 'thank-you' note:

Liutauras Vilda wrote:

Picking random items

Junilu Lacar wrote:3. The new list elements must not be in the same order as they are in the source list.

Hm.. This requirement also makes me stop for a while. Because now we kind of limiting randomization if I could say like that, sort of have to influence it in some sense.

Ok, the elements could be in the same order they appear in the source list by some cosmic coincidence, but in the same way that the Bogo sort algorithm could produce a sorted list.

Junilu Lacar
Sheriff
Posts: 17633
300
• Number of slices to send:
Optional 'thank-you' note:

Paul Clapham wrote:Requirement 3 is a bit sketchy given the possibility of duplicates. Or N=1. But how about this:

1. Copy the original list into a new list.
2. Shuffle that new list randomly.
3. Truncate it to the first N elements.

But I must be missing something because the SO posters were fighting with much more complicated solutions.

That's exactly what I did. I don't know why they were arguing about more complicated solutions either, maybe it's the same thing going on with the Monty Hall problem, we start out with a solution that's way more complicated than it needed to be. It's a curious phenomenon but one that we see often, and not just limited to code posted in the Beginner's forum.

Liutauras Vilda
Marshal
Posts: 8831
631
• Number of slices to send:
Optional 'thank-you' note:

Paul Clapham wrote:Requirement 3 is a bit sketchy given the possibility of duplicates. Or N=1. But how about this:

1. Copy the original list into a new list.
2. Shuffle that new list randomly.
3. Truncate it to the first N elements.

But I must be missing something because the SO posters were fighting with much more complicated solutions.

This sounds like a plan, but, I just read thread's title once again, which reads as: "Picking random items from a large list to create a smaller list".

So that "large" might be important and be a huge factor in performance. But nobody said performance requirements, so we might be fine :)

Paul Clapham
Marshal
Posts: 28009
94
• Number of slices to send:
Optional 'thank-you' note:
If it's not necessary to preserve the original list then you could just take the list, shuffle it, and truncate it.

But the time and memory usage is obvious here. If you write functional code with streams then the memory usage is obscure, as is the time required to execute.

Junilu Lacar
Sheriff
Posts: 17633
300
• Number of slices to send:
Optional 'thank-you' note:
This where the idea of using a BitSet came from yesterday. I figured a BitSet is a relatively efficient data structure so if the new list is relatively much smaller than the source list, it might be a good idea to try to generate as many random indices as needed first, then pick out those specific items from the source list. This way, you wouldn't need to copy and shuffle large lists.

Here's the logic I had:

Campbell Ritchie
Marshal
Posts: 78698
374
• Number of slices to send:
Optional 'thank-you' note:

Junilu Lacar wrote:. . . the Bogo sort algorithm could produce a sorted list.

BogoSort will produce a sorted list if you wait long enough.

Saloon Keeper
Posts: 15276
350
• Number of slices to send:
Optional 'thank-you' note:

Campbell Ritchie wrote:BogoSort will produce a sorted list if you wait long enough.

I don't think that's true.

The limit of the chance of producing a sorted list as the number of shuffles approaches infinity equals 100%. That does not mean it will happen in a finite amount of shuffles.

Campbell Ritchie
Marshal
Posts: 78698
374
• Number of slices to send:
Optional 'thank-you' note:
If you use the version of bogosort that takes all possible permutations, all n! of them, it will produce a sorted list. The examples I have seen of bogosort have been this sort.
I think you meant the version of bogosort that randomises the list. Using java.util.Random, to sort cards (n = 52), the randomisation process goes through a sequence 2⁶⁴ long of possible shuffles, and there is a probability ≥ 99.999999999999999999999999999999999999999999999977(approx)% that the correct shuffle will never occur.

Stephan van Hulst
Saloon Keeper
Posts: 15276
350
• Number of slices to send:
Optional 'thank-you' note:
Do you have such an example somewhere, Campbell?

The entire point of Bogosort is that it randomizes the list. If it doesn't, it's not Bogosort.

Campbell Ritchie
Marshal
Posts: 78698
374
• Number of slices to send:
Optional 'thank-you' note:

Stephan van Hulst wrote:. . . it randomizes the list. If it doesn't, it's not Bogosort.

Thank you; I was obviously mistaken, or confused. I hope I got the right number of 9s in my percentage, That shows the chances of it running in O(∞) time with a 46‑but randomness generator.
I found some sites saying it runs in O(n!) time (link) and, “Worst case time complexity: O((n+1)!)” (link), or,“For any collection of fixed size, the expected running time of the algorithm is finite for much the same reason that the infinite monkey theorem holds...” (Wikipedia), but for a naïve implementation using an ordinary random number generator, only Geeks for Geeks seems to have the pessimistic outlook I had earlier. “Worst Case : O(∞) (since this algorithm has no upper bound)    Average Case: O(n*n!)    Best Case : O(n)(when array given is already sorted)”

Stephan van Hulst
Saloon Keeper
Posts: 15276
350
• Number of slices to send:
Optional 'thank-you' note:
I have no idea where the second link gets a worst case time complexity class O((n+1)!) from. The worst case is obviously that the array will never get sorted.

Master Rancher
Posts: 4661
63
• Number of slices to send:
Optional 'thank-you' note:

Campbell Ritchie wrote:I hope I got the right number of 9s in my percentage,

You did.  Or at least, we got the same answer.  But it's easier to just say the chance of success is 2.287e-49.  Because no one wants to count nines in the chance of failure.

Campbell Ritchie wrote:That shows the chances of it running in O(∞) time with a 46‑but randomness generator.

I'm going to have to take off points on your landing, there.

But of course you aren't limited to java.util.Random - you could use a SecureRandom with a much longer seed, and use getBytes() to get something longer than a long.

Not that it matters that much, because no one would actually use a bogosort for this - it's just a theoretical nightmare.

Campbell Ritchie
Marshal
Posts: 78698
374
• Number of slices to send:
Optional 'thank-you' note:

Mike Simmons wrote:. . . I'm going to have to take off points on your landing, there. . . .

Damn! I went from 6.4 to 4.6.
You need a much longer seed to have any chance of keeping up with 52!. Or even 25!; bogosort for a half‑pack of cards would take many times longer than the known age of the Universe.

Mike Simmons
Master Rancher
Posts: 4661
63
• Number of slices to send:
Optional 'thank-you' note:

Campbell Ritchie wrote:You need a much longer seed to have any chance of keeping up with 52!. Or even 25!; bogosort for a half‑pack of cards would take many times longer than the known age of the Universe.

As far as the seed, you'd need at least 226 bits in your seed to handle 52! possibilities.  That's doable, though, since SecureRandom allows you to set the seed with a byte array.  The bigger issue, by far, is the whole age-of-the-universe issue.  Unless there's the possibility of a quantum computing bogosort, trying all the possibilities at once...

Campbell Ritchie
Marshal
Posts: 78698
374
• Number of slices to send:
Optional 'thank-you' note:

Mike Simmons wrote:. . . . Unless there's the possibility of a quantum computing bogosort . .

When I searched for bogosort earlier today, I think I did see mention of a quantum algorithm. It obviously changes the algorithm from an ordinary spaceship to the Heart of Gold.
[Additional]This would appear to be what I saw earlier. It does however say, “just a theoretical concept.” This link is probably no better.

Mike Simmons
Master Rancher
Posts: 4661
63
• Number of slices to send:
Optional 'thank-you' note:

Campbell Ritchie wrote:When I searched for bogosort earlier today, I think I did see mention of a quantum algorithm. It obviously changes the algorithm from an ordinary spaceship to the Heart of Gold.

Powered by the Infinite Improbability Drive.  Nice.

 The harder you work, the luckier you get. This tiny ad brings luck - just not good luck or bad luck. a bit of art, as a gift, that will fit in a stocking https://gardener-gift.com
reply
Similar Threads