>It's always impossible to tell, for any given sequence, whether it was produced by a fair die.
If the sequence is long enough you can model how likely it is to have been produced by a fair die. Are all numbers equally distributed? Are some numbers more likely to follow or not follow other numbers? Are some patterns repeating?
Of course any sequence can be produced by a fair die, but you can still create some objective metric that will tell you how truly random a sequence is, and the longer it is the more accurate it will be. It's what tests like Diehard are about after all.
Can you roll a fair die a thousand times and only get 6s? Well yes of course. It'll never happen though.
>Can you roll a fair die a thousand times and only get 6s? Well yes of course. It'll never happen though.
Somewhat related, but humans are also REALLY bad at generating randomness and one of our big tells is an aversion to repeats. If you ask someone to pick 0-9 randomly, repeatedly, they will rarely repeat numbers. But in a truly random sample a repeat is likely 10% of the time, and a three-peat will happen roughly once every 100 numbers. An average person will rarely if ever repeat a number and sure as heck won't ever come up with it three times in a row.
Yeah but even then most people cap the repeats if they’re faking. If 100 people have to roll 10 random numbers, it is very likely some people will have 3-4 in a row of the same number. It’s less likely none will. My brother in law teaches stats and he runs this scenario with a seminar he teaches. He then runs the results through a simple formula he built in excel and can ascertain who faked vs. did it for real with about 95% certainty IIRC. I love little games like that haha
And funnily enough, you'll often hear this trait as being desirable in a pseudo-random number generator. People often want something that will jump around fairly unpredictably but that will come close to outputting all possible numbers once before getting into re-runs.
Quasi-Monte Carlo has a rate of convergence close to O(1/N), whereas the rate for the Monte Carlo method is O(N^(−0.5))
For such applications it's best to use quasi-random numbers (a.k.a. low-discrepancy sequences) such as the Halton sequence or the Sobol sequence instead of pseudorandom numbers.
Thank you for the link - I had not heard of this kind of sequence. It looks like something I'd like to know about, but I think it's beyond my schoolboy-level mathematical abilities. Anyway, I guess I'll have a peek in the rabbit-hole.
I’ve coded up this exact algorithm, it’s really fun. It’s useful for “shuffling” in the music sense (not the cards sense).
I think it’s actually the prototypical real-world software engineering problem. User says they want X (random music). X is a term in software, so you give them that (you get a random song). They’re not happy. You dig and find out they really want A, B, and C (next song is unknown, songs don’t repeat too soon or too infrequently). This new problem is harder to verify (how soon is too soon?).
Editing in tips on solving this sort of problem. You can turn vague requirements into precise requirements. Rather than make the precise requirements exactly equivalent to the vague ones, it's easier to make them more restrictive. Is playing a song again within 50% of the length of the playlist "too soon"? Maybe. How about within 80% of the length of the playlist? Definitely not. We can give ourselves the requirement "Songs must always play again between 80% and 125% of the length of the playlist." Much easier to solve, much easier to test.
Sometimes the extra restriction make the problem harder (not usually I've found). Still, this is a great trade because understanding requirements is harder than solving well defined problems.
[To the point of this whole post] Requirements can be turned into testable properties even if it's not programmatic. "When I look at a list of chosen songs, there must be no obvious patterns." Who says what's obvious? You do! Then, have someone else do the same.
Consider extreme cases. Extreme cases tend to be the most or least important. If they're least important, create a new set of easier requirements or drop it all together. "If 3 - 10 songs, always play within double the playlist, no obvious patterns, never twice in a row. If 2 alternate, if 1 repeat."
I actually had an engineering ask from somebody to produce "random sequences" which turned out to be anything but random. Short version: letter sequences, no repeats in a sequence or in adjacent sequences.
Took months and a lot of patience to extract that crucial piece of information from the customer. I actually coded a recursive algorithm which to my surprise generated every possible (three letter) sequence in an acceptable sequence, enough for them for several lifetimes.
> Extreme cases tend to be the most or least important.
There's an overlap in meaning between random, strange and unknown. "This random guy came up to me..." Keeping that in mind helps when talking to people about "random".
What is an example of a situation in which this is desirable:
> come close to outputting all possible numbers once before getting into re-runs
In a dice rolling game, you want as close to true random as your PRNG can get. In card drawing, you typically want EXACTLY all possible cards once before getting "repeats". Where do you want something in between?
Last time I encountered this was creating random IDs for things, either as a random string of characters or when selecting from lists of attributes and animals like all the tools that'll name things like "curious possum". It's not actually a hard requirement to hit every possibility before repeating, but if you do see 2 or 3 clusters in a sample it gives people the impression it isn't random.
> Can you roll a fair die a thousand times and only get 6s?
Anecdote time:
A few years ago with my friends we were discussing how easy is to roll 5 dices simultaneously and get the same result in all of them. This is a possible way to win a popular game here https://en.wikipedia.org/wiki/Generala We estimated how often you can roll the dices and the probability, and we estimated that you must try during 2 or 3 hours to get that result.
We were young, had a lot of free time, so one of my friend started rolling 5 dices while he was talking with us and eating. After about 2 hour he got the 5 equal dices in a roll. (IIRC he tried again, with a similar result.)
(Note that 2 or 3 hours is consistent with how this outcome is used in the game to win automatically. It's possible to get this in a normal game, but it's not super common.)
Also, each time you add a dice, the time increase exponentially. With 6 dice it's 14-21 hours, like a day. With 7 dice it's like a week. With 9 dice is like a month. With 10 dice is half a year. 1000 dice will take longer (and x6 if you want to get only 6s instead of any repeated number).
(Note that parallelizing this to all humans only adds 14 dices. If everyone on Earth start rolling 24 dices, it will take like half a year until one of us get 24 equal dices.)
Any single sequence of numbers has the exact same probability of being produced by a fair die (that's how the definition of "fair" goes). The probability of getting all 6 is the same of any other one you get.
For sure, but I think it's more helpful to this about "classes" of sequences. Sequences which have a uniform distribution of digits (within some margin of error), sequences which do not have repeating patterns, sequences that do not contain the same digit twice in a row etc... Any single one of these sequence is as likely as any other, but some "classes" are vastly bigger (and therefore, more probable) than others. By deciding which classes of sequence any result belongs to, you can decide if it's likely to have been produced by a fair die or not.
This intersects with the concept of entropy: assuming that you have a box containing a gas whose particles move randomly about the volume of the box, then at some point you take a snapshot of the position of every single particle in the box and you discover that they're all in the right half of the box, the left side being in a vacuum. Would you assume that it's just random chance? It could be. It certainly isn't.
Meanwhile any of the trillions and trillions of snapshots showing particles more or less uniformly distributed within the box are all more "random looking" and are what is expected from such an experiment. These configurations as a group occupy the vast majority of the phase space for the contents of the box.
That's true given that we have a fair die. But the question is, given some results, what is the probability that the die is fair? And there are statistical tools for that.
You work with crypto according to your profile. I hope that when you see a random generator return a series of a hundred 6s, you go and check what's wrong with it instead of assuming you just got lucky this time ;-)
This is much more relevant in crypography than on statistiscs. If your PRNG always returns the same 4, it's buggy, but the really problematic outputs look exactly like random numbers.
(Looks like my profile is out of date, by the way.)
We disagree on the definition (or divination) of "dice" and "sequence" and "roll".
The sequence length of a single die roll is 1. The probability of any particular roll is the same.
Let's do the next part with two-sided dice because it will be quicker.
The sequence length of a roll of two dice is 2. The roll is a bag; the probability of either [0] or [1] is lower than the probability of [1,2] because it's a set membership and [1,2] has the same members and is the same as [2,1]
If the sequence is long enough you can model how likely it is to have been produced by a fair die. Are all numbers equally distributed? Are some numbers more likely to follow or not follow other numbers? Are some patterns repeating?
Of course any sequence can be produced by a fair die, but you can still create some objective metric that will tell you how truly random a sequence is, and the longer it is the more accurate it will be. It's what tests like Diehard are about after all.
Can you roll a fair die a thousand times and only get 6s? Well yes of course. It'll never happen though.