In similar situations I've run into, what I've often done is:

1. Start with a known seed.
2. Run a single test run like what you say, and verify this run by hand.
3. Freeze the test in this state, that is, assert you get that exact result every time on the given seed.
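As a rough sketch of those three steps in Python (everything here is made up for illustration: `generate_password`, `check_frozen`, and the seed value are hypothetical, not from any real codebase):

```python
import random


def generate_password(rng: random.Random, length: int = 12) -> str:
    """Hypothetical function under test: picks characters from a fixed alphabet."""
    alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
    return "".join(rng.choice(alphabet) for _ in range(length))


def check_frozen(seed: int, expected: str) -> None:
    """Raise if the seeded output differs from the hand-verified value."""
    actual = generate_password(random.Random(seed))
    if actual != expected:
        raise AssertionError(f"seed {seed}: expected {expected!r}, got {actual!r}")


# Steps 1 and 2: run once with a known seed and inspect the result by hand.
SEED = 12345  # arbitrary, assumed for this sketch
verified = generate_password(random.Random(SEED))

# Step 3: in a real test you would paste the verified string in as a literal.
# The captured value is reused here only so the sketch runs anywhere.
check_frozen(SEED, verified)
```

One caveat I'd keep in mind: as far as I know, Python only promises cross-version reproducibility for `random()` itself under a compatible seeder, so a literal frozen from `choice()` may need to be re-verified after an interpreter upgrade.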

What this creates is not, strictly speaking, what I would call a "unit test", but it does sort of pin the algorithm to your examined and verified output. In this case, a human would quite likely have caught this problem on a decent test set. Obviously, there are other pathologies that would slip right by a human; being careful only raises the bar for such pathologies, it doesn't eliminate them.

But at least freezing it solves the problem where a change you did not expect to alter behavior slips by unnoticed and this function suddenly has a completely different outcome.

This has worked for me, in the sense it has caught a couple of bugs that would have had non-trivial customer implications. But I've never worked on an MMORPG or anything else where randomness was intrinsic to my problem; it has always been incidental, like, is my password generation algorithm correct and does this sample of my data look like what I expect, not the core of my system.