Random distributions in programming and video games • Minimum Viable Blog

Uniform distribution

If you roll a 20-sided die (commonly called a d20), there’s an equal chance of rolling any number between 1 and 20. Over the course of infinite rolls, you’ll roll about the same number of 1s, 2s, 3s, and so on. That’s called a “uniform distribution.” The d20, popularized by Dungeons and Dragons, is the best and worst thing to happen to tabletop gaming: the best because it creates moments of triumph and despair in otherwise ho-hum games, and the worst because it creates periods of boredom in otherwise exciting games.

I’ve heard the d20 described as “swingy.” That’s a good way to think about it: you’re equally as likely to roll a 20 (extraordinary success) as a 1 (devastating failure) or a 9 (mediocre result). Your “luck” is constantly swinging around. In fact, luck is barely relevant. If you roll three 20s in a row you might feel unstoppable, but three 20s is no less likely than three 10s. You could go a whole session without seeing a 20, or it could be the only thing you roll. True randomness is only uniform on a large scale; one person rolling one die at one table isn’t guaranteed any kind of fairness. No wonder tabletop gamers develop so many superstitions.

A d20 is easy to simulate in basically every programming language:

final rnd = Random();
final d20 = rnd.nextInt(20) + 1;

If you graph the distribution, it looks like this:

A cartesian graph in the I quadrant with a straight line running horizontally rightward from halfway up the Y axis, and shaded green underneath it down to the X axis. The X axis is labeled with a d20 on each end, with the 1 face showing on the left side and the 20 face showing on the right. Several arrows are pointing to the horizontal line from above, with a handwritten label that reads 'Every result equally likely on every roll'

You should be booing and hissing right now, but I’ll get to that.

Video games took this mechanic and ran with it. Random Number Generation, or RNG, is so hot right now. It’s always been part of games, but it’s more prominent and popular than ever, partially (I suspect) because it’s so easy to code. Upwards of 2,000 roguelike games were released on Steam last year—“roguelike” meaning you’re intended to play the game repeatedly, and certain elements, like enemies, powerups, and level layouts, are randomly chosen on each playthrough. Roguelike junkies like me will trudge through 20 disappointing rounds to feel the rush of a superb one, one where you get all the best powerups and your foes are helpless against you.

It’s okay for the dice to be swingy when the stakes are low: in a roguelike, you rarely lose more than 20 minutes because of a bad roll. You can always just start again. But when the stakes are high, it becomes a problem.

XCOM 2 and percent chance to hit

It’s time to make myself unpopular.

XCOM 2 is a bad video game. This is despite the fact that everything about it is awesome: the concept, the art, the story, the characters, everything. It’s a bad game because when you set up your soldiers to shoot at the aliens, it gives you a certain chance of hitting them, expressed as a percent. If your luck is bad, it doesn’t matter if your percentages are all 70% and up: you’ll miss all your shots, get slaughtered, and lose the level. And when your soldiers die in XCOM, they’re permanently dead. A few minutes of bad luck can spoil hours of careful gameplay.

Someone’s about to send me a strongly-worded email about how I’m “not playing the game right.” Let me save you the trouble: I am playing it right, and it’s a bad game.

Don’t get me wrong, I’m not opposed to random elements in games. They can add a lot of depth and replayability for relatively little effort. What I am opposed to is uniform random distributions in games where they don’t make sense.

What other choice is there? Well, with some math, you can get any distribution you want.

Triangular distribution

With a triangular distribution, you’re most likely to get values in the middle of the range (like 10 on a d20), and progressively less likely to get values toward the top and bottom of the range (like 1 and 20). It looks like this:

A cartesian graph in the I quadrant. The X axis is labeled with d20 dice from left to right on the 1 face, 10 face, and 20 face. An isosceles triangle is drawn in green with bottom corners on the extremes of the X axis and top corner centered above the middle, with a dotted line drawn down to the 10-faced die, dividing the triangle in half. An arrow points to the top corner with the label 'Most likely', and an arrow points to each bottom corner with the label 'Least likely.

A triangular distribution is what you get if you roll two d10 dice instead of a single d20. The highest possible result is still 20, but you’re ten times as likely to roll an 11, because there’s only one way for the dice to add up to 20 (10 + 10) and ten ways for them to add up to 11 (10 + 1, 9 + 2, etc.). This feels more natural. Just like in real life, it’s rare for things to go as poorly as possible or as well as possible. Most of the time they’re closer to mediocre. A triangular distribution still “feels” random, but it also feels fair.

For my money, some version of this should be the default in games. Middling results should be the norm. Spectacular results, for better or worse, should be rare. You should be able to strategize based on the mediocrity principle.

This is a little more complicated to code, but it introduces a technique you can use for any distribution you like: rejection sampling. In short, you plot your desired distribution on a two-dimensional graph, then generate two (uniform) random numbers, x and y, as coordinates on the same graph. If they’re outside the distribution, you reject them and try again. If they’re inside it, you keep x.

^{Of course, for one kind of triangular distribution you can just generate 2d10 and add them together, but that technique doesn’t generalize.}

Let’s say you want a random decimal number between 0 and 1, but you want the value 0.5 to be twice as likely as either 0 or 1, with linear probability in between. You could plot the distribution curve like this:

A cartesian graph in quadrant I. The X axis goes from 0 to 1, with a mark at 0.5, as does the Y axis. A house shape is drawn so that its peak is at x=0.5 and y=1, and its eaves are at y=0.5, x=0 and 1, and shaded green from the roof down to the X axis.

Think of this as two intersecting lines. The first one has a slope of 1, so its formula is y = 1x + 0.5 or simply y = x + 0.5. The second has a slope of -1, so its formula is y = -x + 1.5. Now you can code your RNG:

final rnd = Random();

double nextDoubleTriangular() {
  double x, y;
  do {
    (x, y) = (rnd.nextDouble(), rnd.nextDouble());
  } while (y > (x + 0.5) || y > (-x + 1.5));

  return x;
}

This is admittedly less efficient than just calling rnd.nextDouble(), and it admittedly invokes the halting problem, but it’s quick and reliable in practice.

Parabolic distribution

What if you want a smooth curve between the top and bottom of the range, so the center value is still the most likely, but the difference is less dramatic across the middle of the range? Like this:

A cartesian graph in the I quadrant. The X axis goes from 0 to 1, with a mark at 0.5, as does the Y axis. A wide parabola is drawn curving downward such that its peak is at x=0.5 and y=1 and it intercepts the Y axis at 0.5, and symmetrically, intercepts x=1 at y=0.5 on the other side. From the parabola down to the X axis is shaded green.

This is similar, you just have to tweak the rejection clause. The formula for this curve is y = -2(x - 0.5)² + 1.

^{Thanks to my wife for helping me derive this formula.}

final rnd = Random();

double nextDoubleTriangular() {
  double x, y;

  final plot = (x) => (-2 * pow(x - 0.5, 2)) + 1;

  do {
    (x, y) = (rnd.nextDouble(), rnd.nextDouble());
  } while (y > plot(x));

  return x;
}

This version is nice because you can tweak the coefficient -2 to make the curve drop off more or less sharply on each side, without affecting its centerpoint and maximum.

Micro-uniform distribution

^{I don’t know if there’s an official math name for this one, I’m just a programmer.}

Suppose a triangular or parabolic distribution isn’t right for your game. You really do need every result to be equally likely. But you still want it to feel fair—no one should get stuck rolling 1s, and no one should get stuck rolling 20s.

One way to do this is to take the gambler’s fallacy to an extreme: make it so across 20 rolls, every value from 1 to 20 comes up once:

List<int> possibleValues = [];

int nextD20Constrained() {
  if (possibleValues.isEmpty) {
    // Add all the integers from 1 to 20, inclusive, in random order
    possibleValues = [for (var i = 1; i <= 20; i++) i].shuffled();
  }

  return possibleValues.removeLast(); // You may know this function as pop()
}

It’s still a uniform distribution. It’s still, for most intents and purposes, random. But normal random number generators sometimes produce bullshit like 1 1 1 1 1 1 1 1 1 1 1 1, which isn’t fun for anyone, whereas this one makes sure your luck balances out in the short term.

If you want a slightly less constrained version, there are a couple ways to do it:

Double up the starting array so each number comes up twice per cycle. You’ll still get an even distribution across 40 calls, but a streak of good or bad luck can last a bit longer.
Don’t wait for the array to be empty before regenerating it; once you’re down to 5 items or so, go ahead and start over. Depending on the number you use, this can help you find a sweet spot between balanced results and true randomness.

Gaussian distribution

Here’s where I admit defeat. I know it’s possible to generate random numbers according to a bell curve, like this:

A cartesian graph in the I quadrant. The X axis goes from 0 to 1, with a mark at 0.5, as does the Y axis. A simple Gaussian bell curve is drawn such that its peak is at x=0.5 and y=1. Its tails intercept the X axis at x=0 and x=1. The area underneath the curve is shaded green, and a dotted line is drawn from its peak straight down to the X axis at x=0.5.

And I know all the formulas are right there on Wikipedia. But I haven’t taken a math class since high school calculus, so I can’t read them, and all the relevant Stack Overflow answers assume you know how to invert a CDF, which sounds like a sex thing. If you can write a function that accepts a σ and returns a normally-distributed random double, please send it over, I’d love to use it.

Returning to XCOM 2

I know XCOM 2 isn’t actually a bad game. It encourages a specific play style which I just can’t make myself adopt. But I think it would be an even better game if it used a curved distribution to determine hits and misses, even if the displayed percentages were curved as well and the difference were purely cosmetic. When a game says you have a 90% chance of success, failure should feel rare—or, at least, it should only happen 1 in 10 times in practice, not just in mathematical theory.

That’s my wish for game devs: stop using the uniform RNG built into your programming language unless you can articulate why. Triangular distributions aren’t hard to generate, and they feel more natural and fair. And if you really do need a uniform distribution, make sure it balances out on a reasonable timescale.