Random Level Generation

In this entry, I will give some insight into how the game’s random level generator works. The game is still on Steam Greenlight, your vote is very much appreciated.
(Remember that a graph is simply a collection of nodes with connections between them.)

When selecting the random level option in the main menu, the user is taken to this screen:


The left side contains a list of modes (these correspond to specific presets) and the right side has a list of additional options that the user can set. This allows players to disable features they don’t enjoy, or to ask the generator for a very specific level; maybe a level a friend played that they now want to play, too. This last usage is facilitated by so-called level descriptors that combine the seed with a few other parameters to make it possible to re-generate a specific level.

The level generation process can be roughly divided into three stages:

  1. generating the underlying graph (that is, the network of nodes)
  2. distributing information on that graph
  3. improving the level until it is solvable

Let’s go over the steps in order.

Generating Graphs

Each of the modes corresponds to a specific way to generate the underlying graph. Currently, none of the modes generates a truly random graph (but there is nothing stopping you from creating your own mode that does just that — modes can be loaded from external .NET/Mono assemblies). Instead, each node corresponds to a specific form of graph that can be generated given a requested approximate size for the level. Sometimes, a bit of randomness is then added to the graph by omitting edges. Here are some examples:



This is the Full Grids mode. Unsurprisingly, it generates square grids without omitting any edges (thus full). Square grids like this have an interesting property, namely that there are no triangles in the graph: No two neighboring nodes share a neighbor. This has certain implications for the levels. Last time, I said that I skipped a variant of radius information. This variant concerns connectivity of the nodes, that is, it tells you whether the pattern nodes within that radius are all connected or not. Since the neighbors of a single node are never all connected in this kind of graph, this information is meaningless for radius 1 information in this setting.

This is the Circles mode. As you can see, the generated level contains some directed edges, placed in a very regular fashion. The Circles mode may decide that a specific set of edges can only be used in one direction. For example, it could decide that all edge that go counter-clockwise cannot be used in this level.


This is the Random Hex Grids mode. In this mode, the nodes are arranged in a hexagonal fashion. Modes with the prefix random decide independently for each edge of the hex grid whether it should be included or not. This leads to graphs like this one, where some times both directions of an edge exist, sometimes only one direction is present, yet othertimes an edge that should be there for a true hex grid is missing.


This is the Triangle-Free mode. The graphs generated by this node share the amazing property that they contain neither triangles nor rectangles — no two neighboring nodes share a neighbor, and any two nodes (neighboring or not) share at most two neighbors. Furthermore, each node has exactly 3 edges going out, making this a very symmetric family of graphs. This makes levels generated in this mode often more difficult than average.

Generating Information

At this point, the layout of the level has been decided upon. The current implementation of the random level generator will now first assign colors to the nodes. Depending on the specific settings used, this can be done either complete randomly or based on the position of a node. For example, in the triangle-free mode screenshot above, only nodes in the lower half of the graph are violet.

When that is done, the level generator randomly distributes information on the graph, again according to the settings specified by the player (she could for example deactivate the user of color information). Additionally, a set of nodes that will be revealed initially is selected. The random level generator tries to make sure that no redundant information is added to the level, but does not completely forbid that either (since that would be quite hard to check in general). It will also try not to place information on the board that is completely trivial (for example, the lower left node in the triangle-free screenshot has two neighbors and states that both of them are pattern nodes. No thinking involved). Again, this is hard to avaoid in general.

You may have noticed that the options also contain a slider labeled Difficulty. The difficulty settings impacts this stage (and the next) of the generation in three ways:

  1. On a lower difficulty level, trivial information is less likely to be dropped (e.g., a node that has exactly 3 neighbors, all of which are in the pattern – no thinking required)
  2. A higher difficulty levels skews the distribution of the distance used for radius information towards higher numbers. The idea here is that a larger radius means that it is less likely that you can make any inferences from a single node alone. You will more likely have to combine multiple nodes to deduce anything.
  3. On a high difficulty level, information may be degraded by inserting inequalities instead of specifying the exact number of pattern nodes. For example, a node could state that there are at most 2 pattern nodes among its neighbors.

Improve Until Solvable

This is the hardest part. To start off, we have to check whether the puzzle at hand is even solvable at all. It turns out that this is a quite difficult problem to deal with (in fact, it is NP-hard). Loosely speaking, this means that the best known algorithms to solve these problems have exponential running time in the worst case — adding a single node to the graph means (roughly) twice the running time. This is bad news. Basically, there are two ways to tackle this problem:

  1. Make sure that the puzzles are easy enough for the solver: If the runtime is only bad for the worst case, we could try to avoid the worst case by adding more restrictions to the way the level is constructed.
  2. Bite the bullet and try to make the best of the situation.

I went with the second option, and one of the main reasons is that I want the game to surprise me. If I only allowed certain rules in the random level generation, there would hardly be anything new for me to learn. Having to solve a difficult problem to solve in this case just implies that we should focus on small levels (<= 50 nodes). The largest randomly generated levels have 49 nodes, and this is actually plenty of room to hide some clever puzzles in. I felt that HexCells’ random level generator went a bit overboard with the size anyway, so this is not that bad a restriction.

To solve a level, the game uses a so called constraint solver. This is just what it sounds like: An algorithm that produces a valid assignment to variables given a set of constraints for these variables. In our case, each node is a variable that could be either 0 or 1. The constraints are given by the information that is currently known. For example, the information that a given node has exactly two neighbors translates into the constraint that the sum over the variables associated to its neighbors is equal to 2. To find a solution to such a set of constraints, the solver will (essentially) try out all possible assignments for the variables. In doing so, it tries to be clever and use the contraints to deduce that certain partial assignments cannot lead to a valid solution (so it can exit early). For example, if there are 10 nodes there are 1024 different assignments that we have to try. If we can figure out that setting the first variable to 0 can never produce a valid solution, this cuts the number of different assignments down to 512 immediately!

Unfortunately, since one of the game’s main mechanics is revealing new information, this is not enough yet. In the beginning, the nodes are almost all unknown, and for a lot of nodes there will be no information on them at all — this translates into no constraints, and no constraints a very easy so satisfy (they’re always satisfied). So what we actually want to find is not a solution to the level that is consistent with the current constraints, but solvable with all currently known constraints. Since the whole point of the level generator is to generate the information, we have to take an iterative approach. In essence, the following procedure is applied:

  1. find all nodes that are completely solvable from the current constraints
  2. add these nodes to the known nodes
  3. check whether there are any unknown nodes left, if not, exit
  4. add more information to the level, repeat

There are of course many subtleties to this (e.g., where will we add new information when required?), but we will ignore them for now and focus on the first point. How do we find nodes that are completely solvable? With a bit of thought, we can use the constraint solver: For each unknown node, try to find a solution (that is consistent with our current information) where this node is a pattern node, and a solution where this node is a non-pattern node. If we can find both, then we cannot deduce anything about the state of the node from the information we currently have. If there is no solution that has this node as a non-pattern node, it must be a pattern node, and vice-versa. Clearly, we do not have to go over all possible solutions: We can simply add the constraint that the node under consideration must be a pattern/non-pattern node before starting the search.

(And once again, model theory saves the day!)

I hope this post about the random level generation process was interesting, at least in parts. Let me know if you have any questions and/or feedback.

PS: Here’s a question that I have not yet been able to find a satisfying answer for: I mentioned that the constraints are used to deduce that certain partial assignments cannot lead to a solution. This is called constraint propagation. Is there an efficient way to use propagation for component information? (A good answer could speed up the level solver massively for some levels.)

Trackbacks & Pings