Interestingness as an Inductive Heuristic for Future Compression Progress

Theoretical Profiles, Probabilities, and Expectations

Propositions 4.1, 4.2, and 4.3 in the paper describe how observed past compression progress affects expected future compression progress. We have created interactive visualizations to make these theoretical results more intuitive.

Suppose we have an observed partial log-size vs. complexity profile $\hat{P}$, up to a complexity limit $t$. Beyond $t$, the profile could continue in several ways: it might plateau with no further drops, it could drop immediately, or it might experience multiple future drops at much higher complexities. We want to know how the shape of our observation $\hat{P}$ influences our expectation of this continuation.

Interactive Elements & Layout

  • Log-Size vs. Complexity (Top Left): The observed partial profile $\hat{P}$ is characterized by three adjustable values: the minimum size at complexity 0 ($\hat{n}$), the complexity of the last observed drop ($\hat{m}$), and the current estimated complexity ($\hat{k}$). You can drag the red handles to freely adjust these values alongside the observation boundary $t$. The exact shape between 0 and $\hat{m}$ is irrelevant for our purposes; we only care about how $\hat{P}$’s macro-shape influences the expected continuation.
  • Complexity vs. Runtime (Bottom Left): Using the abstract time-scale transformation $(i, j) \mapsto (\text{BB}(i), i + j)$, the upper $\hat{P}$ and $P_x$ profiles map directly to equivalent complexity vs. runtime profiles ($\hat{D}$ and $D_x$).
  • The Prior (Top Right): We assume a distribution over all computable strings, selectable via a dropdown: the Length Prior $L$, Algorithmic Prior $M$, or Speed Prior $S$. $P_x$ is the full profile of a specific string $x$ sampled from this prior. Given our observation $\hat{P}$, what is the expected complexity of its last drop ($m_x$) and its minimum complexity at log-size 0 ($k_x$)?
  • Probability of a Future Drop (Top Right): The first key question is whether another drop occurs after $t$ (the condition $m_x \geq t$). This probability $\color{#e67e22} p(m_x \geq t \mid \hat{P})$ is plotted as a function of $t - \hat{m}$. For priors $L$ and $M$, this probability decays exponentially. The chart also visualizes the probability of any drop within a given distance $\Delta c$ after $t$.
  • Expected Relative Position (Middle Right): Without conditioning on a future drop, this chart shows how many steps after $\hat{m}$ the expected last drop occurs ($\color{#228b22} \mathbb{E}[m_x \mid \hat{P}] - \hat{m}$). The expected last drop lies to the right of $t$ only if this value exceeds the $t - \hat{m}$ boundary (the gray dashed line).
  • Expected Future Compression Progress (Bottom Right): This chart displays the expected total future compression progress, calculated as $\color{#4682b4} \hat{k} - \mathbb{E}[k_x \mid \hat{P}]$. It additionally shows the expected compression progress specifically within $\Delta c$ steps after $t$.

Conditioned Expectations (The Shaded Regions)

If we assume there will be a further drop after $t$ (i.e., conditioning on $m_x \geq t$), the left-hand plots visualize the expected continuation:

  • The shaded green area shows the probability distribution $p(m_x \mid \hat{P}, m_x \geq t)$ for specific drop values $m_x$. The dashed green line labeled $m_x$ marks the expected value $\mathbb{E}[m_x \mid \hat{P}, m_x \geq t]$.
  • The shaded blue area shows the distribution $p(k_x \mid \hat{P}, m_x \geq t)$ and the expected ultimate complexity $\mathbb{E}[k_x \mid \hat{P}, m_x \geq t]$ (dashed line labeled $k_x$).

How the Priors Affect Expectations

  • Length Prior ($L$): Both $p_L(m_x \mid \hat{P}, m_x \geq t)$ and $p_L(k_x \mid \hat{P}, m_x \geq t)$ decay quickly as the distance from $t$ and $\hat{k}$ increases. The overall probability of a future drop $p_L(m_x \geq t \mid \hat{P})$ immediately decays exponentially as a function of $t - \hat{m}$.
  • Algorithmic Prior ($M$): The drop position distribution $p_M(m_x \mid \hat{P}, m_x \geq t)$ behaves similarly to the Length Prior. However, the ultimate complexity distribution $p_M(k_x \mid \hat{P}, m_x \geq t)$ is much more uniform across the interval $[t, \hat{k}]$, leading to a larger expected drop. The future drop probability still decays exponentially, but slower than under $L$.
  • Speed Prior ($S$): Because it heavily penalizes runtime, conditioning on $m_x \geq t$ leads to an immediate expected drop in the limit of the large runtimes considered here. However, the unconditioned probability $p_S(m_x \geq t \mid \hat{P})$ is essentially zero.
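The exponential decay under the Length Prior can be made concrete with a toy model (an illustration only, not the paper's derivation): if a prior places mass proportional to $2^{-k}$ on a last drop at complexity $k$, the conditional tail probability halves with each additional step of $t - \hat{m}$:

```python
def tail_prob(dist, t, m_hat):
    """P(m_x >= t | m_x >= m_hat) for a discrete prior over drop positions."""
    tail = sum(p for k, p in dist.items() if k >= t)
    cond = sum(p for k, p in dist.items() if k >= m_hat)
    return tail / cond

# Toy geometric (length-style) prior: unnormalized mass 2^{-k} at position k.
geo = {k: 2.0 ** -k for k in range(1, 60)}

# The conditional tail halves with each extra step of t - m_hat:
print([round(tail_prob(geo, 10 + d, 10), 4) for d in range(4)])
# → [1.0, 0.5, 0.25, 0.125]
```

A heavier-tailed prior in place of `geo` would decay more slowly, which is the qualitative difference between $L$ and $M$ above.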

Interact with the visualization: Drag the red handles to adjust the shape of the observed profiles ($\hat{P}$ or $\hat{D}$), select a prior from the dropdown, and observe how these variables alter the expected continuation.

Empirical Results

Our theoretical results hold for the busy beaver regime, which involves runtimes that are not physically realizable. While they provide insight into the fundamental algorithmic properties of objects and their time-bounded compressibility, it is not immediately obvious how relevant these results are to real-world scenarios. To address this, we conduct empirical experiments that mirror our theoretical results. We choose three fundamentally different, yet still universal and practical, computational paradigms that represent a wide variety of possible universal computers: 2-Tag Systems, elementary cellular automata with Rule 110, and the Brainfuck (BF) language.

In each of these systems, we run all programs up to a certain length with generous runtime limits. This allows us to construct real-world complexity vs. runtime profiles for all objects (i.e., output strings $x$) computed by multiple different programs. From these profiles, we can compute the relationship between steps since the last observed progress and the empirically achieved further compression progress. Using importance sampling, we can estimate the expected future compression progress under the different priors. These are the empirical equivalents of the theoretical expectations shown above.
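The importance-sampling step can be sketched as follows. This is a minimal, self-normalized estimate over the enumerated programs; the record format `(program_length, runtime, future_progress)` and the two prior log-weights are illustrative assumptions, not the paper's exact definitions:

```python
import math

def expected_progress(records, log_weight):
    """Self-normalized importance-sampling estimate of expected future
    compression progress over enumerated programs.

    records: hypothetical (program_length, runtime, future_progress) tuples.
    log_weight: scores (length, runtime) under the chosen prior.
    """
    logs = [log_weight(l, r) for l, r, _ in records]
    m = max(logs)                          # subtract max for numerical stability
    ws = [math.exp(x - m) for x in logs]
    z = sum(ws)
    return sum(w * fp for w, (_, _, fp) in zip(ws, records)) / z

# Illustrative prior log-weights over (length l, runtime r):
length_log_w = lambda l, r: -l * math.log(2)                        # ~ 2^{-l}
speed_log_w  = lambda l, r: -l * math.log(2) - math.log(max(r, 1))  # penalizes runtime

toy = [(5, 10, 2.0), (7, 500, 6.0), (9, 20, 1.0)]   # toy records
print(expected_progress(toy, length_log_w))          # ≈ 2.714 for these numbers
```

Swapping `length_log_w` for `speed_log_w` shifts weight toward fast programs, mirroring the Speed Prior's bias described above.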

We observe that in all three computational paradigms, the fundamental trends clearly hold: Recent compression progress implies further compression progress in the future, especially under the assumption of the Algorithmic Prior $M$. The most significant difference from the theoretical results is with respect to the Speed Prior: in the heavily constrained empirical setting, the logarithmic bias against long runtimes is less pronounced. This is especially the case for Rule 110 cellular automata, for which—due to their construction—no programs with very short runtimes exist. The exact experimental setups for the three systems are detailed below.

Tag Machines

In this system, we run 2-tag systems with the alphabet $\{a, b, c, H\}$, where $H$ is the halting symbol. These systems have been shown to be Turing complete. We enumerate all possible production rules up to a combined length of 13, with $H$ appearing only at the end of a single rule. The starting word is always ‘$aaa$’. This leads to approximately 120 million programs, which we run for up to 100k steps each. If no output is produced within this limit, we consider the program non-halting.
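The execution loop of a 2-tag system is short enough to sketch directly (a minimal Python illustration; the example rule set at the bottom is hypothetical, not one of the enumerated programs):

```python
def run_2tag(rules, word="aaa", max_steps=100_000):
    """Simulate a 2-tag system; `rules` maps each symbol to its production.

    Halts when the front symbol is 'H' or the word is shorter than 2
    symbols, returning the final word; returns None if max_steps is
    exceeded (treated as non-halting).
    """
    word = list(word)
    for _ in range(max_steps):
        if not word or word[0] == "H" or len(word) < 2:
            return "".join(word)
        head = word[0]
        del word[:2]                  # a 2-tag system removes two symbols...
        word.extend(rules[head])      # ...and appends the production for the head
    return None

# Hypothetical example rule set:
print(run_2tag({"a": "bc", "b": "a", "c": "aH"}))   # → HaH
```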

Interactive widget: 2-Tag Machine Execution (Initial: aaa)

Rule 110 Cellular Automata

Rule 110 is a Turing-complete elementary cellular automaton. We use a state of size 512 with a cyclic boundary condition (purely for practical reasons; we are aware that this technically reduces the automaton to a finite state machine). The programs are defined as the leftmost bits of the tape, up to and including the last 1. We enumerate all programs up to length 25, resulting in approximately 34 million simulations. A program halts as soon as any state appears for the second time; this state is then the output of the program. If no state repeats within 100k steps, we consider the program non-halting.
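The simulation and the repeated-state halting convention can be sketched as follows (a minimal illustration; in practice a hash of the state would replace storing full tuples):

```python
def run_rule110(program, width=512, max_steps=100_000):
    """Simulate Rule 110 on a cyclic `width`-bit tape.

    `program` gives the initial leftmost bits; the rest of the tape is 0.
    Halts when any state appears a second time, returning that state;
    returns None if no state repeats within max_steps (non-halting).
    """
    state = tuple(program) + (0,) * (width - len(program))
    seen = set()
    for _ in range(max_steps):
        if state in seen:
            return state              # first repeated state is the output
        seen.add(state)
        # Rule 110: new bit is bit n of 110, where n is the 3-bit neighborhood.
        state = tuple(
            (110 >> (state[i - 1] << 2 | state[i] << 1 | state[(i + 1) % width])) & 1
            for i in range(width)
        )
    return None
```

On a cyclic tape every program eventually repeats a state, so `None` only occurs when the step limit cuts the run short.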

Interactive widget: Rule 110 Cellular Automaton (Tape: 512 bits)

Brainfuck

BF is a highly compact Turing-complete programming language. Since we are not providing any external inputs, we do not use the read instruction ‘,’. We enumerate all programs up to length 11 consisting of the remaining 7 instructions, leading to approximately 2.3 billion different programs, which we run for up to 100k steps.
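A minimal interpreter following these conventions might look like this (a sketch that assumes balanced brackets; the actual enumeration must also handle invalid programs and pointer underflow, which this sketch leaves to Python's negative indexing):

```python
def run_bf(code, max_steps=100_000, tape_len=30_000):
    """Interpret a BF program without ',' (no input).

    Returns the output bytes if the program halts within max_steps,
    else None (treated as non-halting). Assumes balanced brackets.
    """
    jump, stack = {}, []
    for i, c in enumerate(code):          # precompute matching brackets
        if c == "[":
            stack.append(i)
        elif c == "]":
            j = stack.pop()
            jump[i], jump[j] = j, i
    tape, out = [0] * tape_len, []
    pc = ptr = 0
    for _ in range(max_steps):
        if pc >= len(code):
            return bytes(out)             # ran off the end: halted
        c = code[pc]
        if c == ">":
            ptr += 1
        elif c == "<":
            ptr -= 1
        elif c == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ".":
            out.append(tape[ptr])
        elif c == "[" and tape[ptr] == 0:
            pc = jump[pc]                 # skip the loop body
        elif c == "]" and tape[ptr] != 0:
            pc = jump[pc]                 # jump back to the loop start
        pc += 1
    return None

print(run_bf("++[->++<]>."))   # doubles a counter, prints byte 4
```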


Interactive widget: Brainfuck Execution (Code, Tape, Output panels)