(This post also came out of discussions with @moralofstory and @alleleofgene.)
Easy Version: Subroutines
The simple case is when one program calls another. For this to be meaningful, we need a syntactic notion of procedure call, which many computing formalisms provide. In the lambda calculus it's easy; however, the lambda calculus is Turing-complete but not universal. (A universal Turing machine is needed for the invariance theorem of algorithmic information theory; Turing-complete formalisms like the lambda calculus are insufficient for this, because they can introduce a multiplicative cost in description length.) For a universal machine, it's convenient to suppose there's a procedure-call mechanism much like that in computer chip architectures.

In any case, this is more or less useless to us. The interesting case is when a program is embedded in a larger program with no "markings" telling us where to look for it.
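For concreteness, here is the contrast in miniature (my own illustration, not from the original post; the function names are made up). When the subprogram is invoked at a visible call site, finding it is trivial; when the same computation is inlined, nothing in the syntax marks it.

```python
# Illustrative sketch only: a syntactically marked subroutine call versus
# the same computation inlined with no marking.

def parity(bits):                # the "inner" program A
    total = 0
    for b in bits:
        total ^= b
    return total

def outer_with_call(bits):       # B calls A at a visible call site
    return 1 - parity(bits)

def outer_inlined(bits):         # B' contains A's behavior with no marking
    total = 0
    for b in bits:
        total ^= b
    return 1 - total
```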
An Orthodox Approach: Reductions
For the study of computational complexity, a notion of reduction is often used. One problem class P is polynomial-time reducible to another Q if we can solve P in polynomial time when we have access to an oracle for Q problems. An "oracle" is essentially the same concept as a subroutine, but where we don't count computation time spent inside the subroutine. This allows us to examine how much computation we save when we're provided this "free" subroutine use.

This has been an extremely fruitful concept, especially for the study of NP-completeness / NP-hardness. However, it seems of little use to us here.
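To fix ideas, here is what such a reduction looks like concretely (a standard textbook-style example written as my own sketch, not taken from the post; `independent_set_oracle` stands in for the "free" subroutine):

```python
# Hedged sketch: deciding CLIQUE with a free oracle for INDEPENDENT-SET.
# Only the work outside the oracle call counts toward the reduction's cost.

def has_clique(vertices, edges, k, independent_set_oracle):
    """True iff the graph (vertices, edges) has a clique of size k,
    assuming the oracle answers INDEPENDENT-SET queries for free."""
    complement = {(u, v) for u in vertices for v in vertices
                  if u < v and (u, v) not in edges and (v, u) not in edges}
    # A clique in G is exactly an independent set in G's complement.
    return independent_set_oracle(vertices, complement, k)
```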
- Only input/output mappings are considered. The use of oracles allows us to quantify how useful a particular subroutine would be for implementing a specific input/output mapping. What I intuitively want to discuss is whether a particular program embeds another, not whether an input-output mapping (which can be implemented in many different ways) can be reduced to another. For example, it's possible that a program takes no input and produces no output, but simulates another program inside it. I want to define what this means rigorously.
- Polynomial-time reducibility is far too permissive, since it means all poly-time algorithms are considered equivalent (they can be reduced to each other). However, refining things further (trying things like quadratic-time reducibility) becomes highly formalism-dependent. (Different Turing machine formalisms can easily have poly-time differences.)
Bit-To-Bit Embedding
Here's a simplistic proposal, to get us off the ground. Consider the execution histories of two Turing machines, A and B. Imagine these as 2D spaces, with time-step t and tape location l. The intuition is that B embeds A if there is a computable function embed(t,l) which takes a (t,l) for A and produces one for B, such that the bits at these two time/location pairs are always exactly the same.

The problem is, this can be a coincidence. embed(t,l) might be computing A completely, and finding a 0 bit or a 1 bit in B accordingly. This means there will always be an embedding, making the definition trivial. This is similar to the problem which is solved for reductions by restricting the reduction to be polynomial-time. We could restrict the computational complexity of embed to try to make sure it isn't cheating us by computing A; however, I don't think that works in our favor. I think we need to solve it by requiring causal structure.
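To make the "coincidence" worry concrete, here is a sketch of such a cheating embedding (my own illustration; `simulate_A` and `find_bit_in_B` are hypothetical helpers):

```python
# Hedged sketch: an embed function that trivializes the definition by
# simulating A itself and then pointing at any matching bit in B.

def cheating_embed_factory(simulate_A, find_bit_in_B):
    """simulate_A(t, l) -> the bit A's history has at (t, l).
    find_bit_in_B(bit) -> some (t, l) in B's history holding `bit`.
    Both are assumed helpers, not part of any real library."""
    def embed(t, l):
        bit = simulate_A(t, l)       # do A's work inside the embedding...
        return find_bit_in_B(bit)    # ...then point anywhere in B that matches
    return embed
```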
My intuition is that causality is necessary to solve the problem, not just for this "internal simple embedding" thing, but more generally.
The causal structure is defined by the interventions (see Pearl's book, Causality). If we change a bit during the execution of a program, this has well-defined consequences for the remainder of the program execution. (It may even have no consequences whatsoever, if the bit is never used again.)
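As a concreteness aid, an intervention on an execution history can be pictured as forcing a tape cell to a chosen value at a chosen step and letting the run continue (a minimal sketch of my own; `machine` is a hypothetical Turing-machine simulator with a `.tape` list of bits and a `.step()` method):

```python
# Hedged sketch: Pearl-style interventions on a program's execution history.

def run_with_interventions(machine, interventions, num_steps):
    """Run `machine` for `num_steps`, forcing tape[l] = b at step t for
    every (t, l, b) in `interventions`. Returns the execution history as
    a dict mapping (t, l) -> bit."""
    forced = {(t, l): b for (t, l, b) in interventions}
    history = {}
    for t in range(num_steps):
        for l in range(len(machine.tape)):
            if (t, l) in forced:
                machine.tape[l] = forced[(t, l)]  # intervention: overwrite the bit
            history[(t, l)] = machine.tape[l]     # record the (possibly forced) bit
        machine.step()                            # ordinary transition
    return history
```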
We can use all computable embed(t,l) as before, but now we don't just require that the bits at (t,l) in A are the same as the bits at embed(t,l) in B; we also require the interventions to be the same. That is, when we change bit (t,l) in A and embed(t,l) in B, then the other bits still correspond. (We need to do multi-bit interventions, not just single-bit; but I think infinite-bit interventions are never needed, due to the nature of computation.)
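Putting the two requirements together, here is a minimal sketch of the check (reusing the hypothetical `run_with_interventions` above, and testing only a finite sample of interventions, whereas the definition quantifies over all of them):

```python
# Hedged sketch: causal bit-to-bit embedding check over sampled interventions.
# `make_A` / `make_B` build fresh machines; `embed` maps A's (t, l) to B's.

def respects_causal_structure(make_A, make_B, embed, interventions_on_A,
                              num_steps):
    """Bits must agree at corresponding locations both in the unintervened
    run and under each sampled intervention, translated through `embed`
    before being applied to B."""
    for ivs_A in [[]] + list(interventions_on_A):
        ivs_B = [(*embed(t, l), b) for (t, l, b) in ivs_A]
        hist_A = run_with_interventions(make_A(), ivs_A, num_steps)
        hist_B = run_with_interventions(make_B(), ivs_B, num_steps)
        for (t, l), bit in hist_A.items():
            if hist_B.get(embed(t, l)) != bit:
                return False
    return True
```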
Bit-To-Pattern Embedding
The embeddings recognized by the above proposal are far too limited. Most embeddings will not literally translate the program history bit for bit. For example, suppose we have a program which simulates Earth-like physics with enough complexity that we can implement a transistor-based computer as a structure within the simulation. B could be a physical implementation of A based on this simulation. There will not necessarily be specific (t,l) correspondences which give a bit-to-bit embedding. Instead, bits in A will map onto electrical charges in predictable locations within B's physics. Recognizing one such electrical charge in the execution history of B might require accessing a large number of bits from B's low-level physics simulation.

This suggests that we need embed(t,l) to output a potentially complicated pattern-matcher for B's history, rather than a simple (t,l) location.

A difficulty here is how to do the causal interventions on the pattern-matched structure. We need to "flip bits" in B when the bit is represented by a complicated pattern.
We can make useful extensions to simple embedding by restricting the "pattern matcher" in easily reversible ways. embed(t,l) can give a list of (t,l) locations in B along with a dictionary/function which classifies these as coding 1 or 0, or invalid. (This can depend on (t,l)! It doesn't need to be fixed.) An intervention changing (t,l) in A would be translated as any of the changes to the set embed(t,l) which change its classification. I'd say all the (valid) variations should work in order for the embedding to be valid. (There might be somewhat less strict ways to do it, though.)
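A minimal sketch of this variant (my own rendering, with invented names): embed(t,l) now returns a set of B-locations together with a classifier, and translating an intervention means enumerating the rewrites of those locations that flip the classification.

```python
# Hedged sketch: bit-to-pattern embedding. embed(t, l) returns the B-locations
# that jointly encode A's bit at (t, l), plus a classifier mapping the bits
# found there to 0, 1, or None (invalid).

from itertools import product

def read_encoded_bit(hist_B, t, l, embed):
    """Classify the pattern currently sitting at embed(t, l)'s locations."""
    locations, classify = embed(t, l)
    return classify(tuple(hist_B[loc] for loc in locations))

def translated_interventions(t, l, new_bit, embed):
    """All rewrites of embed(t, l)'s locations whose classification becomes
    `new_bit`; the proposal above requires every one of them to work."""
    locations, classify = embed(t, l)
    options = []
    for bits in product((0, 1), repeat=len(locations)):
        if classify(bits) == new_bit:
            options.append(list(zip(locations, bits)))
    return options
```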
This approach is the most satisfying for me at the moment. It seems to capture almost all of the cases I want. I'm not totally confident that it rules out all the non-examples I'd want to rule out, though.

We can make a "causality soup" program, which computes every Boolean expression in order, caching values of sub-expressions so that there's a causal chain from the simplest expressions to the most complicated. This program embeds every other program, by the definition here. I'm not sure this should be allowed: it feels like almost the same error as the claim that the digits of pi are Turing-complete because (if pi is normal, as it appears to be) you can find any computable bit sequence in them. While the set of Boolean expressions gives a lot of structure, it doesn't seem like as much structure as the set of all programs.
Another case which seems problematic: suppose B embeds a version of A, but wired to self-destruct if causal interventions are detected. This can be implemented by looking for a property which the real execution history always has (such as a balance of 1s and 0s that never goes beyond some point), and stopping work whenever the property is violated. Although this is intuitively still an embedding, it lacks some of the causal structure of A, and therefore would not be counted.
Pattern-To-Pattern Embedding
Bit-to-pattern embeddings may still be too inflexible to capture everything. What if we want some complex structures in A to map to simple structures in B, so long as the causal structure is preserved? An important example of this is a bit which is modified by the computation at time t, left untouched for a while, and then used again at time t+n. In terms of bit-to-pattern embeddings, each individual time t, t+1, t+2, ..., t+n has to have a distinct element in B to map to. This seems wrong: it's requiring too much causal structure in B. We want to treat the bit as "one item" while it is untouched by the computation.

Rather than looking for an embed function, I believe we now need an embedding relation. I'm not sure exactly how this goes. One idea (a code sketch follows below):
- A "pattern frame" is an ordered set of (t,l) locations.
- A "pattern" is an ordered set of bits (which can fit in a frame of equal size).
- An "association" is a frame for A, a frame for B, a single pattern for A (size matching the frame), and a set of patterns for B (size matching the frame).

An embedding is a program which enumerates associations. A proper embedding between A and B is one which:
- Covers A completely, in the actual run and in all interventions. ("Covers" means that the associations contain a pattern frame with matching pattern for every (t,l) location in A.)
- For all interventions on A, for all derived interventions on B, the execution of B continues to match with A according to the embedding.
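A rough rendering of these definitions as data (my own sketch; the field and function names are invented, and the intervention condition is left abstract):

```python
# Hedged sketch: the pattern-to-pattern proposal as data structures, plus the
# "covers" condition for the actual (unintervened) run of A.

from dataclasses import dataclass
from typing import Iterable, Tuple, FrozenSet

Location = Tuple[int, int]          # (time step t, tape location l)
Frame = Tuple[Location, ...]        # an ordered set of locations
Pattern = Tuple[int, ...]           # an ordered set of bits, same length

@dataclass(frozen=True)
class Association:
    frame_A: Frame
    frame_B: Frame
    pattern_A: Pattern              # the single A-pattern for this frame
    patterns_B: FrozenSet[Pattern]  # the B-patterns it may correspond to

def covers_actual_run(associations: Iterable[Association], hist_A) -> bool:
    """True if every (t, l) in A's history lies in some frame whose A-pattern
    matches what actually appears there. (The full definition also demands
    this under every intervention, which is not checked here.)"""
    covered = set()
    for a in associations:
        if tuple(hist_A[loc] for loc in a.frame_A) == a.pattern_A:
            covered.update(a.frame_A)
    return covered == set(hist_A.keys())
```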
This concept is asymmetric, capturing the idea that B embeds all the causal structure of A, but possibly has more causal structure besides. We could make symmetric variants, which might also be useful.
In any case, this doesn't seem to work as desired. Suppose B is the physics simulation mentioned before, but without any computer in it. B embeds A anyway, by the following argument. Let the pattern frames be the whole execution histories. Map the case where A has no interventions to the case where B has no interventions. Map the cases with interventions to entirely altered versions of B, containing appropriate A-computers with the desired interventions. This meets all the requirements, but intuitively isn't a real embedding of A in B.
Pattern-to-pattern embeddings seem necessary for this to apply to theories of truth, as I'm hoping for, because a belief will necessarily be represented by a complex physical sign. For example, a neural structure implementing a concept might have a causal structure which at a high level resembles something in the physical world; but, certainly, the internal causal structure of the individual neurons is not meant to be included in this mapping.
In any case, more work is needed.