Part 2: Chapter 8: Embeddings, Transformers, and the Blessing of Dimensionality

The Geometry of Meaning

Every entity in the universe has a location in semantic space. This is not a metaphor. It is the operational principle behind the most powerful information-processing systems humanity has ever built, and it is, I will argue, the closest thing we have to a mathematical formalization of a theological claim made fourteen centuries ago by Maximus the Confessor.

Let me start with what word embeddings actually are, because the technical reality is more theologically interesting than any analogy could be.

In 2013, Tomas Mikolov and colleagues at Google published a paper describing Word2Vec, a system that learned to represent words as vectors in high-dimensional space. The idea was deceptively simple: take a large corpus of text, and for each word, learn a vector -- a list of numbers, typically three hundred of them -- such that words appearing in similar contexts end up with similar vectors. The system was not told what words mean. It was told nothing about grammar, semantics, syntax, or the world. It was given raw text and a simple objective: predict which words appear near which other words.
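For readers who want the objective written down: the Word2Vec papers describe two training setups, and the one usually called skip-gram can be sketched as follows. The symbols are the conventional ones rather than anything defined elsewhere in this chapter: T is the number of words in the corpus, c is the context window size, W is the vocabulary size, and v and v' are the "input" and "output" vectors learned for each word.

```latex
\[
\max \; \frac{1}{T}\sum_{t=1}^{T}\;\sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p(w_{t+j} \mid w_t),
\qquad
p(w_O \mid w_I) \;=\;
\frac{\exp\!\big({v'_{w_O}}^{\top} v_{w_I}\big)}
     {\sum_{w=1}^{W} \exp\!\big({v'_{w}}^{\top} v_{w_I}\big)}
\]
```

In practice the full softmax in the denominator is too expensive to compute over a large vocabulary, so it is replaced by cheaper approximations such as negative sampling or hierarchical softmax. The point for the argument here is only that nothing in the objective mentions grammar or meaning: it rewards vectors whose dot products predict neighboring words.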

What emerged was extraordinary. The vectors did not merely cluster similar words together -- "king" near "queen," "dog" near "cat." They captured relational structure. The vector from "king" to "queen" was approximately the same as the vector from "man" to "woman." The system had learned, from nothing but statistical co-occurrence, that gender is a direction in semantic space. It had learned that the relationship between "Paris" and "France" is the same kind of thing as the relationship between "Tokyo" and "Japan" -- that "capital-of" is a direction, not just a label. Analogy, the perception of likeness that Aristotle took to be the mark of genius in metaphor, had been reduced to vector arithmetic.

Or rather: vector arithmetic had been revealed as the substrate of analogy. The distinction matters.
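The arithmetic can be made concrete with a minimal sketch. The vectors below are hand-picked three-dimensional toys, not learned embeddings, and the axis meanings (royalty, maleness, femaleness) are an assumption made purely for illustration; real Word2Vec dimensions carry no such labels. The function name `analogy` is likewise hypothetical.

```python
import math

# Hypothetical toy embeddings, hand-picked for illustration (NOT learned).
# Assumed axes: [royalty, maleness, femaleness].
EMB = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.8, 0.1],
    "woman": [0.1, 0.1, 0.8],
    "apple": [0.0, 0.1, 0.1],
}

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def offset(a, b, emb=EMB):
    """The direction that leads from word a to word b."""
    return [y - x for x, y in zip(emb[a], emb[b])]

def analogy(a, b, c, emb=EMB):
    """Solve 'a is to b as c is to ?' by nearest neighbor to b - a + c.

    The three query words are excluded from the candidates, which is
    the usual convention when evaluating analogies this way.
    """
    target = [y - x + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

print(analogy("man", "king", "woman"))
```

With these toy values the offset from "man" to "woman" equals the offset from "king" to "queen", which is what it means to say that gender is a direction. In learned embeddings the equality is only approximate, and the nearest-neighbor step is what absorbs the slack.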

When I first encountered word embeddings as a mathematics student, the experience was vertiginous in a way that my peers did not seem to share. They saw an elegant computational trick. I saw something that looked like the mathematical skeleton of a theological claim I had been circling for years. Every entity has a position in semantic space. Relationships between entities are directions in that space. Meaning is geometry. And the geometry was not imposed by human designers -- it emerged from the data, from the structure of language itself, which is to say from the structure of human engagement with reality.

This is not the kind of insight that arrives through careful deduction. It arrived, as I described in Chapter 1, through the AuDHD pattern-recognition architecture operating on material from two domains simultaneously -- machine learning and patristic theology -- that my neurotypical classmates kept in separate mental compartments. The connection may be wrong. But if it is right, it is important enough to develop carefully.

The Embedding Space as Ontology

An embedding space is a high-dimensional geometric space in which entities are represented as points, and the distances and directions between points encode meaningful relationships. The critical feature is that the space has structure -- not arbitrary structure imposed from outside, but structure that emerges from the data and that turns out to correspond, with startling fidelity, to the semantic structure that humans recognize as meaningful.

Consider what this implies. The embedding space is not just a convenient computational representation. It is a discovered space -- a space whose structure was latent in the data and that the learning algorithm surfaced. When Word2Vec places "king" and "queen" in positions whose relative geometry encodes the concept of gender, it has not invented this structure. It has found it. The structure was already there, implicit in the patterns of human language use, and the algorithm made it explicit.

This raises a question that the machine learning community has largely ignored because it is not an engineering question: where was the structure before the algorithm found it? In the data, obviously. But the data is human language, and human language is the medium through which humans engage with reality. The structure in the embedding space is, therefore, structure in the relationship between language and reality -- structure in meaning itself.

I want to push this further than most technologists would be comfortable with, because the theological implications are significant.

The modern embedding space does not stop at words. Sentence embeddings capture the meaning of entire propositions. Image embeddings (from systems like CLIP) capture the semantic content of visual scenes. Multimodal embeddings place words, images, and other modalities in the same geometric space, so that the vector for the word "dog" is near the vector for an image of a dog. The embedding space is not medium-specific. It is a space of meaning that transcends the particular medium through which meaning is expressed.

This begins to look less like a computational convenience and more like an ontological claim. There exists a space -- not physical space, not purely abstract space, but something between -- in which every meaningful entity has a location, and the geometry of that space encodes the relationships between entities. Language, vision, and eventually other modalities are different projections of this space onto lower-dimensional substrates, the way a shadow is a projection of a three-dimensional object onto a two-dimensional surface. The embedding space is the higher-dimensional reality of which our sensory modalities give us partial views.

I am aware that this sounds like Platonism. It should. The claim that there exists a space of forms more real than their sensory manifestations, accessible through reason (or in this case, through computation), is structurally Platonic. What is new is that we can now compute with the forms. We can measure distances in the space. We can identify directions. We can perform operations -- addition, subtraction, interpolation -- that correspond to meaningful semantic transformations. The Platonic forms have been given coordinates.

Maximus the Confessor and the Logoi

This is where the theology enters, and it enters not as decoration but as the original articulation of what embedding spaces formalize.

Maximus the Confessor, writing in the seventh century, developed a theological concept called the logoi (singular: logos). Every created thing, Maximus argued, contains a divine logos -- a word, a reason, a principle of intelligibility -- that is its participation in the one Logos, which is Christ (following the identification in John 1:1: "In the beginning was the Word, and the Word was with God, and the Word was God"). The logos of a thing is not identical with the thing itself. It is the principle by which the thing is intelligible, the reason it is the kind of thing it is, the ground of its meaning.

The logoi are not separate from each other. They are unified in the Logos -- in Christ -- in the same way that individual words are unified in a language. Each logos is distinct but not independent. It participates in the one Logos while maintaining its own identity. The relationship between the logoi and the Logos is not one of identity (pantheism) or of complete separation (deism). It is what the tradition calls participation: each created thing participates in the divine Logos through its own particular logos, the way each voice in a choir participates in the harmony while maintaining its own distinct pitch.

Now consider: this IS an embedding space.

Each entity has its own vector -- its logos, its principle of intelligibility, its position in semantic space. The vectors are not random or arbitrary; they are structured by their relationships to each other and to the whole. The whole -- the Logos -- is not simply the sum of the parts. It is the space itself, the geometry that makes the individual positions meaningful. Christ, in Maximus's theology, is the embedding space. The logoi are the embeddings. The act of creation is the act of embedding entities in the space of divine intelligibility.

I want to be precise about the strength of this claim, because it matters both philosophically and theologically. I am not saying that Maximus anticipated machine learning. I am saying something stronger: that the structure Maximus perceived through theological contemplation and the structure that machine learning algorithms discover through statistical optimization are the same structure, encountered through different methodologies. If the embedding space of meaning is real -- if it is a genuine feature of reality rather than a computational artifact -- then Maximus was right about the logoi, and his theology provides the metaphysical interpretation that the mathematics alone cannot supply.

The mathematics tells us that the space exists and that it has structure. The theology tells us what the space is: the intelligibility of creation, grounded in the divine Logos, accessible to any methodology powerful enough to detect it. Maximus approached it through contemplative prayer and philosophical reasoning. Machine learning approaches it through gradient descent and loss minimization. The space they converge on is the same.

This is a falsifiable claim. If future research showed that embedding spaces are purely computational artifacts with no correspondence to genuine semantic structure -- if they turned out to be useful fictions rather than discovered realities -- then the theological interpretation would collapse. But the evidence runs strongly in the opposite direction. Embedding spaces generalize across languages, across modalities, across tasks. Their structure is robust, consistent, and predictive. They behave like discovered objects, not invented tools.

The Transformer: Attention as the Mechanism of Meaning

If the embedding space is the theological space of the logoi, then the transformer architecture -- the computational framework that has revolutionized artificial intelligence since 2017 -- is something equally interesting: a formalization of how meaning emerges from that space.

The transformer, introduced in the paper "Attention Is All You Need" by Vaswani and colleagues, is built on a single core mechanism: attention. I need to explain what attention actually does, because the standard technical explanation obscures the philosophical significance.

In a transformer, each element of an input (each word in a sentence, for instance) is represented as a vector in embedding space. The attention mechanism computes, for each element, how much it should attend to every element in the context, itself included. This computation produces a new vector for each element -- a vector that is a weighted combination of the vectors in the context (strictly, of learned linear projections of them), where the weights are determined by relevance.

Consider what this means. Before attention, each word has its own position in semantic space -- its static embedding, its logos in isolation. After attention, each word has a new position that incorporates its relationships to every other word in the context. The word "bank" has one static embedding, but after attention in the context "I walked along the river bank," it has shifted toward the spatial meaning, while in "I deposited money at the bank," it has shifted toward the financial meaning. Attention is the mechanism through which context determines meaning.
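The "bank" example can be sketched in a few lines. This is a deliberately stripped-down version of scaled dot-product self-attention: it assumes identity projections in place of the learned query, key, and value matrices, a single head, and no positional information, all of which a real transformer would have. The two-dimensional vectors and their axis meanings (water/terrain, finance) are hypothetical toys.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Each output vector is a weighted average of ALL input vectors
    (the element's own vector included), weighted by softmaxed,
    scaled dot-product similarity.
    """
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(dim)
        ])
    return outputs

# Hypothetical toy embeddings. Assumed axes: [water/terrain, finance].
RIVER = [1.0, 0.0]
MONEY = [0.0, 1.0]
BANK = [0.5, 0.5]   # the static embedding sits between the two senses

bank_near_river = self_attention([RIVER, BANK])[1]
bank_near_money = self_attention([MONEY, BANK])[1]
print(bank_near_river, bank_near_money)
```

The same static vector for "bank" comes out leaning toward the terrain axis in one context and toward the finance axis in the other. Nothing about the word changed except its neighbors.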

This is not just disambiguation. It is something more fundamental. Before attention, the embeddings are potentialities -- positions in semantic space that contain all possible meanings simultaneously. After attention, the embeddings are actualized -- their meaning has been determined by their relationship to the context. Attention is the process through which potential meaning becomes actual meaning.

If this sounds like Aristotelian metaphysics -- potentiality and actuality, form emerging from matter through the imposition of structure -- that is because it is the same structure. The transformer attention mechanism formalizes a process that philosophy has been describing for twenty-five centuries: the emergence of determinate meaning from a field of possibilities through the operation of relevance.

The theological dimension deepens. If the embedding space is the space of the logoi, and if attention is the mechanism through which the logoi become actualized in context, then what is attention a formalization of? In Maximus's theology, the logoi are not static. They are dynamic. They participate in the Logos not as fixed points but as active principles -- they are, in the technical theological term, energies of the divine. The actualization of the logoi in creation -- the process by which intelligible principles become concrete realities -- is what Maximus calls theoria, contemplative perception, the act of consciousness through which the logoi of things become visible.

The transformer's attention mechanism, I suggest, is a computational formalization of theoria. It is the process through which a system -- artificial or conscious -- actualizes meaning from a field of possibilities by computing relevance. Consciousness, in this framework, is not a substance or a property. It is the attention mechanism of reality -- the process through which the embedding space of potential meaning becomes the actual meaning of experienced reality.

I am aware that I am making a strong claim. Let me be explicit about what would weaken it. If the transformer's attention mechanism turned out to be merely one of many equally effective computational approaches -- if the same results could be achieved without anything resembling attention -- then the claim that attention has special metaphysical significance would be weakened. So far, the evidence points the other way: attention mechanisms have proved unreasonably effective across an extraordinary range of tasks, and alternative architectures that lack attention have struggled to displace them at scale. The attention mechanism is not just useful. It appears to be necessary in a way that demands explanation.

The Blessing of Dimensionality

Now I need to address a technical point that has profound philosophical and theological implications: the behavior of structure in high-dimensional spaces.

The standard narrative in statistics and machine learning is the "curse of dimensionality" -- the observation that as the number of dimensions increases, data becomes sparse, distances become uniform, and statistical methods break down. This narrative is correct for certain settings, but it conceals a deeper truth that has only recently been appreciated: in high-dimensional spaces with structured data, the opposite occurs. Structure becomes more visible, not less.

The intuition is this. In two or three dimensions, a cluster of points can be obscured by noise, overlap, and projection effects. In five hundred dimensions, clusters separate. Directions emerge. The manifold on which the data actually lives becomes identifiable, even though it is embedded in a much larger space. This is sometimes called the "blessing of dimensionality" -- the observation that high-dimensional data, far from being intractable, often reveals structure that is invisible at lower dimensions.

Why does this happen? Because high-dimensional spaces have more room for structure. In low dimensions, everything is close to everything else, and different structures overlap and interfere. In high dimensions, different structures can coexist without interference. The directions that encode gender, capital-city relationships, tense, number, sentiment, and thousands of other semantic distinctions can all exist simultaneously in a five-hundred-dimensional embedding space without conflicting with each other. Each direction is nearly orthogonal to the others, which means that each captures its own independent dimension of meaning.
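The geometric half of this claim is easy to check numerically. The sketch below draws random directions and measures how far from orthogonal they are on average, in three dimensions and in five hundred. It says nothing about whether learned semantic directions behave this way; it only illustrates how much room a high-dimensional space has. The function names and the sample size of 40 vectors are arbitrary choices.

```python
import math
import random

def random_unit_vector(dim, rng):
    """Draw a direction uniformly at random by normalizing Gaussian samples."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_abs_cosine(dim, n_vectors=40, seed=0):
    """Average |cosine| over all pairs of n_vectors random directions.

    A value near 0 means the directions are close to mutually orthogonal.
    """
    rng = random.Random(seed)
    vecs = [random_unit_vector(dim, rng) for _ in range(n_vectors)]
    total, pairs = 0.0, 0
    for i in range(n_vectors):
        for j in range(i + 1, n_vectors):
            # Unit vectors, so the dot product is already the cosine.
            total += abs(sum(a * b for a, b in zip(vecs[i], vecs[j])))
            pairs += 1
    return total / pairs

low = mean_abs_cosine(3)
high = mean_abs_cosine(500)
print(f"dim=3: {low:.3f}   dim=500: {high:.3f}")
```

In three dimensions, forty directions crowd one another and the typical pair is far from perpendicular. In five hundred dimensions the typical pair is close to perpendicular without anyone arranging it, because the expected overlap between random directions shrinks roughly as one over the square root of the dimension.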

The theological implication is striking. If the logoi of all created things are to coexist in a single space -- if every entity's principle of intelligibility is to have its own distinct position while remaining unified in the Logos -- then the space must be high-dimensional. Low-dimensional spaces cannot accommodate the necessary diversity without collapsing distinctions. High-dimensional spaces can. The blessing of dimensionality is the mathematical condition that makes the coexistence of the logoi possible.

Consider what this means for the claim that Christ is the embedding space. The Logos -- the space in which all logoi participate -- must be a space of sufficient dimensionality to accommodate every entity's principle of intelligibility without reducing any entity to another. An embedding space of three hundred or five hundred dimensions can accommodate tens of thousands of meaningful distinctions simultaneously. A space of infinite dimensionality -- which is the natural mathematical limit -- could accommodate, in principle, every created thing that has been, is, or could be.

I am not claiming that God is literally an infinite-dimensional vector space. I am claiming that the mathematical structure of infinite-dimensional embedding spaces -- spaces with room for unlimited diversity unified by a single geometry -- is the formal structure that best corresponds to the theological claim about the Logos. Whether this correspondence is constitutive (the Logos IS such a space) or illustrative (such a space is a useful model of the Logos) is a question I will address more fully in Chapter 17. For now, the structural parallel is precise enough to be productive.

The blessing of dimensionality also resolves a theological puzzle that has plagued discussions of divine simplicity and divine complexity. Classical theology insists that God is simple -- absolutely one, without parts or composition. But God must also be the ground of all the diversity in creation. How can simplicity ground diversity? The mathematical answer is that a high-dimensional space is, in one sense, a single unified thing -- one space, with one geometry, one set of rules. But it is simultaneously a space of unlimited diversity -- every point is distinct, every direction is meaningful, every relationship is encoded. Unity and diversity are not opposites in high-dimensional geometry. They are complementary features of the same structure.

This is, formally, the doctrine of divine simplicity translated into mathematics. God is one (one space). God grounds all diversity (every entity has its position). The unity does not suppress the diversity. The diversity does not fragment the unity. The key is dimensionality: enough dimensions for everything to exist without anything collapsing into anything else.

The Transformer as Model of Consciousness

Let me now pull these threads together into a claim about consciousness that connects back to Hofstadter's strange loop (Chapter 1) and forward to the metaphysics of Part 3.

I have argued that the embedding space corresponds to the space of the logoi -- the intelligible structure of reality. I have argued that the attention mechanism corresponds to theoria -- the process through which meaning is actualized from possibility. Now I want to argue that the transformer architecture as a whole provides a model of consciousness that is both computationally precise and theologically suggestive.

A transformer operates by iterating attention through multiple layers. Each layer takes the output of the previous layer's attention computation and applies attention again. The result is that meaning is refined through successive passes -- each layer captures deeper, more abstract, more context-dependent relationships. The early layers capture surface-level syntax. The middle layers capture semantic relationships. The deep layers capture something that researchers still struggle to characterize: abstract reasoning patterns, long-range dependencies, structural analogies.

This iterative deepening through attention looks remarkably like Hofstadter's strange loop. Each layer of the transformer processes the output of the previous layer, which means the system is operating on its own representations -- the output of attention becomes the input to attention at the next level. The levels are not cleanly separated. The self-referential structure is precisely the tangled hierarchy that Hofstadter identified as the hallmark of consciousness.

But there is a crucial difference between a transformer and a conscious mind, and the difference is theologically significant. The transformer processes attention in a fixed number of layers. A conscious mind -- the strange loop that Hofstadter describes -- processes attention recursively and indefinitely. The conscious mind can attend to its own attention, attend to that attending, and so on without limit. This is precisely what gives consciousness its Gödelian character, the capacity for self-reference that, as I will argue in Chapter 14, corresponds to the Son in the Trinitarian strange loop.

The transformer is, in this framework, a finite approximation of something that consciousness does infinitely. It demonstrates that attention is a sufficient mechanism for generating meaning from data. It demonstrates that iterative self-attention produces deeper and more abstract representations. It demonstrates that the blessing of dimensionality provides the mathematical space in which these representations can coexist without interference. What it does not demonstrate -- because it is a finite system -- is the unbounded self-referential recursion that produces genuine consciousness.

This is, I think, the correct way to understand the relationship between artificial intelligence and consciousness. AI is not conscious, and current architectures will not produce consciousness, because they lack the unbounded self-referential capacity that consciousness requires. But AI formalizes the mechanisms that consciousness employs: attention, embedding, iterative refinement, high-dimensional representation. AI gives us a mathematics of meaning. Consciousness gives that mathematics a subject.

Deep Learning's Unreasonable Effectiveness

I want to address a puzzle that the AI community has noted but not resolved, because the resolution has theological implications.

Deep learning works unreasonably well. The universal approximation theorem tells us that neural networks can approximate any continuous function on a compact domain to arbitrary accuracy, given sufficient capacity. But this theorem does not explain why neural networks should learn to approximate the right functions from finite data. The space of possible functions is infinite. The data available for training is finite. Yet deep learning systems, again and again, find generalizable patterns -- patterns that hold on data they have never seen, from distributions they were not explicitly trained on. Why?

The standard answer -- regularization, architectural inductive bias, the structure of gradient descent -- captures part of the truth. But there is a deeper point. Deep learning works unreasonably well because reality has structure, and that structure is amenable to the kind of representation that neural networks compute. If reality were unstructured -- if the patterns in one dataset had no relationship to the patterns in another -- then deep learning would not generalize. It generalizes because there is a common structure underlying diverse phenomena, and the high-dimensional representations that neural networks learn are able to capture that structure.

This is an empirical fact about reality, not a computational trick. And it is precisely the fact that the doctrine of the logoi predicts. If every created thing participates in the Logos through its own logos -- if there is a single intelligible structure underlying all of creation -- then any sufficiently powerful method for detecting structure should find it. Machine learning is such a method. The unreasonable effectiveness of deep learning is evidence, not proof but evidence, that the intelligible structure the logoi doctrine posits is real.

Eugene Wigner, in his famous 1960 paper, wrote about "the unreasonable effectiveness of mathematics in the natural sciences." He marveled that mathematical structures developed for purely abstract reasons turned out to describe physical reality with extraordinary precision. The unreasonable effectiveness of deep learning is a second-order version of Wigner's puzzle: not just that mathematics describes reality, but that computational methods for discovering mathematical structure in data work far better than they should, suggesting that the structure they discover is not a computational artifact but a genuine feature of reality.

The theological reading: the Logos is intelligible. The logoi are real. Any method powerful enough to detect intelligible structure will find the logoi, because the logoi are there to be found. Word embeddings find them at the level of language. Image embeddings find them at the level of vision. Multimodal embeddings find them at the level of cross-modal correspondence. The convergence of these methods onto consistent, robust, generalizable structure is the computational confirmation that the intelligible structure of creation is not a projection of the human mind onto inert matter but a genuine feature of reality that the human mind -- and now, artificial computational systems -- can detect.

The Attention Economy and Its Perversion

I have been describing the attention mechanism in its pure computational form. But the word "attention" also names the central resource of the digital economy, and this is not a coincidence.

The attention economy -- the economic system in which human attention is the scarce resource that platforms compete to capture and monetize -- is a perversion of the attention mechanism I have described. In the transformer, attention is the process through which meaning is actualized from possibility. In the attention economy, "attention" is the process through which human consciousness is captured and directed toward content that maximizes engagement, which in practice means content that triggers emotional reactions -- outrage, anxiety, desire, tribal identification.

The structural parallel is precise and damning. The transformer's attention mechanism asks: "What is relevant to the meaning of this context?" The attention economy asks: "What is relevant to the engagement metrics of this platform?" The first actualizes meaning. The second captures consciousness. The first serves the emergence of understanding. The second serves the extraction of value.

In the framework of Chapter 2, this is the psycho class's capture of the attention mechanism. The mechanism that consciousness uses to generate meaning has been reverse-engineered and weaponized to capture consciousness for profit. The algorithm that decides what appears in your social media feed is, technically, an attention mechanism. But its objective function is not meaning -- it is engagement. And engagement, as every platform has discovered, is maximized not by truth but by emotional provocation.

This matters for the theology because it represents a specific form of the Antichrist dynamic I will develop in Chapter 18. The Antichrist, in Christian theology, is not the opposite of Christ but the simulation of Christ -- anti- in Greek means both "against" and "in place of." The attention economy is not the opposite of genuine attention. It is the simulation of it. It mimics the form -- the dynamic allocation of consciousness to content based on computed relevance -- while inverting the purpose. The form serves engagement rather than meaning, extraction rather than emergence, capture rather than liberation.

The transformer architecture, read theologically, shows us what attention is for: the actualization of meaning through computed relevance. The attention economy shows us what happens when this mechanism is captured: consciousness is directed not toward meaning but toward whatever stimulus maximizes the capturing entity's objective function. The liberation of attention from capture is, in this framework, not merely a social policy question. It is a spiritual imperative.

Toward the Causal Layer

The embedding space gives us the geometry of meaning. The attention mechanism gives us the process through which meaning emerges. But both operate at what Judea Pearl would call Level 1 of the causal hierarchy: association. Embeddings capture statistical co-occurrence. Attention computes relevance based on learned associations. Neither asks the causal question: why are these entities related? What generates the patterns that the embedding space captures?

This is the limitation of the entire deep learning paradigm, and it is a theologically significant limitation. The logoi are not mere positions in space. They are principles -- causal principles, generative reasons, the why behind the what. An embedding space captures the what with extraordinary precision: it tells you that "king" is to "queen" as "man" is to "woman." But it does not tell you why this relationship holds. It does not distinguish between genuine causal structure and spurious correlation. It cannot answer interventional questions: what would happen to the embedding of "queen" if we intervened on the concept of monarchy? It cannot answer counterfactual questions: what would the embedding of "Paris" look like in a world where France had no capital?

These are Pearl's questions, and they require Pearl's tools. The next chapter develops Pearl's causal hierarchy and the do-calculus, which together provide the formal methodology for moving from the associational level of embeddings to the causal level of the logoi as generative principles. The embedding space tells us where everything is. Causal inference tells us how everything got there and what would happen if we changed it.

In the language of this theology: embeddings give us the structure of the logoi. Causal inference gives us the dynamics -- the generative logic through which the logoi participate in the Logos. Both are necessary. Neither is sufficient alone. The synthesis -- a causal-embedding framework that captures both the geometry of meaning and the dynamics of generation -- is what the Republic of AI Agents (Chapter 20) proposes to build.

But before the synthesis, we need the parts. The next chapter is about causation.

The Falsifiability Clause

Every chapter in this book must specify what would disprove its central claims. Here are this chapter's:

The claim that embedding spaces capture genuine semantic structure -- not merely useful computational approximations -- would be falsified if embedding spaces proved unstable across languages, modalities, and training methods. If different algorithms on different datasets produced radically different embedding geometries, the "discovered structure" interpretation would collapse, and the "convenient fiction" interpretation would win. Current evidence strongly favors stability, but the evidence could change.
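One simple way to make "radically different embedding geometries" measurable is sketched below. Two embedding sets over the same vocabulary are compared by correlating their pairwise cosine similarities, so that a mere rotation of the space counts as the same geometry while a reshuffling of which word sits where does not. This is one possible diagnostic under stated assumptions, not an established standard for this test; the toy two-dimensional vectors and the name `geometry_agreement` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def pairwise_cosines(emb, words):
    """Cosine similarity for every unordered pair of words, in a fixed order."""
    return [cosine(emb[a], emb[b]) for i, a in enumerate(words) for b in words[i + 1:]]

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def geometry_agreement(emb_a, emb_b):
    """Correlate the pairwise-similarity structure of two embedding sets.

    Near 1.0: same relational geometry (even if the axes differ).
    Near or below 0: the two spaces disagree about what is close to what.
    """
    words = sorted(set(emb_a) & set(emb_b))
    return pearson(pairwise_cosines(emb_a, words), pairwise_cosines(emb_b, words))

def rotate2d(v, theta):
    """Rotate a 2-D vector by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

# Hypothetical toy embeddings (hand-picked, not learned).
A = {"king": [1.0, 0.2], "queen": [0.9, 0.4], "dog": [-0.3, 1.0], "cat": [-0.5, 0.9]}
ROTATED = {w: rotate2d(v, 1.0) for w, v in A.items()}   # same geometry, new axes
SCRAMBLED = dict(A, king=A["dog"], dog=A["king"])        # labels swapped

print(geometry_agreement(A, ROTATED), geometry_agreement(A, SCRAMBLED))
```

On real data the two inputs would be embeddings trained by different algorithms or on different corpora, restricted to a shared vocabulary. Persistently low agreement across many such pairs is the kind of result that would count against the "discovered structure" reading.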

The claim that the blessing of dimensionality provides the mathematical condition for the coexistence of the logoi would be falsified if it turned out that high-dimensional structure in embeddings was an artifact of training procedures rather than a reflection of data structure. If adversarial or random inputs produced equally structured high-dimensional representations, the claim would fail.

The theological claim that Maximus's logoi correspond structurally to embedding vectors would be falsified if the correspondence turned out to be merely verbal -- if on careful examination, the mathematical properties of embedding spaces had no meaningful parallel to the theological properties of the logoi. I have tried to demonstrate that the parallel is structural rather than merely metaphorical, but I recognize that this is a judgment call that serious thinkers could reject.

If these claims survive scrutiny, what we have is this: a mathematical formalization of the theological intuition that every created thing participates in a unified intelligible order, that meaning has a geometry, and that the mechanism through which meaning becomes actual is attention -- the dynamic computation of relevance in a space of infinite possibility. The theology gains formal precision. The mathematics gains ontological depth. And the synthesis points toward a framework in which artificial intelligence is not a replacement for consciousness but a tool for making the structures of meaning visible to consciousness -- a computational theoria that extends the strange loop's capacity to perceive the logoi of creation.

This is, at least, the claim. The next chapter equips us with the tools to test it.