Loom: A Scalable Computer Architecture for Looped Transformers

Mehmet Kerem Turkcan

Loom is a computer architecture for looped transformers. Its weights are derived analytically, requiring no training data and no gradient descent. Programs are written in C, compiled to a 22-opcode ISA (including indirect memory write), and executed as iterated matrix multiplications through 8 fixed-weight transformer layers. The entire machine state fits in a fixed-size bipolar tensor. The architecture scales from 146×512 (7.4 MB, 320 instruction slots) through 155×1024 (4.7M parameters, 928 slots) to 164×2048 (9.0M parameters, 1,792 slots).

Demos
Sorting + Architecture Viz
Bubble sort with real-time visualization of all 8 transformer layers. Watch activations flow through the neural architecture as each comparison executes.
Visual Real Activations 3D
Sorting Visualizer
Bubble sort where every comparison and swap is a sequence of transformer forward passes.
Visual Animation
Snake Game
Every tick of this playable game, from movement to collision to food spawning, is computed by the transformer. About 84 steps per frame.
Game 210 Instructions
Game of Life
Conway's rules in C, executed on the larger 164×2048 model. 570 instructions. Click cells, step through generations.
Interactive Larger Model 570 Instructions
DOOM E1M1
Playable E1M1 with BSP rendering from the real WAD file. Enemy AI state machines run as compiled C on the ISA interpreter.
DOOM BSP Renderer Playable WAD File
Doom Deathmatch
DOOM WAD textures, sprites, and HUD with game logic executed entirely on the transformer. 531 instructions, about 500 steps per tick.
DOOM Deathmatch 531 Instructions
9×9 Sudoku Solver
Full 9×9 Sudoku with backtracking on the compact 146×512 model (8.8 MB). ISA interpreter or ONNX transformer.
Solver Backtracking 284 Instructions 146×512
C Debugger + Architecture Viz
Step through compiled C with real-time visualization of all 8 transformer layers. Source highlighting, variable watch, and neural activation flow.
Debugger Real Activations 3D
C Debugger
Step through compiled C one instruction at a time. Watch variables, source highlighting, and the state tensor update live.
Step Source Map Variables
Live C REPL
Write C, compile it with Pyodide, run it on the transformer. The full compile-to-run loop happens entirely client-side.
Pyodide ONNX
How Does It Work?
Opcode Data Flow
See how each of the 22 opcodes routes data through 8 transformer layers. Per-layer mechanism badges, state tensor layout, instruction format.
Architecture 8 Layers 22 Opcodes
Attention Patterns
Interactive guide to the five attention mechanisms: PC fetch, 3-head operand read, pointer dereference, content-addressable search, and write-via-attention.
Architecture Q=K Symmetric 5 Patterns
Contributions
Architecture
State X ∈ {−1, +1}^(d×n) — a bipolar tensor holding the entire machine state, e.g. 155×1024 or 164×2048.

Rows 0–29     Instruction encoding           # 3 × log₂ 1024 = 30 rows for [a, b, c] fields
Rows 30–37    Memory, 64 slots               # 8-bit signed integers in bipolar encoding
Rows 38–83    Scratchpad                     # internal routing between layers
Rows 84–93    Program Counter                # 10-bit binary address
Rows 94–103   Position Encoding              # column identity for attention addressing
Rows 104–145  Buffer, 3N + N + log n = 42    # buf_a 8, buf_b 8, buf_c 8, find_temp 8, load_temp 10
Rows 146–153  Address Tags                   # memory column identity for LOAD/FIND
Row 154       Indicator                      # marks scratchpad columns

One forward pass = one instruction:

L1  Fetch instruction at PC          # attention over position encoding
L2  Read operands + decode opcode    # 3-head attention, opcode-gated FFN routing
L3  Indirect read + correction       # LOAD/FIND attention + snap to ±1
L4  Subtract, direct borrow chain    # single-layer a−b via 6-threshold pattern
L5  Write result to memory           # attention-based memory store
L6  Branch flag + PC increment       # merged condition check + PC+1
L7  Branch select                    # choose branch target or PC+1
L8  Error correction                 # clamp values to ±1
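As a concrete reference for the memory rows, here is a sketch of one plausible bipolar encoding of an 8-bit signed integer. The LSB-first bit order and two's-complement convention are assumptions for illustration, not read off the released weights:

```python
def to_bipolar(value, bits=8):
    """Encode a signed integer as a ±1 bit vector (two's complement, LSB first)."""
    u = value & ((1 << bits) - 1)  # two's-complement wrap into `bits` bits
    return [1 if (u >> i) & 1 else -1 for i in range(bits)]

def from_bipolar(vec):
    """Decode a ±1 bit vector (LSB first) back to a signed integer."""
    u = sum(1 << i for i, b in enumerate(vec) if b == 1)
    if vec[-1] == 1:               # sign bit set -> negative value
        u -= 1 << len(vec)
    return u
```

In this convention a memory slot holding 0 is the all −1 column, and the round trip is exact for any value in [−128, 127].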
Layer Design

Giannou et al., 2023, describe a 13-layer construction for SUBLEQ; their released implementation uses 10 layers for the same single instruction. Our architecture fits 22 opcodes in 8 layers. Four design choices keep the layer count below the baseline despite the richer ISA:

1. Fused instruction decode. Giannou et al. use separate layers for instruction fetching and operand routing. We fold opcode decoding into the memory-read layer, L2, via a 3-head attention stage followed by an FFN with opcode-gated routing. The FFN examines the most significant bits of the instruction address to distinguish extended-mode opcodes at addresses 0 to 31 from SUBLEQ at addresses ≥32, then copies operands to the appropriate scratchpad positions for each operation.
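The address-based mode split can be sketched as follows; the 5-bit boundary follows from "extended at 0 to 31, SUBLEQ at ≥32", but the function name and interface are illustrative, not the project's API:

```python
def decode_mode(addr):
    """Mode select by the most significant bits of a 10-bit instruction
    address: addresses 0..31 (all high bits zero) select extended-mode
    opcodes; anything with a bit >= 32 set is plain SUBLEQ."""
    high_bits = addr >> 5          # top 5 of the 10 address bits
    return "extended" if high_bits == 0 else "subleq"
```

The FFN only needs to test whether the high address bits are all zero, which is why decode folds cheaply into the memory-read layer.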

2. Opcode-as-operand-routing. All 22 operations reduce to operand preparation for a shared subtract core. ADD a,b, for instance, loads mem[a] into the minuend and −mem[b] into the subtrahend so that subtraction yields addition. AND, OR, and XOR are realized through delta corrections applied to the subtrahend.
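The ADD example above can be sketched directly; plain integers stand in for the bipolar-encoded values, and subtract_core is an illustrative stand-in for the shared L4 subtractor:

```python
def subtract_core(minuend, subtrahend):
    # The shared ALU computes minuend - subtrahend; every opcode reuses it.
    return minuend - subtrahend

def op_add(mem, a, b):
    # ADD a,b routes mem[a] into the minuend and -mem[b] into the
    # subtrahend, so a - (-b) = a + b falls out of the subtract core.
    return subtract_core(mem[a], -mem[b])
```

Each opcode differs only in how it prepares the two inputs, so no per-opcode arithmetic layer is needed.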

3. Direct borrow-chain subtraction. L4 replaces the classical 3-layer two's complement approach, which flips bits, adds 1, then adds, with a single layer that computes a−b directly using a 6-threshold-per-bit ReLU pattern with built-in carry propagation.
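For reference, here is what L4's output must equal, written as an ordinary borrow chain over ±1-encoded bits (LSB first, an assumed convention). The actual layer computes the same function in one shot with six ReLU thresholds per bit and built-in carry propagation rather than a sequential loop:

```python
def borrow_chain_sub(a_bits, b_bits):
    """Bitwise a - b over ±1 bit vectors (LSB first) with explicit
    borrow propagation -- a plain reference for what L4 computes."""
    out, borrow = [], 0
    for a, b in zip(a_bits, b_bits):
        ai, bi = (a + 1) // 2, (b + 1) // 2  # map ±1 -> {0, 1}
        d = ai - bi - borrow                 # per-bit difference
        out.append(1 if d & 1 else -1)       # result bit back to ±1
        borrow = 1 if d < 0 else 0           # borrow into the next bit
    return out
```

Replacing flip-add-add with this direct form is what collapses three layers into one.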

4. Merged branch + PC update. L6 merges the branch-flag computation and PC increment into one layer, since these are independent FFN computations: flag reads scratchpad and opcodes, while PC+1 reads PC bits. Scratchpad correction is folded into L3's FFN alongside indirect read.
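A minimal sketch of the merged L6/L7 behavior, assuming a 10-bit wraparound PC and an illustrative flag input; the point is that the flag and PC+1 read disjoint parts of the state, so they fit in one layer:

```python
def branch_and_step(flag_taken, pc, target, pc_bits=10):
    """L6 computes the branch flag and PC+1 as independent FFN paths;
    L7 then selects between the branch target and the incremented PC.
    The wraparound at 2**pc_bits is an assumption of this sketch."""
    pc_next = (pc + 1) & ((1 << pc_bits) - 1)  # PC+1, independent of the flag
    return target if flag_taken else pc_next
```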

L3 serves two purposes: its 2-head attention handles LOAD for pointer dereference and FIND for content-addressable search, while its FFN clamps soft-attention outputs to exact bipolar values before the ALU stage. The 3-head attention routing in L2 produces outputs that deviate further from ±1 than the simpler single-instruction case, making this intermediate correction necessary for numerical stability.
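The ±1 clamp itself can be sketched with two ReLUs; the gain constant is an assumption of this sketch (the real correction FFN has analytically chosen weights), and inputs are presumed bounded away from 0:

```python
def relu(x):
    return max(x, 0.0)

def snap(x, gain=100.0):
    """Hard-limit a soft-attention output to ±1 with two ReLUs:
    a steep hard-tanh that saturates for any |x| >= 1/gain."""
    return relu(gain * x + 1.0) - relu(gain * x - 1.0) - 1.0
```

Any value whose sign survives the soft-attention mixing is restored to an exact ±1 before the ALU stage sees it.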

Comparison
                 | This Work                     | Percepta (Tzamos et al., 2026)                   | Autoregressive LLM
Paradigm         | Looped, fixed state           | Autoregressive, trace                            | Autoregressive, tokens
State            | Fixed-size tensor, 155×1024   | Growing execution trace                          | Growing token sequence
Cost per step    | O(1), same 8 layers           | O(log t) via HullKVCache                         | O(t) or O(log t)
Weights          | Analytically computed         | Constructed or trained, unspecified              | Trained on data
Architecture     | 8 layers, d=155, custom ISA   | 7 layers, d=36, 18 2D heads                      | Varies, billions of params
Program location | In the state tensor           | Interpreter in weights; program as input tokens  | In the prompt
Programs         | C → custom ISA → state tensor | C/C++ → WASM bytecode → input tokens             | Natural language prompts
Determinism      | Exact, no sampling            | Exact, greedy decoding                           | Probabilistic
Memory           | Constant, fixed tensor        | Grows with execution time                        | Grows with context
Related Work

Looped Transformers as Programmable Computers by Giannou et al., 2023, showed that a fixed-depth transformer with analytically constructed weights can execute the SUBLEQ instruction. Our work addresses what follows: how to architect a multi-operation ISA on this substrate, how to compile high-level programs to it, and how the resulting architecture scales.

Extensions of the programmable-computer framework. Looped ReLU MLPs May Be All You Need as Practical Programmable Computers by Liang et al., 2024, shows that a 23-layer ReLU-MLP can also emulate a programmable computer with analytical weights, using MLPs instead of transformers. Simulation of Graph Algorithms with Looped Transformers by Giannou et al., 2024, extends the original framework with a dual-memory SUBLEQ variant for graph problems but remains single-instruction. Neural Algorithmic Reasoning for Hypergraphs with Looped Transformers by Li et al., 2025, extends looped transformer algorithmic reasoning from graphs to hypergraphs. On Expressive Power of Looped Transformers by Xu and Sato, ICML 2025, establishes formal approximation rates for looped transformers and proposes timestep encoding to enhance expressivity.

Exact neural computation. Learning to Add, Multiply, and Execute Algorithmic Instructions Exactly with Neural Networks by Back de Luca et al., NeurIPS 2025, proves that two-layer networks can learn to execute binary addition, multiplication, and Subtract-and-Branch-if-Negative exactly, using the NTK framework with trained rather than analytically constructed networks. Algorithmic Language Models with Neurally Compiled Libraries by Saldyt and Kambhampati, 2024, augments LLaMA3 with memory, registers, and basic operations, compiling algorithms into differentiable libraries.

Analytical weight construction and compilation. Tracr by Lindner et al., 2023, compiles RASP programs to transformer weights for interpretability research. Thinking Like Transformers by Weiss et al., 2021, introduced RASP as a programming language that maps to transformer primitives. Both construct weights analytically but target interpretability, not general-purpose computation. ALTA by Shaw et al., 2024, extends RASP/Tracr with loops and compiles to Universal Transformers. Transformers are Efficient Compilers, Provably by Bai et al., 2024, shows that trained transformers can perform compilation. Weights to Code, 2026, works in the reverse direction, extracting executable programs from trained transformer weights.

Transformer Turing completeness. Autoregressive Large Language Models are Computationally Universal by Schuurmans et al., 2024, shows that autoregressive decoding itself realizes universal computation via Lag systems. Constant Bit-size Transformers Are Turing Complete by Li and Wang, 2025, proves constant-parameter transformers are Turing complete with sufficient context. Softmax Transformers are Turing-Complete by Jiang et al., 2025, settles the open question for softmax attention. Efficient Turing Machine Simulation with Transformers by Li and Wang, 2025, shows that sparse attention with fixed geometric offsets suffices for efficient universal computation.

Trained neural computers. Can LLMs Be Computers? by Tzamos et al. at Percepta, 2026, implements a WebAssembly interpreter inside a 7-layer autoregressive transformer with d=36 and 18 two-dimensional attention heads. The model generates execution traces token-by-token. HullKVCache uses 2D attention heads to build convex hulls over prior tokens, giving O(log t) lookups instead of linear scans, yielding 30k tok/s throughput and multi-million-step executions. The weight construction method is not fully specified in the blog post; the authors describe both "implementing" the interpreter in weights and the model "learning" it. The approach is architecturally distinct from ours: in Percepta, the execution logic, a WASM interpreter, is encoded in the weights, with programs supplied as input tokens and executed as a growing autoregressive trace. In Loom, the weights are program-independent. Programs live in the state tensor, and the same fixed-weight model executes any compiled program at O(1) per step with no state growth.

Program execution and simulation. Universal Length Generalization with Turing Programs by Hou et al., 2024, decomposes algorithms into Turing machine steps as chain-of-thought, constructing RASP programs that simulate arbitrary TMs. Code Simulation as a Proxy for High-order Tasks in Large Language Models by La Malfa et al., 2025, studies LLMs simulating code execution as a reasoning proxy.

All model weights are analytically derived. No training data, no gradient descent, no GPU hours. The optimized ONNX model is 7.4 MB at 146×512 (with fused KᵀQ and onnxsim). The standard model is about 16 MB at 155×1024. The Game of Life and Sudoku models are 28 to 31 MB at 164×2048. Demos require a modern browser with WebAssembly support.