Manifold Geometry
When Models Manipulate Manifolds · Anthropic 2026
Helix Manifold — Character Count Representation
Manifold
A manifold is a space that may curve globally but looks flat when you zoom in on any small patch. A sphere is a 2D manifold: it bends through 3D space, but any small patch looks like a flat plane. Here, the model stores a count (one number) on a 1D manifold: a curve twisting through high-dimensional space. Each position on the curve corresponds to a different count.
Low-Dimensional Subspace
The residual stream has thousands of dimensions, but the count manifold uses only about 6 of them (capturing 95% of the variance). Those 6 dimensions form a "subspace": a smaller flat space embedded inside the larger one. The helix lives inside this subspace, and the remaining dimensions are free to carry other information.
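As a rough illustration of what "only about 6 dimensions" means, the sketch below embeds a toy count curve into a larger space and checks how much variance the top principal components capture. Everything here is an assumption for illustration: the periods, the 512-dimensional space, and the random basis are hypothetical, and the toy curve is exactly rank 6, whereas the real manifold is only approximately so (95% of the variance).

```python
import numpy as np

def toy_count_curve(n_counts=150, periods=(150.0, 50.0, 20.0)):
    """Toy 6D curve: one (cos, sin) pair per hypothetical period."""
    counts = np.arange(n_counts)
    columns = []
    for p in periods:
        columns.append(np.cos(2.0 * np.pi * counts / p))
        columns.append(np.sin(2.0 * np.pi * counts / p))
    return np.stack(columns, axis=1)  # shape (n_counts, 6)

def embed(low_dim_points, d_model=512, seed=0):
    """Place the curve in a random 6D subspace of a d_model-dimensional space."""
    rng = np.random.default_rng(seed)
    k = low_dim_points.shape[1]
    # QR gives d_model x k orthonormal columns spanning a random subspace.
    basis, _ = np.linalg.qr(rng.standard_normal((d_model, k)))
    return low_dim_points @ basis.T  # shape (n_counts, d_model)

def cumulative_variance(points):
    """Fraction of variance captured by the top-k principal components."""
    centered = points - points.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    return np.cumsum(variance) / variance.sum()

points = embed(toy_count_curve())
cum = cumulative_variance(points)
print(f"variance captured by 6 components: {cum[5]:.4f}")
```

In this toy setup the first 6 components capture essentially all of the variance even though each point has 512 coordinates.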
Helix
A helix is a spiral that advances through space as it rotates. Counting on a helix means that as the count increases, the point both rotates around a circle and moves forward along the axis. Points far apart in count can point in nearly orthogonal (perpendicular) directions, which makes them easy to distinguish even in a noisy space. This gives finer resolution per dimension than a straight line or a circle alone.
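A minimal sketch of why the forward motion matters, assuming a simple 3D helix with a hypothetical period and pitch (the model's manifold is higher-dimensional): on a circle alone, two counts exactly one period apart collide, while on the helix the axial coordinate keeps them apart.

```python
import numpy as np

def helix_point(count, period=20.0, pitch=0.05):
    """Toy 3D helix: rotate around a circle and advance along the axis."""
    theta = 2.0 * np.pi * count / period
    return np.array([np.cos(theta), np.sin(theta), pitch * count])

a = helix_point(5)
b = helix_point(25)  # exactly one period later

# Using only the circular coordinates, the two counts are indistinguishable.
circle_gap = np.linalg.norm(a[:2] - b[:2])

# With the axial coordinate they are separated by pitch * period.
helix_gap = np.linalg.norm(a - b)

print(circle_gap, helix_gap)
```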
Why Not Integers?
Storing count=42 in a dedicated neuron #42 (a one-hot code) requires N neurons for N possible values, which is dimensionally expensive. A helix uses about 6 dimensions to cover a wide range of counts: dense, but still distinguishable.
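The trade-off can be made concrete with a small comparison, sketched here under assumed values (150 counts and three hypothetical periods): a one-hot code needs one dimension per count, while a 6-dimensional curve still keeps every pair of counts a measurable distance apart.

```python
import numpy as np

N = 150                          # number of counts to represent (hypothetical)
PERIODS = (150.0, 50.0, 20.0)    # hypothetical periods for the toy curve

# One dedicated unit per count: N dimensions.
one_hot = np.eye(N)

# Compact code: one (cos, sin) pair per period, 6 dimensions in total.
counts = np.arange(N)
compact = np.concatenate(
    [
        np.stack(
            [np.cos(2.0 * np.pi * counts / p), np.sin(2.0 * np.pi * counts / p)],
            axis=1,
        )
        for p in PERIODS
    ],
    axis=1,
)

# Smallest distance between any two different counts in the compact code.
dists = np.linalg.norm(compact[:, None, :] - compact[None, :, :], axis=-1)
np.fill_diagonal(dists, np.inf)
min_gap = dists.min()

print(one_hot.shape[1], compact.shape[1], round(float(min_gap), 3))
```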
Why Not a Scalar?
A single number (magnitude) loses fine-grained resolution under the noise of a high-dimensional residual stream. The helix encodes count as angular position, which is robust to magnitude noise because you're reading direction, not size.
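A small sketch of the difference, assuming a single circle with a hypothetical period of 100: rescaling the vector (magnitude noise) corrupts a scalar readout but leaves the angle, and so the decoded count, unchanged. Note that an angle alone is only unique within one period.

```python
import numpy as np

PERIOD = 100.0  # hypothetical period of the toy circle

def encode(count):
    """Encode a count as a direction on the unit circle."""
    theta = 2.0 * np.pi * count / PERIOD
    return np.array([np.cos(theta), np.sin(theta)])

def decode_angle(vec):
    """Read the count back from direction only, ignoring vector length."""
    theta = np.arctan2(vec[1], vec[0]) % (2.0 * np.pi)
    return theta * PERIOD / (2.0 * np.pi)

gain = 0.3  # multiplicative magnitude noise

scalar_readout = gain * 42.0                  # scalar code: 42 becomes 12.6
angular_readout = decode_angle(gain * encode(42))  # still about 42

print(scalar_readout, angular_readout)
```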
Discretization via Sparse Features
The continuous helix curve is discretized by sparse features: each feature fires for a range of counts, like place cells in the brain. The same representation can be viewed as (a) a family of discrete features, each active over a different range of counts, or (b) an angular position on a continuous helix. Both views describe the same underlying structure.
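The place-cell picture can be sketched with overlapping tuning curves. The Gaussian shape, the spacing of 10, and the width of 8 are assumptions chosen for illustration, not values from the model: each hypothetical feature responds to a range of counts, and the pattern of activations across features is enough to recover the count.

```python
import numpy as np

CENTERS = np.arange(0, 160, 10)  # hypothetical preferred counts: 0, 10, ..., 150
WIDTH = 8.0                      # hypothetical tuning width

def feature_activations(count):
    """Each feature fires most strongly near its preferred count."""
    return np.exp(-((count - CENTERS) ** 2) / (2.0 * WIDTH ** 2))

def decode(activations):
    """Recover the count as an activation-weighted average of the centers."""
    return float(np.sum(activations * CENTERS) / np.sum(activations))

acts = feature_activations(42)
most_active = int(np.argmax(acts))  # index of the feature centered at 40
estimate = decode(acts)

print(most_active, round(estimate, 2))
```

Only a handful of features are meaningfully active for any one count, yet together they pin down a position between their centers.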
Stage 1 — Accumulate
Each token has a character length (e.g. "hello" = 5). These are summed across tokens into a running character count, stored as position on the helix manifold.
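The accumulation step amounts to a running sum of token lengths. The sketch below is a plain-Python stand-in, not the model's mechanism; the reset at a newline token is an assumption about how a per-line count would behave, and `running_line_counts` is a hypothetical helper name.

```python
def running_line_counts(tokens):
    """Running character count of the current line after each token."""
    counts = []
    current = 0
    for tok in tokens:
        if "\n" in tok:
            # Assumed behavior: restart the count after the last newline.
            current = len(tok) - tok.rfind("\n") - 1
        else:
            current += len(tok)
        counts.append(current)
    return counts

tokens = ["hello", " world", "\n", "this", " wraps"]
counts = running_line_counts(tokens)
print(counts)  # [5, 11, 0, 4, 10]
```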
Stage 2 — Twist
Attention heads geometrically transform the count manifold to produce a "distance-to-boundary" estimate, using the line width constraint (e.g. 80 chars) as the reference point. The twist is a linear transformation (a rotation or shear) within the 6D subspace.
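A toy version of the twist, assuming counts and widths sit on a single circle with a hypothetical period of 200 (the real operation acts in the 6D subspace through learned attention weights): rotating the line-width vector back by the angle of the current count leaves a vector whose angle encodes the characters remaining.

```python
import numpy as np

PERIOD = 200.0  # hypothetical period; must exceed twice the largest distance

def angle(n):
    return 2.0 * np.pi * n / PERIOD

def on_circle(n):
    """Place a count or a width on the unit circle."""
    return np.array([np.cos(angle(n)), np.sin(angle(n))])

def rotation(theta):
    """2D rotation matrix for an angle theta in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def chars_remaining(count, width):
    """Rotate the width vector by minus the count angle, then read the angle."""
    twisted = rotation(-angle(count)) @ on_circle(width)
    theta = np.arctan2(twisted[1], twisted[0])  # in (-pi, pi]
    return theta * PERIOD / (2.0 * np.pi)

print(round(chars_remaining(42, 60), 6))  # 18.0
print(round(chars_remaining(70, 60), 6))  # -10.0
```

A negative result means the count has already passed the width limit.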
Stage 3 — Decide
Multiple "distance to boundary" estimates from different attention heads are arranged orthogonally to each other. This creates a linear decision boundary that is easy to threshold: if the projection crosses zero, the model inserts a newline.
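The thresholding step can be sketched as a linear readout. The equal weights, the three example estimates, and the `required` parameter (the characters the next token would need) are assumptions made for illustration; the source only specifies that a projection is compared against zero.

```python
import numpy as np

def should_break(estimates, required):
    """Linear decision: project the estimates, subtract a margin, threshold at 0."""
    estimates = np.asarray(estimates, dtype=float)
    weights = np.full(estimates.shape, 1.0 / estimates.size)  # hypothetical equal weights
    score = float(weights @ estimates) - required
    return score < 0.0  # negative score: the next token would not fit

# Three hypothetical heads estimate roughly 3 characters remaining.
estimates = [3.0, 2.5, 3.5]

print(should_break(estimates, required=5))  # True
print(should_break(estimates, required=2))  # False
```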
Legend: helix manifold (count positions), current count position, sparse feature activations, decision boundary plane.