A needle retrieved from 32,000 tokens of context, with the cold KV cache served off an NVMe drive, in ~1.8 GB of RAM. 8× KV sparsification at +0.69% perplexity.
A memory architecture that attaches to a frozen pretrained transformer and, when switched off, leaves its outputs argmax-identical. Proof-of-mechanism on one small model: every number reproduces from a command.
```powershell
git clone https://github.com/nihilistau/Position_Is_Arithmetic.git
cd Position_Is_Arithmetic/papers/01-two-ring-memory/repro
./run_r9_32k_needle.ps1 -Model Qwen3-0.6B-f16.gguf -Drive F:\ -Corpus wiki.test.raw
```
You'll watch an out-of-distribution secret surface from a context that never fully lived in RAM. Correctness reproduces on any NVMe drive; the latency figure requires Intel Optane.
- Memory wall: 910×. Resident KV cache shrunk from 7.5 GB to 8.3 MB at 32k tokens via a two-ring offload to byte-addressable storage.
- Intelligence wall: +0.69%. Perplexity change at 8× sparsification, with four pinned attention-sink tokens; at 2× and 4× the change is negative, i.e. perplexity improves.
- Compute wall: O(N). A ±1 projection router plus quickselect gives directional recall at 32 bytes/token and expected-linear selection.
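The router and selector described above can be sketched in a few lines. This is a minimal, stdlib-only illustration under stated assumptions. It scores tokens with a random ±1 projection followed by sign codes (a SimHash-style scorer), which may differ from the repository's router. The names `make_projection`, `sign_code`, `top_k_indices`, and `select_tokens` are hypothetical. Only the 8× ratio and the four pinned sink tokens come from the text.

```python
import random

def make_projection(m, d, seed=0):
    """Hypothetical router weights: an m x d matrix of random ±1 entries."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(m)]

def sign_code(R, v):
    """Project v through R and keep only the signs: a ±1 code of length m."""
    return [1 if sum(r * x for r, x in zip(row, v)) >= 0 else -1 for row in R]

def top_k_indices(scores, k):
    """Indices of the k largest scores via quickselect: expected O(N), no full sort."""
    idx = list(range(len(scores)))
    if k <= 0:
        return []
    if k >= len(idx):
        return idx
    lo, hi = 0, len(idx) - 1
    while True:
        p = random.randint(lo, hi)           # random pivot keeps the expected cost linear
        idx[p], idx[hi] = idx[hi], idx[p]
        pivot = scores[idx[hi]]
        store = lo
        for i in range(lo, hi):              # move scores larger than the pivot to the left
            if scores[idx[i]] > pivot:
                idx[store], idx[i] = idx[i], idx[store]
                store += 1
        idx[store], idx[hi] = idx[hi], idx[store]
        if store == k - 1:                   # positions 0..k-1 now hold the k largest
            return idx[:k]
        if store < k - 1:
            lo = store + 1
        else:
            hi = store - 1

def select_tokens(query, codes, R, ratio=8, n_sinks=4):
    """Keep about 1/ratio of cached tokens: the first n_sinks are always pinned, and the
    rest of the budget goes to tokens whose stored codes agree most with the query's code."""
    n = len(codes)
    sinks = list(range(min(n_sinks, n)))
    budget = max(len(sinks), n // ratio)
    q = sign_code(R, query)
    rest = list(range(len(sinks), n))
    scores = [sum(a * b for a, b in zip(q, codes[t])) for t in rest]
    picked = [rest[i] for i in top_k_indices(scores, budget - len(sinks))]
    return sorted(sinks + picked)

rng = random.Random(1)
d, m, n = 16, 32, 64
keys = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
query = keys[40]                            # plant a "needle": the query points exactly at token 40
R = make_projection(m, d)
codes = [sign_code(R, k) for k in keys]     # the router index, built once per token
kept = select_tokens(query, codes, R)
print(len(kept), kept[:4], 40 in kept)      # 8 [0, 1, 2, 3] True
```

Each token costs one pass over its stored code, and quickselect avoids sorting all N scores. Together these give the linear behaviour the text describes. Ties and worst-case pivots aside, this is the same shape of cost, not a claim about the repository's constants.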
| Claim | Number | Caveat |
|---|---|---|
| Quality at 8× sparsification | +0.69% PPL (2×: −0.71%, 4×: −0.92%) | 0.6B model, 2k, one corpus |
| Needle retrieval, no recency bias | HIT at 10% / 50% / 90% depth | one model, one needle type |
| KV served off physical Optane | HIT off NVMe, poison-gated | proven at 512 tokens; 32k pending |
| Random-read latency | 7.57 µs / read | Optane-specific |
| KV-RAM footprint | 910× smaller KV cache · 1.8 GB live RAM | net ~8×; dominated by the router index |
| Bit-exact when disabled | argmax-identical to the stock model | the invariant every other result rests on |
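The off-drive and footprint rows rest on two ideas: keep only recent KV rows resident, and fetch older ones by byte offset from storage. Below is a minimal sketch, assuming a write-through file with one fixed-width float32 row per token. `TwoRingKV`, `ROW_FLOATS`, and the hot-window policy are hypothetical. They stand in for the repository's two-ring design and its poison gating rather than reproducing them.

```python
import mmap
import os
import struct
import tempfile

ROW_FLOATS = 8                         # hypothetical row width; a real per-token KV row is much wider
ROW_FMT = f"<{ROW_FLOATS}f"            # little-endian float32
ROW_BYTES = struct.calcsize(ROW_FMT)   # 32 bytes here

class TwoRingKV:
    """Hypothetical sketch: a small hot ring in RAM plus a cold store addressed by byte offset."""

    def __init__(self, path, hot_slots):
        self.hot = [None] * hot_slots      # hot ring: slot t % hot_slots holds token t while it is recent
        self.hot_slots = hot_slots
        self.n = 0
        self.path = path
        with open(path, "wb"):             # start with an empty cold file
            pass

    def append(self, row):
        with open(self.path, "ab") as f:   # every row is written through to the cold store
            f.write(struct.pack(ROW_FMT, *row))
        self.hot[self.n % self.hot_slots] = row
        self.n += 1

    def read(self, t):
        if t >= self.n - self.hot_slots:   # still inside the hot window: serve from RAM
            return self.hot[t % self.hot_slots]
        # Cold path: map the file read-only and slice row t at its fixed offset.
        with open(self.path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            off = t * ROW_BYTES
            return list(struct.unpack(ROW_FMT, mm[off:off + ROW_BYTES]))

path = os.path.join(tempfile.mkdtemp(), "cold_kv.bin")
kv = TwoRingKV(path, hot_slots=4)
for t in range(100):
    kv.append([float(t)] * ROW_FLOATS)
print(kv.read(3)[0], kv.read(99)[0])              # 3.0 from the mapped file, 99.0 from the hot ring
print(os.path.getsize(path) == 100 * ROW_BYTES)   # True
```

Only `hot_slots` rows stay in RAM. Every older read goes through the mapping, which is where a drive's random-read latency (the 7.57 µs Optane figure) would show up. A real implementation would keep one mapping open rather than remapping on each read.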
The systems paper →
Every mechanism as idea → receipt → payoff → implementation. The receipts, in full.
The algebraic companion →
The elliptic-curve framework that motivated the design. Not required to run or validate anything. A door for the curious.
The biggest open question is scale: does 8× sparsification at under 2% perplexity cost hold beyond 0.6B? Other asks that one person can't cover: independent reproductions on your own drive (especially the Linux path), and compressing the ±1 router index (a sign-packed popcount form should cut its ~950 MB by roughly 32×). See CONTRIBUTING.
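The sign-packed popcount idea can be sketched as follows. This is a minimal illustration, assuming each ±1 entry currently occupies 32 bits, so storing one bit per sign gives the ~32× figure. `pack_signs` and `packed_dot` are hypothetical names, not the repository's code. The identity itself is exact: for two length-m ±1 vectors, the dot product is agreements minus disagreements, i.e. m − 2·popcount(a XOR b).

```python
def pack_signs(code):
    """Pack a ±1 vector into an int: bit i is 1 when code[i] is +1."""
    bits = 0
    for i, s in enumerate(code):
        if s > 0:
            bits |= 1 << i
    return bits

def packed_dot(a_bits, b_bits, m):
    """Dot product of two length-m ±1 vectors from their packed bits.
    Differing bits are disagreements, so dot = m - 2 * popcount(a XOR b)."""
    return m - 2 * bin(a_bits ^ b_bits).count("1")

a = [1, -1, -1, 1, 1, 1, -1, 1]
b = [1, 1, -1, -1, 1, -1, -1, 1]
m = len(a)
plain = sum(x * y for x, y in zip(a, b))
print(plain, packed_dot(pack_signs(a), pack_signs(b), m))  # 2 2
```

On hardware the XOR and popcount run over whole machine words, so a 256-sign code is four 64-bit words instead of 256 floats.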