Randomness, Parallelization, and the DRAM Advantage

How 17 Small Matrices Beat One Large Matrix at the Same Bit Mass

Andreas Otto — 20 June 2026

The Otto Score classifier uses frozen random projections (W0) followed by MAJ3 majority to extract 32-bit feature strings from MNIST inputs. The quality of the random projection directly affects classification accuracy — but only for single-projection configurations. Ensemble parallelization makes the system immune to RNG quality while delivering higher accuracy, faster convergence, and stable error decay.

A large single matrix (H=1088, N=1) suffers from RNG-dependent accuracy (95.7-96.2%), wild error jitter (factor 300-1500×), and overfitting. An ensemble of small matrices with the same total bit mass (H=64, N=17) achieves a stable 96.2-96.5% (about 96.4%) regardless of RNG quality, smooth error decay, and genuine generalization.

The DRAM implication is decisive: many small MAJ3 banks with separate random projections outperform one large bank at the same silicon budget — and the diversity makes the system robust against imperfect on-chip randomness.

Same bit mass. Same MAJ3 operations. Same storage.
E=17: stable ~96.4% (96.2-96.5%) across all RNGs | E=1: 95.7-96.2%, error oscillates by a factor of up to 1500×

Contents

1. Randomness and Accuracy
2. Bit Mass — The Fundamental Currency
3. The Critical Experiment
4. Why Parallelization Stabilizes
5. Implications for DRAM Architecture
6. Conclusion

1. Randomness and Accuracy

1.1 The RNG Quality Problem

The Otto Score's first layer is a random projection:

W0:     random uint32[H][196]   (frozen, never trained)
H0[h]:  MAJ3( ~(in ⊗ W0[h]) )  → uint32
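
To make the notation concrete, here is a minimal Python sketch of one feature word. It rests on two assumptions the text does not confirm: ⊗ is read as bitwise XOR (so ~(in ⊗ W0[h]) is an XNOR), and MAJ3 is folded over the 196 XNOR words as a per-bit majority vote. The helper names maj3 and feature_word are hypothetical.

```python
import random
from typing import Sequence

MASK32 = 0xFFFFFFFF
NC = 196  # uint32 words per W0 row, as stated in the text


def maj3(a: int, b: int, c: int) -> int:
    """Bitwise majority of three 32-bit words (the basic MAJ3 gate)."""
    return ((a & b) | (a & c) | (b & c)) & MASK32


def feature_word(inp: Sequence[int], w0_row: Sequence[int]) -> int:
    """ASSUMPTION: reduce the XNOR words by a per-bit majority vote."""
    counts = [0] * 32
    for x, w in zip(inp, w0_row):
        xnor = ~(x ^ w) & MASK32          # 1 where input and weight bits agree
        for bit in range(32):
            counts[bit] += (xnor >> bit) & 1
    out = 0
    for bit in range(32):
        if 2 * counts[bit] > len(w0_row):  # strict majority of agreeing words
            out |= 1 << bit
    return out


rng = random.Random(1)                     # stand-in PRNG for illustration
w0_row = [rng.getrandbits(32) for _ in range(NC)]
inp = [rng.getrandbits(32) for _ in range(NC)]
h0 = feature_word(inp, w0_row)             # one 32-bit feature word
```

An input identical to the W0 row yields all ones, and its bitwise complement yields zero, which is a quick sanity check on the XNOR-plus-majority reading.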

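Each W0 row is a 196-dimensional random vector of uint32 words (196 × 32 = 6272 bits). Three PRNG modes were tested (the results in 3.2 additionally include a true-random FILE source):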

RNG	Method	Range	Quality	History
`broken-31`	glibc `rand()` LCG	0…2³¹−1	Poor — Bit 31 always 0	Original bug (May 2026)
`fix-gnu`	`rand()<<16 ^ rand()`	0…2³²−1	Fair — still LCG	First fix (June 2026)
`fix-splitmix`	splitmix64 (BigCrush)	0…2³²−1	Good — passes all tests	Current default

1.2 The 31-Bit Bug and Its Cost

The original RNG used glibc rand(), which returns values in 0…2³¹−1, so bit 31 of every W0 word was always 0: a systematic bias in one of the 32 bit positions. This cost approximately 2-3% accuracy in single-projection mode.
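
The bias is easy to reproduce. The sketch below imitates the three RNG modes in Python and counts how often bit 31 is set. Python's random module stands in for glibc rand() and reproduces only its 31-bit range, not its statistical quality. The splitmix64 step uses the published constants; keeping the upper 32 bits of each output is an assumption about how the 32-bit words are derived.

```python
import random

MASK64 = 0xFFFFFFFFFFFFFFFF


def broken_31(rng):
    """Stand-in for a 31-bit rand(): values in 0..2^31-1, bit 31 never set."""
    return rng.getrandbits(31)


def fix_gnu(rng):
    """(rand() << 16) ^ rand(), truncated to 32 bits."""
    return ((broken_31(rng) << 16) ^ broken_31(rng)) & 0xFFFFFFFF


def splitmix64_stream(seed):
    """Generator yielding successive 64-bit splitmix64 outputs."""
    state = seed & MASK64
    while True:
        state = (state + 0x9E3779B97F4A7C15) & MASK64
        z = state
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        yield z ^ (z >> 31)


def bit31_ones(values):
    """How many values have bit 31 set."""
    return sum((v >> 31) & 1 for v in values)


rng = random.Random(42)
n = 10_000
lcg31 = [broken_31(rng) for _ in range(n)]
gnu = [fix_gnu(rng) for _ in range(n)]
sm = splitmix64_stream(42)
split = [next(sm) >> 32 for _ in range(n)]  # ASSUMPTION: keep the upper 32 bits

for name, vals in [("broken-31", lcg31), ("fix-gnu", gnu), ("fix-splitmix", split)]:
    print(f"{name:<13} bit 31 set in {bit31_ones(vals) / n:.1%} of words")
```

The broken-31 stand-in never sets bit 31, while the other two set it in roughly half of all words.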

2. Bit Mass — The Fundamental Currency

2.1 Definition

Bit mass = total bits in W0, summed over all N ensemble members:

W0 bits = H × N × NC × 32     (NC = 196 uint32 words per W0 row)

H	N	W0 Entries	W0 Bits	Target Size
64	1	12,544	401,408	80 KB
1088	1	213,248	6,823,936	1.36 MB
64	17	213,248	6,823,936	1.36 MB

H=1088, N=1 and H=64, N=17 have identical bit mass — same W0 entries, same storage, same memory bandwidth. Only the organization differs.
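
The table rows can be checked with a few lines of Python, reading bit mass as H × N × 196 × 32. The function names are illustrative only.

```python
NC = 196        # uint32 words per W0 row (section 1.1)
WORD_BITS = 32  # bits per uint32 word


def w0_entries(h, n):
    """Number of uint32 entries across all N projections."""
    return h * n * NC


def w0_bits(h, n):
    """Total bit mass of W0."""
    return w0_entries(h, n) * WORD_BITS


for h, n in [(64, 1), (1088, 1), (64, 17)]:
    print(f"H={h:<5} N={n:<3} entries={w0_entries(h, n):>8,} bits={w0_bits(h, n):>10,}")
```

Because 1088 = 64 × 17, the last two rows come out identical.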

3. The Critical Experiment — Same Bit Mass, Different Organization

3.1 Configuration

E=17:  H=64,  N=17  → 17 independent 64-neuron projections
E=1:   H=1088, N=1  → one single 1088-neuron projection

3.2 Results

RNG Mode	E=17 (64×17) Eval	E=1 (1088) Eval	Δ
`broken-31` (LCG, 31 bit)	96.2%	95.7%	+0.5pp
`fix-gnu` (LCG hack)	96.4%	96.2%	+0.2pp
`fix-splitmix` (BigCrush)	96.3%	96.1%	+0.2pp
FILE (true random)	96.5%	96.2%	+0.3pp

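E=17 is nearly insensitive to RNG quality: all four modes land within 0.3pp (96.2-96.5%), versus a 0.5pp spread (95.7-96.2%) for E=1. Even broken-31 (the worst RNG, costing 2-3% in single mode) produces 96.2% when combined with 17-fold parallelization.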

3.3 Error Convergence

E=17 — smooth exponential decay:

err: 7700 → 1629 → 1121 → 838...
      Ep1    Ep2    Ep3    Ep4

Monotonic: err decreases every epoch
Smooth: max/min ratio < 30×
Convergent: err→247 after 20 epochs

E=1 — wild overfitting oscillations:

err: 7700 → 1370 → 815 → 813 → ...
      Ep1    Ep2    Ep3    Ep4

Oscillates by factor 300-1500×
Overfits: train=100%, eval plateaus
False convergence: err→0 is memorization

4. Why Parallelization Stabilizes

4.1 Score-Summing as Regularizer

The ensemble output:

total[k] = Σ_m score_m[k]

Each member computes an independent Bayes log-score. The correction pass adjusts each member independently, but only for samples where the ensemble as a whole (not the individual member) is wrong. One member's over-correction is therefore diluted by the other 16.
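
A minimal sketch of the summing rule and the ensemble-level gate on corrections, in Python. The function names and the toy scores are invented for illustration, and the correction step itself is not shown because the text does not specify it.

```python
def ensemble_predict(member_scores):
    """member_scores[m][k] is the log-score of member m for class k."""
    n_classes = len(member_scores[0])
    total = [sum(scores[k] for scores in member_scores) for k in range(n_classes)]
    pred = max(range(n_classes), key=total.__getitem__)  # argmax over classes
    return pred, total


def needs_correction(member_scores, label):
    """Gate: members are corrected only when the ENSEMBLE is wrong."""
    pred, _ = ensemble_predict(member_scores)
    return pred != label


# Toy example with 3 members and 3 classes (made-up numbers).
scores = [
    [-3.0, -0.5, -2.0],  # member 0 favours class 1
    [-2.5, -1.5, -0.4],  # member 1 favours class 2
    [-4.0, -0.6, -1.1],  # member 2 favours class 1
]
pred, total = ensemble_predict(scores)
```

Member 1 is wrong on its own if the label is 1, but the summed score still picks class 1, so no correction is triggered for this sample.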

4.2 Feature Diversity

17 independent W0s = 17 different random projections. Different projections capture different feature patterns. A bad projection from broken-31 is outvoted by 16 others. Score-summing across diverse feature sets smooths individual weaknesses.

4.3 The 37% Overlap

An analysis of failure sets across different W0s shows that only 37% of failures overlap between independent seeds. The remaining 63% are precisely the errors that score-summing can fix: each member fails on different samples, and the ensemble succeeds where most members agree.
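
One way to compute such an overlap is sketched below in Python. The text does not define the metric, so this assumes it is the fraction of one seed's failures that a second seed also gets wrong. The sample indices are made up and are not measurements.

```python
def failure_overlap(fails_a, fails_b):
    """Fraction of A's failed samples that B also fails (assumed definition)."""
    if not fails_a:
        return 0.0
    return len(fails_a & fails_b) / len(fails_a)


# Hypothetical failure sets: indices of misclassified eval samples per seed.
fails_seed1 = {3, 8, 15, 21, 42, 57, 60, 77}
fails_seed2 = {8, 21, 42, 90, 91, 95, 99, 100}

overlap = failure_overlap(fails_seed1, fails_seed2)
print(f"shared failures: {overlap:.1%}")
```

Samples outside the intersection are the ones where another member can contribute a correct score to the sum.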

5. Implications for DRAM Architecture

5.1 Double Parallelization

Level 1 — Row-level (within one chip):

┌──────────────────────────────────────┐
│ Row 0:  W0[0]+MAJ3 → 32 bits         │
│ Row 1:  W0[1]+MAJ3 → 32 bits         │
│ ...                                  │
│ Row 63: W0[63]+MAJ3 → 32 bits        │
├──────────────────────────────────────┤
│ Peripheral: log-odds sum + argmax    │
└──────────────────────────────────────┘

Level 2 — Chip-level (ensemble across chips):

Chip 0:  W0₀ (random₁)  → score₀[10]
Chip 1:  W0₁ (random₂)  → score₁[10]
...
Chip 16: W0₁₆ (random₁₇) → score₁₆[10]
         ↓
         Σ score_m[k]  →  argmax

5.2 Imperfect Randomness is OK

The single-matrix configuration (H=1088) requires high-quality randomness for every row; a defect in the on-chip RNG costs accuracy. The ensemble (H=64×17), spread over 17 chips, relaxes this requirement:

Each chip only needs 64 high-quality rows — 17× easier to guarantee
A chip with slightly imperfect W0 is outvoted
Even broken-31 achieves 96.2% at E=17
No expensive true-RNG hardware needed — simple PRNG + unique seed suffices
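
The "simple PRNG + unique seed" idea can be sketched as follows in Python. Python's random.Random is only a stand-in for whatever on-chip PRNG would be used, and the seed values 1 to 17 are arbitrary.

```python
import random

H, NC, N_CHIPS = 64, 196, 17  # rows per chip, words per row, chips


def make_w0(seed):
    """Build one frozen W0 bank (H rows of NC uint32 words) from a seed."""
    rng = random.Random(seed)  # stand-in for an on-chip PRNG
    return [[rng.getrandbits(32) for _ in range(NC)] for _ in range(H)]


# One bank per chip, each from its own seed, so the projections differ.
banks = [make_w0(seed) for seed in range(1, N_CHIPS + 1)]

total_bits = sum(len(row) * 32 for bank in banks for row in bank)
print(f"{len(banks)} banks, {total_bits:,} W0 bits in total")
```

Every chip runs the same generator; only the seed differs, and the same seed always reproduces the same bank.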

5.3 Cost Comparison (6.82 Mbit W0)

Metric	E=1 (H=1088)	E=17 (H=64×17)
MAJ3 banks	1 large (1088 rows)	17 small (64 rows each)
Total MAJ3 ops	1088	1088 (same)
W0 storage	6.82 Mbit	6.82 Mbit (same)
Accuracy (best RNG)	96.2%	96.5%
Accuracy (worst RNG)	95.7%	96.2%
err stability	❌ factor 300-1500×	✅ factor <30×
RNG requirement	high-quality	any PRNG works
Manufacturing	one large mask	identical small chips
Training time (20 ep)	~92s	~94s (about the same)

5.4 Scaling Law

Given a fixed total bit mass, split it across as many independent random projections as possible, down to H=64 per chip. Below H=64, individual members become too weak (1-pass accuracy drops below 84%). In the other direction, increasing N at H=64 eventually saturates the ensemble at the MAJ3 method limit (~96.5%).
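
As a worked example of this rule, the Python sketch below turns a bit budget into a member count at the H=64 floor. The function name is hypothetical, and the 196-word row width is carried over from section 1.1.

```python
NC, WORD_BITS, H_MIN = 196, 32, 64  # row width, word size, smallest useful H


def split_budget(total_bits, h=H_MIN):
    """Return (members, leftover bits) when every member uses h rows."""
    bits_per_member = h * NC * WORD_BITS
    n = total_bits // bits_per_member
    return n, total_bits - n * bits_per_member


n, leftover = split_budget(6_823_936)
print(f"N={n} members of H={H_MIN}, {leftover} bits unused")
```

For the 6.82 Mbit budget used in this article, the split is exactly 17 members with nothing left over.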

6. Conclusion

The experiment with identical bit mass but different organization reveals: parallelization with diverse random projections is not just an efficiency gain — it is a statistical necessity.

A single large random matrix is fragile:
RNG quality directly affects accuracy. Error oscillates wildly. Every bit must be perfect.

An ensemble of small random matrices is robust:
Immune to RNG quality. Error decays smoothly. Imperfect chips are outvoted.

For DRAM manufacturing: many small identical chips with simple PRNGs and unique seeds, rather than one large chip with expensive true-random generation.

Same bit mass. Same operations. Same storage.
Better stability. Better manufacturability. Better accuracy.

References
Otto Score main page: forward-prop.nhi1.de
Seed mode experiment: plans/plan-2026-06-19-ensemble-seed.md
RNG modes experiment: plans/plan-2026-06-20-rng-seed-modes.md
Data: logs/run-research.log
