Randomness, Parallelization, and the DRAM Advantage
How 17 Small Matrices Beat One Large Matrix at the Same Bit Mass
Andreas Otto — 20 June 2026
The Otto Score classifier uses frozen random projections (W0) followed by MAJ3 majority to extract 32-bit feature strings from MNIST inputs. The quality of the random projection directly affects classification accuracy — but only for single-projection configurations. Ensemble parallelization makes the system immune to RNG quality while delivering higher accuracy, faster convergence, and stable error decay.
A large single matrix (H=1088, N=1) suffers from RNG-dependent accuracy (95.7-96.2%), wild error jitter (factor 300-1500×), and overfitting. An ensemble of small matrices with the same total bit mass (H=64, N=17) achieves stable 96.4% regardless of RNG quality, smooth error decay, and genuine generalization.
The DRAM implication is decisive: many small MAJ3 banks with separate random projections outperform one large bank at the same silicon budget — and the diversity makes the system robust against imperfect on-chip randomness.
E=17: stable 96.4% across all RNGs | E=1: 95.7-96.2%, oscillates 1500×
1. Randomness and Accuracy
1.1 The RNG Quality Problem
The Otto Score's first layer is a random projection:
W0: random uint32[H][196] (frozen, never trained) H0[h]: MAJ3( ~(in ⊗ W0[h]) ) → uint32
Each W0 row is a 196-dimensional random vector (6272 bits). Three RNGs tested:
| RNG | Method | Range | Quality | History |
|---|---|---|---|---|
broken-31 | glibc rand() LCG | 0…2³¹−1 | Poor — Bit 31 always 0 | Original bug (May 2026) |
fix-gnu | rand()<<16 ^ rand() | 0…2³²−1 | Fair — still LCG | First fix (June 2026) |
fix-splitmix | splitmix64 (BigCrush) | 0…2³²−1 | Good — passes all tests | Current default |
1.2 The 31-Bit Bug and Its Cost
The original RNG used glibc rand() returning values in 0…2³¹−1.
Bit 31 was always 0 — a systematic bias: one of 32 bit-positions always zero.
This cost approximately 2-3% accuracy.
2. Bit Mass — The Fundamental Currency
2.1 Definition
Bit mass = total bits in W0:
W0 bits = H × NC × 32
| H | N | W0 Entries | W0 Bits | Target Size |
|---|---|---|---|---|
| 64 | 1 | 12,544 | 401,408 | 80 KB |
| 1088 | 1 | 213,248 | 6,823,936 | 1.36 MB |
| 64 | 17 | 213,248 | 6,823,936 | 1.36 MB |
H=1088, N=1 and H=64, N=17 have identical bit mass — same W0 entries, same storage, same memory bandwidth. Only the organization differs.
3. The Critical Experiment — Same Bit Mass, Different Organization
3.1 Configuration
E=17: H=64, N=17 → 17 independent 64-neuron projections E=1: H=1088, N=1 → one single 1088-neuron projection
3.2 Results
| RNG Mode | E=17 (64×17) Eval | E=1 (1088) Eval | Δ |
|---|---|---|---|
broken-31 (LCG, 31 bit) | 96.2% | 95.7% | +0.5pp |
fix-gnu (LCG hack) | 96.4% | 96.2% | +0.2pp |
fix-splitmix (BigCrush) | 96.3% | 96.1% | +0.2pp |
| FILE (true random) | 96.5% | 96.2% | +0.3pp |
E=17 is immune to RNG quality. Even broken-31 (the worst RNG, costing 2-3% in single mode) produces 96.2% when combined with 17-fold parallelization.
3.3 Error Convergence
E=17 — smooth exponential decay:
err: 7700 → 1629 → 1121 → 838...
Ep1 Ep2 Ep3 Ep4
- Monotonic: err decreases every epoch
- Smooth: max/min ratio < 30×
- Convergent: err→247 after 20 epochs
E=1 — wild overfitting oscillations:
err: 7700 → 1370 → 815 → 813 → ...
Ep1 Ep2 Ep3 Ep4
- Oscillates by factor 300-1500×
- Overfits: train=100%, eval plateaus
- False convergence: err→0 is memorization
4. Why Parallelization Stabilizes
4.1 Score-Summing as Regularizer
The ensemble output:
total[k] = Σ_m score_m[k]
Each member computes an independent Bayes log-Score. The correction pass adjusts each member independently — but only for samples where the ensemble, not the individual member, is wrong. One member's over-correction is diluted by the other 16.
4.2 Feature Diversity
17 independent W0s = 17 different random projections. Different projections capture different feature patterns. A bad projection from broken-31 is outvoted by 16 others. Score-summing across diverse feature sets smooths individual weaknesses.
4.3 The 37% Overlap
Analysis of failure sets across different W0s: only 37% of failures overlap between independent seeds. The remaining 63% are precisely the errors that score-summing fixes — each member fails on different samples, the ensemble succeeds where most members agree.
5. Implications for DRAM Architecture
5.1 Double Parallelization
Level 1 — Row-level (within one chip):
┌──────────────────────────────────────┐ │ Row 0: W0[0]+MAJ3 → 32 bits │ │ Row 1: W0[1]+MAJ3 → 32 bits │ │ ... │ │ Row 63: W0[63]+MAJ3 → 32 bits │ ├──────────────────────────────────────┤ │ Peripheral: log-odds sum + argmax │ └──────────────────────────────────────┘
Level 2 — Chip-level (ensemble across chips):
Chip 0: W0₀ (random₁) → score₀[10]
Chip 1: W0₁ (random₂) → score₁[10]
...
Chip 16: W0₁₆ (random₁₇) → score₁₆[10]
↓
Σ score_m[k] → argmax
5.2 Imperfect Randomness is OK
The single-matrix (H=1088) requires high-quality randomness for every row. A defect in the on-chip RNG costs accuracy. The ensemble (H=64×17) with 17 chips:
- Each chip only needs 64 high-quality rows — 17× easier to guarantee
- A chip with slightly imperfect W0 is outvoted
- Even
broken-31achieves 96.2% at E=17 - No expensive true-RNG hardware needed — simple PRNG + unique seed suffices
5.3 Cost Comparison (6.82 Mbit W0)
| Metric | E=1 (H=1088) | E=17 (H=64×17) |
|---|---|---|
| MAJ3 banks | 1 large (1088 rows) | 17 small (64 rows each) |
| Total MAJ3 ops | 1088 | 1088 (same) |
| W0 storage | 6.82 Mbit | 6.82 Mbit (same) |
| Accuracy (best RNG) | 96.2% | 96.5% |
| Accuracy (worst RNG) | 95.7% | 96.2% |
| err stability | ❌ factor 300-1500× | ✅ factor <30× |
| RNG requirement | high-quality | any PRNG works |
| Manufacturing | one large mask | identical small chips |
| Training time (20 ep) | ~92s | ~94s (same) |
5.4 Scaling Law
Given fixed total bit mass, split across as many independent random projections as possible — down to H=64 per chip. Below H=64, individual members become too weak (1-pass accuracy drops below 84%). Above H=64×N, the ensemble saturates at the MAJ3 method limit (~96.5%).
6. Conclusion
The experiment with identical bit mass but different organization reveals: parallelization with diverse random projections is not just an efficiency gain — it is a statistical necessity.
A single large random matrix is fragile:
RNG quality directly affects accuracy. Error oscillates wildly. Every bit must be perfect.
An ensemble of small random matrices is robust:
Immune to RNG quality. Error decays smoothly. Imperfect chips are outvoted.
Same bit mass. Same operations. Same storage.
Better stability. Better manufacturability. Better accuracy.
References
Otto Score main page: forward-prop.nhi1.de
Seed mode experiment: plans/plan-2026-06-19-ensemble-seed.md
RNG modes experiment: plans/plan-2026-06-20-rng-seed-modes.md
Data: logs/run-research.log
📁 Demo source code (otto-score-ifc/) | View ensemble trainer source →