DRAM-Native &|~ Classification

Randomness, Parallelization, and the DRAM Advantage

How 17 Small Matrices Beat One Large Matrix at the Same Bit Mass

Andreas Otto — 20 June 2026

The Otto Score classifier uses frozen random projections (W0) followed by MAJ3 majority to extract 32-bit feature strings from MNIST inputs. The quality of the random projection directly affects classification accuracy — but only for single-projection configurations. Ensemble parallelization makes the system immune to RNG quality while delivering higher accuracy, faster convergence, and stable error decay.

A large single matrix (H=1088, N=1) suffers from RNG-dependent accuracy (95.7-96.2%), wild error jitter (factor 300-1500×), and overfitting. An ensemble of small matrices with the same total bit mass (H=64, N=17) stays at 96.2-96.5% (about 96.4%) for every RNG tested, with smooth error decay and genuine generalization.

The DRAM implication is decisive: many small MAJ3 banks with separate random projections outperform one large bank at the same silicon budget — and the diversity makes the system robust against imperfect on-chip randomness.

Same bit mass. Same MAJ3 operations. Same storage.
E=17: 96.2-96.5% across all RNGs  |  E=1: 95.7-96.2%, error oscillates by up to 1500×

1. Randomness and Accuracy

1.1 The RNG Quality Problem

The Otto Score's first layer is a random projection:

W0:     random uint32[H][196]   (frozen, never trained)
H0[h]:  MAJ3( ~(in ⊗ W0[h]) )  → uint32

Each W0 row is a 196-dimensional random vector (6272 bits). Three RNGs tested:

RNG           Method                 Range      Quality                   History
broken-31     glibc rand() LCG       0…2³¹−1    Poor: bit 31 always 0     Original bug (May 2026)
fix-gnu       rand()<<16 ^ rand()    0…2³²−1    Fair: still LCG           First fix (June 2026)
fix-splitmix  splitmix64 (BigCrush)  0…2³²−1    Good: passes all tests    Current default
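
As a rough illustration of the two bitwise primitives in the H0 formula of this section, the Python sketch below builds both from AND, OR and NOT only. It assumes that ⊗ denotes bitwise XOR (so ~(in ⊗ W0[h]) is an XNOR agreement mask) and that MAJ3 is the usual three-input bitwise majority. The function names are illustrative. How the 196 per-word results are folded into a single uint32 is not specified here, so the sketch stops at the primitives.

```python
MASK32 = 0xFFFFFFFF  # keep Python's unbounded ints inside 32 bits


def maj3(a: int, b: int, c: int) -> int:
    """Bitwise 3-input majority: a result bit is 1 when at least two inputs set it."""
    return ((a & b) | (a & c) | (b & c)) & MASK32


def xnor32(x: int, w: int) -> int:
    """Agreement mask (assumes the formula's operator is XOR): 1 where x and w match."""
    return ((x & w) | (~x & ~w)) & MASK32


if __name__ == "__main__":
    print(f"{maj3(0b1100, 0b1010, 0b0110):04b}")        # 1110
    print(f"{xnor32(0x0F0F0F0F, 0x00FF00FF):08X}")      # F00FF00F
```

Both helpers use only &, | and ~, which matches the operator set named in the title.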

1.2 The 31-Bit Bug and Its Cost

The original RNG used glibc rand() returning values in 0…2³¹−1. Bit 31 was always 0 — a systematic bias: one of 32 bit-positions always zero. This cost approximately 2-3% accuracy.
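
The bias is easy to reproduce. The sketch below is a minimal illustration under stated assumptions: make_rand31 is a stand-in 31-bit generator (a Park-Miller LCG), not glibc's actual rand() algorithm, and taking the upper 32 bits of each splitmix64 output is likewise an assumption. It measures how often bit 31 is set under each of the three RNG modes.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def make_rand31(seed: int):
    """Stand-in 31-bit generator (Park-Miller LCG), NOT glibc's rand().
    It shares only the relevant property: every output is below 2**31."""
    state = seed % 0x7FFFFFFF or 1
    def rand31() -> int:
        nonlocal state
        state = (state * 48271) % 0x7FFFFFFF
        return state
    return rand31


def make_broken31(seed: int):
    """broken-31: use the 31-bit value directly as a 32-bit word."""
    return make_rand31(seed)


def make_fix_gnu(seed: int):
    """fix-gnu: rand()<<16 ^ rand(), truncated to 32 bits."""
    r = make_rand31(seed)
    return lambda: ((r() << 16) ^ r()) & MASK32


def make_fix_splitmix(seed: int):
    """fix-splitmix: splitmix64, keeping the upper 32 bits (an assumption)."""
    state = seed & MASK64
    def next32() -> int:
        nonlocal state
        state = (state + 0x9E3779B97F4A7C15) & MASK64
        z = state
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        return (z ^ (z >> 31)) >> 32
    return next32


def bit31_fraction(gen, n: int = 10_000) -> float:
    """Fraction of n generated words that have bit 31 set."""
    return sum(gen() >> 31 for _ in range(n)) / n


if __name__ == "__main__":
    for name, make in [("broken-31", make_broken31),
                       ("fix-gnu", make_fix_gnu),
                       ("fix-splitmix", make_fix_splitmix)]:
        print(name, bit31_fraction(make(12345)))
```

With the stand-in generator, broken-31 never sets bit 31, while both fixes set it in roughly half of all words.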

2. Bit Mass — The Fundamental Currency

2.1 Definition

Bit mass = total bits in W0:

W0 entries = H × N × NC      (NC = 196 uint32 words per row)
W0 bits    = W0 entries × 32

H      N    W0 Entries   W0 Bits      Target Size
64     1    12,544       401,408      80 KB
1088   1    213,248      6,823,936    1.36 MB
64     17   213,248      6,823,936    1.36 MB

H=1088, N=1 and H=64, N=17 have identical bit mass — same W0 entries, same storage, same memory bandwidth. Only the organization differs.
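
The arithmetic behind the table can be checked in a few lines. This is a small sketch that takes NC to be the 196 uint32 words per row from the W0 declaration and includes the ensemble count N in the product; the function names are illustrative.

```python
NC = 196        # uint32 words per W0 row, from the W0 declaration
WORD_BITS = 32  # bits per uint32 word


def w0_entries(h: int, n: int) -> int:
    """Number of uint32 entries across all n projections of h rows each."""
    return h * n * NC


def w0_bits(h: int, n: int) -> int:
    """Bit mass: total bits stored in W0."""
    return w0_entries(h, n) * WORD_BITS


if __name__ == "__main__":
    for h, n in [(64, 1), (1088, 1), (64, 17)]:
        print(f"H={h:<5} N={n:<3} entries={w0_entries(h, n):>8,} bits={w0_bits(h, n):>10,}")
```

Because 1088 = 64 × 17, the last two configurations produce the same product term by term.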

3. The Critical Experiment — Same Bit Mass, Different Organization

3.1 Configuration

E=17:  H=64,  N=17  → 17 independent 64-neuron projections
E=1:   H=1088, N=1  → one single 1088-neuron projection

3.2 Results

RNG Mode                   E=17 (64×17) Eval   E=1 (1088) Eval   Δ
broken-31 (LCG, 31 bit)    96.2%               95.7%             +0.5pp
fix-gnu (LCG hack)         96.4%               96.2%             +0.2pp
fix-splitmix (BigCrush)    96.3%               96.1%             +0.2pp
FILE (true random)         96.5%               96.2%             +0.3pp

E=17 is effectively immune to RNG quality: accuracy stays within 96.2-96.5% across all four random sources. Even broken-31, the worst RNG (costing 2-3% in single mode), reaches 96.2% when combined with 17-fold parallelization.

3.3 Error Convergence

E=17 — smooth exponential decay:

err:  7700 → 1629 → 1121 → 838 → ...
      Ep1    Ep2    Ep3    Ep4
  • Monotonic: err decreases every epoch
  • Smooth: max/min ratio < 30×
  • Convergent: err→247 after 20 epochs

E=1 — wild overfitting oscillations:

err:  7700 → 1370 → 815 → 813 → ...
      Ep1    Ep2    Ep3   Ep4
  • Oscillates by factor 300-1500×
  • Overfits: train=100%, eval plateaus
  • False convergence: err→0 is memorization

4. Why Parallelization Stabilizes

4.1 Score-Summing as Regularizer

The ensemble output:

total[k] = Σ_m score_m[k]

Each member computes an independent Bayes log-Score. The correction pass adjusts each member independently — but only for samples where the ensemble, not the individual member, is wrong. One member's over-correction is diluted by the other 16.
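
A minimal sketch of the summing rule and the ensemble-level correction gate, assuming each member exposes a list of per-class scores. The function names and the toy integer scores are invented for illustration, and the correction step itself is not modeled; only the decision of when it would fire is shown.

```python
from typing import List, Sequence


def ensemble_total(member_scores: Sequence[Sequence[int]]) -> List[int]:
    """total[k] = sum over members m of score_m[k]."""
    return [sum(per_class) for per_class in zip(*member_scores)]


def predict(member_scores: Sequence[Sequence[int]]) -> int:
    """Class with the highest summed score (first index wins ties)."""
    total = ensemble_total(member_scores)
    return max(range(len(total)), key=total.__getitem__)


def needs_correction(member_scores: Sequence[Sequence[int]], label: int) -> bool:
    """Gate for the correction pass: fire only when the ensemble is wrong."""
    return predict(member_scores) != label


if __name__ == "__main__":
    # Toy scores for 3 members and 3 classes; the true label is 1.
    scores = [
        [-10, -8, -30],   # member 0 prefers class 1
        [-11, -7, -30],   # member 1 prefers class 1
        [-6, -10, -30],   # member 2 prefers class 0 (wrong on its own)
    ]
    print(ensemble_total(scores), predict(scores), needs_correction(scores, 1))
```

In the toy data, member 2 is wrong on its own, but the summed scores still select class 1, so no correction is triggered for this sample.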

4.2 Feature Diversity

17 independent W0s = 17 different random projections. Different projections capture different feature patterns. A bad projection from broken-31 is outvoted by 16 others. Score-summing across diverse feature sets smooths individual weaknesses.

4.3 The 37% Overlap

An analysis of failure sets across different W0s shows that only 37% of failures overlap between independent seeds. The remaining 63% are the errors that score-summing can fix: each member fails on different samples, and the ensemble succeeds wherever most members agree.
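
One way such an overlap figure could be measured is sketched below. The exact definition behind the 37% is not given here, so this example assumes a Jaccard-style ratio (shared failures divided by all failures of either member); the function names and the sample IDs are made up for illustration.

```python
from typing import Set


def overlap_fraction(fail_a: Set[int], fail_b: Set[int]) -> float:
    """Shared failures divided by all failures of either member (Jaccard index)."""
    union = fail_a | fail_b
    if not union:
        return 0.0
    return len(fail_a & fail_b) / len(union)


def non_shared(fail_a: Set[int], fail_b: Set[int]) -> Set[int]:
    """Samples that exactly one of the two members gets wrong."""
    return fail_a ^ fail_b


if __name__ == "__main__":
    # Toy failure sets (sample indices) for two members with different seeds.
    fails_seed_1 = {3, 8, 15, 21}
    fails_seed_2 = {8, 21, 30, 42}
    print(overlap_fraction(fails_seed_1, fails_seed_2))
    print(sorted(non_shared(fails_seed_1, fails_seed_2)))
```

The non-shared samples are the candidates that other members can outvote; for more than two members the same ratio can be averaged over all pairs.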

5. Implications for DRAM Architecture

5.1 Double Parallelization

Level 1 — Row-level (within one chip):

┌──────────────────────────────────────┐
│ Row 0:  W0[0]+MAJ3 → 32 bits         │
│ Row 1:  W0[1]+MAJ3 → 32 bits         │
│ ...                                  │
│ Row 63: W0[63]+MAJ3 → 32 bits        │
├──────────────────────────────────────┤
│ Peripheral: log-odds sum + argmax    │
└──────────────────────────────────────┘

Level 2 — Chip-level (ensemble across chips):

Chip 0:  W0₀ (random₁)  → score₀[10]
Chip 1:  W0₁ (random₂)  → score₁[10]
...
Chip 16: W0₁₆ (random₁₇) → score₁₆[10]
         ↓
         Σ score_m[k]  →  argmax

5.2 Imperfect Randomness is OK

The single-matrix (H=1088) requires high-quality randomness for every row. A defect in the on-chip RNG costs accuracy. The ensemble (H=64×17) with 17 chips:

  • Each chip only needs 64 high-quality rows — 17× easier to guarantee
  • A chip with slightly imperfect W0 is outvoted
  • Even broken-31 achieves 96.2% at E=17
  • No expensive true-RNG hardware needed — simple PRNG + unique seed suffices
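
The "simple PRNG + unique seed" point can be sketched as follows. This is a minimal illustration, assuming Python's built-in Mersenne Twister (random.Random) as a stand-in for whatever PRNG a chip would carry and seeds 1 to 17 chosen arbitrarily; make_w0 and the constant names are hypothetical.

```python
import random

H = 64         # rows per chip
NC = 196       # uint32 words per row
N_CHIPS = 17   # ensemble size


def make_w0(seed: int, h: int = H, nc: int = NC):
    """Frozen random projection for one chip: h rows of nc uint32 words.
    random.Random is only a stand-in for an on-chip PRNG."""
    rng = random.Random(seed)
    return [[rng.getrandbits(32) for _ in range(nc)] for _ in range(h)]


# Same generator on every chip; only the seed differs.
ensemble = [make_w0(seed) for seed in range(1, N_CHIPS + 1)]

if __name__ == "__main__":
    total_entries = sum(len(row) for w0 in ensemble for row in w0)
    print(len(ensemble), "chips,", total_entries, "uint32 entries in total")
```

Because each matrix is fully determined by its seed, a chip could regenerate its W0 from the seed alone instead of having it loaded from outside.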

5.3 Cost Comparison (6.82 Mbit W0)

Metric                  E=1 (H=1088)          E=17 (H=64×17)
MAJ3 banks              1 large (1088 rows)   17 small (64 rows each)
Total MAJ3 ops          1088                  1088 (same)
W0 storage              6.82 Mbit             6.82 Mbit (same)
Accuracy (best RNG)     96.2%                 96.5%
Accuracy (worst RNG)    95.7%                 96.2%
err stability           ❌ factor 300-1500×   ✅ factor <30×
RNG requirement         high-quality          any PRNG works
Manufacturing           one large mask        identical small chips
Training time (20 ep)   ~92s                  ~94s (same)

5.4 Scaling Law

Given fixed total bit mass, split across as many independent random projections as possible — down to H=64 per chip. Below H=64, individual members become too weak (1-pass accuracy drops below 84%). Above H=64×N, the ensemble saturates at the MAJ3 method limit (~96.5%).

6. Conclusion

The experiment with identical bit mass but different organization reveals: parallelization with diverse random projections is not just an efficiency gain — it is a statistical necessity.

A single large random matrix is fragile:
RNG quality directly affects accuracy. Error oscillates wildly. Every bit must be perfect.

An ensemble of small random matrices is robust:
Immune to RNG quality. Error decays smoothly. Imperfect chips are outvoted.

For DRAM manufacturing: many small identical chips with simple PRNGs and unique seeds, rather than one large chip with expensive true-random generation.

Same bit mass. Same operations. Same storage.
Better stability. Better manufacturability. Better accuracy.

References
Otto Score main page: forward-prop.nhi1.de
Seed mode experiment: plans/plan-2026-06-19-ensemble-seed.md
RNG modes experiment: plans/plan-2026-06-20-rng-seed-modes.md
Data: logs/run-research.log
