Status Report — July 2026
DRAM-native Classification via Bit-Logic Operations
1. Target Initialization: Classical vs Random — Identical Convergence
Question: Does the initial value of the trained target/offset matrix affect final accuracy?
Experiment: Compare two init strategies for CIFAR-10 Otto Score (H=256, 5 epochs):
| Init | Description |
|---|---|
| Bayesian | build_target() computes log-odds from class frequencies, logit_convert() + center_target() |
| Random | target_ens[i] = (int32_t)(w0_random() >> OT_PRECISION) — correction builds from scratch |
Result: Both converge to the same final accuracy (~55%). Bayesian init gives a head start in epoch 1 (~30% vs ~10%), but by epoch 5 both are within 0.5pp. Pure zero initialization does NOT work — the correction loop needs a non-zero starting point to break symmetry.
Conclusion: The correction process is robust — target initialization does not determine the ceiling. The 55-57% barrier is architectural, not initialization-dependent.
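The random-init row and the zero-init remark above can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: demo_random() is a plain xorshift32 generator standing in for w0_random(), the shift parameter stands in for OT_PRECISION (whose value is not given in this report), and is_symmetric() encodes one plausible reading of the symmetry remark, namely that a target with all-equal entries gives no class a preference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for w0_random(): a standard xorshift32 step. */
uint32_t demo_random(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Random init: keep only the top (32 - shift) bits of each random word,
 * so every entry is a small non-negative int32 value. */
void init_random(int32_t *target, size_t n, unsigned shift, uint32_t seed) {
    assert(seed != 0 && shift >= 1 && shift < 32);
    uint32_t s = seed;
    for (size_t i = 0; i < n; i++)
        target[i] = (int32_t)(demo_random(&s) >> shift);
}

/* Returns 1 when all entries are equal (as after zero init), else 0. */
int is_symmetric(const int32_t *target, size_t n) {
    for (size_t i = 1; i < n; i++)
        if (target[i] != target[0])
            return 0;
    return 1;
}
```

A zero-filled target is symmetric in this sense, while a randomly filled one almost never is, which is the property the correction loop appears to rely on.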
1b. Correction Distribution: Central vs Per-Sample — Same Ceiling
Question: Does distributing the log-odds correction across individual misclassifications (instead of one aggregated step) improve convergence?
| Mode | Description |
|---|---|
| Central (batch) | Collect all misclassifications in one epoch, compute average correction vector, apply one single step |
| Distributed (per-sample) | For each misclassified sample, apply the correction immediately — multiple small steps per epoch |
Result: Both converge to the same final accuracy (~55%). The distributed approach shows slightly less oscillation during training (smoother accuracy curve), but at significantly higher computational cost. The ceiling is identical.
Conclusion: The correction architecture determines the ceiling, not how the correction is scheduled. The distributed mode's reduced oscillation does not translate to higher final accuracy — confirming that the 55-57% barrier is fundamental and not an optimization artifact.
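The difference between the two scheduling modes can be sketched as follows. This is an illustration under stated assumptions, not the trainer's code: the per-class offset vector, the fixed step size, the "pull the true class up, push the wrong winner down" rule, and the names predict(), epoch_per_sample() and epoch_central() are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_CLASSES 3  /* illustrative; CIFAR-10 would use 10 */

/* Predicted class = argmax of raw score plus learned per-class offset. */
int predict(const int32_t raw[N_CLASSES], const int32_t offset[N_CLASSES]) {
    int best = 0;
    for (int c = 1; c < N_CLASSES; c++)
        if (raw[c] + offset[c] > raw[best] + offset[best])
            best = c;
    return best;
}

/* Distributed mode: correct immediately after every misclassification,
 * so later samples in the same epoch already see the updated offsets. */
int epoch_per_sample(const int32_t raw[][N_CLASSES], const int *label,
                     size_t n, int32_t offset[N_CLASSES], int32_t step) {
    assert(step > 0);
    int errors = 0;
    for (size_t i = 0; i < n; i++) {
        int p = predict(raw[i], offset);
        if (p != label[i]) {
            offset[label[i]] += step;  /* pull the true class up */
            offset[p] -= step;         /* push the wrong winner down */
            errors++;
        }
    }
    return errors;
}

/* Central mode: offsets stay frozen during the epoch; all corrections
 * are summed and their average is applied as one single step. */
int epoch_central(const int32_t raw[][N_CLASSES], const int *label,
                  size_t n, int32_t offset[N_CLASSES], int32_t step) {
    assert(step > 0);
    int32_t sum[N_CLASSES] = {0};
    int errors = 0;
    for (size_t i = 0; i < n; i++) {
        int p = predict(raw[i], offset);
        if (p != label[i]) {
            sum[label[i]] += step;
            sum[p] -= step;
            errors++;
        }
    }
    if (errors > 0)
        for (int c = 0; c < N_CLASSES; c++)
            offset[c] += sum[c] / errors;
    return errors;
}
```

For two samples with raw scores {5, 3, 0} and {5, 4, 0}, both labeled class 1, and step 2, both modes end at the offsets {-2, 2, 0}. The per-sample mode registers one error because the second sample already benefits from the first correction, while the central mode registers two. The schedule changes the path, not the destination, which is consistent with the identical ceiling reported above.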
2. Encoding is Mandatory for Continuous Data — MNIST is the Exception
Finding: Binary neural networks (uint32 containers, XNOR+popcount operations) on continuous data (photographs: CIFAR-10) require a thermometer encoding of pixel values. MNIST is the exception: handwritten digits are binary by nature (ink/no-ink), so raw packing works directly.
MNIST — Special case: binary by nature
| Approach | Encoding | Accuracy | Note |
|---|---|---|---|
| Otto Score | raw, down, sig, lin8, exp | 97.0% | Works WITHOUT encoding — binary pixels |
MNIST digits are inherently binary (>128 = ink, <128 = paper, slight anti-aliasing). Four binary pixels packed into one uint32 preserve the bit-level pattern — XNOR+popcount has structure to exploit. Encoding is optional for MNIST, not mandatory.
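The raw packing and the XNOR+popcount comparison can be sketched in a few lines. The byte order (first pixel in the top byte) and the names pack4(), popcount32() and xnor_popcount() are assumptions for illustration; the report does not state how the trainers lay out pixels inside a container.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack four 8-bit pixels into one uint32 container,
 * first pixel in the most significant byte (assumed layout). */
uint32_t pack4(const uint8_t px[4]) {
    assert(px != NULL);
    return ((uint32_t)px[0] << 24) | ((uint32_t)px[1] << 16) |
           ((uint32_t)px[2] << 8)  |  (uint32_t)px[3];
}

/* Portable population count: clears the lowest set bit per iteration. */
unsigned popcount32(uint32_t v) {
    unsigned n = 0;
    while (v) {
        v &= v - 1;
        n++;
    }
    return n;
}

/* XNOR similarity: number of bit positions where a and b agree (0..32). */
unsigned xnor_popcount(uint32_t a, uint32_t b) {
    return popcount32(~(a ^ b));
}
```

With near-binary pixels (0 or 255), two containers that differ in one pixel agree on 24 of 32 bits, and a fully inverted container agrees on none, so the score tracks how many pixels match.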
CIFAR-10 — Continuous data REQUIRES encoding
| Encoding | Approach | Members | Accuracy | Note |
|---|---|---|---|---|
| Raw (R\|G\|B plane) | Hebbian | 1 | 10.0% | Random (no bit-level structure) |
| exp8 on RGB | Hebbian | 1 | 23.1% | Encoding creates structure |
| --encoding latest | Hebbian | 11 | 32.4% | Multi-member vote |
| --encoding latest | Otto Score | 11 | 55.0% | Same encoding, stronger learner |
CIFAR-10 pixels are continuous 8-bit color values (0..255 per channel, 16M colors). Four adjacent pixels packed into uint32 have no bit-level correlation with visual similarity — XNOR between two random-looking containers yields random popcounts. Thermometer encoding restores the concept of “nearness” in bit space.
Why encoding matters: exp8 maps brightness to bit patterns where popcount(encode(pv)) ∝ pv. Pixel values 127 and 128 produce nearly identical bit patterns (differ by one bit) instead of completely unrelated ones. XNOR+popcount can exploit this structure — without it, the binary operations have no semantic gradient.
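The 127-versus-128 argument can be checked with a minimal thermometer encoder. This sketch uses linearly spaced levels and is not the exp8 encoder from lib/ki-encoding.h, whose level spacing is not given in this report; thermo_encode(), hamming32() and the levels parameter are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Portable population count: clears the lowest set bit per iteration. */
unsigned popcount32(uint32_t v) {
    unsigned n = 0;
    while (v) {
        v &= v - 1;
        n++;
    }
    return n;
}

/* Linear thermometer code: the n lowest bits are set, where
 * n = pv * levels / 255 rounded to the nearest integer, so the
 * popcount of the code grows in proportion to the pixel value. */
uint32_t thermo_encode(uint8_t pv, unsigned levels) {
    assert(levels >= 1 && levels <= 31);
    unsigned n = ((unsigned)pv * levels + 127u) / 255u;
    return (1u << n) - 1u;
}

/* Number of bit positions in which a and b differ. */
unsigned hamming32(uint32_t a, uint32_t b) {
    return popcount32(a ^ b);
}
```

In plain binary, 127 (01111111) and 128 (10000000) differ in all 8 bits. With an 8-level thermometer code, neighbouring pixel values differ in at most one bit, which is the "nearness" that XNOR+popcount can then exploit.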
3. Repository Structure
Current state (master branch):
    otto-score-ifc/      ← Public distribution root
    ├── mnist/           ← MNIST trainers (Otto + Hebbian, unified)
    ├── cifar/           ← CIFAR trainers (symlinks → mnist/)
    ├── reference/       ← AdamW baselines
    ├── lib/             ← Shared headers (maj3.h, ki-encoding.h, w0_random.h)
    └── models/          ← Cached trained models
Only otto-score-ifc/ is part of the public distribution on GitHub. Local research directories (mnist-1/, cifar-1/, www/, etc.) are not in the remote repository.
Key architectural decisions
- Every trainer binary doubles as inference via --model (no separate IFC source files)
- Encoding infrastructure in lib/ki-encoding.h (self-contained, no ki_Args dependency)
- Hebbian uses the same multi-member architecture as Otto Score
- All 6 approaches are comparable at equal H and epochs
GitHub: github.com/aotto1968/forward-prop (master branch)
4. The 60% CIFAR-10 Barrier — A Fundamental Limit
Claim: 60% accuracy on CIFAR-10 is a fundamental limit for DRAM-native bit-logic classifiers that use no dataset-specific prior knowledge.
Evidence
- Single frozen random projection (Otto Score, Hebbian, AdamW all use this):
  - Best Otto Score result: 58.7% (June 2026, extensive encoding sweep)
  - Current Otto Score: 55.0% (H=256, latest encoding, 5 epochs)
  - No configuration tested exceeded 59%
- Scaling does not help beyond ~59%:
  - Larger H (256→512): marginal gains (+0.3pp)
  - More ensemble members (1→16): marginal gains
  - More epochs (5→20): marginal gains
  - Encoding sweeps (all 9 types × widths): plateau at ~58-59%
- Target init invariance: Bayesian, random — all converge to same ceiling
- Comparison with published results >60%:
| Source | Accuracy | Note |
|---|---|---|
| This work (Otto Score) | 55-59% | Pure &\|~ + int32 |
| Kaggle CIFAR-10 leaderboard | >90% | Deep CNNs, augmentation, transfer learning |
| Binary networks (XNOR-Net, etc.) | 85-90% | Backprop + STE, batch norm, no DRAM constraints |
Every published result >60% uses at least one of:
- Multiple passes over the data (backprop through W0)
- Data augmentation (random crops, flips, color jitter)
- Dataset-specific channel normalization
- Transfer learning from larger datasets (ImageNet)
- Batch normalization or similar adaptive scaling
These techniques encode prior knowledge about natural images.
Why 60% is the wall
- CIFAR-10 has 10 classes with 6000 images each
- A frozen random projection + MAJ3 + linear classifier has ~500K parameters
- The random projection destroys fine-grained spatial information
- MAJ3 further compresses 768 containers to 32 bits per neuron — severe information loss
- Without dataset-specific preprocessing, the model cannot distinguish classes that share coarse color/texture statistics
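The information loss attributed to MAJ3 above follows from the primitive itself, which can be sketched with the same & and | operations the classifier is restricted to. The grouping into consecutive triples and the name maj3_pass() are assumptions for illustration; the report does not state how the 768 containers are arranged on the way down to 32 bits, and lib/maj3.h may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise majority of three words: each output bit is 1
 * when at least two of the three input bits are 1. */
uint32_t maj3(uint32_t a, uint32_t b, uint32_t c) {
    return (a & b) | (a & c) | (b & c);
}

/* One reduction pass (assumed grouping): every consecutive triple of
 * containers collapses into one word. Returns the output length. */
size_t maj3_pass(const uint32_t *in, size_t n, uint32_t *out) {
    assert(n % 3 == 0);
    for (size_t i = 0; i < n / 3; i++)
        out[i] = maj3(in[3 * i], in[3 * i + 1], in[3 * i + 2]);
    return n / 3;
}
```

Each pass keeps one word for every three, so a single pass over 768 containers would leave 256. A minority bit in any triple has no effect on the output, which is where the fine-grained detail is lost.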
Position
We consider any CIFAR-10 result above 60% that does not disclose its full training pipeline (including data augmentation, transfer learning, and architecture search) to be potentially influenced by dataset-specific prior knowledge. The Kaggle CIFAR-10 leaderboard is not a valid comparison for DRAM-native classifiers because those entries use GPU-optimized deep learning with extensive prior knowledge built into the architecture.
Summary
| Finding | Status |
|---|---|
| Target init does not affect final accuracy | ✅ Confirmed |
| Thermometer encoding is mandatory for continuous data (CIFAR) | ✅ MNIST works without (binary by nature) |
| 60% CIFAR-10 is a fundamental barrier | 🧪 Evidence supports this claim |
| Multi-member Hebbian works (like Otto) | ✅ 11 members, --encoding latest |
| Repository consolidated under otto-score-ifc/ | ✅ Complete |
Andreas Otto — July 2026
github.com/aotto1968/forward-prop