Otto Score: DRAM-native Classification via Bit-Logic + MAJ3
Status Paper — June 2026 | forward-prop research project
Abstract
We present Otto Score, a classification method designed for DRAM inference chips that use only bit-level operations (&, |, ~, XNOR, popcount). No floating-point, no integer multiply-accumulate, no division during inference.
The method achieves 99.0% on MNIST and 58.7% on CIFAR-10 using pure &|~ + int32 at inference time. Training uses float32 SGD as a compromise (gradients need continuous values), but the trained model is exported to pure integer weights for deployment.
Key architectural insight: Every ensemble member is computed independently and in parallel. Adding more members increases accuracy without increasing inference latency — members simply occupy more DRAM rows running simultaneously.
Terminology
| Term | Definition |
|---|---|
| Channel | A transformed view of the input pixels (e.g., Luminance, Red-Green opponent, Hue). Each channel produces NC_slice uint32 containers per image. |
| Encoding | A bit-level mapping from pixel values to thermometer patterns (e.g., exp8, sig8). Converts continuous float values to binary patterns with similar-popcount adjacency. |
| Member | One independent classifier: one frozen random W0 matrix + one trained target matrix. A member sees exactly one channel with one encoding. |
| Ensemble | A collection of members whose scores are summed for the final prediction. Members are parallel by design — each occupies a separate DRAM row. |
| HiddenN (H) | Number of MAJ3 neurons per member. Each neuron produces 1 bit per class vote. |
| MAJ3 | Majority-of-3: compresses NC_slice uint32 containers into 32 bits via bitwise majority voting. A lossy but noise-robust projection. |
| W0 | The frozen random projection matrix (uint32 bit patterns). Never trained. |
| Target | The trained log-odds correction matrix (int32 per class × neuron × bit). The only learned parameter. |
| NC_slice | Number of uint32 containers per member input. Determined by splitHN (horizontal split factor). |
| trainN / evalN | Number of training / evaluation samples. |
| epochsN (Ep) | Number of iterative correction passes over the training data. |
| step-err | Step decay mode: cos-time, pow, cos-err, const. |
| target-err | Soft convergence target: training stops when error rate drops below threshold. |
| Bit-Mass | Total information capacity: H × EN × 32 bits per neuron. |
| Container | A uint32 value packing multiple pixels (typically 4 pixels × 8 bits). The unit of MAJ3 voting. |
1. Parallelism: More Members, Same Latency
Every ensemble member is a fully independent classifier:
Member 0: channel_0 + encoding_0 → W0[0] → MAJ3 → target[0] → score_0[c] Member 1: channel_1 + encoding_1 → W0[1] → MAJ3 → target[1] → score_1[c] Member 2: channel_2 + encoding_2 → W0[2] → MAJ3 → target[2] → score_2[c] → SUM → argmax ... Member N: channel_N + encoding_N → W0[N] → MAJ3 → target[N] → score_N[c]
On a DRAM chip, each member occupies one row (or a small group of rows). All members compute simultaneously — the chip's row buffers activate every member's W0 in parallel. Score summing is a lightweight reduction.
Consequence: Accuracy is limited by data, not by compute time. Doubling the ensemble (EN: 7 → 17) costs zero additional inference latency. On MNIST, adding members from EN=1 to EN=17 gains +8pp (89.9% → 97.6%) with no runtime penalty at chip level.
2. Methodic: The Number World vs Binary World
2.1 The Core Problem
A DRAM chip can perform bitwise operations on entire rows in parallel — but it cannot efficiently multiply or add floating-point numbers. Every input pixel must be represented as a bit pattern that the chip can compare via XNOR + popcount.
This creates a fundamental tension: integer/floating-point values are continuous, but binary patterns are discrete.
2.2 MNIST: When Raw Input Works
MNIST handwritten digits are binary by nature: each pixel is either "ink" (>128, treat as 1) or "paper" (<128, treat as 0). When packed into uint32 containers, the resulting 32-bit pattern directly represents the digit's shape. No encoding transformation is needed.
Raw MNIST pixel (8-bit): 0x00 0xFF 0xFF 0x00 → 0x00FF00FF
(paper, ink, ink, paper)
Standard Otto Score reaches 86% in 1 pass and 97%+ with iteration on MNIST without any special encoding — the signal is already binary.
2.3 CIFAR: The Continuity Requirement
CIFAR-10 color images have grayscale gradients. A pixel value of 127 and 128 differ by only 1 in the number world, but in raw 8-bit binary they differ by up to 8 bits (0x7F → 0x80 = bit pattern flip from 01111111 to 10000000). This discontinuity breaks the XNOR+popcount similarity measure.
2.4 The Solution: Thermometer Encoding
Thermometer encoding restores continuity: each value v (0–255) is mapped to a pattern where the first v>>s bits are 1, the rest 0:
raw 8-bit: 127→0x7F (popcount=7) 128→0x80 (popcount=1) ← DISCONTINUOUS thermometer:127→0x1F (popcount=5) 128→0x20 (popcount=6) ← CONTINUOUS
Two adjacent values now have similar popcount — the bit-level similarity reflects the numerical similarity.
2.5 Channels + Encodings Define a Member
Every member is defined by the pair (channel, encoding). Multiple members can share the same channel with different encodings, or the same encoding on different channels.
Channels
| Token | Name | Formula | Description |
|---|---|---|---|
r | Red | raw R | Raw red pixel |
g | Green | raw G | Raw green pixel |
b | Blue | raw B | Raw blue pixel |
y | Luminance (BT.601) | (77R+150G+29B)>>8 | Broadcast luminance |
yl | Luminance (BT.709) | (54R+183G+18B)>>8 | HDTV luminance |
lum/al | Luminance (R+G) | (R+G)>>1 | Simple luminance sum |
rg/am | Red-Green opp. | (R-G+255)>>1 | Red vs Green (L-M) |
by/ap | Blue-Yellow opp. | (2B-R-G+510)>>2 | Blue vs Yellow (S-(L+M)) |
bl | Luminance (R+B) | (R+B)>>1 | Alternative luminance |
bm | R-B opponent | (R-B+255)>>1 | Red-Blue difference |
bp | G-(R+B)/2 opp. | (2G-R-B+510)>>2 | Green vs Red+Blue |
h | Hue | atan2(2R-G-B, G-B) | Color angle (HSV) |
s | Saturation | max(R,G,B)-min(R,G,B) | Color purity |
c | Contrast | Sobel 3×3 edge magnitude | Spatial edges on LUM |
Encodings
Each encoding first computes a level (0 to max_lev, where max_lev = width for most encodings, except lin7 where max_lev = 7). The level is then converted to a thermometer bitmask:
output = ((uint32_t)1 << level) - 1 // level lower bits set to 1 // Examples: level=0 → 0x00000000, level=3 → 0x00000007 (popcount=3), // level=8 → 0x000000FF (popcount=8), level=32 → 0xFFFFFFFF
Term definitions:
pv = pixel value (0–255) — the raw 8-bit input from the image after channel transformation.
width = encoding output width in bits (8, 16, or 32). Determines the number of thermometer levels: max_lev = width (except lin7).
max_lev = maximum thermometer level = number of bits in the output pattern. Default: width (so width=8 → max_lev=8 → output ranges from 0x00 to 0xFF).
low_lev = segment boundary level used by multi-segment encodings (down, up, mid) to allocate thermometer resolution across bright/dark/mid ranges.
mid_lev = max_lev − low_lev — the remaining levels allocated to the opposite segment.
After computing level, the final output is always: ((uint32_t)1 << level) − 1 (thermometer mask: exactly level lower bits set to 1).
| Encoding | level formula | max_lev | Best For |
|---|---|---|---|
raw |
return pv (identity, no thermometer) |
— | MNIST, binary signals |
lin7 |
pv >> 5 |
7 (fixed) | Legacy 7-level linear |
lin/lin8 |
(pv × max_lev) >> 8 |
width | Luminance — uniform slope |
down |
pv < 64 → (pv × low_lev) / 64pv ≥ 64 → low_lev + ((pv-64) × (max_lev-low_lev)) / 192low_lev = max_lev × 3 / 7 |
width | Shadows — more levels in dark range |
up |
pv > 192 → low_lev + ((pv-192) × (max_lev-low_lev)) / 63pv ≤ 192 → (pv × low_lev) / 192low_lev = max_lev × 4 / 7 |
width | Highlights — more levels in bright range |
mid |
pv < 80 → (pv × low_lev) / 80pv > 175 → low_lev + ((pv-175) × mid_lev) / 80else → low_lev + ((pv-80) × mid_lev) / 95low_lev = max_lev × 3/8, mid_lev = max_lev - low_lev |
width | Mid-tones — band-pass, compresses shadows and highlights |
log |
log₂(pv + 1) × max_lev / log₂(256) |
width | Power-law distributions (edges) — high res for dark |
exp |
(pv / 255)e × max_lev, with e = 0.30 (width=8/32) or e = 0.35 (width=16) |
width | Brightness-sparse — high res for bright pixels |
sig |
σ(12 × (pv/255 − 0.5)) × max_lev,where σ(x) = 1 / (1 + e−x) |
width | Opponent channels (RG,BY) — S-shaped, saturates extremes |
Examples for width=8 (max_lev=8):
| pv | raw | lin | exp | log | sig | up | down | mid |
|---|---|---|---|---|---|---|---|---|
| 0 | 0x00 (0) | 0x00 (0) | 0x00 (0) | 0x00 (0) | 0x00 (0) | 0x00 (0) | 0x00 (0) | 0x00 (0) |
| 32 | 0x20 (1) | 0x01 (1) | 0x07 (3) | 0x3F (6) | 0x00 (0) | 0x00 (0) | 0x03 (2) | 0x07 (3) |
| 64 | 0x40 (1) | 0x03 (2) | 0x1F (5) | 0x7F (7) | 0x00 (0) | 0x01 (1) | 0x0F (4) | 0x0F (4) |
| 128 | 0x80 (1) | 0x0F (4) | 0xFF (8) | 0xFF (8) | 0x0F (4) | 0x07 (3) | 0x7F (7) | 0xFF (8) |
| 192 | 0xC0 (2) | 0x3F (6) | 0xFF (8) | 0xFF (8) | 0xFF (8) | 0x1F (5) | 0xFF (8) | 0x7F (7) |
| 255 | 0xFF (8) | 0xFF (8) | 0xFF (8) | 0xFF (8) | 0xFF (8) | 0xFF (8) | 0xFF (8) | 0xFF (8) |
Note: popcount is shown in parentheses. The raw encoding produces popcount equal to the number of set bits in the 8-bit value — completely unrelated to the pixel's brightness (0x80 = 128 has popcount 1, same as 0x01 = 1). This is why thermometer encoding is essential for photographic images.
Member Definition Examples
# One member: channel=mnist, encoding=exp8 --channels mnist --encoding exp8 # Two members: same channel, different encodings --channels mnist --encoding exp8,log8 # Three members: different channels with per-channel encoding --channels g,bl,bm --encoding g=up8,bl=down8,bm=sig # 10 members: full CIFAR best config --encoding g=up8,bl=down8,bm=sig,bp=sig,b=up8,al=down8,am=sig,ap=sig,h=up8,c=log8
2.6 Container Width: 8/16/32 Does Not Help
The natural question: does widening the thermometer (8 → 16 → 32 bits) improve accuracy? Empirically: no.
Each thermometer value is passed through MAJ3, which compresses NC_slice containers into exactly 32 bits via bitwise majority voting. After MAJ3, every bit-width produces exactly 32 bits. A wider container does not increase the information reaching the target.
Width 8: 4 pixels × 8 bit = 1 uint32 container → MAJ3 → 32 bits Width 16: 2 pixels × 16 bit = 1 uint32 container → MAJ3 → 32 bits Width 32: 1 pixel × 32 bit = 1 uint32 container → MAJ3 → 32 bits
The correct lever is more members, not wider containers. Duplication (channel + encoding variants) creates independent MAJ3 projections with different random W0.
2.7 Encoding Orthogonality
Encodings should be orthogonal: two encodings on the same pixel should produce bit patterns that are as decorrelated as possible. The best MNIST config (exp8,log8,sig8) exploits this:
| Encoding | Maps v=128 to | Popcount | Character |
|---|---|---|---|
exp8 | 0x01 | 1 | Compressed dark range |
log8 | 0xFE | 7 | Compressed bright range |
sig8 | 0x60 | 2 | S-curve, mid-range emphasis |
For CIFAR, orthogonality is achieved by pairing channels with encodings matched to their statistical distribution:
| Channel | Distribution | Encoding | Why |
|---|---|---|---|
| G (raw green) | Uniform 0–255 | up8 | Ascending highlights bright greens |
| BL (R+B lum) | Uniform 0–255 | down8 | Descending highlights dark areas |
| BM (R-B opp.) | Peaked at 128 | sig8 | Sigmoid saturates opponent extremes |
| H (hue angle) | Cyclic 0–255 | up8 | Ascending preserves angle order |
| C (Sobel edges) | Power-law (80% < 30) | log8 | Log gives dark edges more levels |
3. Architecture: Otto Score
3.1 Inference (DRAM-native)
Input pixels → Thermometer encoding → uint32 containers
↓
Layer 0: Frozen random projection (W0) via MAJ3
W0: random uint32 bit patterns (permanent, never trained)
MAJ3: For each of H neurons, majority-vote over NC_slice containers
→ 1 bit per neuron per class vote
↓
Layer 1: Trained Target (Bayesian log-Score)
target[c][h][b]: 32-bit log-odds correction
score[c] = sum over h,b of target[c][h][b] * h0_bit[h][b]
↓
argmax(score) → predicted class
3.2 Training (float32 SGD)
for each sample (x, y_true):
h0 = MAJ3(W0 @ encode(x))
scores[c] = sum(target[c] * h0)
pred = argmax(scores)
if pred != y_true:
target[y_true][h0_bit==1] += step // reward correct
target[pred][h0_bit==1] -= step // penalize wrong
3.3 Step Size Physics
| Mode | Formula | Behavior |
|---|---|---|
pow | step_init × (err/total)power | Decays with error — can oscillate |
cos-time | step_init × cos(progress × π/2) | Smooth decay — independent of error. Winner. |
cos-err | step_init × cos(err/total × π/2) | Error-dependent — unsuitable |
const | step_init | Oscillates in late epochs |
cos-time is the clear winner for CIFAR. The step decreases independently of the current error, preventing the oscillation spiral: high error → large step → overshoot → error stays high.
4. Results
4.1 MNIST
| evalN | Best eval | EN | H | Ep | Ratio (train:eval) |
|---|---|---|---|---|---|
| 100 | 99.0% | 7 | 128 | 8 | 599:1 |
| 1000 | 98.1% | 7 | 128 | 8 | 59:1 |
| 10000 | 97.7% | 7 | 128 | 8 | 5:1 |
The 98%-wall is data-limited, not method-limited. With 599:1 ratio we reach 99.0%. At 5:1, only 97.7%.
./mnist-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 7 --epochsN 8 --step-err cos-time --encoding exp8,log8,sig8
4.2 CIFAR-10
| Config | eval | EN | H | Ep | Time |
|---|---|---|---|---|---|
| 10-channel + Hue + Sobel C | 58.7% | 17 | 128 | 8 | 357s |
| 10-channel (no H/C) | 57.2% | 7 | 128 | 10 | 124s |
| 6-channel (opponent blocks) | 56.9% | 6 | 128 | 10 | 107s |
| 4-channel baseline | 56.6% | 16 | 128 | 10 | 193s |
./cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 17 --epochsN 8 --encoding g=up8,bl=down8,bm=sig,bp=sig,b=up8,al=down8,am=sig,ap=sig,h=up8,c=log8 --target-err 0.4
4.3 Bit-Mass Scaling Law
Doubling the bit-mass yields ~0.3pp. Logarithmic scaling, as predicted by information theory.
| H_eff (H×EN) | MNIST eval | Δ |
|---|---|---|
| 512 | 96.9% | — |
| 1024 | 97.3-97.4% | +0.4pp |
| 2048 | 97.6-97.7% | +0.3pp |
| 4096 (extrapolated) | ~97.9-98.0% | +0.3pp |
5. Key Insights
5.1 W0 Diversity Is the Ensemble Feature
The random projection W0 is never trained. Its role is to create diverse binary fingerprints of the input. --member-seed const (all members share identical W0) regresses to 89.9% — barely better than N=1.
5.2 Human Color Vision as Blueprint
The Otto Score channel set is directly inspired by the human visual system: three cone types (L=red, M=green, S=blue) transformed into opponent channels in the retina. See color-vision-opponent-channels.md.
| Human Visual System | Otto Score Channel | Purpose |
|---|---|---|
| L+M (luminance) | lum / al | Brightness, edges, texture |
| L−M (red-green) | rg / am | Red vs Green opponent |
| S−(L+M) (blue-yellow) | by / ap | Blue vs Yellow opponent |
| R−B | bm | Red-Blue difference |
| Spatial precision | h, c | Hue angle + Sobel edges |
The opponent transformation is critical: raw R,G,B are highly correlated (all rise with brightness). Opponent channels decorrelate brightness from color.
5.3 MAJ3 Is an Information Bottleneck
MAJ3 compresses NC_slice uint32 containers → 32 bits. For MNIST (NC=196) the compression is benign. For CIFAR (NC=768), the compression loses critical shade information — hence thermometer encoding is essential.
5.4 Per-Member Architecture
Every member has its own W0, target matrix, step counter, and error counter. This allows members processing different channels to converge at different rates.
5.5 True Random vs PRNG
True random W0 improves eval by +0.5pp over splitmix64 PRNG. Correlated PRNG values produce less diverse W0 → less diverse projections.
6. Open Questions
6.1 The Data Wall
The Otto Score is data-limited. Larger datasets (ImageNet-1K with 1.2M images, or a dedicated DRAM-scale dataset with 10M+ samples) would directly translate to higher accuracy.
6.2 Depth vs Ensemble
Deep MAJ3 stacks (2+ layers) lose information. The ensemble approach (parallel MAJ3 banks) is strictly better than depth at equal bit-mass.
6.3 CIFAR Gap to Float
Otto Score at 58.7% already matches float32 shallow nets. Deep nets (ResNet-18: 93%+) remain out of reach for the MAJ3 framework.
7. Conclusion
Otto Score reaches 99.0% on MNIST and 58.7% on CIFAR-10 using only &|~ + int32 at inference time. The method is data-limited: accuracy scales logarithmically with bit-mass and linearly with training data per evaluation sample.
The core innovations are:
- Thermometer encoding — bridges the number-world/binary-world gap
- Per-channel encoding selection — each channel type gets its optimal binary mapping
- cos-time step decay — gentle error-independent step reduction prevents oscillation
- Multi-encoding ensemble —
exp8+log8+sig8provides complementary projections - Target-err convergence — soft stopping criterion based on error-rate target
- Free parallelism — ensemble members run simultaneously on separate DRAM rows
Appendix A: MNIST Scoreboard (Top 20)
run-grep --sort eval,-time mnist | head -20
[2026-06-29_10-15-48] train=99.8% eval=99.0% err=30 lr=0.0500 time=39908ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 7 --epochsN 8 --step-err cos-time --encoding exp8,log8,sig8 --trainN 59900 --evalN 100 [2026-06-29_10-14-37] train=99.6% eval=98.1% err=32 lr=0.0500 time=39418ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 7 --epochsN 8 --step-err cos-time --encoding exp8,log8,sig8 --trainN 59000 --evalN 1000 [2026-06-29_09-38-14] train=100.0% eval=97.8% err=0 lr=0.0500 time=49370ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 256 --ensembleN 3 --epochsN 10 --step-err cos-time --encoding exp8,log8,sig8 --encoding exp --step-power 0.01 [2026-06-27_09-21-23] train=100.0% eval=97.7% err=1 lr=0.0500 time=31367ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 512 --ensembleN 4 --epochsN 10 --encoding exp --step-power 0.01 [2026-06-29_09-14-02] train=100.0% eval=97.7% err=23 lr=0.0500 time=36107ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 7 --epochsN 8 --step-err cos-time --encoding exp8,log8,sig8 [2026-06-29_09-18-29] train=100.0% eval=97.6% err=9 lr=0.0500 time=29919ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 256 --ensembleN 3 --epochsN 8 --step-err cos-time --encoding exp8,log8,sig8 [2026-06-27_09-00-56] train=100.0% eval=97.6% err=6 lr=0.0500 time=31496ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 256 --ensembleN 7 --epochsN 10 --encoding exp --step-power 0.1 [2026-06-27_09-12-47] train=100.0% eval=97.6% err=5 lr=0.0500 time=31515ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 256 --ensembleN 8 --epochsN 10 --encoding exp --step-power 0.1 [2026-06-27_09-20-33] train=100.0% eval=97.6% err=0 lr=0.0500 time=31714ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 256 --ensembleN 8 --epochsN 10 --encoding exp --step-power 0.01 [2026-06-27_09-18-37] train=100.0% eval=97.6% err=7 lr=0.0500 time=32455ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 16 --epochsN 10 --encoding exp --step-power 0.1 [2026-06-27_09-19-47] train=100.0% eval=97.6% err=1 lr=0.0500 time=33006ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 16 --epochsN 10 --encoding exp --step-power 0.01 [2026-06-29_09-55-24] train=100.0% eval=97.6% err=20 lr=0.0500 time=35978ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 7 --epochsN 8 --step-err cos-time --encoding exp8,log8,sig8 --random-file assets/random.bin [2026-06-29_09-37-21] train=100.0% eval=97.6% err=1 lr=0.0500 time=39146ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 256 --ensembleN 3 --epochsN 8 --step-err cos-time --encoding exp8,log8,sig8 --encoding exp --step-power 0.01 [2026-06-29_09-52-56] train=100.0% eval=97.6% err=0 lr=0.0500 time=49415ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 256 --ensembleN 3 --epochsN 10 --step-err cos-time --encoding exp8,log8,sig8 --encoding exp --step-power 0.01 --random-file assets/random.bin [2026-06-29_09-40-10] train=100.0% eval=97.6% err=7 lr=0.0500 time=58752ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 512 --ensembleN 3 --epochsN 8 --step-err cos-time --encoding exp8,log8,sig8 [2026-06-29_09-42-27] train=100.0% eval=97.6% err=20 lr=0.0500 time=87202ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 17 --epochsN 8 --step-err cos-time --encoding exp8,log8,sig8 [2026-06-29_09-14-51] train=100.0% eval=97.5% err=9 lr=0.0500 time=19772ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 512 --ensembleN 1 --epochsN 8 --step-err cos-time --encoding exp8,log8,sig8 [2026-06-27_11-32-28] train=100.0% eval=97.5% err=5 lr=0.0500 time=28216ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 512 --ensembleN 3 --epochsN 10 --encoding exp8 --step-power 0.009 [2026-06-27_11-33-29] train=100.0% eval=97.5% err=10 lr=0.0500 time=28331ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 512 --ensembleN 3 --epochsN 10 --encoding exp8 --step-power 0.09 [2026-06-27_09-17-36] train=100.0% eval=97.5% err=7 lr=0.0500 time=32426ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 512 --ensembleN 4 --epochsN 10 --encoding exp --step-power 0.1
Appendix B: CIFAR-10 Scoreboard (Top 20)
run-grep --sort eval,-time cifar | head -20
[2026-06-28_16-07-07] train=83.4% eval=58.7% err=8296 lr=0.0100 time=357125ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 17 --epochsN 8 --encoding g=up8,bl=down8,bm=sig,bp=sig,b=up8,al=down8,am=sig,ap=sig,h=up8,c=log8 --target-err 0.4 [2026-06-28_15-44-06] train=84.7% eval=57.4% err=7635 lr=0.0100 time=52568ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 2 --epochsN 10 --encoding g=up8,bl=down8,bm=sig,bp=sig,b=up8,al=down8,am=sig,ap=sig,h=up8,c=log8 --target-err 0.4 [2026-06-28_15-48-15] train=85.0% eval=57.3% err=7512 lr=0.1000 time=52643ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 2 --epochsN 10 --encoding g=up8,bl=down8,bm=sig,bp=sig,b=up8,al=down8,am=sig,ap=sig,h=up8,c=log8 --target-err 0.4 --lr 0.1 [2026-06-28_11-38-51] train=89.0% eval=57.2% err=5486 lr=0.1000 time=124230ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 7 --epochsN 10 --encoding al=down8,am=sig,ap=sig,bl=down8,bm=sig,bp=sig --lr 0.1 --step-err cos-time [2026-06-28_12-35-33] train=89.0% eval=57.2% err=5504 lr=0.1000 time=166985ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 7 --epochsN 10 --encoding g=up8,bl=down8,bm=sig,bp=sig,b=up8,al=down8,am=sig,ap=sig --lr 0.1 --step-err cos-time --batchN 64 [2026-06-28_12-31-24] train=83.6% eval=57.1% err=8198 lr=0.1000 time=150303ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 7 --epochsN 10 --encoding g=up8,bl=down8,bm=sig,bp=sig,b=up8,al=down8,am=sig,ap=sig --lr 0.1 --step-err cos-time [2026-06-28_12-27-13] train=83.8% eval=57.0% err=8117 lr=0.0900 time=146618ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 7 --epochsN 10 --encoding g=up8,bl=down8,bm=sig,bp=sig,b=up8,al=down8,am=sig,ap=sig --lr 0.09 --step-err cos-time [2026-06-28_11-25-43] train=88.6% eval=56.9% err=5692 lr=0.1000 time=106503ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 6 --epochsN 10 --encoding al=down8,am=sig,ap=sig,bl=down8,bm=sig,bp=sig --lr 0.1 --step-err cos-time [2026-06-28_11-21-01] train=88.2% eval=56.7% err=5922 lr=0.1000 time=88618ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 5 --epochsN 10 --encoding al=down8,am=sig,ap=sig,bl=down8,bm=sig,bp=sig --lr 0.1 --step-err cos-time [2026-06-27_21-40-29] train=88.8% eval=56.6% err=5586 lr=0.1000 time=192821ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 16 --epochsN 10 --encoding b=down,lum=lin8,by=sig,rg=sig --lr 0.1 --step-err cos-time [2026-06-27_21-24-49] train=85.9% eval=56.2% err=7045 lr=0.1000 time=198089ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 16 --epochsN 10 --encoding b=down,lum=lin8,by=sig,rg=sig --lr 0.1 --step-power 5 [2026-06-28_11-24-08] train=87.8% eval=56.1% err=6120 lr=0.1000 time=70777ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 4 --epochsN 10 --encoding al=down8,am=sig,ap=sig,bl=down8,bm=sig,bp=sig --lr 0.1 --step-err cos-time [2026-06-27_22-01-22] train=92.3% eval=55.9% err=3861 lr=0.1000 time=203612ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 8 --epochsN 20 --encoding b=down,lum=lin8,by=sig,rg=sig --lr 0.1 --step-err cos-time [2026-06-27_20-11-55] train=85.2% eval=55.7% err=7394 lr=0.1000 time=97352ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 8 --epochsN 10 --encoding b=down,lum=lin8,by=sig,rg=sig --lr 0.1 --step-power 5 [2026-06-28_18-32-46] train=78.6% eval=55.6% err=10684 lr=0.0100 time=20424ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 1 --epochsN 8 --encoding g=up8,bl=down8,bm=sig,bp=sig,b=up8,al=down8,am=sig,ap=sig,h=up8,c=log8 --target-err 0.4 --random-file assets/2026-06-18.bin [2026-06-27_21-46-51] train=87.8% eval=55.6% err=6091 lr=0.1000 time=97040ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 8 --epochsN 10 --encoding b=down,lum=lin8,by=sig,rg=sig --lr 0.1 --step-err cos-time [2026-06-27_21-21-40] train=87.4% eval=55.5% err=6294 lr=0.1000 time=127119ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 8 --epochsN 10 --batchN 32 --encoding b=down,lum=lin8,by=sig,rg=sig --lr 0.1 --step-power 5 [2026-06-28_11-34-09] train=90.5% eval=55.4% err=4753 lr=0.1000 time=70632ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 256 --ensembleN 2 --epochsN 10 --encoding al=down8,am=sig,ap=sig,bl=down8,bm=sig,bp=sig --lr 0.1 --step-err cos-time [2026-06-27_18-48-11] train=85.0% eval=55.4% err=7513 lr=0.1000 time=83512ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 7 --epochsN 10 --encoding b=down,lum=lin8,by=sig,rg=sig --lr 0.1 --step-power 5 [2026-06-27_18-29-32] train=85.0% eval=55.4% err=7513 lr=0.1000 time=84175ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 7 --epochsN 10 --encoding b=down,lum=lin8,by=sig,rg=sig --lr 0.1 --step-power 5