DRAM-Native &|~ Classification

Otto Score: DRAM-native Classification via Bit-Logic + MAJ3

Status Paper — June 2026  |  forward-prop research project

Abstract

We present Otto Score, a classification method designed for DRAM inference chips that use only bit-level operations (&, |, ~, XNOR, popcount). No floating-point, no integer multiply-accumulate, no division during inference.

The method reaches 99.0% on MNIST (measured on a 100-sample evaluation subset; 97.7% on 10,000 evaluation samples) and 58.7% on CIFAR-10 using only &|~ and int32 arithmetic at inference time. Training uses float32 SGD as a compromise (gradients need continuous values), but the trained model is exported to pure integer weights for deployment.

Key architectural insight: Every ensemble member is computed independently and in parallel. Adding more members increases accuracy without increasing inference latency — members simply occupy more DRAM rows running simultaneously.

Terminology

| Term | Definition |
| --- | --- |
| Channel | A transformed view of the input pixels (e.g., Luminance, Red-Green opponent, Hue). Each channel produces NC_slice uint32 containers per image. |
| Encoding | A bit-level mapping from pixel values to thermometer patterns (e.g., exp8, sig8). Converts continuous float values to binary patterns with similar-popcount adjacency. |
| Member | One independent classifier: one frozen random W0 matrix + one trained target matrix. A member sees exactly one channel with one encoding. |
| Ensemble | A collection of members whose scores are summed for the final prediction. Members are parallel by design: each occupies a separate DRAM row. |
| HiddenN (H) | Number of MAJ3 neurons per member. Each neuron produces 1 bit per class vote. |
| MAJ3 | Majority-of-3: compresses NC_slice uint32 containers into 32 bits via bitwise majority voting. A lossy but noise-robust projection. |
| W0 | The frozen random projection matrix (uint32 bit patterns). Never trained. |
| Target | The trained log-odds correction matrix (int32 per class × neuron × bit). The only learned parameter. |
| NC_slice | Number of uint32 containers per member input. Determined by splitHN (horizontal split factor). |
| trainN / evalN | Number of training / evaluation samples. |
| epochsN (Ep) | Number of iterative correction passes over the training data. |
| step-err | Step decay mode: cos-time, pow, cos-err, const. |
| target-err | Soft convergence target: training stops when error rate drops below threshold. |
| Bit-Mass | Total information capacity: H × EN × 32 bits per neuron. |
| Container | A uint32 value packing multiple pixels (typically 4 pixels × 8 bits). The unit of MAJ3 voting. |

1. Parallelism: More Members, Same Latency

Every ensemble member is a fully independent classifier:

Member 0: channel_0 + encoding_0  →  W0[0]  →  MAJ3  →  target[0]  →  score_0[c]
Member 1: channel_1 + encoding_1  →  W0[1]  →  MAJ3  →  target[1]  →  score_1[c]
Member 2: channel_2 + encoding_2  →  W0[2]  →  MAJ3  →  target[2]  →  score_2[c]  → SUM → argmax
...
Member N: channel_N + encoding_N  →  W0[N]  →  MAJ3  →  target[N]  →  score_N[c]

On a DRAM chip, each member occupies one row (or a small group of rows). All members compute simultaneously — the chip's row buffers activate every member's W0 in parallel. Score summing is a lightweight reduction.
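The final reduction can be sketched in a few lines. This is an illustrative Python sketch, not the project's code: `ensemble_predict` and `member_scores` are hypothetical names, and each member is assumed to deliver one integer score per class.

```python
def ensemble_predict(member_scores):
    """Sum per-class scores across members and return (argmax class, totals).

    member_scores: list of per-member score lists, one integer per class.
    """
    num_classes = len(member_scores[0])
    totals = [0] * num_classes
    for scores in member_scores:          # on chip, members would run in parallel
        for c, s in enumerate(scores):
            totals[c] += s                # integer addition only
    best = max(range(num_classes), key=lambda c: totals[c])
    return best, totals


# Three hypothetical members voting over three classes
pred, totals = ensemble_predict([[5, -2, 1], [3, 4, -1], [2, 1, 0]])
print(pred, totals)   # 0 [10, 3, 0]
```

Only the summation depends on the number of members; the per-member work is independent, which is what makes the reduction cheap relative to the projections.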

Consequence: At chip level, accuracy is limited by data, not by compute time. Growing the ensemble from EN=7 to EN=17 adds no inference latency, because the extra members run in parallel rows. On MNIST, going from EN=1 to EN=17 gains about 7.7pp (89.9% → 97.6%) with no runtime penalty at chip level.

DRAM row budget: A DDR5 bank typically has 65,536 rows. Assuming one row per neuron, a member with H=128 neurons needs 128 rows, so one bank holds 65,536 / 128 = 512 members in parallel, far beyond what our experiments have tested.

2. Method: The Number World vs the Binary World

2.1 The Core Problem

A DRAM chip can perform bitwise operations on entire rows in parallel — but it cannot efficiently multiply or add floating-point numbers. Every input pixel must be represented as a bit pattern that the chip can compare via XNOR + popcount.

This creates a fundamental tension: numerically close values should look similar to the chip, but their binary representations can differ in many bit positions.

2.2 MNIST: When Raw Input Works

MNIST handwritten digits are nearly binary by nature: almost every pixel is either "ink" (≥128, treated as 1) or "paper" (<128, treated as 0). When packed into uint32 containers, the resulting 32-bit pattern directly represents the digit's shape. No encoding transformation is needed.

Raw MNIST pixel (8-bit):   0x00 0xFF 0xFF 0x00  → 0x00FFFF00
                            (paper, ink, ink, paper)
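The packing step can be sketched as follows. This is a minimal illustration, not the project's code: `binarize` and `pack_container` are hypothetical names, the threshold of 128 follows the ink/paper rule above, and placing the first pixel in the most significant byte is an assumption (the text does not state the byte order).

```python
def binarize(pixel, threshold=128):
    """Map an 8-bit pixel to 0xFF (ink) or 0x00 (paper)."""
    return 0xFF if pixel >= threshold else 0x00


def pack_container(pixels):
    """Pack four 8-bit values into one uint32, first pixel in the top byte."""
    if len(pixels) != 4:
        raise ValueError("a container holds exactly four 8-bit pixels")
    word = 0
    for p in pixels:
        word = (word << 8) | (p & 0xFF)
    return word


# paper, ink, ink, paper
container = pack_container([binarize(p) for p in (12, 240, 201, 3)])
print(f"0x{container:08X}")   # 0x00FFFF00
```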

Standard Otto Score reaches 86% in 1 pass and 97%+ with iteration on MNIST without any special encoding — the signal is already binary.

2.3 CIFAR: The Continuity Requirement

CIFAR-10 color images contain smooth intensity gradients. The pixel values 127 and 128 differ by only 1 in the number world, but in raw 8-bit binary they differ in all 8 bits (0x7F → 0x80, i.e. 01111111 → 10000000). This discontinuity breaks the XNOR+popcount similarity measure.
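The effect is easy to reproduce. The sketch below (hypothetical helper names, plain Python integers standing in for DRAM rows) computes the XNOR+popcount similarity of two raw 8-bit values:

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def xnor_similarity(a, b, width=8):
    """Count matching bit positions between a and b (XNOR, then popcount)."""
    mask = (1 << width) - 1
    return popcount(~(a ^ b) & mask)


sim_126_127 = xnor_similarity(126, 127)   # 7: only the lowest bit differs
sim_127_128 = xnor_similarity(127, 128)   # 0: every bit differs
```

Both pairs are numerically adjacent, yet the raw encoding rates one pair as almost identical and the other as maximally different.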

2.4 The Solution: Thermometer Encoding

Thermometer encoding restores continuity: each value v (0–255) is mapped to a pattern whose lowest v>>s bits are 1 and the rest 0, where the shift s sets the number of levels (the example below uses s = 5, i.e. levels 0–7):

raw 8-bit:   127→0x7F (popcount=7)    128→0x80 (popcount=1)    ← DISCONTINUOUS
thermometer: 127→0x07 (popcount=3)    128→0x0F (popcount=4)    ← CONTINUOUS

Two adjacent values now have similar popcount — the bit-level similarity reflects the numerical similarity.
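A minimal sketch of this property, assuming a shift of s = 5 (the text does not fix s; the function names are illustrative):

```python
def thermometer(v, shift=5):
    """Thermometer code of an 8-bit value: the lowest (v >> shift) bits are set."""
    level = v >> shift
    return (1 << level) - 1


def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


t127 = thermometer(127)   # level 3 -> 0b00000111
t128 = thermometer(128)   # level 4 -> 0b00001111

raw_distance = hamming(127, 128)      # 8
thermo_distance = hamming(t127, t128) # 1
```

For thermometer codes the Hamming distance equals the difference in levels, so bit-level similarity is monotone in numerical distance.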

2.5 Channels + Encodings Define a Member

Every member is defined by the pair (channel, encoding). Multiple members can share the same channel with different encodings, or the same encoding on different channels.

Channels

| Token | Name | Formula | Description |
| --- | --- | --- | --- |
| r | Red | raw R | Raw red pixel |
| g | Green | raw G | Raw green pixel |
| b | Blue | raw B | Raw blue pixel |
| y | Luminance (BT.601) | (77R+150G+29B)>>8 | Broadcast luminance |
| yl | Luminance (BT.709) | (54R+183G+18B)>>8 | HDTV luminance |
| lum/al | Luminance (R+G) | (R+G)>>1 | Simple luminance sum |
| rg/am | Red-Green opp. | (R-G+255)>>1 | Red vs Green (L-M) |
| by/ap | Blue-Yellow opp. | (2B-R-G+510)>>2 | Blue vs Yellow (S-(L+M)) |
| bl | Luminance (R+B) | (R+B)>>1 | Alternative luminance |
| bm | R-B opponent | (R-B+255)>>1 | Red-Blue difference |
| bp | G-(R+B)/2 opp. | (2G-R-B+510)>>2 | Green vs Red+Blue |
| h | Hue | atan2(2R-G-B, G-B) | Color angle (HSV) |
| s | Saturation | max(R,G,B)-min(R,G,B) | Color purity |
| c | Contrast | Sobel 3×3 edge magnitude | Spatial edges on LUM |
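The integer-only channel formulas can be checked with a short sketch. The function name `channels` is hypothetical; hue and Sobel contrast are left out because they need trigonometry or a spatial neighbourhood rather than a single pixel.

```python
def channels(r, g, b):
    """Integer channel transforms for one RGB pixel (each component 0..255)."""
    return {
        "y":  (77 * r + 150 * g + 29 * b) >> 8,   # BT.601 luminance
        "al": (r + g) >> 1,                       # luminance (R+G)
        "am": (r - g + 255) >> 1,                 # red-green opponent
        "ap": (2 * b - r - g + 510) >> 2,         # blue-yellow opponent
        "bl": (r + b) >> 1,                       # luminance (R+B)
        "bm": (r - b + 255) >> 1,                 # red-blue opponent
        "bp": (2 * g - r - b + 510) >> 2,         # green vs red+blue
        "s":  max(r, g, b) - min(r, g, b),        # saturation
    }


gray = channels(128, 128, 128)   # opponent channels sit at mid-scale (127)
red = channels(255, 0, 0)        # am and bm saturate at 255
```

The offsets (+255, +510) and shifts keep every channel inside 0..255, so each result fits the same 8-bit encoding stage as a raw pixel.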

Encodings

Each encoding first computes a level (0 to max_lev, where max_lev = width for most encodings, except lin7 where max_lev = 7). The level is then converted to a thermometer bitmask:

output = (uint32_t)(((uint64_t)1 << level) - 1)    // lowest `level` bits set to 1
// The shift is done in 64 bits: shifting a 32-bit value by 32 is undefined in C.
// Examples: level=0 → 0x00000000, level=3 → 0x00000007 (popcount=3),
//           level=8 → 0x000000FF (popcount=8), level=32 → 0xFFFFFFFF

Term definitions:
pv = pixel value (0–255): the 8-bit input after channel transformation.
width = encoding output width in bits (8, 16, or 32).
max_lev = maximum thermometer level, i.e. the number of bits in the output pattern. Default: width (so width=8 → max_lev=8 → output ranges from 0x00 to 0xFF); lin7 fixes max_lev = 7.
low_lev = segment boundary level used by the multi-segment encodings (down, up, mid) to allocate thermometer resolution across the dark, mid, and bright ranges.
mid_lev = max_lev − low_lev: the remaining levels, allocated to the other segment(s).
For every encoding except raw, the final output is the thermometer mask with exactly level low bits set, i.e. 2^level − 1.

| Encoding | level formula | max_lev | Best For |
| --- | --- | --- | --- |
| raw | return pv (identity, no thermometer) | | MNIST, binary signals |
| lin7 | pv >> 5 | 7 (fixed) | Legacy 7-level linear |
| lin/lin8 | (pv × max_lev) >> 8 | width | Luminance: uniform slope |
| down | pv < 64: (pv × low_lev) / 64; pv ≥ 64: low_lev + ((pv-64) × (max_lev-low_lev)) / 192; with low_lev = max_lev × 3 / 7 | width | Shadows: more levels in dark range |
| up | pv > 192: low_lev + ((pv-192) × (max_lev-low_lev)) / 63; pv ≤ 192: (pv × low_lev) / 192; with low_lev = max_lev × 4 / 7 | width | Highlights: more levels in bright range |
| mid | pv < 80: (pv × low_lev) / 80; pv > 175: low_lev + ((pv-175) × mid_lev) / 80; else: low_lev + ((pv-80) × mid_lev) / 95; with low_lev = max_lev × 3/8, mid_lev = max_lev - low_lev | width | Mid-tones: band-pass, compresses shadows and highlights |
| log | log₂(pv + 1) × max_lev / log₂(256) | width | Power-law distributions (edges): high res for dark |
| exp | (pv / 255)^e × max_lev, with e = 0.30 (width=8/32) or e = 0.35 (width=16) | width | Brightness-sparse: high res for bright pixels |
| sig | σ(12 × (pv/255 − 0.5)) × max_lev, where σ(x) = 1 / (1 + e^(−x)) | width | Opponent channels (RG, BY): S-shaped, saturates extremes |
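Several of the level formulas above translate directly into code. The sketch below is an illustration under stated assumptions, not the project's implementation: the function names are hypothetical, divisions are taken as integer (floor) divisions, and float results are truncated. The text does not state the rounding mode, so individual levels may differ by one from the project's own output.

```python
import math


def level_to_mask(level):
    """Thermometer mask with the lowest `level` bits set."""
    return (1 << level) - 1          # Python integers do not overflow at level=32


def lin(pv, max_lev=8):
    return (pv * max_lev) >> 8


def log_enc(pv, max_lev=8):
    return int(math.log2(pv + 1) * max_lev / 8)      # log2(256) == 8


def sig(pv, max_lev=8):
    x = 12.0 * (pv / 255.0 - 0.5)
    return int(max_lev / (1.0 + math.exp(-x)))


def down(pv, max_lev=8):
    low_lev = max_lev * 3 // 7
    if pv < 64:
        return (pv * low_lev) // 64
    return low_lev + ((pv - 64) * (max_lev - low_lev)) // 192


def up(pv, max_lev=8):
    low_lev = max_lev * 4 // 7
    if pv > 192:
        return low_lev + ((pv - 192) * (max_lev - low_lev)) // 63
    return (pv * low_lev) // 192


mask = level_to_mask(lin(128))   # level 4 -> 0x0F
```

The float-based encodings (log, sig) are evaluated per pixel value only, so in a pure-integer deployment they could be replaced by a 256-entry lookup table; that is an assumption about deployment, not something the text specifies.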

Examples for width=8 (max_lev=8):

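| pv | raw | lin | exp | log | sig | up | down | mid |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0x00 (0) | 0x00 (0) | 0x00 (0) | 0x00 (0) | 0x00 (0) | 0x00 (0) | 0x00 (0) | 0x00 (0) |
| 32 | 0x20 (1) | 0x01 (1) | 0x07 (3) | 0x3F (6) | 0x00 (0) | 0x00 (0) | 0x03 (2) | 0x07 (3) |
| 64 | 0x40 (1) | 0x03 (2) | 0x1F (5) | 0x7F (7) | 0x00 (0) | 0x01 (1) | 0x0F (4) | 0x0F (4) |
| 128 | 0x80 (1) | 0x0F (4) | 0xFF (8) | 0xFF (8) | 0x0F (4) | 0x07 (3) | 0x7F (7) | 0xFF (8) |
| 192 | 0xC0 (2) | 0x3F (6) | 0xFF (8) | 0xFF (8) | 0xFF (8) | 0x1F (5) | 0xFF (8) | 0x7F (7) |
| 255 | 0xFF (8) | 0xFF (8) | 0xFF (8) | 0xFF (8) | 0xFF (8) | 0xFF (8) | 0xFF (8) | 0xFF (8) |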

Note: popcount is shown in parentheses. The raw encoding produces popcount equal to the number of set bits in the 8-bit value — completely unrelated to the pixel's brightness (0x80 = 128 has popcount 1, same as 0x01 = 1). This is why thermometer encoding is essential for photographic images.

Member Definition Examples

# One member: channel=mnist, encoding=exp8
--channels mnist --encoding exp8

# Two members: same channel, different encodings
--channels mnist --encoding exp8,log8

# Three members: different channels with per-channel encoding
--channels g,bl,bm --encoding g=up8,bl=down8,bm=sig

# 10 members: full CIFAR best config
--encoding g=up8,bl=down8,bm=sig,bp=sig,b=up8,al=down8,am=sig,ap=sig,h=up8,c=log8
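To make the (channel, encoding) pairing concrete, the sketch below expands flag values like the ones above into a member list. How the real executables parse these flags is not shown in this paper; `parse_members` is a hypothetical helper that only mirrors the examples.

```python
def parse_members(encoding_arg, channels_arg=None):
    """Turn CLI-style strings into a list of (channel, encoding) pairs."""
    members = []
    for item in encoding_arg.split(","):
        if "=" in item:                      # per-channel form, e.g. g=up8
            channel, encoding = item.split("=", 1)
            members.append((channel, encoding))
        else:                                # bare encoding, applied to every channel
            for channel in (channels_arg or "").split(","):
                members.append((channel, item))
    return members


two = parse_members("exp8,log8", channels_arg="mnist")
three = parse_members("g=up8,bl=down8,bm=sig")
print(two)     # [('mnist', 'exp8'), ('mnist', 'log8')]
print(three)   # [('g', 'up8'), ('bl', 'down8'), ('bm', 'sig')]
```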

2.6 Container Width: 8/16/32 Does Not Help

The natural question: does widening the thermometer (8 → 16 → 32 bits) improve accuracy? Empirically: no.

Each thermometer value is passed through MAJ3, which compresses NC_slice containers into exactly 32 bits via bitwise majority voting. After MAJ3, every bit-width produces exactly 32 bits. A wider container does not increase the information reaching the target.

Width 8:   4 pixels × 8 bit  = 1 uint32 container  → MAJ3 → 32 bits
Width 16:  2 pixels × 16 bit = 1 uint32 container  → MAJ3 → 32 bits
Width 32:  1 pixel  × 32 bit = 1 uint32 container  → MAJ3 → 32 bits

The correct lever is more members, not wider containers. Duplication (channel + encoding variants) creates independent MAJ3 projections with different random W0.

2.7 Encoding Orthogonality

Encodings should be orthogonal: two encodings on the same pixel should produce bit patterns that are as decorrelated as possible. The best MNIST config (exp8,log8,sig8) exploits this:

| Encoding | Maps v=128 to | Popcount | Character |
| --- | --- | --- | --- |
| exp8 | 0x01 | 1 | Compressed dark range |
| log8 | 0xFE | 7 | Compressed bright range |
| sig8 | 0x60 | 2 | S-curve, mid-range emphasis |

For CIFAR, orthogonality is achieved by pairing channels with encodings matched to their statistical distribution:

| Channel | Distribution | Encoding | Why |
| --- | --- | --- | --- |
| G (raw green) | Uniform 0–255 | up8 | Ascending highlights bright greens |
| BL (R+B lum) | Uniform 0–255 | down8 | Descending highlights dark areas |
| BM (R-B opp.) | Peaked at 128 | sig8 | Sigmoid saturates opponent extremes |
| H (hue angle) | Cyclic 0–255 | up8 | Ascending preserves angle order |
| C (Sobel edges) | Power-law (80% < 30) | log8 | Log gives dark edges more levels |

3. Architecture: Otto Score

3.1 Inference (DRAM-native)

Input pixels → Thermometer encoding → uint32 containers
                                            ↓
Layer 0: Frozen random projection (W0) via MAJ3
         W0: random uint32 bit patterns (permanent, never trained)
         MAJ3: For each of H neurons, majority-vote over NC_slice containers
               → 1 bit per neuron per class vote
                                            ↓
Layer 1: Trained Target (Bayesian log-Score)
         target[c][h][b]: 32-bit log-odds correction
         score[c] = sum over h,b of target[c][h][b] * h0_bit[h][b]
                                            ↓
         argmax(score) → predicted class
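The two layers can be sketched with plain integers. This is an illustration under explicit assumptions, not the project's implementation: the text does not spell out how W0 is combined with the input or how NC_slice containers are folded, so the sketch XNORs each container with its W0 word and folds triples with MAJ3 until one word remains, using a fixed tie-break word when only two words are left. All function names and the tie-break constant are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def maj3(a, b, c):
    """Bitwise majority of three words using only & and |."""
    return (a & b) | (a & c) | (b & c)


def xnor32(a, b):
    """32-bit XNOR: a bit is 1 where a and b agree."""
    return ~(a ^ b) & MASK32


def fold_maj3(words, tie=0xAAAAAAAA):
    """Fold a list of words down to one word by repeated MAJ3 over triples."""
    words = list(words)
    while len(words) > 1:
        if len(words) == 2:
            words.append(tie)            # assumption: fixed tie-break word
        full = len(words) - len(words) % 3
        folded = [maj3(*words[i:i + 3]) for i in range(0, full, 3)]
        words = folded + words[full:]    # leftovers carry into the next round
    return words[0]


def neuron_bits(containers, w0_row):
    """32-bit output of one neuron: XNOR with W0, then MAJ3 fold."""
    return fold_maj3(xnor32(x, w) for x, w in zip(containers, w0_row))


def score(target_c, h0):
    """Sum target[c][h][b] over the set bits of h0 (a masked add, no multiply)."""
    total = 0
    for h, word in enumerate(h0):
        for b in range(32):
            if (word >> b) & 1:
                total += target_c[h][b]
    return total


containers = [0x00FFFF00, 0x0F0F0F0F, 0xFFFF0000, 0x12345678]
w0_row = [0xDEADBEEF, 0x01234567, 0x89ABCDEF, 0x0BADF00D]
h0_word = neuron_bits(containers, w0_row)
```

Because each h0 bit is 0 or 1, the product in the score formula reduces to a conditional integer add, which is consistent with the no-multiply constraint at inference time.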

3.2 Training (float32 SGD)

for each sample (x, y_true):
    h0 = MAJ3(W0 @ encode(x))
    scores[c] = sum(target[c] * h0)
    pred = argmax(scores)
    if pred != y_true:
        target[y_true][h0_bit==1] += step    // reward correct
        target[pred][h0_bit==1]   -= step    // penalize wrong
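A runnable version of this update for a single member might look as follows. It is a sketch under assumptions: `train_step` is a hypothetical name, `h0` is taken to be a list of 32-bit words (one per neuron), and `target` is a nested list indexed as target[class][neuron][bit].

```python
def train_step(target, h0, y_true, step):
    """Apply one mistake-driven update; return True if the sample was misclassified."""
    def score(c):
        return sum(target[c][h][b]
                   for h, word in enumerate(h0)
                   for b in range(32) if (word >> b) & 1)

    scores = [score(c) for c in range(len(target))]
    pred = max(range(len(scores)), key=lambda c: scores[c])
    if pred == y_true:
        return False
    for h, word in enumerate(h0):
        for b in range(32):
            if (word >> b) & 1:
                target[y_true][h][b] += step   # reward the correct class
                target[pred][h][b] -= step     # penalize the predicted class
    return True


# Two classes, one neuron, all-zero target
target = [[[0.0] * 32 for _ in range(1)] for _ in range(2)]
first = train_step(target, [0b11], y_true=1, step=0.05)    # True: tie resolves to class 0
second = train_step(target, [0b11], y_true=1, step=0.05)   # False: now classified correctly
```

Only weights at active bit positions change, so a correct prediction leaves the target untouched.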

3.3 Step Size Physics

| Mode | Formula | Behavior |
| --- | --- | --- |
| pow | step_init × (err/total)^power | Decays with error; can oscillate |
| cos-time | step_init × cos(progress × π/2) | Smooth decay, independent of error. Winner. |
| cos-err | step_init × cos(err/total × π/2) | Error-dependent; unsuitable |
| const | step_init | Oscillates in late epochs |

cos-time is the clear winner for CIFAR. The step decreases independently of the current error, preventing the oscillation spiral: high error → large step → overshoot → error stays high.
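The four schedules can be written as one small function. This is a sketch: `step_size` is a hypothetical name, and `progress` is assumed to be the fraction of training completed, in [0, 1].

```python
import math


def step_size(mode, step_init, progress=0.0, err=0, total=1, power=1.0):
    """Step size for the given decay mode."""
    if mode == "const":
        return step_init
    if mode == "cos-time":
        return step_init * math.cos(progress * math.pi / 2)
    if mode == "cos-err":
        return step_init * math.cos(err / total * math.pi / 2)
    if mode == "pow":
        return step_init * (err / total) ** power
    raise ValueError(f"unknown step mode: {mode}")


# cos-time over 8 epochs: starts at step_init and falls monotonically
schedule = [step_size("cos-time", 0.05, progress=e / 8) for e in range(8)]
```

Only the cos-time and const modes ignore the error counters, which is why cos-time cannot feed a high error rate back into a larger step.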

4. Results

4.1 MNIST

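| evalN | Best eval | EN | H | Ep | Ratio (train:eval) |
| --- | --- | --- | --- | --- | --- |
| 100 | 99.0% | 7 | 128 | 8 | 599:1 |
| 1000 | 98.1% | 7 | 128 | 8 | 59:1 |
| 10000 | 97.7% | 7 | 128 | 8 | 5:1 |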

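The 98% wall appears to be data-limited, not method-limited. At a 599:1 train:eval ratio we reach 99.0%; at 5:1, only 97.7%. The 99.0% figure is measured on 100 evaluation samples, so a single misclassification moves it by 1pp.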

Best known config (standard split, evalN=10000):
./mnist-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 7 --epochsN 8 --step-err cos-time --encoding exp8,log8,sig8

4.2 CIFAR-10

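| Config | eval | EN | H | Ep | Time |
| --- | --- | --- | --- | --- | --- |
| 10-channel + Hue + Sobel C | 58.7% | 17 | 128 | 8 | 357s |
| 10-channel (no H/C) | 57.2% | 7 | 128 | 10 | 124s |
| 6-channel (opponent blocks) | 56.9% | 6 | 128 | 10 | 107s |
| 4-channel baseline | 56.6% | 16 | 128 | 10 | 193s |

Best known config: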
./cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 17 --epochsN 8 --encoding g=up8,bl=down8,bm=sig,bp=sig,b=up8,al=down8,am=sig,ap=sig,h=up8,c=log8 --target-err 0.4

4.3 Bit-Mass Scaling Law

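Doubling the bit-mass yields roughly 0.3–0.4pp on MNIST. Accuracy therefore grows about logarithmically with bit-mass, the diminishing-returns behaviour one would expect on information-theoretic grounds.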

| H_eff (H×EN) | MNIST eval | Δ |
| --- | --- | --- |
| 512 | 96.9% | |
| 1024 | 97.3-97.4% | +0.4pp |
| 2048 | 97.6-97.7% | +0.3pp |
| 4096 (extrapolated) | ~97.9-98.0% | +0.3pp |
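The per-doubling gains can be recomputed from the measured rows. The sketch below uses the midpoints of the reported ranges and assumes the 4096 entry was obtained by repeating the most recent per-doubling gain; the paper does not state how the extrapolation was done.

```python
import math

# Midpoints of the measured ranges in the table above
measured = {512: 96.9, 1024: 97.35, 2048: 97.65}

sizes = sorted(measured)
pairs = list(zip(sizes, sizes[1:]))
doublings = [math.log2(b / a) for a, b in pairs]             # 1.0 each
gains = [round(measured[b] - measured[a], 2) for a, b in pairs]

# Assumption: extrapolate one more doubling with the latest gain
projected_4096 = round(measured[2048] + gains[-1], 2)
print(gains, projected_4096)   # [0.45, 0.3] 97.95
```

The projected value falls inside the extrapolated range given in the table.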

5. Key Insights

5.1 W0 Diversity Is the Ensemble Feature

The random projection W0 is never trained. Its role is to create diverse binary fingerprints of the input. --member-seed const (all members share identical W0) regresses to 89.9% — barely better than N=1.
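The difference between per-member and shared W0 can be sketched as follows. The names `make_w0` and `make_ensemble` and the seed-offset scheme are hypothetical; Python's `random.Random` stands in for whatever generator the project uses.

```python
import random


def make_w0(num_neurons, nc_slice, seed):
    """One member's frozen projection: num_neurons rows of nc_slice uint32 words."""
    rng = random.Random(seed)
    return [[rng.getrandbits(32) for _ in range(nc_slice)]
            for _ in range(num_neurons)]


def make_ensemble(en, num_neurons, nc_slice, base_seed=1, shared=False):
    """shared=True mimics the `--member-seed const` behaviour described above."""
    return [make_w0(num_neurons, nc_slice, base_seed if shared else base_seed + m)
            for m in range(en)]


diverse = make_ensemble(3, num_neurons=4, nc_slice=2)
shared = make_ensemble(3, num_neurons=4, nc_slice=2, shared=True)
```

With a shared seed every member sees the same projection of its input, so members on the same channel and encoding add no new information to the summed score.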

5.2 Human Color Vision as Blueprint

The Otto Score channel set is directly inspired by the human visual system: three cone types (L=red, M=green, S=blue) transformed into opponent channels in the retina. See color-vision-opponent-channels.md.

| Human Visual System | Otto Score Channel | Purpose |
| --- | --- | --- |
| L+M (luminance) | lum / al | Brightness, edges, texture |
| L−M (red-green) | rg / am | Red vs Green opponent |
| S−(L+M) (blue-yellow) | by / ap | Blue vs Yellow opponent |
| R−B | bm | Red-Blue difference |
| Spatial precision | h, c | Hue angle + Sobel edges |

The opponent transformation is critical: raw R,G,B are highly correlated (all rise with brightness). Opponent channels decorrelate brightness from color.

5.3 MAJ3 Is an Information Bottleneck

MAJ3 compresses NC_slice uint32 containers → 32 bits. For MNIST (NC=196) the compression is benign. For CIFAR (NC=768), the compression loses critical shade information — hence thermometer encoding is essential.

5.4 Per-Member Architecture

Every member has its own W0, target matrix, step counter, and error counter. This allows members processing different channels to converge at different rates.

5.5 True Random vs PRNG

True random W0 improves eval by +0.5pp over splitmix64 PRNG. Correlated PRNG values produce less diverse W0 → less diverse projections.
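For reference, the two W0 sources can be sketched side by side. The splitmix64 step uses the generator's standard published constants; keeping the low 32 bits of each output and using `os.urandom` as a stand-in for a file of true random bytes are assumptions of this sketch, as are the function names.

```python
import os

MASK64 = (1 << 64) - 1


def splitmix64(state):
    """One splitmix64 step: return (next_state, 64-bit output)."""
    state = (state + 0x9E3779B97F4A7C15) & MASK64
    z = state
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return state, z ^ (z >> 31)


def w0_from_prng(n_words, seed):
    """Deterministic uint32 words from splitmix64."""
    words, state = [], seed & MASK64
    for _ in range(n_words):
        state, out = splitmix64(state)
        words.append(out & 0xFFFFFFFF)   # assumption: keep the low 32 bits
    return words


def w0_from_os(n_words):
    """uint32 words from the operating system's entropy source."""
    raw = os.urandom(4 * n_words)
    return [int.from_bytes(raw[i:i + 4], "little") for i in range(0, len(raw), 4)]


prng_words = w0_from_prng(8, seed=42)
os_words = w0_from_os(8)
```

The PRNG path is reproducible from its seed, while the entropy path has to be stored (for example in a file) if a run is to be repeated exactly.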

6. Open Questions

6.1 The Data Wall

The Otto Score is data-limited. Larger datasets (ImageNet-1K with 1.2M images, or a dedicated DRAM-scale dataset with 10M+ samples) would directly translate to higher accuracy.

6.2 Depth vs Ensemble

Deep MAJ3 stacks (2+ layers) lose information. The ensemble approach (parallel MAJ3 banks) is strictly better than depth at equal bit-mass.

6.3 CIFAR Gap to Float

Otto Score at 58.7% already matches float32 shallow nets. Deep nets (ResNet-18: 93%+) remain out of reach for the MAJ3 framework.

7. Conclusion

Otto Score reaches 99.0% on MNIST (on a 100-sample evaluation subset) and 58.7% on CIFAR-10 using only &|~ + int32 at inference time. The method is data-limited: accuracy scales logarithmically with bit-mass and increases with the amount of training data per evaluation sample.

The core innovations are:

  1. Thermometer encoding: bridges the number-world/binary-world gap
  2. Per-channel encoding selection: each channel type gets its optimal binary mapping
  3. cos-time step decay: gentle error-independent step reduction prevents oscillation
  4. Multi-encoding ensemble: exp8+log8+sig8 provides complementary projections
  5. Target-err convergence: soft stopping criterion based on error-rate target
  6. Free parallelism: ensemble members run simultaneously on separate DRAM rows

Appendix A: MNIST Scoreboard (Top 20)

run-grep --sort eval,-time mnist | head -20

[2026-06-29_10-15-48] train=99.8% eval=99.0% err=30 lr=0.0500 time=39908ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 7 --epochsN 8 --step-err cos-time --encoding exp8,log8,sig8 --trainN 59900 --evalN 100
[2026-06-29_10-14-37] train=99.6% eval=98.1% err=32 lr=0.0500 time=39418ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 7 --epochsN 8 --step-err cos-time --encoding exp8,log8,sig8 --trainN 59000 --evalN 1000
[2026-06-29_09-38-14] train=100.0% eval=97.8% err=0 lr=0.0500 time=49370ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 256 --ensembleN 3 --epochsN 10 --step-err cos-time --encoding exp8,log8,sig8 --encoding exp --step-power 0.01
[2026-06-27_09-21-23] train=100.0% eval=97.7% err=1 lr=0.0500 time=31367ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 512 --ensembleN 4 --epochsN 10 --encoding exp --step-power 0.01
[2026-06-29_09-14-02] train=100.0% eval=97.7% err=23 lr=0.0500 time=36107ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 7 --epochsN 8 --step-err cos-time --encoding exp8,log8,sig8
[2026-06-29_09-18-29] train=100.0% eval=97.6% err=9 lr=0.0500 time=29919ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 256 --ensembleN 3 --epochsN 8 --step-err cos-time --encoding exp8,log8,sig8
[2026-06-27_09-00-56] train=100.0% eval=97.6% err=6 lr=0.0500 time=31496ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 256 --ensembleN 7 --epochsN 10 --encoding exp --step-power 0.1
[2026-06-27_09-12-47] train=100.0% eval=97.6% err=5 lr=0.0500 time=31515ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 256 --ensembleN 8 --epochsN 10 --encoding exp --step-power 0.1
[2026-06-27_09-20-33] train=100.0% eval=97.6% err=0 lr=0.0500 time=31714ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 256 --ensembleN 8 --epochsN 10 --encoding exp --step-power 0.01
[2026-06-27_09-18-37] train=100.0% eval=97.6% err=7 lr=0.0500 time=32455ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 16 --epochsN 10 --encoding exp --step-power 0.1
[2026-06-27_09-19-47] train=100.0% eval=97.6% err=1 lr=0.0500 time=33006ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 16 --epochsN 10 --encoding exp --step-power 0.01
[2026-06-29_09-55-24] train=100.0% eval=97.6% err=20 lr=0.0500 time=35978ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 7 --epochsN 8 --step-err cos-time --encoding exp8,log8,sig8 --random-file assets/random.bin
[2026-06-29_09-37-21] train=100.0% eval=97.6% err=1 lr=0.0500 time=39146ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 256 --ensembleN 3 --epochsN 8 --step-err cos-time --encoding exp8,log8,sig8 --encoding exp --step-power 0.01
[2026-06-29_09-52-56] train=100.0% eval=97.6% err=0 lr=0.0500 time=49415ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 256 --ensembleN 3 --epochsN 10 --step-err cos-time --encoding exp8,log8,sig8 --encoding exp --step-power 0.01 --random-file assets/random.bin
[2026-06-29_09-40-10] train=100.0% eval=97.6% err=7 lr=0.0500 time=58752ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 512 --ensembleN 3 --epochsN 8 --step-err cos-time --encoding exp8,log8,sig8
[2026-06-29_09-42-27] train=100.0% eval=97.6% err=20 lr=0.0500 time=87202ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 17 --epochsN 8 --step-err cos-time --encoding exp8,log8,sig8
[2026-06-29_09-14-51] train=100.0% eval=97.5% err=9 lr=0.0500 time=19772ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 512 --ensembleN 1 --epochsN 8 --step-err cos-time --encoding exp8,log8,sig8
[2026-06-27_11-32-28] train=100.0% eval=97.5% err=5 lr=0.0500 time=28216ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 512 --ensembleN 3 --epochsN 10 --encoding exp8 --step-power 0.009
[2026-06-27_11-33-29] train=100.0% eval=97.5% err=10 lr=0.0500 time=28331ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 512 --ensembleN 3 --epochsN 10 --encoding exp8 --step-power 0.09
[2026-06-27_09-17-36] train=100.0% eval=97.5% err=7 lr=0.0500 time=32426ms → ./mnist-1/mnist-mlp-otto-score-trn-xnor.exe --hiddenN 512 --ensembleN 4 --epochsN 10 --encoding exp --step-power 0.1

Appendix B: CIFAR-10 Scoreboard (Top 20)

run-grep --sort eval,-time cifar | head -20

[2026-06-28_16-07-07] train=83.4% eval=58.7% err=8296 lr=0.0100 time=357125ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 17 --epochsN 8 --encoding g=up8,bl=down8,bm=sig,bp=sig,b=up8,al=down8,am=sig,ap=sig,h=up8,c=log8 --target-err 0.4
[2026-06-28_15-44-06] train=84.7% eval=57.4% err=7635 lr=0.0100 time=52568ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 2 --epochsN 10 --encoding g=up8,bl=down8,bm=sig,bp=sig,b=up8,al=down8,am=sig,ap=sig,h=up8,c=log8 --target-err 0.4
[2026-06-28_15-48-15] train=85.0% eval=57.3% err=7512 lr=0.1000 time=52643ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 2 --epochsN 10 --encoding g=up8,bl=down8,bm=sig,bp=sig,b=up8,al=down8,am=sig,ap=sig,h=up8,c=log8 --target-err 0.4 --lr 0.1
[2026-06-28_11-38-51] train=89.0% eval=57.2% err=5486 lr=0.1000 time=124230ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 7 --epochsN 10 --encoding al=down8,am=sig,ap=sig,bl=down8,bm=sig,bp=sig --lr 0.1 --step-err cos-time
[2026-06-28_12-35-33] train=89.0% eval=57.2% err=5504 lr=0.1000 time=166985ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 7 --epochsN 10 --encoding g=up8,bl=down8,bm=sig,bp=sig,b=up8,al=down8,am=sig,ap=sig --lr 0.1 --step-err cos-time --batchN 64
[2026-06-28_12-31-24] train=83.6% eval=57.1% err=8198 lr=0.1000 time=150303ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 7 --epochsN 10 --encoding g=up8,bl=down8,bm=sig,bp=sig,b=up8,al=down8,am=sig,ap=sig --lr 0.1 --step-err cos-time
[2026-06-28_12-27-13] train=83.8% eval=57.0% err=8117 lr=0.0900 time=146618ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 7 --epochsN 10 --encoding g=up8,bl=down8,bm=sig,bp=sig,b=up8,al=down8,am=sig,ap=sig --lr 0.09 --step-err cos-time
[2026-06-28_11-25-43] train=88.6% eval=56.9% err=5692 lr=0.1000 time=106503ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 6 --epochsN 10 --encoding al=down8,am=sig,ap=sig,bl=down8,bm=sig,bp=sig --lr 0.1 --step-err cos-time
[2026-06-28_11-21-01] train=88.2% eval=56.7% err=5922 lr=0.1000 time=88618ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 5 --epochsN 10 --encoding al=down8,am=sig,ap=sig,bl=down8,bm=sig,bp=sig --lr 0.1 --step-err cos-time
[2026-06-27_21-40-29] train=88.8% eval=56.6% err=5586 lr=0.1000 time=192821ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 16 --epochsN 10 --encoding b=down,lum=lin8,by=sig,rg=sig --lr 0.1 --step-err cos-time
[2026-06-27_21-24-49] train=85.9% eval=56.2% err=7045 lr=0.1000 time=198089ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 16 --epochsN 10 --encoding b=down,lum=lin8,by=sig,rg=sig --lr 0.1 --step-power 5
[2026-06-28_11-24-08] train=87.8% eval=56.1% err=6120 lr=0.1000 time=70777ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 4 --epochsN 10 --encoding al=down8,am=sig,ap=sig,bl=down8,bm=sig,bp=sig --lr 0.1 --step-err cos-time
[2026-06-27_22-01-22] train=92.3% eval=55.9% err=3861 lr=0.1000 time=203612ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 8 --epochsN 20 --encoding b=down,lum=lin8,by=sig,rg=sig --lr 0.1 --step-err cos-time
[2026-06-27_20-11-55] train=85.2% eval=55.7% err=7394 lr=0.1000 time=97352ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 8 --epochsN 10 --encoding b=down,lum=lin8,by=sig,rg=sig --lr 0.1 --step-power 5
[2026-06-28_18-32-46] train=78.6% eval=55.6% err=10684 lr=0.0100 time=20424ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 1 --epochsN 8 --encoding g=up8,bl=down8,bm=sig,bp=sig,b=up8,al=down8,am=sig,ap=sig,h=up8,c=log8 --target-err 0.4 --random-file assets/2026-06-18.bin
[2026-06-27_21-46-51] train=87.8% eval=55.6% err=6091 lr=0.1000 time=97040ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 8 --epochsN 10 --encoding b=down,lum=lin8,by=sig,rg=sig --lr 0.1 --step-err cos-time
[2026-06-27_21-21-40] train=87.4% eval=55.5% err=6294 lr=0.1000 time=127119ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 8 --epochsN 10 --batchN 32 --encoding b=down,lum=lin8,by=sig,rg=sig --lr 0.1 --step-power 5
[2026-06-28_11-34-09] train=90.5% eval=55.4% err=4753 lr=0.1000 time=70632ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 256 --ensembleN 2 --epochsN 10 --encoding al=down8,am=sig,ap=sig,bl=down8,bm=sig,bp=sig --lr 0.1 --step-err cos-time
[2026-06-27_18-48-11] train=85.0% eval=55.4% err=7513 lr=0.1000 time=83512ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 7 --epochsN 10 --encoding b=down,lum=lin8,by=sig,rg=sig --lr 0.1 --step-power 5
[2026-06-27_18-29-32] train=85.0% eval=55.4% err=7513 lr=0.1000 time=84175ms → ./cifar-1/cifar-mlp-otto-score-trn-xnor.exe --hiddenN 128 --ensembleN 7 --epochsN 10 --encoding b=down,lum=lin8,by=sig,rg=sig --lr 0.1 --step-power 5