The Algorithm
--epochsN N : number of correction epochs (default 1 = single pass)
--lr STEP : log-odds step size (default 0.05 = 5000/100000)
For each epoch:
1. Classify ALL training samples (same forward pass)
2. For each MISCLASSIFIED sample:
   true_pred: logit + step (strengthen the correct class)
   false_pred: logit - step (weaken the wrong class)
3. Early stopping: keep the weights with the best eval score
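The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code. It assumes the model is a per-class table of integer log-odds indexed by feature, that a sample is a list of active feature indices, and that "logit" in step 2 means the table entries for the sample's active features. The names `refine`, `predict`, `SCALE`, and `STEP` are hypothetical.

```python
from copy import deepcopy

SCALE = 100_000  # assumed fixed-point scale, read off "5000/100000"
STEP = 5_000     # 0.05 in log-odds units at that scale


def predict(weights, bias, feats):
    """Return the class with the highest integer score (ties go to the lowest index)."""
    scores = [bias[c] + sum(weights[c][f] for f in feats) for c in range(len(bias))]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(weights, bias, data):
    """Fraction of (features, label) pairs classified correctly."""
    return sum(predict(weights, bias, x) == y for x, y in data) / len(data)


def refine(weights, bias, train, evalset, epochs=1, step=STEP):
    """Mistake-driven correction of integer log-odds; returns the best snapshot."""
    best_acc = accuracy(weights, bias, evalset)
    best = (deepcopy(weights), list(bias))
    for _ in range(epochs):
        # 1. Classify all training samples before touching the table.
        preds = [predict(weights, bias, x) for x, _ in train]
        # 2. Correct only the misclassified samples.
        for (x, y), p in zip(train, preds):
            if p == y:
                continue
            for f in x:
                weights[y][f] += step  # strengthen the correct class
                weights[p][f] -= step  # weaken the wrongly predicted class
        # 3. Early stopping: remember the table with the best eval accuracy.
        acc = accuracy(weights, bias, evalset)
        if acc > best_acc:
            best_acc, best = acc, (deepcopy(weights), list(bias))
    return best[0], best[1], best_acc


if __name__ == "__main__":
    # Toy data: 2 classes, 3 features; train and eval are the same set here.
    data = [([0], 0), ([1], 1), ([0, 2], 0), ([1, 2], 1)]
    w, b, acc = refine([[0, 0, 0], [0, 0, 0]], [0, 0], data, data, epochs=3)
    print(acc)  # prints 1.0
```

Every update adds or subtracts an integer step, so the table stays in integer arithmetic throughout. A real implementation would also need to guard against int32 overflow, which Python's unbounded integers hide.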
Why it works
The Bayes log-score maximizes log-likelihood under the (approximately correct) independence assumption. The 0-1 loss is a different objective: a model can fit the likelihood well and still leave samples on the wrong side of the decision boundary. Iterative correction in log-odds space targets the 0-1 loss directly, since only misclassified samples trigger an update, and it never leaves int32 arithmetic. With enough epochs this surpasses AdamW, because every correction is aimed at the 0-1 classification boundary.
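To make the gap between the two objectives concrete, here is a small made-up example (the probabilities are illustrative, not measured results). Each list holds the probability a hypothetical binary model assigns to the true class of three samples; a sample counts as correct when that probability exceeds 0.5.

```python
import math


def log_likelihood(p_true):
    """Sum of log-probabilities assigned to the true class."""
    return sum(math.log(p) for p in p_true)


def zero_one_accuracy(p_true):
    """Fraction of samples whose true class gets probability above 0.5."""
    return sum(p > 0.5 for p in p_true) / len(p_true)


# Model A is very confident on two samples but wrong on the third.
model_a = [0.99, 0.99, 0.45]
# Model B is barely confident, but right on all three.
model_b = [0.55, 0.55, 0.55]

for name, probs in (("A", model_a), ("B", model_b)):
    print(name, round(log_likelihood(probs), 3), zero_one_accuracy(probs))
```

Model A has the higher log-likelihood but the lower accuracy, so the two objectives rank the models in opposite order. That gap is what a correction step aimed only at misclassified samples is meant to close.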