Backpropagation has been the dominant training algorithm since the deep learning resurgence of 2012. It works by computing gradients (derivatives) through the entire network, backwards from the output to the input, and adjusting each weight in proportion to its gradient.
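To make that concrete, here is a minimal sketch of backpropagation on a tiny two-layer network with a sigmoid hidden layer and squared-error loss. All names here (W1, W2, lr, the layer sizes) are illustrative choices, not anything from the text above.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # 4 samples, 3 input features
y = rng.normal(size=(4, 1))          # regression targets
W1 = rng.normal(size=(3, 5)) * 0.1   # input -> hidden weights
W2 = rng.normal(size=(5, 1)) * 0.1   # hidden -> output weights
lr = 0.1

for step in range(100):
    # Forward pass: these activations must be kept around for the backward pass.
    h = 1.0 / (1.0 + np.exp(-x @ W1))   # sigmoid hidden layer
    y_hat = h @ W2                      # linear output layer
    loss = ((y_hat - y) ** 2).mean()

    # Backward pass: gradients flow from the output back toward the input.
    d_yhat = 2 * (y_hat - y) / y.shape[0]   # dLoss/dy_hat
    dW2 = h.T @ d_yhat                      # dLoss/dW2
    d_h = d_yhat @ W2.T                     # chain rule into the hidden layer
    dW1 = x.T @ (d_h * h * (1 - h))         # sigmoid derivative applied locally

    # Each weight moves in proportion to its gradient.
    W2 -= lr * dW2
    W1 -= lr * dW1
```

Note that the backward pass reuses `h`, the hidden activation from the forward pass; this is the activation storage the last limitation below refers to.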
Core limitations:
- Requires continuous, differentiable functions — excludes boolean/binary logic
- Training and inference use different computational paths, so training hardware must carry machinery the forward pass never needs
- Biologically implausible — no known brain mechanism performs global backprop
- Massive memory overhead from storing every layer's activations for the backward pass (see the sketch after this list)
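The activation-memory point can be illustrated with a rough back-of-the-envelope sketch, assuming a plain MLP with float32 activations; the batch size and layer widths below are made up for the example.

```python
batch_size = 256
layer_widths = [1024] * 48   # 48 hidden layers, 1024 units each (illustrative)
bytes_per_value = 4          # float32

# Inference can discard each layer's activations as soon as the next layer
# is computed, so it only ever holds roughly one layer at a time.
inference_bytes = batch_size * max(layer_widths) * bytes_per_value

# Backprop must keep every layer's activations until the backward pass
# reaches that layer, so memory grows with network depth.
training_bytes = batch_size * sum(layer_widths) * bytes_per_value

print(f"inference activations: ~{inference_bytes / 1e6:.0f} MB")
print(f"training activations:  ~{training_bytes / 1e6:.0f} MB")
```

Under these assumptions the training-time activation footprint is roughly the depth of the network times the inference-time footprint, which is why techniques like gradient checkpointing exist.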