In Depth
Batch normalization (BatchNorm) is a technique introduced in 2015 by Ioffe and Szegedy that normalizes the values flowing between layers of a neural network. For each mini-batch of training data, it adjusts the values to have zero mean and unit variance, then applies learnable scaling and shifting parameters. This stabilizes the distribution of inputs that each layer receives.
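The per-mini-batch computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the function name `batch_norm`, the 2-D `(batch, features)` layout, and the small epsilon for numerical stability are assumptions for the sketch.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x: mini-batch of activations, shape (batch_size, num_features)
    mean = x.mean(axis=0)                     # per-feature mean over the batch
    var = x.var(axis=0)                       # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)   # normalize to zero mean, unit variance
    return gamma * x_hat + beta               # learnable scale (gamma) and shift (beta)

# Example: a batch of 64 samples with 4 features, deliberately off-center
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 4))
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
```

With `gamma = 1` and `beta = 0`, each feature of `y` has approximately zero mean and unit variance regardless of the statistics of `x`; during training the network is free to learn other values of `gamma` and `beta`.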
Without batch normalization, the distribution of inputs to deeper layers shifts as earlier layers update their weights during training, a problem called internal covariate shift. This makes training slower and more sensitive to initialization and learning rate choices. Batch normalization addresses this by ensuring consistent input distributions, allowing higher learning rates and faster convergence.
Batch normalization has become a standard component in many neural network architectures, particularly convolutional neural networks for computer vision. However, it behaves differently during training and inference: batch statistics aren't available at inference time, so running averages of the mean and variance accumulated during training are used instead. Alternatives like layer normalization and group normalization, which don't depend on batch statistics, are often preferred in transformer architectures and other settings.
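The train/inference distinction can be made concrete with a small sketch that tracks running statistics, in the style of common deep learning frameworks. The class name `BatchNorm`, the `training` flag, and the `momentum=0.1` update rule are assumptions for illustration, loosely mirroring typical framework defaults.

```python
import numpy as np

class BatchNorm:
    """Sketch of BatchNorm's two modes: batch stats in training,
    running averages at inference."""

    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        self.gamma = np.ones(num_features)    # learnable scale
        self.beta = np.zeros(num_features)    # learnable shift
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, training):
        if training:
            # Use this mini-batch's statistics...
            mean, var = x.mean(axis=0), x.var(axis=0)
            # ...and fold them into the running averages for later inference.
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # Inference: no batch statistics, use the accumulated averages.
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta

bn = BatchNorm(3)
rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=(128, 3))
y_train = bn(x, training=True)    # normalized with this batch's stats
y_eval = bn(x, training=False)    # normalized with running averages
```

Because the running averages lag the true data statistics early in training, `y_train` and `y_eval` differ for the same input, which is exactly the training-versus-inference discrepancy noted above.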