In Depth
Pruning reduces the size and computational cost of neural networks by identifying and removing parameters that contribute least to the model's performance. Just as pruning a tree removes unnecessary branches to promote healthy growth, neural network pruning removes redundant weights, neurons, or entire layers to create a more efficient model.
There are two main approaches: unstructured pruning removes individual weights (setting them to zero), while structured pruning removes entire neurons, channels, or attention heads. Structured pruning is generally more practical because it produces architectures that run faster on standard hardware, while unstructured pruning requires specialized sparse computation support to realize speedups.
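As a concrete illustration, here is a minimal NumPy sketch of the two approaches applied to a single weight matrix (the matrix values, the 50% sparsity target, and the choice of L2 norm as the neuron-importance score are illustrative assumptions, not a prescribed recipe):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))  # one layer's weights: 4 neurons x 6 inputs

# Unstructured pruning: zero out the 50% smallest-magnitude weights.
threshold = np.quantile(np.abs(W), 0.5)
W_unstructured = np.where(np.abs(W) >= threshold, W, 0.0)

# Structured pruning: drop the 2 neurons (rows) with the smallest L2 norm,
# leaving a genuinely smaller dense matrix.
norms = np.linalg.norm(W, axis=1)
keep = np.sort(np.argsort(norms)[2:])  # indices of the 2 strongest neurons
W_structured = W[keep]

print((W_unstructured == 0).mean())  # fraction of zeroed weights, ~0.5
print(W_structured.shape)            # (2, 6): a smaller dense layer
```

Note that `W_unstructured` keeps its original shape and only becomes faster with sparse-aware kernels, while `W_structured` is a smaller dense matrix that any standard matrix-multiply routine handles directly; this is the practical advantage of structured pruning mentioned above.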
Pruning is typically applied after training (post-training pruning) or iteratively during training (gradual pruning). Research has shown that large models can often be pruned by 50-90% with minimal accuracy loss, supporting the 'lottery ticket hypothesis': dense networks contain sparse subnetworks that, when trained in isolation from their original initialization, can match the full network's accuracy. Pruning is therefore essential for deploying large models on resource-constrained devices such as phones and edge hardware.
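Gradual pruning is often driven by a schedule that ramps the target sparsity up over training, pruning aggressively early and tapering off near the end. A common choice is a cubic schedule; the sketch below shows the idea (the specific constants, such as the 90% final sparsity, are illustrative assumptions):

```python
def sparsity_at_step(step, total_steps, final_sparsity=0.9, initial_sparsity=0.0):
    """Cubic sparsity schedule for gradual pruning: the target sparsity
    rises quickly at first, then flattens as training approaches its end."""
    frac = min(step / total_steps, 1.0)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - frac) ** 3

# At each scheduled step, the magnitude threshold is recomputed so that
# exactly sparsity_at_step(step, total_steps) of the weights are zeroed.
print(sparsity_at_step(0, 1000))     # start of training: no pruning
print(sparsity_at_step(500, 1000))   # midway: most of the pruning already done
print(sparsity_at_step(1000, 1000))  # end of training: full 90% sparsity
```

Interleaving pruning with training this way gives the remaining weights time to adapt after each removal, which is why gradual pruning usually reaches higher sparsity at a given accuracy than a single post-training pass.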