In Depth
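A large teacher model generates outputs for many examples, and a smaller student model is then trained to reproduce those outputs. The resulting student can be 10-100x cheaper to run while retaining roughly 80-95% of the teacher's performance, though the exact trade-off depends on the task and on how much smaller the student is. This makes distillation a common route to deploying AI models in production.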
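To make "learns to produce similar outputs" concrete, here is a minimal sketch of one common formulation, soft-label distillation, where the student is penalized by the KL divergence between the teacher's and its own temperature-softened output distributions. The text above does not say which variant is meant, so treat this as an assumption. The function names, the example logits, and the temperature of 2.0 are all illustrative, and a real training loop would compute this loss in a framework with automatic differentiation rather than in plain Python.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The temperature**2 factor is the conventional rescaling that keeps
    gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student's current predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits for a single 3-class example.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 4.0, 1.0]    # prefers a different class

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
print(loss_close < loss_far)
```

The loss is zero when the student matches the teacher exactly and grows as the two distributions diverge, so minimizing it over many examples pulls the student's outputs toward the teacher's.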