In Depth

Quantization stores model weights as low-precision integers (commonly 8-bit or 4-bit) instead of 16- or 32-bit floats, which can shrink a model by roughly 4-8x and lets large models run on consumer hardware or cheaper cloud instances. Popular formats include GGUF (the llama.cpp file format) and GPTQ. Most open-source model deployments use some form of quantization.
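
To make the size savings concrete, here is a minimal sketch of symmetric per-tensor int8 quantization using NumPy. It is an illustration of the general idea, not the GGUF or GPTQ pipeline; the function names and the toy 4096x4096 weight matrix are assumptions for the example.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization: map float32 weights to int8 plus one scale."""
    scale = np.max(np.abs(weights)) / 127.0              # one scale for the whole tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float32 weights."""
    return q.astype(np.float32) * scale

# A toy "layer": 4096 x 4096 float32 weights (~64 MB).
w = np.random.randn(4096, 4096).astype(np.float32)
q, scale = quantize_int8(w)

print(f"float32: {w.nbytes / 1e6:.0f} MB, int8: {q.nbytes / 1e6:.0f} MB")   # 4x smaller
print(f"max abs error: {np.max(np.abs(w - dequantize_int8(q, scale))):.4f}")
```

Going from float32 to int8 gives the 4x reduction shown here; 4-bit schemes like those used by GGUF and GPTQ push that toward 8x, at the cost of coarser rounding of each weight.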