In Depth
FLOPS (floating-point operations per second) is the standard metric for the computational throughput of processors, while a total count of floating-point operations (written FLOPs, or FLOP count) measures the computational requirements of AI workloads. In AI contexts you encounter TFLOPS (trillion), PFLOPS (quadrillion), and EFLOPS (quintillion) per second to describe the scale of modern hardware, and correspondingly large FLOP counts to describe training runs.
AI hardware specifications typically list peak FLOPS at different numerical precisions. For example, NVIDIA's H100 GPU achieves approximately 990 TFLOPS at FP16 (half precision) and nearly 2,000 TFLOPS with sparsity enabled. Training large language models requires enormous total FLOP budgets: GPT-4 is estimated to have required approximately 2x10^25 FLOPs in total, a run that occupied thousands of GPUs for months.
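The relationship between a total FLOP budget and hardware throughput can be sketched with a back-of-envelope calculation. The cluster size and utilization figure below are illustrative assumptions, not reported values; real training runs rarely sustain peak FLOPS, so a model-FLOPs-utilization (MFU) factor is applied:

```python
# Back-of-envelope: how long might a 2e25-FLOP training run take?
# The GPU count and utilization are illustrative assumptions.
total_flops = 2e25            # estimated total training compute (FLOPs)
peak_flops_per_gpu = 990e12   # ~990 TFLOPS (H100, FP16, dense)
utilization = 0.4             # assumed MFU; real runs often land at 30-50%
num_gpus = 10_000             # assumed cluster size

effective_rate = peak_flops_per_gpu * utilization * num_gpus  # FLOPs/sec
seconds = total_flops / effective_rate
days = seconds / 86_400
print(f"~{days:.0f} days")    # roughly two months under these assumptions
```

Varying the utilization factor shows why sustained throughput, not the datasheet peak, dominates real-world training schedules.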
FLOPS metrics are essential for capacity planning, cost estimation, and understanding scaling laws. Researchers use FLOP counts to compare models trained on different hardware and for different durations on a common scale. The Chinchilla scaling laws, for example, describe optimal model size and training data as a function of total compute budget measured in FLOPs. Understanding FLOPS helps businesses estimate training costs, compare hardware options, and plan infrastructure investments.
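The common approximation behind such compute budgets is C ≈ 6ND, where N is the parameter count and D the number of training tokens (the factor of 6 accounts for the forward and backward passes). A minimal sketch, using the published Chinchilla configuration as the example:

```python
# Standard approximation for transformer training compute: C ~= 6 * N * D.
def training_flops(n_params: float, n_tokens: float) -> float:
    """Estimate total training FLOPs from model size and token count."""
    return 6 * n_params * n_tokens

# Chinchilla's reported configuration: 70B parameters, 1.4T tokens
# (roughly 20 tokens per parameter, the compute-optimal ratio it found).
c = training_flops(70e9, 1.4e12)
print(f"{c:.2e} FLOPs")  # 5.88e+23 FLOPs
```

Holding C fixed and trading N against D is exactly the exercise the Chinchilla analysis performs when recommending smaller models trained on more tokens.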