In Depth
AI compute refers to the computational power needed to train AI models and to serve them in production (inference). The scale of compute used in AI has grown exponentially: training compute for frontier models has increased by roughly 4x per year, and the largest training runs now consume thousands of GPUs for months at costs exceeding $100 million.
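A back-of-envelope estimate shows how quickly these numbers compound. The sketch below multiplies cluster size, run duration, and a per-GPU-hour price to approximate a training run's cost; all of the specific figures are illustrative assumptions, not reported numbers from any particular model.

```python
# Back-of-envelope estimate of a frontier-scale training run's compute cost.
# Every figure below is an illustrative assumption, not a measured value.

num_gpus = 10_000        # assumed GPU count for the training cluster
training_days = 120      # assumed duration of the run
gpu_hour_cost = 4.00     # assumed blended cost per GPU-hour, in USD

gpu_hours = num_gpus * 24 * training_days
total_cost = gpu_hours * gpu_hour_cost

print(f"GPU-hours: {gpu_hours:,}")            # 28,800,000 GPU-hours
print(f"Estimated cost: ${total_cost:,.0f}")  # $115,200,000
```

Even with modest assumptions, the total lands in the nine-figure range, which is why compute strategy has become a board-level concern.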
The AI compute landscape involves several layers: hardware manufacturers (NVIDIA, AMD, Intel, and custom silicon from Google, Amazon, and Microsoft), cloud providers offering GPU instances (AWS, Azure, GCP, CoreWeave, Lambda), and the organizations consuming compute for model development and deployment. The supply-demand imbalance for AI compute, particularly NVIDIA GPUs, has made it a strategic resource, often compared to oil for its economic and geopolitical importance.
For businesses, AI compute strategy (whether to build on-premises infrastructure, rent cloud GPU capacity, or consume AI via APIs) is a critical decision affecting costs, capabilities, and competitive positioning. The energy consumption of AI compute has also become a significant concern, with major AI data centers drawing hundreds of megawatts and driving new investment in energy infrastructure. Using compute efficiently has therefore become increasingly important, through techniques such as quantization (storing weights at lower numerical precision), distillation (training smaller models to mimic larger ones), and speculative decoding (using a small draft model to accelerate generation by a larger one).
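To make one of these techniques concrete, the sketch below shows symmetric per-tensor int8 weight quantization in NumPy. It is a minimal illustration of the idea, not the implementation used by any particular framework, and the example weight matrix is arbitrary.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map float weights to [-127, 127]."""
    scale = np.abs(weights).max() / 127.0  # one scale factor for the whole tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from int8 values and the scale."""
    return q.astype(np.float32) * scale

# Illustrative example: quantize a small random weight matrix.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)

print("max abs error:", np.abs(w - w_hat).max())                         # small rounding error
print("memory: float32", w.nbytes, "bytes -> int8", q.nbytes, "bytes")   # 4x smaller
```

The trade-off is visible directly in the output: the quantized weights take a quarter of the memory, at the cost of a small rounding error, which is why quantization is a common first step for reducing inference compute.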