What It Is

AI chips are semiconductor processors optimized for the matrix multiplications, tensor operations, and parallel computations that dominate machine learning and deep learning workloads. While general-purpose CPUs can run AI models, specialized chips deliver 10x to 1000x better performance per watt, making them essential for both training frontier models and running inference at scale.
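To make that concrete: a matrix multiply of an (M, K) matrix by a (K, N) matrix costs 2·M·K·N floating-point operations, and every output element can be computed independently, which is exactly the parallelism specialized chips exploit. A minimal sketch in plain Python (the layer sizes are illustrative, not tied to any particular model):

```python
# FLOP count for C = A @ B, where A is (M, K) and B is (K, N).
# Each of the M*N output elements needs K multiplies and K adds,
# and all M*N elements are independent -- ideal for parallel hardware.
def matmul_flops(m: int, k: int, n: int) -> int:
    return 2 * m * k * n

# One projection in a large transformer layer (illustrative sizes):
flops = matmul_flops(4096, 8192, 28672)
print(f"{flops / 1e12:.1f} TFLOPs for a single projection")  # ~1.9 TFLOPs
```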

The AI chip market exceeded $50 billion in 2025 and is the single most important bottleneck in AI development. NVIDIA controls roughly 80% of the training chip market. Access to compute — measured in GPU-hours — determines which organizations can train large language models, and geopolitical competition over chip manufacturing has become a national security issue.

Chip Categories

GPUs (Graphics Processing Units) — originally designed for rendering graphics, GPUs excel at parallel computation. NVIDIA's data center GPUs — A100, H100, H200, and the Blackwell B200 — are the workhorses of AI training. Each H100 delivers up to 3,958 teraflops of FP8 performance (with sparsity). AMD's MI300X competes on memory capacity with 192GB of HBM3. GPUs run CUDA (NVIDIA) or ROCm (AMD) software stacks.
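In practice, most developers reach the CUDA stack through a framework rather than writing kernels directly. A minimal sketch using PyTorch (assumes a CUDA build of PyTorch and at least one NVIDIA GPU; the printed name is an example, not a guarantee):

```python
import torch

# Query the CUDA device PyTorch sees; on an H100 this reports ~80 GB.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(props.name)                                # e.g. "NVIDIA H100 80GB HBM3"
    print(f"{props.total_memory / 2**30:.0f} GiB memory")
    print(f"{props.multi_processor_count} streaming multiprocessors")

    # Matrix multiplies dispatch to the GPU through the CUDA stack.
    a = torch.randn(8192, 8192, device="cuda", dtype=torch.float16)
    b = torch.randn(8192, 8192, device="cuda", dtype=torch.float16)
    c = a @ b  # executed on the GPU via cuBLAS
```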

TPUs (Tensor Processing Units) — Google's custom chips designed specifically for neural network operations. TPUs power Google's internal AI workloads and are available through Google Cloud. The TPU v5p delivers strong performance on transformer training and inference. TPUs use Google's XLA compiler rather than CUDA.
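The XLA path looks different from CUDA in day-to-day use: in JAX, for instance, jax.jit traces a Python function and hands it to XLA, which compiles it for whatever backend is attached. A minimal sketch, assuming JAX is installed (on a Cloud TPU VM, jax.devices() would list TPU cores; elsewhere it falls back to CPU or GPU):

```python
import jax
import jax.numpy as jnp

# jax.jit traces the function once and compiles it with XLA for the
# available backend -- a TPU if one is attached, otherwise CPU/GPU.
@jax.jit
def layer(x, w):
    return jax.nn.relu(x @ w)

x = jnp.ones((1024, 4096))
w = jnp.ones((4096, 4096))
print(jax.devices())      # e.g. [TpuDevice(...)] on a Cloud TPU VM
print(layer(x, w).shape)  # (1024, 4096), computed by the XLA-compiled kernel
```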

Custom ASICs — application-specific integrated circuits designed for particular AI workloads. Amazon's Trainium and Inferentia chips target training and inference respectively. Microsoft is developing Maia for Azure AI. Meta has built MTIA for recommendation workloads. Intel's Gaudi (acquired from Habana Labs) offers an alternative for training.

Edge AI chips — designed for inference on devices with power constraints. Qualcomm's AI Engine runs models on smartphones. Apple's Neural Engine processes on-device ML. Google's Edge TPU, NVIDIA Jetson, and Intel Movidius serve IoT and embedded applications. See edge computing.

Neuromorphic chips — inspired by biological neural circuits, these processors use event-driven computation. Intel's Loihi implements spiking neural networks, while IBM's NorthPole applies a related brain-inspired, near-memory design to inference. They promise extreme energy efficiency for specific workloads but remain largely experimental.

Architecture Fundamentals

AI chip performance depends on three factors: compute throughput, memory bandwidth, and interconnect speed.

Compute — measured in TOPS (trillions of operations per second) or FLOPS. Modern AI chips perform operations in reduced precision (FP8, INT8, FP4) rather than FP32, trading minor accuracy loss for massive throughput gains. NVIDIA's Blackwell B200 delivers 9 petaflops of FP4 performance.
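The throughput gain scales roughly with the shrink in each value: halving the bit width roughly doubles peak operations per second and halves memory traffic per parameter, hardware support permitting. A back-of-the-envelope comparison in plain Python (the "x vs FP32" figures are idealized scaling, not measured speedups):

```python
# Bits per value at common AI precisions. Each halving of width roughly
# doubles peak throughput and halves memory traffic per value.
PRECISIONS = {"FP32": 32, "FP16": 16, "FP8": 8, "INT8": 8, "FP4": 4}

for fmt, bits in PRECISIONS.items():
    speedup = 32 / bits
    print(f"{fmt}: {bits:2d} bits/value, ~{speedup:.0f}x peak throughput vs FP32")
```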

Memory — AI models are memory-hungry. A 70-billion parameter model in FP16 requires 140GB just to store weights. High Bandwidth Memory (HBM3 and HBM3e in current chips) provides the bandwidth to feed compute units fast enough. The H100 has 80GB of HBM3 at 3.35 TB/s bandwidth. Memory capacity and bandwidth, not raw compute, are often the true bottleneck.
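Both limits are easy to check on paper: weight storage is parameters times bytes per parameter, and batch-1 inference must stream every weight from memory once per generated token, so bandwidth alone caps the token rate. A sketch using the figures above (the single-weight-copy framing is a simplification that ignores KV-cache traffic and batching):

```python
params = 70e9          # 70B-parameter model
bytes_per_param = 2    # FP16
bandwidth = 3.35e12    # H100 HBM3, bytes/second

weights = params * bytes_per_param
print(f"weights: {weights / 1e9:.0f} GB")  # 140 GB -> more than one 80GB H100

# Batch-1 inference reads every weight once per generated token,
# so memory bandwidth alone caps the token rate:
print(f"<= {bandwidth / weights:.0f} tokens/s per weight copy")  # ~24 tokens/s
```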

Interconnect — training large models requires distributing work across thousands of chips. NVIDIA's NVLink connects GPUs within a server at 900 GB/s. NVSwitch scales to 256 GPUs in a single NVLink domain. InfiniBand and custom fabrics connect servers across a data center.
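Interconnect requirements can be estimated the same way: data-parallel training all-reduces the gradients every step, and a ring all-reduce moves roughly 2·(n-1)/n times the gradient buffer through each link. A rough sketch, assuming FP16 gradients, an 8-GPU server, and the NVLink figure above (real systems overlap this with compute, so this is a ceiling on sync cost, not a measured number):

```python
params = 70e9         # gradients for a 70B-parameter model
bytes_per_grad = 2    # FP16
n_gpus = 8            # one NVLink-connected server (assumed)
link_bw = 900e9       # NVLink bandwidth per GPU, bytes/s

grad_bytes = params * bytes_per_grad
# Ring all-reduce: each GPU sends/receives ~2*(n-1)/n of the buffer.
traffic = 2 * (n_gpus - 1) / n_gpus * grad_bytes
print(f"~{traffic / link_bw * 1e3:.0f} ms per gradient sync")  # ~272 ms
```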

The Supply Chain

AI chip manufacturing is the most concentrated supply chain in the world. TSMC in Taiwan fabricates roughly 90% of advanced AI chips, including NVIDIA, AMD, Apple, and Qualcomm designs. This concentration creates geopolitical risk — U.S. export controls restrict advanced chip sales to China, and the CHIPS Act allocates $52 billion to build domestic fabrication capacity.

A single fabrication plant (fab) costs $15-20 billion and takes 3-4 years to build. Samsung and Intel also manufacture advanced chips, but TSMC's process technology lead (N3, N2 nodes) keeps most AI chip designers as customers.

Cost and Access

An NVIDIA H100 GPU costs approximately $25,000-$35,000. Building a training cluster for frontier models requires 10,000 to 100,000 GPUs — a capital investment of $500 million to $4 billion before power, cooling, and networking costs. Cloud GPU pricing runs $2-$4 per H100-hour.
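The capital figures follow directly from the unit prices. A quick sanity check on the ranges above, using the midpoints quoted (the 1,000,000 GPU-hour training run in the last line is a hypothetical, chosen only to illustrate the cloud arithmetic):

```python
gpu_price = 30_000  # midpoint of the quoted H100 price range, USD

for n_gpus in (10_000, 100_000):
    print(f"{n_gpus:,} GPUs: ${n_gpus * gpu_price / 1e9:.1f}B in GPUs alone")
# 10,000 -> $0.3B; 100,000 -> $3.0B, before power, cooling, and networking

# Renting instead: a hypothetical 1,000,000 H100-hour run at $3/hour.
print(f"cloud run: ${1e6 * 3 / 1e6:.0f}M")  # $3M
```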

This cost structure concentrates frontier AI development in a handful of organizations: OpenAI, Google, Meta, Anthropic, xAI, and a few others. Startups access compute through cloud providers or GPU-as-a-service companies like CoreWeave, Lambda, and Together AI.

Challenges

  • Supply constraints — demand for AI chips chronically exceeds supply. Lead times of 6-12 months for flagship GPUs are common, and hyperscalers place orders years in advance.
  • Power consumption — a single H100 draws 700W. A 100,000-GPU cluster consumes over 100 megawatts, equivalent to a small city (see the sketch after this list). Power availability is becoming the limiting factor for new data centers.
  • Software lock-in — NVIDIA's CUDA ecosystem has a 15-year head start. Switching to AMD, Intel, or custom chips requires rewriting or adapting software, creating enormous switching costs.
  • Export controls — U.S. restrictions on selling advanced chips to China have fragmented the global AI hardware market and motivated Chinese firms to develop domestic alternatives.
  • Moore's Law slowdown — transistor scaling is slowing. Each new process node delivers smaller improvements at higher cost, putting pressure on architectural innovation to sustain performance gains.
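The power figure in the consumption bullet above is straightforward multiplication plus overhead. The sketch below uses two assumed factors that are not quoted in this article: a server overhead multiplier (CPUs, NICs, and fans push an 8-GPU DGX-class server to roughly 10 kW, about 1,260 W per GPU all-in) and a data-center PUE of 1.2:

```python
gpu_watts = 700        # H100 SXM TDP
server_factor = 1.8    # assumed: host CPUs, NICs, fans (~1,260 W per GPU all-in)
pue = 1.2              # assumed power-usage-effectiveness (cooling, power delivery)

n_gpus = 100_000
megawatts = gpu_watts * server_factor * pue * n_gpus / 1e6
print(f"~{megawatts:.0f} MW")  # ~151 MW -- well over the 100 MW quoted above
```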