In Depth

Tensor Processing Units (TPUs) are application-specific integrated circuits (ASICs) designed by Google specifically for neural network computation. First deployed in 2015, TPUs are optimized for the matrix multiplications and convolutions that dominate deep learning workloads, offering higher performance per watt than general-purpose GPUs on those operations.

TPUs are available in multiple generations (v1 through v5 and beyond) with increasing capabilities. They can be connected into large pods of thousands of chips for distributed training of the largest models. Google used TPU pods to train PaLM, Gemini, and many other landmark models. TPUs are available to external users through Google Cloud.

For businesses, TPUs offer a compelling alternative to NVIDIA GPUs, especially for organizations already in the Google Cloud ecosystem. TPUs can be more cost-effective for training large transformer models, and Google's JAX framework, which compiles programs through the XLA compiler, is well suited to TPU execution. However, the TPU ecosystem is narrower than NVIDIA's CUDA ecosystem, and not all software frameworks support TPUs equally well.
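To make the JAX point concrete, here is a minimal sketch (with arbitrary example shapes) of the kind of jit-compiled matrix multiply that maps naturally onto TPU hardware. The same code runs unchanged on CPU, GPU, or TPU; JAX dispatches to whatever accelerator is available, so on a Cloud TPU VM this would execute on TPU cores. It assumes only that the `jax` package is installed.

```python
import jax
import jax.numpy as jnp

# jax.jit traces the function once and compiles it with XLA for the
# available backend (TPU, GPU, or CPU).
@jax.jit
def matmul(a, b):
    return jnp.dot(a, b)

# Arbitrary example shapes, chosen for illustration only.
a = jnp.ones((128, 256))
b = jnp.ones((256, 64))

out = matmul(a, b)
print(out.shape)                   # (128, 64)
print(jax.devices()[0].platform)   # "tpu" on a TPU VM; "cpu" or "gpu" elsewhere
```

Because the computation is expressed as a pure function and compiled ahead of execution, XLA can fuse operations and lay out data for the TPU's matrix units without any TPU-specific code from the user.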