In Depth
TOPS (Tera Operations Per Second) is a performance metric primarily used for AI inference accelerators and edge AI chips. Whereas FLOPS counts floating-point operations per second (the figure usually quoted for training hardware), TOPS typically counts integer operations at reduced precision (INT8 or INT4), which dominate optimized inference workloads where quantized models trade numerical precision for efficiency.
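The relationship between hardware specs and a TOPS rating, and what INT8 quantization looks like, can be sketched as follows. This is a simplified illustration with made-up parameter values, not any vendor's published specs; `peak_tops` and `quantize_int8` are hypothetical helpers for this example.

```python
def peak_tops(num_macs: int, clock_ghz: float) -> float:
    """Theoretical peak TOPS: each MAC unit performs 2 operations
    (one multiply + one add) per clock cycle."""
    return num_macs * 2 * clock_ghz * 1e9 / 1e12

def quantize_int8(x: float, scale: float) -> int:
    """Symmetric INT8 quantization: map a float onto the
    integer range [-128, 127] using a per-tensor scale."""
    q = round(x / scale)
    return max(-128, min(127, q))  # clamp to INT8 range

# Illustrative NPU: 16,384 MAC units at 1.25 GHz -> ~41 TOPS
print(peak_tops(16384, 1.25))          # 40.96
print(quantize_int8(0.5, scale=0.01))  # 50
print(quantize_int8(10.0, scale=0.01)) # 127 (saturates)
```

Note that doubling a vendor's INT8 TOPS figure by quoting INT4 throughput is common in marketing, so the precision behind a rating matters as much as the number.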
Consumer and edge AI chips are frequently marketed using TOPS ratings. Apple's Neural Engine, Qualcomm's AI Engine, Intel's NPU, and NVIDIA's Jetson series all advertise TOPS as a headline specification. Microsoft's Copilot+ PC standard requires a minimum of 40 TOPS from the device's neural processing unit.
However, TOPS alone can be misleading because real-world performance depends on memory bandwidth, software optimization, supported model architectures, and data movement efficiency. Many inference workloads are memory-bound rather than compute-bound, so a chip with higher TOPS may run specific models slower than one with lower TOPS but better memory bandwidth or a more mature software stack. Businesses should evaluate TOPS alongside benchmarks on representative workloads rather than comparing raw numbers alone.
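The interaction between peak TOPS and memory bandwidth can be illustrated with a simple roofline-style bound. The chip specs and the workload's arithmetic intensity below are invented for illustration; `attainable_tops` is a hypothetical helper, not a real benchmarking API.

```python
def attainable_tops(peak_tops: float, bandwidth_gbs: float,
                    ops_per_byte: float) -> float:
    """Roofline bound: achieved throughput is capped either by peak
    compute (TOPS) or by how fast memory can feed the compute units.
    GB/s * ops/byte gives GOPS; divide by 1000 to convert to TOPS."""
    memory_bound = bandwidth_gbs * ops_per_byte / 1000
    return min(peak_tops, memory_bound)

# Hypothetical chips: A advertises more TOPS, B has more bandwidth.
chip_a = {"peak": 40.0, "bw": 60.0}    # 40 TOPS, 60 GB/s
chip_b = {"peak": 20.0, "bw": 200.0}   # 20 TOPS, 200 GB/s

# A memory-hungry model (50 ops per byte moved):
print(attainable_tops(chip_a["peak"], chip_a["bw"], 50))  # 3.0
print(attainable_tops(chip_b["peak"], chip_b["bw"], 50))  # 10.0
```

On this memory-bound workload the 20-TOPS chip sustains roughly three times the throughput of the 40-TOPS chip, which is exactly why representative benchmarks matter more than the headline number.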