In Depth

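Latency matters most in real-time applications such as chatbots and voice assistants. For hosted APIs, time to first token is typically around 200-500 ms. With streaming, the remaining tokens then arrive continuously instead of all at once when generation finishes. Smaller models generally respond faster than larger ones. On-device models avoid the network round trip entirely, so they offer the lowest latency, but they are less capable.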
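To see why streaming improves perceived latency, it helps to measure time to first token separately from total response time. The sketch below does not call any real API. `fake_token_stream` is a hypothetical stand-in that sleeps for an assumed first-token delay and then an assumed per-token delay. To measure real numbers, replace it with your provider's streaming iterator.

```python
import time
from typing import Iterator, List, Tuple


def fake_token_stream(tokens: List[str], first_delay: float,
                      per_token_delay: float) -> Iterator[str]:
    """Hypothetical stand-in for a streaming API (assumption, not a real client).

    Waits `first_delay` seconds before yielding the first token, then
    `per_token_delay` seconds before each later token.
    """
    for i, tok in enumerate(tokens):
        time.sleep(first_delay if i == 0 else per_token_delay)
        yield tok


def measure_stream(stream: Iterator[str]) -> Tuple[float, float, str]:
    """Consume a token stream and return (time_to_first_token, total_time, text).

    Times are in seconds, measured from when consumption starts.
    """
    start = time.perf_counter()
    ttft = None
    pieces = []
    for tok in stream:
        if ttft is None:
            # Record the moment the first token becomes available to the user.
            ttft = time.perf_counter() - start
        pieces.append(tok)
    total = time.perf_counter() - start
    # An empty stream has no first token; fall back to the total time.
    return (ttft if ttft is not None else total), total, "".join(pieces)


if __name__ == "__main__":
    tokens = ["Hello", ",", " world", "!"]
    ttft, total, text = measure_stream(fake_token_stream(tokens, 0.3, 0.05))
    print(f"text={text!r} ttft={ttft * 1000:.0f} ms total={total * 1000:.0f} ms")
```

With an assumed 300 ms first-token delay and 50 ms per later token, the first token of this four-token reply appears after roughly 300 ms, while the full reply takes about 450 ms (300 + 3 × 50). A user watching a streamed response notices the first figure. Without streaming, they would wait for the second.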