Latency

Definition

The time delay between sending a request to an AI model and receiving the first token of the response.

In Depth

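Latency matters most in real-time applications such as chatbots and voice assistants, where users notice any delay before the response starts. Hosted APIs typically return the first token in roughly 200-500 ms, and streaming then delivers subsequent tokens continuously as they are generated. Smaller models generally have lower latency. On-device models have the lowest latency, since they avoid the network round trip, but they are less capable.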
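A minimal sketch of how time to first token could be measured against a streaming response. The names `measure_latency` and `fake_stream` are hypothetical and not part of any real API; `fake_stream` only simulates a model with `time.sleep`, and its delays are illustrative values, not measurements.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def measure_latency(stream: Iterable[str]) -> Tuple[Optional[float], float, int]:
    """Consume a token stream and return
    (time_to_first_token, total_time, token_count), times in seconds.

    The stream must be lazy (e.g. a generator), so that the work of
    producing the first token happens after the timer starts.
    """
    start = time.perf_counter()
    first_token_time: Optional[float] = None
    count = 0
    for _ in stream:
        if first_token_time is None:
            # Latency as defined above: delay until the first token arrives.
            first_token_time = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return first_token_time, total, count


def fake_stream(first_delay: float = 0.05, gap: float = 0.01, n: int = 5) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model response."""
    time.sleep(first_delay)  # simulated wait before the first token
    for i in range(n):
        if i > 0:
            time.sleep(gap)  # simulated gap between later tokens
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, total, count = measure_latency(fake_stream())
    print(f"first token after {ttft * 1000:.0f} ms, "
          f"{count} tokens in {total * 1000:.0f} ms")
```

To time a real service, the same function could wrap whatever iterator the client library returns for a streamed response, provided the request is sent only once iteration begins.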
