In Depth

Introduced in the 2017 paper "Attention Is All You Need," the transformer replaced recurrent architectures with stacks of multi-head attention layers that weigh the relevance of each token to every other token in the sequence. Because attention processes all positions at once rather than one step at a time, training parallelizes efficiently across GPU clusters and scales to massive datasets. Transformers now power models across text, image, audio, and video domains.
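
To make the "each token attends to every other token" idea concrete, here is a minimal sketch of the scaled dot-product attention inside a single head, written in plain NumPy. It assumes the queries, keys, and values are already projected; the learned projection matrices, masking, and the splitting and recombining across multiple heads are omitted for brevity.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Every query token is scored against every key token,
    # giving a (seq_len, seq_len) matrix of relevance scores.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over keys turns scores into attention weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output token is a weighted sum of all value vectors.
    return weights @ V

# Toy example: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
# In a real transformer, Q, K, and V come from learned linear projections of x;
# here x is reused directly to keep the sketch self-contained.
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)

Note that the score matrix is computed for all token pairs in a single matrix multiplication, which is why the computation maps so well onto GPUs, in contrast to a recurrent network that must step through the sequence position by position.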