What is AI Alignment?

NexChron

AI Alignment

Definition The research field focused on ensuring that AI systems pursue goals and exhibit behaviors that are safe and beneficial to humans, even as they become more capable. Alignment asks: how do we build AI that reliably does what we actually want?

In Depth

Alignment encompasses value learning (inferring human preferences), scalable oversight (supervising AI that may be smarter than humans), robustness to distributional shift, and avoiding reward hacking. Leading research labs dedicate significant resources to alignment work, driven by concern that misaligned advanced AI could cause large-scale harm. Key open problems include specification gaming, inner misalignment, and deceptive alignment.

Browse more terms

AI Agent AI Alignment AI Audit AI Bill of Rights AI Compute AI Governance AI Orchestration AI Readiness AI Risk Management AI Watermarking AI-as-a-Service Activation Function Active Learning Adversarial Attack Agentic AI Agentic Workflow Algorithmic Fairness Arctic Artificial General Intelligence Artificial Superintelligence