In Depth
Alignment encompasses value learning (inferring human preferences), scalable oversight (supervising AI that may be smarter than humans), robustness to distributional shift, and avoiding reward hacking. Leading research labs dedicate significant resources to alignment work, driven by concern that misaligned advanced AI could cause large-scale harm. Key open problems include specification gaming, inner misalignment, and deceptive alignment.