In Depth
AI red teaming borrows from cybersecurity's adversarial mindset: testers try jailbreaks, prompt injections, roleplay-based bypasses, and edge-case inputs to surface harms the model might produce. Major AI labs conduct both internal red team exercises and external engagements with independent researchers. Findings inform additional safety training, guardrail tuning, and deployment restrictions.