In Depth
Sora is OpenAI's text-to-video generation model, first previewed in February 2024. It can create videos up to 60 seconds long featuring complex scenes with multiple characters, specific types of motion, and detailed backgrounds. The model is trained to capture how objects exist and move in the physical world, aiming for consistent physics and spatial relationships, though OpenAI has noted it can still struggle with accurately simulating complex physical interactions.
The model works by processing video as sequences of spacetime patches, similar to how large language models process tokens. This approach allows it to handle videos of varying durations, resolutions, and aspect ratios. Sora can also extend existing videos, fill in missing frames, and generate videos from still images.
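The spacetime-patch idea can be illustrated with a small sketch: a video tensor is carved into non-overlapping blocks that span a few frames in time and a small square in space, and each block is flattened into a token-like vector. The patch sizes below (`t=2`, `p=16`) and the function name are illustrative assumptions, not Sora's actual internals, which OpenAI has not published in detail.

```python
import numpy as np

def extract_spacetime_patches(video, t=2, p=16):
    """Split a video tensor of shape (T, H, W, C) into non-overlapping
    spacetime patches spanning t frames and p x p pixels, each flattened
    into a single token vector. Patch sizes here are illustrative."""
    T, H, W, C = video.shape
    assert T % t == 0 and H % p == 0 and W % p == 0, "dims must divide evenly"
    # Give each patch its own axis, group the patch dims together, flatten.
    patches = video.reshape(T // t, t, H // p, p, W // p, p, C)
    patches = patches.transpose(0, 2, 4, 1, 3, 5, 6)
    return patches.reshape(-1, t * p * p * C)

video = np.random.rand(8, 64, 64, 3)       # 8 frames of 64x64 RGB
tokens = extract_spacetime_patches(video)
print(tokens.shape)                        # (64, 1536): 64 patch tokens
```

Because the patch grid adapts to whatever dimensions divide evenly, the same tokenization applies to videos of different durations, resolutions, and aspect ratios, which is the flexibility the paragraph above describes.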
Sora represents a significant step toward AI systems that understand and simulate the physical world. For industries like film, advertising, education, and gaming, it offers the potential to dramatically reduce video production costs and timelines. However, it also raises important questions about deepfakes, content authenticity, and the future of creative professions.