OpenAI has introduced Sora, an AI model that generates photorealistic video from text prompts, and the company sees potential applications for it beyond video. OpenAI showcased sample clips ranging from a couple strolling through a snowy landscape to a vintage SUV on a dirt road, smoothly tracked by the camera.
Described as a “world simulator” by OpenAI, Sora boasts the ability to comprehend crucial aspects of the three-dimensional world. It can produce CGI-like scenes of digital landscapes or create videos depicting individuals navigating neon-lit streets at night.
OpenAI researchers express optimism about the broader implications of scaling video generation models. Tim Brooks, a research scientist on the Sora project, emphasized the emergence of 3D geometry and consistency from extensive exposure to data. “It learns about 3D geometry and consistency. We didn’t bake that in — it just entirely emerged from seeing a lot of data,” Brooks stated in an interview with Wired.
Sora builds on diffusion transformer models, an approach previously used mainly for generating high-resolution images. In essence, a diffusion model is trained by adding noise to original images and learning to remove that noise step by step. To create a new image, it starts from noise and progressively denoises it.
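As a rough illustration of the noising idea described above, here is a minimal sketch in plain Python. It is not OpenAI's code and says nothing about how Sora itself is implemented. The function names, the linear noise schedule, the 1,000-step count, and the five-pixel toy "image" are all assumptions chosen for the example. Real systems use a trained neural network to predict the noise, whereas this sketch simply reuses the true noise to show that a good prediction is enough to recover the clean signal.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances rising linearly (an assumed, commonly used schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_signal(betas):
    """alpha_bar[t]: share of the original signal variance left after t + 1 steps."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, alpha_bar, rng):
    """Forward process: blend clean pixels with Gaussian noise in a single jump."""
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [
        math.sqrt(alpha_bar) * p + math.sqrt(1.0 - alpha_bar) * n
        for p, n in zip(pixels, noise)
    ]
    return noisy, noise


def recover_clean(noisy, predicted_noise, alpha_bar):
    """Invert add_noise given a noise estimate (a trained network would supply it)."""
    return [
        (x - math.sqrt(1.0 - alpha_bar) * n) / math.sqrt(alpha_bar)
        for x, n in zip(noisy, predicted_noise)
    ]


rng = random.Random(0)
image = [0.0, 0.25, 0.5, 0.75, 1.0]  # toy five-pixel "image"
alpha_bars = cumulative_signal(linear_beta_schedule(1000))

# Noise the image halfway through the schedule, then undo it with the true noise.
noisy, true_noise = add_noise(image, alpha_bars[499], rng)
restored = recover_clean(noisy, true_noise, alpha_bars[499])
```

In this sketch, alpha_bars shrinks toward zero as the step index grows, so late steps are almost pure noise. Generation runs the other way: it starts from noise and applies the learned denoising step repeatedly.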
To train Sora, OpenAI fed the model large amounts of captioned video so that it could learn the connection between footage and text input. Beyond generating new footage, Sora can extend existing clips and turn AI-generated images into videos.
OpenAI reports that intriguing capabilities emerged when Sora was trained at scale. The model can simulate aspects of people, animals, and environments from the physical world, showing a notable grasp of 3D space.
While Sora's potential extends to gaming, OpenAI acknowledges its imperfections. Notably, the model struggles with cause and effect. Examples include a person biting into a cookie without leaving a mark, or a glass cup leaking without shattering first.
Despite these limitations, Sora offers a glimpse of a future in which AI-generated video may become indistinguishable from reality. OpenAI, aware of the potential for misuse, plans a cautious rollout that includes red-team assessments to identify potential harms and risks.
“We’re going to be very careful about all the safety implications for this,” affirmed project researcher Bill Peebles in an interview with Wired.