Generating video from text

Sora is an AI model capable of generating realistic and creative scenarios based on text prompts.

Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.

Prompt: A drone camera circles around a beautiful historic church built on a rocky outcropping along the Amalfi Coast, the view showcases historic and magnificent architectural details and tiered pathways and patios, waves are seen crashing against the rocks below as the view overlooks the horizon of the coastal waters and hilly landscapes of the Amalfi Coast Italy, several distant people are seen walking and enjoying vistas on patios of the dramatic ocean views, the warm glow of the afternoon sun creates a magical and romantic feeling to the scene, the view is stunning captured with beautiful photography.

Prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.

Designing Captivating Graphics

The model possesses a deep understanding of language, enabling it to accurately interpret prompts and create compelling characters with vivid emotions. Furthermore, Sora can generate multiple scenes within a single video, ensuring consistency in both characters and visual style.

Safety

We will take several important safety steps before making Sora available
in OpenAI’s products. We are collaborating with domain experts in areas
such as misinformation, hateful content, and bias, who will
be adversarially testing the model.

We are also developing tools to detect misleading content, such as a detection classifier that can identify when a video was generated by Sora. If we deploy the model in an OpenAI product in the future, we plan to include C2PA metadata.

In addition to developing new techniques to prepare for deployment, we are leveraging the existing safety methods that we built for our products that use DALL·E 3, which are also applicable to Sora.

For example, once integrated into an OpenAI product, our text
classifier will check and reject text input prompts that violate
our usage policies, such as those requesting extreme
violence, sexual content, hateful imagery, celebrity
likeness, or others' intellectual property. We have also developed
robust image classifiers that review every frame of generated
videos to ensure they adhere to our usage policies before they are shown to the user.

We will engage policymakers, educators, and artists worldwide
to understand their concerns and identify positive use cases
for this new technology. Despite extensive research and testing
we cannot predict all the beneficial ways people will use our technology
nor all the ways people will abuse it. This is why we believe that learning
from real-world use is a critical component of creating and releasing
increasingly safe AI systems over time.

Today, Sora is being made available to red teamers to assess critical areas for potential harms or risks. We are also providing access to several visual artists, designers, and filmmakers to gather feedback on how to improve the model for creative professionals.

We are sharing our research progress early to collaborate with and receive feedback from people outside of OpenAI, and to give the public a sense of what AI capabilities are on the horizon.

Our goal is to teach AI to understand and simulate the physical world in motion, ultimately training models that assist people in solving problems requiring real-world interaction.

Introducing Sora, our text-to-video model. Sora can generate videos up to a minute long while maintaining visual quality and fidelity to the user's prompt.



Prompt: The camera directly faces colorful buildings in Burano Italy. An adorable dalmatian looks through a window on a building on the ground floor. Many people are walking and cycling along the canal streets in front of the buildings.

Prompt: A cat waking up its sleeping owner demanding breakfast. The owner tries to ignore the cat, but the cat tries new tactics and finally the owner pulls out a secret stash of treats from under the pillow to hold the cat off a little longer.

Exploring Research Methods

Sora is a diffusion model that generates videos by starting with an initial state resembling static noise and gradually refining it through multiple steps to remove the noise.

Like GPT models, Sora uses a transformer architecture, enabling exceptional scalability.
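The denoising loop described above can be illustrated with a toy sketch. This is not Sora's actual code: the `toy_denoiser` below is a hypothetical stand-in for the learned transformer, and the tensor shapes, step count, and step size are illustrative assumptions. It only shows the general diffusion pattern of starting from static-like noise and iteratively removing predicted noise.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(x, target):
    """Stand-in for a learned model: here the 'predicted noise' is simply
    the difference between the current sample and a known target. A real
    diffusion model learns this prediction from data."""
    return x - target

def sample(shape, target, steps=50, step_size=0.2):
    """Generate a sample by gradually refining pure noise."""
    x = rng.standard_normal(shape)  # initial state: resembles static noise
    for _ in range(steps):
        predicted_noise = toy_denoiser(x, target)
        x = x - step_size * predicted_noise  # remove a fraction of the noise
    return x

# A tiny "video": 4 frames of 8x8 pixels, with an all-zeros target signal.
target = np.zeros((4, 8, 8))
video = sample(target.shape, target)
```

After 50 refinement steps the residual noise shrinks geometrically (by a factor of 0.8 per step in this sketch), so the sample ends up very close to the target. The real model replaces the hand-written denoiser with a transformer operating on video patches, which is where the scalability noted above comes from.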

© Copyright 2024 Sora - All Rights Reserved