
Imagen: Text-to-Image Diffusion Models
Imagen is an AI system that creates photorealistic images from input text. Imagen uses a large frozen T5-XXL encoder to encode the input text into embeddings. A …
Imagen Video
Imagen Video is another step forward in generative modelling capabilities, advancing text-to-video AI systems. Video generative models can be used to positively impact society, for example by …
Imagen Editor & EditBench
A key challenge is to generate edits that are faithful to input text prompts, while consistent with input images. We present Imagen Editor, a cascaded diffusion model built by fine-tuning …
To probe image quality, the rater is asked to choose between the model generation and the reference image in response to the question: “Which image is more photorealistic (looks more real)?”
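A pairwise question like this is typically summarized as a preference rate: the share of rater choices that favored the model generation. The sketch below is only an illustration under that assumption; the function name `preference_rate` and the labels `"model"` and `"reference"` are hypothetical and do not come from the source.

```python
from typing import Sequence


def preference_rate(choices: Sequence[str]) -> float:
    """Return the fraction of rater choices that picked the model generation.

    Each entry is assumed to be "model" or "reference" (hypothetical labels).
    """
    if not choices:
        raise ValueError("at least one rater choice is required")
    model_votes = sum(1 for choice in choices if choice == "model")
    return model_votes / len(choices)


# Hypothetical responses from four raters.
rate = preference_rate(["model", "reference", "model", "reference"])
```

Under this reading, a rate near 0.5 would mean raters could not reliably tell generated images from reference images.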
We confirm that recent findings in the text-to-image setting transfer to video generation, such as the effectiveness of frozen encoder text conditioning and classifier-free guidance.
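For readers unfamiliar with classifier-free guidance, here is a minimal sketch of its usual sampling-time form: the model's text-conditioned and unconditional noise predictions are combined, and a guidance weight controls how far the result is pushed toward the conditional one. The function name, the plain-list representation, and the example values are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def classifier_free_guidance(
    eps_cond: Sequence[float],
    eps_uncond: Sequence[float],
    guidance_weight: float,
) -> List[float]:
    """Combine conditional and unconditional noise predictions.

    Computes eps_uncond + w * (eps_cond - eps_uncond) element-wise.
    w = 1 returns the conditional prediction unchanged; w > 1 pushes
    the result further in the direction implied by the text condition.
    """
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_weight * (c - u) for c, u in zip(eps_cond, eps_uncond)]


# Hypothetical two-element predictions, guided with weight 3.
guided = classifier_free_guidance([1.0, 2.0], [0.0, 1.0], 3.0)
```

In a real diffusion sampler the two predictions would be tensors produced by the same network, once with the text embedding and once with an empty or null condition.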