Browse DeepInfra models:

All categories and models you can try out and use directly on DeepInfra:

Category: text-to-image

Text-to-image AI models generate images from textual descriptions, making them an essential tool for content creation, assistive technology, entertainment, and education.

The text description is first processed by a natural language processing (NLP) model, which extracts relevant features and keywords. This information is then passed to a generative model, which uses trained parameters to generate an image that matches the textual description. This innovative technology has the potential to transform visual content creation, making it more accessible and user-friendly.
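As a loose illustration of this two-stage flow, the sketch below "encodes" a prompt into a feature vector and feeds it to a stand-in generator. Every name here is a hypothetical toy, not DeepInfra's API or a real model; a production pipeline would use learned text embeddings and a trained diffusion model instead.

```python
import hashlib
import random

def encode_prompt(prompt, dim=8):
    # Toy stand-in for the NLP text encoder: map the prompt to a
    # deterministic feature vector (a real model would use learned
    # embeddings rather than a hash-seeded RNG).
    seed = int.from_bytes(hashlib.sha256(prompt.encode()).digest()[:4], "big")
    rng = random.Random(seed)
    return [rng.gauss(0, 1) for _ in range(dim)]

def generate_image(features, width=4, height=4):
    # Toy stand-in for the generative model: produce a width x height
    # grid of RGB values conditioned on the text features.
    rng = random.Random(int(sum(features) * 1e6))
    return [[(rng.random(), rng.random(), rng.random())
             for _ in range(width)] for _ in range(height)]

image = generate_image(encode_prompt("a cat wearing a hat"))
```

The point of the sketch is only the data flow: the same prompt always yields the same features, and the generator's output is fully determined by those features.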

For marketing and advertising professionals, text-to-image AI models can help create images that are tailored to specific campaigns or target audiences. Visually impaired individuals can use these models to better understand and interact with their environment, making them a valuable assistive technology. The entertainment industry can use text-to-image models to generate images for video games, virtual reality, and other immersive experiences. Finally, educators can use text-to-image models to create interactive diagrams, charts, and other resources to help students better understand complex concepts.

black-forest-labs/FLUX-1-dev
featured
$0.02 x (width / 1024) x (height / 1024) x (iters / 25)
  • text-to-image

FLUX.1-dev is a state-of-the-art 12 billion parameter rectified flow transformer developed by Black Forest Labs. This model excels in text-to-image generation, providing highly accurate and detailed outputs. It is particularly well-regarded for its ability to follow complex prompts and generate anatomically accurate images, especially with challenging details like hands and faces.

black-forest-labs/FLUX-1-schnell
featured
$0.0005 x (width / 1024) x (height / 1024) x iters
  • text-to-image

FLUX.1 [schnell] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. This model offers cutting-edge output quality and competitive prompt following, matching the performance of closed source alternatives. Trained using latent adversarial diffusion distillation, FLUX.1 [schnell] can generate high-quality images in only 1 to 4 steps.
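The two FLUX price formulas above can be evaluated directly. A minimal sketch (the function names are ours for illustration, not a DeepInfra client library):

```python
def flux_dev_cost(width, height, iters):
    # $0.02 x (width / 1024) x (height / 1024) x (iters / 25)
    return 0.02 * (width / 1024) * (height / 1024) * (iters / 25)

def flux_schnell_cost(width, height, iters):
    # $0.0005 x (width / 1024) x (height / 1024) x iters
    return 0.0005 * (width / 1024) * (height / 1024) * iters

# A 1024x1024 FLUX.1-dev image at 25 iterations costs $0.02, while
# 4 schnell steps at the same size cost $0.002.
print(flux_dev_cost(1024, 1024, 25))     # 0.02
print(flux_schnell_cost(1024, 1024, 4))  # 0.002
```

Note that schnell's per-iteration pricing has no `/ 25` divisor, which is consistent with its 1-to-4-step generation regime.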

CompVis/stable-diffusion-v1-4
$0.0005 / sec
  • text-to-image

Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input.

XpucT/Deliberate
$0.0005 / sec
  • text-to-image

The Deliberate model lets you create virtually anything you can describe, and results improve as your prompts grow more knowledgeable and detailed. It is well suited to meticulous anatomy artists, creative prompt writers, art designers, and those seeking explicit content.

prompthero/openjourney
$0.0005 / sec
  • text-to-image

Text-to-image model based on Stable Diffusion.

runwayml/stable-diffusion-v1-5
$0.0005 / sec
  • text-to-image

The most widely used version of Stable Diffusion. Trained on 512x512 images, it can generate realistic images from a text description.

stability-ai/sdxl
$0.0005 / sec
  • text-to-image

SDXL uses an ensemble-of-experts pipeline for latent diffusion: in a first step, the base model generates (noisy) latents, which are then further processed by a refinement model (available at https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/) specialized for the final denoising steps. Note that the base model can also be used as a standalone module.
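The hand-off between the base model and the refiner is typically expressed as a fraction of the denoising schedule (for example, `denoising_end` on the base pipeline and `denoising_start` on the refiner in the Hugging Face diffusers library). The split itself is simple arithmetic; this is a sketch, where the 0.8 default is an illustrative assumption, not a fixed SDXL constant:

```python
def split_denoising_steps(num_inference_steps, high_noise_frac=0.8):
    # The base model handles the first (high-noise) portion of the
    # schedule; the refiner finishes the remaining low-noise steps.
    base_steps = round(num_inference_steps * high_noise_frac)
    refiner_steps = num_inference_steps - base_steps
    return base_steps, refiner_steps

print(split_denoising_steps(50))  # (40, 10)
```

Setting `high_noise_frac=1.0` recovers the standalone-base case mentioned above, with zero refiner steps.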

stabilityai/stable-diffusion-2-1
$0.0005 / sec
  • text-to-image

Stable Diffusion is a latent text-to-image diffusion model that generates realistic images from a text description.

uwulewd/custom-diffusion
$0.0005 / sec
  • text-to-image

Stable Diffusion with the ability to change checkpoints; still a work in progress.