Stable Diffusion 3 vs ChatGPT Dalle-3 vs Midjourney [NEW Best Image Generator?]

AI Andy
3 Mar 2024 · 20:50

TLDR

In this video, the host compares three image generation models, Stable Diffusion 3, Midjourney, and Dalle-3, by giving each the same prompts. The evaluation criteria are detail, adherence to the prompt, and 'coolness' factor. Prompts include a cinematic photo of a red apple, a painting of an astronaut riding a pig, and a sports car with text on its side. The host finds that Stable Diffusion 3 excels at rendering text and placing elements accurately but lacks coolness. Midjourney produces high-quality images with a strong coolness factor but struggles to follow text instructions. Dalle-3 takes a stylish approach with good detail, and its blend of realism and creative flair makes it the host's preferred choice. The video ends with the host recommending Dalle-3 for its style and its effectiveness at generating compelling images.

Takeaways

  • 🔍 The video compares three image generators: Stable Diffusion 3, Midjourney, and Dalle-3, using the same prompts to evaluate them on detail, adherence, and coolness.
  • 🍎 For the prompt of a red apple in a classroom, Stable Diffusion 3 lacked coolness, Midjourney had better detail clarity but struggled with text, and Dalle-3 excelled in detail and coolness thanks to dramatic lighting.
  • 🎨 In creating a painting of an astronaut riding a pig, Stable Diffusion perfectly adhered to the prompt with a unique style, while Midjourney's output was more like street art with good adherence but less clarity, and Dalle-3 struggled with the prompt.
  • 📸 For a studio photograph of a chameleon, Stable Diffusion produced a highly detailed image, Midjourney also delivered a cool and detailed result, and Dalle-3 offered a stylized, dramatic photo.
  • 🖥 For a prompt of a '90s desktop computer, Stable Diffusion 3 excelled with nostalgic vibes, Midjourney went for a gritty, steampunk style, and Dalle-3 created a retro UI with a strong cool factor.
  • 🏎 In depicting a fast-moving sports car, Stable Diffusion 3 turned up the style with motion lines and text, Midjourney offered neon lights and speed, but Dalle-3 did not perform well with the prompt.
  • 🥤 When generating images of glass bottles with colored liquids, Midjourney struggled with the order and colors, while Dalle-3 provided a more accurate and stylized result.
  • 🌙 For an embroidered cloth with a tiger and the text 'good night', Stable Diffusion created a beautiful texture but missed the lighting effect, Midjourney did not adhere to the prompt well, and Dalle-3 offered a detailed and moody scene.
  • 🏎️ In a night photo of a sports car, Stable Diffusion 3 and Midjourney both provided cool and high-quality images with good adherence to the prompt, but Dalle-3 did not include the required text.
  • 🐎 A prompt for a horse balancing on a ball was unrealistic in Midjourney's depiction, while Dalle-3 offered a more stylized and believable image with a dramatic background.
  • 🌄 Lastly, for an anime-style illustration of a stand with text and a stormy background, Stable Diffusion was accurate but basic, Midjourney was creative but off-target, and Dalle-3 provided a vibrant and detailed anime scene that was preferred by the reviewer.

Q & A

  • What is the main purpose of the video?

    -The main purpose of the video is to compare three image generators (Stable Diffusion 3, Midjourney, and Dalle-3) on the same prompts, ranking them on detail, adherence to the prompt, and coolness factor.

  • What are the three factors used to rank the image generators in the video?

    -The three factors used to rank the image generators are detail, adherence to the prompt, and coolness.

  • What is the first prompt given to the image generators in the video?

    -The first prompt is to create a cinematic photo of a red apple on a table in a classroom with the words 'go big or go home' written on the blackboard.

  • How does Stable Diffusion 3 perform in terms of coolness factor according to the video?

    -According to the video, Stable Diffusion 3 is criticized for lacking in the coolness factor compared to the other generators.

  • What is the second prompt used in the video, and how does Midjourney perform with it?

    -The second prompt is for a painting of an astronaut riding a pig wearing a tutu, holding a pink umbrella, with a robin bird wearing a top hat next to the pig and the words 'stable diffusion' in the corner. Midjourney performs well, offering a high coolness factor and good adherence to the prompt, despite some minor issues with text clarity.

  • What issue does the video highlight with Dalle-3's first image?

    -The video highlights that Dalle-3's first image looks like a low-quality, cheap generation, with an acrylic painting style that doesn't suit the given prompt.

  • Which image generator does the video suggest is best at creating detailed and realistic images?

    -The video suggests that Stable Diffusion 3 is best at creating detailed and realistic images, particularly when it comes to text and adherence to the prompt.

  • How does the video compare the performance of the image generators when creating an image of a chameleon?

    -The video shows that all three generators (Stable Diffusion 3, Midjourney, and Dalle-3) create high-quality, detailed images of a chameleon, each with its own style and coolness factor.

  • What is the main criticism of Midjourney's performance in the video?

    -The main criticism of Midjourney's performance is that it struggles with text clarity and adherence to specific details in the prompts, such as the correct placement and style of text.

  • Which image generator does the video suggest has the most stylized and visually appealing output?

    -The video suggests that Dalle-3 has the most stylized and visually appealing output, often creating images with a high coolness factor and unique artistic styles.

  • What conclusion does the video draw about the best image generator to use?

    -The video concludes that all three image generators have their strengths, but the host personally prefers Dalle-3 in ChatGPT for its style, even though Stable Diffusion 3 handles text and element placement better.

Outlines

00:00

🎨 Comparison of Stable Diffusion 3, Midjourney, and Dalle-3

The video compares three AI image generation models, Stable Diffusion 3, Midjourney, and Dalle-3, on three criteria: detail, adherence to the prompt, and coolness factor. Each model receives the same series of prompts, and the host discusses the resulting images. The first prompt asks for a cinematic photo of a red apple in a classroom with a specific phrase on the blackboard. Later prompts include a painting of an astronaut riding a pig, a close-up of a chameleon, a desktop computer with graffiti, glass bottles with colored liquids, an embroidered cloth with a tiger, a sports car on a racetrack, and more. The host concludes with a personal preference for Dalle-3 in ChatGPT because of its style and adherence to the prompts, despite Stable Diffusion 3's strong performance with text and element positioning.

05:02

🚗 Sports Car and ChatGPT's Superior Style

This section compares images generated for a prompt featuring a sports car with the text 'sd3' on its side, racing on a track past a 'faster' road sign. The host appreciates the style and detail of the Stable Diffusion 3 image, noting the motion lines and the text on the car. Midjourney's rendition is praised for its neon lights and correct text placement, although it does not always follow the text instructions strictly. ChatGPT's image is admired for its unique, cool perspective, with a retro UI and a subtle 'sd3' sign, which the host finds more appealing than the Stable Diffusion 3 version.

10:05

🌐 Midjourney's Struggles with Text and Realism

This section highlights Midjourney's difficulties with text and realistic elements. The host points out that while Midjourney produces high-quality, cool-looking images, it often fails to render text and details as the prompt specifies. Examples include a horse balancing on a ball in a field, which Midjourney renders with implausible physics, and an anime-style illustration that ends up looking like a vending machine. The host suggests that Midjourney may be focusing more on its user interface than on the accuracy of its image generation.

15:06

📸 Dalle-3's Creative and Stylized Approach

This section praises Dalle-3 for its creative, stylized approach to image generation. Although its images are not always the most realistic, Dalle-3 stands out for its high coolness factor and unique interpretations of the prompts. The host particularly likes its handling of the horse-on-a-ball prompt and the anime-style illustration, where the model adds creative elements such as vines and a stormy background. The host says they prefer Dalle-3's style over a more traditional or academic look.

20:09

🔍 Final Thoughts and Personal Preferences

In closing, the host summarizes their view of the models. They acknowledge that Stable Diffusion 3 excels at rendering text and positioning elements within an image. However, they personally prefer the style of ChatGPT and Dalle-3, finding its aesthetic more appealing. The video ends with a call to action inviting viewers to find their preferred ChatGPT prompt and to keep watching the channel for more content.

Keywords

💡Stable Diffusion 3

Stable Diffusion 3 is the third major version of Stability AI's Stable Diffusion text-to-image model. In the video, it is compared with Midjourney and Dalle-3 on detail, adherence to the prompt, and coolness. The host criticizes it for lacking in 'coolness factor,' but finds that it excels at generating text and meeting the prompt's requirements.

💡Midjourney

Midjourney is another AI image generation tool that is compared alongside Stable Diffusion 3 and Dalle-3 in the video. It is noted for its higher 'coolness factor' and the ability to create images with a more artistic and less realistic style. The script illustrates this with examples where Midjourney's outputs have a higher aesthetic appeal, despite sometimes not adhering strictly to the text or details of the prompt.

💡Dalle-3

Dalle-3 is the third version of OpenAI's DALL-E image generator. In the video it is used through ChatGPT and compared with Stable Diffusion 3 and Midjourney. It is highlighted for good typography and dramatic lighting, which contribute to a high 'coolness factor.' However, the video also points out that Dalle-3 sometimes renders text incorrectly or does not follow the prompt accurately.

💡Adherence

Adherence in the context of the video refers to how closely the generated images match the details and requirements specified in the prompt given to the AI models. It is one of the three factors used to rank and compare the AI models. The script provides examples where some models, like Stable Diffusion 3, have strong adherence to the prompt, while others may deviate in terms of text or details.

💡Coolness Factor

The 'coolness factor' is a subjective measure of the aesthetic appeal and stylistic uniqueness of the generated images. It is one of the criteria used by the video creator to evaluate and compare the AI models. The script describes how some models, particularly Midjourney, have a higher 'coolness factor' due to their more artistic and less conventional image outputs.

💡Prompt

A prompt in this context is the textual description or request given to the AI image generation models to create a specific image. The video discusses how different models interpret and respond to the same prompt, which influences the level of detail, adherence, and 'coolness factor' of the resulting images. The script provides several prompts, such as 'a cinematic photo of a red apple on a table in a classroom,' to demonstrate the models' capabilities.

💡Text Generation

Text generation is the ability of the AI models to include and accurately render text within the generated images. The video notes that Stable Diffusion 3 excels in this area, as it can effectively incorporate text like 'go big or go home' or 'stable diffusion' into the images as per the prompt's instructions.

💡Image Quality

Image quality is an assessment of the clarity, detail, and realism of the generated images. The video script discusses how different AI models produce varying levels of image quality. For instance, Dalle-3 is praised for creating images with good clarity and detail, such as the clear typography and shadows in the apple image.

💡Realism

Realism in the video refers to the degree to which the generated images resemble real-world objects and scenes. Some models are noted for their realistic outputs, while others are appreciated for their more stylized and artistic interpretations. The script contrasts the realistic qualities of certain images with the more fantastical or artistic styles produced by other models.

💡AI Image Generation

AI image generation is the overarching process by which artificial intelligence models create visual content based on textual prompts. The video is centered on comparing different AI models' capabilities in this field, focusing on their outputs' detail, adherence to prompts, and aesthetic appeal. The script provides a detailed comparison of how models like Stable Diffusion 3, Midjourney, and Dalle-3 perform in generating images.

Highlights

Stable Diffusion 3, Midjourney, and Dalle-3 are compared on the same prompts based on detail, adherence, and coolness.

Stable Diffusion 3 is criticized for lacking coolness.

Midjourney's image of a red apple has higher coolness but lacks detail clarity.

Dalle-3's image features good clarity, detail, and dramatic lighting, making it the favorite in the first comparison.

Stable Diffusion excels in adherence to the prompt, especially with complex scenarios.

Midjourney's style is likened to street art, with a good coolness factor but less focus on text adherence.

Dalle-3 sometimes generates multiple images, with varying levels of adherence and style.

Studio photograph of a chameleon showcases detailed scales and eye texture, with a dramatic background blur.

Midjourney receives a 10 out of 10 score for its animal imagery, despite lacking text elements.

Dalle-3's stylized, dramatic photos score high on coolness, even when their text is not always accurate.

Stable Diffusion 3 effectively creates nostalgic vibes with graffiti and a welcoming message on a computer screen.

Midjourney's interpretation of the prompt leans towards a gritty, steampunk aesthetic.

Dalle-3's retro UI and subtle 'sd3' sign on the wall add to the image's coolness factor.

Transparency and accurate liquid colors in glass bottles are challenging for Midjourney and Dalle-3.

Stable Diffusion's embroidery on a cloth appears beautiful, with a dramatic dim light effect.

Midjourney struggles with text generation and adherence in the embroidery example.

Dalle-3's inclusion of fine details such as pottery and imperfections gives the embroidery image a unique style.

Stable Diffusion 3's night photo of a sports car on a racetrack with motion lines and text is highly stylized and appealing.

Midjourney's neon lights and speed theme in the sports car image are consistent with high-quality output.

Dalle-3's composition and perspective in the racetrack image offer a cool, unique viewpoint.

The horse balancing on a colorful ball is unrealistic but visually impressive in the generated images.

Stable Diffusion's text and placement accuracy are praised, while Dalle-3's style is preferred for its aesthetic appeal.

Midjourney's focus on platform functionality over text adherence may change with future model releases.

The video concludes with Dalle-3 favored for its style and adherence, despite Stable Diffusion's strong text generation capabilities.