Stable Diffusion 3 vs ChatGPT Dalle-3 vs Midjourney [NEW Best Image Generator?]
TLDR
In this video, the host compares three image generation models: Stable Diffusion 3, Midjourney, and Dalle-3, using the same prompts for each to evaluate their performance. The evaluation criteria include detail, adherence to the prompt, and the 'coolness' factor. The video showcases various prompts, such as a cinematic photo of a red apple, a painting of an astronaut riding a pig, and a sports car with text on the side. The host finds that Stable Diffusion excels at text and placement accuracy but lacks in the coolness factor. Midjourney provides high-quality images with a strong coolness factor but struggles with text adherence. Dalle-3 offers a stylish approach and good detail, making it the host's preferred choice for its blend of realism and creative flair. The video concludes with the host's recommendation of Dalle-3 for its style and effectiveness in generating compelling images.
Takeaways
- 🔍 The video compares three image generators: Stable Diffusion 3, Midjourney, and Dalle-3, using the same prompts to evaluate them on detail, adherence, and coolness.
- 🍎 For the prompt of a red apple in a classroom, Stable Diffusion 3 lacked coolness, Midjourney had better detail clarity but struggled with text, and Dalle-3 excelled in detail and coolness with dramatic lighting.
- 🎨 In creating a painting of an astronaut riding a pig, Stable Diffusion perfectly adhered to the prompt with a unique style, while Midjourney's output was more like street art with good adherence but less clarity, and Dalle-3 struggled with the prompt.
- 📸 A studio photograph of a chameleon was highly detailed in Stable Diffusion, with Midjourney also providing a cool and detailed image, and Dalle-3 offering a stylized and dramatic photo.
- 🖥 For a prompt of a 90's desktop computer, Stable Diffusion 3 excelled with nostalgic vibes, Midjourney provided a gritty, steampunk style, and Dalle-3 created a retro UI with a cool factor.
- 🏎 In depicting a fast-moving sports car, Stable Diffusion 3 turned up the style with motion lines and text, Midjourney offered neon lights and speed, but Dalle-3 did not perform well with the prompt.
- 🥤 When generating images of glass bottles with colored liquids, Midjourney struggled with the order and colors, while Dalle-3 provided a more accurate and stylized result.
- 🌙 For an embroidered cloth with a tiger and the text 'good night', Stable Diffusion created a beautiful texture but missed the lighting effect, Midjourney did not adhere to the prompt well, and Dalle-3 offered a detailed and moody scene.
- 🏎️ In a night photo of a sports car, Stable Diffusion 3 and Midjourney both provided cool and high-quality images with good adherence to the prompt, but Dalle-3 did not include the required text.
- 🐎 A prompt for a horse balancing on a ball was unrealistic in Midjourney's depiction, while Dalle-3 offered a more stylized and believable image with a dramatic background.
- 🌄 Lastly, for an anime-style illustration of a stand with text and a stormy background, Stable Diffusion was accurate but basic, Midjourney was creative but off-target, and Dalle-3 provided a vibrant and detailed anime scene that was preferred by the reviewer.
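The comparison method summarized above (the same prompt for every generator, judged on detail, adherence, and coolness) can be sketched as a small tally. This is only an illustrative sketch: the video does not describe a numeric formula, so the 1-10 scale, the equal weighting, the `Score` class, the `rank_generators` helper, and every number below are assumptions and placeholders, not results from the video.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Score:
    """One judgment of one generator on one prompt (assumed 1-10 scale)."""
    generator: str
    prompt: str
    detail: int
    adherence: int
    coolness: int

    def total(self) -> int:
        # Equal weighting of the three criteria is an assumption.
        return self.detail + self.adherence + self.coolness


def rank_generators(scores):
    """Average each generator's per-prompt totals and sort best-first."""
    totals = {}
    for s in scores:
        totals.setdefault(s.generator, []).append(s.total())
    averages = {name: mean(values) for name, values in totals.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Placeholder scores for two hypothetical generators; not taken from the video.
scores = [
    Score("generator_a", "prompt_1", detail=7, adherence=9, coolness=5),
    Score("generator_b", "prompt_1", detail=8, adherence=6, coolness=8),
    Score("generator_a", "prompt_2", detail=8, adherence=9, coolness=6),
    Score("generator_b", "prompt_2", detail=7, adherence=5, coolness=9),
]

ranking = rank_generators(scores)
for name, average in ranking:
    print(f"{name}: {average}")
```

Weighting the criteria differently (for example, counting coolness double, as a style-focused reviewer might) would only require changing `total`.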
Q & A
What is the main purpose of the video?
-The main purpose of the video is to compare three different image generators (Stable Diffusion 3, Midjourney, and Dalle-3) on the same prompts, ranking them on detail, adherence to the prompt, and coolness factor.
What are the three factors used to rank the image generators in the video?
-The three factors used to rank the image generators are detail, adherence to the prompt, and coolness.
What is the first prompt given to the image generators in the video?
-The first prompt is to create a cinematic photo of a red apple on a table in a classroom with the words 'go big or go home' written on the blackboard.
How does Stable Diffusion 3 perform in terms of coolness factor according to the video?
-According to the video, Stable Diffusion 3 is criticized for lacking in the coolness factor compared to the other generators.
What is the second prompt used in the video, and how does Midjourney perform with it?
-The second prompt is for a painting of an astronaut riding a pig wearing a tutu, holding a pink umbrella, with a robin bird wearing a top hat next to the pig and the words 'stable diffusion' in the corner. Midjourney performs well, offering a high coolness factor and good adherence to the prompt, despite some minor issues with text clarity.
What issue does the video highlight with Dalle-3's first image generation?
-The video highlights that Dalle-3's first image generation looks like a low-quality, cheap generation with an acrylic painting style that doesn't work well for the given prompt.
Which image generator does the video suggest is best at creating detailed and realistic images?
-The video suggests that Stable Diffusion 3 is best at creating detailed and realistic images, particularly when it comes to text and adherence to the prompt.
How does the video compare the performance of the image generators when creating an image of a chameleon?
-The video shows that all three generators (Stable Diffusion 3, Midjourney, and Dalle-3) create high-quality, detailed images of a chameleon, each with its own unique style and coolness factor.
What is the main criticism of Midjourney's performance in the video?
-The main criticism of Midjourney's performance is that it struggles with text clarity and adherence to specific details in the prompts, such as the correct placement and style of text.
Which image generator does the video suggest has the most stylized and visually appealing output?
-The video suggests that Dalle-3 has the most stylized and visually appealing output, often creating images with a high coolness factor and unique artistic styles.
What conclusion does the video draw about the best image generator to use?
-The video concludes that while all three image generators have their strengths, the host's personal preference leans towards using Dalle-3 and ChatGPT for their style and text capabilities over Stable Diffusion 3.
Outlines
🎨 Comparison of Stable Diffusion 3, Midjourney, and Dalle-3
The video script discusses a comparison between three different AI image generation models: Stable Diffusion 3, Midjourney, and Dalle-3. The comparison is based on three criteria: detail, adherence to the prompt, and coolness factor. The script outlines a series of prompts given to each model and discusses the resulting images. The first prompt involves a cinematic photo of a red apple in a classroom with a specific phrase on the blackboard. The video goes on to compare the models' outputs for various prompts, including a painting of an astronaut on a pig, a close-up of a chameleon, a desktop computer with graffiti, glass bottles with colored liquids, an embroidered cloth with a tiger, a sports car on a racetrack, and more. The script concludes with the video creator's personal preference for Dalle-3 and ChatGPT for their style and adherence to the prompts, despite Stable Diffusion 3's strong performance with text and positioning elements in images.
🚗 Sports Car and ChatGPT's Superior Style
The paragraph focuses on the comparison of generated images for a prompt featuring a sports car with the text 'sd3' on the side, racing on a track with a 'faster' road sign. The video creator appreciates the style and detail in the Stable Diffusion 3 image, noting the motion lines and the text on the car. Midjourney's rendition is also praised for its neon lights and correct text placement, although it struggles with adhering strictly to the text in some instances. ChatGPT's image is admired for its unique and cool perspective, with a retro UI and subtle 'sd3' sign, which the creator finds more appealing than the Stable Diffusion 3 version.
🌐 Midjourney's Struggles with Text and Realism
This section of the script highlights the challenges Midjourney faces when generating images with text and realistic elements. The video creator points out that while Midjourney produces high-quality and cool-looking images, it often fails to accurately represent text and details as per the prompt. Examples include a prompt for a horse balancing on a ball in a field, which Midjourney fails to render with correct physics, and an anime-style illustration that ends up looking like a vending machine. The creator suggests that Midjourney might be focusing more on the user interface than on the accuracy of its image generation.
📸 Dalle-3's Creative and Stylized Approach
The video script praises Dalle-3 for its creative and stylized approach to image generation. Despite not always being the most realistic, Dalle-3 is noted for its high coolness factor and unique interpretations of the prompts. The creator particularly likes Dalle-3's handling of a prompt involving a horse on a ball and an anime-style illustration, where the model adds creative elements like vines and a stormy background. The video concludes with the creator expressing a preference for Dalle-3's style over a more traditional or academic approach.
🔍 Final Thoughts and Personal Preferences
In the concluding paragraph, the video creator summarizes their thoughts on the image generation models. They acknowledge that Stable Diffusion 3 excels at handling text and positioning elements within the generated images. However, the creator expresses a personal preference for the style of ChatGPT and Dalle-3, suggesting that these models offer a more appealing aesthetic. The script ends with a call to action, inviting viewers to find their preferred ChatGPT prompt and to continue watching the creator's videos for more content.
Keywords
💡Stable Diffusion 3
💡Midjourney
💡Dalle-3
💡Adherence
💡Coolness Factor
💡Prompt
💡Text Generation
💡Image Quality
💡Realism
💡AI Image Generation
Highlights
Stable Diffusion 3, Midjourney, and Dalle-3 are compared on the same prompts based on detail, adherence, and coolness.
Stable Diffusion 3 is criticized for lacking in the coolness factor.
Midjourney's image of a red apple has higher coolness but lacks detail clarity.
Dalle-3's image features good clarity, detail, and dramatic lighting, making it the most favored in the first comparison.
Stable Diffusion excels in adherence to the prompt, especially with complex scenarios.
Midjourney's style is likened to street art, with a good coolness factor but less focus on text adherence.
Dalle-3 sometimes generates multiple images, with varying levels of adherence and style.
Studio photograph of a chameleon showcases detailed scales and eye texture, with a dramatic background blur.
Midjourney receives a 10 out of 10 score for its animal imagery, despite lacking text elements.
Dalle-3's stylized and dramatic photos score high on coolness, even if they are not always text-accurate.
Stable Diffusion 3 effectively creates nostalgic vibes with graffiti and a welcoming message on a computer screen.
Midjourney's interpretation of the prompt leans towards a gritty, steampunk aesthetic.
Dalle-3's retro UI and subtle 'sd3' sign on the wall add to the coolness factor of the image.
Transparency and liquid color accuracy in glass bottles are challenging for Midjourney and Dalle-3.
Stable Diffusion's embroidery on a cloth appears beautiful, with a dramatic dim light effect.
Midjourney struggles with text generation and adherence in the embroidery example.
Dalle-3's inclusion of fine details like pottery and imperfections adds a unique style to the embroidery image.
Stable Diffusion 3's night photo of a sports car on a racetrack with motion lines and text is highly stylized and appealing.
Midjourney's neon lights and speed theme in the sports car image are consistent with high-quality output.
Dalle-3's composition and perspective in the racetrack image offer a cool and unique viewpoint.
The horse balancing on a colorful ball is unrealistic but visually impressive in the generated images.
Stable Diffusion's text and placement accuracy are praised, while Dalle-3's style is preferred for its aesthetic appeal.
Midjourney's focus on platform functionality over text adherence may change with future model releases.
The video concludes with Dalle-3 being favored for its style and adherence, despite Stable Diffusion's strong text generation capabilities.