Run Stable Diffusion 3 Locally! | ComfyUI Tutorial

Markury AI
12 Jun 2024 · 03:48

TLDR: This tutorial demonstrates how to run Stable Diffusion 3 Medium, a newly released AI model, locally with ComfyUI. The process involves requesting access to the Hugging Face repository, downloading the necessary files (the sd3_medium.safetensors checkpoint and the text encoders), updating ComfyUI, and installing the models. The video generates a female character with ethereal, aurora-like hair from a natural language prompt, highlighting the model's capabilities and asking the community to help get the license clarified.

Takeaways

  • 🌐 Visit Hugging Face to access the Stable Diffusion 3 medium model, which requires filling out a form and agreeing to terms.
  • 📁 Download the necessary files from the repository: 'sd3_medium.safetensors' and the text encoders CLIP-G, CLIP-L, and T5-XXL (fp16).
  • 🔄 Update ComfyUI by navigating to the 'update' folder in its directory and running the 'update_comfyui.bat' script.
  • 📂 Place the text encoders in the 'clip' folder and the checkpoint in the 'checkpoints' directory within the ComfyUI models folder.
  • 🚀 Start ComfyUI by running 'run_nvidia_gpu.bat' from the base directory.
  • 📝 Load the 'sd3_medium.safetensors' checkpoint and the associated 'clip' files in ComfyUI to complete the setup.
  • 🎨 Use the example prompt provided to generate an image, showcasing the model's ability to interpret natural language descriptions.
  • 🌌 The generated image depicts a female character with hair resembling the northern lights, demonstrating the model's creative output.
  • 🔍 The script emphasizes the model's responsiveness to natural language prompts, a style closer to SDXL prompting than to booru-tag prompting.
  • 🆓 Acknowledge the release of the model's weights for free, expressing excitement about its availability.
  • 📜 Note the licensing issues and the call to action for the community to engage with Stability AI to resolve them.
  • 👋 Conclude with a reminder to have a great day, highlighting a positive and engaging tone throughout the tutorial.
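
The file placement described in the takeaways can be sketched in Python. This is a minimal illustration, not the presenter's method: the file names are assumptions based on how the repository was laid out at release, `install_models` is a hypothetical helper, and `comfy_root` is assumed to be the folder that contains ComfyUI's `models` directory.

```python
import shutil
from pathlib import Path

# Assumed file names; adjust them if your downloads are named differently.
CHECKPOINT = "sd3_medium.safetensors"
TEXT_ENCODERS = ["clip_g.safetensors", "clip_l.safetensors", "t5xxl_fp16.safetensors"]


def install_models(download_dir: Path, comfy_root: Path) -> list[Path]:
    """Copy downloaded files into the ComfyUI models folders and return the new paths."""
    clip_dir = comfy_root / "models" / "clip"
    ckpt_dir = comfy_root / "models" / "checkpoints" / "sd3"
    # Create the folders if they are missing (the tutorial creates 'sd3' by hand).
    clip_dir.mkdir(parents=True, exist_ok=True)
    ckpt_dir.mkdir(parents=True, exist_ok=True)

    placed = []
    for name in TEXT_ENCODERS:
        placed.append(Path(shutil.copy2(download_dir / name, clip_dir / name)))
    placed.append(Path(shutil.copy2(download_dir / CHECKPOINT, ckpt_dir / CHECKPOINT)))
    return placed
```

Calling `install_models(Path("Downloads"), Path("ComfyUI"))` would copy the three text encoders into `models/clip` and the checkpoint into `models/checkpoints/sd3`; both paths here are placeholders.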

Q & A

  • What is the main topic of the tutorial video?

    -The main topic of the tutorial video is how to use Stable Diffusion 3 Medium and ComfyUI locally.

  • Where should one go to access the Stable Diffusion 3 model?

    -To access the Stable Diffusion 3 model, one should go to Hugging Face and fill out the form to gain access to the repository.

  • What files need to be downloaded from Hugging Face for using Stable Diffusion 3 Medium?

    -The files to download include sd3_medium.safetensors, the text encoders (CLIP-G, CLIP-L, and T5-XXL in fp16 format), and the ComfyUI workflows.

  • Why is it necessary to update ComfyUI before installing new models?

    -It is necessary to update ComfyUI to ensure compatibility with the new models and to get the latest features and improvements.

  • How does one update ComfyUI according to the tutorial?

    -To update ComfyUI, one should go to the ComfyUI directory, navigate to the 'update' folder, and run the 'update_comfyui.bat' file.

  • What is the recommended workflow to use with the Stable Diffusion 3 Medium model?

    -The recommended workflow to use with the Stable Diffusion 3 Medium model is the 'basic inference workflow'.

  • Where should the downloaded CLIP models be placed in the ComfyUI directory structure?

    -The downloaded CLIP models should be placed in the 'clip' folder within the 'models' directory of ComfyUI.

  • What should one do if there is no 'sd3' folder in the 'checkpoints' directory?

    -If there is no 'sd3' folder in the 'checkpoints' directory, it is recommended to create one and then place the sd3_medium.safetensors file inside it.

  • How does one start ComfyUI after updating and installing the new models?

    -After updating and installing the new models, one should go back to the base directory of ComfyUI and run the 'run_nvidia_gpu.bat' file, which starts ComfyUI.

  • What is the example prompt provided in the tutorial for generating an image with Stable Diffusion 3 Medium?

    -The example prompt is 'a female character with long flowing hair that appears to be made of ethereal swirling patterns resembling the northern lights or Aurora Borealis'.

  • What issue is mentioned regarding the licensing of the Stable Diffusion 3 model?

    -The issue mentioned is that the licensing is a bit unclear or 'messed up', and the community is encouraged to open an issue or contact Stability AI to update the license.

Outlines

00:00

🎨 Introduction to Stable Diffusion 3 Medium

The video begins with an introduction to the Stable Diffusion 3 Medium model, a new and exciting tool in the world of AI-generated art. The host explains that the model is available on Hugging Face and requires filling out a form to access it. The viewer is guided through the process of downloading the necessary files, including the safetensors checkpoint, the text encoders, and the basic inference workflow for the model. The host also advises viewers to update their ComfyUI installation to ensure compatibility with the new model.

Keywords

💡Stable Diffusion 3

Stable Diffusion 3 is a text-to-image model family from Stability AI, a type of artificial intelligence that generates images from textual descriptions. The video uses its medium-sized variant, Stable Diffusion 3 Medium, and demonstrates how to download it and use it for image generation, showcasing its capabilities in creating detailed and imaginative visuals.

💡ComfyUI

ComfyUI is a node-based graphical interface for building and running diffusion-model workflows. In the context of the video, it is the software the presenter uses to run Stable Diffusion 3 Medium: they update it, install the new model files, and generate images with it.

💡Hugging Face

Hugging Face is a company that provides a platform for sharing machine learning models. In the script, it is mentioned as the source from which the Stable Diffusion 3 model and related files are downloaded after filling out a form and agreeing to access the repository.

💡Gated Model

A gated model is one that requires some form of access control, such as filling out a form or agreeing to terms and conditions. The script mentions that Stable Diffusion 3 is a gated model, indicating that viewers need to go through a process to gain access to it.

💡Tensors

In the context of machine learning and artificial intelligence, tensors are multi-dimensional arrays of numerical values used to represent complex data, including a model's weights. The 'safe tensors' mentioned in the script are safetensors files, a format for storing those tensors; the Stable Diffusion 3 Medium checkpoint and its text encoders are distributed as .safetensors files.
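
To make the idea of tensors stored in a file concrete, the sketch below lists the tensors in a .safetensors file using only the Python standard library. It relies on the format's documented layout (an 8-byte little-endian header length followed by a JSON header); the function names are hypothetical, and this is an illustration rather than how ComfyUI loads models.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path: Path) -> dict:
    """Return the JSON header of a .safetensors file without loading any weights."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # 8-byte little-endian length
        return json.loads(f.read(header_len))


def summarize(header: dict) -> dict:
    """Map each tensor name to (dtype, shape), skipping the optional metadata entry."""
    return {
        name: (info["dtype"], info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    }
```

Running `summarize(read_safetensors_header(Path("sd3_medium.safetensors")))` on a downloaded checkpoint would show each weight's data type and dimensions; the path is a placeholder.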

💡Text Encoders

Text encoders are models that convert natural language text into a numerical representation that an image-generation model can use. In the video, two CLIP (Contrastive Language–Image Pre-training) models, CLIP-G and CLIP-L, and the T5-XXL model are the text encoders that are downloaded and used alongside Stable Diffusion 3 Medium to turn textual prompts into images.

💡Checkpoints

In machine learning, a checkpoint is a saved state of a model, allowing for recovery, evaluation, or later use. The script mentions placing the downloaded 'sd3_medium.safetensors' file into a checkpoints folder, which is part of setting up the Stable Diffusion 3 Medium model for use.

💡Nvidia GPU

Nvidia GPUs (Graphics Processing Units) are specialized hardware used to accelerate the complex computations required for running AI models. The script instructs the viewer to run 'run_nvidia_gpu.bat', the batch file that launches ComfyUI on an Nvidia GPU.

💡Workflow

A workflow in the context of software and AI refers to a sequence of steps or processes that are followed to complete a task. The video mentions a 'basic inference workflow' which is a set of instructions used to run the Stable Diffusion 3 model and generate images from text prompts.

💡Queue Prompt

In the script, 'Queue Prompt' refers to the ComfyUI button that submits the current workflow, including its text prompt, for execution. The presenter uses it to demonstrate the image generation capabilities of the Stable Diffusion 3 Medium model with a descriptive text prompt about a female character.
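
Queuing a prompt can also be done from a script, since a running ComfyUI instance accepts workflows over HTTP. The sketch below only builds the request; it assumes the default local address (127.0.0.1, port 8188), the `/prompt` endpoint used in ComfyUI's own API example script, and a workflow exported in ComfyUI's API format. `build_queue_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_queue_request(
    workflow: dict, host: str = "127.0.0.1", port: int = 8188
) -> urllib.request.Request:
    """Build a POST request that asks a running ComfyUI instance to queue a workflow."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/prompt",  # assumed default address and endpoint
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(build_queue_request(workflow))` only works while ComfyUI is running; the button in the interface remains the simpler route for one-off generations.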

💡Ethereal

Ethereal describes something that is extremely delicate and light, often associated with a heavenly or supernatural quality. In the video, the term is used in the example prompt to describe the flowing hair of a female character, suggesting an otherworldly and mesmerizing appearance in the generated image.

Highlights

Introduction to using Stable Diffusion 3 Medium with ComfyUI.

Accessing the gated model on Hugging Face and filling out the form to download the model.

Downloading the necessary files: sd3_medium.safetensors, the text encoders, and the ComfyUI workflows.

Updating ComfyUI to the latest version for compatibility with the new model.

Installing CLIP models into the ComfyUI directory for text-to-image functionality.

Creating a new sd3 folder in the checkpoints directory for the Stable Diffusion 3 Medium model.

Instructions on how to load the sd3_medium.safetensors checkpoint into ComfyUI.

Running run_nvidia_gpu.bat to start ComfyUI on an Nvidia GPU.

Loading the checkpoint and CLIP files in ComfyUI for Stable Diffusion 3 Medium.

Demonstration of the model's ability to generate images from natural language prompts.

Example prompt provided: a female character with long flowing hair resembling the northern lights.

Comparison of the model's prompt usage to a more natural language style rather than a tag style.

Observation of the model's impressive image generation capabilities.

Discussion on the model's licensing and the need for community effort to address it.

Encouragement for users to open issues or contact Stability AI regarding the license.

Conclusion and well-wishes for the viewers.