/r/StableDiffusion
/r/StableDiffusion is an unofficial community embracing open-source and local AI image generation and everything related to it. Post art, ask questions, create discussions, contribute new tech, or browse the subreddit. It’s up to you.
All posts must be Open-source/Local AI image generation related All tools for post content must be open-source or local AI generation. Comparisons with other platforms are welcome. Post-processing tools like Photoshop (excluding Firefly-generated images) are allowed, provided they don't drastically alter the original generation.
Be respectful and follow Reddit's Content Policy This Subreddit is a place for respectful discussion. Please remember to treat others with kindness and follow Reddit's Content Policy (https://www.redditinc.com/policies/content-policy).
No X-rated, lewd, or sexually suggestive content This is a public subreddit and there are more appropriate places for this type of content such as r/unstable_diffusion. Please do not use Reddit’s NSFW tag to try and skirt this rule.
No excessive violence, gore or graphic content Content with mild creepiness or eeriness is acceptable (think Tim Burton), but it must remain suitable for a public audience. Avoid gratuitous violence, gore, or overly graphic material. Ensure the focus remains on creativity without crossing into shock and/or horror territory.
No reposts or spam Do not make multiple similar posts, or post things others have already posted. We want to encourage original content and discussion on this Subreddit, so please do a quick search before posting something that may have already been covered.
Limited self-promotion Open-source, free, or local tools can be promoted at any time (once per tool/guide/update). Paid services or paywalled content can only be shared during our monthly event. (There will be a separate post explaining how this works shortly.)
No politics General political discussions, images of political figures, or propaganda are not allowed. Posts regarding legislation and/or policies related to AI image generation are allowed as long as they do not break any other rules of this subreddit.
No insulting, name-calling, or antagonizing behavior Always interact with other members respectfully. Insulting, name-calling, hate speech, discrimination, threatening content and disrespect towards each other's religious beliefs is not allowed. Debates and arguments are welcome, but keep them respectful—personal attacks and antagonizing behavior will not be tolerated.
No hateful comments about art or artists This applies to both AI and non-AI art. Please be respectful of others and their work regardless of your personal beliefs. Constructive criticism and respectful discussions are encouraged.
Use the appropriate flair Flairs are tags that help users understand the content and context of a post at a glance.
I have used stable diffusion for a while now, and am proficient with it on SD Forge. There is a particular image I've tried to generate for a long time and just cannot get prompt adherence to generate what I want.
I'm wondering if there is somewhere I can reach out for help to see if someone else can achieve what I've failed at for months. I can achieve the result I'm looking for on Bing AI with ease; I understand it has an excellent ability to understand natural language. Is there a way I can get SD to cooperate like Bing AI?
I appreciate the help in advance!
I've noticed something incredibly annoying with SDXL, and no amount of tweaking the command-line launch arguments (including xformers, no VAE, etc. that I see posted thousands of times elsewhere), changes to my system, or changes to my browser settings seems to fix it.
For months, ANY of the SDXL models that I have will sporadically freeze around 99% -- basically whatever last sampling step it does...at that point I have to forcefully close everything down and restart SD. They'll do like 5-10 gens successfully, and then the above.
I believe it was hugging face that I went through to initially install everything.
Last month I finally realized what I was doing right before it would freeze like that.
I was touching the prompt. I'd see something about it that I wanted to adjust, and then while it generates its next image, I would go ahead and use that time to make my adjustments.
That's when it'd freeze. As if my edits to the prompt were somehow tied, in any way, to the generation and display of the image.
If I even so much as look at the prompt the wrong way, while an SDXL model is running, it hangs up at around 99%. As mentioned in the title, the exact percentage ranges from 96-99%, dependent on the sampling steps I have. So it's pretty much just the final step of the image where it compiles everything that it's done along the way, and then goes to save it and then display it.
Has anyone run into this before, and do you know of a fix for it?
Is it possible?
I’m working on a visual novel and want to create different poses and scenes for a character I like while keeping them consistent and detailed for concept art. What’s the best way to upload a base image and get variations that stay true to the original character’s look?
Bonus points for anyone who could give some tips on the prompting/technical side, e.g. how much scale is best, how much prompting, etc.
Hey all, I will be sharing some exciting Pony Diffusion V7 updates tomorrow on CivitAI Twitch Stream at 2 PM EST // 11 AM PST. Expect some early images from V7 micro, updates on superartists, captioning and AuraFlow training (in short, it's finally cooking time).
First, I have minimal technical skills. That said, I am working on building a website for a local service business using AI. I have contracted with a web designer as well as a coder. The coder has created a Python script to automate web page and blog creation. However, we have encountered some issues in automating the images that will go with the blog posts. ChatGPT images are not very good. Is Stable Diffusion the answer to these problems? If so, what is the best way to use Stable Diffusion via API so that I can have fully automated blog posts?
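For context, here is a minimal sketch of the kind of automation I have in mind, assuming a locally running AUTOMATIC1111/Forge instance launched with the --api flag; the URL, prompt, and filenames are placeholders that my coder's script would fill in per blog post:

```python
# A minimal sketch, assuming a local AUTOMATIC1111/Forge webui started with --api.
# URL, prompt, and filenames are placeholders.
import base64
import requests

WEBUI_URL = "http://127.0.0.1:7860"  # default local address

payload = {
    "prompt": "flat illustration for a blog post about kitchen plumbing, clean, friendly",
    "negative_prompt": "text, watermark, low quality",
    "steps": 30,
    "width": 1024,
    "height": 768,
    "cfg_scale": 6,
}

resp = requests.post(f"{WEBUI_URL}/sdapi/v1/txt2img", json=payload, timeout=600)
resp.raise_for_status()

# The API returns the generated images as base64-encoded strings.
for i, img_b64 in enumerate(resp.json()["images"]):
    with open(f"blog_image_{i}.png", "wb") as f:
        f.write(base64.b64decode(img_b64))
```

The same endpoint pattern works from any script, so the image step can slot straight into the existing Python blog pipeline.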
I’ve been trying to make a character Lora for PonyXL using about 30 images. All the images are photos, but I want to train out the style. I put tags like “Real life, photorealistic, photo, source_real, realistic” but even after that it seems to make output images that are more realistic upon use. How would I go about training out the style? I’ve heard for consistency with characters, it’s better to use the same style across all of the dataset as it can cause inconsistency otherwise. Plus, for this in particular, it’d be hard to obtain digital art or other styles to train the Lora on.
I’m curious too if it’s just overtrained. I used a low-epoch version of my model as well, just in case, but it didn’t make a huge difference.
I mostly use Pony and SDXL, and I noticed recently that when I include the words “eyes” or “face” in the prompt, those details show up much more clearly defined in the final result. Add 5 to 10 high-res steps (use a KSampler to do image-to-image with a denoise of 0.2 to 0.3) after the initial image is generated and you’ll end up with more clearly defined line art as well.
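Roughly, that extra refinement pass looks like this outside of ComfyUI, sketched with diffusers (the model ID is just an example checkpoint, and the strength mirrors the denoise range above):

```python
# A rough diffusers equivalent of the extra "hi-res" pass described above:
# generate normally, then run a short img2img pass with low denoise
# (strength ~0.2-0.3) to sharpen faces, eyes, and line art.
import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

prompt = "portrait of a woman, detailed face, detailed eyes"
image = base(prompt=prompt, num_inference_steps=30).images[0]

# Reuse the already-loaded weights for the img2img pass.
refine = StableDiffusionXLImg2ImgPipeline(**base.components)

refined = refine(
    prompt=prompt,
    image=image,
    strength=0.25,            # low denoise keeps the composition intact
    num_inference_steps=30,   # ~0.25 * 30 = roughly 7-8 actual denoising steps
).images[0]
refined.save("refined.png")
```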
I recently finally figured out how to get my LoRAs looking more consistent, but I ran into a new issue where my most recently trained LoRA looks slightly washed out and blurry to me compared to the same prompt used without a LoRA.
I've got some examples below, so you guys can take a look at how these LoRAs look. The first picture with the Asian woman wasn't generated with a character LoRA, so it came out with the quality I prefer my generated characters to have. The other ones with the Caucasian woman were generated with my LoRA, and you might be able to see the difference where some of the quality has been lost. I included one of the original images that I trained the LoRA on, so that you can see the original quality.
I was wondering if it's recommended or not to train LoRAs with higher-quality photos. I'll admit I used fairly low-quality images to train them, so I was wondering if that has anything to do with these LoRAs coming out looking washed out?
Prompted without character LoRA
Prompted without a character LoRA
Prompted with a character LoRA
Prompted with a character LoRA
I want to learn how to draw several of the same object with AUTOMATIC1111 or via the API. For example, how can I give every person in the picture the same face if I have a LoRA for a specific face?
I have a 2050 with 4 GB of VRAM. I know it's low, but can it be the reason for this error?
Because I have seen people generate images with the same amount of VRAM.
I had my yard leveled and now it's an open canvas. What do you think I should build in this space?
Are there any free AI art generators that convert drawings into digital images?
Hey guys,
How can I achieve the same quality in Flux Dev LoRA pictures as on replicate.com? I've used it for my LoRAs, and the pictures there are very good. Locally, on ComfyUI and Forge, I can't get results like that. I tried the same prompts and the results are not as realistic.
I want to upgrade my GPU from an RX 6600 to an RTX 4060 Ti 16 GB because I don't feel like getting a new PSU (mine is a 550 W).
But I'm thinking I should, and if I do I would get a 4070 or even a 4070 SUPER. Would it be bottlenecked by my 5600G CPU?
Which LoRA can I use to generate the most realistic hands?
I'm trying to get a better understanding of the practical effects of adjusting the start and end percentages in ControlNet. Does anyone know of a good tutorial or example that actually shows how these adjustments impact the results? Ideally, I'm looking for something straightforward without AnimateDiff or overly complex setups: just a plain, clear explanation with some examples.
So far, most of the tutorials I've found just repeat what others or the official documentation say, without showing how this affects the outcome in real scenarios. Any insights or resources would be really appreciated!
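For reference, the experiment I'd like to see is something like this diffusers sketch, where only the control window changes between runs (the model repos are just examples; in A1111 these map to the Starting/Ending Control Step sliders, if I understand correctly):

```python
# Same seed, different ControlNet windows: the ControlNet only steers the
# denoising steps that fall inside [start, end] as a fraction of total steps.
# Model IDs are example repos; the control image is assumed preprocessed.
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

canny = load_image("canny_edges.png")   # your preprocessed control image
prompt = "a modern living room, soft daylight"

for start, end in [(0.0, 1.0), (0.0, 0.5), (0.5, 1.0)]:
    generator = torch.Generator("cuda").manual_seed(42)
    image = pipe(
        prompt,
        image=canny,
        control_guidance_start=start,
        control_guidance_end=end,
        num_inference_steps=30,
        generator=generator,
    ).images[0]
    image.save(f"control_{start}_{end}.png")
```

Comparing the three outputs side by side shows the effect directly: ending early lets the model drift away from the edges in the fine details, while starting late only constrains the final structure.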
I'm integrating garment design into my project pipeline and need an efficient way to create SVG vectors of clothing to test multiple textures on different sections (sleeves, collars, etc.).
LLM Model: https://huggingface.co/fancyfeast/llama-joycaption-alpha-two-hf-llava
I found this incredible LLM for describing images, and it outperforms models like Florence-2-large.
The problem is that I can't seem to figure out how to run it as API. I tried pushing it to sites like replicate.com but I don't seem to quite get it.
Does anyone have any ideas, or could someone publish the model on an LLM site like Replicate?
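Here's a rough sketch of running it behind a small local API, assuming the checkpoint loads with transformers' standard LLaVA classes (the "hf-llava" repo name suggests it does, but double-check the model card for the exact prompt format); the endpoint name and prompts are placeholders:

```python
# A rough local-API sketch for the JoyCaption checkpoint above, assuming it
# loads with transformers' LLaVA classes. Check the model card for the
# recommended prompt/template before relying on this.
import io

import torch
from fastapi import FastAPI, UploadFile
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "fancyfeast/llama-joycaption-alpha-two-hf-llava"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
).eval()

app = FastAPI()

@app.post("/caption")
async def caption(file: UploadFile):
    image = Image.open(io.BytesIO(await file.read())).convert("RGB")

    convo = [
        {"role": "system", "content": "You are a helpful image captioner."},
        {"role": "user", "content": "Write a long descriptive caption for this image."},
    ]
    prompt = processor.apply_chat_template(convo, tokenize=False, add_generation_prompt=True)
    inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
    inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)  # match model dtype

    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=300)

    # Strip the prompt tokens and return only the newly generated caption.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return {"caption": processor.decode(new_tokens, skip_special_tokens=True)}

# Run with: uvicorn this_file:app --port 8000
```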
Hello,
I am having significant difficulties in Forge with getting the UI to recognize different controlnet inputs with XL models. It only seems to process the first image, after which the image cannot be replaced without restarting the UI.
I am trying to automate an img2img SD task with controlnets and I was wondering if anyone had working command-line code that called a non-default stable diffusion XL model (specifically, albedobase, but I figure the code would be similar) and a controlnet XL model. Ideally I would not have to load the entire model for every image.
My desired code would look as follows:
-load image
-load mask (this is generated in a separate step which already works)
-preprocess image for controlnet 1
-preprocess image for controlnet 2
-load base image for controlnet 3
-run img2img with mask
I am looking for any code fragments that call an SDXL model that isn't base SDXL, and any code fragments that use an SDXL ControlNet. Thank you.
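To make it concrete, here's a rough sketch of the shape of script I'm after, written with diffusers rather than webui command-line calls; the checkpoint path, ControlNet repos, prompt, and file-naming scheme are all placeholders for albedobase and whichever SDXL ControlNets and preprocessing steps actually apply:

```python
# Sketch: load a non-base SDXL checkpoint plus two SDXL ControlNets once,
# then loop over image/mask pairs for masked img2img (inpainting).
# All paths, repo names, and scales are placeholders.
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetInpaintPipeline
from diffusers.utils import load_image

controlnets = [
    ControlNetModel.from_pretrained("diffusers/controlnet-canny-sdxl-1.0",
                                    torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("diffusers/controlnet-depth-sdxl-1.0",
                                    torch_dtype=torch.float16),
]
pipe = StableDiffusionXLControlNetInpaintPipeline.from_pretrained(
    "path/or/repo/of/albedobase-xl",        # placeholder for your checkpoint
    controlnet=controlnets,
    torch_dtype=torch.float16,
).to("cuda")

jobs = [("image_01.png", "mask_01.png"), ("image_02.png", "mask_02.png")]

for img_path, mask_path in jobs:
    image = load_image(img_path)            # load image
    mask = load_image(mask_path)            # load mask (generated elsewhere)

    # Control images are assumed to be precomputed by your preprocessing step
    # (e.g. canny/depth maps saved next to each input image).
    canny_map = load_image(img_path.replace(".png", "_canny.png"))
    depth_map = load_image(img_path.replace(".png", "_depth.png"))

    # run img2img with mask
    result = pipe(
        prompt="your prompt here",
        image=image,
        mask_image=mask,
        control_image=[canny_map, depth_map],
        controlnet_conditioning_scale=[0.7, 0.5],
        strength=0.6,
        num_inference_steps=30,
    ).images[0]
    result.save(img_path.replace(".png", "_out.png"))
```

Because the pipeline and ControlNets are loaded before the loop, each additional image only costs the inference time, not a full model load.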
Setting up a new box for learning AI projects, especially image gen.
I'm leaning towards SwarmUI because it looks like it works easily with SDXL, LoRAs, Flux, and Mochi, without the learning curve of using ComfyUI directly.
Any pros or cons based on what's available this week?
Also is Python 3.10.6 still the best version to install?
If you've been creating content of any kind with Generative AI, you might have become infected with the "must have a lora" disease. The symptoms usually start when you see a new model and immediately start asking "is there a lora for..."
Do you NEED a lora?
Probably not. Unless you're trying to use base Stable Diffusion 1.5; then you probably need a fine-tuned checkpoint.
So what is a lora? LoRA stands for Low-Rank Adaptation and was developed as a way to add or revise information that a BASE model already has, or needs, without the massive expense of retraining the entire model. Here's the paper on it for additional research: https://arxiv.org/abs/2106.09685
Emphasis on "already has."
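For intuition, here's a minimal sketch (plain PyTorch, illustrative dimensions only) of what a LoRA actually does to a weight matrix:

```python
# Minimal sketch of the LoRA idea: instead of retraining a full weight matrix
# W, train two small matrices A and B whose product is added on top of the
# frozen W. Dimensions here are just for illustration.
import torch

d_out, d_in, rank = 768, 768, 8       # rank << d_in, d_out

W = torch.randn(d_out, d_in)          # frozen base weight (not trained)
A = torch.randn(rank, d_in) * 0.01    # small trainable matrix
B = torch.zeros(d_out, rank)          # small trainable matrix (starts at zero)
alpha = 1.0                           # scaling factor ("LoRA strength")

def forward(x):
    # Base behaviour plus a cheap low-rank "patch" on top of it.
    return x @ W.T + alpha * (x @ A.T @ B.T)

full_params = W.numel()               # 589,824 parameters in the full matrix
lora_params = A.numel() + B.numel()   # 12,288 -- about 2% of the full matrix
print(full_params, lora_params)
```

That's why a lora is cheap to train, and also why it can only nudge what the base model already knows rather than replace it.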
The really old models might have had the information, but they were older technology and had other issues that kept images from coming out looking good. The new models today - SD3.5, Flux, and others - have 8 billion or more parameters. Unless you are doing something so specific there's no way the model can know about it, you don't need a lora.
What you DO need, however, is to get rid of your preconceived ideas of how prompts SHOULD work, and learn how the model you want to use thinks - and then prompt it correctly so IT can create the image you are visualizing instead of creating something you don't like.
You're talking to a computer, not a human. It doesn't think like you do, and it can't read your mind to see the image you have in there. If you tell it "a girl wearing a coat" and what you are visualizing is a tall, thin supermodel wearing some sort of jacket made of aluminum foil, it's not going to guess that's what you want, and you sure didn't tell it. And don't tell it stuff it can't draw. Don't tell it something like "a dog sitting on the curb, thinking about what it had for breakfast and wondering why its boy isn't home from school yet" - the AI can't draw most of that unless you also want cartoon thought bubbles. Tell it something like "a dog sitting on the curb, staring down the street, ears up, watching the street."