/r/StableDiffusion
/r/StableDiffusion is an unofficial community embracing all related open-source material. Post art, ask questions, start discussions, contribute new tech, or just browse the subreddit. It's up to you.
All posts must be Open-source/Local AI image generation related All tools used for post content must be open-source or local AI generation. Comparisons with other platforms are welcome. Post-processing tools like Photoshop (excluding Firefly-generated images) are allowed, provided they don't drastically alter the original generation.
Be respectful and follow Reddit's Content Policy This Subreddit is a place for respectful discussion. Please remember to treat others with kindness and follow Reddit's Content Policy (https://www.redditinc.com/policies/content-policy).
No X-rated, lewd, or sexually suggestive content This is a public subreddit and there are more appropriate places for this type of content such as r/unstable_diffusion. Please do not use Reddit’s NSFW tag to try and skirt this rule.
No excessive violence, gore or graphic content Content with mild creepiness or eeriness is acceptable (think Tim Burton), but it must remain suitable for a public audience. Avoid gratuitous violence, gore, or overly graphic material. Ensure the focus remains on creativity without crossing into shock and/or horror territory.
No reposts or spam Do not make multiple similar posts, or post things others have already posted. We want to encourage original content and discussion on this Subreddit, so please do a quick search before posting something that may have already been covered.
Limited self-promotion Open-source, free, or local tools can be promoted at any time (once per tool/guide/update). Paid services or paywalled content can only be shared during our monthly event. (There will be a separate post explaining how this works shortly.)
No politics General political discussions, images of political figures, or propaganda are not allowed. Posts regarding legislation and/or policies related to AI image generation are allowed as long as they do not break any other rules of this subreddit.
No insulting, name-calling, or antagonizing behavior Always interact with other members respectfully. Insulting, name-calling, hate speech, discrimination, threatening content, and disrespect towards each other's religious beliefs are not allowed. Debates and arguments are welcome, but keep them respectful—personal attacks and antagonizing behavior will not be tolerated.
No hateful comments about art or artists This applies to both AI and non-AI art. Please be respectful of others and their work regardless of your personal beliefs. Constructive criticism and respectful discussions are encouraged.
Use the appropriate flair Flairs are tags that help users understand the content and context of a post at a glance.
I've been trying out magespace's pro pass subscription and have started creating images from prompts. I can't find a way to have it generate multiple images at the same time from a single prompt. Am I just missing how to do that, or does magespace not allow it? Generating multiple images simultaneously from one prompt is a feature on all the other Stable Diffusion sites I've used.
Trying out Stable Diffusion using their API key. Now that I have a decent understanding of the API, is there any guidance on how to run the model locally?
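For reference, one common way to run a model locally is through the Hugging Face diffusers library. A minimal sketch, assuming an NVIDIA GPU and the SDXL base checkpoint (swap in whichever checkpoint you actually want to use):

import torch
from diffusers import StableDiffusionXLPipeline

# Downloads the checkpoint from the Hugging Face Hub on first run, then uses the local cache
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")  # "mps" on Apple Silicon; "cpu" works but is very slow

image = pipe(
    prompt="a photograph of an astronaut riding a horse",
    num_inference_steps=30,
).images[0]
image.save("output.png")

Tools like AUTOMATIC1111's WebUI or ComfyUI wrap the same idea in a GUI, so they are the usual next step after a script like this.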
Hey, what kind of Nvidia GPU/NPU within a reasonable budget would you recommend for a home lab? I'm running a beat-up 3090 at the moment and looking to upgrade and offload generation tasks so I can keep using my PC while generating.
https://www.instagram.com/reel/DFfvXNKsfWy/?igsh=MnY2cXB3M3VxaTVv I'm about to lose my mind. How can I produce this type of video? Sora, Runway, Luma, and Kling AI are the ones I've tried.
What is the best LLM to create caricatures?
I want to upload a picture of a person and then tell the LLM to create a caricature from it.
It should also be able to work the person's job, say carpenter, into the caricature, and the result should be very playful and creative.
What prompt and what LLM should I use?
Is there a better way to create caricatures?
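For the open-source/local side of this, one hedged option is image-to-image with an SDXL checkpoint and a caricature-style prompt. The checkpoint, strength, and prompt below are illustrative assumptions rather than a known-best recipe, and the input path is a placeholder:

import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # any SDXL checkpoint should slot in here
    torch_dtype=torch.float16,
).to("cuda")

init_image = load_image("person.jpg").resize((1024, 1024))  # the uploaded photo

image = pipe(
    prompt="caricature of a carpenter, exaggerated features, playful, colorful, hand-drawn style",
    image=init_image,
    strength=0.6,        # higher = more stylization, lower = closer to the original photo
    guidance_scale=7.0,
).images[0]
image.save("caricature.png")

Dedicated caricature LoRAs or an IP-Adapter for better likeness are worth exploring on top of this, but that is beyond a minimal sketch.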
I have installed FLUX.1 dev regular (fp16) following Stable Diffusion Art's guide
I am running the base workflow they include in this guide, with the prompt they gave.
I am running it on a mac mini M2 pro, 32GB of Unified Memory.
It took 11 minutes to generate the image. That seems extremely long to me. Is that expected?
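For what it's worth, the fp16 Flux dev transformer plus the T5 text encoder add up to well over 30 GB, so on 32 GB of unified memory some swapping is likely, which would explain times in that range. As a point of comparison outside ComfyUI, a minimal diffusers sketch for Flux on Apple Silicon looks roughly like this (assumptions: a recent diffusers with FluxPipeline, a PyTorch build with bf16 support on MPS, and access to the gated black-forest-labs/FLUX.1-dev repo):

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,  # half-precision weights; fp32 would not fit at all
)
pipe.to("mps")  # Apple Silicon GPU backend

image = pipe(
    prompt="a cat holding a sign that says hello world",
    num_inference_steps=20,
    guidance_scale=3.5,
    height=1024,
    width=1024,
).images[0]
image.save("flux_test.png")

If memory pressure is the bottleneck, a quantized Flux variant (e.g. a GGUF build) is usually the more practical route on 32 GB.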
So I'm managing a business that involves a bunch of LoRA training on Flux and generating images with those LoRAs. I have an RTX 4060 Ti 16GB at the office and it has started to fall short of my needs as I work with more clients. So I was planning to build a new PC to explore and test workflows in ComfyUI and get fast results, while using the 4060 Ti machine to train LoRAs overnight and letting the intern use it to generate with simple workflows. But here is the problem: my budget is too low for the 5090 and it looks like I can't afford it at the moment, plus there is a stock problem in my country. So my choices are down to the 4090 and the 5080. The 5080 has 16 GB of VRAM, so it doesn't feel like an upgrade, but I saw the fp4 model that Nvidia and Black Forest Labs will launch together, exclusive to the 50 series, and now I'm confused. Also, I would have to buy the 4090 secondhand, so I won't be able to write it off on taxes. Any advice for me?
So I have been using Florence in ComfyUI to caption my images for my Hunyuan LoRA training and was wondering if I should try anything else. I haven't messed around a lot with local LLMs; I only have Florence installed because it was used in another image-to-video workflow, I think an LTX one. I deleted everything but the Florence nodes and have been using those.
Is there any other LLM that runs in ComfyUI that you all use and think is better?
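For reference, Florence-2 can also be run for batch captioning outside ComfyUI with plain transformers, which makes it easier to compare against alternatives such as JoyCaption or Qwen2-VL. A minimal sketch, assuming the microsoft/Florence-2-large checkpoint, a CUDA GPU, and a placeholder image path:

import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-large"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, trust_remote_code=True
).to("cuda")
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("dataset/img_0001.png").convert("RGB")
task = "<MORE_DETAILED_CAPTION>"  # Florence-2 task token for long captions

inputs = processor(text=task, images=image, return_tensors="pt").to("cuda", torch.float16)
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=256,
)
text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
print(processor.post_process_generation(text, task=task, image_size=image.size))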
Questions for those with this card:
Thanks!
Hello folks, I've been looking for a good-quality, fully open-source lip-sync model for my project and finally came across LatentSync by ByteDance (TikTok). I have to say that, for me, it delivers some seriously impressive results, even compared to commercial models.
The only problem was that the official Replicate implementation was broken and wouldn’t accept images as input. So, I decided to fork it, fix it, and publish it—now it supports both images and videos for lip-syncing!
If you want to check it out, here’s the link: https://replicate.com/skallagrimr/latentsync
Hope this helps anyone looking for an optimal lip-sync solution. Let me know what you think!
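For anyone who prefers calling it programmatically rather than through the Replicate web page, the official Python client works roughly as below. This is only a sketch: the input field names and whether a version tag is required are assumptions, so check the model page for the actual schema.

import replicate  # pip install replicate; expects REPLICATE_API_TOKEN in the environment

# Community models usually need an explicit version, e.g.
# "skallagrimr/latentsync:<version-hash>" copied from the model page.
output = replicate.run(
    "skallagrimr/latentsync",
    input={
        "video": open("face.mp4", "rb"),   # or a still image, per the post
        "audio": open("speech.wav", "rb"),
    },
)
print(output)  # typically a URL to the generated lip-synced video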
Currently a lawsuit is aiming to shut down 16 of the biggest websites. With more laws being put in place to make it illegal to create and share deepfake content, could these laws possibly even make the tool itself illegal?
I would like to test SwarmUI to create an easy-to-use interface on top of my ComfyUI nodes. However, I just opened the Swarm GitHub page and the last commit was 8 months ago, while the last Comfy commit was 8 HOURS ago. So, is Comfy much more up to date than Swarm? I mean, if Swarm uses Comfy underneath, is it possible to update the Comfy layer in Swarm to get the latest changes?
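Since Swarm runs ComfyUI as a self-managed backend, the Comfy layer can in principle be updated on its own with git, independently of Swarm releases. This is only a rough sketch; the path below is purely an assumption about where your install keeps the bundled ComfyUI checkout, so adjust it to your setup:

:: Path is an assumption; point this at the ComfyUI folder inside your Swarm install
cd /d C:\SwarmUI\dlbackend\ComfyUI
git pull

After pulling, ComfyUI's Python dependencies may also need updating from its requirements.txt using whatever Python environment Swarm set up for the backend.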
I just tried to reinstall SD ZLUDA after getting a new Windows install, but when I run webui.bat it shows this:

venv "D:\SD\stable-diffusion-webui-amdgpu\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.10.1-amd-23-g9604f573
Commit hash: 9604f57342d6c6714acd5a09611ee1aa074e3b17
ROCm: agents=['gfx1032', 'gfx1032']
ROCm: version=5.7, using agent gfx1032
ZLUDA support: experimental
Using ZLUDA in D:\SD\stable-diffusion-webui-amdgpu.zluda
Traceback (most recent call last):
  File "D:\SD\stable-diffusion-webui-amdgpu\launch.py", line 48, in <module>
    main()
  File "D:\SD\stable-diffusion-webui-amdgpu\launch.py", line 39, in main
    prepare_environment()
  File "D:\SD\stable-diffusion-webui-amdgpu\modules\launch_utils.py", line 612, in prepare_environment
    rocm.conceal()
  File "D:\SD\stable-diffusion-webui-amdgpu\modules\zluda_installer.py", line 101, in conceal
    import torch  # noqa: F401
ModuleNotFoundError: No module named 'torch'
Press any key to continue . . .
I've been stuck on this for an hour and I give up. Can you guys help me figure out how to fix it, please?
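One hedged suggestion, assuming the new venv simply never finished installing its packages (the traceback says torch is missing from the venv): delete the venv so webui.bat recreates it and reinstalls the dependencies on the next run.

cd /d D:\SD\stable-diffusion-webui-amdgpu
rmdir /s /q venv
webui.bat

If the reinstall then fails while downloading torch, the console output from that step is the more useful error to post.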
I recently tried the nijijourney app from the Play Store, and it offers 20 free generations per device. I loved the quality and, more importantly, the STYLIZATION of the images. Is there any comparable model I can use in Stable Diffusion?
So I've been playing around with SD for about two weeks now, using ComfyUI and my PC to generate stuff. I was thinking that Flux looks quite nice and wanted to give it a go. Set it up, pressed queue, and my PC basically died lol. So I've come to realize that my PC is probably not remotely good enough for Flux (RTX 3080 10GB, AMD 5800X, 32 GB DDR4 RAM).
Now I was wondering: do y'all just have insane PC specs, or am I doing something wrong? I wasn't even using any LoRAs or other extras, just the basic stuff you need for Flux to work (the full model).
EDIT: Here is a screenshot of the workflow I was using: Workflow. The prompt is the standard one I got from following a tutorial. Starting the generation caused my PC to stutter badly, with very long response times (around 30 seconds just to open Task Manager), and even after stopping SD I couldn't start or play any videos until I restarted the entire system. I haven't tried changing anything since then because I figured my PC is too weak. I never had these problems before when using other models, playing video games, or working in the Adobe suite.
EDIT 2: When starting ComfyUI I always use the run_nvidia_gpu.bat, which I think should be correct.
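One low-effort thing worth trying, on the assumption that the full fp16 Flux model is simply overwhelming the 10 GB of VRAM and spilling into system RAM: add ComfyUI's --lowvram flag to the launch line in run_nvidia_gpu.bat, which in the standalone build typically looks something like this:

.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --lowvram

The fp8 checkpoint or a GGUF-quantized build of Flux is also a much better fit for a 10 GB card than the full fp16 model.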
First of all, please hold your Patreon links. I’m not looking to buy anything today, thanks! 😃
I can create decent CGI animations that are somewhat realistic, but I’m looking for a vid2vid solution to significantly enhance quality while maintaining consistency. I’ve tried some workflows available online, but many are based on SD 1.5, while others produce messy results.
What’s your best recommendation for generating stable vid2vid videos, preferably using SDXL?
My setup: i7 9900K, 32GB RAM, RTX 3060 (12GB).
Thanks in advance!
Here is the screenshot of the problem. The SD 2.1 model works fine without ControlNet, but with ControlNet enabled it generates very messed-up images. I tried changing the sampling steps and CFG scale, but none of it helps. I am using the ControlNet 2.1 version downloaded from here: https://huggingface.co/thibaud/controlnet-sd21
I appreciate any insights.
https://civitai.com/articles/8309/flux1-fp16-vs-fp8-time-difference-on-rtx-4080-super-in-comfyui
This article shows speed comparisons for generation using Flux dev on a 4080 super.
What I don't understand is how the speeds can be so good for the fp16 version of Flux when the model doesn't even fully fit in VRAM.
Is there some rule of thumb for speed degradation per GB of spillover into system RAM? I feel like my intuition is way off... Whenever I read about the best GPUs for SD, everyone says VRAM is essential for speed because if your model doesn't fit on your card you get a huge drop-off, but the results here don't look terrible at all.
Any thoughts?
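One way to sanity-check what is actually resident on the card during a run is to watch free VRAM from Python; a small sketch, assuming a CUDA build of PyTorch (nvidia-smi or ComfyUI's own console log will tell you much the same thing):

import torch

free_bytes, total_bytes = torch.cuda.mem_get_info()  # free and total VRAM on device 0
used_gib = (total_bytes - free_bytes) / 1024**3
print(f"VRAM used: {used_gib:.1f} GiB of {total_bytes / 1024**3:.1f} GiB")

# If used VRAM stays well below the full model size mid-generation, the runtime is
# keeping part of the weights in system RAM and streaming them over PCIe, so the
# slowdown depends on bus bandwidth and how often those layers are needed rather
# than a fixed penalty per GB of spillover.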
I'm training a Flux LoRA on my 3060 and I finally got it working, but I'm getting 235 s/image at 1024x1024 with rank/dim 4 and alpha 2.
Is this reasonable, or is something wrong with my setup? At this rate it's going to take several days of training to finish. Is that realistic?
Here are my parameters:
accelerate launch ^
  --mixed_precision bf16 ^
  --num_cpu_threads_per_process 1 ^
  sd-scripts/flux_train_network.py ^
  --pretrained_model_name_or_path "models\unet\flux1-dev-fp8-e4m3fn.safetensors" ^
  --clip_l "\models\clip\clip_l.safetensors" ^
  --t5xxl "models\clip\t5xxl_fp16.safetensors" ^
  --ae "models\vae\ae.sft" ^
  --cache_latents_to_disk ^
  --save_model_as safetensors ^
  --sdpa --persistent_data_loader_workers ^
  --max_data_loader_n_workers 2 ^
  --seed 42 ^
  --gradient_checkpointing ^
  --mixed_precision bf16 ^
  --save_precision float ^
  --network_module networks.lora_flux ^
  --network_dim 4 ^
  --optimizer_type adafactor ^
  --optimizer_args "relative_step=False" "scale_parameter=False" "warmup_init=False" "weight_decay=0.01" ^
  --split_mode ^
  --network_args "train_blocks=all" ^
  --lr_scheduler constant ^
  --max_grad_norm 0.0 ^
  --sample_prompts="C:\ai\pinokio\api\fluxgym.git\sample_prompts.txt" ^
  --sample_every_n_steps="34" ^
  --learning_rate 0.00015 ^
  --cache_text_encoder_outputs ^
  --cache_text_encoder_outputs_to_disk ^
  --fp8_base ^
  --highvram ^
  --max_train_epochs 100 ^
  --save_every_n_epochs 3 ^
  --dataset_config "dataset.toml" ^
  --output_dir "C:\ai\pinokio\api\fluxgym.git\outputs" ^
  --output_name "**1024-2" ^
  --timestep_sampling shift ^
  --discrete_flow_shift 3.1582 ^
  --model_prediction_type raw ^
  --guidance_scale 1 ^
  --loss_type l2 ^
  --network_alpha 2 ^
  --multires_noise_discount 0.3 ^
  --flip_aug ^
  --text_encoder_lr 0 ^
  --apply_t5_attn_mask