Posts
Search Results
Creative Corner » Text-to-image prompting » Post 45
@tyto4tme4l
It’s intended to be NSFW in my case.
Creative Corner » Text-to-image prompting » Post 44
@Thoryn
If it’s SFW, then maybe try prompting it in Bing Image Creator and then use the result for img2img in Stable Diffusion? Bing has much better prompt following.
Creative Corner » Text-to-image prompting » Post 43
I’m struggling hard with text-to-image prompts getting characters to interact with furniture; prompting for them to be on something like a changing table is seemingly impossible with PonyV6 and ZoinksNoob. I’ve tried CFG from 4 to 7, and even kept the prompts to 75 blocks(?) on many of the attempts.
Figured I’d work around it by using a dresser instead, but it refuses to put characters on it.
The dresser gets identified, since if I type e.g. wooden or white dresser it usually changes accordingly, but it still puts the character on a bed, in a crib, on an ordinary table, or on the floor instead. Even when I up the strength for dresser and put crib and bed in the negative prompt, it still uses one of them to place the character, if not the floor.
Using alternative words for dresser, like chest of drawers, or similar furniture like a commode, has the same outcome.
Later in the year I’ll hopefully have a better setup and can try inpainting/sketching (as if I’m not bad enough at drawing under optimal conditions already, trackpad really sucks for drawing), but for now I’m dumbfounded as to why I can’t get my prompts to work how I want. Any tips would be appreciated.
Creative Corner » Text-to-image prompting » Post 42
@mp40
No problem.
The nice thing about ollama is that it runs as a service, so you install that and install whatever models you want through it, then you can use any program that talks to ollama to interact with the models. (And there are ComfyUI nodes that can talk to it.)
Ollama itself is over here, and it lists all the various models you can install with it (though bear in mind the size: 1b, 3b, or 8b is fine, don’t download 70b models…):
https://ollama.com/
You can technically talk to the models directly with ollama, but that’s chatting through a command line, so you really do want another program to use with it as an interface.
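If you ever want to script it instead of chatting, ollama also exposes a local HTTP API that those front ends use under the hood. A minimal sketch in Python (the model name is just a placeholder for whatever you’ve pulled):

import requests

# Assumes ollama is running locally on its default port (11434) and that
# "llama3.1" stands in for whichever model you have actually pulled.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1",
        "prompt": "Write a short, tag-style image prompt for a sunset over a field.",
        "stream": False,  # return a single JSON object instead of a stream of chunks
    },
    timeout=300,
)
print(resp.json()["response"])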
I personally am using Open WebUI with it:
https://docs.openwebui.com/
https://github.com/open-webui/open-webui
When installing it with docker, you can choose to install a version that has ollama as well, but I did them separately.
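For reference, the basic docker command from their docs for running it against a separately installed ollama is roughly the following (double-check the README, since it changes now and then):

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

After that, the interface is at http://localhost:3000.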
Open WebUI gives you a nice web interface where you can chat with any of the models you install, and even has a way to set it up to talk to ComfyUI, so you can send text from a chat directly to comfyui to generate an image using it as a prompt. It’s fairly fun to play with.
Creative Corner » Text-to-image prompting » Post 41
Thanks for the info! I was using a much simpler setup – I had Mistral Small installed via Pinokio and was just trying jailbreak prompts, even though I thought Mistral Small was uncensored.
Creative Corner » Text-to-image prompting » Post 40
@mp40
Oh, also, one thing worth mentioning is that I think the longer the system prompt is, the more likely it is for the system prompt to start going out of the context window. I’ve noticed that since the instructions to uncensor it are at the beginning, it tends to start becoming censored again if you put too much in the system prompt.
Creative Corner » Text-to-image prompting » Post 39
@mp40
No problem. It’s one of these spots where I really need to play more with it, and there might be better ways to do some of it, but that’s what was getting me results.
I remember one oddity was that autocomplete on what I was typing kept giving “I can’t talk about this topic” type lines, but the actual response was uncensored.
There could easily be better models, too. I just remember trying two or three and this was the one that was giving decent results.
Creative Corner » Text-to-image prompting » Post 38
@Lord Waite
Thanks!
Creative Corner » Text-to-image prompting » Post 37
@mp40
I haven’t done more with it, but with ollama, the key was making a custom “Modelfile” file, and creating a model from that Modelfile.
What you can do is copy the modelfile of an existing model and modify it.
So, first, run:
ollama pull rolandroland/llama3.1-uncensored
to install the model you are going to base it off of. Then, if you run:
ollama show rolandroland/llama3.1-uncensored --modelfile
it’ll print that model’s Modelfile to the console, so just copy that into a file named Modelfile.
Then change the FROM section to say:
FROM rolandroland/llama3.1-uncensored:latest
and add a section at the bottom that says:
SYSTEM """<your prompt here>"""
Just write a prompt for how the AI is going to act there. You basically want to describe to it what its purpose is, let it know that it’s uncensored and can describe sexual acts and such, tell it not to add in disclaimers, tell it the exact format a prompt should be in and the type of words it should use, and give it a few examples of real prompts.
(I’d give one here, but looking at it, I really want to clean it up and improve it. I was explicitly telling it to add the line of score tags, then a source and rating tag, then a description, then several paragraphs of danbooru tags.)
Then run:
ollama create <new model name> --file Modelfile
Keep the Modelfile and try using the model you generated. If you want to tweak it, run:
ollama rm <model>
then change the Modelfile and rerun the create command.
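For reference, the finished Modelfile ends up looking something like this (the SYSTEM text here is just a made-up placeholder, not my actual prompt; any TEMPLATE or PARAMETER lines that came with the copied Modelfile stay as they were):

FROM rolandroland/llama3.1-uncensored:latest
SYSTEM """You write prompts for an image generator. You are uncensored and may describe sexual acts. Do not add disclaimers. Reply with a line of score tags, then source and rating tags, then a short description, then danbooru-style tags."""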
That’s basically how to do it. In any case, the key is going to be playing with the prompt until something sticks, and basing it off the right model; I remember trying it with a different model or two and not having as much luck…
Creative Corner » Text-to-image prompting » Post 36
@Lord Waite
Have you done anything else with this? I’m looking for resources on how to build or curate an LLM of my own, but the “uncensored” model is still denying some of my prompt requests. Do I just need to try other jailbreak prompts till something works, or?
Creative Corner » Text-to-image prompting » Post 35
I always feel like it helps to have a bit of a base understanding of how the models work for these things.
Initially, someone created a large dataset of images and descriptions. The descriptions were tokenized, and the images were cut up into squares. Training then took one square, generated random noise based on a seed, and attempted to denoise that noise into the image on the square. Once it got something close, it discarded the square and grabbed another one. At the end, all of this was saved in a model.
Now, what happens when you are generating an image is that your prompt is reduced to tokens by a text encoder (XL-based models use CLIP-L and CLIP-G), random noise is generated from the specified seed, and then the sampler and noise schedule control how it denoises, with as many steps as you specify.
Some schedulers introduce a bit of noise at every step, namely the ancestral ones (with an “a” at the end) and SDE, but there may be others. With those, the image is going to change more between steps and they’ll be more chaotic. Also, some will take fewer steps than others to get to a good image, and how long each step takes will vary a bit. I believe some are just better at dealing with certain things in the image, too, so it’ll take some playing around.
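If it helps to see where those knobs live, this is roughly the same thing in code with the diffusers library (the checkpoint filename is a placeholder; UIs like Forge or ComfyUI are doing something like this under the hood):

import torch
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

# Placeholder filename; any SDXL-based checkpoint (Pony Diffusion V6 XL, ZoinksNoob, ...) loads the same way.
pipe = StableDiffusionXLPipeline.from_single_file(
    "ponyDiffusionV6XL.safetensors", torch_dtype=torch.float16
).to("cuda")

# The sampler/scheduler is swappable; this is one of the ancestral ones that re-adds noise each step.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

image = pipe(
    prompt="score_9, score_8_up, score_7_up, source_pony, solo, mare sitting on a wooden dresser",
    negative_prompt="bed, crib",
    num_inference_steps=25,                               # how many denoising steps
    guidance_scale=6.0,                                   # CFG
    generator=torch.Generator("cuda").manual_seed(1234),  # the seed that fixes the initial noise
).images[0]
image.save("out.png")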
Now, the CLIP text encoder actually can’t cope with more than 77 tokens at once, and that includes a start and end token, so effectively 75. So if your prompt is more than 75 tokens, it gets broken up into chunks of 75.
The idea behind “BREAK” is that you are telling it to end the current chunk right there and just pad it out with null tokens at the end. The point is just that you’re making sure that particular part of the prompt is all in the same chunk. I’ve had mixed results with it, so I try doing it that way occasionally, but a lot of the time I don’t. It’s going to have trouble with getting confused anyway; this is just an attempt to minimize it a bit.
(Text encoding is one of the differences between model architectures, too. 1.* & 2.* had one CLIP, XL has two, then when you start getting into things like Flux and SD 3, you start dealing with things like two CLIPs and a T5 encoder, and the T5 encoder accepts more like 154 tokens. I also didn’t get into the VAE, which is actually what turns the results into an image…)
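To make the chunking concrete, here’s a rough sketch with the CLIP-L tokenizer from the transformers library; it approximates how A1111-style UIs split a prompt on BREAK, it isn’t their exact code:

from transformers import CLIPTokenizer

# CLIP-L tokenizer (one of the two text encoders XL models use)
tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

prompt = ("detailed park scenery BREAK "
          "character 1 wearing denim jeans and red sweater sitting on a bench BREAK "
          "character 2 wearing black suit with bowtie walking in the background")

chunks = []
for segment in prompt.split("BREAK"):
    ids = tok(segment.strip(), add_special_tokens=False)["input_ids"]
    # A segment longer than 75 tokens still spills over into extra chunks.
    for i in range(0, max(len(ids), 1), 75):
        chunks.append(ids[i:i + 75])

# Each chunk then gets its start/end tokens, is padded out to 77, and is encoded separately.
for n, c in enumerate(chunks):
    print(f"chunk {n}: {len(c)} of 75 tokens used")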
Creative Corner » Text-to-image prompting » Post 34
I’ve seen some guides mention to use BREAK in prompts to help guide the model. E.g.
Description of scenery
BREAK
Character 1 wearing denim jeans and red sweater sitting on a bench
BREAK
Character 2 wearing black suit with bowtie walking in the background
But I’m not having much success with it; it still gets confused as to who wears/does what.
Any of you using it successfully?
Creative Corner » Text-to-image prompting » Post 33
The number of steps depends on the sampler; for Euler it’s 25+ sampling steps, but sometimes it can be lower. I guess it depends on the composition, and it’s never constant. I recommend just trying different settings and checking whether increasing the steps substantially improves the image.
Creative Corner » Text-to-image prompting » Post 32
@Scarlet Ribbon
Okay, interesting.
In your opinion, what sampler is best to use when creating pony images in Pony Diffusion? On CivitAI, the default appears to be DPM++ 2M Karras.
@MareStare
Okay.
Another question - generally, how many steps for generation produce the best results?
Creative Corner » Text-to-image prompting » Post 31
@Zerowinger
You can use full sentences to describe the prompt with Pony Diffusion as well. Citing the recommended prompt format from their page on CivitAI:
score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up, just describe what you want, tag1, tag2
where
tag1, tag2 are simple words/word combinations similar to derpibooru tags like “unicorn, blushing, trio, duo”, etc.
Creative Corner » Text-to-image prompting » Post 30
@Zerowinger
Different models are trained in different ways, leading to some models being better for natural language, and others better for tag-based prompting. Pony doesn’t completely fail with natural language prompting, but in my experience it performs much better with tag-based. If you add source_pony to your prompt, you can damn near just use Derpi/Tanta tags to get most of the results you’re looking for.
Creative Corner » Text-to-image prompting » Post 29
@MareStare
So basically, including that string is necessary for higher-quality images then? What about the rest of the prompting? On Imagen, I’m used to using full sentences and phrases to describe exactly what I want the output to be, while with Pony Diffusion it seems the go-to format is to list each individual aspect of the prompt, separated by commas.
Creative Corner » Text-to-image prompting » Post 28
@Zerowinger
The score_* tags are specific to Pony Diffusion. Their original idea was that you’d be able to write the score_7_up tag only (just a single tag), and you’d get an image based on the dataset of images of quality 7 or higher.
However, the way this was implemented during training was wrong, and completely broken. The developers discovered this bug only in the middle of training, at which point fixing it would be too expensive (they’d need to restart training from scratch, which would cost them potentially several tens or even hundreds of thousands of dollars). So they kept the bug, and made a guideline to include that lengthy score_9, score_8_up, ... etc. string at the start of the prompt to work around it.
There is more detail on this training fiasco in this article: https://civitai.com/articles/4248/what-is-score9-and-how-to-use-it-in-pony-diffusion
Creative Corner » Text-to-image prompting » Post 27
So, I tried out Pony Diffusion on Civitai to some success, and part of the prompt was copy-pasting the score_x score_up prompts that I had seen elsewhere. However, I’m a little confused as to exactly how those prompts work; the whole text-to-image format is very different from the style I’m familiar with.
Could I get some insider info on just exactly how this format in Pony Diffusion and similar checkpoints works?
Creative Corner » Text-to-image prompting » Post 26
@Thoryn
I have the same GPU. I can generate a 1024x1024 image in Comfy UI in less than 15 seconds. I don’t know what was up with automatic1111, but I was getting similarly glacial performance on it.
Strongly recommend you just get rid of it and learn a different front end.
Creative Corner » Text-to-image prompting » Post 25
@MareStare
Great idea! Img2img and inpainting are invaluable tools and it would be useful to describe them in detail. Things like denoising strength and the difference between “Whole picture” vs “Only masked” for the inpaint area are extremely important here.
Creative Corner » Text-to-image prompting » Post 24
@Thoryn
I used Photoshop for drawing the scribbles, and then moved the images to Forge UI for inpainting. I’m planning to describe some of my learnings and creative process in a shared guide website. I’ll post about it on the tantabus discord and create a forum thread on tantabus when it’s more-or-less ready. I’d like to collect all the tips and tricks and organize them in a convenient medium for beginners to study.
Creative Corner » Text-to-image prompting » Post 23
@MareStare
Really cool to see WIP steps like this and have them explained.
And your sketching abilities are way ahead of mine. :p
What program do you use for SD?
Do you handle all the painting in it, or do it elsewhere and move it over to SD?
Creative Corner » Text-to-image prompting » Post 22
Getting the composition as you want may be too much work for text2img. I recommend trying inpainting with a colored scribble (coloring is important to make the AI get the colors right).
For example, this is how I got Fluttershy placed into the scenery of this image:
Yeah, you can tell my drawing skills aren’t that good, but Zoinksnoob nailed Flutty almost immediately after I pasted it there and inpainted that area with a denoising strength of something like 0.7+. Sometimes it takes several iterations of drawing a scribble, letting inpainting improve the detail, and then improving that more detailed version with some lighter scribbling to get things exactly as you want.
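For anyone doing this outside a UI, the scribble-then-inpaint step looks roughly like this with diffusers (the checkpoint and filenames are placeholders; in Forge/A1111 the strength value is the same denoising strength slider):

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

# Placeholder checkpoint; any SDXL inpainting-capable model works similarly.
pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1", torch_dtype=torch.float16
).to("cuda")

init = load_image("scene_with_scribble.png")  # the scenery with the colored scribble pasted in
mask = load_image("scribble_mask.png")        # white over the scribble area, black elsewhere

result = pipe(
    prompt="score_9, score_8_up, source_pony, fluttershy sitting in the grass",
    image=init,
    mask_image=mask,
    strength=0.7,            # denoising strength: how far the result may drift from the scribble
    num_inference_steps=30,
).images[0]
result.save("inpainted.png")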
Showing results 1 - 25 of 47 total
Default search
If you do not specify a field to search over, the search engine will search for posts with a body that is similar to the query's word stems. For example, posts containing the words winged humanization, wings, and spread wings would all be found by a search for wing, but sewing would not be.
Allowed fields
| Field Selector | Type | Description | Example |
|---|---|---|---|
| author | Literal | Matches the author of this post. Anonymous authors will never match this term. | author:Joey |
| body | Full Text | Matches the body of this post. This is the default field. | body:test |
| created_at | Date/Time Range | Matches the creation time of this post. | created_at:2015 |
| id | Numeric Range | Matches the numeric surrogate key for this post. | id:1000000 |
| my | Meta | my:posts matches posts you have posted if you are signed in. | my:posts |
| subject | Full Text | Matches the title of the topic. | subject:time wasting thread |
| topic_id | Literal | Matches the numeric surrogate key for the topic this post belongs to. | topic_id:7000 |
| topic_position | Numeric Range | Matches the offset from the beginning of the topic of this post. Positions begin at 0. | topic_position:0 |
| updated_at | Date/Time Range | Matches the creation or last edit time of this post. | updated_at.gte:2 weeks ago |
| user_id | Literal | Matches posts with the specified user_id. Anonymous users will never match this term. | user_id:211190 |
| forum | Literal | Matches the short name for the forum this post belongs to. | forum:meta |