Posts
Search Results
Creative Corner » Text-to-image prompting » Post 45
@tyto4tme4l
It’s intended to be NSFW in my case.
Creative Corner » Text-to-image prompting » Post 44
@Thoryn
If it’s SFW, then maybe try prompting it in Bing Image Creator and then use the result for img2img in Stable Diffusion? Bing has much better prompt following.
Creative Corner » Text-to-image prompting » Post 43
I’m struggling hard with text-to-image prompts getting characters to interact with furniture; prompting for them to be on something like a changing table is seemingly impossible with PonyV6 and ZoinksNoob. I’ve tried CFG from 4 to 7, and even kept the prompts to 75 blocks(?) on many of the attempts.
Figured I’d work around it by using a dresser instead, but it refuses to put characters on it.
The dresser gets identified, since if I type e.g. wooden or white dresser it usually changes accordingly, but it still puts the character on a bed, in a crib, on an ordinary table, or on the floor instead. Even when I up the strength for dresser and put crib and bed in the negative prompt, it still uses one of them to place the character, if not the floor.
Using alternative words for dresser, like chest of drawers, or similar furniture like a commode, has the same outcome.
Later in the year I’ll hopefully have a better setup and can try inpainting/sketching (as if I’m not bad enough at drawing under optimal conditions already, trackpad really sucks for drawing), but for now I’m dumbfounded as to why I can’t get my prompts to work how I want. Any tips would be appreciated.
Creative Corner » Text-to-image prompting » Post 42
@mp40
No problem.
The nice thing about ollama is that it runs as a service, so you install that and install whatever models you want through it, then you can use any program that talks to ollama to interact with the models. (And there are ComfyUI nodes that can talk to it.)
Ollama itself is over here, and it lists all the various models you can install with it (though bear in mind the size: 1b, 3b, or 8b is fine, don’t download 70b models…):
https://ollama.com/
You can technically talk to the models directly with ollama, but that’s chatting through a command line, so you really do want another program to use with it as an interface.
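If you ever want to script it instead of chatting, ollama also exposes a local HTTP API that those front ends use under the hood. A minimal sketch in Python (the model name is just a placeholder for whatever you’ve pulled):

import requests

# Assumes ollama is running locally on its default port (11434) and that
# "llama3.1" stands in for whichever model you have actually pulled.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1",
        "prompt": "Write a short, tag-style image prompt for a sunset over a field.",
        "stream": False,  # return a single JSON object instead of a stream of chunks
    },
    timeout=300,
)
print(resp.json()["response"])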
I personally am using Open WebUI with it:
https://docs.openwebui.com/
https://github.com/open-webui/open-webui
When installing it with docker, you can choose to install a version that has ollama as well, but I did them separately.
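For reference, the basic docker command from their docs for running it against a separately installed ollama is roughly the following (double-check the README, since it changes now and then):

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

After that, the interface is at http://localhost:3000.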
Open WebUI gives you a nice web interface where you can chat with any of the models you install, and even has a way to set it up to talk to ComfyUI, so you can send text from a chat directly to comfyui to generate an image using it as a prompt. It’s fairly fun to play with.
Creative Corner » Text-to-image prompting » Post 41
Thanks for the info! I was using a much simpler setup – I had Mistral Small installed via Pinokio and was just trying jailbreak prompts, even though I thought Mistral Small was uncensored.
Creative Corner » Text-to-image prompting » Post 40
@mp40
Oh, also, one thing worth mentioning is that I think the longer the system prompt is, the more likely it is for the system prompt to start going out of the context window. I’ve noticed that since the instructions to uncensor it are at the beginning, it tends to start becoming censored again if you put too much in the system prompt.
Creative Corner » Text-to-image prompting » Post 39
@mp40
No problem. It’s one of these spots where I really need to play more with it, and there might be better ways to do some of it, but that’s what was getting me results.
I remember one oddity was that autocomplete on what I was typing kept giving “I can’t talk about this topic” type lines, but the actual response was uncensored.
There could easily be better models, too. I just remember trying two or three and this was the one that was giving decent results.
Creative Corner » Text-to-image prompting » Post 38
@Lord Waite
Thanks!
Creative Corner » Text-to-image prompting » Post 37
@mp40
I haven’t done more with it, but with ollama, the key was making a custom “Modelfile” file, and creating a model from that Modelfile.
What you can do is copy the modelfile of an existing model and modify it.
So, first, run:
ollama pull rolandroland/llama3.1-uncensored
to install the model you are going to base it off of. Then, if you run:
ollama show rolandroland/llama3.1-uncensored --modelfile
it’ll print that model’s Modelfile to the console, so just copy that into a file named Modelfile.
Then change the FROM section to say:
FROM rolandroland/llama3.1-uncensored:latest
and add a section at the bottom that says:
SYSTEM """<your prompt here>"""
Just write a prompt for how the AI is going to act there. You basically want to describe to it what its purpose is, let it know that it’s uncensored and can describe sexual acts and such, tell it not to add in disclaimers, tell it the exact format a prompt should be in and the type of words it should use, and give it a few examples of real prompts.
(I’d give one here, but looking at it, I really want to clean it up and improve it. I was explicitly telling it to add the line of score tags, then a source and rating tag, then a description, then several paragraphs of danbooru tags.)
Then run:
ollama create <new model name> --file Modelfile
Keep the Modelfile and try using the model you generated. If you want to tweak it, run:
ollama rm <model>
then change the Modelfile and rerun the create command.
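For reference, the finished Modelfile ends up looking something like this (the SYSTEM text here is just a made-up placeholder, not my actual prompt; any TEMPLATE or PARAMETER lines that came with the copied Modelfile stay as they were):

FROM rolandroland/llama3.1-uncensored:latest
SYSTEM """You write prompts for an image generator. You are uncensored and may describe sexual acts. Do not add disclaimers. Reply with a line of score tags, then source and rating tags, then a short description, then danbooru-style tags."""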
That’s basically how to do it. In any case, the key is going to be playing with the prompt until something sticks, and basing it off the right model; I remember trying it with a different model or two and not having as much luck…
Creative Corner » Text-to-image prompting » Post 36
@Lord Waite
Have you done anything else with this? I’m looking for resources on how to build or curate an LLM of my own, but the “uncensored” model is still denying some of my prompt requests. Do I just need to try other jailbreak prompts till something works, or?
Creative Corner » Text-to-image prompting » Post 35
I always feel like it helps to have a bit of a base understanding of how the models work for these things.
Initially, someone created a large dataset of images and descriptions. The descriptions were tokenized, and the images were cut up into squares. Training then took one square, generated random noise based on a seed, and attempted to denoise that noise into the image on the square. Once it got something close, it discarded the square and grabbed another one. At the end, all of this was saved in a model.
Now, what happens when you are generating an image is that your prompt is reduced to tokens by a text encoder (XL-based models use CLIP-L and CLIP-G), random noise is generated from the specified seed, and then the sampler and noise schedule control how it denoises, with as many steps as you specify.
Some schedulers introduce a bit of noise at every step, namely the ancestral ones (with an “a” at the end) and SDE, but there may be others. With those, the image is going to change more between steps and they’ll be more chaotic. Also, some will take fewer steps than others to get to a good image, and how long each step takes will vary a bit. I believe some are just better at dealing with certain things in the image, too, so it’ll take some playing around.
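If it helps to see where those knobs live, this is roughly the same thing in code with the diffusers library (the checkpoint filename is a placeholder; UIs like Forge or ComfyUI are doing something like this under the hood):

import torch
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler

# Placeholder filename; any SDXL-based checkpoint (Pony Diffusion V6 XL, ZoinksNoob, ...) loads the same way.
pipe = StableDiffusionXLPipeline.from_single_file(
    "ponyDiffusionV6XL.safetensors", torch_dtype=torch.float16
).to("cuda")

# The sampler/scheduler is swappable; this is one of the ancestral ones that re-adds noise each step.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

image = pipe(
    prompt="score_9, score_8_up, score_7_up, source_pony, solo, mare sitting on a wooden dresser",
    negative_prompt="bed, crib",
    num_inference_steps=25,                               # how many denoising steps
    guidance_scale=6.0,                                   # CFG
    generator=torch.Generator("cuda").manual_seed(1234),  # the seed that fixes the initial noise
).images[0]
image.save("out.png")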
Now, the CLIP text encoder actually can’t cope with more than 77 tokens at once, and that includes a start and end token, so effectively 75. So if your prompt is more than 75 tokens, it gets broken up into chunks of 75.
The idea behind “BREAK” is that you are telling it to end the current chunk right there and just pad it out with null tokens at the end. The point is just that you’re making sure that particular part of the prompt is all in the same chunk. I’ve had mixed results with it, so I try doing it that way occasionally, but a lot of the time I don’t. It’s going to have trouble with getting confused anyway; this is just an attempt to minimize it a bit.
(Text encoding is one of the differences between model architectures, too. 1.* & 2.* had one CLIP, XL has two, then when you start getting into things like Flux and SD 3, you start dealing with things like two CLIPs and a T5 encoder, and the T5 encoder accepts more like 154 tokens. I also didn’t get into the VAE, which is actually what turns the results into an image…)
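To make the chunking concrete, here’s a rough sketch with the CLIP-L tokenizer from the transformers library; it approximates how A1111-style UIs split a prompt on BREAK, it isn’t their exact code:

from transformers import CLIPTokenizer

# CLIP-L tokenizer (one of the two text encoders XL models use)
tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

prompt = ("detailed park scenery BREAK "
          "character 1 wearing denim jeans and red sweater sitting on a bench BREAK "
          "character 2 wearing black suit with bowtie walking in the background")

chunks = []
for segment in prompt.split("BREAK"):
    ids = tok(segment.strip(), add_special_tokens=False)["input_ids"]
    # A segment longer than 75 tokens still spills over into extra chunks.
    for i in range(0, max(len(ids), 1), 75):
        chunks.append(ids[i:i + 75])

# Each chunk then gets its start/end tokens, is padded out to 77, and is encoded separately.
for n, c in enumerate(chunks):
    print(f"chunk {n}: {len(c)} of 75 tokens used")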
Creative Corner » Text-to-image prompting » Post 34
I’ve seen some guides mention to use BREAK in prompts to help guide the model. E.g.
Description of scenery
BREAK
Character 1 wearing denim jeans and red sweater sitting on a bench
BREAK
Character 2 wearing black suit with bowtie walking in the background
But I’m not having much success with it; it still gets confused as to who wears/does what.
Any of you using it successfully?
Creative Corner » Text-to-image prompting » Post 33
The number of steps depends on the sampler; for Euler it’s 25+ sampling steps, but sometimes it can be lower. I guess it depends on the composition, and it’s never constant. I recommend just trying different settings and checking whether increasing the steps substantially improves the image.
Creative Corner » Text-to-image prompting » Post 32
@Scarlet Ribbon
Okay, interesting.
In your opinion, what sampler is best to use when creating pony images in Pony Diffusion? On CivitAI, the default appears to be DPM++ 2M Karras.
@MareStare
Okay.
Another question - generally, how many steps for generation produce the best results?
Creative Corner » Text-to-image prompting » Post 31
@Zerowinger
You can use full sentences to describe the prompt with Pony Diffusion as well. Citing the recommended prompt format from their page on CivitAI:
score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up, just describe what you want, tag1, tag2
where
tag1, tag2 are simple words/word combinations similar to derpibooru tags like “unicorn, blushing, trio, duo”, etc.
Creative Corner » Text-to-image prompting » Post 30
@Zerowinger
Different models are trained in different ways, leading to some models being better for natural language, and others better for tag-based prompting. Pony doesn’t completely fail with natural language prompting, but in my experience it performs much better with tag-based. If you add source_pony to your prompt, you can damn near just use Derpi/Tanta tags to get most of the results you’re looking for.
Creative Corner » Text-to-image prompting » Post 29
@MareStare
So basically, including that string is necessary for higher-quality images then? What about the rest of the prompting? On Imagen, I’m used to using full sentences and phrases to describe exactly what I want the output to be, while with Pony Diffusion it seems the go-to format is to list each individual aspect of the prompt, separated by commas.
Creative Corner » Text-to-image prompting » Post 28
@Zerowinger
The score_* tags are specific to Pony Diffusion. Their original idea was that you’d be able to write the score_7_up tag only (just a single tag), and you’d get an image based on the dataset of images of quality 7 or higher.
However, the way this was implemented during training was wrong, and completely broken. The developers discovered this bug only in the middle of training, at which point fixing it would be too expensive (they’d need to restart training from scratch, which would cost them potentially several tens or even hundreds of thousands of dollars). So they kept the bug, and made a guideline to include that lengthy score_9, score_8_up, ... etc. string at the start of the prompt to work around it.
There is more detail on this training fiasco in this article: https://civitai.com/articles/4248/what-is-score9-and-how-to-use-it-in-pony-diffusion
Creative Corner » Text-to-image prompting » Post 27
So, I tried out Pony Diffusion on Civitai to some success, and part of the prompt was copy-pasting the score_x score_up prompts that I had seen elsewhere. However, I’m a little confused as to exactly how those prompts work; the whole text-to-image format is very different from the style I’m familiar with.
Could I get some insider info on just exactly how this format in Pony Diffusion and similar checkpoints works?
Creative Corner » Text-to-image prompting » Post 26
@Thoryn
I have the same GPU. I can generate a 1024x1024 image in Comfy UI in less than 15 seconds. I don’t know what was up with automatic1111, but I was getting similarly glacial performance on it.
Strongly recommend you just get rid of it and learn a different front end.
Creative Corner » Text-to-image prompting » Post 25
@MareStare
Great idea! Img2img and inpainting are invaluable tools and it would be useful to describe them in detail. Things like denoising strength and the difference between “Whole picture” vs “Only masked” for the inpaint area are extremely important here.
Creative Corner » Text-to-image prompting » Post 24
@Thoryn
I used Photoshop for drawing the scribbles, and then moved the images to Forge UI for inpainting. I’m planning to describe some of my learnings and creative process in a shared guide website. I’ll post about it on the tantabus discord and create a forum thread on tantabus when it’s more-or-less ready. I’d like to collect all the tips and tricks and organize them in a convenient medium for beginners to study.
Creative Corner » Text-to-image prompting » Post 23
@MareStare
Really cool to see WIP steps like this and have them explained.
And your sketching abilities are way ahead of mine. :p
What program do you use for SD?
Do you handle all the painting in it, or do it elsewhere and move it over to SD?
Creative Corner » Text-to-image prompting » Post 22
Getting the composition as you want may be too much work for text2img. I recommend trying inpainting with a colored scribble (coloring is important to make the AI get the colors right).
For example, this is how I got Fluttershy placed into the scenery of this image:
Yeah, you can tell my drawing skills aren’t that good, but Zoinksnoob nailed Flutty almost immediately after I pasted it there and inpainted that area with a denoising strength of something like 0.7+. Sometimes it takes several iterations of drawing a scribble, letting inpainting improve the detail, and then improving that more detailed version with some lighter scribbling to get things exactly as you want.
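For anyone doing this outside a UI, the scribble-then-inpaint step looks roughly like this with diffusers (the checkpoint and filenames are placeholders; in Forge/A1111 the strength value is the same denoising strength slider):

import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

# Placeholder checkpoint; any SDXL inpainting-capable model works similarly.
pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1", torch_dtype=torch.float16
).to("cuda")

init = load_image("scene_with_scribble.png")  # the scenery with the colored scribble pasted in
mask = load_image("scribble_mask.png")        # white over the scribble area, black elsewhere

result = pipe(
    prompt="score_9, score_8_up, source_pony, fluttershy sitting in the grass",
    image=init,
    mask_image=mask,
    strength=0.7,            # denoising strength: how far the result may drift from the scribble
    num_inference_steps=30,
).images[0]
result.save("inpainted.png")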
Showing results 1 - 25 of 47 total
Default search
If you do not specify a field to search over, the search engine will search for posts with a body that is similar to the query's word stems. For example, posts containing the words winged humanization, wings, and spread wings would all be found by a search for wing, but sewing would not be.
Allowed fields
| Field Selector | Type | Description | Example |
|---|---|---|---|
| author | Literal | Matches the author of this post. Anonymous authors will never match this term. | author:Joey |
| body | Full Text | Matches the body of this post. This is the default field. | body:test |
| created_at | Date/Time Range | Matches the creation time of this post. | created_at:2015 |
| id | Numeric Range | Matches the numeric surrogate key for this post. | id:1000000 |
| my | Meta | my:posts matches posts you have posted if you are signed in. | my:posts |
| subject | Full Text | Matches the title of the topic. | subject:time wasting thread |
| topic_id | Literal | Matches the numeric surrogate key for the topic this post belongs to. | topic_id:7000 |
| topic_position | Numeric Range | Matches the offset from the beginning of the topic of this post. Positions begin at 0. | topic_position:0 |
| updated_at | Date/Time Range | Matches the creation or last edit time of this post. | updated_at.gte:2 weeks ago |
| user_id | Literal | Matches posts with the specified user_id. Anonymous users will never match this term. | user_id:211190 |
| forum | Literal | Matches the short name for the forum this post belongs to. | forum:meta |