Generating creative images with Stable Diffusion


Stable Diffusion allows non-artistic individuals, like myself, to create stunning images simply by providing a text prompt.

Figure 1: highly detailed painting of san francisco in the style of thomas kinkade, soft lighting, 4k resolution

In this article I’ll explain a simple template I use to create incredible images. I’ve only tested this on Stable Diffusion, so I don’t know if this translates to DALL-E or Midjourney.

If you want to follow along, I currently recommend playgroundai since their UI is simple and intuitive (I have no relationship with them). This field moves fast, so note that this information is accurate as of April 2023.

1. A simple template

This is the prompt used to create the image above:

highly detailed painting of san francisco in the style of thomas kinkade, soft lighting, 4k resolution

Let’s break it down.

a. Art form + qualifier

highly detailed painting

First you decide what art form you want to create. Here I want something that looks like a painting. You could use other art forms and even be more specific: an oil painting, a watercolor, a pencil sketch, a 3D render, a photograph.

Once you've decided on the art form, prepend a qualifier. Something simple like highly detailed will work.

b. Subject

san francisco

What or who is the subject of your image?

In this example, I’ve chosen San Francisco. Because it’s such a big city, you’ll get vastly different images each time you run it. I’m okay with this.

But if you want something specific, you should specify it here. If I wanted only the Golden Gate Bridge, I'd write Golden Gate bridge of San Francisco, which narrows the images down to mostly those of the bridge.

What if your subject is not well-known or generic? Let's say I'm imagining a German Shepherd dog. It's kind of generic, but at the same time there are many different types. Try your best to describe what you're imagining:

a large well-proportioned German Shepherd dog, ears are large and stand erect, tail is bushy and curves downward

Another option is to get a reference image. Maybe your neighbour’s dog is what you’re imagining. Then use that with the Image to Image option so that the SD model uses the image as a reference along with your prompt.

c. Artist(s)

in the style of thomas kinkade,

This part has the most impact on your image (other than the subject of course).

You can choose the style of one or more artists here. Usually I keep it between one and three artists; I find the quality decreases when you add more than three.

You can be quite creative in combining styles here. You don’t have to be limited to the art form you’ve chosen. For example, even if your art form is oil painting, you can add a style like “Studio Ghibli” and get something weird and wonderful.

Here are some resources for finding known artists in Stable Diffusion:

d. Qualifiers

soft lighting, 4k resolution

Finally, you want to end your prompt with more qualifiers: some combination of art-specific qualifiers and quality qualifiers.

What do I mean by art-specific qualifiers?

If you're trying to create an oil painting, use qualifiers that describe that type of art form. For example: impasto, visible brush strokes, canvas texture.

If you're trying to create something that looks realistic, use qualifier words from photography. For example: soft lighting, golden hour, bokeh, depth of field.

What about quality qualifiers? These can be more generic. Example phrases include: 4k resolution, highly detailed, trending on artstation, masterpiece.

Add a few of these artistic and quality qualifiers to round off your prompt.

e. Order of your prompt

The level of importance decreases with each word in your prompt, so you don’t want to stray too much from the above order.
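To make the template and its ordering concrete, here's a minimal sketch (plain Python, the function name is my own) that assembles a prompt from the four parts in the order described above:

```python
def build_prompt(art_form, subject, artists=(), qualifiers=()):
    """Assemble a prompt in the template's order:
    art form + qualifier, subject, artist(s), then trailing qualifiers."""
    prompt = f"{art_form} of {subject}"
    if artists:
        prompt += " in the style of " + " and ".join(artists)
    if qualifiers:
        prompt += ", " + ", ".join(qualifiers)
    return prompt

print(build_prompt(
    "highly detailed painting", "san francisco",
    artists=["thomas kinkade"],
    qualifiers=["soft lighting", "4k resolution"],
))
# → highly detailed painting of san francisco in the style of thomas kinkade, soft lighting, 4k resolution
```

This reproduces the exact prompt from Figure 1, and swapping out any one argument gives you a controlled variation.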

2. Negative prompts

Only use this if you’ve tried different variations of your prompt and you’re still not satisfied.

How does this work? Stable Diffusion can create many variations of our subject "San Francisco": some beautiful and artistic, others ugly and horrendous. Which ones will it choose?

Negative prompts let you exclude things from your image.

Figure: the negative prompt field

Here is an example of a negative prompt for generating images of people:

((((ugly)))), (((duplicate))), ((morbid)), ((mutilated)), out of frame, extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), ((cross-eyed)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), out of frame, ugly, extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck)))

The parentheses put emphasis on certain terms. This negative prompt is tailored for generating images of people: unfortunately, Stable Diffusion has been shown to malform certain body parts, like fingers. If you're not generating images of people, you can remove the relevant text (e.g. arms, legs, neck, face).
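To make the parentheses concrete: in the AUTOMATIC1111 web UI convention (other UIs may use different factors), each layer of parentheses multiplies a term's attention weight by roughly 1.1. A rough sketch of that rule:

```python
def emphasis_weight(term: str, factor: float = 1.1) -> float:
    """Attention multiplier for a parenthesised term, assuming the common
    convention that each (...) layer scales the weight by `factor`."""
    depth = 0
    while term.startswith("(") and term.endswith(")"):
        term = term[1:-1]  # strip one layer of parentheses
        depth += 1
    return round(factor ** depth, 3)

print(emphasis_weight("blurry"))        # 1.0  (no emphasis)
print(emphasis_weight("((ugly))"))      # 1.21
print(emphasis_weight("((((ugly))))"))  # 1.464
```

So ((((ugly)))) in the prompt above is weighted about 46% more heavily than an unparenthesised term.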

If you want something generic, you can start with something like this:

(((ugly))), ((mutilated)), ((deformed)), blurry, out of frame

3. Parameters

Depending on the UI you're using, you may see some of these parameters.

a. Model

Different models produce different results. The two main Stable Diffusion models are 1.5 and 2.1. In my opinion, 1.5 generally performs better. It’s also more controversial.

SD 2's training data contains fewer celebrity and artistic images. This means prompts like "in the style of (artist name)" don't work as well.

Are there cases where SD 2 works better? Yes.

SD 2 has a depth model: if you pass it an image along with a prompt, it can preserve the relative geometry of your reference image. This lets you create images that look radically different from the original while still preserving the original's depth.

Figure 2: depth-guided stable diffusion

There are other changes that are not as important now.

b. Image Dimensions

How big do you want your image to be?

c. Prompt Guidance

The higher this value, the closer the output image will be to your prompt, and the less creative SD will be. Usually I stick to values between 7 and 15: 7 if I want something creative, 15 if I want an image that is very close to my prompt.
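Under the hood, Prompt Guidance corresponds to the classifier-free guidance scale: at each denoising step, the model's prompt-conditioned prediction is pushed further away from its unconditional (no-prompt) prediction. A toy sketch of the combination rule, on plain lists rather than the large tensors real pipelines use:

```python
def apply_guidance(uncond, cond, scale):
    """Classifier-free guidance: move `scale` times as far from the
    unconditional prediction toward the prompt-conditioned one."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

# Where the two predictions agree, guidance changes nothing; where the
# prompt pulls in a direction, a higher scale exaggerates that pull.
print(apply_guidance([0.0, 1.0], [1.0, 1.0], 7.5))  # → [7.5, 1.0]
```

A scale of 1 would just return the conditioned prediction; values of 7-15 amplify the prompt's influence, which is why high settings track the prompt closely but leave less room for creativity.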

d. Quality & Details

The higher the value here, the higher the quality and the longer it’ll take to generate the image(s). I find that having a value of at least 100 gives you good enough quality. Feel free to experiment. Higher values do not always translate to higher quality.
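This parameter usually maps to the number of denoising steps. As a rough illustration (real schedulers space timesteps in more sophisticated, non-linear ways), more steps simply means the sampler visits more intermediate noise levels on the way from pure noise to a finished image:

```python
def timestep_schedule(steps: int, train_steps: int = 1000) -> list[int]:
    """Evenly spaced denoising timesteps, noisiest first. Doubling `steps`
    doubles the work, which is why generation time grows with this value."""
    return [round(train_steps * (1 - i / steps)) for i in range(steps)]

print(timestep_schedule(4))  # → [1000, 750, 500, 250]
```

With only 4 steps the jumps between noise levels are huge; at 100 steps they are fine-grained, which is roughly where the quality gains start to flatten out.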

e. Seed

SD takes in a random number called a seed. Using the same seed, together with the same prompt and parameters, recreates the same image. This also means you can create slightly different variations of an image by changing only the seed.
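The reason this works is that the seed fixes the pseudo-random initial noise the image is denoised from, so identical settings always start from identical noise. The principle, in miniature with Python's standard library:

```python
import random

def initial_noise(seed: int, n: int = 4) -> list[float]:
    """Deterministic 'noise' from a seed: the same seed always
    yields the same values, a different seed yields different ones."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

assert initial_noise(42) == initial_noise(42)  # same seed, same image
assert initial_noise(42) != initial_noise(43)  # new seed, new variation
```

Real pipelines do the same thing with a seeded tensor generator instead of Python's `random`, but the reproducibility guarantee is identical.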

Closing

Experimenting is key to getting an intuitive understanding of prompting and parameter tuning. There are also many fine-tuned variants of Stable Diffusion. One place to keep up with what's happening is the SD subreddit. Enjoy =)

