Generating creative images with Stable Diffusion
Stable Diffusion allows non-artists like me to create stunning images simply by providing a text prompt.
Figure 1: highly detailed painting of san francisco in the style of thomas kinkade, soft lighting, 4k resolution
In this article I’ll explain a simple template I use to create incredible images. I’ve only tested it on Stable Diffusion, so I don’t know whether it carries over to DALL-E or Midjourney.
If you want to follow along, I currently recommend playgroundai, since its UI is simple and intuitive (I have no relationship with them). This field moves so fast that I should point out this information is accurate as of April 2023.
1. A simple template
This is the prompt used to create the image above:
highly detailed painting of san francisco in the style of thomas kinkade, soft lighting, 4k resolution
Let’s break it down.
a. Art form + qualifier
highly detailed painting
First you decide what art form you want to create. Here I want something that looks like a painting. You could use other art forms and even be more specific. Examples:
- different types of painting (oil, watercolor, pastel, matte, etc.)
- sketching (line drawing, caricature, etc.)
- digital art (vector, etc.)
Once you’ve decided on the art form, prepend a qualifier. Something simple like “highly detailed” will work.
b. Subject
san francisco
What or who is the subject of your image?
In this example, I’ve chosen San Francisco. Because it’s such a big city, you’ll get vastly different images each time you run it. I’m okay with this.
But if you want something specific, specify it here. If I wanted only the Golden Gate Bridge, I’d write “Golden Gate Bridge of San Francisco”, which narrows the results down to mostly images of the Golden Gate.
What if your subject isn’t well known, or is generic? Let’s say I’m imagining a German Shepherd dog. It’s a fairly generic subject, yet there are many different types. Try your best to describe what you’re imagining:
a large well-proportioned German Shepherd dog, ears are large and stand erect, tail is bushy and curves downward
Another option is to use a reference image. Maybe your neighbour’s dog is exactly what you’re imagining. In that case, use a picture of it with the Image to Image option, so that the SD model uses the image as a reference along with your prompt.
c. Style
in the style of thomas kinkade,
This part has the most impact on your image (other than the subject, of course).
You can choose the style of one or more artists here. I usually keep it to between one and three artists, because I find the quality decreases when I add more than three.
You can be quite creative in combining styles here. You don’t have to be limited to the art form you’ve chosen. For example, even if your art form is oil painting, you can add a style like “Studio Ghibli” and get something weird and wonderful.
Here are some resources for finding known artists in Stable Diffusion:
d. Artistic + quality qualifiers
soft lighting, 4k resolution
Finally, you want to end your prompt with more qualifiers. You want some combination of:
- art-specific qualifiers
- quality qualifiers
What do I mean by art-specific qualifiers?
If you’re trying to create an oil painting, use qualifiers that describe that type of art form. For example:
- richly pigmented
- well-blended smooth texture
If you’re trying to create something that looks realistic, use qualifier words from photography. For example:
- photorealistic or hyperrealistic
- studio lighting
- soft volumetric lights
- cinematic lighting
What about quality qualifiers? These can be more generic. Example phrases include:
- 4k resolution
- trending on ArtStation
- high quality
Add a few of these artistic and quality qualifiers to round off your prompt.
e. Order of your prompt
- Art form + qualifier
- Subject
- Style
- Artistic + quality qualifiers
Words earlier in your prompt tend to carry more weight than later ones, so you don’t want to stray too far from the above order.
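To make the ordering concrete, here is a minimal Python sketch that assembles a prompt from the four parts. The function build_prompt and its parameter names are hypothetical, written only for illustration; the three-artist limit is simply the rule of thumb from the style section, and joining several artists with “and” is an arbitrary choice.

```python
def build_prompt(qualifier, art_form, subject, artists, extra_qualifiers):
    """Assemble a prompt in this order: art form + qualifier, subject, style,
    then artistic + quality qualifiers (hypothetical helper)."""
    # Rule of thumb from the style section: quality tends to drop past three artists.
    if not 1 <= len(artists) <= 3:
        raise ValueError("use between 1 and 3 artists")
    head = f"{qualifier} {art_form} of {subject} in the style of {' and '.join(artists)}"
    # The closing qualifiers are appended as comma-separated phrases.
    return ", ".join([head, *extra_qualifiers])
```

Called with build_prompt("highly detailed", "painting", "san francisco", ["thomas kinkade"], ["soft lighting", "4k resolution"]), it reproduces the Figure 1 prompt exactly.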
2. Negative prompts
Only use negative prompts if you’ve tried different variations of your prompt and you’re still not satisfied.
How do they work? Stable Diffusion can create many variations of our subject, “San Francisco”. Which ones will it choose? They could be beautiful and artistic, or ugly and horrendous.
Negative prompts let you exclude things from your image.
Here is an example of a negative prompt for generating images of people:
((((ugly)))), (((duplicate))), ((morbid)), ((mutilated)), out of frame, extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), ((cross-eyed)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), out of frame, ugly, extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck)))
The parentheses put emphasis on a term, and each extra pair adds more emphasis (in UIs that support this syntax). This negative prompt is tailored to generating images of people, because Stable Diffusion is known to malform certain body parts, such as fingers. If you’re not generating images of people, you can remove the relevant terms (e.g. arms, legs, neck, face).
If you want something generic, you can start with something like this:
(((ugly))), ((mutilated)), ((deformed)), blurry, out of frame
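The emphasis syntax can be sketched in a few lines of Python. This assumes the AUTOMATIC1111 WebUI convention, where each pair of parentheses multiplies a term’s weight by 1.1 (so ((term)) gets about 1.21); other UIs may weigh terms differently or ignore parentheses entirely. parse_emphasis is a hypothetical helper that only handles parentheses wrapping a whole comma-separated term, as in the prompts above. It flags unbalanced parentheses and merges repeated terms, keeping the strongest weight.

```python
def parse_emphasis(prompt, factor=1.1):
    """Return {term: weight}, where weight = factor ** (pairs of wrapping parentheses).

    Assumes the AUTOMATIC1111-style convention of 1.1x per pair (hypothetical helper).
    """
    weights = {}
    for raw in prompt.split(","):
        term = raw.strip()
        if not term:
            continue  # skip empty pieces, e.g. from a trailing comma
        opens = len(term) - len(term.lstrip("("))
        closes = len(term) - len(term.rstrip(")"))
        if opens != closes:
            raise ValueError(f"unbalanced parentheses in {term!r}")
        word = term[opens:len(term) - closes].strip()
        weight = round(factor ** opens, 4)
        # A repeated term keeps only its strongest emphasis.
        weights[word] = max(weight, weights.get(word, 0.0))
    return weights
```

Running it on the longer people-focused prompt above also collapses its repeated terms, such as “out of frame” and “ugly”, into single entries.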
3. Parameters
Depending on the UI you’re using, you may be able to adjust some of the following parameters.
a. Model
Different models produce different results. The two main Stable Diffusion models are 1.5 and 2.1. In my opinion, 1.5 generally performs better. It’s also more controversial.
SD 2’s training data contains fewer celebrity and artistic images. This means prompts like “in the style of (artist name)” don’t work as well.
Are there cases where SD 2 works better? Yes.
SD 2 has a depth model: if you pass it an image along with a prompt, it preserves the relative geometry of your reference image. This lets you create images that look radically different from the original while still keeping the original’s depth.
There are other changes that are not as important now.
b. Image Dimensions
How big do you want your image to be?
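One constraint worth knowing: Stable Diffusion 1.x and 2.x denoise a compressed latent that is 8 times smaller than the image on each side, with 4 channels, so width and height need to be multiples of 8 (the Hugging Face diffusers pipelines, for example, reject other sizes). These models also tend to work best near their training resolution: 512x512 for SD 1.5 and 768x768 for the 768 variant of SD 2.1. The helpers below are a hypothetical sketch of that arithmetic, not any UI’s actual code.

```python
def snap_to_multiple(size, multiple=8):
    """Round a requested side length down to a valid multiple (never below one multiple)."""
    return max(multiple, size - size % multiple)


def latent_shape(width, height, downsample=8, channels=4):
    """Shape (channels, latent_height, latent_width) of the latent SD actually denoises.

    Assumes the SD 1.x/2.x VAE: 8x downsampling per side and 4 latent channels.
    """
    if width % downsample or height % downsample:
        raise ValueError(f"width and height must be multiples of {downsample}")
    return (channels, height // downsample, width // downsample)
```

A 768x512 landscape image, for instance, is denoised as a 4x64x96 latent, so doubling both sides quadruples the latent area and the work per step.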
c. Prompt Guidance
The higher this value, the closer the output image will be to your prompt, which also means SD will be less creative. I usually stick to values between 7 and 15: around 7 when I want something creative, and 15 when I want an image that closely follows my prompt.
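In most Stable Diffusion tools this setting is the classifier-free guidance scale (called guidance_scale in the Hugging Face diffusers library). At each denoising step the model predicts the noise twice, once with your prompt and once with an unconditional input, and the guidance scale controls how far the result is pushed from the second prediction toward the first. In diffusers, a negative prompt takes the place of that unconditional input, which is how it steers the image away from the listed terms. The sketch below applies the formula to plain Python lists as a toy; guided_noise is a hypothetical name, and real pipelines apply the same formula to tensors.

```python
def guided_noise(noise_uncond, noise_cond, guidance_scale):
    """Classifier-free guidance, element by element: uncond + scale * (cond - uncond).

    noise_uncond: prediction for the empty prompt (or the negative prompt, if set).
    noise_cond:   prediction for your prompt.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]
```

With a scale of 1 you get the prompt-conditioned prediction unchanged; at 7 to 15 the difference between the two predictions is amplified 7 to 15 times, which is why higher values follow the prompt more literally.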
d. Quality & Details
The higher this value, the longer it takes to generate the image(s), and usually the more detailed the result. I find that a value of at least 100 gives good enough quality. Feel free to experiment, though: higher values don’t always translate to higher quality.
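If you later move from a web UI to code, the settings in this section correspond to keyword arguments of the diffusers StableDiffusionPipeline. The mapping below is an assumption: I am taking “Prompt Guidance” to mean guidance_scale and “Quality & Details” to mean num_inference_steps (the number of denoising steps), which may not match exactly how playgroundai defines them. Each step is one pass through the denoising network, so generation time grows roughly in proportion to this value. to_pipeline_kwargs is a hypothetical helper; it only builds the dictionary and does not call diffusers.

```python
def to_pipeline_kwargs(prompt, negative_prompt, width, height,
                       prompt_guidance, quality_and_details):
    """Translate this article's UI settings into StableDiffusionPipeline keyword arguments.

    Assumed mapping: Prompt Guidance -> guidance_scale,
    Quality & Details -> num_inference_steps.
    """
    if quality_and_details < 1:
        raise ValueError("need at least one denoising step")
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "width": width,
        "height": height,
        "guidance_scale": float(prompt_guidance),
        "num_inference_steps": int(quality_and_details),
    }
```

With a loaded pipeline object pipe, the call would then look like pipe(**kwargs).images[0].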
e. Seed
SD takes in a random number, called the seed. If you reuse the same seed with the same prompt and parameters above, you can recreate the same image. Conversely, passing in different seeds with the same prompt and parameters gives you different variations of your image.
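The seed’s role can be shown with a toy example. Stable Diffusion starts from random Gaussian noise in latent space and denoises it step by step, and the seed fixes that starting noise. The Python below uses the standard random module as a stand-in (diffusers instead takes a seeded torch.Generator through its generator argument); initial_noise is a hypothetical helper.

```python
import random


def initial_noise(seed, size=8):
    """Toy stand-in for SD's starting latent: Gaussian noise from a seeded generator."""
    rng = random.Random(seed)  # independent generator, so the global random state is untouched
    return [rng.gauss(0.0, 1.0) for _ in range(size)]
```

Calling initial_noise(42) twice returns identical lists (same starting point, same image), while initial_noise(7) returns a different list (a new starting point, a new variation).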
Experimenting is key to getting an intuitive understanding of prompting and parameter tuning. There are also many fine-tuned versions of Stable Diffusion. One place to keep up with what’s happening is the SD subreddit. Enjoy =)
Have some thoughts on this post? Reply with an email.