If you have followed my series so far, you started with simple tools and moved up to more powerful but less user-friendly ones. The previous post, on Stable Diffusion using Night Cafe, was the first to cover a powerful model with many more controls and influencing factors, while still using a simple interface. A future post will dig much deeper into Stable Diffusion.
We continue the series with an alternative model called Midjourney, which is more popular than Stable Diffusion or DALL-E, even though the differences between them are getting less pronounced.
What is Midjourney?
In the team’s own words: “Midjourney is an independent research lab exploring new mediums of thought and expanding the imaginative powers of the human species.
We are a small self-funded team focused on design, human infrastructure, and AI. We have 11 full-time staff and an incredible set of advisors.”
Midjourney, like Stable Diffusion, is capable of creating stunning images and has a huge community of artists using it. There is a free trial, but at some point you need to buy a subscription to keep generating images. If you are curious to get started, the documentation is at https://docs.midjourney.com.
Midjourney can only be accessed through Discord, a social chat client. You interact with the Midjourney bot by sending a message into a channel or via direct message. The bot then takes your prompt, creates the image, and posts the images as a reply to you.
First, you need to get access to Discord. There is no need to download an app (but you can, of course). Follow this link (https://discord.gg/midjourney) and create an account. Once you are in, you should see something like this:
Select one of the newbies channels on the left. You’ll see a new feed in the main window with many images being generated live, so you can watch what others are doing, including their prompts and the results.
Type /info into the message box, and send it. You should see an info screen about your subscription (or probably your trial subscription).
This is basically how you interact with Midjourney: you send messages that start with a / and a keyword. Any message you send without such a command is posted as a regular chat message that everyone in the channel can see.
Let’s make your classic first example image:
You will see a new message indicating the progress of your generation, with intermediate results. This is super interesting because you can watch the AI create the images from nothing.
After a minute or so you will get the final results. You can see that the look and feel of a default Midjourney image differs from that of DALL-E or Stable Diffusion: painted rather than rendered, and much more artsy.
Below the image you see nine buttons. The “U” buttons give you a larger version of one of the four images. The “V” buttons make variations starting from one of the four images. The last button repeats the generation with the same prompt and a new random starting point.
Here are the images in detail (using the U buttons):
Let’s also make one set of variations for the last of these images by pressing the V4 button. If you add an “envelope” emoji reaction to the output message with the 2×2 images, the four images are sent to you as a direct message. This is useful for collecting images so that you can find them again!
The images are indeed variations, but they stay true to the style and look of the original image. Go ahead, click one of them, and inspect the hands and fingers! Nice! Midjourney seems to be better at making fingers! I don’t really like the shape of the embrace, but maybe we can make that work later!
Let’s go to the settings first:
Here you can set some defaults. The most important one, I think, is the Variation Mode (low or high), which controls how much randomness the “V” buttons introduce. The other important setting is the model version. V5.2 is a good model, but V6 is better. You just can’t select it here yet, so we need to specify it as a parameter in the prompt:
/imagine tango --v 6.0 --style raw
It’s a different set of images, as expected. Still more artsy, and still with a tango feeling, right?
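If you try many prompts with the same parameters, it can help to assemble the command text in a small script and paste the result into Discord. The following is only a sketch: the helper name `build_imagine_command` and its arguments are made up for illustration, and the code just builds a string. It does not talk to Discord or Midjourney.

```python
def build_imagine_command(prompt, version=None, style=None, aspect_ratio=None):
    """Return the text of an /imagine message with optional parameters appended.

    Hypothetical helper for illustration only; it builds a string and
    does not send anything to Discord.
    """
    parts = ["/imagine", prompt.strip()]
    if version is not None:
        parts += ["--v", str(version)]      # model version, e.g. "6.0"
    if style is not None:
        parts += ["--style", style]         # e.g. "raw"
    if aspect_ratio is not None:
        parts += ["--ar", aspect_ratio]     # e.g. "16:9"
    return " ".join(parts)


command = build_imagine_command("tango", version="6.0", style="raw")
print(command)  # /imagine tango --v 6.0 --style raw
```

The parameters always go at the end of the prompt, after the description, which is why the helper appends them last.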
Finally, let’s go back to the prompt we used for Stable Diffusion:
A milonguero dance couple in the midst of a social dance at a milonga in Buenos Aires. The setting is vibrant and authentic, filled with dancers. The couple is in close embrace, capturing the intimate and intricate style of milonguero dancing. The man is dressed in traditional attire, and the woman is in a stylish dress, both moving gracefully on the dance floor. The background shows other dancers and the lively atmosphere of the milonga, with vintage decorations and warm lighting. --v 6.0 --style raw
Woohoo! What happened? This is photorealistic?! Amazing! But, is that tango? It doesn’t have a tango embrace, just a normal embrace. Are they kissing? What happened?
Here are images from Midjourney v5.2. Different look, less photorealistic, more artsy, and more tango. Notice the left two pictures with the woman leading the man! Nice deviation from stereotype, without any prompt for it!
It’s a matter of the prompt. Midjourney v6 is a different model from the others. It can create realistic images, but you need to be much more specific in how you describe the scene. Since I am not an expert but do have ChatGPT pro, I asked a custom GPT specialized in writing prompts for Midjourney 6 to do it for me:
Here are the results of the five prompts proposed by the GPT:
This is still not exactly what we are looking for in tango pictures. Here’s one last attempt for today:
A hyperdetailed photograph of Shane, a short blond male with a casual build, and Ruby, a vibrant red-haired woman in her 30s. Shane leads in tango, wearing a simple black outfit, while Ruby in tight jeans and a white tank-top follows with elegance. Hand placements are traditional for tango: Shane's right hand on Ruby's back, left hands interlocked at chest level. Background: a softly lit dance studio with wooden floors and mirrors. Created Using: Realistic details, warm color tones, fluid dance motion, elegant posture, detailed clothing textures, intimate atmosphere, hd quality, natural look, 8k, very detailed, bokeh --ar 16:9 --v 6.0 --style raw
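A long prompt like this has two parts: the description and the trailing parameters. To compare prompts side by side, a small script can separate the two. This is a minimal sketch under stated assumptions: the function name `split_prompt` is hypothetical, and it assumes that all parameters sit at the end of the prompt, each introduced by a space and a double hyphen, and that the description itself contains no double hyphen.

```python
def split_prompt(prompt):
    """Split a prompt into its description and a dict of trailing parameters.

    Hypothetical helper; assumes parameters come last and each one
    starts with ' --' followed by a name and an optional value.
    """
    text, *raw_params = prompt.split(" --")
    params = {}
    for chunk in raw_params:
        # "ar 16:9" -> name "ar", value "16:9"
        name, _, value = chunk.strip().partition(" ")
        params[name] = value.strip()
    return text.strip(), params


description, params = split_prompt(
    "elegant posture, bokeh --ar 16:9 --v 6.0 --style raw"
)
print(description)  # elegant posture, bokeh
print(params)       # {'ar': '16:9', 'v': '6.0', 'style': 'raw'}
```

Here `--ar 16:9` sets the aspect ratio, while `--v 6.0` and `--style raw` select the model version and style used earlier in this post.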