Ian Hughes – Senior Analyst IoT, 451 Research part of S&P Global Market Intelligence, Sci-fi Author epredator, Doctor of Technology (Hons)

 

The past few months have seen rapid development in the ability of anyone with access to a computer or social media to create stunning, high-quality images simply by describing what they would like to see in a short text prompt. You may see names such as DALL-E, Midjourney and NightCafe Studio mentioned in this area. One of the most significant developments, though, was the August 2022 release into the open source world of Stable Diffusion, an AI image generation model created by Stability AI that you run on your own machine, unlike the others, which are applications you subscribe to at various tiers of usage.

What do you mean?

It is best to first show what these AI generators make, and from what. In the spirit of Halloween, I asked the application Midjourney to imagine (that is the text prompt it uses) “Ghost making pizza”. This image was generated, along with three similar options, in a matter of seconds. As you can see, it is what I asked for; it may not have been what I was expecting, but it just works.

 

All the various applications have their own style and quirks. They can also generate in the style of other artists, such as Monet or Picasso, or in the style of a movie poster, photo, cyberpunk scene, tattoo, manga, wall tile and so on. You name it, there will be something that can be added to a text prompt to create an unusual image.

The prompt “Basingstoke Canaletto” caused these four images to be generated by the AI algorithm.

Much longer descriptions can be used to be more specific, and you will see some people finding just the right way to get a perfect image through tweaking and trial and error. Any of the images can also be used as a seed to generate similar images, and a picture is, of course, worth 1,000 words.

The devil is in the detail. Just as in the unexpected twists and turns of the film Bedazzled, where Peter Cook (playing the Devil) disappoints Dudley Moore by being very literal in the wishes he grants, so too text-to-image generation throws curve balls. I followed a train of thought to take well-known stars and show them doing things they may never have been seen doing. The first was Laurel and Hardy breakdancing in the street, which, without any other prompt, came out in black and white, and in an odd swirly style too.

This led to many more, including “Elvis skydiving”, which came out pretty much as I thought it might.
It went a bit haywire, though, when I asked it to generate “Chuck Norris pruning roses”, as you can see. There is also a lot of discussion about why horses seldom get the right number of legs, but it is these imperfections, in either the models being trained or our use of language, that make this interesting.

This one didn’t make it into my growing album of Stars in Places, but I will try again to get just what I am asking for. This may seem like a way to “cheat” at art, but it is instead a different form of creative expression and of image curation. Sometimes the first pass does just what you want; other times multiple resubmissions and tweaks follow, or even a new line of thought emerges from what is generated. In looking at art styles, people become aware of many different artists and the techniques they use. Our other forms of art will continue as they have through the ages; this is just another tool.

 

What about Stable Diffusion?

I mentioned at the start that Stable Diffusion has been released so that you can run your own local application to generate images. It’s a bit of techie messing around, but there are some instructions here. You will also start to see these sorts of text or sketch prompts appearing in existing apps; for example, I opened PhotoLeap, an iPhone image editing app, to be greeted by new text-to-image generating features there.
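For readers comfortable with Python, one common route to that "techie messing around" is the Hugging Face diffusers library. The sketch below is illustrative, not the article's own instructions: the model id and the small output_name() helper are assumptions, and actually running it needs a machine with a CUDA GPU and a multi-gigabyte model download.

```python
# A minimal sketch of running Stable Diffusion locally via the Hugging Face
# "diffusers" library. The model id and output_name() helper are illustrative
# assumptions, not something prescribed by the article.

def output_name(prompt: str) -> str:
    # Turn a text prompt into a simple, filesystem-friendly filename.
    return prompt.lower().replace(" ", "_") + ".png"

if __name__ == "__main__":
    # Heavy dependencies are imported here; the first run also downloads
    # several GB of model weights, and generation wants a CUDA GPU.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # assumed model id
        torch_dtype=torch.float16,
    )
    pipe = pipe.to("cuda")

    prompt = "Ghost making pizza"
    image = pipe(prompt).images[0]    # one generated PIL image
    image.save(output_name(prompt))   # saved as ghost_making_pizza.png
```

A few seconds per image on a recent GPU is typical, much like the hosted services, but everything stays on your own machine.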

 

The Future

We are already seeing the next obvious steps: if you can generate one image, then you can generate a series of images, and then you have an AI-generated movie. There are already tools to generate 3D objects rather than just 2D images, which means we will be able to ask for entire virtual worlds in the Metaverse to be created based on our description of what we would like to see.