The Machine Vision & Learning Group led by Prof. Björn Ommer (LMU Munich) has developed an AI algorithm capable of converting text into images.
The LMU scientists used the Stable Diffusion AI model, which creates images from text in seconds without requiring supercomputers. The model was trained on servers of Stability AI, a start-up that supported the project. The main attribute of this approach is that it is compact enough to run on a conventional graphics card instead of the supercomputers normally used for image synthesis: training distills the important information from billions of images into an AI model of just a few gigabytes.
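To make the "few gigabytes" claim concrete, here is a back-of-envelope sketch. The component parameter counts below are rough, commonly cited public figures for Stable Diffusion v1 (they are assumptions for illustration, not numbers from this article), stored as half-precision (fp16) weights:

```python
# Rough public parameter counts for Stable Diffusion v1 components
# (assumed figures for illustration, not exact values from the article).
components = {
    "U-Net (denoiser)": 860_000_000,
    "Text encoder (CLIP ViT-L/14)": 123_000_000,
    "VAE (image autoencoder)": 84_000_000,
}

BYTES_PER_PARAM_FP16 = 2  # half-precision: 2 bytes per weight

total_params = sum(components.values())
total_gb = total_params * BYTES_PER_PARAM_FP16 / 1024**3

for name, n in components.items():
    print(f"{name}: {n / 1e6:.0f}M parameters")
print(f"Total: {total_params / 1e9:.2f}B parameters ≈ {total_gb:.1f} GB in fp16")
```

Roughly one billion parameters at two bytes each come to about 2 GB of weights, which comfortably fits in the memory of a conventional consumer graphics card.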
“This additional computing power and the extra training examples turned our AI model into one of the most powerful image synthesis algorithms,” says the computer scientist. “Once such an AI has understood what constitutes a car, or what characteristics are typical for an artistic style, it will have apprehended precisely these salient features and should ideally be able to create further examples, just as the students in an old master’s workshop can produce work in the same style,” explains Ommer.
The LMU scientists envision teaching computers ‘how to see’, that is to say, to understand the content of images. This is another milestone that can be achieved through further research in machine learning and computer vision. The trained model is available free of charge under the “CreativeML Open RAIL-M” license to enable further research and application of this technology. “We are excited to see what will be built with the current models, as well as to see what further works will come out of open, collaborative research efforts,” says doctoral researcher Robin Rombach.