Musicians are waiting patiently as AI tools like DALL-E 2 and Midjourney roll out to the visual art world (i.e. Twitter and Reddit). Achieving the seemingly impossible, neural networks have empowered non-artists to transform their words and ideas into collections of high-quality AI-generated images, almost instantaneously. So how much longer do we have to wait until something similar is available for music producers?
Currently the closest option out there is AudioCipher, a MIDI plugin that translates words and phrases into melodies within the DAW. Your music is generated through a transcoding algorithm with rhythm randomization. It's still up to the musician to take this inspiration and create a full song out of it. But it might not be long until text-based AI music apps generate the full arrangement. We'll see.
So in the meantime, we're watching closely as competitors go head to head in the text-to-image space. As of July 2022, DALL-E still requires would-be users to join a long waitlist. Midjourney, on the other hand, recently opened its doors to anyone with a direct invite. So we decided to hop in and see what kind of interesting musical imagery it could conjure for us.
Quick Overview: How Midjourney Works
Unlike its competitors, Midjourney runs inside a private Discord community. This may seem like an odd choice at first, but Discord comes with a number of perks that might not be obvious. For example, mobile users of DALL-E Mini (Craiyon) and DALL-E 2 have to keep their phone open to that browser window while the image renders, or the system crashes.
Midjourney's decision to render images in Discord makes a big difference for mobile users. It means you can open and close the app to do other tasks while the images load. When you navigate away from Discord and come back later, all the images are ready to go. In contrast, browser-based DALL-E apps force mobile users to generate one set of images at a time and wait while they render.
Choosing an Image Prompt
Once you've signed up and logged into Midjourney on Discord, you'll have a few options for where to start generating images. There are dozens of "general" channels available to free users. There you'll see a stream of content generated by people around the world. Maybe you'll catch some inspiration there.
If you prefer to keep images private, you can sign up for a paid account and move your prompts over to a chat with the Midjourney bot. The bot listens for special commands from you, like "/imagine prompt:", that summon the main image generator function. From there, you can type in a short phrase or even a full paragraph of descriptive text.
Below is an example of what Midjourney did with the simple phrase "A musical sorcerer". Within sixty seconds, we had four images depicting wizards with their instruments. Beneath the renders, Midjourney provides a set of clickable boxes labeled U1-4 and V1-4.
The U1-4 buttons in Midjourney upscale one of the four generated images. In other words, you can increase the detail and resolution of a chosen image, up to two times, until you come away with what you wanted. On the other hand, if you prefer to see some variations on one of the images, use the V1-4 buttons to create slightly different versions.
If the art style isn't quite what you had in mind, you can always build upon the concept by adding more descriptors. Instead of a digital painting, what if our musical sorcerer was designed to look like a muppet?
Here's what happened when we asked for four variations on a design we liked:
Lastly, we chose the image that looked closest to our vision. It needed to have an obvious musical instrument, so we picked that version and upscaled it. In the end we had a banjo-wielding muppet wizard who did, in fact, appear to be made from felt and yarn:
From MidJourney to MIDI-Journey
For all the hype that Midjourney and DALL-E have enjoyed in the visual arts, AI music apps seem to be lagging behind. OpenAI, the same $1B tech company responsible for DALL-E, has also been developing music tech like Jukebox and MuseNet. Jukebox in particular has been shown to transform text into music, but with less flexible input parameters.
Jukebox's text input lets you specify a genre of music in the style of a particular artist. The examples we've seen don't attempt to pair genres with unrelated artists. They generate full songs, albeit with warbly instruments and garbled vocals. The tech is not available for public use at this time.
Visitors to the Jukebox site can read the lyrical output as they listen to the music, thanks to SoundCloud's karaoke feature. Here's an example of some music and lyrics generated in the style of Tupac:
Entertaining as it is to see how AI generates music from text, here at AudioCipher we've always dreamed of something greater. We created a tool that generates MIDI files that producers can edit and build upon. As natural language processing APIs become readily available, we will be standing by ready to hook up our VST to a neural network that interprets the meaning and emotional qualities of a phrase.
AudioCipher has been adopted by thousands of musicians and beatmakers as a source of creative inspiration. It generates MIDI melodies in the key signature of your choice, with control over the rhythmic output. Just drag the phrase to your virtual instrument to hear it played back with any sound patch.
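To make the idea of turning text into a melody concrete, here is a minimal Python sketch of how a word-to-melody cipher could work. It is an illustration only and not AudioCipher's actual algorithm: the function name phrase_to_melody, the letter-to-scale-degree mapping, the major-scale key, and the list of note durations are all assumptions made for this example.

```python
import random

# Semitone offsets of the major scale, measured from the tonic.
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]

# Possible note lengths in beats (a hypothetical rhythm palette).
DURATIONS = [0.5, 1.0, 2.0]


def phrase_to_melody(phrase, tonic=60, seed=None):
    """Return a list of (midi_note, duration_in_beats) pairs for `phrase`.

    Assumed mapping: each letter becomes a scale degree by alphabet
    position (a=0, b=1, ...), climbing one octave every seven letters.
    Durations are picked at random to mimic rhythm randomization.
    """
    rng = random.Random(seed)  # a seed makes the rhythm repeatable
    melody = []
    for ch in phrase.lower():
        if not ("a" <= ch <= "z"):
            continue  # skip spaces, digits, and punctuation
        index = ord(ch) - ord("a")
        octave, degree = divmod(index, len(MAJOR_SCALE))
        note = tonic + 12 * octave + MAJOR_SCALE[degree]
        melody.append((note, rng.choice(DURATIONS)))
    return melody


if __name__ == "__main__":
    # tonic=60 is middle C, so this phrase is rendered in C major.
    for note, beats in phrase_to_melody("banjo wizard", tonic=60, seed=42):
        print(note, beats)
```

Changing the tonic moves the same phrase into a different key (62 would give D major, for instance), while changing the seed keeps the notes and reshuffles the rhythm. The resulting note numbers and durations could then be written to a MIDI file with whatever MIDI library you prefer.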
So what do you want to see from AudioCipher in the coming years?
Would you prefer that AudioCipher generate a single melody and let you build it into your own song, to retain creative agency?
Or would you prefer to see AudioCipher move in the direction of software like Midjourney and Jukebox, so that the notes and arrangement of our output evoke and symbolize the words you type in?
We're listening to your input and will build the next app versions accordingly.