Music is a distinctly human activity, which leads many of us to ask: could a machine ever write good songs?
Launched in April 2020, OpenAI's Jukebox lets users explore this question directly. With enough RAM, you can upload a snippet of audio and hear what the neural network comes up with.
Jukebox AI creates wild and unpredictable results. The output can be amazing, hilarious, or downright awful. Given the many hours and the computing power it takes to generate even a minute of audio, we hope you're ready to laugh regardless of the results.
Even when Jukebox veers off in an unexpected direction, frustrating as that might be, this freedom to experiment is what gives Jukebox its power. The generator almost always shows signs of imagination and commitment to novelty.
In its best moments, Jukebox can transform the way you think about an existing piece of music. The model was trained on a wide range of artists and genres, from hip hop to oldies like Frank Sinatra and the Beatles. It can even riff on classic video game music like these Super Mario 64 tracks.
As of January 2023, a new competitor to Jukebox called Dance Diffusion has become available to the public, thanks to an independent developer called Harmonai. Check out our step-by-step instructions on how to use Dance Diffusion to try it out yourself.
How the Jukebox AI song generator works
24 themes from Super Mario 64, reimagined by OpenAI Jukebox
Machine learning is a gradual process in which neural networks train on a dataset, adjusting their parameters to solve the problems placed in front of them. It's remarkable to hear how a neural net can build its own generative model for music.
Jukebox listens to the song you upload and then continues it with its own creative material. Once you have that output, it's commonly stitched together with the original track, so you get a kind of Frankenstein song.
Back in 2020, when YouTube creators were sharing many of these weird musical experiments, Jukebox AI famously trained on Eminem's song "Godzilla" (featuring Juice WRLD) to produce this track:
Seeding OpenAI Jukebox with AudioCipher Output
Around the same time Jukebox launched, AudioCipher published the first ever MIDI plugin that turns words into melodies. The VST lets musicians generate melodies and audition them quickly on any virtual instrument, to break through "blank DAW syndrome" and help musicians get the inspiration flowing.
But what if there was a cheat code for inspiration, beyond melody generators, that got you even further toward the goal line, without literally writing the song for you?
You would type words into AudioCipher based on anything - a moment, a dream, a movie character. Instead of writing lyrics about them, AudioCipher generates a melody that's connected to the thought, but encoded into music.
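To make the word-to-melody idea concrete, here is a toy sketch in the spirit of a classic musical cryptogram. AudioCipher's actual mapping is its own; this illustrative version simply wraps each letter of the alphabet onto the seven notes of the C major scale and emits MIDI note numbers.

```python
# Toy word-to-melody cipher (illustrative only, not AudioCipher's algorithm).
# Each letter a-z is wrapped onto the 7 degrees of C major; every second
# pass through the alphabet jumps up an octave for variety.

C_MAJOR = [60, 62, 64, 65, 67, 69, 71]  # MIDI: C4 D4 E4 F4 G4 A4 B4

def word_to_melody(word: str) -> list:
    """Map each letter onto a scale degree, skipping non-letters."""
    notes = []
    for ch in word.lower():
        if not ch.isalpha():
            continue
        idx = ord(ch) - ord("a")            # 0..25
        degree = idx % 7                    # wrap onto 7 scale degrees
        octave_shift = 12 * (idx // 7 % 2)  # alternate octaves
        notes.append(C_MAJOR[degree] + octave_shift)
    return notes

print(word_to_melody("dream"))
```

A melody like this could then be rendered to MIDI in your DAW and auditioned on any virtual instrument.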
Once you have that AudioCipher melody, you feed it into Jukebox. Choose the artist/genre that you want to imitate, upload your seed file to the cloud, render a one minute AI generated song, and pull it back into the DAW to study. If you like the concept, you can pick out the best ideas and then compose your own version.
A direct integration between AudioCipher and Jukebox would reduce the time and effort required to produce these song inspirations. We've contacted OpenAI to discuss this opportunity.
Capturing musical feelings with Jukebox
Jukebox has a critical limitation in its dataset that warrants consideration. You can review the Jukebox artist list and genre list, last updated in 2020.
For contrast, check out this example from Marmoset's music licensing website. They've done a fantastic job categorizing music by aesthetics like mood, energy, and dynamic arc. Imagine if an aesthetic musical taxonomy like Marmoset's was applied to Jukebox.
The option to turn any text into an image is what makes DALL·E 2 so endlessly entertaining. Jukebox, by contrast, can't interpret abstract concepts like bouncy, anthemic, or calm - those are not genres of music. Jukebox creates from artists and genres, not feelings.
If Jukebox were trained on aesthetic metadata, you might be able to rig up some middleware with GPT-3 that performs sentiment analysis and pipes aesthetic descriptors over to Jukebox, generating music from a broader set of text input.
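A minimal sketch of what such middleware might look like, under heavy assumptions: Jukebox only accepts artists and genres, so a shim would have to translate free-text moods into the nearest supported genre before prompting the generator. Here a simple keyword table stands in for the GPT-3 sentiment-analysis step, and the mood-to-genre pairings are illustrative guesses, not real Jukebox metadata.

```python
# Hypothetical middleware sketch: reduce a free-text description to a
# Jukebox-style artist/genre prompt. A keyword table stands in for a
# real sentiment-analysis model; the pairings below are placeholders.

MOOD_TO_GENRE = {
    "bouncy":   "pop",
    "anthemic": "rock",
    "calm":     "jazz",
}

def describe_to_prompt(description: str, default_genre: str = "pop") -> dict:
    """Pick the first recognized mood word and map it to a genre."""
    words = description.lower().split()
    genre = next((MOOD_TO_GENRE[w] for w in words if w in MOOD_TO_GENRE),
                 default_genre)
    return {"genre": genre, "artist": "unknown"}

print(describe_to_prompt("a calm rainy evening"))
```

The point of the sketch is the shape of the pipeline, not the lookup itself: swap the table for a language model and the output dict for a real call into the generator.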
Using OpenAI Jukebox with Google Colab
This tutorial is one of many that outlines the steps for firing up Jukebox yourself.
Currently, the Jukebox GitHub repo has not been updated in two years, and the popular method of loading Jukebox into Google Colab seems to have stopped working. One of the TAR files is too large for Colab+ to process, causing the GPU to crash. The version of tqdm, a Python progress-bar library the repo depends on, has also fallen out of sync.
Fortunately, you can still load Jukebox locally. You’ll need a good deal of RAM to run the service and should expect to dedicate up to 12 hours as the output progresses through two upsampling phases.
Jukebox’s encoder model compresses audio with a quantization approach called VQ-VAE. This technique encodes raw audio into discrete codes at several levels of compression; generation starts from the heavily compressed top level and passes through two stages of upsampling to improve its quality.
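The rough numbers behind those stages can be sketched from the compression ratios OpenAI reported for Jukebox (8x, 32x, and 128x at 44.1 kHz): the top level is the most heavily compressed, and each upsampling stage moves back down to a finer level.

```python
# Codes-per-second arithmetic for Jukebox's three VQ-VAE levels,
# using the 8x / 32x / 128x compression ratios reported by OpenAI.

SAMPLE_RATE = 44_100  # Hz

def codes_per_second(compression: int) -> float:
    """Discrete VQ-VAE codes needed to represent one second of audio."""
    return SAMPLE_RATE / compression

for level, ratio in [("top", 128), ("middle", 32), ("bottom", 8)]:
    print(f"{level:>6} level ({ratio:>3}x): {codes_per_second(ratio):.1f} codes/sec")
```

This is why the top-level pass is comparatively fast while the upsampling phases dominate the multi-hour render time: the bottom level has to model roughly sixteen times as many codes per second as the top.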
You can read a complete description of the process on OpenAI.com.
For anyone interested in exploring the topic further, we've assembled a list of the best AI music apps that you can use and license. For most people today who want to experiment with an AI song generator but can't load Jukebox on their local device, these apps are the preferred alternative.
And to bring things full circle, here's a taste of what happens when AI generated music is brought to a group of human musicians and performed on real instruments.
When the song was released in 2016, the media called it the first pop song ever written by AI. It was certainly one of the first attempts to create and distribute a polished studio recording like this. Will humans and machines be bandmates in the future?