A community-driven organization called Harmonai is leading the charge on open source AI music generation with a tool called Dance Diffusion. We’ve previously covered other generative audio tools from OpenAI (Jukebox and MuseNet), but were disappointed to find that MuseNet’s API shut down the same week OpenAI’s ChatGPT launched. It hasn’t come back online since.
In this article, we'll review the basics of how Harmonai makes music and how it differs from other AI music generation companies that focus on licensing. I'll share a sample of music by Harmonai's founder, Zach Evans. Then, to tie everything together, I'll walk you through using Dance Diffusion to generate your own AI music, step by step.
How Harmonai generates AI music from scratch
Harmonai’s Dance Diffusion leverages machine learning to create new music from the ground up. Users can generate audio from an existing model, upload audio samples and regenerate them, or interpolate between two tracks. After running through the notebook's steps, users receive audio files composed entirely by the neural network.
I'm always upfront with people who are just starting to make music with neural networks: the audio quality is grainy, warbly, and often sounds a bit weird. If you find the clips disappointing, remember that Dance Diffusion is just getting started. Get comfortable with Google Colab now and you’ll be ready to go when later iterations roll out.
Dance Diffusion isn't the first independent project to offer an alternative to OpenAI's Jukebox. In December 2022, a new AI audio generator called Riffusion was released. Its developers fine-tuned the Stable Diffusion image generator to create pictures of spectrograms (visual maps of frequency over time) from a user's text prompt, and then sonified them, meaning they converted the images back into sound.
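To make "sonify" concrete, here's a toy Python sketch that treats a small grid of numbers as a spectrogram (rows are frequencies, columns are moments in time) and turns it into sound by adding up sine waves. This additive-synthesis approach is a swapped-in simplification for illustration, not Riffusion's converter, which also has to reconstruct phase information that a spectrogram image doesn't store. The function name, frame length, and sample rate are my own choices.

```python
import math


def sonify_spectrogram(spec, freqs_hz, frame_seconds=0.05, sample_rate=8000):
    """Toy additive synthesis: turn a spectrogram-like grid into audio samples.

    spec[i][j] is the loudness of frequency freqs_hz[i] during time frame j,
    so rows are frequencies and columns are time, as in a spectrogram image.
    Returns float samples normalized to the range [-1, 1].
    """
    n_bins = len(spec)
    n_frames = len(spec[0]) if n_bins else 0
    samples_per_frame = round(frame_seconds * sample_rate)
    out = []
    for j in range(n_frames):
        for k in range(samples_per_frame):
            # Use a global time index so each sine wave stays phase-continuous
            # across frame boundaries (avoids clicks between frames).
            t = (j * samples_per_frame + k) / sample_rate
            out.append(sum(spec[i][j] * math.sin(2 * math.pi * freqs_hz[i] * t)
                           for i in range(n_bins)))
    # Scale so the loudest sample has magnitude 1; leave silence untouched.
    peak = max((abs(x) for x in out), default=0.0)
    return [x / peak for x in out] if peak > 0 else out


# A 2x2 "image": 440 Hz sounds in the first frame, 880 Hz in the second.
audio = sonify_spectrogram([[1.0, 0.0], [0.0, 1.0]], freqs_hz=[440.0, 880.0])
print(len(audio))  # 800: 2 frames x 400 samples per frame
```

The bright pixels in the grid become audible tones at the matching time and pitch, which is the core idea behind turning a picture back into sound.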
Dance Diffusion and Stable Diffusion are both open source diffusion models supported by the company Stability AI. To gain a better understanding of how diffusion models work, check out this popular article by Nvidia.
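If you want the core idea before reading that explainer: a diffusion model learns to reverse a gradual noising process. Here is the forward (noising) step in the cosine-schedule, v-prediction form used by v-diffusion-style models. I'm assuming Dance Diffusion follows this form (its v-iplms sampler name hints at it), so treat the exact parameterization as illustrative. Here x_0 is a clean audio clip, ε is Gaussian noise, and t runs from 0 (clean) to 1 (pure noise); the network is trained to predict v_t, from which the clean clip can be estimated.

```latex
\begin{aligned}
x_t &= \alpha_t\, x_0 + \sigma_t\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I),\\
\alpha_t &= \cos\!\left(\tfrac{\pi t}{2}\right), \qquad \sigma_t = \sin\!\left(\tfrac{\pi t}{2}\right), \qquad t \in [0, 1],\\
v_t &= \alpha_t\, \epsilon - \sigma_t\, x_0, \qquad \hat{x}_0 = \alpha_t\, x_t - \sigma_t\, \hat{v}_t .
\end{aligned}
```

Because α_t² + σ_t² = 1, t = 0 gives back the clean clip and t = 1 gives pure noise; generating audio means walking t from 1 back down to 0 with the trained network's v predictions.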
AI music websites that focus on licensing
Artificial intelligence is the buzzword of our decade. The general public knows so little about this technology that they can’t easily discern what’s happening under the hood. Harmonai’s Dance Diffusion software is the real deal. The audio quality is rough compared to other options out there, and with good reason.
At least half a dozen popular websites are marketing paid AI music generation services with polished, modern user interfaces. These companies hire musicians to create short MIDI clips and upload them to a private database. Subscribers log in and select parameters like BPM, key signature, and genre to render music for commercial use. It's typically ready to download within a couple of minutes.
AI music licensing companies like Mubert use artificial intelligence to analyze and select existing sounds, then build arrangements from those clips using AI as well. The sound quality is much better than what you get from Harmonai, but the tech works more like a DJ or amateur beatmaker, in the sense that it can only sample existing music. These services are positioned to challenge music licensing platforms like Epidemic Sound, Soundstripe, Artlist, and AudioJungle.
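To make the contrast concrete, here's a toy Python sketch of the pre-rendered approach described above: filter a library of pre-made clips by BPM, key, and genre, then chain matching clips until the track is long enough. The clip library, field names, and selection logic are all hypothetical; services like Mubert don't publish their internals.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """A pre-made clip in a (hypothetical) licensing library."""
    name: str
    bpm: int
    key: str
    genre: str
    bars: int  # clip length in bars of 4/4


def bar_seconds(bpm, beats_per_bar=4):
    """Length of one bar in seconds: beats_per_bar * (60 / bpm)."""
    return beats_per_bar * 60.0 / bpm


def arrange(library, bpm, key, genre, target_seconds, seed=0):
    """Chain clips that match the requested BPM, key, and genre until the
    arrangement lasts at least target_seconds. Returns [] if nothing matches."""
    matches = [c for c in library
               if c.bpm == bpm and c.key == key and c.genre == genre and c.bars > 0]
    if not matches:
        return []
    rng = random.Random(seed)  # seeded so the same request gives the same result
    arrangement, total = [], 0.0
    while total < target_seconds:
        clip = rng.choice(matches)
        arrangement.append(clip)
        total += clip.bars * bar_seconds(bpm)
    return arrangement
```

At 120 BPM in 4/4, one bar lasts 4 x 60 / 120 = 2 seconds, so a 4-bar clip is 8 seconds and a 30-second request is filled by four clips (32 seconds). Nothing here generates new audio; it only reorders material that humans already recorded.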
Harmonai's Dance Diffusion notebook doesn't use pre-rendered audio. It takes you right to the source of AI music generation.
Music by Harmonai: DADABOTS & Diffusion Radio
Named after Dadaism, an absurdist art movement from the early 20th century, DADABOTS are not just some fringe group. They are established figures in the global AI music scene and competed in the 2023 international AI Song Contest. You can check out their AI song submission here.
Toning things down a bit, probably to reach a broader audience, Evans announced at the end of 2022 that he'd spent the year working on Diffusion Radio, a 24-hour live stream. It's still pretty weird, but maybe more digestible than the DADABOTS material. Have a listen here:
How to use the Dance Diffusion Notebook
Now that you have a grasp of Harmonai and their founder's musical efforts, let's get into the meat of this article. My goal with this walkthrough is to take you through the basics of turning your own audio clips into new music using the Dance Diffusion notebook.
I’m going to say this again: Harmonai is just getting started, and the quality of the audio reflects that. Keep your expectations low for now and see what happens in the coming year. The experience you build here will be helpful later.
To get started, you’ll need a Google account with some available space on Google Drive. You’ll automatically have access to a free Google Colab account, but the basic plan comes with limited GPU access and memory. You’ll need to upgrade if you want to experiment regularly.
Step 1. Go to the Dance Diffusion notebook and unfold the instructions section.
Step 2. Scroll down to the View Changelog section and uncheck the skip_for_run_all box. Then press the play button next to it. This will run the code and you’ll see a green check mark when it’s finished.
Step 3. Scroll down and unfold the next section, titled Install Dependencies and Restart Notebook. Just below the header, you’ll see a subsection titled Install and restart. Click the play button next to it. The notebook is expected to crash: it restarts itself on purpose so that the newly installed dependencies take effect.
Step 4. Scroll down and unfold the Setup section. Press the play button next to Check GPU status and wait for the green check mark again.
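The Check GPU status cell confirms that Colab actually gave you a GPU. I haven't reproduced the notebook's exact code, but a minimal Python sketch of the same kind of check, assuming an NVIDIA GPU and the standard `nvidia-smi` tool that Colab GPU runtimes provide, looks like this (the `gpu_status` helper name is my own):

```python
import shutil
import subprocess


def gpu_status():
    """Return one line per NVIDIA GPU (from `nvidia-smi -L`), or None if no
    NVIDIA GPU or driver is visible on this machine."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(["nvidia-smi", "-L"], capture_output=True,
                                text=True, timeout=10, check=True)
    except (subprocess.SubprocessError, OSError):
        return None
    return result.stdout.strip() or None


status = gpu_status()
print(status or "No GPU found: in Colab, pick a GPU under Runtime > Change runtime type.")
```

If this reports no GPU, generation will either fail or crawl, so it's worth fixing the runtime type before going further.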
Step 5. Scroll to the next section, titled Prepare Folders. You have the option to check the save_to_wandb checkbox. If you do, your runs will be logged to Weights & Biases, a third-party experiment-tracking service.
Press play and you’ll be prompted to connect your Google Drive account. Once you’ve granted the notebook permission, you’ll see a green check mark and can continue.
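Behind that permission prompt, Colab notebooks typically mount Drive with `google.colab.drive.mount`. Here's a small sketch that mounts Drive only when it's running inside Colab and then creates an output folder. The helper names and the `dance_diffusion_outputs` folder are hypothetical, not what the notebook itself uses.

```python
import os


def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab and return True.
    Outside Colab the google.colab package isn't installed, so return False."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)  # opens the Google permission prompt
    return True


def prepare_output_folder(base_dir, name="dance_diffusion_outputs"):
    """Create the folder if needed and return its path (safe to call twice)."""
    path = os.path.join(base_dir, name)
    os.makedirs(path, exist_ok=True)
    return path


if mount_drive():
    # /content/drive/MyDrive is where Colab exposes your Drive files.
    out_dir = prepare_output_folder("/content/drive/MyDrive")
```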
Step 6. Press play on Imports and Definitions. This will run some code that pulls in the necessary dependencies to generate audio.
Step 7. Unfold the Model Settings section. The first subsection here is titled Select the model you want to sample from. No action is needed here; it’s just a reference area. There are currently six Dance Diffusion models available in v0.12, each trained on a different dataset. The best available option is Maestro-150k, which you can use to create new piano audio clips.
Here are four of the six models you can choose from:
Glitch-440k was trained on glitch and noise samples from glitch.cool
Maestro-150k was trained on piano recordings from Google Magenta's MAESTRO dataset
Unlocked-250k was trained on audio files in the Unlocked Recordings dataset. This is a collection of music in the Internet Archive that is no longer commercially available and therefore free to reference.
Honk-140k was trained on recordings of Canada geese from the xeno-canto wildlife sound archive
Step 8. Scroll down and unfold the Create the model section. The left half of the screen shows a bunch of code that you can ignore. On the right is a set of fields, but the only one you need to pay attention to for now is model_name. Select one of the six models, press the play button, and wait for the green check mark.
Keep in mind that these model files are big: several gigabytes each. If you don’t have enough storage space, this step will fail. The Custom Options fields are an advanced feature set that you can explore once you’ve got a handle on the basics.
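Since a failed model download is annoying, you can check free space first. Here's a minimal sketch using Python's standard `shutil.disk_usage`; the 5 GiB threshold is a hypothetical stand-in for "several gigabytes", and the space reported for a mounted Google Drive folder may not match your actual Drive quota.

```python
import shutil


def has_free_space(path, required_gib):
    """True if the filesystem containing `path` has at least `required_gib`
    GiB (1 GiB = 1024**3 bytes) of free space."""
    return shutil.disk_usage(path).free >= required_gib * 1024**3


# "." is the notebook's working directory (/content in Colab).
# The 5 GiB threshold is a hypothetical placeholder; adjust it for your model.
if not has_free_space(".", 5):
    print("Not enough free space to download the model; free some up first.")
```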
Step 9. Scroll down and unfold the Select the sampler you want to use section. I recommend starting with v-iplms even though it’s one of the slower options, because it’s the most reliable. Keep the k-diffusion settings at their default.
Step 10. Skip the Generate new sounds section. You would use it to generate sounds directly from the selected Dance Diffusion model, rather than from audio clips you upload.
Step 11. Unfold the Regenerate your own sounds section. In the record audio or enter a filepath subsection, uncheck the record_audio checkbox so that you can use your own file. When you feed an audio file to the sample generator in Google Colab, it’s best to pick a model trained on similar audio so that the output aligns with your input. Since we’re using Maestro, which was trained on piano music, the best audio clip would be a piano solo.
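Matching the model to your source material is the main decision in this step. As a toy illustration, here's a keyword-based picker built from the training data in the model list above. The keywords and the fallback to Unlocked-250k for general music are my own assumptions, not settings from the notebook.

```python
import re

# Model names as this article lists them; the keyword sets are hypothetical
# hints based on each model's training data.
MODEL_HINTS = {
    "Glitch-440k": {"glitch", "noise", "noisy"},
    "Maestro-150k": {"piano"},
    "Honk-140k": {"goose", "geese", "honk"},
}
DEFAULT_MODEL = "Unlocked-250k"  # assumption: general music falls back to Unlocked


def suggest_model(description):
    """Return the first model whose keywords appear in a short description
    of your clip, or DEFAULT_MODEL if none match."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    for model, keywords in MODEL_HINTS.items():
        if words & keywords:
            return model
    return DEFAULT_MODEL


print(suggest_model("Pink Panther theme, solo piano"))  # Maestro-150k
```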
The file_path field is where you paste the URL of your audio file. The file has to be hosted on a separate site, and the notebook currently accepts only WAV files. Google Drive and other file hosting sites won’t work if their links don’t end in the “.wav” extension.
If you don’t have an easy way to upload and host WAV files, you can grab one of the samples from here to experiment with. I used the Pink Panther theme, which means I pasted in the full URL: https://www2.cs.uic.edu/~i101/SoundFiles/PinkPanther30.wav
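Because a bad link is the most common way this step fails, it can help to sanity-check a URL before pasting it in. This sketch mirrors the rule described above (a direct http(s) link whose path ends in .wav; treating the extension case-insensitively is my own assumption) and uses Python's standard `wave` module to read a file's header. It's not the notebook's own validation.

```python
import io
import wave
from urllib.parse import urlparse


def looks_like_direct_wav_url(url):
    """True if the URL uses http(s) and its path ends in '.wav'."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and parsed.path.lower().endswith(".wav")


def describe_wav(data):
    """Read a WAV header from raw bytes and return basic facts about the file.
    Raises wave.Error (or EOFError for very short input) if it isn't a WAV."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "seconds": wav.getnframes() / rate,
        }


print(looks_like_direct_wav_url("https://www2.cs.uic.edu/~i101/SoundFiles/PinkPanther30.wav"))  # True
print(looks_like_direct_wav_url("https://drive.google.com/file/d/FILE_ID/view"))  # False
```

To inspect a real file, you could pass `describe_wav` the bytes returned by `urllib.request.urlopen(url).read()`.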
Press play on this step and wait for the green check again.
Step 12. In the next step, titled Generate new sounds from recording, keep the default parameters and press the play button. Within a couple of minutes, you should have a set of new audio clips to listen to. Click the three vertical dots to download a file to your computer.
Regenerating audio this way works a lot like neural style transfer: the model borrows the key signature and rhythmic flourishes from your audio sample and applies them to the new clip. When I used the Pink Panther clip, for example, it pulled out the classic “bu-bum, bu-bum” rhythms and hovered around the same key. However, the Maestro-150k model’s dataset lacks other instruments like percussion, so those didn’t carry over.
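The style-transfer comparison is loose. Diffusion-based regeneration is usually done by adding a controlled amount of noise to your clip and letting the model denoise it, so the broad shape (key, rhythm) survives while the detail gets redrawn. The sketch below shows only the noising half, using the cosine schedule common to v-diffusion-style models. The function names and the `noise_level` knob are illustrative assumptions, and the denoising half needs the trained model, which isn't shown.

```python
import math
import random


def alpha_sigma(t):
    """Cosine schedule: how much signal and how much noise remain at t in [0, 1]."""
    return math.cos(t * math.pi / 2), math.sin(t * math.pi / 2)


def partially_noise(samples, noise_level, seed=0):
    """Blend audio samples with Gaussian noise.

    noise_level=0 keeps the input unchanged; noise_level=1 leaves (almost)
    pure noise. A diffusion model would then denoise from this point back to
    t=0, which is where the new material comes from.
    """
    alpha, sigma = alpha_sigma(noise_level)
    rng = random.Random(seed)  # seeded so the example is reproducible
    return [alpha * x + sigma * rng.gauss(0.0, 1.0) for x in samples]
```

Lower noise levels stay closer to your original clip; higher ones give the model more freedom, which is why the Pink Panther rhythms can survive while the texture changes.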
Step 13. You have the option to interpolate between sounds, if you want to continue experimenting. This essentially means that you can weave together separate tracks. I used a short audio clip of the opening theme to Star Wars (available here: https://www2.cs.uic.edu/~i101/SoundFiles/StarWars3.wav) and had it interpolate with Pink Panther.
The results were interesting, maybe even more so than the sound regeneration tool. You can really hear the styles of both tracks come together. For an amusing take on Harmonai's interpolation feature, check out this final video that uses Dance Diffusion to combine Smash Mouth with Super Mario Bros.
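Diffusion interpolation is commonly done by running both clips back toward noise, blending the two noisy versions, and denoising the blend. Spherical linear interpolation (slerp) is a popular choice for the blending step, because two high-dimensional noise vectors have similar lengths and a straight-line average of them comes out noticeably shorter (less noisy) than either one. Here's a minimal Python sketch of slerp on plain lists; I haven't verified that Dance Diffusion's notebook blends this way, so treat it as an illustration of the general technique rather than the notebook's code.

```python
import math


def lerp(a, b, t):
    """Straight-line blend between two equal-length vectors."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def slerp(a, b, t):
    """Spherical linear interpolation: t=0 gives a, t=1 gives b, and values in
    between follow the arc between the two vectors instead of the straight line."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return lerp(a, b, t)  # the angle is undefined for a zero vector
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))  # clamp rounding error
    if omega < 1e-6:
        return lerp(a, b, t)  # nearly parallel: both paths coincide
    s = math.sin(omega)
    wa = math.sin((1 - t) * omega) / s
    wb = math.sin(t * omega) / s
    return [wa * x + wb * y for x, y in zip(a, b)]
```

Halfway between the unit vectors [1, 0] and [0, 1], slerp stays at length 1, while a straight-line average shrinks to about 0.707.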
That covers everything: you made it through Dance Diffusion's Colab notebook!
Stay tuned to our blog for the latest in AI music generation and other fun music production techniques. You can sign up for our newsletter on the AudioCipher homepage.