Best Alternatives to Google's MusicLM for Music Producers

Google has officially entered the text-to-music arena with a new machine learning model called MusicLM. The developers published their research paper, along with a GitHub demo page, in the last week of January 2023, receiving immediate attention from tech hubs like Hacker News. The paper claims that MusicLM can generate high-fidelity audio from text, and the demo page provides dozens of audio samples for readers to reference.

On May 10th, 2023, TechCrunch announced that Google had made MusicLM available for public use. The music generator would be delivered through Google's AI Test Kitchen app. We hurried over to the App Store to give it a spin, only to discover that there's currently a waitlist. Fortunately, we were admitted within 24 hours.

Google has offered a few screenshots of the MusicLM interface. The image below was shared with the press.

MusicLM interface on AI Test Kitchen

On the surface, the core offering of MusicLM closely resembles Riffusion, the free text-to-music browser app that came out back in December 2022. They share a common premise: users type in a description of music and generate original songs with the press of a button. In an earlier demo, MusicLM was able to generate sound effects, foley, and other non-musical audio content too.

AI Test Kitchen review on the App Store

Initial feedback on Google's AI Test Kitchen has been lukewarm. The app currently holds 2.7 stars, with a number of complaints that it simply "doesn't work". We have had a few issues with the interface as well. Once we were admitted, the launch demo button was unresponsive; I had to rage-click about 50 times before it opened.

I spoke with another user who had signed up for the AI Test Kitchen a couple of months ago. They had a download option for tracks, whereas my newer account did not. This seems to imply that different cohorts have throttled access to certain features.

Music producers have been raising concerns about AI music on social media, though most of the focus has been on AI voice generators. The imitation of a person's voice is deeply personal, and the ethical violations are more apparent.

Generating music from a descriptive text prompt would appear to come with a different set of problems. For example, what musical datasets did the neural network train on in order to gain its understanding of that genre? Is that body of music protected by copyright?

MusicLM was born from an underlying philosophy that anyone should be able to create music, without feeling blocked by technical limitations. The option to describe your musical idea in text and watch it come to life instantly has mass appeal. That being said, MusicLM is not the only text-to-music software out there and some of the alternatives could be better suited for established musicians.

Best alternatives to MusicLM for musicians

Artists take pride in their creative process. There is an intimate relationship between a musician and the sounds they select. A culture of click-and-share music creation seems to strip away most of our creative agency, handing it over to an artificial intelligence.

There are a couple of great alternatives to MusicLM, for musicians who enjoy the idea of turning words into music but want more control.

AudioCipher is a text-to-MIDI plugin that loads within your DAW and gives you tight control over parameters like key signature, chord extensions, and rhythm automation. The app uses a musical cryptogram, meaning letters are swapped for notes, generating melodic MIDI sequences and chord progressions that still require a musician to shape them into something meaningful.

Instead of creating an entire song with a text description, AudioCipher lets you set a focus point for your song. What is the concept you want to convey and what words would you use to describe it? Generating MIDI based on a word or phrase has been celebrated as a fun and simple way to overcome your creative blocks.
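To make the cryptogram idea concrete, here is a minimal sketch of how letters could be mapped to notes. This is a generic illustration of a musical cryptogram, not AudioCipher's actual algorithm, and the C major scale used here is an assumption for the example.

```python
# Minimal musical cryptogram sketch: map letters onto a scale.
# Generic illustration only -- not AudioCipher's real mapping.

C_MAJOR = ["C", "D", "E", "F", "G", "A", "B"]

def word_to_notes(word, scale=C_MAJOR):
    """Convert each letter in a word to a note by cycling through the scale."""
    notes = []
    for ch in word.lower():
        if ch.isalpha():
            index = (ord(ch) - ord("a")) % len(scale)
            notes.append(scale[index])
    return notes

print(word_to_notes("cab"))  # ['E', 'C', 'D']
```

A real plugin would go further, assigning durations and velocities and writing the result out as a MIDI file, but the core letters-to-pitches swap is this simple.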

WavTool is a second option worth exploring. This text-to-MIDI DAW loads within your web browser and comes with many of the core features you would expect from a workstation. It includes a GPT-4 AI chatbot that understands text commands and can translate basic ideas into actions in the DAW. The video above demonstrates the power of the app along with some of its shortcomings.

In June 2023, Meta put out a competitive product called MusicGen. We've since published several articles with tutorials on how to use the application, including AI film scoring, creating infinite songs, and turning melodies into full arrangements.

What makes Google's MusicLM unique?

This isn't the first time Google has taken a stab at creating music using artificial intelligence. We've previously covered their MIDI generating software, Google Magenta Studio, along with other innovative tools like DDSP for tone transfer.

There are several other AI music apps out there, so why should we care about Google's contribution? Let's break it down one feature at a time.

  1. MusicCaps dataset

  2. Long Generation: Consistent musical output

  3. Audio generation from rich captions

  4. Story mode

  5. Melody conditioning

  6. Painting descriptions

The MusicCaps dataset: A new approach to descriptions

MusicCaps dataset sample

In the spirit of transparency, Google released their MusicCaps dataset through Kaggle. Each of the 5,521 music samples is labeled with English descriptions, including aspect lists and free text captions.

An aspect list is a comma-separated collection of short phrases describing the music, whereas the free text captions are natural-language descriptions written by expert musicians.

Example of an aspect list: "pop, tinny wide hi hats, mellow piano melody, high pitched female vocal melody, sustained pulsating synth lead"

Example of a free text caption: "A low sounding male voice is rapping over a fast paced drums playing a reggaeton beat along with a bass. Something like a guitar is playing the melody along. This recording is of poor audio-quality. In the background a laughter can be noticed. This song may be playing in a bar."
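To illustrate the difference between the two label types, here is a sketch of what a MusicCaps-style record might look like in code. The field names are hypothetical, not the dataset's actual schema; consult the Kaggle release for the real column names.

```python
# Illustrative MusicCaps-style record.
# Field names ("aspect_list", "caption") are hypothetical, not the Kaggle schema.
record = {
    "aspect_list": (
        "pop, tinny wide hi hats, mellow piano melody, "
        "high pitched female vocal melody, sustained pulsating synth lead"
    ),
    "caption": (
        "A low sounding male voice is rapping over a fast paced drums "
        "playing a reggaeton beat along with a bass."
    ),
}

# An aspect list is machine-friendly: it splits cleanly into discrete tags.
aspects = [a.strip() for a in record["aspect_list"].split(",")]
print(aspects[0])  # pop
```

The caption, by contrast, is free-form prose, which is exactly what lets MusicLM accept rich natural-language prompts.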

This training set differs from OpenAI's Jukebox data because it focuses on how the music sounds instead of metadata about the music, like artist name or genre.

The MusicCaps developers have published a separate paper on arXiv describing their goal of generating music from text. It was released in tandem with the publication of MusicLM.

Long Generation: Consistent musical output over time

The newly published MusicLM paper claims that its output remains musically consistent for several minutes. AI developers have had a particularly hard time keeping generated music coherent over long durations. One common approach uses LSTM, or long short-term memory, a recurrent neural network (RNN) architecture designed to help a model retain context over time, but it struggles with the very long sequences that audio requires.

To really get a feel for this problem, I suggest checking out OpenAI's Jukebox Sample Explorer. You'll find that the music tends to lose focus and devolve by the end of a clip, so that a rap song in the style of Machine Gun Kelly gradually morphs into a convoluted reggae death metal tune. MusicLM claims to outperform these other AI music generators by using a hierarchical sequence-to-sequence model that outputs audio at 24 kHz.

Audio generation from rich captions

Example of a MusicLM caption

Thanks to the MusicCaps dataset, MusicLM is able to accept long-form text input with rich descriptions of music. This means users will not need any technical music theory knowledge to create songs. Filmmakers and video game developers will eventually be able to generate the sounds they need on demand, simply by describing the scenes in question.

Story Mode: Fluid progression through a series of prompts

Story Mode example

MusicLM includes a story mode that lets users describe time stamps where the music should evolve. Prompts could include abstract feelings and words like "fireworks" as well as genres like "rock song" and "string quartet". Behind the scenes, the model works to create a smooth musical transition from one semantic framework to the next.

Melody conditioning: Hum or whistle melodies into any style

MusicLM's melody and text prompt grid

Melody conditioning with text prompts is where things start to get pretty crazy.

MusicLM lets you input any kind of audio sample like humming, whistling, or even guitar melodies. You can then type in a short text prompt describing the style of audio that you want to hear and it does a phenomenal job replicating the melody provided in that style.

We'll return to this later, to explain how AudioCipher could be used to overcome the lack of key signature and tempo controls.

Painting Caption Conditioning

turning painting captions into music

The paper includes a demo that turns image captions into audio. This isn't necessarily a feature so much as it is a display of how the software might be used. Instead of trying to look at artwork and interpret it, MusicLM looks at human descriptions of the art and generates music from those ideas.

Tuning MusicLM Output with AudioCipher

One of MusicLM's major shortcomings is the absence of music theory data. The MusicCaps dataset does not include tempo or key signature information, so users cannot gain full control over the output.

Fortunately, AudioCipher provides a text-to-MIDI plugin that includes key signature parameters, so you can generate chords and melodies from words in any key. Fine-tune the rhythm in your piano roll and set the BPM. Finally, export the audio file and pass it into the MusicLM melody prompt tool with a description of the style you want.

Pick up a copy of AudioCipher to become familiar with the interface and start experimenting. Once Google publishes the MusicLM API, it's only a matter of time before open source developers create an interface like MuseTree. Then we'll be able to put this software to the test and enter a new phase of creative freedom.
