Google Lyria 2: The New AI Music Generator from DeepMind
- Ezra Sandzer-Bell
- Apr 25
- 7 min read
Updated: Apr 29
Google is about to launch Lyria 2, its most controllable AI text-to-music interface to date. The announcement was made on April 24th by Google DeepMind's Lyria team.
In this article we'll share a detailed overview of what's new in the product, followed by a primer on the history of Google's AI music products. This should help you get a sense of how far they've come and where they might be headed.
Lyria 2 is currently in a private beta, and Google intends to release it to the public through its Music AI Sandbox application. We will update this post when it goes live for everyone.
What's new in Google's Lyria 2 interface
Google's prior apps, MusicLM and MusicFX, were at their core instrumental text-to-music web apps. Lyria 2 improves on MusicFX in several important ways. Here's a screenshot of the new interface and a summary of what's included.

As before, Google provides a "describe your clip" text area for prompting instrumental music generation based on style, genre, mood, and instrumentation.
However, Lyria 2 now includes a second field for lyrics, which generates singing vocals to accompany the style of music you create. Lyrics are positioned on a sliding timeline below the audio waveforms, so they begin and end at just the right spot.
The model now supports transformation arcs that control how much novelty is introduced during a segment of the song, prior to rendering. This is particularly helpful for inpainting, a technique where two clips are fused together with an AI generated mid-section.

The advanced prompt settings include control over tempo (BPM) and key signature, making them particularly useful for musicians. There are also audio extension parameters based on song section, with support for intro and outro designations, so you can shape a clip to lead into or out of the composition as a whole.
You can watch a recently released product demo and endorsement of Lyria 2 by Isabella Kensington in the video below:
Lyria 2 will integrate with Google DeepMind's existing AI DJ product, MusicFX, under the new product title Lyria RealTime.
All music created with Lyria is watermarked with SynthID for content detection, as part of an effort to increase accountability and traceability of outputs.
Listen to demos of Lyria 2 music outputs here.
Google's AI Sandbox: A retrospective
Google announced its new Music AI Sandbox tool on May 14th, 2024 during its annual I/O developer conference. The promotion included endorsements from music industry heavyweights like Wyclef Jean, Marc Rebillet, and Donald "Childish Gambino" Glover.
The artists focused on the value of writing faster and with less hassle. Gambino was quoted as saying, "You can make a mistake faster. That’s all you really want at the end of the day, at least in art — it’s just to make mistakes fast."
DeepMind is the Google team behind this new tech. It's the same AI music department whose researchers recently splintered off to create the tech behind Udio, a music-centric startup. This week, Music Business Worldwide reported that Udio is now churning out 10 songs per second (roughly 864,000 songs per day).
The popularity of Udio is a market indicator for how DeepMind's tech will be received and the level of engagement we can expect to see in the very near future.
The Alphabet company officially entered the text-to-music arena in 2023 with their generative AI model called MusicLM.
Developers published their first paper, along with a GitHub demo page, in the last week of January 2023, receiving immediate attention from tech hubs like Hacker News. On May 10th, 2023, TechCrunch broke the story that Google had made MusicLM available for public use.
Follow-up efforts surfaced when Google's DeepMind team announced Lyria and SynthID in December 2023.
The bridge between MusicLM and SynthID has since been completed and is now available through Google's AI Test Kitchen, in a container called MusicFX that is more or less identical to the original MusicLM interface.
From MusicLM to MusicFX to AI Music Sandbox
The new MusicFX interface could be described as a slightly more colorful version of MusicLM, with a tagging service that detects your most important prompt phrases. Users can click on settings to select a longer track length than before, including 50 and 70 second options as shown below.
There's also a new "looping" option that blends the beginning and end of your track to create an infinite song. One of my benchmarks for AI models is to see whether they understand the notion of odd time signatures. Indeed, MusicLM created a song section that alternated between two measures of 4/4 and a measure of 7/4.
MusicFX is a bit lightweight compared to most of the other popular text-to-music services available today. There's also been some fresh controversy surrounding the musical datasets Google used to train its model weights. Outspoken AI music ethicist Ed Newton-Rex (formerly of Stable Audio) published a critical op-ed detailing what he sees as the erosion of Google's values since their early efforts with Magenta.
Best alternatives to MusicLM for musicians
If the questionable ethics behind Google's model training rubs you the wrong way, there are some great alternatives to explore. Musicians who work in a DAW can experiment with AudioCipher, the text-to-MIDI plugin shown below:
AudioCipher is a text-to-MIDI plugin that loads within your DAW and gives you tight control over parameters like key signature, chord extensions, and rhythm automation. The app uses a musical cryptogram, which means that letters are swapped for notes, generating melodic MIDI sequences and chord progressions that still require a musician to shape them into something meaningful.
Instead of creating an entire song with a text description, AudioCipher lets you set a focus point for your song. What is the concept you want to convey and what words would you use to describe it? Generating MIDI based on a word or phrase has been celebrated as a fun and simple way to overcome your creative blocks.
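To make the cryptogram idea concrete, here's a minimal Python sketch of letter-to-note mapping. It's a generic illustration of the technique, not AudioCipher's actual mapping, scale handling, or feature set.

```python
# A toy musical cryptogram: map each letter of a word onto a pitch so the
# word becomes a short melodic motif. Generic illustration only, not
# AudioCipher's implementation.
NOTE_NAMES = ["A", "B", "C", "D", "E", "F", "G"]
MIDI_PITCH = {"A": 69, "B": 71, "C": 60, "D": 62, "E": 64, "F": 65, "G": 67}

def word_to_midi(word: str) -> list[int]:
    """Convert a word into a list of MIDI note numbers."""
    notes = []
    for ch in word.upper():
        if not ch.isalpha():
            continue  # skip spaces, digits, and punctuation
        # Letters past G wrap back around the seven note names (H -> A, I -> B, ...).
        name = NOTE_NAMES[(ord(ch) - ord("A")) % 7]
        notes.append(MIDI_PITCH[name])
    return notes

print(word_to_midi("cipher"))  # [60, 71, 71, 69, 64, 62]
```

From there, a musician would still voice, harmonize, and arrange the motif, which is exactly the "focus point" workflow described above.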
WavTool is a second option worth exploring. This text-to-MIDI DAW loads within your web browser and comes with many of the core features you would expect from a workstation. It includes a GPT-4 AI chatbot that understands text commands and can translate basic ideas into actions in the DAW. The video above demonstrates the power of the app along with some of its shortcomings.
Beyond MIDI generation, you might also enjoy trying out Splash Music and Stable Audio. Both of these platforms trained consensually on licensed music from partners who opted in. Stable Audio generates instrumental music only, while Splash creates both instrumentals and AI vocals in what is commonly called text-to-song.
In June 2023, Meta put out a competitive product called MusicGen. On paper, they did technically train the model consensually, in partnership with Pond5. However, we've spoken to a library holder who sells over 50,000 tracks through Pond5 and learned that they never had the chance to opt out and were poorly compensated for the deal.
What makes Google's MusicLM unique?
This isn't the first time Google has taken a stab at creating music using artificial intelligence. We've previously covered their MIDI generating software, Google Magenta Studio, along with other innovative tools like DDSP for tone transfer.
There are several other AI music apps out there, so why should we care about Google's contribution? Let's break it down one feature at a time.
The MusicCaps dataset: A new approach to descriptions

In the spirit of transparency, Google released their MusicCaps dataset through Kaggle. Each of the 5,521 music samples is labeled with English descriptions, including aspect lists and free text captions.
An aspect list is a comma separated collection of short phrases describing the music, whereas the free text captions are written descriptions in natural language by expert musicians.
Example of an aspect list: "pop, tinny wide hi hats, mellow piano melody, high pitched female vocal melody, sustained pulsating synth lead"
Example of a free text caption: "A low sounding male voice is rapping over a fast paced drums playing a reggaeton beat along with a bass. Something like a guitar is playing the melody along. This recording is of poor audio-quality. In the background a laughter can be noticed. This song may be playing in a bar."
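For readers who want to poke at the data directly, here's a minimal sketch of loading the Kaggle release with Python's standard library. The file name and column names (aspect_list, caption) reflect the public dataset but should be treated as assumptions to verify against your own download.

```python
import csv

# Minimal exploration of the MusicCaps metadata; assumes the CSV from Kaggle
# has been downloaded locally. File name and column names are assumptions to
# verify against your copy of the dataset.
with open("musiccaps-public.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

print(len(rows), "labeled clips")  # the dataset description reports 5,521 samples

example = rows[0]
# Depending on the export, the aspect list may be a plain comma-separated
# string or a stringified Python list; this handles either form.
aspects = [p.strip(" '\"[]") for p in example["aspect_list"].split(",")]
print("Aspects:", aspects)
print("Caption:", example["caption"])
```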
This training set differs from OpenAI's Jukebox data because it focuses on how the music sounds instead of metadata about the music, like artist name or genre.
The MusicCaps developers published a separate paper on arXiv describing their goal of generating music from text. It was released in tandem with the publication of MusicLM.
Long Generation: Consistent musical output over time
The initial MusicLM paper claimed that their network remained consistent for several minutes, but when the app went live in May 2023, generations were capped at 30 seconds.
AI developers have had a particularly hard time generating good AI music because models struggle to maintain long-term memory. LSTM, or long short-term memory, is a recurrent neural network (RNN) architecture designed to help a model stay focused over a period of time.
To really get a feel for this problem, I suggest checking out OpenAI's Jukebox Sample Explorer. You'll find that the music tends to lose focus and devolve by the end of the clip, so that a rap song in the style of Machine Gun Kelly gradually morphs into a convoluted reggae death metal tune.
MusicLM claims to outperform these other AI music generators with a hierarchical sequence-to-sequence model that outputs audio at 24 kHz.
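To give a feel for what "hierarchical sequence-to-sequence" means in practice, here's a heavily simplified conceptual sketch. The stages mirror the paper's high-level description (coarse semantic tokens for long-range structure, finer acoustic tokens for detail, then waveform decoding), but every function below is a placeholder invented for illustration, not Google's code or API.

```python
# Conceptual sketch of a hierarchical text-to-music pipeline. All functions
# are stand-ins that return dummy data; they only illustrate the flow from
# text -> coarse tokens -> fine tokens -> waveform.
import random

def text_to_semantic_tokens(prompt: str, length: int = 50) -> list[int]:
    # Stand-in for the stage that models long-term structure from the prompt.
    random.seed(prompt)
    return [random.randrange(1024) for _ in range(length)]

def semantic_to_acoustic_tokens(semantic: list[int], per_token: int = 4) -> list[int]:
    # Stand-in for the stage that fills in timbre and detail, conditioned on the
    # coarse tokens; each semantic token expands into several acoustic ones.
    return [random.randrange(4096) for _ in semantic for _ in range(per_token)]

def decode_to_waveform(acoustic: list[int], sample_rate: int = 24_000) -> list[float]:
    # Stand-in for a neural audio codec decoder producing 24 kHz samples.
    return [0.0] * (len(acoustic) * sample_rate // 100)

tokens = text_to_semantic_tokens("mellow piano over a reggaeton beat")
audio = decode_to_waveform(semantic_to_acoustic_tokens(tokens))
print(len(audio), "samples at 24 kHz")
```

The key point is that the coarse, low-rate tokens are what let the model hold onto a musical idea over longer stretches, which is exactly the long-term memory problem described above.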
Audio generation from rich captions

Thanks to the MusicCaps dataset, MusicLM is able to receive long form text input with rich descriptions of music. This means that users will not need any technical music theory knowledge in order to create songs. Filmmakers and video game developers will eventually be able to generate the sounds they need on demand, by simply describing the scenes in question.
Melody conditioning: Hum or whistle melodies into any style

We've still not seen the hum-to-music feature promised by MusicLM in their original paper, but maybe that will change in the near future with the release of the AI Music Sandbox. Meta's MusicGen model released a feature like this nearly one year ago and it works pretty well, so there's no reason Google can't achieve something similar. It's just a matter of time.