On November 13, 2023, Adobe's team published a research paper detailing their new AI music model, Music ControlNet. The research was a collaboration between computer science interns from Carnegie Mellon and Nicholas J. Bryan, a senior research scientist at Adobe.
Adobe hasn't indicated any plans to add the Music ControlNet model to their DAW, Adobe Audition. Big companies often conduct research like this without putting anything on their product roadmap.
That said, Adobe does have a history of implementing AI audio features in their software. Their Enhance web app gives podcast editors an AI audio enhancer that improves speech clarity, removes background noise, and sharpens voice frequencies. Adobe Premiere Pro also has an AI feature that time-stretches and repositions music for video.
So let's have a look at what Adobe's Music ControlNet model currently offers and how it stacks up against Meta's equivalent model, MusicGen (part of the AudioCraft library).
What is Adobe's AI Music ControlNet?
Adobe's Music ControlNet is a generative AI music model that combines a text-to-music prompt system with time-varying controls. Users can transform input audio into new styles, and Adobe reports greater accuracy and efficiency than Meta's MusicGen model.
Like Riffusion, the team fine-tuned a diffusion-based model that operates on audio spectrograms, conditioning it on melody, dynamics, and rhythm controls.
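To make those control types concrete, here is a minimal, hypothetical sketch of how "dynamics" and "rhythm" could be expressed as frame-level signals derived from audio. The function names, frame sizes, and the fixed-tempo beat grid are our own illustration, not Adobe's actual pipeline:

```python
import numpy as np

def rms_dynamics(audio: np.ndarray, frame: int = 2048, hop: int = 512) -> np.ndarray:
    """Frame-wise RMS loudness in dB -- a rough 'dynamics' control curve."""
    n = 1 + max(0, len(audio) - frame) // hop
    frames = np.stack([audio[i * hop : i * hop + frame] for i in range(n)])
    rms = np.sqrt((frames ** 2).mean(axis=1))
    return 20 * np.log10(np.maximum(rms, 1e-8))

def beat_grid(duration_s: float, bpm: float, hop_s: float) -> np.ndarray:
    """Binary 'rhythm' control: 1.0 at frames closest to each beat."""
    n_frames = int(duration_s / hop_s)
    grid = np.zeros(n_frames)
    beat_times = np.arange(0.0, duration_s, 60.0 / bpm)
    idx = np.clip((beat_times / hop_s).round().astype(int), 0, n_frames - 1)
    grid[idx] = 1.0
    return grid

# Example: a 6-second 440 Hz tone with a crescendo.
sr = 22050
t = np.linspace(0, 6, 6 * sr, endpoint=False)
audio = np.sin(2 * np.pi * 440 * t) * np.linspace(0.1, 1.0, t.size)
dynamics = rms_dynamics(audio)          # rises along with the crescendo
rhythm = beat_grid(6.0, bpm=120, hop_s=512 / sr)  # 12 beats over 6 s
```

Curves like these, sampled at the spectrogram's frame rate, are the kind of time-aligned signal a conditional model can be trained to follow.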
Visit the Music ControlNet GitHub demo page to listen to a large collection of six-second audio clips generated by the model.
How does Music ControlNet compare to Meta's MusicGen?
According to Adobe's arXiv paper, Music ControlNet outperforms Meta's MusicGen model by a meaningful degree. Let's take a closer look at the numbers and compare the two models' feature sets.
"Faithfulness" of style transfer: When a user uploads an audio file to ControlNet and requests a new style of music, Adobe's team measured the output to be 49% more "faithful" to the original melody than MusicGen's.
This faithfulness to the original melody matters to musicians. It serves a practical purpose if you're using AI text-to-music to augment a human-centered creative workflow.
Ideally, users will be able to sing or play a melodic idea and hear the core melodic phrase retained across a variety of genres, moods, and styles, as specified by the text prompt.
Lightweight compared to MusicGen: ControlNet has 35x fewer parameters than MusicGen and was trained on 11x less data, making it a comparatively lightweight alternative.
New time-varying control features: The model offers novel time-varying controls that don't exist in Meta's AudioCraft library, including attributes like the position of beats in time and the changing dynamics of the music.
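One way to picture time-varying control is a per-frame control matrix plus a mask marking which frames the user actually constrained, leaving the model free to improvise elsewhere. The layout below is purely illustrative; it is not Adobe's API:

```python
import numpy as np

def pack_controls(n_frames: int, dynamics=None, rhythm=None):
    """Stack per-frame control curves with a binary mask per control.
    Frames the user left unspecified are zero-filled and masked out
    (hypothetical layout for illustration, not Adobe's actual format)."""
    controls = np.zeros((n_frames, 2))  # column 0: dynamics, column 1: rhythm
    mask = np.zeros((n_frames, 2))
    for col, curve in enumerate((dynamics, rhythm)):
        if curve is not None:
            end = min(n_frames, len(curve))
            controls[:end, col] = curve[:end]
            mask[:end, col] = 1.0
    return controls, mask

# Example: constrain dynamics for only the first half of a 100-frame clip.
controls, mask = pack_controls(100, dynamics=np.linspace(0.0, 1.0, 50))
```

Partial masks like this are what make a control "time-varying" rather than a single global setting such as one tempo or one loudness value for the whole clip.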
What is the future of AI text-to-music generation?
AI text-to-music generators had a coming-out party in 2023, when Google and Meta both released AI music models that the public could try for free. Both companies already had generative AI experts on payroll, so allocating budget for research and development was no great stretch.
Adobe's move to research text-to-music aligns with our prediction that audio production software will begin developing custom AI text-to-music models and rolling them out in 2024-2025.
The public is quietly, but I think eagerly, waiting for upcoming improvements to the conventional music-making experience. We might witness a "quantum leap" in features and workflow with these new generative audio workstations. In the short term, independent AI DAWs and AI VST plugins are already surfacing.