
Generative Audio Workstations: AI VSTs & The Future of DAWs

The expression generative audio workstation refers to the future state of DAWs. As gen AI tools continue to diversify and become more powerful, there will be an almost unlimited number of ways to edit and experiment with sound. In fact, the revolution has already begun.


A handful of companies have made the first moves to create and maintain GAWs for the public. In December 2023, FL Studio surprised the internet with a new AI mastering feature. The RipX AI DAW offers stem separation and audio manipulation.


In early January 2024, the classic audio editing software Audacity launched a new collection of AI plugins. Alongside noise reduction and musical stem separation, the Audacity team collaborated with Riffusion to provide an in-app text-to-music experience.


In this article, we'll provide several more examples and share some ideas about how GAWs could evolve over the course of the next year.




What is a generative audio workstation?


The expression generative audio workstation was first popularized in June 2023 by Samim Winiger, AI music expert and CEO at Okio. Its origins can be traced back to a research paper titled Composing with Generative Systems in the Digital Audio Workstation by Ian Clester and Jason Freeman.


By December 2023, Okio had announced the launch of their open source AI audio tool suite, Nendo Core, along with Nendo Cloud for delivering the service. Other open source projects, like Audacity, announced AI plugin suites as well.

The first digital audio workstations emerged in the late 1970s. They streamlined analog workflows and overcame the limitations of recording to tape. By the early 1990s, producers gained access to more powerful DAWs like Pro Tools and Cubase, followed in turn by Fruity Loops, Ableton, Logic Pro X, and countless others.


Fast forward to today: we're on the brink of a technological revolution.


AI music generators that don't qualify as a GAW


In our quest to define the GAW, we have to look at the existing software landscape and rule out the categories that fall short.


The first wave of commercial AI music generators was marketed primarily to non-musicians looking for a quick, easy path to a finished song. They appeal to content creators who might otherwise use audio licensing catalogs like Artlist, Epidemic Sound, Soundstripe, AudioJungle, Envato, and so on.


Web applications like Boomy, Soundraw, and Soundful do provide options for customization, but it would be a stretch to call them workstations. They lack the robust controls that seasoned audio engineers and composers require.


The AIVA GAW: Beyond simple music parameters

AIVA's parameter-based tool

AIVA, an early mover in B2C AI music generation, offers the familiar parameter-based web interface. Users can select properties like key signature, BPM, meter, and genre to spin up several songs. But the product goes beyond these features by providing a full DAW experience, both in the browser and as a standalone desktop application.


Whenever a new piece of music is generated from parameters, AIVA's users have the option to go deeper with a DAW and MIDI piano roll editor. Here they can make changes to the notes manually or leverage generative features to modify the melody and chord progressions. Effect layers and mixing tools are also available.


For this reason, AIVA qualifies as a generative audio workstation.


The AIVA GAW

WavTool: AI Chatbots in the GAW


WavTool is a great example of a GAW that's transformed the DAW landscape. Its text-to-music features are still in the early phases of development and could use improvement. Still, they represent a meaningful and innovative shift in the way users think about music production workflows.


The video below showcases WavTool's GPT-4-powered AI chatbot. This creative assistant understands text prompts related to the audio workstation and can act on your behalf. Users can request chord and melody material, new instrument tracks, changes to the mix, and more.



The first version of WavTool's AI chatbot (shown above) was constrained to generating MIDI tracks. However, a recent build introduced a new text-to-audio sample generator. This works nicely to bypass the GAW's limited sound design tools and provide immediate access to loopable samples. Check out a demo of this new feature below:



We expect to see more AI chatbots in future DAWs, perhaps leading to a scenario where producers work alongside AI bandmates to come up with new ideas.
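The "act on your behalf" pattern behind assistants like this is straightforward to sketch: the language model emits a structured action, and the host applies it to the project state. Everything below (the action types, field names, and project layout) is hypothetical; WavTool's real tool interface is not public.

```python
def dispatch(action: dict, project: dict) -> dict:
    """Apply one chatbot-issued action to a toy project state."""
    if action["type"] == "add_track":
        project["tracks"].append({"name": action["name"], "clips": []})
    elif action["type"] == "set_tempo":
        project["tempo"] = action["bpm"]
    else:
        raise ValueError(f"unsupported action: {action['type']}")
    return project

# Example: two actions an assistant might emit for
# "add a bass track and slow the song down to 90 BPM"
project = {"tempo": 120, "tracks": []}
dispatch({"type": "add_track", "name": "Bass"}, project)
dispatch({"type": "set_tempo", "bpm": 90}, project)
```

The real work lives in getting the model to reliably emit valid actions; the dispatcher itself stays simple.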


Visit WavTool's website to learn more and sign up for free to try it out!


10 AI Music Plugins: Thin client VSTs vs. local processing


The symbiosis between DAWs and plugins will persist under the influence of generative AI. A few software companies are ahead of the curve and provide AI VSTs that work with current, conventional DAWs.


In this next section, we'll share ten examples of AI VSTs that fit into this category. They can be roughly divided into software that runs its generations locally and thin client VSTs that use resources from a centralized cloud server.


Thin client plugins: Samplab 2 audio-to-midi


Thin client software is designed to consume less memory and processing power on the user's local device. These plugins use API calls to send information up to the cloud and then pull the finished output back down to the user's computer. These AI VSTs still hook into the DAW like ordinary plugins; artificial intelligence simply introduces new capabilities that would otherwise be memory-intensive.
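In rough terms, the thin client round trip looks like this: serialize the audio, upload it, and decode whatever the server sends back. The endpoint and payload format below are purely illustrative assumptions, not any vendor's actual API.

```python
import base64
import json

# Hypothetical endpoint -- a real plugin would also handle auth and polling.
API_URL = "https://api.example-audio-cloud.test/v1/jobs"

def build_job_payload(audio_bytes: bytes, task: str) -> str:
    """Serialize local audio for upload; the heavy model runs server-side."""
    return json.dumps({
        "task": task,  # e.g. "stem_separation" or "audio_to_midi"
        "audio_b64": base64.b64encode(audio_bytes).decode("ascii"),
    })

def handle_response(body: str) -> bytes:
    """Decode the finished output the server sends back down."""
    reply = json.loads(body)
    return base64.b64decode(reply["output_b64"])
```

The local plugin never loads the model into memory; it only packages audio, waits, and unpacks the result.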



Samplab is an example of an AI-powered audio-to-MIDI VST. It runs stem separation on audio files in the cloud, transcribes them to MIDI, and then surfaces the MIDI files in the plugin's piano roll.


One unique capability that sets the plugin apart is the option to drag individual notes up and down. Samplab's piano roll will make a direct change to the original audio composition, while retaining its timbre and sound design. It can also detect chord progressions for polyphonic instrument layers.


The final audio and MIDI files can be dragged from the plugin into your DAW of choice. In the future, these kinds of transcription features could be part of a GAW.
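Under the hood, any audio-to-MIDI transcriber (and the drag-to-repitch trick) leans on the standard mapping between detected frequency and MIDI note numbers. A minimal sketch:

```python
import math

def hz_to_midi(freq: float) -> int:
    """Nearest MIDI note for a detected fundamental: A4 = 440 Hz = note 69,
    with 12 equal-tempered semitones per octave."""
    return round(69 + 12 * math.log2(freq / 440.0))

def midi_to_hz(note: int) -> float:
    """Inverse mapping, used when an edited note is rendered back to audio."""
    return 440.0 * 2 ** ((note - 69) / 12)
```

Dragging a note up one row in the piano roll amounts to resynthesizing the audio at `midi_to_hz(note + 1)` while keeping the original timbre.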


Local tone transfer VSTs: Neutone, DDSP, Mawf


Neutone is a hub that runs real-time AI audio processing within a DAW. It ships with some models by default and includes the option to download more from within the VST. A walkthrough of the software can be found in the video above. Neutone also provides access to Google Magenta's DDSP model, which can be downloaded independently as its own plugin.
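At its core, DDSP-style timbre transfer drives a harmonic synthesizer: a model predicts per-harmonic amplitudes (plus a noise component) from the input, and those amplitudes shape a stack of sine partials. Below is a deliberately simplified sketch of that synthesizer, with amplitudes held constant over one frame rather than predicted by a network:

```python
import math

def harmonic_frame(f0: float, amps: list[float],
                   sr: int = 16000, n: int = 160) -> list[float]:
    """Render one short frame as a sum of integer harmonics of f0.
    In a real DDSP model, `amps` changes every frame under the
    control of a learned encoder."""
    out = []
    for i in range(n):
        t = i / sr
        out.append(sum(a * math.sin(2 * math.pi * (k + 1) * f0 * t)
                       for k, a in enumerate(amps)))
    return out
```

Because the synthesizer is differentiable, the network's amplitude predictions can be trained directly against the target audio.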


DataMind Audio published a new timbre transfer plugin in 2024 called The Combobulator. The company combines a slick user interface and high-quality audio with a solid ethical framework: artists receive a 50% revenue share on each sale of their models.



One of Google's DDSP developers, Hanoi Hantrakul, was later hired by TikTok, where he created an improved timbre transfer model called Mawf. The beta version includes timbre transfer for three instrument types (saxophone, trumpet, and a Thai bamboo flute called the khlui). The output is significantly better than DDSP's.


Mawf plugin settings

When using Mawf on an audio track, switch on control mode under the modulation tab, as shown in the screenshot above. We recommend setting the dry/wet mix to 100% to isolate the timbre transfer. Then you can use the dynamics and effects tabs to experiment with transforming the sound.
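The dry/wet control itself is an ordinary linear crossfade, which is why the 100% setting isolates the processed signal. A minimal per-sample sketch:

```python
def dry_wet(dry: list[float], wet: list[float], mix: float) -> list[float]:
    """Linear dry/wet blend: mix = 0.0 passes the untouched signal,
    mix = 1.0 outputs only the processed (timbre-transferred) one."""
    return [(1.0 - mix) * d + mix * w for d, w in zip(dry, wet)]
```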


Generative vocal synths: Synthesizer V and Vocaloid

Synthesizer V

AI voice generators are extremely popular at the moment, but only a few of them are designed for musicians. Even fewer can run inside of a DAW. Two of the most popular plugins in this category today are Synthesizer V and Vocaloid 6.


Vocal synthesis could eventually be baked into generative audio workstations. Users will type in lyrics, select a voice model, and let generative AI produce novel vocal melodies as a source of inspiration. Autotune features will provide control over the melody through a piano roll and additional dynamic layers will be controlled via the GAW's mixing interface.


The end game with vocal synthesis doesn't have to be the elimination of human vocalists. Instead, it could be a way for non-singers to prototype music and send their rough ideas over to human talent, who then record it to give it a polished feel.


On the other hand, genres like trap and R&B appropriated autotune to create a new musical aesthetic. It follows that AI voices could become a core part of new genres of music and even be a coveted sound.


AI MIDI generation: Lemonaide, Orb, Magenta Studio



In August 2023, the AI MIDI VST Lemonaide dropped a new app version that lets you not only generate seed ideas but also collect them. With the library tool, you can now save and refer back to prior ideas you've worked on.


Users begin by choosing whether to produce melodies, chords, or both. From there, they pick a key signature and hit "get seeds". Each generated seed can be auditioned from a list and viewed in the bottom half of the app as notes on a MIDI piano roll.


Edit the notes, key signature, and tempo and then drag the MIDI into a DAW to further refine the sound design. AI MIDI generators like this are a great way to inspire new material quickly.
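To make the idea of key-aware "seeds" concrete, here is a sketch of generating diatonic triads in a major key. This is illustrative music theory only, not Lemonaide's actual algorithm:

```python
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets from the tonic

def diatonic_triad(tonic_midi: int, degree: int) -> list[int]:
    """Triad on a scale degree (0-6) of a major key, built by stacking
    scale thirds -- the kind of key-consistent material an AI MIDI
    generator can emit as a starting point."""
    notes = []
    for step in (0, 2, 4):  # root, third, fifth within the scale
        octave, pos = divmod(degree + step, 7)
        notes.append(tonic_midi + 12 * octave + MAJOR_SCALE[pos])
    return notes
```

In C major (tonic MIDI 60), degree 0 yields a C major triad and degree 4 yields G major; a generator can string such chords together and stay in key by construction.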


Orb Producer Suite is another popular AI MIDI generator. It can scan existing MIDI chord tracks and create bass lines that align with the key signature.


As far as we know, Google has not cracked the code on AI MIDI generation yet. Their Magenta Studio plugin suite included a Create feature, but the quality of the output was underwhelming.


OpenAI's MuseNet service had similar limitations. Microsoft's AI music research team, Muzic, has reportedly made headway on AI MIDI, but they have not released a public tool yet.


AI Audio Generation: Samplab and Semilla


Most of the focus on generative AI music has been directed towards audio rather than MIDI composition. Earlier this year, Google published a text-to-music generator called MusicLM, followed by Meta's publication of MusicGen as an open source codebase.


Samplab took this a step further and published a free VST version of MusicGen called Text-to-Sample. It runs on the computer's local GPU and attempts to minimize memory consumption, which leads to lower performance than one gets from Hugging Face or Google Colab.


In theory, a company could roll out a thin client version of MusicGen as a VST, but there are several costs associated with running the audio generation layer. The GPU costs, user authentication, and subscription management are half the battle.


Selling a generative music service trained on third-party data could also present legal problems. MusicGen was trained on music from the Pond5 collection, and the model is distributed under a Creative Commons license; it's unclear whether that license permits third parties to resell the service commercially without paying royalties back to Meta or its partners.



If you are going to run AI music experiments locally, one of the most popular solutions is Max 8. You've probably heard of Ableton's Max for Live, but did you know that you can run Max on its own?


In November 2023, AI music developer and live performer Hexorcismos released a new Max plugin called Semilla. It can load pre-trained models and offers a wide range of parameters for audio output. Semilla is currently the most advanced Max patch we've seen for AI music generation.


Non-generative AI mixing and mastering in a GAW

Tasks like mixing and mastering are not usually considered generative, but they have been a staple of music production for years. Technical tasks like EQ, compression, saturation, and stereo imaging can be streamlined through machine learning. Musicians who prefer to focus on composing can hand these chores off to such tools.


Here are a few popular AI mixing and mastering plugins that exist today:

  1. iZotope Neutron 4 - Mix Assistant runs automatic processing and also supports changes to the parameters, so users maintain control.

  2. iZotope Ozone 10 - Mastering Assistant uses genre references to guide its automation and provides controls for width, EQ, and dynamics.

  3. Sonible Pure Bundle - The Compressor, Limiter, and Reverb plugins adjust parameters according to the received signal and provide a single control knob.

There are several other plugins like these on the market today. We can imagine that some of the same core functionality will be expected from a GAW.
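One building block these assistants share is level matching against a reference. Here is a simplified RMS-based sketch; real mastering tools use perceptual loudness measures like LUFS rather than raw RMS:

```python
import math

def rms(signal: list[float]) -> float:
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def match_gain(signal: list[float], target_rms: float) -> list[float]:
    """Scale a mix so its RMS level hits a reference target -- a simplified
    version of the level-matching step a mastering assistant automates
    before making EQ and dynamics decisions."""
    current = rms(signal)
    if current == 0.0:
        return list(signal)  # silence: nothing to scale
    gain = target_rms / current
    return [x * gain for x in signal]
```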


Final thoughts on generative audio workstations


In this article we've covered generative AI for MIDI and audio, chatbot-assisted sound design, stem separation, and vocal synthesis. Some of the most innovative Python libraries and models live only in experimental Google Colab notebooks and Hugging Face Spaces. This creates a barrier to entry for non-programmers, as well as for musicians who need tools that run directly in their DAW.


Most computers won't have sufficient GPU power or VRAM to support high-fidelity audio generation locally. However, software development frameworks like JUCE already support API calls. This means that user authentication and pay-to-play generative services could begin rolling out at any time. As we mentioned, Samplab has already accomplished this.


The main barriers to innovation are funding and legal frameworks that protect the companies who serve up AI music models. We expect to see a breakthrough in the quality and volume of plugins available as these fall into place.


When plugins do begin to show meaningful growth, legacy DAWs will likely begin the inevitable pivot to native generative audio features. At that stage, VST chains could become less common, with most of the important actions available directly within the GAW.

