
Generative Audio Workstations: AI VSTs & The Future of DAWs

The first digital audio workstations emerged in the late 1970s. They streamlined analog processing and overcame many of the limitations of recording to tape. By the early 1990s, producers gained access to more powerful DAWs like Pro Tools and Cubase, followed in turn by Fruity Loops, Ableton, Logic Pro X and countless others.

Each DAW has its own unique features and benefits, augmented by third party plugins that handle effect processing, instrument simulation, synthesis, analysis, MIDI generation, and so on. Over time these tools have continued to evolve, but some would argue that innovation has started to plateau.

As generative AI surges into every technology sector, digital audio workstations and plugins will begin to evolve in strange and exciting ways. A handful of companies have already made the first move. In this article, we'll survey the current software and explore how it points to what GAWs could eventually become.

Table of Contents

  1. What is a generative audio workstation?

  2. AI music generators that don't qualify as a GAW

  3. AIVA's GAW: Beyond simple music parameters

  4. WavTool: AI Chatbot in the GAW

  5. 10 AI Plugins: Thin client VSTs versus local devices

  6. Thin Client: Samplab 2's Audio-to-MIDI plugin

  7. Local tone transfer VSTs: Neutone, DDSP & Mawf

  8. Generative vocal synths: Synthesizer V and Vocaloid

  9. AI MIDI generation: Lemonaide, Orb & Magenta Studio

  10. AI audio generation: Text-to-Sample by Samplab

  11. Non-generative AI mixing and mastering in a GAW

  12. Final thoughts on generative audio workstations

What is a generative audio workstation?

The expression generative audio workstation was first popularized in June 2023 by Samim Winiger, AI music expert and CEO at Okio. Its origins can be traced back to a research paper titled Composing with Generative Systems in the Digital Audio Workstation by Ian Clester and Jason Freeman.

Samim posed a prescient question to his audience: Will the DAW soon be replaced by the GAW?

Clester and Freeman's paper argued that GAWs should become as adept with generative audio as the DAW has been with static digital audio.

Samim expanded on the topic again in September 2023, outlining the critical questions that every serious music producer will ask: Where are the creative controls? How does this fit into my workflow? Is this just a toy, or will AI replace me?

AI music generators that don't qualify as a GAW

In our quest to define the GAW, we have to look at the existing software landscape and eliminate the categories that don't fit the definition.

The first wave of commercial AI music generators were marketed primarily to non-musicians looking for easy and quick paths to a finished song. They appeal to content creators who might otherwise be using audio licensing catalogs like Artlist, Epidemic Sound, Soundstripe, AudioJungle, Envato, and so on.

Web applications like Boomy, Soundraw, and Soundful do provide options for customization, but it would be a stretch to call them workstations. They lack the robust controls that seasoned audio engineers and composers require.

The AIVA GAW: Beyond simple music parameters

AIVA's parameter-based tool

AIVA, an early mover in B2C AI music generation, offers the familiar parameter-based web interface. Users can select properties like key signature, BPM, meter, and genre to spin up several songs. But the product goes above and beyond these features by providing a full DAW experience, both in the browser and as a downloadable, standalone desktop application.

Whenever a new piece of music is generated from parameters, AIVA's users have the option to go deeper with a DAW and MIDI piano roll editor. Here they can make changes to the notes manually or leverage generative features to modify the melody and chord progressions. Effect layers and mixing tools are also available.

For this reason, AIVA qualifies as a generative audio workstation.


WavTool: AI Chatbots in the GAW

WavTool is a great example of a GAW that's transformed the DAW landscape. Its text-to-music features are still in the early phases of development and could use improvement. Still, they represent a meaningful and innovative shift in the way users think about music production workflows.

The video below showcases WavTool's GPT-4 powered AI chatbot. This creative assistant understands text prompts related to the audio workstation and can act on your behalf. Users can request chord and melody material, new instrument tracks, changes to the mix, and more.

The first version of WavTool's AI chatbot (shown above) was constrained to generating MIDI tracks. However, a recent build introduced a new text-to-audio sample generator. This works nicely to bypass the GAW's limited sound design tools and provide immediate access to loopable samples. Check out a demo of this new feature below:

We expect to see more AI chatbots in future DAWs, perhaps leading to a scenario where producers work alongside AI bandmates to come up with new ideas.

10 AI Music Plugins: Thin client VSTs vs. local devices

The symbiosis between DAWs and plugins will persist under the influence of generative AI. A few software companies are ahead of the curve and provide AI VSTs that work with current, conventional DAWs.

In this next section, we'll share ten examples of AI VSTs that fit into this category. They can be roughly divided into software that runs its generations locally and thin client VSTs that use resources from a centralized cloud server.

Thin client plugins: Samplab 2 audio-to-MIDI

Thin client software is designed to consume less memory and processing power on the user's local device. These plugins use API calls to send information up to the cloud and then pull the finished output back down to the user's computer. They still hook into the DAW like ordinary plugins; artificial intelligence simply introduces new, innovative capabilities that would otherwise be memory-intensive.
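To illustrate the pattern, here is a minimal Python sketch of a thin-client round trip. The endpoint shape, field names, and note format are hypothetical, not Samplab's actual API, and the network call is injected as a callable so the flow can be shown without a real server.

```python
import base64
import json

def transcribe_via_cloud(audio_bytes, send):
    """Thin-client round trip: the heavy inference happens server-side.

    `send` is any callable that posts a JSON payload and returns the
    JSON response. In a real VST this would be an HTTPS request to the
    vendor's API; injecting it keeps the sketch self-contained.
    """
    payload = json.dumps({
        "task": "audio-to-midi",  # hypothetical field names
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
    })
    response = send(payload)
    # The plugin only decodes and displays the result -- no local
    # model weights, no heavy memory footprint on the user's machine.
    return json.loads(response)["notes"]

def fake_cloud(payload):
    """Stand-in for the cloud endpoint, used here for demonstration."""
    assert "audio" in json.loads(payload)
    return json.dumps({"notes": [{"pitch": 60, "start": 0.0, "duration": 0.5}]})

notes = transcribe_via_cloud(b"\x00\x01 fake wav data", fake_cloud)
print(notes[0]["pitch"])  # 60
```

The point of the pattern is visible in the last two lines: the client never loads a model, it only serializes audio up and deserializes notes back down.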

Samplab is an example of an AI-powered audio-to-MIDI VST that runs stem separation on audio files in the cloud, transcribes them to MIDI, and then surfaces the MIDI files in the plugin's piano roll.

One unique capability that sets the plugin apart is the option to drag individual notes up and down. Samplab's piano roll will make a direct change to the original audio composition, while retaining its timbre and sound design. It can also detect chord progressions for polyphonic instrument layers.
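Dragging a note N semitones in a piano roll corresponds to a simple frequency ratio in twelve-tone equal temperament. The sketch below shows only that underlying math; Samplab's actual processing also preserves the original timbre, which requires far more than a ratio.

```python
def semitone_ratio(semitones):
    """Frequency ratio for a shift of N semitones in 12-TET."""
    return 2.0 ** (semitones / 12.0)

# Dragging A4 (440 Hz) up three semitones lands on C5:
c5 = 440.0 * semitone_ratio(3)
print(round(c5, 2))  # 523.25
```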

The final audio and MIDI files can be dragged from the plugin into your DAW of choice. In the future, these kinds of transcription features could be part of a GAW.

Local tone transfer VSTs: Neutone, DDSP, Mawf

Neutone is a hub that runs real-time AI audio processing within a DAW. It comes with some models by default, but includes the option to download more from within the VST. A walkthrough of the software can be found in the video above. Neutone includes access to Google Magenta's DDSP model, which can also be downloaded independently as its own plugin.

One of Google's DDSP developers, Hanoi Hantrakul, was later hired by TikTok and created an improved DDSP-style model called Mawf. The beta version includes timbre transfer for three instrument types (saxophone, trumpet, and a bamboo flute from Thailand called the khlui). The output is significantly better than DDSP's.
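At its core, DDSP-style timbre transfer extracts pitch and loudness curves from the input and drives a harmonic synthesizer with them. The toy sketch below reproduces only the oscillator idea in plain Python, one sample per control frame; the real models predict per-harmonic amplitudes with a neural network, which is assumed away here for illustration.

```python
import math

def harmonic_synth(f0_curve, loudness_curve, sr=16000, n_harmonics=8):
    """Toy DDSP-style harmonic oscillator.

    Each frame contributes one sample whose value is a sum of sine
    harmonics at integer multiples of the fundamental. Here every
    harmonic shares the frame's loudness equally -- an assumption for
    illustration, not how the actual DDSP model shapes timbre.
    """
    phase = 0.0
    out = []
    for f0, loud in zip(f0_curve, loudness_curve):
        phase += 2 * math.pi * f0 / sr  # accumulate fundamental phase
        sample = sum(
            (loud / n_harmonics) * math.sin(k * phase)
            for k in range(1, n_harmonics + 1)
        )
        out.append(sample)
    return out

# 100 frames of a steady 220 Hz tone fading in from silence:
audio = harmonic_synth([220.0] * 100, [i / 100 for i in range(100)])
print(len(audio))  # 100
```

Swapping the fixed harmonic weights for learned, time-varying ones is essentially what turns this toy into a timbre-transfer model.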

Mawf plugin settings

When using an audio track with Mawf, switch on control mode under the modulation tab. To isolate the timbre transfer, set the dry/wet mix to 100%, as shown in the screenshot above. Then use the dynamics and effects tabs to experiment with transforming the sound.
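The dry/wet control mentioned above is just a linear crossfade between the untouched input and the effect output; a quick sketch:

```python
def dry_wet(dry, wet, mix):
    """Standard dry/wet crossfade over two equal-length signals.

    mix=0.0 returns the untouched input; mix=1.0 returns effect-only,
    which is the setting recommended above to isolate timbre transfer.
    """
    return [d * (1.0 - mix) + w * mix for d, w in zip(dry, wet)]

print(dry_wet([1.0, 0.0], [0.0, 1.0], 1.0))  # [0.0, 1.0]
```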

Generative vocal synths: Synthesizer V and Vocaloid

Synthesizer V

AI voice generators are extremely popular at the moment, but only a few of them are designed for musicians. Even fewer can run inside of a DAW. Two of the most popular plugins in this category today are Synthesizer V and Vocaloid 6.

Vocal synthesis could eventually be baked into generative audio workstations. Users will type in lyrics, select a voice model, and let generative AI produce novel vocal melodies as a source of inspiration. Autotune-style features will provide control over the melody through a piano roll, and additional dynamic layers will be controlled via the GAW's mixing interface.

The end game with vocal synthesis doesn't have to be the elimination of human vocalists. Instead, it could be a way for non-singers to prototype music and send their rough ideas over to human talent, who then record it to give it a polished feel.

On the other hand, genres like trap and R&B appropriated autotune to create a new musical aesthetic. It follows that AI voices could become a core part of new genres of music and even be a coveted sound.

AI MIDI generation: Lemonaide, Orb, Magenta Studio

In August 2023, the AI MIDI VST Lemonaide dropped a new app version that lets you not only generate but also collect seed ideas. With their library tool, you can now save and refer back to prior ideas that you've worked on.

Users begin by choosing whether to produce melodies, chords, or both. From there, a key signature is chosen and users hit "get seeds". Each seed that it creates can be auditioned from a list and viewed in the bottom half of the app as notes on a MIDI piano roll.

Edit the notes, key signature, and tempo and then drag the MIDI into a DAW to further refine the sound design. AI MIDI generators like this are a great way to inspire new material quickly.
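As a rough illustration of what "get seeds" means under the hood, here is a hypothetical rule-based sketch that builds random diatonic triads in a chosen key, expressed as MIDI note numbers. Lemonaide's actual generator is a learned model, not a rule like this.

```python
import random

MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets from the tonic

def seed_progression(tonic_midi=60, n_chords=4, rng=None):
    """Sketch of a chord-seed generator: random diatonic triads.

    Each chord stacks scale degrees (root, third, fifth) on a random
    degree of the major scale, bumping notes up an octave when the
    stack wraps past the scale's end.
    """
    rng = rng or random.Random(0)  # seeded for repeatable output
    chords = []
    for _ in range(n_chords):
        degree = rng.randrange(7)
        triad = [
            tonic_midi
            + MAJOR_SCALE[(degree + step) % 7]
            + 12 * ((degree + step) // 7)
            for step in (0, 2, 4)
        ]
        chords.append(triad)
    return chords

print(seed_progression())  # four triads, deterministic with the default seed
```

Writing the result out as a .mid file (e.g. with a library like mido) is all it would take to make such seeds draggable into a DAW.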

Orb Producer Suite is another popular AI MIDI generator. It can scan existing MIDI chord tracks and create bass lines that align with the key signature.

As far as we know, Google has not cracked the code on AI MIDI generation yet. Their Magenta Studio plugin suite included a Create feature, but the quality of the output was underwhelming.

OpenAI's MuseNet service had similar limitations. Microsoft's AI music research team, Muzic, has allegedly made headway on AI MIDI, but they have not published anything yet.

AI Audio Generation: Samplab's MusicGen VST

Most of the focus on generative AI music has been directed towards audio rather than MIDI composition. Earlier this year, Google published a text-to-music generator called MusicLM, followed by Meta's publication of MusicGen as an open source codebase.

Samplab took this a step further and published a free VST version of MusicGen called Text-to-Sample. It runs on the computer's local GPU and attempts to minimize memory consumption, which results in lower output quality than the hosted versions on Hugging Face and Google Colab.

In theory, a company could roll out a thin client version of MusicGen as a VST, but there are several costs associated with running the audio generation layer. The GPU costs, user authentication, and subscription management are half the battle.

Selling a generative music service trained on third-party data could also present legal problems. MusicGen was trained on music from the Pond5 collection. It comes with a Creative Commons license, but it's unclear whether this license permits third parties to resell the service commercially without paying royalties back to Meta or their partners.

A future solution will need to arrange licensing agreements, train models on audio that they own, or allow users to create models trained on their personal music libraries. We may be just a year or two away from services like this.

Non-generative AI mixing and mastering in a GAW

Tasks like mixing and mastering are not usually considered generative, but they have been a staple of music production for years. Technical tasks like EQ, compression, saturation and stereo imaging can be streamlined through machine learning processes. Musicians who prefer to focus on composing can use these apps instead.

Here are a few popular AI mixing and mastering plugins that exist today:

  1. iZotope Neutron 4 - Mix Assistant runs automatic processing and also supports changes to the parameters, so users maintain control.

  2. iZotope Ozone 10 - Mastering Assistant uses genre references to guide its automation and provides controls for width, EQ, and dynamics.

  3. Sonible Pure Bundle - The Compressor, Limiter, and Reverb plugins adjust parameters according to the received signal and provide a control knob.
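One primitive behind these assistants is measuring the incoming signal and deriving corrective parameters from it. The sketch below does that for level only, computing an RMS-based makeup gain toward a target; real assistants like Neutron and Ozone also analyze spectrum and dynamics, which this deliberately omits.

```python
import math

def auto_gain(samples, target_rms=0.1):
    """Measure a signal's RMS level and scale it toward a target.

    This is the 'listen first, then set parameters' pattern that
    assistant-style mixing tools use, reduced to its simplest case.
    """
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    gain = target_rms / rms if rms > 0 else 1.0
    return [s * gain for s in samples]

# A quiet 440 Hz test tone (0.01 peak) brought up to the target level:
quiet = [0.01 * math.sin(2 * math.pi * 440 * n / 48000) for n in range(4800)]
louder = auto_gain(quiet)
```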

There are several other plugins like these on the market today. We can imagine that some of the same core functionality will be expected from a GAW.

Final thoughts on generative audio workstations

In this article we've covered generative AI for MIDI and audio, chatbot-assisted sound design, stem separation, and vocal synthesis. Some of the most innovative Python libraries and models are operating in stealth mode on experimental Google Colab and Hugging Face spaces. This creates a barrier to entry for non-programmers, as well as for musicians who need tools that run directly in their DAW.

Most computers won't have sufficient GPU or VRAM to support high fidelity audio generation locally. However, software development frameworks like JUCE do already support API calls. This means that user authentication and pay-to-play generative services could begin rolling out at any time. As we mentioned, Samplab has already accomplished this.

The main barriers to innovation are funding and legal frameworks that protect the companies who serve up AI music models. We expect to see a breakthrough in the quality and volume of plugins available as these fall into place.

When plugins do begin to show meaningful growth, legacy DAWs will likely begin the inevitable pivot into native generative audio features. At this stage, VST chains could become less common, with most of the important actions available directly within the GAW.
