Note: This is not a paid endorsement or affiliate piece. We reached out to the developer during our research, to learn more about their backstory.
Artificial intelligence is booming in 2023. Most of the public focus has been on text and image generation with apps like ChatGPT, DALL-E 2, and Midjourney. There are hundreds of lesser-known AI apps in the same ecosystem. They can do anything from writing code to analyzing websites and even composing AI music.
WavTool is the first AI-accelerated text-to-music DAW in the marketplace. It includes an AI chat assistant that can compose MIDI, generate instruments, control effects, and run a number of other tasks in the workstation.
For readers who already have a preferred DAW and just want the text-to-MIDI experience, check out the AudioCipher VST plugin. You'll be able to enter words and transform them into melodies and chord progressions.
There are a number of AI plugins emerging in 2023. Last month we reported on Neutone, a DAW VST that acts as a hub for AI services like DDSP and RAVE2. Google's Magenta plugin suite offers a collection of AI music plugins as well. I've even heard whispers of AI functionality coming soon to the sound design DAW Audio Design Desk.
To my knowledge, WavTool is currently the only AI DAW with a MIDI composer assistant that's powered by GPT-4.
ChatGPT generates chord progressions and melody ideas, but it's limited to writing text. WavTool uses GPT-4 to generate musical commands the DAW can act on, from composing MIDI to wavetable synthesis.
WavTool's chat assistant generates MIDI notes, creates new instrument tracks, configures side-chain compression, and so much more.
But here's the best part -- if the AI does something you don't like, this is the first time in history that you can go ahead and ask it why it made those decisions. Once you understand its thought process, you can offer instructions on how to improve and reach its goals. You can fine-tune the project with ongoing prompts.
The company's founder, Sam Watkinson, was kind enough to accept our interview request. This article covers some of the AI DAW's strengths and weaknesses with MIDI composition. We'll include the transcript of our conversation, where Sam shares his background and inspiration for creating WavTool.
What is WavTool?
WavTool is an AI-powered digital audio workstation that loads in your browser. That's right, you never have to download an app. With one-click sign-up through Google or Facebook, it's easy to get started in the latest version of the Google Chrome browser.
You can sign up for free, but the app offers a limited number of AI prompts. At the time of writing this article, the DAW costs $20/month.
It includes an embedded AI chatbot that you can show and hide at will. This chatbot, called the Conductor, has a solid grasp on music theory and audio production concepts that it can use to understand your ideas and try to implement them.
Users can have a conversation, collaborate on song segments, generate or revise MIDI content, and configure plugins to add effects or signal processing.
5 Unique WavTool Features
AI Conductor - WavTool calls their AI chatbot the Conductor because of its ability to guide you through the music making process. It can touch every part of the DAW, hold deep conversations about music, and generate MIDI.
Loads in browser - You don't need to install WavTool locally on your computer. It runs seamlessly in modern browsers with a fast connection. I'll share a few bugs that I ran into, but let's wait until later in the article.
Custom Wavetables - Pull in one of WavTool's instrument presets for your MIDI track or build custom wavetables from scratch. The video above goes into detail on how users can synthesize new instruments.
Device panel - WavTool's panel lets you set up devices that control EQ, reverb, delay, dynamics, distortion, LFOs, side-chain compression, visualizers, and more. Unlike Max for Live in Ableton (a comparable tool), WavTool lets the AI Conductor create and edit devices from your text prompts.
QWERTY & MIDI Controller Support - WavTool's piano roll includes a keyboard interface that shows you the notes you're playing. You can play on a QWERTY computer keyboard or use a standard MIDI controller.
How quickly can I make a beat with WavTool?
When you boot up a new project in WavTool, the Conductor starts with extremely simple musical expressions, like mechanical 8th-note patterns based on a major scale. It doesn't seem to matter what type of genre or style you ask it to compose in. This could lead some people to believe the tool's useless, but they're wrong.
You can make a mediocre beat with WavTool quickly, but if you spend thirty minutes refining an initial prompt through follow-up commands, you can guide the Conductor closer to the music you're actually looking for.
As the developer explains on their FAQ page, WavTool has a long way to go and they are working hard to make it better every day.
In a moment, I'll share some prompts you can use to get the creative process moving in the right direction with WavTool. But first, a few thoughts on what makes this tool so revolutionary, even within the AI music software scene.
Conductor chatbot: AI music composer
WavTool's AI music composer, the Conductor, leverages the conversational intelligence of GPT-4 to have deep and nuanced conversations with you about any musical topic. But its real talent is the ability to turn around and take action in the DAW, based on commands from GPT. This text-to-music feature is something we've never seen before.
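To make the difference between chatting and acting more concrete, here's a minimal Python sketch of how a chat model's reply could be turned into DAW actions. To be clear, this is my own illustration. The JSON command format, the action names (create_track, add_notes), and the apply_commands function are all hypothetical. I don't know what WavTool actually passes between GPT-4 and the DAW.

```python
import json

# Hypothetical command format, invented for this example. The article does not
# document how WavTool really connects GPT-4 to the DAW.
model_reply = '''
[
  {"action": "create_track", "name": "Piano"},
  {"action": "add_notes", "track": "Piano",
   "notes": [{"pitch": 60, "start": 0.0, "length": 1.0},
             {"pitch": 64, "start": 1.0, "length": 1.0}]}
]
'''

def apply_commands(project, reply_text):
    """Apply a JSON list of commands to a dict that maps track names to note lists."""
    for command in json.loads(reply_text):
        if command["action"] == "create_track":
            # Create the track only if it does not exist yet.
            project.setdefault(command["name"], [])
        elif command["action"] == "add_notes":
            project[command["track"]].extend(command["notes"])
        else:
            raise ValueError(f"Unknown action: {command['action']}")
    return project

project = apply_commands({}, model_reply)
print(list(project))            # track names
print(len(project["Piano"]))    # number of notes on the Piano track
```

The point is simply that a structured reply can be executed step by step, whereas plain chat text can only be read.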
The secret weapon that makes WavTool superior to MuseNet
There's one very exciting thing about Conductor that sets the AI DAW apart from the other major AI MIDI generator apps. Unlike Google and OpenAI's MIDI generators, WavTool knows why it generated MIDI in a particular way and can explain its reasoning to you in detail. You just have to ask!
Previous MIDI generators have not been set up to engage conversationally or take text commands. This means we never knew why the AI was generating a particular MIDI melody or chord progression. It was not possible to critique and fine tune its choices over time, other than requesting variations.
WavTool's AI Conductor gives you direct access to the mind of the AI music composer.
Why did it create that awful chord progression or such a simplistic melody? As it turns out, there appears to be some underlying logic behind even the bad AI MIDI compositions.
Now that we can see deeper into the mind of the GPT-4 MIDI composer, we can collaborate with it and instruct it as if it were human.
Fine-tune the AI MIDI with Text-to-Music Prompts
If you've been exploring generative AI tools lately, you may have heard people using the expression "prompt engineer". Some corporations are reportedly hiring these "engineers" at salaries above $300k/yr.
The prompt engineer's primary objective is to choose the right words for "text-to-something" AI tools, in order to extract their employer's desired outcome.
As a WavTool user, you don't need to be an amazing writer or "prompt engineer" to get started. That being said, the words you use with the AI will dictate the quality of its creations and the quality of your experience.
Promptbase: Will people sell AI music prompts?
Prompt engineering isn't just for six-figure earners. People are selling AI prompts on sites like Promptbase. The site already has a music category and a small collection of text prompts for ChatGPT. In my mind, selling prompts like this feels like it's part of the Web3 music scene, adjacent to music NFTs.
I haven't purchased or sold anything on Promptbase, but I think it's an interesting idea. As a seller, their Stripe integration is mandatory and it requires your SSN, so I opted out. Also, as someone who enjoys writing prompts, I'm not all that enthusiastic about buying them from others.
In my experience, WavTool needs more than prompts. It requires a whole playbook and set of strategies. A pro user could pull out all of the terms and expressions that it understands the most, to create a dictionary of sorts.
4-prompt sequence for MIDI generation in WavTool
WavTool supports prompts related to any action in the DAW. This prompt sequence was designed to focus on MIDI generation and composing.
Here are four AI prompts you can use in a sequence to start drilling down and getting something good from the DAW. Swap out the placeholder text inside the angle brackets <> with your own words.
Prompt #1: "Are you familiar with <your genre, artist, or song of choice>?"
Prompt #2: "I'd like you to create a <description of what you want>. Before you start generating it, please name 5 defining features of <your music selection> that you could emulate with a MIDI track here in WavTool?"
Prompt #3: "Okay, generate it on three instrument tracks."
Prompt #4: "Generate a new variation with <small incremental changes that you want to make>." This last prompt can be used repeatedly to fine-tune the MIDI output. In my experience, it takes 5-10 iterations to get a long and healthy chunk of usable MIDI content.
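If you plan to reuse this sequence across projects, it can help to keep the prompts as templates and fill in the blanks each time. Here's a small Python sketch of that idea. The placeholder names (music, description, changes) and the build_sequence function are my own labels, not anything built into WavTool. You would still paste the resulting text into the Conductor chat by hand.

```python
# The four prompts from the sequence above, with named placeholders.
# The placeholder names are assumptions made for this example.
PROMPTS = [
    "Are you familiar with {music}?",
    "I'd like you to create a {description}. Before you start generating it, "
    "please name 5 defining features of {music} that you could emulate "
    "with a MIDI track here in WavTool?",
    "Okay, generate it on three instrument tracks.",
    "Generate a new variation with {changes}.",
]

def build_sequence(music, description, changes):
    """Return the four prompts with the placeholders filled in."""
    values = {"music": music, "description": description, "changes": changes}
    return [prompt.format(**values) for prompt in PROMPTS]

sequence = build_sequence(
    music="90s boom bap hip-hop",
    description="four-bar drum and bass groove",
    changes="a busier hi-hat pattern",
)
for number, prompt in enumerate(sequence, start=1):
    print(f"Prompt #{number}: {prompt}")
```

Only the last prompt needs new wording on each pass, so for the 5-10 refinement rounds you can call build_sequence again with a different changes value and reuse the fourth item.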
You'll discover with time that the Conductor wants to help you, but has some shortcomings. It has knowledge, but fumbles tasks and fails to accomplish all that it sets out to do.
My best advice is to be patient and guide it toward your desired outcome. Challenge yourself to find better ways to describe what you want with as much accuracy as you can.
You may need to give technical commands like "move the piano chords up two octaves" or "replace that melody with something more complex. Use a tasteful combination of quarter, eighth, and sixteenth notes".
On the other hand, you can always intervene and make those changes yourself.
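It also helps to know what a command like "move the piano chords up two octaves" means in MIDI terms. An octave is 12 semitones, so two octaves up adds 24 to every MIDI note number. Here's a minimal Python sketch of that arithmetic. The note layout (pitch, start beat, length) and the transpose_octaves function are assumptions for illustration, not WavTool's internal format.

```python
# Each note is (MIDI pitch number, start beat, length in beats).
# In MIDI, middle C is note number 60 and one octave is 12 semitones.
piano_chords = [(60, 0.0, 2.0), (64, 0.0, 2.0), (67, 0.0, 2.0)]  # C major triad

def transpose_octaves(notes, octaves):
    """Shift every note by whole octaves, rejecting pitches outside MIDI's 0-127 range."""
    shift = 12 * octaves
    moved = []
    for pitch, start, length in notes:
        new_pitch = pitch + shift
        if not 0 <= new_pitch <= 127:
            raise ValueError(f"Pitch {new_pitch} is outside the MIDI range 0-127")
        moved.append((new_pitch, start, length))
    return moved

print(transpose_octaves(piano_chords, 2))
# → [(84, 0.0, 2.0), (88, 0.0, 2.0), (91, 0.0, 2.0)]
```

Timing stays untouched, which is why a transposed part still lines up with the rest of the arrangement.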
There are lots of free music glossaries online, along with free music theory lessons, if you want to expand your vocabulary with some of the fundamentals. Conductor has a very deep knowledge of music theory concepts.
Bugs we encountered with WavTool's AI DAW
WavTool represents a remarkable achievement. It's the first AI DAW of its kind and is even more impressive when you realize that it was built by a solo developer.
That being said, the DAW does have some bugs that need to be worked out.
The Conductor says one thing and does another. The chat might say that it's going to create a MIDI file with chords and melody, but only generates a melody. When confronted about this, it might create a chord progression on a separate track, but it could be in the wrong key. This inconsistency does cost some time and eat into productivity, unless you're ready to pick up the slack and make some of those corrections yourself.
The Conductor only does one of the things it promised to do. If you ask for too many things with your prompt, the Conductor seems to get overloaded. It might only do one of the things you asked for. OpenAI's ChatGPT does the same thing when presented with music prompts. Be patient and ask for improvements one step at a time for the best results.
If a request is too complex, the Conductor will time out. It's not always clear what the Conductor's limits are going to be, so if it keeps timing out and not generating a button for you, scale back the prompt's request.
The Conductor's buttons sometimes fail. I've noticed this happens if the button was created for an instrument that was later deleted. But sometimes it fails for no obvious reason. The workaround is to ask it to create a new track with the same MIDI file. That seems to fix this issue.
The piano roll's MIDI note editor has difficulty dragging notes straight up or down along the same vertical axis. It will try to force the note over to one side or the other. The workaround is to select your notes and use the keyboard arrows to move them up or down.
All of these bugs had workarounds, so the main point of frustration ended up being lost time. It would be great to see an improvement in the GPT render times as well as an improvement in the accuracy of each MIDI generation.
The slow incremental pace of composing makes it less convenient than editing the MIDI myself. On the other hand, the text-to-music prompt system is so novel that I don't really care how slow it is. At least for the time being, it's fun to play with and I trust these issues will improve over time, as the application matures.
An interview with the founder of WavTool
Founder Sam Watkinson was available on the WavTool Discord channel and was kind enough to answer a few questions we had regarding their backstory.
Ezra: Thanks for being open to a quick conversation! Can you share a little bit with our readers about your background in music production and software development?
Sam: I started producing music in high school, then attended college for music production and audio engineering. After graduating, I taught myself to code so I could help some friends of mine get their startup off the ground. That turned into a career in tech, and music became a much-loved side gig, consisting of various soundtrack projects and original music over the years. It struck me that everyone, even expert musicians, really struggled to pick up DAWs for the first time. I've had loads of friends come to me for help learning to produce, and every time I see this struggle up close. Now, making music has been a really important part of my life. Producing and DJing helped me really discover my identity and build personal confidence in high school.
I had a technical mind and a lot of free time, so it was fun for me to learn the software. But there are loads of brilliant creative people everywhere who aren't in the same position, and I believe there's a lot tech can do to help them find joy and meaning in music production.
Ezra: So what drove you to go as far as to create your own DAW?
Sam: A couple years ago I started experimenting with the tech I knew (web development) to see if it was possible to build something that fulfilled the core technical requirements of a DAW (signal routing, real-time third-party effects, etc.) so that it could then be refined into something more user-friendly. At the end of last year, the startup I was working for got hit by the economic downturn and I found myself out of a job. Since then I've been full-time on this, and in February I asked my former coworker Keith to join on as cofounder. His background is similar, he's a trained producer who's been working in software for about as long, also with a focus on startups and supporting creatives with tech.
Ezra: What's your experience been like working with GPT-4?
Sam: The GPT-4 integration was our first big step in the direction of lowering the barriers to entry for music production. We got a lot of feedback - positive, negative, and everything in between - and we're taking it all very seriously as we move forward. We may be about to enter a future that is flooded with pure-AI content, and our mission with WavTool is to do everything we can to help human creativity thrive.
Final Thoughts on WavTool & the future of AI DAWs
From what I can tell, we are witnessing the birth of an entirely new species of audio workstation.
The text prompt interfaces we use today could easily be replaced by voice-to-text commands. We will speak to our DAWs and tell them what we want for the project. The AI DAWs will respond with their own voice, using text-to-speech technology. It sounds a bit ridiculous typing that out, but I think we could be closer to this scenario than anyone expects.
In this new landscape, our musical vocabulary will be an asset. Writers may gain a competitive advantage in music that they've never enjoyed before.
Melody generators and chord progression software could also be in the line of fire, if these AI DAWs become sufficiently advanced. Why would you pay for a random note generator when your AI music composer can do it for you?
That being said, GPT-4 still has some ways to go before it actually poses a threat to MIDI generation software. Producers are attached to their existing DAWs and workflows. The quality of GPT's musical output also needs to get better in order to claim the throne.
We'll continue to monitor this space and report on it as the situation develops.