Text-to-speech apps like Siri have been around for decades, but until recently they were never able to hold a tune. Recent improvements to AI voice models and AI-generated music have given birth to AI singing voice generators.
As you might have guessed, the majority of people are using this technology for entertainment purposes. One website, called VoiceMod, offers a Musical Meme Machine that plays its users’ lyrics back on top of a prerecorded track. The melody and rhythm of the AI voices match the lyrics surprisingly well. Plus you can make the vocalists say anything, which makes for a good laugh.
VoiceMod was first to market, layering each singer’s voice over an existing track, but other AI tools like Chirp and Splash Music are offering generative music as well. These companies combine text-to-music prompting with AI singing voices that perform any lyrics. Users enter a written genre description, like rap or hip hop, and pick from melodic vocals or rappers.
Chirp runs in Discord, whereas Splash features a user-friendly interface in a web browser. Both of these products can generate new songs in under a minute. Splash and VoiceMod host the AI songs on their servers, so that users can share them on social media.
Behind the scenes, these kinds of text-to-singing software have been trained on large collections of vocal data. Some companies, like Kits.AI and Controlla Voice, allow users to train their own AI singing voice models. If you decide to use these platforms, we would caution against using a famous artist’s voice. And if you do train on someone else's voice, get permission to use it before you publish.
In April 2023, an AI Drake song featuring The Weeknd was published. This track, titled Heart on my Sleeve, had a reported 600,000 Spotify streams, 15 million TikTok views, and 275,000 YouTube views when labels clamped down and ordered a takedown.
On October 18th, Universal Music Group announced in a press release that it would be partnering with BandLab to begin protecting artist voices, citing Taylor Swift as an example. The RIAA has repeatedly said that it considers AI voice impersonation a credible threat to the industry's bottom line. Labels may have legal precedent to sue artists who monetize these celebrity AI voices.
There are some real dangers with voice transfer technology, and they go beyond music rights. Scammers have started using voice clones to target elderly people and exploit them for money.
It’s not all bad news, though. Some artists are welcoming the new technology and selling direct access to their voice. Grimes, a major pop star and the former partner of Elon Musk, announced in 2023 that anyone could use her AI voice as long as they shared royalties if the AI song became a hit. She went on to publish a free AI platform called Elf Tech to distribute direct access.
Back in 2021, the popular indie artist Holly Herndon trained an artificial intelligence model on her own voice and released it under the name Holly+. She sells access to it via a DAO and has an AI music podcast where she discusses these topics in detail.
Web applications are a great starting point, but real artists need music production plugins that fit into their workflow. In this article, we’ll share a collection of the best AI singing voice generators that we’ve found and include tutorials on how to use them.
Best AI singing voice generators for music producers
There are a lot of AI voice generators out there, but most are not designed for musicians. This article only focuses on the vocal apps that you can use to start creating vocal melodies with artificial intelligence.
If you enjoy text-to-speech apps and plan on designing melodies for your AI voice generator, be sure to check out AudioCipher's text-to-MIDI VST. You'll be able to type in words and phrases, convert them into melodies, shape them to sound the way you want, and apply them to your AI voices using the apps below.
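To give a rough feel for what converting words into a melody can mean, here is a minimal Python sketch. It is not AudioCipher's actual mapping: the function name, the C major scale, and the rule of cycling the alphabet through the scale are all assumptions made up for this illustration.

```python
# Hypothetical letter-to-note mapping, for illustration only.
# MIDI note numbers for one octave of C major, starting at middle C (60).
C_MAJOR = [60, 62, 64, 65, 67, 69, 71]

def text_to_notes(text, scale=C_MAJOR):
    """Map each ASCII letter to a scale note: a -> 1st note, b -> 2nd, and so on,
    wrapping around when the alphabet runs past the end of the scale."""
    notes = []
    for ch in text.lower():
        if ch.isascii() and ch.isalpha():
            index = ord(ch) - ord("a")
            notes.append(scale[index % len(scale)])
    return notes

if __name__ == "__main__":
    # Spaces and punctuation are skipped; only letters produce notes.
    print(text_to_notes("Cab!"))
```

A real plugin would also need to assign rhythm and let you reshape the result, which this sketch leaves out.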
KITS.AI - Voice to Voice audio conversion
Kits.AI is a freemium web app that delivers voice-to-voice audio conversion based on a variety of high-quality royalty-free voice models. Users record vocals directly into the app or upload a clean vocal audio file in MP3 or WAV format. During our tests, it took less than a minute to complete the AI voice conversion, and all of the subtleties from the vocal performance were retained. If you're looking for a sound that the existing voice collection doesn't offer, Kits.AI includes an AI voice model creation feature. Upload up to 30 minutes of a cappella audio files using a single voice and, with a single click, you can train your own custom AI model. Check out their voice model creation guide before you get started with this process, to familiarize yourself with best practices.
We love that Kits.AI is partnering directly with artists to co-release tracks created from their AI voice. As the music industry adopts this kind of technology, it's important that artist consent and ethics are taken into consideration. Users can draw from the official artist library and submit their songs for a commercial co-release alongside that artist. The free plan lets you train two AI voice models and access the royalty-free voice library.
Controlla Voice - Train & blend your own voice model
Controlla.XYZ launched in July 2021 as a spatial audio company, and has grown into a mature web app where people can train their own AI singing voice models.
How do you train a Controlla Voice model?
Controlla Voice allows users to train AI singing models from a cappella vocal stems.
Vocal takes should ideally include a few different intensity levels and feature melodies that span an octave or more. There are exceptions, like training a rapping or speech model, where the pitch range can be less than an octave.
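If you already have a list of pitch readings for a take (for example, exported from a pitch tracker), the octave guideline above is easy to check yourself. The sketch below is only an illustration and is not part of Controlla: the function names are made up, and it assumes pitches are given in Hz with 0 marking unvoiced frames.

```python
import math

def hz_to_midi(freq_hz):
    """Convert a frequency in Hz to a fractional MIDI note number (A4 = 440 Hz = 69)."""
    return 69 + 12 * math.log2(freq_hz / 440.0)

def pitch_span_semitones(freqs_hz):
    """Distance in semitones between the lowest and highest voiced pitch."""
    voiced = [f for f in freqs_hz if f > 0]  # assumption: 0 means silence/unvoiced
    if not voiced:
        return 0.0
    notes = [hz_to_midi(f) for f in voiced]
    return max(notes) - min(notes)

def spans_an_octave(freqs_hz, min_semitones=12.0):
    """True if the take covers at least one octave (12 semitones)."""
    return pitch_span_semitones(freqs_hz) >= min_semitones

if __name__ == "__main__":
    take = [220.0, 0.0, 330.0, 440.0]  # A3, silence, E4, A4
    print(spans_an_octave(take))
```

For a rap or speech model, you could simply lower min_semitones to match the narrower range mentioned above.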
When the training is complete, anyone can sing in that vocal style. Or at least, an artificially intelligent approximation of it... Once there are two or more AI voices, things get even more interesting.
Controlla lets you blend voices to create hybrid AI vocalists. Our team enjoyed the blending feature and exploring the latent space between real human voices.
These new, blended vocal avatars represent a completely new and original sound.
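Controlla doesn't publish how its blending works, so the following is only a conceptual sketch of what moving through the latent space between two voices can look like. It assumes each voice is summarized as a list of numbers (an embedding) and blends them with plain linear interpolation. The function name and the example vectors are hypothetical.

```python
def blend_voices(voice_a, voice_b, weight=0.5):
    """Linearly interpolate between two voice embeddings.

    weight=0.0 returns voice_a, weight=1.0 returns voice_b,
    and values in between give a hybrid of the two.
    """
    if len(voice_a) != len(voice_b):
        raise ValueError("embeddings must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return [(1 - weight) * a + weight * b for a, b in zip(voice_a, voice_b)]

if __name__ == "__main__":
    singer_a = [0.0, 2.0]  # made-up two-dimensional embeddings
    singer_b = [4.0, 6.0]
    print(blend_voices(singer_a, singer_b, weight=0.5))
```

Real voice models use far larger embeddings and may blend in other ways, but the idea of a slider between two voices is the same.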
We can imagine a Controlla audio marketplace where vocalists would create and license their vocal models. Digital music producers who can sing but don't want to use their own voice could purchase access to voices and blend multiple styles together into something new.
Visit their website - Controlla.xyz - to learn more.
Vocaloid - Yamaha's AI Voice Generator for musicians
Vocaloid is the most musical app amongst all the AI voice generators. Yamaha built it specifically for music producers, which means that it has most of the important features that other text-to-speech apps lack. With over 100 voices to choose from, you'll be able to easily test out different vocal types on your track.
Vocaloid 6 includes a voice changer, so you can sing a melody and turn it into another voice. This is common enough in today's landscape, but their secret power is a note-control feature, comparable to what Melodyne offers.
Murf.ai - AI Voice Changer and Voice Cloning
Murf.ai is tackling AI voice technology head on, expanding from text-to-speech to more advanced tools that can change your voice or clone someone else's. With a free account, you'll be able to make limited use of the app, including the voice transfer that you need for AI singing. They also offer a high-touch service that can create a custom voice model for you.
If you're exploring audio for video or want to create an AI music video, check out their audio and music page. You can sing a melody yourself and hear it played back through their voice changer. From there, audition several types of voices until you find the right one.
With a working melody, you can send the track to a similar artist so they can imagine their own voice over it. AI voice changers speed up the process, so you don't have to shape the speech into a melody by hand.
Synthesizer V by Dreamtonics
Synthesizer V is another AI voice generator geared specifically to musicians. The company is based in Tokyo, where artificial intelligence and music have already been popular for half a decade or more. We recently reviewed NeuTone, a free VST from Japan that acts as a hub for other open source voice changer APIs like Google DDSP and RAVE.
Unlike the NeuTone VST, Synthesizer V is a full DAW that curates a selection of their own AI voice models. They've also created a more robust interface for editing and improving initial output. As with Melodyne and Vocaloid, you're able to sculpt the melody that your AI voice sings. Drag the notes up and down in the audio equivalent of a MIDI editor and Synth V creates a smooth render without losing the emotional tone of the voice.
As you may already know, the majority of voice changers require an internet connection. That's because the neural networks responsible for generating the AI voices are operating as a cloud service. Synthesizer V does not require an internet connection to run its model, meaning you can use it anytime, anywhere.
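For readers curious about the math behind dragging a note up or down, here is a small sketch based on standard equal-temperament tuning. It says nothing about how Synthesizer V renders audio internally, and the function names are our own.

```python
def midi_to_hz(note):
    """Frequency in Hz of a MIDI note number (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)

def shift_semitones(freq_hz, semitones):
    """Move a pitch up (positive) or down (negative) by a number of semitones."""
    return freq_hz * 2 ** (semitones / 12)

if __name__ == "__main__":
    a4 = midi_to_hz(69)
    # Dragging a note up 12 semitones (one octave) doubles its frequency.
    print(a4, shift_semitones(a4, 12))
```

Each semitone multiplies the frequency by the twelfth root of two, which is why a one-octave drag lands on exactly double the original pitch.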
Revocalize - Record and convert your voice
Revocalize has skipped over the text-to-speech tools and gone straight into voice changing. That laser focus has given them the bandwidth to become one of the best apps for generating AI vocal tracks. Subtle qualities of your voice, like your accent or the emotion you're feeling when you speak, will transfer to the new voice. Check out their landing page to hear a demo of the voice changer.
Revocalize aims to protect your voice, as outlined on their about page, but it's not clear exactly how they plan to do that. They could follow in Holly Herndon's steps and use a DAO to manage and sell access. Water and Music has been sharing lots of content about how to sell AI-related music materials, including your voice, on Web3 as well.
Emvoice - Choose from a collection of AI singers
Emvoice One has taken a novel approach to AI singing software, combining a MIDI piano roll interface with text boxes for lyrical snippets. Users program a melody manually and for each melodic segment, Emvoice will spawn a dedicated text area. Type in your short phrase and the vocal model will do its best to match the melodic shape to the pattern of your words.
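To picture the piano-roll-plus-lyrics idea, here is a hypothetical sketch of how a melody could be split into segments that each get their own lyric box. It is not Emvoice's code: the Note class, the function name, and the rule that a rest starts a new segment are assumptions made for this illustration.

```python
from dataclasses import dataclass

@dataclass
class Note:
    pitch: int     # MIDI note number
    start: float   # position in beats
    length: float  # duration in beats

def split_into_segments(notes, max_gap=0.0):
    """Group notes into melodic segments.

    A new segment begins whenever the rest between two consecutive
    notes is longer than max_gap (in beats).
    """
    segments = []
    for note in sorted(notes, key=lambda n: n.start):
        if segments:
            previous = segments[-1][-1]
            gap = note.start - (previous.start + previous.length)
            if gap <= max_gap:
                segments[-1].append(note)
                continue
        segments.append([note])
    return segments

if __name__ == "__main__":
    melody = [Note(60, 0, 1), Note(62, 1, 1), Note(64, 3, 1)]
    # Each segment would get its own text box for a short lyric phrase.
    for segment in split_into_segments(melody):
        print([n.pitch for n in segment])
```

In this sketch, the one-beat rest before the third note is what separates the melody into two phrases.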
Fans of their software have noted that the point-and-click approach to melodies can be a bit time consuming. If you want to give it a spin before committing to the $69 vocal models (a price tag that's on par with competitors), they do offer a free trial that's limited to melodies with just seven notes.
Uberduck - The O.G. Text to Speech Generator
We've already covered Uberduck in detail here, but thought it would be interesting to point out their new AI Grimes Challenge. At the start of this article, we mentioned that Grimes recently took to Twitter and announced that creators have permission to use her AI voice, as long as she gets 50% of royalties.
So if vocal synthesis is exciting to you and you want to experiment with using a celebrity voice to write a hit song, Grimes is a safe bet. You'll likely be able to post the music online without a DMCA takedown.
Google Colab - The AI music underground
There are only half a dozen AI voice apps designed for singing, but you can find many more options from independent developers and hackers on the internet. One good way to find them is to look for a popular text-to-speech tool, like ElevenLabs, and then run a Google search like "ElevenLabs singing". You'll find a number of Reddit, Twitter, and Quora conversations on the topic.
To give a concrete example, one Reddit thread pointed us to the Singing Voice Conversion model on Google Colab. These tools require little more than the ability to upload a file, press Colab's play button one step at a time, and wait patiently while your songs render.
There you have it - a complete guide to the most popular AI voice generators in 2023. We hope you've found this guide helpful!