How to Use an AI Music Video Generator
Creating a music video can be time-consuming and expensive. In a world where the success of a musician is closely tied to their social media accounts, the pressure to create engaging visual content is at an all-time high... and it’s an introverted songwriter’s worst nightmare!
Fortunately, a new breed of AI music video generators has emerged to help fill this need for visual content. You may have seen them trending on the internet lately. The video above shows how one group, Die Antwoord, turned their latest song Age of Illusion into an AI-generated animation. It's one of the best to date.
In this article we'll show you how to create your own AI music videos. But first we'll take a moment to reflect on why these tools are gaining popularity with musicians and their fans. If you prefer to jump straight into instructions on how to create your own videos, click here.
Video creation is a burden to musicians

There’s a saying that it takes ten thousand hours to master a skill. Today’s musician is expected to have a deep understanding of digital music production, on top of their core songwriting talents. Without the audio engineering skillset, you remain dependent on someone else to make your tracks sound good. That comes at a cost.
A lot of musicians are skipping instrument training to focus on mixing music in a DAW. To compensate for their lack of music theory and songwriting knowledge, they are increasingly dependent on sampled audio. In an effort to hold on to more royalties, artists are also turning to AI music generation apps like MuseNet and Google Magenta Studio.
Songwriting and music production are very different skills. A person would need to spend close to twenty thousand hours to master them both. So what do we expect will happen to the quality of music as independent artists take on a third responsibility as self-promoting, social media content creators?
The average artist is creating music and giving it away for free on streaming platforms, with the hope that it will eventually pay off. Creating video content is painful when all you want to do is make music.
AI Music Video Makers to the rescue?
To solve the need for visual content, musicians have been exploring their options. Karaoke-style lyric videos are a common solution. They're quick and easy to produce. You don’t need to be skilled with editing tools to load up a video template and plug in lyrics. But artists rarely want to attach their brand to a generic template. There's a real concern that their content will start to feel more like a placeholder than a music video. Fans want something more engaging.
This is where artificial intelligence is starting to come into play. Image generators like DALL-E 2 and Midjourney train on existing imagery and use machine learning to produce impressive visual content from simple text prompts.
At first, these user-friendly web apps were adopted by musicians as a quick way to generate album art. But early pioneers have started manually exporting huge batches of images and syncing them up with music files, using video editing software to produce a custom animation.
Compiling images in a video editor is time-consuming. If you're using a paid service to get the highest quality images, the pricing can be prohibitive. So while the problem of generating visual content is partly solved, the manual assembly still eats up time that would be better spent making music.
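To get a feel for why the manual approach adds up, here is a minimal Python sketch that works out how many still images a song needs and what a batch might cost. The function names, the two-second image interval, and the price per image are all hypothetical values chosen for illustration. They don't come from any particular image service.

```python
import math


def image_schedule(song_seconds: float, seconds_per_image: float) -> list[float]:
    """Return the start time (in seconds) of each still image on the timeline."""
    if song_seconds <= 0 or seconds_per_image <= 0:
        raise ValueError("durations must be positive")
    # Round up so the final stretch of the song still gets an image.
    count = math.ceil(song_seconds / seconds_per_image)
    return [i * seconds_per_image for i in range(count)]


def batch_cost(image_count: int, price_per_image: float) -> float:
    """Total cost of a batch, rounded to cents."""
    return round(image_count * price_per_image, 2)


# Hypothetical numbers: a 3-minute song, a new image every 2 seconds,
# and an assumed price of $0.04 per image.
starts = image_schedule(180, 2.0)
print(f"{len(starts)} images, about ${batch_cost(len(starts), 0.04):.2f}")
```

Shorten the interval between images and both the export work and the bill grow in proportion, which is the part an automated generator is meant to take off your hands.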
AI-generated music videos are going to speed up this process dramatically.
Confronting misinformation about Midjourney
AI music video technology is still in its infancy. For this reason, consumer-grade tools aren't readily accessible yet. We've tested and confirmed that at least one tool works. I'll get to that in just a moment, but I want to address some misinformation on the internet that initially had me confused, looking for a tool that doesn't seem to exist.
Some popular articles, like this one titled This AI Software Creates Music Videos from Lyrics, are suggesting to readers that the featured videos were spun up automatically from lyrical input. The software they link to is Midjourney, an AI image generator tool that only creates static images. These still have to be manually sewn together in a video editor or brought to life with an app like Pixamotion.
In a classic game of internet telephone, several other websites have mimicked the message and made similar false claims about Midjourney's ability to generate videos.
Piling onto the confusion, some AI video generator companies are taking ads out on this trending keyword, but have nothing to do with creating music videos.
The ad below claims to turn Midjourney images to video, but the landing page makes no mention of Midjourney and instead promotes a tool used to create corporate explainer videos for marketers.

Fortunately, there are some legit machine learning resources out there that truly generate imagery and animate it for you in a single workspace. They just take a little time and effort to set up.
In the remainder of this article, we're going to provide you with step-by-step instructions on how to use one of these AI music video makers yourself. The service runs in Google Colab, a cloud service that doesn't take up local RAM or space on your hard drive.
How to use an AI music video generator
The best AI music video generator we've found to date is Kaiber. It has been used for a number of notable projects in 2023, including the revival of an archived Linkin Park song called Lost. You've probably seen music videos made with Kaiber without realizing that the tool was behind them.
In the video tutorial below, Sharp Startup provides a complete overview on how to go through each step, from generating music (if you don't have any) to creating images and loading them into Kaiber.
An alternative to Kaiber is Video Killed the Radio Star (VKTRS) by indie developer David Marx. Ben Gillin was the inspiration for VKTRS; however, he generated each image manually and synchronized them with an MF Doom song, in the style of Salvador Dali. Two of the tool's defining features are its shimmering imagery and its song lyric display.
This free tool runs in Google Colab and should be easy to use, but a number of users have reported difficulties with it recently. We ran into some of the same problems in May 2023 when we tested it. I'll provide step-by-step instructions here so you can try it out for yourself, if only to gain some experience with Colab.
Be warned: these instructions are a general guide with some tips on what to do if you get stuck. The codebase is updated regularly and I'm not paid to maintain this article, so I don't regularly retest the process or update the steps here.
If you're up to the challenge and want help getting started, here's what you'll need:
- A free Google account with permission to access Google Drive. My experiment with a 5 minute song and default image quantity took up about 13 GB of space from start to finish. I personally upgraded to Colab Pro to get 100 GB of storage and faster RAM ($10/mo). That being said, your free Google account comes with 15 GB by default and you can get at least one music video rendered on the free plan.
- A free Hugging Face account
- A music file or YouTube link to music content
- An hour of your time to go through the process and render the video
If that sounds like too much work, go with Kaiber. It's much more turnkey and requires less technical knowledge.
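If you want a rough idea of whether your song will fit in free Drive storage, the figures from the requirements list can be turned into a quick estimate. This is only a sketch. It assumes storage grows linearly with song length, which is a guess extrapolated from the single 13 GB, 5-minute test mentioned above. The function names are made up for this example.

```python
# Based on the one test described above: about 13 GB for a 5-minute song.
# Linear scaling with song length is an assumption, not a measured fact.
GB_PER_MINUTE = 13 / 5

FREE_DRIVE_GB = 15  # default storage on a free Google account


def estimated_storage_gb(song_minutes: float) -> float:
    """Rough Drive usage for one render, assuming linear scaling."""
    return song_minutes * GB_PER_MINUTE


def fits_in_storage(song_minutes: float, available_gb: float = FREE_DRIVE_GB) -> bool:
    """True if the estimated usage fits in the available storage."""
    return estimated_storage_gb(song_minutes) <= available_gb


for minutes in (3, 5, 6):
    verdict = "fits" if fits_in_storage(minutes) else "does not fit"
    print(f"{minutes} min song: ~{estimated_storage_gb(minutes):.1f} GB, {verdict}")
```

Keep in mind that anything already stored in your Drive counts against the same 15 GB, so clear out old renders before starting a new one.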
How to use Video Killed the Radio Star
Step 1. Navigate to the Video Killed the Radio Star Google Colab environment
Step 2. In a separate tab, open Hugging Face and generate a token. If you don't have an account, creating one only takes a moment.
Step 3. Return to Google Colab and unfold step zero to run each process. This will install the app's dependencies so that you can perform all of the generative functions.
When you hit the play icon for "provide your API key", it will ask you to paste in your Hugging Face token. As a side note, I ran into an error when attempting to use a DreamStudio API key from Stability AI, but once I unchecked that option, it defaulted to Hugging Face and worked just fine.

Step 4. If your Hugging Face token is valid, you'll be able to proceed without much difficulty. A section titled infer speech from audio will ask you to provide either a YouTube video URL or a link to a hosted audio file. As you can see in the screenshot, the app will listen for lyrics using OpenAI's speech-to-text tool, Whisper. The lyrics will inform the final video. We only tested the YouTube video option for this demo.

Step 5. Choose your art style and update the video dimensions. I initially chose 1920 x 1080 px so that it would be formatted for YouTube, but even with Colab Pro this seemed to cause problems. By reducing the dimensions to 1080 x 720, I was able to proceed. If you're looking to create reels for TikTok and Instagram, reverse those numbers to 720 x 1080 for a vertical reel.
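To see why the smaller frame is easier on the notebook, compare the pixel counts of the two sizes from Step 5. The helper names below are made up for this sketch. The idea that memory use tracks pixel count is a plausible explanation for the problems I hit, not something I've confirmed in the notebook's code.

```python
def pixel_count(width: int, height: int) -> int:
    """Number of pixels in a single frame."""
    return width * height


def to_vertical(width: int, height: int) -> tuple[int, int]:
    """Reorder the dimensions so the frame is taller than it is wide."""
    return (min(width, height), max(width, height))


def to_horizontal(width: int, height: int) -> tuple[int, int]:
    """Reorder the dimensions so the frame is wider than it is tall."""
    return (max(width, height), min(width, height))


full_hd = (1920, 1080)
reduced = (1080, 720)

ratio = pixel_count(*full_hd) / pixel_count(*reduced)
print(f"1920 x 1080 has {ratio:.2f}x the pixels of 1080 x 720")
print("Vertical reel size:", to_vertical(*reduced))
```

Flipping to a vertical reel keeps the pixel count the same, so it shouldn't add any load compared with the horizontal version.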
The AI music video generator will use OpenAI's Whisper service to detect the lyrics and convert them to text. It will then reference your theme_prompt statement for the art style:

As the song progresses, so will the images. It's best to use music with lyrics instead of instrumental songs, at least while you run your early tests. The lyrics will inform the image subject, in the art style of your choice.
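To make the idea concrete, here is a minimal sketch of how timed lyrics and a theme prompt could be combined into one image prompt per lyric line. This is not the notebook's actual code. The LyricSegment class, the build_prompts function, and the way the lyric and theme are joined are all assumptions made for illustration. The only name taken from the tool is theme_prompt, and the sample lyrics are invented.

```python
from dataclasses import dataclass


@dataclass
class LyricSegment:
    """One transcribed lyric line with its timing (hypothetical structure)."""
    start: float  # seconds into the song
    end: float
    text: str


def build_prompts(segments: list[LyricSegment], theme_prompt: str) -> list[tuple[float, str]]:
    """Pair each lyric's start time with an image prompt in the chosen style."""
    prompts = []
    for segment in segments:
        lyric = segment.text.strip()
        if not lyric:
            # Skip silent or instrumental stretches that have no words.
            continue
        prompts.append((segment.start, f"{lyric}, {theme_prompt}"))
    return prompts


# Invented sample data standing in for a speech-to-text transcript.
segments = [
    LyricSegment(0.0, 4.2, "Walking through the neon rain"),
    LyricSegment(4.2, 8.0, "   "),
    LyricSegment(8.0, 12.5, "City lights are calling my name"),
]
theme_prompt = "surreal oil painting"

for start, prompt in build_prompts(segments, theme_prompt):
    print(f"{start:>5.1f}s  {prompt}")
```

The sketch also shows why instrumental tracks make poor early tests: with no lyric text, there is nothing to steer the image subject.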

Step 6. Continue pressing play on each step until you reach the end of the process. Make sure the download_video box is checked and then press play.

That's all there is to it. Once you get a feel for how the software works, you may want to upgrade to Google Colab Pro. Their $10/mo plan will give you more memory to work with, so you don't have to keep resetting your runtime and fighting with the notebook as you experiment with making multiple videos.
If you do run into memory issues, use Colab's change runtime type feature and select the Premium GPU class with the High-RAM resource for your Runtime shape. Here's what those settings look like:

Love the concept and want to see it continue to grow? Contact the developer, David Marx, on Twitter at @digthatdata to show your appreciation for his efforts.
Music video makers that don't generate images
Before we wrap this up, I recognize that some people may want to explore less technical solutions. So here are a few resources you can look into.
Make-a-Video by Facebook-Meta
In September 2022, the company formerly known as Facebook announced their upcoming text-to-video animator called Make-a-Video. Meta briefly opened up a private beta, but this window has since closed and the form is no longer accessible.
The Make-a-Video website showcases some of the animated video clips that they've rendered. They look okay. Given the recent failure of the Metaverse, we may see the company re-allocating their resources to projects like this instead.
So if Google Colab isn't your thing, be patient and wait. We're likely to see more sophisticated solutions roll out over the next year or two.
Commercial AI music video makers
There are a number of music video makers that leverage artificial intelligence but don't generate animated content for you. If you're okay sourcing footage and just want to add lyrics, get the font size right, add social media stickers, and so forth, then these might be closer to what you're looking for. Nova AI, Zoomerang, and Rotor Video are a few examples.
Of course, these video makers put the musician back into the role of a video editor. Our goal with this article is to liberate you from that, so you can rapidly create visual content for your music without learning a whole new skill.
Final thoughts on AI music videos
There are a number of AI music videos out there that look more fluid than Video Killed the Radio Star. These may be the product of custom solutions built by machine learning specialists who chose not to make their code open source.
If you're aware of other AI music video generators that seem legitimate and were not included in this article, please drop a comment below. We'll look into them and update the article to keep improving this resource.
Thanks for reading!