How to Use an AI Music Video Generator
Creating a music video can be time-consuming and expensive. In a world where a musician's success is closely tied to their social media accounts, the pressure to create engaging visual content is at an all-time high... and it’s an introverted songwriter’s worst nightmare!
Fortunately, a new breed of AI music video generators has emerged to help fill this need for visual content. You may have seen them trending on the internet lately. The video above shows how one group, Die Antwoord, turned their latest song Age of Illusion into an AI-generated animation. It's one of the best to date.
In this article we'll show you how to create your own AI music videos using both free and paid options. First up is Neural Frames, a versatile text-to-video program that synchronizes with the beats and tempo of any song.
Neural Frames: Syncing up music with text-to-video
Neural Frames currently offers the most robust AI music video generation service that we've seen, complete with audio synchronization features and fast GPU processing times.

When users upload a music file, the platform automatically detects the track tempo (BPM) and offers a snap-to-grid feature for editing. It runs stem separation on the file to identify instruments in the mix.
Drum impacts, like kicks and snares, are parsed from the percussion layer so that users can isolate and access them individually. The text-to-video generator then modulates its imagery dynamically to match the transients selected by the user.
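
The timing logic described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Neural Frames' actual implementation: the function names (`beat_grid`, `snap_to_grid`, `modulation_strength`) and the linear fade controlled by `decay_s` are assumptions made for this example.

```python
from bisect import bisect_left


def beat_grid(bpm: float, duration_s: float, subdivisions: int = 1) -> list[float]:
    """Return grid timestamps in seconds, from 0 up to duration_s.

    subdivisions=1 places one grid line per beat, 2 gives eighth notes, etc.
    """
    if bpm <= 0 or subdivisions < 1:
        raise ValueError("bpm must be positive and subdivisions at least 1")
    step = 60.0 / bpm / subdivisions  # seconds between grid lines
    count = int(duration_s / step) + 1
    return [i * step for i in range(count)]


def snap_to_grid(t: float, grid: list[float]) -> float:
    """Snap a timestamp to the nearest grid line (grid must be sorted)."""
    if not grid:
        raise ValueError("grid must not be empty")
    i = bisect_left(grid, t)
    # Only the grid lines just before and just after t can be the nearest.
    candidates = grid[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda g: abs(g - t))


def modulation_strength(frame_time: float, hit_times: list[float],
                        decay_s: float = 0.25) -> float:
    """Return 1.0 at a drum hit, fading linearly to 0.0 over decay_s seconds.

    The linear fade is an assumed shape chosen for simplicity.
    """
    strength = 0.0
    for hit in hit_times:
        elapsed = frame_time - hit
        if 0.0 <= elapsed < decay_s:
            strength = max(strength, 1.0 - elapsed / decay_s)
    return strength


if __name__ == "__main__":
    grid = beat_grid(bpm=120, duration_s=2.0)
    print(grid)
    print(snap_to_grid(0.62, grid))
    print(modulation_strength(0.625, hit_times=[0.5, 1.5]))
```

A video renderer could call `modulation_strength` once per frame and use the result to scale how strongly the image changes on that frame.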

In simple terms, this means that your music video will change noticeably whenever a chosen instrument makes a loud sound. The video tutorial below uses a snare to demonstrate this image modulation feature in action:
Imagery and art style for these videos are based on user text prompts, similar to text-to-image services like Midjourney and DALL-E 3. Free users start with a basic model and can access more options later if they choose to upgrade.
Users can animate a single image based on parameters like zoom and rotation. If you have several existing images, it's also possible to set them as keyframes and animate between them. Learn more about that technique here.
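
To make the keyframe idea concrete, here is a minimal sketch of how zoom and rotation values could be blended between two keyframes. It assumes simple linear interpolation; the `Keyframe` class and `interpolate` function are hypothetical names for this example and do not describe how Neural Frames works internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keyframe:
    time_s: float        # position on the timeline, in seconds
    zoom: float          # 1.0 means no zoom
    rotation_deg: float  # rotation in degrees


def interpolate(keyframes: list[Keyframe], t: float) -> tuple[float, float]:
    """Return (zoom, rotation_deg) at time t using linear interpolation."""
    if not keyframes:
        raise ValueError("at least one keyframe is required")
    frames = sorted(keyframes, key=lambda k: k.time_s)
    first, last = frames[0], frames[-1]
    # Before the first or after the last keyframe, hold the end values.
    if t <= first.time_s:
        return (first.zoom, first.rotation_deg)
    if t >= last.time_s:
        return (last.zoom, last.rotation_deg)
    for a, b in zip(frames, frames[1:]):
        if b.time_s == a.time_s:
            continue  # skip duplicate timestamps to avoid dividing by zero
        if a.time_s <= t <= b.time_s:
            w = (t - a.time_s) / (b.time_s - a.time_s)  # 0.0 at a, 1.0 at b
            return (a.zoom + w * (b.zoom - a.zoom),
                    a.rotation_deg + w * (b.rotation_deg - a.rotation_deg))
    return (last.zoom, last.rotation_deg)


if __name__ == "__main__":
    frames = [Keyframe(0.0, 1.0, 0.0), Keyframe(4.0, 2.0, 90.0)]
    print(interpolate(frames, 2.0))
```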
Neural Frames even allows users to train their own custom models, enabling the depiction of real objects in the AI world. This makes it possible to create character-consistent videos, rather than relying entirely on AI-generated figures.

Once a music video is finished, Neural Frames exports in both horizontal and vertical layouts, giving users the option to create YouTube videos, social media reels and Spotify canvases. You can find a demonstration of that technique here:
Why are musicians turning to AI music videos?
You might be wondering why anyone would want to create a video with artificial intelligence in the first place. Some have a genuine desire to experiment, while others feel pressure to create visual content for social media. We're in a cultural phase where fans prefer to watch a video when artists drop new music.
Creating an engaging music video can be expensive and time-consuming, yet fans consume content and move on at lightspeed. With music royalties at an all-time low, artists are trying to retain what they can and reinvest it into new music gear, studio time, and maybe even keep a little for themselves!
Saving money and competing on social media are just half the story though. AI video generation is a fun, creative act that requires less technical skill than traditional video production. Apps like Neural Frames give artists more creative control over their visual brand, without the need for a big budget.
How to use an AI music video generator
So far we've covered Neural Frames. The other popular AI music video generator is called Kaiber. It was used for a number of notable projects in 2023, including the revival of an archived Linkin Park song called Lost. You've probably seen music videos made with Kaiber without realizing that the tool was behind them.
In the video tutorial below, Sharp Startup provides a complete walkthrough of each step, from generating music (if you don't have any) to creating images and loading them into Kaiber.
A third option is Video Killed the Radio Star (VKTRS) by indie developer David Marx. VKTRS was inspired by Ben Gillin, although he generated each image manually and synchronized them with an MF Doom song, in the style of Salvador Dali. Two of the tool's defining features are the shimmering imagery and the song lyric display.
Once your AI-generated music video is ready, you can use free movie maker tools to further enhance the visuals with AI-based special effects.
Applying professional-grade LUTs and filters gives you fine control over the video's color grading, while an AI background removal feature can help you cut out elements for reuse in other projects. AI motion tracking is another useful tool for music videos. It detects objects and lets you highlight the elements that you want to stand out.
How do you create a lyric video with AI?
Karaoke-style lyric videos are a common solution to creating video content on a budget. They're quick and easy to produce, especially if you've already got video content from services like Neural Frames and Kaiber. You don’t need to be skilled with editing tools to load up a video template and plug in lyrics. The video tutorial above will point you in the right direction.
However, if you're interested in a free AI music video generator that creates both images and lyric captions for you, continue reading to learn about Video Killed the Radio Star.
Video Killed the Radio Star
This free tool runs in Google Colab and was one of the first AI music video generators to hit the internet, long before Kaiber and Neural Frames became available.
I'll provide step-by-step instructions here so you can try it out for yourself. These instructions are a general guide with tips on what to do if you get stuck. That being said, the codebase is updated regularly and we don't retest it or revise this guide with every change, so I recommend reaching out to the developer if you run into any issues.
To get started, here's what you'll need:
- A free Google account with permission to access Google Drive. My experiment with a 5-minute song and default image quantity took up about 13 GB of space from start to finish. I personally upgraded to Colab Pro to get 100 GB of storage and faster RAM ($10/mo). That being said, your free Google account comes with 15 GB by default and you can get at least one music video rendered on the free plan.
- A free Hugging Face account
- A music file or YouTube link to music content
- An hour of your time to go through the process and render the video
If that sounds like too much work, go with Kaiber. It's much more turnkey and requires less technical knowledge.
How to use Video Killed the Radio Star
Step 1. Navigate to the Video Killed the Radio Star Google Colab environment
Step 2. In a separate tab, open Hugging Face and generate a token. If you don't have an account, creating one only takes a moment.
Step 3. Return to Google Colab and unfold step zero to run each process. This will install the app's dependencies so that you can perform all of the generative functions.
When you hit the play icon for "provide your API key", it will ask you to paste in your Hugging Face token. As a side note, I ran into an error when attempting to use a DreamStudio API key from Stability AI, but once I unchecked that option, it defaulted to Hugging Face and worked just fine.

Step 4. If your Hugging Face token is valid, you'll be able to proceed without much difficulty. A section titled infer speech from audio will ask you to provide either a YouTube video URL or a link to a hosted audio file. As you can see in the screenshot, the app will listen for lyrics using OpenAI's speech-to-text tool, Whisper. The lyrics will inform the final video. We only tested the YouTube video option for this demo.

Step 5. Choose your art style and update the video dimensions. I initially chose 1920 x 1080 px so that it would be formatted for YouTube, but even with Colab Pro this seemed to cause problems. By reducing the dimensions to 1080 x 720, I was able to proceed. If you're looking to create reels for TikTok and Instagram, reverse those numbers to 720 x 1080 for a vertical reel.
The AI music video generator will use OpenAI's Whisper service to detect the lyrics and convert them to text. It will then reference your theme_prompt statement for the art style:

As the song progresses, so will the images. It's best to use music with lyrics instead of instrumental songs, at least while you run your early tests. The lyrics will inform the image subject, in the art style of your choice.
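
As a rough illustration of how timed lyrics and an art style can be combined, the sketch below turns lyric segments into one image prompt per segment. The segment shape (`start`, `end`, `text`) mirrors what Whisper's Python package returns for each transcribed segment, but the `build_prompts` function, the way the lyric and `theme_prompt` are joined, and the sample lyrics are assumptions for this example, not the notebook's actual code.

```python
def build_prompts(segments: list[dict], theme_prompt: str) -> list[dict]:
    """Pair each lyric segment with the art style to form an image prompt.

    Each segment is a dict with "start" and "end" times in seconds and the
    transcribed "text". Segments with no text are skipped.
    """
    prompts = []
    for seg in segments:
        lyric = seg["text"].strip()
        if not lyric:
            continue  # nothing was sung here, so there is no subject to draw
        prompts.append({
            "start": float(seg["start"]),
            "end": float(seg["end"]),
            # Assumed format: lyric as the subject, theme as the style suffix.
            "prompt": f"{lyric}, {theme_prompt}",
        })
    return prompts


if __name__ == "__main__":
    # Placeholder lyrics made up for this example.
    sample_segments = [
        {"start": 0.0, "end": 3.5, "text": " City lights are fading"},
        {"start": 3.5, "end": 5.0, "text": "   "},
        {"start": 5.0, "end": 8.0, "text": " I walk the empty road"},
    ]
    for item in build_prompts(sample_segments, "surreal oil painting"):
        print(item)
```

This also shows why instrumental tracks are a poor fit for early tests: with no transcribed text, there are no segments to build prompts from.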

Step 6. Continue pressing play on each step until you reach the end of the process. Make sure the download_video box is checked and then press play.

That's all there is to it. Once you get a feel for how the software works, you may want to upgrade to Google Colab Pro. Their $10/mo plan will give you more memory to work with, so you don't have to keep resetting your runtime and fighting with the notebook as you experiment with making multiple videos.
If you do run into memory issues, use Colab's change runtime type feature and select the Premium GPU class with the High-RAM resource for your Runtime shape. Here's what those settings look like:

Love the concept and want to see it continue to grow? Contact the developer, David Marx, on Twitter at @digthatdata to show your appreciation for his efforts.
Music video makers that don't generate images
Before we wrap this up, I recognize that some people may want to explore less technical solutions. So here are a few resources you can look into.
Make-a-Video by Facebook-Meta
In September 2022, the company formerly known as Facebook announced their upcoming text-to-video animator called Make-a-Video. Meta briefly opened up a private beta, but this window has since closed and the form is no longer accessible.
The Make-a-Video website showcases some of the animated video clips that they've rendered. They look okay. Given the recent failure of the Metaverse, we may see the company re-allocating their resources to projects like this instead.
So if Google Colab isn't your thing, be patient and wait. We're likely to see more sophisticated solutions roll out over the next year or two.
Commercial AI music video makers
There are a number of AI music video makers that leverage artificial intelligence but don't generate animated content for you. If you're okay sourcing footage and just want to add lyrics, get the font size right, add social media stickers, and so forth, then these might be closer to what you're looking for. Software like Nova AI, Zoomerang, and Rotor Video are just a few examples.
Of course, these video makers put the musician back into the role of a video editor. Our goal with this article is to liberate you from that, so you can rapidly create visual content for your music without learning a whole new skill.
Final thoughts on AI music videos
There are a number of AI music videos out there that look more fluid than Video Killed the Radio Star. These may be the product of custom solutions built by machine learning specialists who chose not to make their code open source.
If you're aware of other AI music video generators that seem legitimate and were not included in this article, please drop a comment below. We'll look into them and update the article to keep improving this resource.