
How to Use an AI Music Video Generator

Creating a music video can be time-consuming and expensive. In a world where a musician's success is closely tied to their social media presence, the pressure to create engaging visual content is at an all-time high... and it's an introverted songwriter's worst nightmare!

Fortunately, a new breed of AI music video generators has emerged to help fill this need for visual content. You may have seen some of these trending on the internet lately. The video above shows how one group, Die Antwoord, turned their latest song Age of Illusion into an AI-generated animation. It's one of the best to date.

In February 2024, OpenAI announced a new text-to-video service called Sora that took the world by storm. The videos are currently silent, which opens up a lot of questions about who or what will fill the void with audio. For now, the system is still in a closed beta.

A few months later, at the beginning of May 2024, Washed Out, a band on the popular indie label Sub Pop, released an AI music video created with Sora.

In this article we'll show you how to create your own trippy videos with the help of artificial intelligence. First up is Neural Frames, a versatile text-to-video program that synchronizes with the beat and tempo of any song.


Neural Frames: Syncing music with AI text-to-video

Neural Frames currently offers the most robust AI music video generation service that we've seen, complete with audio synchronization features and fast GPU processing times. They also have a vibrant community on Discord and Twitter, with ongoing AI video competitions and cash prizes for the best user generated content.

Modulation settings for the Neural Frames timeline

When users upload a music file, the platform automatically detects the track tempo (BPM) and offers a snap-to-grid feature for editing. It runs stem separation on the file to identify instruments in the mix.

Drum impacts, like kicks and snares, are parsed from the percussion layer so that users can isolate and access them individually. The text-to-video generator then modulates its imagery dynamically to match the transients selected by the user.
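Neural Frames doesn't publish its internals, but the idea behind transient-driven modulation can be sketched in a few lines of Python. This is a hypothetical illustration (the function and parameter names are my own, not Neural Frames' API): it turns a list of detected drum onset times into a per-frame zoom curve that spikes on each hit and decays back to a baseline.

```python
# Hypothetical sketch of transient-driven modulation, NOT Neural Frames' actual code.
# Given onset times (seconds) detected from a drum stem, compute one zoom value
# per video frame that jumps on each hit and falls off exponentially afterward.

def zoom_curve(onset_times, duration_s, fps=12, base=1.0, boost=0.15, decay=0.6):
    """Return a list with one zoom value per video frame."""
    n_frames = int(duration_s * fps)
    zoom = [base] * n_frames
    energy = 0.0
    onsets = sorted(onset_times)
    idx = 0
    for frame in range(n_frames):
        t = frame / fps
        # Trigger a spike when playback passes an onset time
        while idx < len(onsets) and onsets[idx] <= t:
            energy = boost
            idx += 1
        zoom[frame] = base + energy
        energy *= decay  # exponential falloff between hits
    return zoom

# Two snare hits at 0.5s and 1.0s in a 2-second clip rendered at 4 fps
curve = zoom_curve([0.5, 1.0], duration_s=2.0, fps=4)
```

In a real pipeline, the onset times would come from an onset detector run over the isolated percussion stem, and the resulting curve would drive the generator's zoom (or rotation) parameter frame by frame.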

Neural Frames stem separation

In simple terms, this means that your music video will change noticeably whenever a chosen instrument makes a loud sound. The video tutorial below uses a snare to demonstrate this image modulation feature in action:

Imagery and art style for these videos are based on user text prompts, similar to other text-to-image services like Midjourney and DALL·E 3. Free users start with a basic model and can unlock more options if they choose to upgrade.

Users can animate a single image based on parameters like zoom and rotation. If you have existing images, you can also set them as keyframes and animate between them. Learn more about that technique here.

In December 2023, the company rolled out a new ControlNet feature called visual echo. It helps to stabilize objects and characters during frame-by-frame animation, allowing creators to tell long form visual stories around a theme.

Prior to this, objects requested by a text prompt would look different in each frame, which led to a visually choppier result. The new visual echo feature has been well received by their audience, who in a recent poll rated it one of the best new additions.

Selecting a model for the AI music video

Neural Frames even allows users to train their own custom models, enabling the depiction of real objects in the AI world. This makes it possible to create character-consistent videos, rather than relying entirely on AI-generated figures.

Once a music video is finished, Neural Frames exports in both horizontal and vertical layouts, giving users the option to create YouTube videos, social media reels and Spotify canvases. You can find a demonstration of that technique here:

Why are musicians turning to AI music videos?

You might be wondering why anyone would want to create a video with artificial intelligence in the first place. Some have a genuine desire to experiment, while others feel pressure to create visual content for social media. We're in a cultural phase where fans prefer to watch a video when artists drop new music.

Creating an engaging music video can be expensive and time consuming. Some artists work around this with abstract, geometric music visualizers. These can be stimulating and fun to watch, but it's difficult to tell a story with them.

AI video generation is fun and creative, taking less technical skill than traditional video production. Apps like Neural Frames give artists more creative control over their visual brand, without being held back by smaller media budgets.

How to use an AI music video generator

So far we've covered Neural Frames. The other popular AI music video generator is Kaiber. It's been used for a number of notable projects in 2023, including the revival of an archived Linkin Park song called Lost. You've probably seen music videos made with Kaiber without realizing it.

In the video tutorial below, Sharp Startup provides a complete overview of each step, from generating music (if you don't have any) to creating images and loading them into Kaiber.

A third option is Video Killed the Radio Star (VKTRS) by indie developer David Marx. Ben Gillin provided the inspiration for VKTRS: he generated each image manually and synchronized them with an MF Doom song, in the style of Salvador Dali. Two of its defining features are the shimmering imagery and on-screen song lyrics.

Once your AI-generated music video is ready, you can use free movie maker tools to further enhance the visuals with AI-based special effects.

Applying professional-grade LUTs and filters gives you in-depth control over the video's color grading, while AI background removal can help you cut out elements for reuse in other projects. AI motion tracking is another great tool for music videos: it detects objects and lets you highlight the elements you want to stand out.

How do you create a lyric video with AI?

Karaoke-style lyric videos are a common solution for creating video content on a budget. They're quick and easy to produce, especially if you already have footage from services like Neural Frames and Kaiber. You don't need to be skilled with editing tools to load up a video template and plug in lyrics. The video tutorial above will point you in the right direction.

However, if you're interested in a free AI music video that generates both images and lyric captions for you, continue reading to learn about Video Killed the Radio Star.

Video Killed the Radio Star

This free tool runs in Google Colab and was one of the first AI music video generators to hit the internet, long before Kaiber and Neural Frames became available.

I'll provide step-by-step instructions here so you can try it out for yourself. These instructions are a general guide with tips on what to do if you get stuck. That said, the codebase is updated regularly and we don't test every release, so I recommend reaching out to the developer if you run into any issues.

To get started, here's what you'll need:

- A free Google account with permission to access Google Drive. My experiment with a five-minute song and the default image quantity took up about 13 GB of space from start to finish. I personally upgraded to Colab Pro ($10/mo) for 100 GB of storage and faster RAM. That said, a free Google account comes with 15 GB by default, so you can render at least one music video on the free plan.

- A free Hugging Face account

- A music file or YouTube link to music content

- An hour of your time to go through the process and render the video

If that sounds like too much work, go with Kaiber. It's much more turnkey and requires less technical knowledge.

How to use Video Killed the Radio Star

Step 1. Navigate to the Video Killed the Radio Star Google Colab environment

Step 2. In a separate tab, open Hugging Face and generate a token. If you don't have an account, signing up only takes a moment and it's very easy to use.

Step 3. Return to Google Colab and unfold step zero to run each process. This will install the app's dependencies so that you can perform all of the generative functions.

When you hit the play icon for "provide your API key", it will ask you to paste in your Hugging Face token. As a side note, I ran into an error when attempting to use Stable Diffusion's DreamStudio API key, but once I unchecked that option, it defaulted to Hugging Face and worked just fine.

Initial setup settings
First step in Google Colab

Step 4. If your Hugging Face token is valid, you'll be able to proceed without much difficulty. A section titled infer speech from audio will ask you to provide either a YouTube video URL or a link to a hosted audio file. As you can see in the screenshot, the app will listen for lyrics using OpenAI's speech-to-text tool, Whisper. The lyrics will inform the final video. We only tested the YouTube video option for this demo.

Select YouTube video URL

Step 5. Choose your art style and update the video dimensions. I initially chose 1920 x 1080 px so that it would be formatted for YouTube, but even with Colab Pro this seemed to cause problems. By reducing the dimensions to 1080 x 720, I was able to proceed. If you're looking to create reels for TikTok and Instagram, reverse those numbers to 720 x 1080 for a vertical reel.
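To keep the landscape and portrait numbers straight, here's a tiny illustrative helper. It is not part of VKTRS (the names are my own); it just encodes the dimensions that worked in my test, with the width and height reversed for vertical reels.

```python
# Illustrative helper, not part of VKTRS: pick render dimensions by target platform.
# 1920 x 1080 caused problems even on Colab Pro; 1080 x 720 worked for landscape.

FORMATS = {
    "youtube": (1080, 720),  # landscape (width, height)
    "reel": (720, 1080),     # vertical for TikTok/Instagram: same numbers reversed
}

def render_size(target):
    """Return (width, height) in pixels for a known target platform."""
    try:
        return FORMATS[target]
    except KeyError:
        raise ValueError(f"unknown target: {target}")
```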

The AI music video generator will use OpenAI's Whisper service to detect the lyrics and convert them to text. It will then reference your theme_prompt statement for the art style:

Theme prompt and dimensions

As the song progresses, so will the images. It's best to use music with lyrics instead of instrumental songs, at least while you run your early tests. The lyrics will inform the image subject, in the art style of your choice.
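The actual VKTRS pipeline is more involved, but the core idea of pairing transcribed lyrics with your theme prompt can be sketched like this. This is a hypothetical simplification: the segment format loosely mirrors Whisper's transcription output, and `build_prompts` is my own name, not a VKTRS function.

```python
# Hypothetical sketch (not VKTRS's actual code): each transcribed lyric segment
# carries a start time and text; appending the user's theme_prompt gives every
# generated image the same art style while the subject follows the lyrics.

def build_prompts(segments, theme_prompt):
    """segments: list of dicts like {'start': seconds, 'text': lyric line}."""
    prompts = []
    for seg in segments:
        line = seg["text"].strip()
        if line:  # skip empty / instrumental segments
            prompts.append((seg["start"], f"{line}, {theme_prompt}"))
    return prompts

segments = [
    {"start": 0.0, "text": "I give her all my love"},
    {"start": 4.2, "text": ""},  # instrumental gap, no prompt generated
]
prompts = build_prompts(segments, "oil painting in the style of Salvador Dali")
```

Each (start time, prompt) pair would then be handed to the image generator, so a new image appears as each lyric line begins.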

AI Music Video
Animated song lyrics from the Beatles tune "And I Love Her"

Step 6. Continue pressing play on each step until you reach the end of the process. Make sure the download_video box is checked and then press play.

Download video content

That's all there is to it. Once you get a feel for how the software works, you may want to upgrade to Google Colab Pro. Their $10/mo plan will give you more memory to work with, so you don't have to keep resetting your runtime and fighting with the notebook as you experiment with making multiple videos.

If you do run into memory issues, use Colab's change runtime type feature and select the Premium GPU class with the High-RAM resource for your Runtime shape. Here's what those settings look like:

High RAM GPU settings

Love the concept and want to see it continue to grow? Contact the developer, David Marx, on Twitter at @digthatdata to show your appreciation for his efforts.

Music video makers that don't generate images

Before we wrap this up, I recognize that some people may want to explore less technical solutions. So here are a few resources you can look into.

Make-a-Video by Facebook-Meta

In September 2022, the company formerly known as Facebook announced their upcoming text-to-video animator called Make-a-Video. Meta briefly opened up a private beta, but this window has since closed and the form is no longer accessible.

The Make-a-Video website showcases some of the animated video clips that they've rendered. They look okay. Given the recent failure of the Metaverse, we may see the company re-allocating their resources to projects like this instead.

So if Google Colab isn't your thing, be patient and wait. We're likely to see more sophisticated solutions roll out over the next year or two.

Commercial AI music video makers

There are a number of AI music video makers that leverage artificial intelligence but don't generate animated content for you. If you're okay sourcing footage and just want to add lyrics, get the font size right, add social media stickers, and so forth, then these might be closer to what you're looking for. Software like Nova AI, Zoomerang, and Rotor Video are just a few examples.

Of course, these video makers put the musician back into the role of a video editor. Our goal with this article is to liberate you from that, so you can rapidly create visual content for your music without learning a whole new skill.

Final thoughts on AI music videos

There are a number of AI music videos out there that look more fluid than Video Killed the Radio Star. These may be the product of custom solutions built by machine learning specialists who chose not to make their code open source.

If you're aware of other AI music video generators that seem legitimate and were not included in this article, please drop a comment below. We'll look into it and update the article, in order to continue improving on this resource.

23 comments

HP Einvik
Aug 18, 2023

Hi. Thanks for the article.

I tried for an hour to get the "Initial Audio Processing" step to work, but it failed over and over. I made a stripped-down version of the song, but no dice. Would've been cool though.


Ezra Sandzer-Bell
May 19, 2023

Hey all, this is Ezra (author of this article). To everyone who tried Video Killed the Radio Star and had trouble: I've updated this article to promote Kaiber, which is by and large a much easier and more turnkey solution than VKTRS.

Kaiber is a paid service. We're not affiliated with them in any way, but I wanted to promote them for anyone looking for an enjoyable alternative to VKTRS. Kaiber was released several months after this article was initially written, so we're keeping the original instructions as a time capsule.

David, the developer at VKTRS, has confirmed that he's been maintaining the codebase and that it should work. I personally could not get it to work as of May…


Dust Down
May 9, 2023
I get to the compile your video section, but getting these errors - any ideas? 
BrokenPipeError: [Errno 32] Broken pipe
AttributeError: '_idat' object has no attribute 'fileno'

Stefan Sugblogg
Apr 19, 2023

This doesn't work. Just a bunch of errors. Thanks for the article though...



Randy Mitchell
Apr 16, 2023

I can't get past Step 2. It wants an option for this segment of code.

#######################
# save transcriptions #
#######################

transcriptions = {}
transcription_root = root / 'whispers'
transcription_root.mkdir(parents=True, exist_ok=True)

# output dir doesn't do anything...?
writer = whisper.utils.get_writer(output_format='vtt', output_dir=transcription_root)

for k in whispers:
    outpath = str(transcription_root / f"{k}.vtt")
    transcriptions[k] = outpath
    with open(outpath, 'w') as f:
        # to do: upstream PR to control verbosity
        write.write_result(
            whispers[k],
            file=f,  # <------------------------
        )

storyboater.wrrd.params.whisper.transcriptions = transcription
