Neural Frames: Generate AI Music Videos on Autopilot

Ezra Sandzer-Bell
May 18
9 min read

Creating a music video used to be a time-consuming and expensive process. But in a world where success is tied to our presence on social media, musicians are starting to find new and creative ways to make professional videos. AI music video generators are one of the most popular options available.

Neural frames launched the internet's first AI music video generator in 2023. To this day, it's still the only fully controllable, audio-reactive solution on the web.

In May 2025, neural frames released a new feature called autopilot that streamlines content creation more than ever before. Users upload a song, approve or adjust the video style, and hit generate. It's literally that simple.

Some of you will want to go deeper and take control over the video production workflow. The tutorial video above demonstrates five different techniques you can use to create audio visualizers on their platform.

I'll share some of my own tips with you in this article, based on the last couple years of experience creating content in their app. Neural frames generously donated AI video generation credits and covered the time we spent testing these tools. However, the opinions expressed and music videos we created are our own.

Generate AI music videos on autopilot
Neural frames: Frame-by-frame vs AI video models
How to create audio-reactive AI music videos in NF
Watch an AI music video demo (neural frames + Kling)
How I personally use neural frames as a content creator
Star in your AI music video with custom AI avatars
Using neural frames with an AI music generator
Examples of AI music videos from popular artists

Generate AI music videos on Autopilot

Neural frames has an autopilot feature that can spin up captivating videos with very little effort. There's virtually no learning curve, no parameters to fuss with, no assembling clips on a timeline, and no software download required.

To access autopilot, navigate to your dashboard home page and click on the music icon, located in the left navigation menu (as seen in the screenshot below).

Drag your song in and on the next screen you'll see a printout of the lyrics with an option to create clips. Check the lyrics to make sure they've been transcribed correctly and then move forward.

When you hit "create clips", neural frames will generate a storyboard of four different scenes. You can choose a fine-tuned Stable Diffusion image model to redefine the visual style and hit regenerate on any or all of the images until you get something you're happy with.

Finally, hit "render clips" to complete the process. You can use Kling Standard or Kling Pro and if you want my opinion, it's worth paying extra for Kling Pro. The video quality is going to be much better. Expect to wait around 10 minutes for the music video to render.

Neural frames: Frame-by-frame vs AI video models

The autopilot mode works exclusively with Kling's text-to-video model. But there's a second, audio-reactive option that generates imagery frame-by-frame. With this you can create animations that respond visually to any instrument in your mix.

For those of you who want more control over the imagery and the order of the video clips, check out this option instead. Neural frames has two separate kinds of text-to-video models, labeled "video models" and "frame by frame animation":

The audio reactive, frame-by-frame videos tend to have a "trippy" aesthetic due to the lower frame rate (25 FPS) and the way that images morph and transform in a short period of time. It's a great way to achieve a psychedelic look and feel for your video, if that's what you're going for.

In 2024, neural frames introduced an AI video hub for third-party solutions like Kling and RunwayML. They're not audio-reactive, but they do create high FPS video content without the hallucinatory stuttering effect.

On a personal note, I like to start videos with Kling 1.6 Pro and as the music begins to climax, switch over to the audio-reactive frame-by-frame model. This allows me to harness the intensity of the audio visualization and use it for emphasis, rather than overdoing it and burning the audience out.

How to create audio-reactive AI music videos in NF

Imagery and art style for these videos are based on text prompts, just like other popular text-to-image services. You can describe any kind of image you want to create and if you're concerned that it's too basic, hit "pimp my prompt" to get a more elaborate description.

Creating the first AI music video keyframe from a text prompt

On this screen, you can also select the layout format you want: 1:1 square, landscape, or portrait. Hit the create button to generate four images. Choose one and then move on to the video configuration step.

We'll be testing out the 1:1 ratio so we can use the video on an Instagram post. Landscape is best for YouTube format while portrait works for TikTok and Spotify.
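If you're producing videos for several platforms at once, it can help to keep those layout choices in one place. Here's a minimal Python sketch of a lookup table. The dictionary and function names are hypothetical, and the values only restate the suggestions above rather than any setting exposed by neural frames.

```python
# Layout suggestions restated from this article.
# The names below are hypothetical and not part of neural frames.
LAYOUT_BY_PLATFORM = {
    "instagram": "1:1 square",
    "youtube": "landscape",
    "tiktok": "portrait",
    "spotify": "portrait",
}


def suggested_layout(platform):
    """Return the suggested layout, or None if the platform isn't listed."""
    return LAYOUT_BY_PLATFORM.get(platform.strip().lower())


print(suggested_layout("Instagram"))
```

Platforms that aren't in the table return None, so you can decide on a layout yourself instead of getting a silent default.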

Here's what the neural frames video editor looks like. Our text prompt called for little bubbles floating in a fantasy world, with a castle on top of a chessboard, floating on a cloud in the sky. The AI image captured that concept perfectly:

How to add music to neural frames

Double click on the timeline row next to the music icon, shown in the screenshot above.
Choose from music presets provided by neural frames or upload your own music file.
It takes about 15 seconds for neural frames to split the music into stems and render the song's waveform on the timeline as shown below.

The next step in this process is to pick an instrument layer that will act as the modulation trigger. You can click the play button on any layer to hear what it sounds like and decide which would be the best choice for your animation.

In less technical terms, this means that every time a snare, kick drum, or loud guitar strum occurs, neural frames will signal to the video generator that it should change the imagery more dramatically at that moment in the song.

We isolated the snare and you can see below that the audio timeline now has a waveform located above it. These wave shapes represent the snare hits that will modulate our animation.

The trigger is used to modulate a specific target. If you click on that dropdown menu, you can select what you want it to do. It defaults to strength, but we could have also selected motions like panning, zooming, rotation, and more.
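For readers who like to see the idea in code, here's a minimal Python sketch of how a trigger stem could be turned into a per-frame strength value. This is my own illustration and not neural frames' actual implementation. The function names, the RMS loudness measure, and the 0.3 to 0.9 strength range are all assumptions. The only number taken from the app is the 25 FPS frame rate.

```python
import math

FPS = 25  # frame rate of the frame-by-frame animation


def frame_envelope(samples, sample_rate, fps=FPS):
    """Return one RMS loudness value per video frame for a mono stem."""
    hop = sample_rate // fps  # audio samples covered by one video frame
    envelope = []
    for start in range(0, len(samples) - hop + 1, hop):
        window = samples[start:start + hop]
        envelope.append(math.sqrt(sum(s * s for s in window) / hop))
    return envelope


def modulation_curve(envelope, base=0.3, peak=0.9):
    """Scale loudness into a strength between base and peak (assumed range)."""
    loudest = max(envelope, default=0.0)
    if loudest == 0.0:
        return [base for _ in envelope]  # silent stem: no modulation
    return [base + (peak - base) * (e / loudest) for e in envelope]


# Synthetic one-second "snare stem": silence with two short bursts.
sample_rate = 1000
stem = [0.0] * sample_rate
for hit_start in (200, 720):
    for i in range(hit_start, hit_start + 40):
        stem[i] = 1.0

curve = modulation_curve(frame_envelope(stem, sample_rate))
hit_frames = [i for i, strength in enumerate(curve) if strength > 0.5]
print(hit_frames)
```

Frames that line up with a snare hit get a strength near the peak, and every other frame stays at the base value. That's the same pattern you see in the rendered video: a dramatic change on each hit with slower movement in between.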

We updated the duration from its 3 second default to 10 seconds, so that it covered the full length of our clip. With those modulation settings configured, we returned to the main screen and extended the prompt and modulation bars to match the full length of the audio clip. Then we hit render and waited.

Modulation controls are the most challenging and important feature to learn

A note about rendering speeds: Neural frames renders one image at a time with an animation frame rate of 25 FPS. This means that rendering times can be quite slow even for a short video.

To speed that up, you can switch on the new Turbo Mode feature that was added in June 2024. It will increase your rendering speed by 400%.

Rendering and exporting your neural frames video

We opted for the slow mode in order to get the highest image quality. Since we're using the Juggernaut XL model, it took about 5 minutes to render ten seconds. The speed varies depending on the model you choose. We were happy with the results and hit the download icon in the upper right corner of the screen to export.
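To put those numbers in perspective, here's a rough back-of-the-envelope calculation in Python. It assumes render time scales linearly with the number of frames, which is my own simplification. The real speed depends on the model you choose and whether Turbo Mode is on, so treat the result as a ballpark figure only.

```python
FPS = 25                 # frame rate of the frame-by-frame animation
clip_seconds = 10        # length of our test clip
render_seconds = 5 * 60  # "about 5 minutes" with Juggernaut XL in slow mode

frames = FPS * clip_seconds                  # 250 frames in the clip
seconds_per_frame = render_seconds / frames  # roughly 1.2 seconds per frame


def estimated_render_minutes(video_seconds, per_frame=seconds_per_frame, fps=FPS):
    """Linear estimate of render time in minutes (an assumption, not a guarantee)."""
    return video_seconds * fps * per_frame / 60


print(round(estimated_render_minutes(180)))  # a three-minute song
```

At that pace, a full three-minute song rendered entirely frame-by-frame would take roughly an hour and a half, which is one reason to reserve this model for the sections where it matters most.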

After confirming the export, we were routed to our video collection where we waited for it to upscale to 2x quality. Here's what that dashboard looks like:

neural frames video collection dashboard

It was ready to go within a couple of minutes. Here's how the final neural frames AI music video turned out. Notice that the animation very clearly changes with each snare hit, while maintaining a slower and consistent movement between those modulations.

That covers the full process we went through for this round. Neural frames is actually capable of a lot more. It can animate a single image as we showed here but it can also animate in between two image keyframes to create a kind of morphing effect. Check out their blog for more tutorials and announcements.

Watch an AI music video demo (neural frames + Kling)

Here's an AI music video I created by combining output from both neural frames and Kling. The lofi art style was a personal choice, but you can create videos from any kind of image you want. In fact, when you use this technique you can actually upload your own image keyframes. This offers more control than autopilot.

To create this short video, I rendered and exported video clips from both models. I dropped them into iMovie for some manual editing (don't judge me). You really don't need any technical skills for post production, but it does help to have a handle on some basics like trimming, splitting, zooming and panning.

I usually follow a 30:70 rule, where roughly 30% of my video is audio-reactive and the remaining 70% is rendered with Kling. This helps to make the climax of the song more visually stimulating.
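If you want to plan that split before you start rendering, here's a minimal Python sketch of the arithmetic. The function name and default share are my own placeholders and have nothing to do with neural frames' software. It simply assumes the Kling footage comes first and the audio-reactive section fills the end of the video.

```python
def split_timeline(total_seconds, reactive_share=0.30):
    """Return (kling_seconds, reactive_seconds) for a video of the given length."""
    if not 0.0 <= reactive_share <= 1.0:
        raise ValueError("reactive_share must be between 0 and 1")
    reactive = total_seconds * reactive_share
    return total_seconds - reactive, reactive


kling_seconds, reactive_seconds = split_timeline(60)
print(round(kling_seconds), round(reactive_seconds))
```

For a 60 second video, that works out to about 42 seconds of Kling footage followed by about 18 seconds of audio-reactive animation.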

Excessive frame-by-frame animation can start to look a bit homogeneous and even give off that recognizable "artificial intelligence" look that some people have an aversion to. Keep it tasteful and give it your own personal touch!

How I use neural frames as a social media content creator

Using AI music videos to boost engagement as a content creator on social media

I don't think of myself as a content creator, but our company has used neural frames for product demos on several occasions. We shared the FrogSynth video on Instagram and it racked up 70,900 impressions with 157 likes. That's not bad considering we have only ~1,500 active followers.

Like most musicians, I have visual ideas for my music that stock footage never seems to satisfy. Instead of being forced into selfie videos, TikTok dances and other social media antics, I've used AI music videos to help my audience keep a pulse on our company and see what we're up to.

Star in your AI music video with custom AI avatars

The videos I create are for instrumental music, but if you're someone who sings or wants to be more visually present, AI avatars can give your music video that personal touch. Your audience will usually feel more engaged when they can see you on screen alongside the visuals. There are a couple of ways to do this.

You can take a green screen approach and place yourself on top of a separate image, or if you're feeling more experimental, you can play with AI rotoscoping. I've personally tried both techniques and will explain briefly how to do it.

Demonstration of background removal and re-synthesis with AI generated image

For the green screen approach, start with a photo of you that's well lit and in a suitable posture. You can use the background removal feature from a tool like Canva to isolate your body. Place that transparency on top of a visual that's consistent with the rest of your music video's art style. If you don't have a background image in mind, it's easy to create them from text prompts using a service like Midjourney or ChatGPT.

Rotoscoping is a bit more complex. This technique involves creating a "painted" version of your original reference images, and then animating between them using a neural frames model. The video tutorial above explains how to do it, if you're interested.

You can also watch the rotoscoped AI music video I created with neural frames and Kling below to get a better sense of it:

Using neural frames with an AI music generator

The video above was created using a combination of live acoustic guitar performance, audio extension in

All of the popular AI music generator services (like Suno, Udio, and Riffusion) generate original, static images to go with your song. In my opinion the art tends to be a bit generic, but if you see something that inspires you, it's easy to right-click and save. Then import that image as your starting point for one of the neural frames models.

The "album art" created on these AI song platforms is square. If you're going to post on Instagram and want the videos to fit into that grid shape, you can select the 1:1 ratio while using the text prompt in neural frames to achieve that.

Choose 1:1 image ratio to create square AI music videos for social grids like Instagram

I prefer to use dedicated AI image generation services like Midjourney or DALL-E 3 instead, as they tend to produce higher quality images. You can prompt those services for square images as well. When you upload one of those square images to neural frames, the project dimensions will automatically be set at 1:1.

Once you've got your visual style figured out, the process is more or less the same. You'll download the song from your AI music generator and upload it to neural frames like you would with any other track. Then you're off to the races!

Examples of AI music videos from popular artists

The video above is from Die Antwoord, an avant-pop group who turned their song Age of Illusion into an AI-generated animation. They are one of several major artists, alongside Linkin Park, Disturbed, and Periphery, who have embraced the trend toward AI music videos over traditional animation.

In May 2024, a band called Washed Out released one of the first ever AI music videos created with Sora, but it was not audio-reactive. That trend continued into 2025, when controversial rapper Kanye West and Ty Dolla $ign released an AI-generated music video for their song 530.

If you'd like to try your hand at generating an AI music video, head over to neural frames and sign up for a free trial.