By Ezra Sandzer-Bell

New AI Film Scoring Technique Uses MusicGen, GPT-4 and a DAW

AI music debates this year have largely focused on artist remuneration and the unauthorized use of celebrity voices. Songs like "Heart on My Sleeve" sent major labels scrambling to create legal frameworks that protect their bottom line.


But in the near future, film composers could find themselves in a similar position. As generative AI models achieve long-duration, high-fidelity music output, the once coveted skillset of film scoring could be devalued.


Unlike mainstream pop music artists, composers could struggle to get the same kind of legal representation. To remain competitive, they may need to explore generative tools and adopt them into their workflows where appropriate.


Academic research papers on generative film music have tended to focus on theory and private systems instead of publicly available products and techniques.


In April 2024, the "sound for video" DAW Audio Design Desk announced that it would integrate its AI music generation tool, SoundGen, directly into the editor. The web app already included a video embed feature, but this is the first clear sign of AI audio entering the film production software niche.


In this article, we'll provide a step-by-step tutorial anyone can use to begin scoring films with artificial intelligence. It's a full pipeline, from spotting cues and generating music to cleaning up low-fidelity audio in a DAW and arranging your score on a video editor's timeline.


Before we get into the tutorial, let's have a look at the Hollywood strikes and their impact on music sync licensing, Discovery's retraction of US royalties for film composers, and a prediction about the impact of commercial AI video generators on grassroots film scoring. Later in the article, I'll address lingering challenges in AI composition that our proposed technique does not resolve.





Hollywood actor & writer strikes impact musicians



Writers and actors in Hollywood have been on strike for months, demanding better wages and opposing the misuse of artificial intelligence in their industry. Unfortunately, these strikes might not end favorably.


In July 2023, Netflix attracted a wave of criticism for listing an AI product manager role with a salary of $300,000 to $900,000. Many of the guild's actors are reported to make less than $26,000 a year. The wage gap is insulting, but studios have reasoned that if they can generate quality AI content at scale, the return on investment will be significant.


As Pitchfork reported in August, music supervisors who provide audio for film are feeling the impact of these strikes. Because they are freelancers, their attempts to unionize have been denied. Most professional musicians depend on sync licensing to pay their bills, but opportunities for placements in TV and film are slowing down due to the decline in content.


“It wasn’t until the last couple of weeks where we’re starting to feel like, OK, they’re running out of things to license music for,” says Jen Pearce, founder and CEO of Low Profile.

Licensed songs fall into the "soundtrack" category of film music. The same song can be used across multiple placements, multiplying the artist's earning potential. Unfortunately, the same cannot be said for composers who write original scores for a film. They have fewer opportunities to earn a meaningful income from their work, despite being highly skilled in their craft.


Warner Discovery pulls royalties from composers



Film composers have not only been struggling with the decline in content, but are also fighting for music royalty payments ever since Discovery announced a shift to direct source licenses at the end of 2019. Musicians in Discovery's network no longer collect US royalties for future and past work — they only collect upfront fees and foreign royalties.


When long-format AI music generation takes hold, it will become easier than ever to generate film scores. This once coveted skill could be adopted by a younger cohort untethered to legacy workflows, free of university debt, and without the emotional grief of displacement. In other words, existing composers will need to stay flexible and adapt to the new creative climate to remain employed.


Indie AI filmmakers will need affordable film scores



Meanwhile, on the internet, generative video companies like Runway (makers of Gen-2) and Pika Labs have started providing services that create short video clips from text input. As clip durations shift into longer-form content, I can imagine experimental filmmakers publishing and monetizing original content through platforms like YouTube.


As a grassroots movement, indie AI filmmakers won't be interested in paying exorbitant fees for sync licensing. There will be a growing demand for engaging, high-quality film scores generated with artificial intelligence.


Legislation will clamp down on commercial AI music models trained on licensed music, enforcing remuneration and profit sharing. In response, we will see a corresponding rise in illegally trained models available on the dark web. Think Pirate Bay and BitTorrent, but for creation instead of consumption.


While all of this plays out, legal generative audio workstations and AI VSTs will also become more common. They will streamline traditional DAW workflows and help composers of all skill levels accelerate their work. Music producers will be able to generate and construct film scores at a more reasonable pace and cost.


Writing film scores with GPT-4, MusicGen and a DAW


Let's move on to a demonstration. In this section I'll share one possible AI film scoring workflow that could be practiced today. It requires no special programming knowledge and can be achieved with free, entry-level software.



Our demo video above highlights the first few steps of this process. The rest of the workflow will be intuitive to music producers familiar with working in a DAW.

  1. Open ChatGPT or another LLM host, like Perplexity

  2. Paste in the initial prompt that we'll provide later in this article. The prompt will explain to the LLM that you're looking for a text-to-music prompt to capture the mood of a scene.

  3. Paste in a script or written description of the scene. If it's a well known film and you don't have access to the script, the LLM may be able to summarize the scene for you. You can also type up a paragraph of text describing the scene, in lieu of a script.

  4. Copy the text-to-music prompt from your LLM and paste it into MusicGen. If you already have a melody or arrangement idea, you can upload it as an audio condition to guide the musical output.

  5. Save the generative music to your local device.

  6. Import the video and music to a video editor with sound design capabilities. I personally recommend Audio Design Desk, but in our demo I used iMovie to underscore the ease and accessibility of the method.

  7. Sync your music with the scene. You can use a foley library to add layers of sound effects to the scene.

  8. If the music doesn't feel like a good fit, return to MusicGen and try again. Once you find music that you like, import the generated music to your DAW.

  9. Use an audio-to-MIDI tool like Samplab 2. Apply stem separation and import each instrument onto separate MIDI tracks. Use virtual instruments that reflect the original audio and improve the sound quality.

  10. Clean up the transcribed MIDI for each track in your piano roll as needed.

  11. Some DAWs, like Logic Pro X, include a score view that will convert MIDI to sheet music. For live instrumentation, you have the option to print the sheet music and hand it off to a live ensemble for studio recordings.

  12. Export the final audio file back into the video editor. Swap it for the original scratch track generated from MusicGen.

To put this technique into practice, we need a general understanding of how film scoring works. So before we break down each step of the tutorial, let's take a quick detour to review the basics of composing for movie scenes.


Spotting sessions and creating cue sheets


At its core, film score "spotting" is the collaborative process between the film's director, composer, and sometimes the music editor, wherein decisions are made about where music will be placed in the film, what emotional impact it should have, and other considerations. It's an essential step in post-production that ensures music appropriately complements and enhances the narrative, pacing, and emotional arcs of the story.


A typical spotting session involves watching the film and determining the 'in' and 'out' points for each music cue. The 'in' point refers to when a piece of music starts, while the 'out' point denotes when it concludes.


During the session, the director and composer discuss the desired emotional tone for each scene, whether it should be underscored with music or left silent, and any other specific musical motifs or themes that may be pertinent. The length, style, and type of music required are hashed out, and the composer takes notes to guide the composition process.


After spotting, the composer has a roadmap of the film's musical needs. They'll retreat to compose the music, providing mock-ups or demos to the director for feedback before finalizing the score.
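For producers who like to keep their spotting notes in a structured format, here's a minimal sketch of a cue sheet as data. The field names and the example cue are hypothetical, purely for illustration:

```python
from dataclasses import dataclass

@dataclass
class Cue:
    """One music cue captured during a spotting session."""
    cue_id: str      # e.g. "1m1" = reel 1, first music cue
    in_point: str    # timecode where the music starts
    out_point: str   # timecode where the music ends
    mood: str        # emotional tone agreed on with the director
    notes: str = ""  # motifs, instrumentation, sync hits, etc.

# a hypothetical cue for a tense dialogue scene
cue_sheet = [
    Cue(cue_id="1m1",
        in_point="00:04:12:00",
        out_point="00:05:02:10",
        mood="tense, restrained, slowly building dread",
        notes="low strings only; hold back percussion until the cut to the hallway"),
]
```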


The core guidelines for creating film music



According to classic film composer Aaron Copland, film music should accomplish at least one of the following five functions:

  1. Creating atmosphere.

  2. Highlighting the psychological states of the characters.

  3. Providing a neutral background filler.

  4. Building a sense of continuity.

  5. Sustaining tension and then rounding it off with a sense of closure.

When we set out to create AI music for scenes in a movie or TV show, these goals should remain top of mind. Many books have been written about film scoring, but my personal favorite is Hollywood Harmony by Frank Lehman.


We'll return to Lehman's work at the end of this article, when we talk about leitmotifs and thematic transformation.


Practicing the full 12-step AI film scoring technique


Congrats, you've made it to the main tutorial. Let's dive into this technique in detail so you know exactly what to do.


Step 1: Choose an LLM (ChatGPT / Perplexity)


ChatGPT interface

The easiest way to get started is with a free tool like ChatGPT or the Llama LLM hosted on Perplexity. Once you've opened these in your browser, start a new chat.


Step 2: Prime your LLM for the music spotting task

Prompting an LLM to generate music cues for film

The LLM needs a simple primer so that it understands the task at hand. I'll provide an example here, and you can fine-tune it or write your own:



Read through the following film script. For each scene, analyze the emotion, pacing, setting, and significant events. Based on your analysis, provide a text-to-music prompt that encapsulates the mood and essence of the scene. This prompt should be descriptive enough to guide a text-to-music generator in producing music that matches the scene's atmosphere.
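If you'd rather script this step than paste prompts into a browser, here's a minimal sketch using the OpenAI Python client. The model name and the placeholder scene text are assumptions; swap in whatever you're actually working with:

```python
# minimal sketch, assuming the official OpenAI Python client (pip install openai)
# and an OPENAI_API_KEY set in your environment
from openai import OpenAI

client = OpenAI()

primer = (
    "Read through the following film script. For each scene, analyze the emotion, "
    "pacing, setting, and significant events. Based on your analysis, provide a "
    "text-to-music prompt that encapsulates the mood and essence of the scene."
)

scene = "..."  # paste your script excerpt or scene description here

response = client.chat.completions.create(
    model="gpt-4",  # or any chat model you have access to
    messages=[
        {"role": "system", "content": primer},
        {"role": "user", "content": scene},
    ],
)

print(response.choices[0].message.content)  # your text-to-music prompt
```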

Step 3: Paste in a script or description of a scene


To help make this walkthrough more concrete, we'll reference a scene from the 2006 Martin Scorsese film The Departed. I chose the clip below because it has no music and minimal sound effects. It's a tense and important turning point in the movie. In a following step, you'll hear it again with music.


Instead of hunting down the original script for The Departed, I asked GPT-4 to summarize the scene's events. We can copy that text and feed it into our primed LLM to retrieve the music cues we asked for.


A text-to-music prompt from ChatGPT

Here's the result of combining the primer with the script summary from above. If it looks good, go ahead and copy the text-to-music prompt to your clipboard:


Text-to-music prompt

Steps 4 & 5: Paste the text-to-music prompt into MusicGen


Navigate to MusicGen and paste in the music prompt for your scene. Extend the generated music duration to 30 seconds, hit the submit button, and wait a few minutes. Once the track is generated, hit the ellipsis on the playback widget and download the track. You can hit submit repeatedly to generate as much music as you need.

Paste the film music cue into MusicGen
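If you'd rather run the model locally than use the hosted demo, here's a rough sketch using Meta's open-source audiocraft library. The checkpoint name and prompt are placeholders, and generation is much faster on a GPU:

```python
# minimal sketch, assuming audiocraft is installed (pip install audiocraft)
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=30)  # seconds of audio to generate

prompt = "..."  # the text-to-music prompt from your LLM

wav = model.generate([prompt])  # returns a batch of waveforms, one per prompt

# save a WAV with loudness normalization, named scene_cue.wav
audio_write("scene_cue", wav[0].cpu(), model.sample_rate, strategy="loudness")
```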

Steps 6-8: Import the video and music to a video editor


Import your video into a video editor and begin laying down the MusicGen tracks, trimming them to fit into one another and match the feeling of the scene.



To convey the simplicity of this process, we used the most basic video editing tool, iMovie. If you're planning on adding layers of foley to your film, we recommend using a sound design DAW like Audio Design Desk.


Step 9: Convert audio-to-MIDI in a DAW with Samplab 2

Import the film score to a DAW

To improve the audio quality, drag your audio file into a DAW and use audio-to-MIDI transcription software like Samplab 2 for stem separation. You'll be able to pull apart the bass and lead melodies, accelerating the transcription process significantly. This is much easier than trying to transcribe by ear. Drag the MIDI file onto a MIDI track in your DAW as shown above.


Step 10: Clean up MIDI & apply software instruments

Cleaning up your MIDI files

Once you've separated each instrument section, open up your piano roll and edit the notes as needed. It may be helpful to assign your virtual instruments ahead of time. Samplab includes velocity in its transcription, so starting with your preferred instrument makes it easier to test articulations and find your ideal volume levels at the note level.


Step 11: Convert MIDI to sheet music for live musicians

Logic Pro X score view

If you're using a DAW with a sheet music view, like Logic Pro, then you can print the score and hand it off to live musicians. A studio recording of orchestral music always sounds better than a virtual one.


Step 12: Swap in high-fidelity music in video editor

Arrange your AI film music in a video editor

Return to your video editor and swap in the new music recording. As I mentioned before, I recommend Audio Design Desk if you plan on adding layers of foley and sound design. It provides 70,000+ studio-level effects, including risers, drones, and impacts. With hot swapping, you'll be able to iterate quickly.


That's all there is to it. This workflow isn't fully automated, but it does highlight a practical use case that any music producer can begin experimenting with today.


Imminence of computer vision & deep learning


The 12-step process above could be made even more efficient with computer vision and deep learning. Video content analysis has existed for years. Basic object and action recognition are common features, but they lack the rich descriptive capabilities required to describe film scenes.


Image captioning

Microsoft's Azure AI image captioning tool, shown above, can recognize objects and interactions like a man jumping on a skateboard. Deeper emotional nuance and context are missing, so we can't pass in movie scenes and generate text-to-music prompts at scale, yet.


4 challenges in generative AI film scoring


There are several remaining challenges that our current tutorial does not solve. We don't pretend to have solved AI film scoring, and these issues won't go away overnight. However, by identifying each of them, we can start coming up with solutions that help composers work more efficiently.


Problem 1: Limitations at the attention layer


Today's most advanced public models, like MusicLM and MusicGen, can generate up to thirty seconds of low-to-medium fidelity music before losing focus. Limitations at the attention layer have posed problems even for short-duration music like a three-minute song.


Meta's underlying AudioCraft Python library does have a generate_continuation method that extends beyond the initial 30-second clip. However, it's not currently able to retain a coherent understanding of the themes it has generated or iterate on them the way a skilled composer would.
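For the curious, here's a rough sketch of how that continuation call looks, assuming you've already saved a generated cue as scene_cue.wav; the description string is a placeholder:

```python
# minimal sketch using audiocraft's generate_continuation
import torchaudio
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=30)

clip, sr = torchaudio.load("scene_cue.wav")  # a previously generated cue
tail = clip[..., -5 * sr:]                   # condition on its last five seconds

# the new audio starts from the tail and continues in the prompted style,
# but nothing guarantees it remembers themes from earlier in the cue
longer = model.generate_continuation(
    tail,
    prompt_sample_rate=sr,
    descriptions=["same tense orchestral underscore, slowly building"],
)
```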


Problem 2: Transcribing arrangements from raw audio



Generative audio synthesis is still far from studio quality. AI music needs to be transcribed and recreated in a DAW so that audio engineers can give it their professional touch.


As we pointed out in the tutorial, audio-to-MIDI software like Samplab can apply stem separation and transcribe raw music to MIDI. Without a service like this, we would have to rely on ear training and reconstruct each instrument layer by hand.


Even with Samplab, users still need a degree of skill to operate a DAW, edit MIDI effectively, apply sound design, improve the mix, and so forth.


Problem 3: Leitmotifs and variation in film scores



The art of writing memorable themes is increasingly rare, even among film composers. With shrinking salaries and crazy deadlines, precious little time is left for composers to dream and imagine new themes.


AI software can spit out large quantities of music, but it has yet to demonstrate any real ability to write compelling leitmotifs for characters and environments. The kind of thematic development heard in classic films like Lord of the Rings, Harry Potter, and Star Wars requires careful harmonic preparation, with chord progressions that draw out the desired emotional state in each scene where the theme recurs.


Meta's MusicGen model does support the option to upload an audio condition and use text prompts to re-imagine it in new styles.


MusicGen's training set includes metadata about mood and instrumentation, which means that prompts can include notes about feeling and arrangement. These will help to guide the reharmonization, which is a step in the right direction.
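Here's a rough sketch of that melody-conditioned workflow with the open-source musicgen-melody checkpoint; the theme file and prompt are hypothetical:

```python
# minimal sketch of audio-conditioned generation with audiocraft
import torchaudio
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained("facebook/musicgen-melody")
model.set_generation_params(duration=30)

melody, sr = torchaudio.load("leitmotif_sketch.wav")  # your recorded theme idea

# re-imagine the uploaded melody in the style described by the text prompt
wav = model.generate_with_chroma(
    descriptions=["dark, brooding strings, slow and ominous"],
    melody_wavs=melody,
    melody_sample_rate=sr,
)
```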


Still, film composers will need to listen to and evaluate these arrangements to ensure they achieve the intended effect. In this way, human and AI enter into a collaborative relationship.


Problem 4: Hollywood Harmony, Neo-Riemannian Theory, and black boxes



As author Frank Lehman outlines in his book Hollywood Harmony, there's more to film scores than melodies and chord progressions. Some of the best movie composers use a technique of chromatic modulation, best understood through the lens of Neo-Riemannian theory, to evoke a higher order of semiotic musical meaning that the audience feels and recognizes subconsciously.


These meta-patterns are not tied to any particular key signature, scale, or tonal center, but to a series of stepwise transformations whose intermediate steps are deliberately omitted. The result is chromatic leaps between chords that the ear still understands, thanks to the underlying mathematical relationships.
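To make that a little more concrete, here's a toy Python sketch of the three basic Neo-Riemannian transformations (P, L, R) on triads. It's far simpler than anything in Lehman's or Cardinale's work, but it shows how chaining a couple of small transformations produces the chromatic-third chord relationships so common in film scores:

```python
# Toy sketch: Parallel (P), Leading-tone exchange (L), and Relative (R)
# transformations on triads, represented as (root_pitch_class, quality).
NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

def P(root, quality):  # C major <-> C minor
    return root, ("min" if quality == "maj" else "maj")

def L(root, quality):  # C major <-> E minor
    return ((root + 4) % 12, "min") if quality == "maj" else ((root + 8) % 12, "maj")

def R(root, quality):  # C major <-> A minor
    return ((root + 9) % 12, "min") if quality == "maj" else ((root + 3) % 12, "maj")

def name(root, quality):
    return f"{NOTE_NAMES[root]} {quality}"

# Chaining L then P takes C major to E major, a chromatic mediant
# leap with no shared key signature, yet the ear follows it easily.
chord = (0, "maj")  # C major
for transform in (L, P):
    chord = transform(*chord)
    print(name(*chord))  # prints "E min", then "E maj"
```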


Composers have used these special chord progressions for decades and they've become part of the film music lexicon, but can AI models be trained on them?


PhD researcher and composer Sara Cardinale has written extensively about this intersection in a paper titled Neo-Riemannian theory and generative film and video game music. Her approach calls for procedurally generated music with a system she calls GENRT.


These rule-based techniques differ from AI models like MusicGen, but are nevertheless interesting. I reached out to Cardinale to get her opinion and she kindly responded with the following:

"Explicitly incorporating domain knowledge like NRT into AI models is an important but open question. Ultimately, it comes down to imposing a stronger prior to the model, but the best way to implement that is not totally clear yet"


Cardinale's idea of imposing stronger priors on the AI model aligns with research from University of Illinois professor Lav Varshney, who published this paper in November 2022 demonstrating that AI models could be trained on music theory fundamentals by introducing partial orders on information elements to form an information lattice.


Resources for learning more about film scoring


For readers who enjoyed the tutorial and would like to learn more about the art of film scoring, I highly recommend the podcast Art of the Score. It's a casual and conversational look at classic films and the music that made them. You can also check out this award-winning documentary, Score, to get better acquainted with the industry.


Most film composers will study the craft in university, but the internet has a lot of resources for independent study. This article on Premium Beat provides a great overview of movie scores, what's involved in creating them, and names some of the big players in the industry.


This IMDb webpage lists its top 100 film composers, with photos and short biographies for each person. For a complete list of film composers, this Wikipedia page is going to be your best resource.


For ideas on how to start landing jobs as a film composer, check out this piece from MIDI Film Scoring.


You'll be advised to start by creating a reel and then building your network. It's a tough time to be getting into the field, for the reasons outlined earlier in this article. So if you're planning on using AI, these conventional paths might not be for you. Try exploring Discord servers and meeting other creatives through those communities.


If you enjoyed this article, check out our roundup of the best Cyberpunk film scores and video game OSTs.


