Game audio composers have flown into California this week to share notes and watch presentations at the Game Sound Conference. Speakers from all around the world are presenting their research on topics related to virtual reality, sound design, adaptive music, and more. One of this year's speakers caught our attention and inspired this article.
Elle Spencer Lewis is a composer and interactive media designer who developed a brain-computer interface (BCI) called Synapsody. The program uses an EEG device to generate real-time procedural music based on the operator's neural activity. She has proposed that artificial intelligence and BCI can be used together to lead us into a new era of adaptive audio. Think biofeedback for video games.
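For the technically curious, here's a rough sketch of how an EEG signal might drive musical parameters. Synapsody's actual design hasn't been published, so everything below — the alpha-band choice, the mapping to tempo and note density, and the function names — is an illustrative assumption, not a description of her system:

```python
import numpy as np

def alpha_power(eeg, sr):
    """Relative 8-12 Hz (alpha band) power of one EEG channel, via FFT."""
    spectrum = np.abs(np.fft.rfft(eeg)) ** 2
    freqs = np.fft.rfftfreq(len(eeg), 1 / sr)
    band = spectrum[(freqs >= 8) & (freqs <= 12)].sum()
    total = spectrum[freqs > 0].sum()
    return band / total if total > 0 else 0.0

def music_params(eeg, sr):
    """Map a one-second EEG window to (tempo, notes per bar)."""
    calm = alpha_power(eeg, sr)    # high alpha is associated with relaxation
    tempo = 140 - 60 * calm        # calmer operator -> slower music
    density = 8 - 6 * calm         # and sparser note placement
    return round(tempo), round(density)
```

A real BCI system would filter artifacts and smooth readings over time, but the core loop — analyze a window, update musical parameters, repeat — looks roughly like this.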
AI Brain-to-Music Composition Systems
In August 2023, we covered a research paper from UC Berkeley where artificial intelligence was combined with intracranial EEG recordings to reconstruct the music playing inside people's heads. We demonstrated a proof of concept for a future where machines could hear the songs in our dreams and play them back to us upon waking, using MusicGen to simulate how this might be achieved.
Then, in early October 2023, we published an article on the role of machine learning in emerging adaptive music middleware systems for video games.
We put a spotlight on Georgios Yannakakis and his research paper on affective game computing. Yannakakis has leveraged VR eye tracking and skin conductance to monitor player emotions and compose adaptive game audio that responds dynamically to their emotional state. Ultimately, the goal is to hit a sweet spot where the music never becomes boring or overstimulating.
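To make that sweet spot concrete, here's a toy feedback loop in Python. This is not Yannakakis' actual model — the thresholds, step size, and function name are invented for illustration — but it captures the idea of steering music intensity to keep a player's measured arousal inside a comfortable band:

```python
def adjust_intensity(intensity, arousal, low=0.3, high=0.7, step=0.1):
    """Return a new music intensity (0..1) given an arousal reading (0..1).

    Arousal below `low` suggests boredom, so tension is built up;
    arousal above `high` suggests overstimulation, so the music eases off.
    Inside the band, the current intensity is held steady.
    """
    if arousal < low:
        intensity = min(1.0, intensity + step)  # bored: raise the stakes
    elif arousal > high:
        intensity = max(0.0, intensity - step)  # overstimulated: back off
    return intensity
```

In an actual game, the arousal value would come from biosensors like skin conductance or eye tracking, sampled every few seconds, and "intensity" would fan out into tempo, layering, and instrumentation decisions.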
The work emerging from people like Elle Spencer Lewis, UC Berkeley, and Yannakakis is focused on brain-to-music systems. But most of what we've seen from AI music this year has been focused on text-to-music systems instead.
While we wait for the future to catch up with our imagination, I'll share a novel technique anyone can use to make video game music with artificial intelligence.
How to Make Video Game Music with AI Tools
Video game music has always inspired me, even though I've never personally had a career in it. Over the past six years, I've participated in several indie Global Game Jams, collaborating with local game developers to build experiences in as little as 72 hours.
The most important thing I've learned about composing for video games is that you're never writing music for yourself. You're in service to the collective imagination of the group, and you owe it to the hardworking developers to make music that brings the game to life.
Before we get into techniques for writing game music with AI tools, I want to suggest that we first create an original character with some level designs using the amazing new text-to-image capabilities of Dalle-3. In my experience, writing instrumental themes and leitmotifs around a visual idea is the best starting point.
The AI Video Game Music Stack: What to expect from this tutorial
I'm going to show you how to make video game music using an AI text-prompt stack that generates images, MIDI melodies, audio, and video:
Dalle-3 Text-to-Image: We'll use our imaginations to construct a character concept and generate AI mood boards to inspire our musical choices.
AudioCipher text-to-MIDI: We'll turn our character concept into a melody with AudioCipher, edit the MIDI, and bounce it as an audio file.
SoundGen melody-to-song: We'll run the melody file through an AI music transformer that brings our melody to life based on text prompts.
Neural frames text-to-video: For fun, we'll animate the character images in sync with the music.
Samplab audio-to-MIDI: Once we're ready to commit to a song idea, we'll return to the DAW and use Samplab to convert the SoundGen track into MIDI. Then we'll add our own sound design to create an original score.
Step 1: Choose a character and world to write music for
The first and most important step in this process is figuring out who or what you're making music for. I recommend choosing a world or character since these are the most common subjects for musical themes in a game.
World music tends to have more of a "background" function, while character motifs are more prominent and have a distinct personality. To make this less abstract, I'm going to take you through my own creative process.
Starting with a character concept: What do they look like?
The music for this example will be built around a character and later, an environment. I decided to go with a boss in an RPG game.
The character could be based on anything. I went with Greek mythology and found a figure called Abraxas, associated with the creation of the universe, which seemed like a good "boss" personality type. Abraxas also has the otherworldly design features I would look for in a video game character.
Abraxas has the head of a rooster, the torso and arms of a man, and snakes for legs. It holds a whip and a shield, which suggest combat and align with the boss theme.
I didn't want the character to be a direct reference to Greek mythology, so I used a play on words and turned Abraxas into Zebraxas. It seemed fitting that it would have a zebra head instead of a rooster.
Using Dalle-3 to create mood boards for your video game character
Once we've formulated a character concept, we can head over to Dalle-3, OpenAI's new image generator inside ChatGPT, and enter a prompt to begin visualizing it:
"Create a technical engineer's drawing of the following figure: a zebra headed deity sitting cross legged with snakes for legs, angel wings, the torso of a man, and holding a sword and a whip."
Next, I wanted to imagine some environments where the gamers would fight Zebraxas. This helps me place the mood and atmosphere for the video game music. I used the following prompt:
"Wide technical blueprint of a video game boss battle stage. A split screen design: The left side offers a ground-level perspective of the hero, ready for combat, confronting the zebra deity with its snake legs and angel wings. The right side provides an aerial view, revealing the layout of the battleground, filled with traps and obstacles, with the deity and hero at its center."
These are nice art prints, but it's time to move into something more closely resembling an actual video game. It needs color and a video game interface design. Here's the prompt I used:
"3D polygonal render reminiscent of the early PlayStation era, integrating the zebra-headed deity into a classic video game scene. On the left, three pixelated characters stand ready: a muscular man with a gun arm, a sword-wielding figure, and a long-haired magician or caster. On the right, instead of the dragon-like creature, the zebra-headed deity with its human torso, four arms, snake legs, and angel wings is poised for battle. The game interface below shows options like 'Attack,' 'Magic,' 'Summon,' and 'Item' in a blue menu box. Above it, HP and MP bars display numerical values for each character's points. The backdrop is a dim, cavernous environment."
I enjoyed seeing the Zebraxas character in a classic RPG video game context, but the environment was too bleak and the boss didn't look strong or big enough. I changed the scenery to a forest temple, without the game interface components.
Now that I'm beginning to imagine the character in a game environment, it's almost time to start making music. But first I want a final, full-color mood board that I can hand off to game developers in the future. Here's the prompt I used:
"Mood board drawing inspiration from early 2000s RPG aesthetics, showcasing various interpretations of the zebra deity. The board is adorned with conceptual art, polygonal 3D models, and in-game sprites of the deity. Detailed sections zoom in on the zebra face, snake legs, and angelic wings. Dynamic poses illustrate the deity in action, using both the sword and whip. Supplementary elements include color samples, texture swatches, and design annotations to guide the game's creative direction."
This concludes the mood board phase of the process and we can move on to generating music for the character using AudioCipher and SoundGen.
Step 2: Using AudioCipher to create the musical theme
Once we've developed a character and some level design, it's time to start generating the melody for our video game music.
I prefer to start with a MIDI melody instead of chord progressions or instrumental arrangements. My philosophy stems from the concept of leitmotifs, which we've written about on the blog previously.
If writing a melody for the character comes naturally, a MIDI keyboard might be all you need. To make the composing process a bit more fun (and dare I say gamified), I used the AudioCipher text-to-MIDI VST instead.
You can load the AudioCipher VST directly in your DAW, enter the name of your character or world, and drag the generated MIDI melody onto a track. AudioCipher also offers a chord progression generator, so feel free to start with whichever approach makes sense to you.
Since Zebraxas is a boss, I switched from AudioCipher's default major key to a chromatic scale. This gave me a better chance of hitting notes outside the ordinary major key. Several other scales can convey a sinister tone, like Locrian and Phrygian, so I encourage you to experiment and see what works. Every mode has its own feeling.
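AudioCipher's exact letter-to-note mapping isn't public, but the general idea of a musical cipher is easy to sketch. The mapping below is my own illustrative assumption — it simply wraps each letter of a name onto the twelve chromatic pitches starting from middle C (MIDI note 60):

```python
# Toy musical cipher (not AudioCipher's actual algorithm): wrap each
# letter of a name onto a 12-note chromatic scale from middle C.

def name_to_midi(name, root=60):
    """Map each letter to a MIDI note number; non-letters are skipped."""
    notes = []
    for ch in name.lower():
        if ch.isalpha():
            notes.append(root + (ord(ch) - ord("a")) % 12)
    return notes

melody = name_to_midi("Zebraxas")
```

From here you'd assign the note numbers rhythmic values and drop them onto a piano roll — which is exactly the kind of raw material I then edit by hand.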
After some tweaks to the rhythm and changing the octave for two MIDI notes, I applied a virtual instrument with Omnisphere and bounced the track as a WAV file.
Step 3: Using SoundGen AI melody-to-song with text
Once you've got a melody, you can start writing chords and designing the sound that best suits your character. However, if you want to use artificial intelligence to create an initial sketch, I recommend SoundGen AI.
SoundGen is an AI text-to-music generator by the founder of Audio Design Desk. Several other AI music tools offer text prompting, including MusicLM, Stable Audio, Riffusion, Splash Music, and Chirp by Suno.
The feature that makes SoundGen unique from these other apps is the ability to upload an audio file and transform it with text prompts. Here's an example:
Instead of simply describing a sound, I was able to upload the Zebraxas boss melody and experiment with different text prompts. I described the sound I was looking for. Then I clicked the upload button (highlighted in the interface screenshot above) to open Melody Mode (shown below):
I experimented with dozens of text prompts until I found a style that seemed like a good fit for my character. As a composer, this is something I would normally do within the DAW, but I found SoundGen to be more exploratory and less mentally taxing than trying to create a whole arrangement from scratch.
Some of the tracks I generated were eerie but not aggressive enough. Others had the boss energy but lacked a certain video game music aesthetic I was looking for. Eventually I settled on a glitchy chiptune boss theme that matched the retro game look I had set up in my mood board.
Next, I'll show you how I brought the imagery and sound together in a more dynamic way with the help of Neural Frames.
Step 4: Using Neural Frames to animate video game music
This powerful text-to-video service lets you upload an audio file, use stem separation to isolate any instrument, and modulate the imagery based on audio transients. If that sounds complicated, let me put it another way. When one of the instruments in your mix makes a loud sound, the video changes more dramatically.
The screenshot above gives you a clear look at the editor. On the left half of the screen you'll see the text prompt I used, along with a series of negative prompts to omit things that I didn't want. I selected the balanced flicker style with no movement. On the right we have a preview of the video. My timeline marker is at the first frame. You might recognize the starting image from earlier in this article.
On the editor's timeline, the second row features the series of transients I mentioned, located about halfway through the track. This is the snare sound from the video game music that SoundGen produced. I used the stem separator to isolate it and created an automation that modulates the imagery based on the snare.
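Neural Frames doesn't publish its internals, so here's only a rough numpy sketch of the general transient-to-modulation idea. The frame rate, envelope method, and strength value below are all illustrative assumptions: compute an amplitude envelope per video frame, keep only the sudden rises (transients), and scale them into per-frame "how much should the image change" amounts.

```python
import numpy as np

def transient_modulation(samples, sr, fps=12, strength=0.35):
    """Per-video-frame modulation amounts driven by amplitude jumps
    in an isolated stem (e.g. the snare track)."""
    hop = sr // fps  # audio samples per video frame
    frames = len(samples) // hop
    env = np.array([np.abs(samples[i * hop:(i + 1) * hop]).mean()
                    for i in range(frames)])
    onset = np.maximum(np.diff(env, prepend=env[0]), 0.0)  # rises only
    if onset.max() > 0:
        onset /= onset.max()          # normalize loudest hit to 1.0
    return strength * onset           # 0 = hold image, strength = big change
```

Feed it the isolated snare stem and each snare hit produces a spike, which is exactly the behavior you see in the editor: the video mutates hardest on the loud hits and holds still in between.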
You can watch the AI video game music video that I created with Neural Frames:
Step 5: Refine the music and create a polished track
Now that you've got all the raw inspiration from your audio and visual mood boards, the final step is to compose some real music. You know, like human music from your own brain.
I'm of the opinion that AI should be used as a creative collaborator and source of inspiration, rather than replacing humans in the composing process.
Return to the DAW and use your SoundGen audio as a reference track. Then use MIDI to reconstruct it and choose better virtual instruments. If you want to save time recreating the SoundGen song, try Samplab's audio-to-MIDI VST.
Samplab will perform stem separation on the SoundGen track and let you isolate individual instruments in the mix. You can pull the snare out, for example, and convert all of the tonal content into MIDI notes on a piano roll. Check out the chord view for the names and Roman numerals of the progression.
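Samplab's chord view does the naming for you, but if you're curious how a chord gets a name from raw MIDI notes, here's a toy namer that recognizes plain major and minor triads. Samplab's real analysis is far more sophisticated; this is just the core idea of matching pitch-class intervals against known shapes:

```python
# Toy chord namer: reduce MIDI notes to pitch classes, then test each
# candidate root for a major (0,4,7) or minor (0,3,7) interval pattern.

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def name_triad(midi_notes):
    pcs = sorted({n % 12 for n in midi_notes})
    for root in pcs:
        rel = {(p - root) % 12 for p in pcs}
        if rel == {0, 4, 7}:
            return NOTE_NAMES[root] + " major"
        if rel == {0, 3, 7}:
            return NOTE_NAMES[root] + " minor"
    return "unknown"
```

For example, MIDI notes 60, 64, and 67 (C, E, G) resolve to "C major" regardless of octave or inversion.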
Once the audio has been converted, simply drag that MIDI from Samplab to an empty MIDI track in your DAW. At that point you can start tweaking the MIDI, experimenting with new virtual instruments, and writing new parts around what you already have. Consider pulling in additional sounds from your sample manager to spark new ideas.
That's as far as I'm going to take this tutorial. The rest is up to you as a musician. I hope you've found this workflow inspiring and entertaining. It's exciting to think about all of the new possibilities that lie before us with artificial intelligence.
We look forward to the day when brain-computer interfaces can take the ideas straight out of our head and turn them into beautiful, finished pieces. But for now, this modular approach feels like a step in the right direction.