Adaptive music goes back more than forty years, to classic arcade titles like Frogger and Space Invaders. These early game soundtracks seem crude by today's standards, but they paved the way for a whole new approach to audio.
Game music is poised for another round of improvements in the near future, as machine learning and generative algorithms become a core part of how games are composed and delivered. Twitch streamers and their audiences will benefit from these advancements as well.
In this article we'll define adaptive music and dynamic game audio, outlining the evolution of their underlying technology. I'll highlight three companies that have recently leveraged machine learning to improve on existing adaptive systems.
We'll also touch on AI-enhanced biofeedback systems developed in academic research circles, which measure and adapt to gamers' emotions based on data from physical sensors and in-game behavior. Biometrics are going to close the gap between gameplay, music, and our emotional state.
What is the role of adaptive music in video games?
Music is considered adaptive when it responds dynamically to player interactions. For example, when Mario picks up a super star and becomes invincible, the game's soundtrack speeds up. Players understand the need to hurry up before the star-power runs out. Mario's super star music was such a hit that it has become a core part of the game's sonic branding.
These kinds of adaptations elevate the soundtrack from something that plays in the background to something interactive, functional, and symbolic. When a soundtrack adapts to game mechanics, it makes the whole experience more immersive. Of course, the more immersed we become in a game, the longer we play it and the more attached we become to it.
Differences in vertical and horizontal adaptive music
It's important to understand the distinction between vertical and horizontal approaches to audio in adaptive music. The video below offers an overview of how they differ:
Vertical adaptation refers to composing and layering multiple tracks, each with their own unique musical elements, and then mixing them in real time based on game states. For example, a game might introduce more intense layers during a combat sequence or strip back layers of audio during quiet exploration moments. The vertical technique allows for a continuously evolving musical texture that reflects the gameplay dynamics.
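Vertical layering can be sketched in a few lines of Python. The layer names, game states, and gain values here are invented for illustration; a real engine would run the crossfade on every audio tick:

```python
# Vertical adaptation sketch: every layer plays continuously in sync;
# only the per-layer gain changes with the game state.
LAYER_GAINS = {
    "exploration": {"pads": 1.0, "percussion": 0.2, "brass": 0.0},
    "combat":      {"pads": 0.6, "percussion": 1.0, "brass": 1.0},
}

def target_gains(state: str) -> dict:
    """Return the gain each stem should fade toward for this game state."""
    return LAYER_GAINS[state]

def crossfade(current: dict, target: dict, step: float = 0.1) -> dict:
    """Move each layer's gain a small step toward its target (one audio tick)."""
    return {
        layer: round(current[layer] + step * (target[layer] - current[layer]), 3)
        for layer in current
    }

# One tick of fading from exploration into combat:
gains = {"pads": 1.0, "percussion": 0.2, "brass": 0.0}
gains = crossfade(gains, target_gains("combat"))
```

Because every layer stays in sync, the texture thickens or thins without any audible restart, which is the core appeal of the vertical approach.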
Horizontal adaptation, on the other hand, involves transitioning between different pieces or sections of music based on game events. The iMUSE examples we'll cover later in this article fall into the horizontal category. Games shift from a high-energy battle theme to a calm or triumphant track to signal an important change in the game. This technique requires well-designed transition points to ensure smooth musical shifts.
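The horizontal approach can likewise be sketched as a small state machine. The cue names and transition beats below are hypothetical, but they capture the idea that a cue change is only honored at pre-authored points:

```python
# Horizontal adaptation sketch: the engine swaps whole cues rather than
# mixing layers, but only at authored transition points so the music
# never cuts off mid-phrase.
CUES = {
    "battle":  {"next_allowed": {"victory", "defeat"}, "transition_beats": {8, 16}},
    "victory": {"next_allowed": {"explore"},           "transition_beats": {4}},
    "explore": {"next_allowed": {"battle"},            "transition_beats": {4, 8}},
}

def can_transition(current: str, requested: str, beat: int) -> bool:
    """A cue change is honored only at an authored transition beat."""
    cue = CUES[current]
    return requested in cue["next_allowed"] and beat in cue["transition_beats"]
```

In practice the request would be queued until the next valid beat arrives, rather than rejected outright.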
Early software innovations in game music
The 1978 game Space Invaders offered one of the first and most compelling examples of adaptive music. A simple four-note bass motif played in the background, accompanied by sparse sound effects. As the enemies approached, the bass line's tempo sped up gradually until the fast-paced melody suggested a serious threat.
Despite the simplicity of this audio dynamic, changes to musical tempo have remained a central device in adaptive music to this day.
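That tempo device can be sketched in Python with invented values. The original hardware simply shortened the delay between bass notes as invaders died; a linear interpolation stands in for that behavior here:

```python
# Space Invaders-style tempo curve: the four-note bass loop speeds up
# as the number of surviving invaders shrinks. The BPM endpoints are
# illustrative, not taken from the original ROM.
def bass_tempo_bpm(invaders_left: int, total: int = 55,
                   slow: float = 60.0, fast: float = 240.0) -> float:
    """Interpolate the loop tempo from slow (full wave) to fast (last invader)."""
    remaining = max(invaders_left, 1) / total
    return slow + (fast - slow) * (1.0 - remaining)
```

The mapping is monotonic: fewer invaders always means a faster pulse, which is exactly the pressure signal players learned to read.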
Just a few years after the release of Space Invaders, Dave Smith invented the MIDI format and revolutionized digital music creation. The new format allowed for the programmatic control of musical elements. MIDI went public at NAMM in 1983 and, within another five years, had found its way into game music.
Procedural music generation in early games
Generative music games have existed since at least the mid-1980s. Special rules are programmed into the game audio engine, dictating the kind of music that it outputs. These rules are based on algorithms that may or may not be related to gameplay itself. Procedural music isn't always used for adaptive purposes.
The 1983 Commodore 64 game Moondust is considered the first example of generative music in an adaptive context. As players spread "moonjuice" across the game's palette, the ambient musical score transforms and adapts accordingly.
Another popular title from 1985, Ballblazer, used a procedural algorithm to generate its theme Song of the Grid. It didn't adapt to player activity though. The lead melody was based on a set of 32 eight-note melodic riffs that were assembled randomly. Several other parameters influenced the music, including tempo, volume, and decisions to omit notes for a rhythmic break.
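The riff-assembly idea behind Ballblazer can be sketched as follows. The riff pool is a stand-in, not the actual Song of the Grid material, and the rest probability is an invented parameter standing in for the game's note-omission logic:

```python
import random

# Ballblazer-style lead generation: stitch a melody together from a
# pool of short pre-composed riffs, occasionally dropping notes to
# create a rhythmic break.
RIFF_POOL = [
    [60, 62, 64, 65, 67, 65, 64, 62],   # MIDI note numbers
    [67, 69, 71, 72, 71, 69, 67, 65],
    [64, 64, 67, 67, 69, 71, 72, 74],
]

def generate_lead(num_riffs: int, rest_chance: float = 0.1, seed: int = 0) -> list:
    """Concatenate randomly chosen riffs, omitting some notes as rests (None)."""
    rng = random.Random(seed)
    melody = []
    for _ in range(num_riffs):
        for note in rng.choice(RIFF_POOL):
            melody.append(None if rng.random() < rest_chance else note)
    return melody
```

Even with only three source riffs, the random ordering and rests yield a melody that rarely repeats exactly, which is the essence of this early procedural technique.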
Csound and Pure Data were two of the most influential software environments for electronic music and sound design in games. Csound lets users write text-based scripts, known as orchestra and score files, to define audio synthesis processes and control data.
Pure Data, on the other hand, uses a graphical programming environment, where users build audio processing chains using a node-based interface. This visual approach is closer to Max for Live and tends to be more intuitive for individuals new to audio programming.
These legacy frameworks have been improved upon over several decades, culminating in DSP systems used by today's biggest game developers.
King's Quest IV & Space Quest III: The Roland MT-32
The 1988 game King's Quest IV supported the Roland MT-32 module, offering a higher quality of music playback than the PC speaker or AdLib sound cards. Players could now experience a rich, orchestral sound thanks to the higher-quality MIDI synthesis these devices provided. The score was written by film and television composer William Goldstein.
King's Quest IV wasn't adaptive in the modern sense, but different MIDI tracks were triggered based on the player's location within the game. This created a more immersive and emotionally resonant experience where the music could reflect the mood or theme of the current gameplay scenario.
Several other games from this period explored MIDI compositions powered by the MT-32. Space Quest III is a notable example, winning the "Excellence in Musical Achievement" award from Computer Gaming World in 1989. In contrast to King's Quest IV's orchestral arrangements, Space Quest III explored a collection of more modern, sci-fi instrument aesthetics.
LucasArts iMuse: Interactive music streaming engine
A big leap in MIDI script technology took place during the early 1990s, with the implementation of LucasArts' iMUSE adaptive music system. iMUSE allowed for complex musical interactivity in games and made transitions feel natural by keeping them in tempo, regardless of player actions.
Monkey Island 2: LeChuck's Revenge (1991)
The video above showcases how the iMUSE adaptive music system was implemented by the 1991 title Monkey Island 2.
MI2 players would hear a new variation on the soundtrack as they walked from one location to the next. To accomplish this effect, themes were written in the same tempo and employed different melodies or arrangements to indicate the change in environment.
The magic of iMUSE was its ability to wait until the beginning of the next measure to start the new MIDI track. This led to a sense of seamless continuity, in contrast to the abrupt stop-and-start of prior game soundtracks.
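That measure-boundary trick reduces to a small scheduling calculation, sketched here under the simplifying assumptions of a fixed tempo and a fixed 4/4 meter (real iMUSE scripts handled far more state than this):

```python
import math

def next_measure_start(pos_beats: float, beats_per_measure: int = 4) -> float:
    """Beat index of the next measure boundary after pos_beats
    (or pos_beats itself if it already sits on a downbeat)."""
    if pos_beats % beats_per_measure == 0:
        return pos_beats
    return math.ceil(pos_beats / beats_per_measure) * beats_per_measure

def schedule_transition(pos_beats: float, tempo_bpm: float) -> float:
    """Seconds to wait before starting the new MIDI track on a downbeat."""
    wait_beats = next_measure_start(pos_beats) - pos_beats
    return wait_beats * 60.0 / tempo_bpm
```

Deferring the switch by at most one measure is a tiny delay for the player, but it guarantees the incoming theme lands on a downbeat rather than mid-bar.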
Indiana Jones and the Fate of Atlantis (1992)
Indiana Jones and the Fate of Atlantis was another LucasArts game celebrated for its skillful use of iMUSE transitions and sound effects. The MIDI compositions were complex and high quality, aligning with expectations set by the original John Williams film score. As the video above shows, the music changed constantly to adapt to the character's environment and to events in the game narrative.
TIE Fighter (1994) - Transitions from battle music to success themes
TIE Fighter (1994) is considered one of the most advanced demonstrations of adaptive music by iMUSE. Take a look at this video and notice how the soundtrack transitions seamlessly to a triumphant theme after the player defeats the enemy. The unpredictable timing of the victory and complexity of the score required special preparation.
In the interest of brevity, I'll be moving on to the next phases of adaptive game audio history. Readers who want to explore iMUSE further can see the walkthrough videos here: Day of the Tentacle (1993), Sam & Max Hit the Road (1993), The Dig (1995), and Full Throttle (1995).
FMOD and WWISE: Real-time modulation and DSP
Real-time modulation and Digital Signal Processing (DSP) started influencing video game music in the mid to late 1990s, particularly with the release of FMOD in 1995. DSP allowed for real-time audio effects and processing, while real-time modulation enabled dynamic control over musical parameters.
FMOD was followed a decade later by Audiokinetic's Wwise in 2006, and these two systems have since become the most popular audio middleware solutions in the game industry. They handle the management of audio resources, streaming, and playback, making it easier for developers to work with complex audio tasks.
Unreal 5: Creating adaptive music with MetaSounds
Today's indie game developers tend to create their games in Unity and Unreal Engine, since they lack the resources to build proprietary engines like Crytek did with CryEngine. To meet the growing demand for rich, adaptive game music, Unreal rolled out a high-level, node-based system for DSP called MetaSounds.
MetaSounds allows for the manipulation and generation of audio in a real-time, sample-accurate, and modular way. It offers a significant level of control over audio processing, much like a fully featured digital audio workstation (DAW) integrated within Unreal Engine, enabling the creation of complex, dynamic, and interactive audio systems for games and other interactive media.
Unity: Generative music tools in the audio asset store
Unity's audio asset store features a broad collection of procedural music generation resources. They are developed and sold by third-party companies, which leads to a healthy, competitive software landscape. We'll share a few examples here for your reference.
The Ambient Sounds system provides an easy solution for building interactive soundscapes. Developers can organize their tracks and effects into sequences, using the event and modifier systems to control which sounds are played without writing any code. It includes a library of professionally composed and produced music and SFX, reducing some of the pressure to source high-quality, royalty-free audio.
Several generalist tools are available, including the Dynamic Music System, Generative Music Lab and Procedural Music Generator. Some developers have focused on solving the need for specific genres of adaptive music. Check out Adaptive Fantasy Music for orchestral boss themes, Scary Alley for adaptive horror themes, and Authentic Medieval music. There are dozens of resources like this.
AI and the next generation of adaptive music
Now that we've covered some of the adaptive music fundamentals, let's move on to present day examples that leverage AI and machine learning.
Infinite Album delivers adaptive music with Overwolf
Infinite Album is an adaptive music system that allows users to customize their game experience by selecting the "Vibe" that they want to hear. These Vibes are infinite songs that users can customize and save into a personal cloud library. The Vibes are currently based on "style" and "emotion" parameters.
The demo below outlines some of the application's most impressive features.
In-house instruments, proprietary & open source AI models, and licensing
All of the music generated by Infinite Album is created from instruments hand-made in house, generated with AI from licensed musical audio, or licensed directly from commercial instrument libraries.
Infinite Album uses its own proprietary AI models, along with open-source music models. Both are trained on public domain material, general music theory, and the artistry of their AI music engineers, who are all musicians. The team has also started signing deals with recording artists to create infinite amounts of music in their styles, training models on the artists' instruments, compositions, and playing styles.
Streaming Infinite Album through Overwolf and OBS
Infinite Album hooks into Overwolf, bypassing the need to partner directly with game developers. Overwolf listens for in-game events (kills, deaths, victories, etc.) and exposes them to Infinite Album, allowing the system to generate AI music in real time, as a response to the player's immediate experience.
Streamers can also connect Infinite Album to programs like OBS Studio, routing their gameplay, music, and SFX into streaming platforms like Twitch and YouTube.
Creating Vibes from style and emotion parameters
When users create a style for their Vibe, they also configure an emotional state. Behind the scenes, these emotions are guided by a concept from affective computing called the valence-arousal model. On the edge of the circle you'll find key-point emotions, moving clockwise from "tired" and "sad" to "angry" and "tense", then "excited" and "happy", and finally "tender" and "calm".
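A minimal sketch of how key-point emotions can sit on that circle, with valence (pleasantness) on the x-axis and arousal (energy) on the y-axis. The coordinates are illustrative placements, not Infinite Album's actual values:

```python
import math

# Toy valence-arousal circumplex: each key-point emotion gets a
# (valence, arousal) coordinate on the unit circle's edge.
EMOTIONS = {
    "happy":   (0.8,  0.5),
    "excited": (0.5,  0.8),
    "tense":   (-0.5, 0.8),
    "angry":   (-0.8, 0.5),
    "sad":     (-0.8, -0.5),
    "tired":   (-0.5, -0.8),
    "calm":    (0.5,  -0.8),
    "tender":  (0.8,  -0.5),
}

def nearest_emotion(valence: float, arousal: float) -> str:
    """Label an arbitrary (valence, arousal) reading with the closest key-point."""
    return min(EMOTIONS, key=lambda e: math.dist(EMOTIONS[e], (valence, arousal)))
```

The practical payoff is that any continuous reading, whether from user settings or game events, can be snapped to a named emotion that drives the music.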
If the user is playing a supported game, they can choose a game mapping. This determines how the music reacts to game events in real time. Infinite Album provides default mappings, like “Kill or Be Killed,” which makes the music do things like get sad when the player’s character dies. Multiple mappings can be active at once. Simply toggle the mapping on or off as shown below:
Advanced configuration settings allow users to map game events to changes in style and emotion, including the ability to control how long the change lasts. In the example shown below, the music changes to an Epic Rock style for 20 seconds while the player is dominating in League of Legends. You can see how the parameters are set up for that purpose:
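Conceptually, such a mapping is just a table of rules. The event and style names below are illustrative, not Infinite Album's actual API:

```python
# Event-to-music mapping sketch: each rule names a game event, the
# style/emotion override to apply, and how long it lasts before the
# music reverts to the user's base Vibe.
MAPPING = [
    {"event": "player_dominating", "style": "Epic Rock", "emotion": "excited", "duration_s": 20},
    {"event": "player_died",       "style": None,        "emotion": "sad",     "duration_s": 10},
]

def apply_event(event, now):
    """Return the active override (with its expiry time), or None if unmapped."""
    for rule in MAPPING:
        if rule["event"] == event:
            return {"style": rule["style"], "emotion": rule["emotion"],
                    "expires_at": now + rule["duration_s"]}
    return None
```

Storing an expiry time rather than a timer makes it trivial for the music engine to check, on every update, whether the override is still in force.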
The generative music system doesn't consume much RAM or CPU, an important differentiator from generative audio systems that are extremely memory intensive. It means gaming rigs won't suffer from lag while running Infinite Album.
Giving the audience control with Twitch Bits & Channel Points
Twitch streamers can choose how their audience interacts with musical styles, emotions, instruments, and sound effects on demand. The interactions take effect immediately on the music being played, and everyone on the stream will hear them.
Options in the settings interface include which musical interactions to make available to viewers, how long each interaction lasts, and how much it costs in either Twitch Bits (Twitch's virtual "currency") or Twitch Channel Points.
Reactional Music: Generative and adaptive middleware
Reactional Music has a big presence in the generative and adaptive music space. Where Infinite Album hooks into game events from the Overwolf system, Reactional's middleware is used directly by game composers, in partnership with their fellow programmers and creative directors.
As you'll see in the demo below, Reactional adapts to in-game events, but can also send commands that modify the events of the game itself. A mid-air leap might be slowed down for dramatic effect, to align the player's ground-impact with the first beat of the new measure.
Reactional scores can be programmed to transform under any circumstance. For example, if a player approaches a new space they might hear the tempo and music change. Stingers are tuned perfectly with the game's soundtrack. In the past, this required careful coordination between the composer and sound design team, but not anymore.
Earlier this year, Reactional announced a partnership with Hipgnosis and a second partnership with APM, two of the biggest production music sync licensing libraries in the world. Game studios are now able to create custom playlists and serve them to players à la carte, as in-game purchases.
Instead of simply dropping licensed music into a game by itself, Reactional uses machine learning to analyze any song. It can accurately detect properties like tempo and key signature. But what does it do with that information, you ask?
The game's composers create vertical layers of sound (stems in a DAW) along with MIDI tracks, which Reactional transforms into a proprietary format that makes every musical gesture relative instead of hard-coded. Reactional can then apply dynamic rules to adapt that music note by note, blending each instrument layer with game actions or synchronizing it with the licensed music selected by the player.
It is a generative music engine, but not a randomizer: it takes composed music and adapts it to fit game actions and/or commercial music chosen by the gamer, coordinating every dynamic change with events in the video game. Mind-blowing.
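One way to picture the matching step, assuming the tempo and key of the licensed track have already been detected by the analysis stage. The function and its arithmetic are a simplification for illustration, not Reactional's implementation:

```python
# Given a licensed track's detected tempo and key, compute how a
# composed stem must be time-stretched and transposed to blend with it.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def match_stem(stem_bpm: float, stem_key: str,
               track_bpm: float, track_key: str) -> dict:
    """Stretch ratio and semitone shift that align a stem with the track."""
    semitones = (NOTE_NAMES.index(track_key) - NOTE_NAMES.index(stem_key)) % 12
    if semitones > 6:        # prefer the shorter transposition direction
        semitones -= 12
    return {"stretch": track_bpm / stem_bpm, "transpose": semitones}
```

For example, a 120 BPM stem in C blending with a 90 BPM track in D would be stretched to 75% speed and shifted up two semitones.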
In August 2023, the leading games publisher Amanotes (100 million monthly active users) announced a global partnership with Reactional. Reactional seems positioned as the next major music delivery platform and personalization engine, deepening the connection between licensed music, generative music, and adaptive game audio.
Warpsound's text-to-music API for Twitch Streams
The last example we'll provide comes from WarpSound. Their generative music API was announced earlier this year and is slated to integrate with Twitch. It's adaptive mostly in the horizontal sense, blending one style of music into the next according to text prompt commands from the streamer. The use of GPT touches on the AI music theme of this article, so we had to include it.
ChatGPT is just one of several inputs that WarpSound supports. In the future we can imagine the system will be voice-actuated, delivering a hands-free method of changing a game's soundtrack.
Advanced adaptive music models powered by ML
Alongside these impressive business use cases, there are numerous independent research teams developing models without productization. A recent example came from IEEE senior member Georgios Yannakakis, in a paper titled Affective Game Computing. A diagram of the concept is shown below:
They describe the affective gaming loop in five main stages:
Elicitation: Players interact with the game and feel something emotionally.
Sensing: The game device uses physical sensors like eye-tracking (in VR headsets) and skin conductance to collect player data.
Annotation: In-game behavior can also be recorded and labeled according to its emotional implications. For example, running around haphazardly implies a frantic state, while more tactical play suggests composure.
Detection: The multimodal data collected from sensing and annotating behavior is used to infer the emotional state of the player.
Adaptation: The music adapts based on the emotional states detected by the model.
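One pass of that loop might be sketched like this, with invented sensor readings, thresholds, and weights standing in for a real trained model:

```python
# Toy affective gaming loop: sense, annotate, detect, adapt.
def affective_loop_step(skin_conductance: float, movement_entropy: float) -> dict:
    # Sensing + annotation: combine a physiological reading with a
    # label derived from in-game behavior.
    behavior = "frantic" if movement_entropy > 0.7 else "tactical"
    # Detection: a toy model inferring fear level from both channels.
    fear = min(1.0, 0.6 * skin_conductance + (0.4 if behavior == "frantic" else 0.1))
    # Adaptation: steer the score toward the "sweet spot" of fear.
    if fear > 0.8:
        music = "soften"      # back off before the player feels overwhelmed
    elif fear < 0.4:
        music = "intensify"   # build tension toward the thrill zone
    else:
        music = "hold"
    return {"behavior": behavior, "fear": round(fear, 2), "music": music}
```

The key design idea is the closed loop: the adaptation stage changes the stimulus, which changes the next round of sensor readings, so the system regulates the player's emotional state over time rather than reacting once.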
The screenshot below demonstrates this system applied to a game called Apex of Fear. As the dashboard indicates, there's a sweet spot of fear that developers are looking to target. This will maximize the player's enjoyment, since they're playing the game to feel a thrill (but don't want to feel nauseous).
Analytics systems like this affective game computing model could eventually become an advanced source of information for game audio middleware like Reactional and Infinite Album. Instead of relying on in-game events alone, audio teams could leverage biometrics and target ideal emotional states based on a particular scene in their game.
It's exciting to think about all of the possibilities here and there's no question that video games will continue to evolve rapidly in the coming decade.