Can AI help us make humane and imaginative music?

Dom Aversano

There is a spectrum upon which AI music software exists. On one end are programs that create entire compositions, and on the other are programs that help people create music. In this post I will focus on the latter end of the spectrum and ask: can AI help us compose and produce music in humane and imaginative ways? I will explore this question through a few different AI music tools.

Tone Transfer / Google

For decades the dominance of keyboard interaction has constrained computer music. Keyboards elegantly arrange a large number of notes but limit the control of musical parameters beyond volume and duration. Furthermore, with the idiosyncratic arrangement of a keyboard’s notes, it is hard to work — or even think — outside of the 12-note chromatic scale. Even with the welcome addition of pitch modulation wheels and microtonal pressure-sensitive keyboards such as Roli’s fascinating Seaboard, keyboards still struggle to express the nuanced pitch and amplitude modulations quintessential to many musical cultures.

For this reason, Magenta’s Tone Transfer may represent a revolutionary change in computer music interaction. It allows you to take a sound or melody from one instrument and transform it into a completely different-sounding instrument while preserving the subtleties and nuances of the original performance. A cello melody can be transformed into a trumpet melody, the sound of birdsong into fluttering flute sounds, or a sung melody into any of a number of traditional concert instruments. It feels like the antidote to autotune: a tool that captures the nuance, subtlety, and humanity of the voice, while offering the potential to transform it into something quite different.

In practice, the technology falls short of its ambitions. I sang a melody into it and transformed it into a flute sound, and while my singing ability is unlikely to threaten the reputation of Ella Fitzgerald, the flute melody that emerged sounded like the flautist was drunk. However, given the pace at which machine learning is progressing, one can expect it to be much more sophisticated in the coming years; I essentially regard this technology as an early prototype.

Google has admirably made the code open source, and the musicians who helped train the machine learning models are prominently credited for their work. You can listen to audio snippets of the machine learning process and hear the instrument evolve in complexity after 1 hour, 3 hours, and 10 hours of training.
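Under the hood, Tone Transfer is built on Magenta’s DDSP (Differentiable Digital Signal Processing) library, and the central idea is strikingly simple: only the pitch and loudness contours of a performance are extracted, and these condition a synthesizer trained on the target instrument. Here is a minimal sketch of that feature-extraction step, using librosa rather than Magenta’s own pipeline (the file name and sample rate are my assumptions):

    import librosa

    # Load a short vocal recording (file name and 16 kHz rate are
    # illustrative, not Magenta's exact pipeline).
    audio, sr = librosa.load("sung_melody.wav", sr=16000, mono=True)

    # Pitch contour via the pYIN fundamental-frequency estimator.
    f0, voiced_flag, voiced_prob = librosa.pyin(
        audio,
        fmin=librosa.note_to_hz("C2"),
        fmax=librosa.note_to_hz("C6"),
        sr=sr,
    )

    # Frame-wise loudness, approximated here as RMS energy in decibels.
    loudness_db = librosa.amplitude_to_db(librosa.feature.rms(y=audio)[0])

    # In DDSP, these two curves condition a neural network driving a
    # harmonic-plus-noise synthesizer trained on the target instrument,
    # which is why the output still follows the original performance.
    print(f0.shape, loudness_db.shape)

Because everything except pitch and loudness is discarded, the model has to resynthesize the timbre from scratch, which is exactly what allows a sung phrase to emerge as a flute or trumpet line.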

It is not just Google developing this type of technology. Groups like Harmonai and Neutone are doing similar things, and any one of them stands to transform computer music interaction by anchoring us back to the most universal instrument: the human voice.

Mastering / LANDR

Although understanding how mastering works is relatively straightforward, understanding how a mastering engineer perceives music and uses their tools is far from simple, since there is as much art as there is science to the craft. Is this, then, a process that can be devolved to AI?

That is the assumption behind LANDR’s online mastering service, which allows you to upload a finished track for mastering. Once it is processed, you are given the option to choose from three style settings (Warm, Balanced, Open) and three levels of loudness (Low, Medium, High), with a master/original toggle to compare the changes made.
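LANDR does not disclose what those settings actually do, but a loudness level plausibly maps to something like gain toward a target playback level followed by peak limiting. A toy sketch of that idea in Python (the target values, file names, and tanh limiter are my own illustrative guesses, not LANDR’s algorithm):

    import numpy as np
    import soundfile as sf

    # Hypothetical mapping from a loudness setting to a target RMS level.
    LOUDNESS_TARGETS_DB = {"Low": -18.0, "Medium": -14.0, "High": -10.0}

    def master(audio, setting="Medium"):
        # Measure the track's overall RMS level in dBFS.
        rms_db = 20 * np.log10(np.sqrt(np.mean(audio ** 2)) + 1e-12)
        # Apply make-up gain toward the chosen target level.
        gain = 10 ** ((LOUDNESS_TARGETS_DB[setting] - rms_db) / 20)
        # Soft-limit peaks so the louder signal never exceeds full scale.
        return np.tanh(audio * gain)

    audio, sr = sf.read("my_track.wav")
    sf.write("my_track_mastered.wav", master(audio, "High"), sr)

A real mastering chain adds equalisation, multiband compression, and stereo processing on top of this, which is presumably where the claimed analysis of the track comes in.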

I uploaded a recent composition to test it. The result was an improvement on the unmastered track, but the limited options for modifying it gave the feeling of a one-size-fits-all approach, inadequate for those who intend to carefully shape their musical creations at every stage of production. However, this might not be an issue for lower-budget projects, or for people who simply want to polish a track quickly for release.

Wanting to understand the AI technology, I searched for more precise details, but while the company says that ‘AI isn’t just a buzzword for us’, I could only find a quote that does little to describe how the technology actually works:

Our legendary, patented mastering algorithm thoroughly analyzes tracks and customizes the processing to create results that sound incredible on any speaker.

While LANDR’s tool is useful for quick and cheap mastering, it feels constrained and artistically unrewarding if you want something more specific. The interface also feels like it limits the potential of the technology. Why not allow text prompts such as “cut the low-end rumble, brighten the high end, and apply some subtle vintage reverb and limiting”?

Fastverb / Focusrite

Unlike mastering, reverb is an effect rather than a general skill or profession, making it potentially simpler to devolve aspects of it to AI. Focusrite’s Fastverb uses AI to analyse your audio and prescribe settings based on what it hears, which you can then go on to tweak. The company is vague about how its AI technology works, simply stating:

FAST Verb’s AI is trained on over half a million real samples, so you’ll never need to use presets again.
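For context on what such settings actually are, the classic Schroeder design builds a reverb from a handful of comb and allpass filters, and its few parameters (delay times, feedback, wet/dry mix) are exactly the kind of knobs an AI-prescribed preset would be turning. A minimal sketch in Python, with textbook-style illustrative values and no claim that Fastverb works this way:

    import numpy as np
    from scipy.signal import lfilter

    def schroeder_reverb(x, sr, feedback=0.75, wet=0.3):
        # x: mono audio as a 1-D float array; sr: sample rate in Hz.
        # Parallel feedback comb filters with slightly detuned delays (ms).
        wet_sig = np.zeros_like(x)
        for delay_ms in (29.7, 37.1, 41.1, 43.7):
            d = int(sr * delay_ms / 1000)
            a = np.zeros(d + 1)
            a[0], a[-1] = 1.0, -feedback
            wet_sig += lfilter([1.0], a, x)
        # Two series allpass filters smear the echoes into a diffuse tail.
        for delay_ms, g in ((5.0, 0.7), (1.7, 0.7)):
            d = int(sr * delay_ms / 1000)
            b = np.zeros(d + 1)
            b[0], b[-1] = -g, 1.0
            a = np.zeros(d + 1)
            a[0], a[-1] = 1.0, -g
            wet_sig = lfilter(b, a, wet_sig)
        # Blend the dry signal with the scaled reverb tail.
        return (1 - wet) * x + wet * wet_sig / 4

Longer comb delays and higher feedback give a bigger, longer room; the wet/dry mix sets how far back the sound sits. Presumably the advertised training data is used to pick values like these to suit the incoming audio.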

I used the plugin on a recent composition. The results were subtle but an improvement. I adjusted some of the settings and it sounded better. Overall, I had the impression of a tasteful reverb that would work with many styles of music.

Did the AI help significantly in arriving at the desired effect? It is hard to say. For someone with very limited experience using such tools, I would assume so, but for someone already confident with an effect, I doubt it saves much time at all.

I am aware, however, that there is potential for snobbery here. After all, if a podcaster can easily add a decent reverb to their show, or a guitarist some presence to their recording, that is no bad thing. They can, if they want, go on to learn more about these effects and fine-tune them themselves. For this reason, it represents a useful tool.

Overview

LANDR’s mastering service and Focusrite’s Fastverb are professional tools that I hope readers of this article will be tempted to try. However, while there is clearly automation at work, how the AI technology works is unclear. If the term AI is used to market tools, there should be clarification of what exactly it is; otherwise one might as well just write ‘digital magic’. By contrast, the Tone Transfer team has made its code open source, described in detail how it uses machine learning, and credited the people involved in training the models.

I expect that tools which attempt to speed up or improve existing processes, such as mastering and applying reverb, will lower the barrier to entry into audio engineering, but I have yet to see evidence that they will improve it. In fact, they could degrade and homogenise audio engineering by encouraging people to work faster but with less skill and care.

By contrast, the machine learning models that Google, Harmonai, Neutone, and others are working on could create meaningful change. They are not mature technologies, but there is the seed of something profound in them. The ability to completely transform the sound of music while preserving the performance, and the potential to bring the voice to the forefront of computer music, could prove genuinely revolutionary.
