A Q&A with AI regulator Ed Newton-Rex

Dom Aversano

In November last year, Ed Newton-Rex, the head of audio at Stability AI, left the company citing a small but significant difference in his philosophy towards training generative AI models. Stability AI was one of several companies that responded to an invitation from the US Copyright Office for comments on generative AI and copyright, submitting an argument that training their models on copyrighted artistic works fell under the definition of fair use: a legal doctrine that permits the use of copyrighted works without permission for a limited number of purposes, one of which is education. This argument has been pushed more widely by the AI industry, which contends that its machine learning algorithms undergo a learning process much like that of a student who learns to compose music by studying renowned composers.

Newton-Rex was not persuaded by the industry’s arguments. His full reasons for resigning are set out in his X/Twitter post, but central to them was the following passage:

(…) since ‘fair use’ wasn’t designed with generative AI in mind — training generative AI models in this way is, to me, wrong. Companies worth billions of dollars are, without permission, training generative AI models on creators’ works, which are then being used to create new content that in many cases can compete with the original works. I don’t see how this can be acceptable in a society that has set up the economics of the creative arts such that creators rely on copyright.

It is important to make clear that Newton-Rex is not a critic of AI. He is an enthusiast who has worked in the machine learning field for more than a decade, and his contention is narrowly focused on the ethics surrounding the training of AI models.

Newton-Rex responded by setting up a non-profit called Fairly Trained, which awards certificates to AI companies whose training data it considers ethical.

Its mission statement contains the following passage:

There is a divide emerging between two types of generative AI companies: those who get the consent of training data providers, and those who don’t, claiming they have no legal obligation to do so.

In an attempt to better understand Newton-Rex’s thinking on this subject, I conducted a Q&A by email. Perhaps the most revealing admission is that Newton-Rex hopes one day to close down his own organisation. What follows is the unedited text.

Do you think generative artificial intelligence is an accurate description of the technology Fairly Trained certifies?

Yes!

Having worked inside Stability AI and the machine learning community, can you provide a sense of the culture and the degree to which the companies consider artists’ concerns?

I certainly think generative AI companies are aware of and consider artists’ concerns. But I think we need to measure companies by their actions. In my view, if a company trains generative AI models on artists’ work without permission, in order to create a product that can compete with those artists, it doesn’t matter whether or not they’re considering artists’ concerns – through their actions, they’re exploiting artists.

Many LLM companies present a fair use argument that compares machine learning to a student learning. Could you describe why you disagree with this?

I think the fair use argument and the student learning arguments are different.

I don’t think generative AI training falls under the fair use copyright exception because one of the factors that is taken into account when assessing whether a copy is a fair use is the effect of the copy on the potential market for, and value of, the work that is copied. Generative AI involves copying during the training stage, and it’s clear that many generative AI models can and do compete with the work they’re trained on.

I don’t think we should treat machine learning the same as human learning for two reasons. First, AI scales in a way no human can: if you train an AI model on all the production music in the world, that model will be able to replace the demand for pretty much all of that music. No human can do this. Second, humans create within an implicit social contract – they know that people will learn from their work. This is priced in, and has been for hundreds of years. We don’t create work with the understanding that billion-dollar corporations will use it to build products that compete with us. This sits outside of the long-established social contract.

Do you think that legislators around the world are moving quickly enough to protect the rights of artists?

No. We need legislators to move faster. On current timetables, there is a serious risk that any solutions – such as enforcing existing copyright law, requiring companies to reveal their training data, etc. – will be too late, and these tools will be so widespread that it will be very hard to roll them back.

At Fairly Trained you provide a certification that signifies that a company trains their models on ‘data provided with the consent of its creators’. How do you acquire an accurate and transparent knowledge of the data each company is using?

They share their data with us confidentially.

For Fairly Trained to be successful it must earn people’s trust. What makes your organisation trustworthy?

We are a non-profit, and we have no financial backing from anyone on either side of this debate (or anyone at all, in fact). We have no hidden motives and no vested interests. I hope that makes us trustworthy.

If your ideal legislation existed, would a company like Fairly Trained be necessary?

No, Fairly Trained would not be necessary. I very much hope to be able to close it down one day!

To learn more about what you have read in this article you can visit the Fairly Trained website or Ed Newton-Rex’s website.

Dom Aversano is a British-American composer, percussionist, and writer. You can discover more of his work at Liner Notes.

Music in the browser or app?

Dom Aversano

As The Bard famously put it, ‘The app, or the browser, that is the question.’

At some point, your inspirational idea for digital music will have to travel from the platonic realm of your thoughts into either an app or a browser. Unless you can luxuriate in doing both, this represents a stark choice, and the most appropriate option depends on weighing up the advantages and disadvantages of each. The graphic above is designed to help categorise what you are creating, thereby giving a better sense of its ideal home.

The most traditional category is recorded music, as it predates the proliferation and miniaturisation of personal computing. In the 20th century, radio transformed music, and then television transformed it again. In this regard, Spotify and YouTube are quite traditional, as the former imitates radio while the latter mimics TV. This might help explain why Spotify is almost entirely an app, sitting in the background like a radio, while YouTube is most commonly used in the browser, fixing your gaze as if it were a TV. Whether your audience is likely to be tethered to a computer or walking around with a phone may therefore help in deciding between browsers and apps.

Turning to generative music, a successful example in the browser is Generative FM, created by Alex Bainter, which hosts more than 50 generative music compositions that you can easily dip into. It is funded by donations and by an online course on designing generative systems. The compositions are interesting, varied, and engaging, but as a platform it is easy to tune out of. This might be because we are not in the habit of listening to music in the browser without a visual component. The sustainability of this model is also questionable: despite a good number of daily listeners, the project appears to have been somewhat abandoned, with the last composition uploaded in 2021.

Perhaps Generative FM was better suited to an app, and there are many examples of projects that have chosen this medium. Artists such as Björk, Brian Eno, and Jean-Michel Jarre have released music as apps. There are obvious benefits to this: an app feels more like a thing than a web page does, and installing an app brings a sense of commitment, especially one you have paid for. Brian Eno’s generative Reflection app, for instance, comes at the not inconsiderable cost of £29.99.

Yet more than a decade after Björk released her app Biophilia, the medium is still exceedingly niche and struggling to become established. Björk has not released another app since Biophilia, which would have been time-consuming and expensive to create. Although her app did not usher in a new digital era for music, this may be a case of a false start rather than a non-starter. As app building gets easier and more people learn to program, a breakthrough artist may yet create a new form of digital music that captures people’s imaginations.

To turn our attention to music-making, and music programming in particular, there is a much clearer migratory pattern. JavaScript has allowed programming languages to run seamlessly in the browser. In creative coding for graphics, this has led to p5.js superseding Processing, and in music programming, Strudel looks likely to supersede TidalCycles. Of the many ways in which having a programming language in the browser is helpful, one of the greatest is that it allows group workshops to run much more smoothly, removing the tedium and delays caused by faulty software installations. If you have not yet tried Strudel, it’s worth having a go, as you can start making music within minutes by running and editing some of its patches.

The final category, AI (or large language models), is the hardest to evaluate. Since there is massive investment in this technology, most of the major companies are building their software for both browsers and apps. Given the gold-rush mentality, there is a strong incentive to get people to open a browser and start using the software as quickly as possible. Suno is an example of this: you can listen to music produced with it instantly, and if you sign up, it only takes a couple of clicks and a prompt to generate a song. However, given the huge costs of training and running these models, this culture of openness is likely to diminish in the coming years as the companies seek to recoup their backers’ money.

The question of whether to build something for the browser or an app is not a simple one. As technology offers us increasingly large numbers of possibilities, it becomes more difficult to choose the ideal one. However, the benefit of this huge array of options is that we have the potential to invent new ways of creating and presenting music that may not yet have been imagined, whether that’s in an app or browser.

Please feel free to share your thoughts and insights on creating for the browser or apps in the comment section below!

Dom Aversano is a British-American composer, percussionist, and writer. You can discover more of his work at Liner Notes.

Should fair use allow AI to be trained on copyrighted music?

Dom Aversano

This week the composer Ed Newton-Rex brought the ethics of AI into focus when he resigned from his role in the Audio team at Stability AI, citing a disagreement with the fair use argument used by his ex-employer to justify training its generative AI models on copyrighted works.

In a statement posted on Twitter/X he explained the reasons for his resignation.

For those unfamiliar with ‘fair use’, this claims that training an AI model on copyrighted works doesn’t infringe the copyright in those works, so it can be done without permission, and without payment. This is a position that is fairly standard across many of the large generative AI companies, and other big tech companies building these models — it’s far from a view that is unique to Stability. But it’s a position I disagree with.
I disagree because one of the factors affecting whether the act of copying is fair use, according to Congress, is “the effect of the use upon the potential market for or value of the copyrighted work”. Today’s generative AI models can clearly be used to create works that compete with the copyrighted works they are trained on. So I don’t see how using copyrighted works to train generative AI models of this nature can be considered fair use.

As Newton-Rex states, this is quite a standard argument made by companies using copyrighted material to train their AI. In fact, Stability AI recently submitted a 23-page document to the US Copyright Office arguing its case. Within it, the company states that it has trained its Stable Audio model on ‘800,000 recordings and corresponding songs’, going on to say:

These models analyze vast datasets to understand the relationships between words, concepts, and visual, textual or musical features – much like a student visiting a library or an art gallery. Models can then apply this knowledge to help a user produce new content. This learning process is known as training.

This highly anthropomorphised argument is questionable at best. AI models are not like students for obvious reasons: they do not have a body, do not have emotions, and have no life experience. Furthermore, as Stability AI’s own document testifies, they do not learn in the same way that humans learn; if a student were to study 800,000 pieces of music over a ten-year period, that would require analysing 219 different songs a day.
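
The 219-a-day figure is simple division. It spreads the 800,000 recordings cited in Stability AI’s document evenly over ten years and ignores leap days, which barely change the result:

```latex
\frac{800\,000\ \text{recordings}}{10\ \text{years} \times 365\ \text{days/year}}
  = \frac{800\,000}{3\,650}
  \approx 219\ \text{recordings per day}
```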

The contrast in how humans learn and think was highlighted by the American linguist and cognitive scientist Noam Chomsky in his critique of Large Language Models (LLMs).

The human mind is not, like ChatGPT and its ilk, a lumbering statistical engine for pattern matching, gorging on hundreds of terabytes of data and extrapolating the most likely conversational response or most probable answer to a scientific question. On the contrary, the human mind is a surprisingly efficient and even elegant system that operates with small amounts of information; it seeks not to infer brute correlations among data points but to create explanations.

The issue is further complicated by the language emerging from the AI community, which varies from anthropomorphic (‘co-pilot’) to deistic (‘godlike’) to apocalyptic (‘breakout scenarios’). Stability AI, in particular, awkwardly evokes Abraham Lincoln’s Gettysburg Address when writing on its website that it is creating ‘AI by the people for the people’ with the ambition of ‘building the foundation to activate humanity’s potential’.

The circumstances are of course materially different, but there is nevertheless an echo here of the civilising mission used to morally rationalise the economic rapaciousness of empire. To justify the permissionless use of copyrighted artwork on the basis of a mission to ‘activate humanity’s potential’ in a project ‘for the people’ is excessively moralistic and unconvincing. If Stability AI wants its project to be ‘by the people’, it should have artists explicitly opt in before using their work. The problem with this is that many will not, rendering the models perhaps not useless, but far less effective.

This point was underscored by the venture capital firm Andreessen Horowitz, which recently released a rather candid statement to this effect:

The bottom line is this: imposing the cost of actual or potential copyright liability on the creators of AI models will either kill or significantly hamper their development.

Although supportive of generative AI in principle, Newton-Rex does not ignore the economic realities behind its development. In a statement that I will finish with, he succinctly and eloquently brings into focus the power imbalance at play and its potential destructiveness:

Companies worth billions of dollars are, without permission, training generative AI models on creators’ works, which are then being used to create new content that in many cases can compete with the original works. I don’t see how this can be acceptable in a society that has set up the economics of the creative arts such that creators rely on copyright.

If you have an opinion you would like to share on this topic please feel free to comment below.

Dom Aversano is a British-American composer, percussionist, and writer. You can discover more of his work in his Substack publication, Liner Notes.

Can AI help us make humane and imaginative music?

Dom Aversano

There is a spectrum upon which AI music software exists. On one end are programs which create entire compositions, and on the other are programs that help people create music. In this post I will focus on the latter part of the spectrum, and ask the question, can AI help us compose and produce music in humane and imaginative ways? I will explore this question through a few different AI music tools.

Tone Transfer / Google

For decades the dominance of keyboard interaction has constrained computer music. Keyboards elegantly arrange a large number of notes but limit the control of musical parameters beyond volume and duration. Furthermore, with the idiosyncratic arrangement of a keyboard’s notes, it is hard to work — or even think — outside of the 12-note chromatic scale. Even with the welcome addition of pitch modulation wheels and microtonal pressure-sensitive keyboards such as Roli’s fascinating Seaboard, keyboards still struggle to express the nuanced pitch and amplitude modulations quintessential to many musical cultures.

For this reason, Magenta’s Tone Transfer could represent a revolutionary change in computer music interaction. It allows you to take a sound or melody from one instrument and transform it into a completely different-sounding instrument while preserving the subtleties and nuances of the original performance. A cello melody can be transformed into a trumpet melody, the sound of birdsong into fluttering flute sounds, or a sung melody into a number of traditional concert instruments. It feels like the antidote to autotune: a tool that captures the nuance, subtlety, and humanity of the voice, while offering the potential to transform it into something quite different.

In practice, the technology falls short of its ambitions. I sang a melody into it and transformed it into a flute sound, and while my singing ability is unlikely to threaten the reputation of Ella Fitzgerald, the flute melody that emerged sounded as if the flautist was drunk. However, given the pace at which machine learning is progressing, one can expect the technology to become much more sophisticated in the coming years, and I essentially regard it as an early prototype.

Google has admirably made the code open source and the musicians who helped train the machine learning algorithms are prominently credited for their work. You can listen to audio snippets of the machine learning process, and hear the instrument evolve in complexity after 1 hour, 3 hours, and 10 hours of learning.

It is not just Google developing this type of technology. Groups like Harmonai and Neutone are doing similar things, and any one of them could transform computer music interaction by anchoring it back in the most universal instrument, the human voice.

Mastering / LANDR

Although understanding how mastering works is relatively straightforward, understanding how a mastering engineer perceives music and uses their tools is far from simple, since there is as much art as science to their craft. Can such a process be devolved to AI?

That is the assumption behind LANDR’s online mastering service, which allows you to upload a finished track for mastering. Once it is processed, you can choose from three style settings (Warm, Balanced, Open) and three levels of loudness (Low, Medium, High), with a master/original toggle to compare the changes made.

I uploaded a recent composition to test it. The result was an improvement on the unmastered track, but the limited options for modifying it gave the feeling of a one-size-fits-all approach, inadequate for those who intend to carefully shape their musical creations at every stage of production. However, this might not be an issue for people on lower-budget projects, or those who simply want to improve their tracks for a quick release.

Wanting to understand the AI technology, I searched for more precise details, but while the company says that ‘AI isn’t just a buzzword for us’, I could only find a statement that does little to describe how the technology actually works:

Our legendary, patented mastering algorithm thoroughly analyzes tracks and customizes the processing to create results that sound incredible on any speaker.

While LANDR’s tool is useful for quick and cheap mastering, it feels constrained and artistically unrewarding if you want something more specific. The interface also seems to limit the potential of the technology. Why not allow text prompts such as “cut the low-end rumble, brighten the high end, and apply some subtle vintage reverb and limiting”?

Fastverb / Focusrite

Unlike mastering, reverb is an effect rather than a general skill or profession, making it potentially simpler to devolve aspects of it to AI. Focusrite’s Fastverb reverb effect uses AI to analyse your audio and then prescribe settings based on that analysis, which you can go on to tweak. The company is vague about how its AI technology works, simply stating:

FAST Verb’s AI is trained on over half a million real samples, so you’ll never need to use presets again.

I used the plugin on a recent composition. The result was subtle but an improvement, and after I adjusted some of the settings it sounded better still. Overall, I had the impression of a tasteful reverb that would work with many styles of music.

Did the AI help significantly in arriving at the desired effect? It is hard to say. I would assume that for someone with very limited experience of such tools it would, but for someone confident with the effect, I doubt it saves much time at all.

I am aware, however, that there is potential for snobbery here. After all, if a podcaster can easily add a decent reverb to their show, or a guitarist can add some presence to their recording, that’s no bad thing. They can, if they want, go on to learn more about these effects and fine-tune them themselves. For this purpose, it represents a useful tool.

Overview

LANDR’s mastering service and Focusrite’s Fastverb are professional tools that I hope readers of this article will be tempted to try. However, while there is clearly automation at work, how the AI technology works is unclear. If the term AI is used to market tools, there should be clarification of what exactly it is; otherwise, one might as well just write ‘digital magic’. By contrast, the team behind Google’s Tone Transfer has made its code open source and describes in detail how it uses machine learning and who was involved in training the models.

I expect that tools that attempt to speed up or improve existing processes, such as mastering and applying reverb, will lower the barrier to entry into audio engineering, but I have yet to see evidence that they will improve the craft itself. In fact, they could degrade and homogenise audio engineering by encouraging people to work faster but with less skill and care.

By contrast, the machine learning algorithms that Google, Harmonai, Neutone, and others are working on could create meaningful change. They are not mature technologies, but there is the seed of something profound in them. The ability to completely transform the sounds of music while preserving the performance, and the potential to bring the voice to the forefront of computer music, could prove to be genuinely revolutionary.

The Top 5 AI Tools for Music Production

Jean-Baptiste Thiebaut

As music production continues to evolve, Artificial Intelligence (AI) has played a significant role in changing the landscape, especially in the last few years. AI tools have brought a new level of creativity, convenience, and efficiency to the production process. In this blog post, we will take a look at the top 5 AI tools for music production.

Masterchannel is an innovative platform that utilizes AI technology to revolutionize the music production industry by providing a fast, affordable, and high-quality solution for mastering. What sets Masterchannel apart is its ability to produce high-quality masters that rival those of a human mastering engineer. The platform achieves this by replicating the techniques and processes used by experienced mastering engineers.

Masterchannel’s reputation as a top-quality mastering tool is evident from the fact that it has been used by GRAMMY award-winning artists. The platform’s AI-powered algorithms require minimal effort, making it an ideal choice for both beginner and experienced music producers seeking professional-sounding masters tailored to their needs.

Masterchannel offers an unbeatable value proposition by providing unlimited mastering for a low price, making it an affordable option for music producers who want top-quality results in just a few minutes. As an added bonus, Music Hackspace has managed to secure a discount code for users. Simply enter MUSIC_HACKSPACE_20 when joining Masterchannel’s unlimited tier!

AudioShake can take any song, even if it was never multi-tracked, and break it down into individual stems, creating endless possibilities for new uses of the music in instrumentals, samples, remixes, and mash-ups. This opens up a whole new world of creativity for artists who might have otherwise been limited by the availability of tracks.

The key feature of AudioShake is its ability to recognize and isolate different components in a piece of audio. For example, if you have a rock song with drums, guitars, and vocals, AudioShake’s A.I. can identify each component and separate them into their own tracks. This means that you can use the isolated tracks for new purposes like sampling, sync licensing, re-mixing, and more.

In addition to its stem separation capabilities, AudioShake can also be used for re-mastering and to remove bleed from a multi-tracked live recording. This makes it a versatile tool for music producers and sound engineers who are looking to enhance the quality of their recordings.

Beatoven.ai is a music composition tool that empowers content creators to craft unique soundtracks for their projects. With just a few clicks, users can specify their desired genre, duration, tempo, and mood, and Beatoven.ai will generate a track that perfectly fits their needs.

But that’s not all – the platform also provides a range of customization options to fine-tune the music to your liking. From adjusting the volume and instrument selection/removal to adding multiple moods and emotions, Beatoven.ai gives you complete control over your music. Once you’ve created your perfect track, downloading it is a breeze. Plus, you’ll receive an instant license via email, so you can use your new music with confidence.

With Beatoven.ai, you’ll never have to worry about copyright issues or spending hours searching for the right music again. The platform’s fast, easy, and intuitive interface makes music composition accessible to everyone, regardless of their musical background.

Synthesizer V is a virtual singer software that leverages the power of AI to produce high-quality vocal tracks that sound natural and life-like. By utilizing advanced deep learning algorithms, Synthesizer V can analyze voice samples and generate realistic vocal performances with remarkable accuracy.

One of the standout features of Synthesizer V is its comprehensive toolkit that enables users to fine-tune and control various aspects of the vocal track. With built-in tools for pitch correction, expression control, and tuning, music producers have everything they need to create stunning vocal performances that are tailored to their specific needs. It also has an extensive range of customization options, allowing users to experiment with different tones, styles, and vocal characteristics. Whether you’re looking for a soulful, emotional performance or a powerful, energetic vocal track, Synthesizer V has got you covered.

Overall, Synthesizer V is an essential tool for any musician, producer, or content creator looking to produce high-quality vocal tracks with ease and precision. Its intuitive interface, powerful features, and unparalleled accuracy make it a must-have for anyone looking to take their music production to the next level.

Musiio’s AI-powered solutions are designed to provide music lovers with a more personalized and enjoyable music listening experience. Their flagship product is a content recommendation system that utilizes machine learning to analyze music and generate accurate, relevant tags and metadata. This ensures that music companies and streaming services can efficiently organize and categorize their vast music libraries, making it easier for listeners to discover new music and artists they love.

In addition to content recommendation, Musiio offers a wide range of other AI-powered solutions for music companies and other customers, including content moderation, content identification, and copyright protection. These solutions help music companies to streamline their operations and reduce manual effort, while also ensuring that they comply with copyright laws and regulations.

Conclusion
AI has brought significant advancements to the music industry, enhancing the production process overall. The top 5 AI tools for music production we have highlighted in this blog post – Masterchannel, AudioShake, Beatoven.ai, Synthesizer V, and Musiio – have all made significant contributions to the industry, improving efficiency, creativity, and convenience. As AI technology continues to evolve, we can expect even more exciting developments in the future that will further enhance the music production process and the overall music listening experience.

Interested in learning more about AI in music technology? Check out one of our courses!

What is Music Hacking? We’ve asked ChatGPT

Jean-Baptiste Thiebaut

Much is said about OpenAI’s ChatGPT, the conversational AI chatbot. You can ask it to write an FM synthesiser embedded in a web page, or a conversation between the Dalai Lama and Donald Trump.

We were curious about its understanding of Music Hacking. We were not disappointed by the result!

“Music hacking is the practice of using technology to manipulate or create music in innovative ways. This can include using software to remix existing songs, creating new music from scratch using synthesizers and other electronic instruments, or even using unconventional devices such as hacked game controllers to make music.

One of the key principles of music hacking is the idea of open source and collaboration. Many music hackers share their creations and techniques with others, and often work together to build upon each other’s ideas. This has led to the creation of a vibrant online community of music hackers, who share their work and provide feedback and support to each other.

One of the most popular tools for music hacking is the digital audio workstation (DAW), which allows users to record, edit, and produce music using a computer. DAWs often come with a wide range of virtual instruments and effects, making them ideal for creating electronic music. Another popular technique in music hacking is the use of chiptunes, which are songs created using the sound chips of old video game consoles. By using emulators or actual hardware, music hackers can create nostalgic, 8-bit style tunes that have become popular in their own right.

Music hacking has also been used to create experimental and avant-garde music. By using unconventional methods and technologies, music hackers are able to push the boundaries of what is considered “music” and create unique and interesting sounds.

Overall, music hacking is a fascinating and growing field that is constantly evolving. Whether you’re a seasoned musician looking to try something new, or just a curious technologist, there are many ways to get involved and explore the world of music hacking. So, if you have a passion for music and technology, why not give it a try?”

A web-based synth

write javascript code for an FM synthesiser inside an html page

The initial response was missing the buttons to start/stop the synth and the controls to change the modulation and frequency, so we asked:

write fully functional javascript code for an fm synthesiser inside an html page

What’s amazing is that it picked up the new requirements and added the UI code. We then proceeded to add a MIDI keyboard and nice buttons, and here’s the result!
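
The code ChatGPT generated is not reproduced in this post. For readers curious about what such a synth involves, here is a minimal, hypothetical sketch in JavaScript. It assumes a classic two-operator design, in which one sine oscillator modulates the frequency of another. The function names (midiToHz, fmSample, renderFm, startFmVoice) are our own illustrations and are not ChatGPT’s output. The Web Audio calls it relies on (createOscillator, createGain, and connecting a node to an AudioParam) are standard browser APIs. The pure functions let the maths be checked outside the browser.

```javascript
// Minimal two-operator FM synthesis sketch (hypothetical helper names,
// not the code ChatGPT generated for this post).

// Convert a MIDI note number to a frequency in Hz (A4 = note 69 = 440 Hz),
// e.g. for driving the synth from a MIDI keyboard.
function midiToHz(note) {
  return 440 * Math.pow(2, (note - 69) / 12);
}

// One sample of a simple FM tone at time t (seconds):
//   y(t) = sin(2*pi*fc*t + index * sin(2*pi*fm*t))
// carrierHz is the carrier (centre) frequency, modulatorHz sets the spacing
// of the sidebands, and index controls how strong they are (brightness).
function fmSample(t, carrierHz, modulatorHz, index) {
  const modulator = Math.sin(2 * Math.PI * modulatorHz * t);
  return Math.sin(2 * Math.PI * carrierHz * t + index * modulator);
}

// Render a short buffer offline, e.g. for testing or plotting.
function renderFm(sampleRate, seconds, carrierHz, modulatorHz, index) {
  const out = new Float32Array(Math.round(sampleRate * seconds));
  for (let n = 0; n < out.length; n++) {
    out[n] = fmSample(n / sampleRate, carrierHz, modulatorHz, index);
  }
  return out;
}

// Browser version using the Web Audio API: a modulator oscillator, scaled by
// a GainNode, drives the carrier's frequency AudioParam. This modulates
// frequency rather than phase. For a sine modulator, a peak deviation of
// index * modulatorHz Hz gives the same sideband magnitudes as fmSample.
// Returns a function that stops the voice.
function startFmVoice(ctx, carrierHz, modulatorHz, index, volume = 0.2) {
  const carrier = ctx.createOscillator();   // default waveform is a sine
  const modulator = ctx.createOscillator();
  const depth = ctx.createGain();
  const output = ctx.createGain();

  carrier.frequency.value = carrierHz;
  modulator.frequency.value = modulatorHz;
  depth.gain.value = index * modulatorHz;   // peak frequency deviation in Hz
  output.gain.value = volume;               // keep the level modest

  modulator.connect(depth);
  depth.connect(carrier.frequency);         // audio-rate frequency modulation
  carrier.connect(output);
  output.connect(ctx.destination);

  modulator.start();
  carrier.start();
  return () => {
    carrier.stop();
    modulator.stop();
  };
}
```

In a page, browsers only let audio start after a user gesture. A Start button’s click handler might therefore create an AudioContext and call startFmVoice(ctx, midiToHz(57), midiToHz(57) * 2, 3), then keep the returned function for a Stop button. A MIDI keyboard handler could pass each incoming note number through midiToHz in the same way.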
