Who's In This Podcast
Helen Todd is co-founder and CEO of Sociality Squared and the human behind Creativity Squared.
Zach Evans is the head of Harmonai, a Stability AI research lab and online community dedicated to creating open-source generative audio models and pushing forward creative uses of generative A.I. in music production.

Ep12. Zach Evans: Stability AI’s Harmonai on Open Source Generative A.I. Music

Ep12. Stability AI & Open Source Generative A.I. Music: Discover How Harmonai is Enabling Artistic Freedom and Revolutionizing Music Production with Zach Evans, the Head of Harmonai  

On the latest episode of Creativity Squared, Harmonai founder Zach Evans walks us through the evolution of A.I. for music production via his own journey from Microsoft developer to amateur EDM musician to leader of a team of artists and programmers developing the next generation of A.I. models for music.

Harmonai is a community-led lab “by artists and for artists” under the umbrella of Stability AI, a company that offers a range of open-source A.I. products, including its well-known Stable Diffusion text-to-image generator.

Through the episode, Zach touches on the importance of the open-source development community, the biggest challenges in improving A.I. music models, and his goal of developing A.I. technology that empowers artists. 

“I think that it’s a very powerful technology. It’s almost like a nuclear bomb, and I think that it’s our job to try to turn that into nuclear energy. Find a way to take the power of this technology and really support artists and be a strong but beneficial force in the music scene.”

Zach Evans

Harmonai

Some of us got into making sourdough bread, and others filled their time welcoming a new puppy into their home, but when social restrictions were implemented early in the Covid-19 pandemic, Zach Evans started making electronic dance music (EDM). As a frequent raver prior to the pandemic, Zach was already an enthusiast.

In an effort to accelerate his learning, Zach started actively participating in a few communities of EDM producers on Discord, where members would swap tips, share their tracks, and solicit feedback. Through these communities, Zach got the opportunity to interact with some of the producers he already admired, such as Kill the Noise. In December 2020, Zach was co-hosting a Twitch stream with Kill the Noise, where the producer was experimenting with one of the first A.I. music projects, Jukebox from OpenAI.

“His conclusion after a few hours was, ‘This is cool, this is neat, but not really up to the standard of quality for me to use it in my music as a tool.’ And I was like, well, that’s a thing to have as a goal, that’s really neat! And so I took that opportunity to really start diving into machine learning.”

Zach Evans

As a software developer at Microsoft, Zach had dabbled in machine learning before, but the potential applications for music gave him a new reason to dig deeper. Just like when he wanted to level up his music-making, Zach tried to find the “movers and shakers” in the A.I. music scene. That’s how he first encountered Dadabots, a death metal duo from Boston. They caught Zach’s attention with a YouTube livestream they started in 2019 that uses a neural network to produce endless death metal (it’s still going to this day). Dadabots weren’t exactly machine learning scientists at that point, but their experience working with A.I. in music production put them in a position to advise PhDs. Realizing he could make an impact on the space without an extensive background in machine learning, Zach accepted an invitation to mingle with the larger machine learning community at the NeurIPS conference.

Following the conference, he tried to find online communities for people interested in making music with A.I., only to find that none really existed. 

“I realized, alright, I’m gonna have to go work for OpenAI or Google, or go get a PhD to do this stuff. So that kind of waned. I thought, ‘That’s unfortunate, but I’ll keep my job working at Microsoft, I’ll keep working on music, this isn’t the time. [The A.I. music space] is still a little bit of an ivory tower.’”

Zach Evans

In 2021, Zach performed his first and only DJ set to date, opening for Au5, among others. Attending shows and getting backstage with artists, Zach grew his network. Meanwhile, Zach started participating in communities such as EleutherAI, where members were sharing their experiments with earlier iterations of text-to-image generators available on Google’s Colab platform. Colab, short for Colaboratory, is a hosted notebook service that lets developers write machine learning code in a web browser and run it on Google’s computing infrastructure, with both free and paid tiers. The service has made the machine learning space much more accessible by allowing developers to run code without having to invest thousands to buy their own graphics processing unit (GPU). GPUs are processors built for the highly parallel math that machine learning depends on, and they are the same kind of hardware that fills the massive data centers behind many cloud services.

Using a Colab notebook for an image generator called Disco Diffusion that could produce an infinite zoom effect, Zach started making audio-reactive music videos for some of his producer friends who lacked the resources to hire videographers or digital animators. From there, he got himself onto the development team for Disco Diffusion.

“The whole time this is still just me messing around in Colabs, making some fun art stuff. And I think a really important part about that was the community. I was just seeing people making these cool, innovative changes. And it was all just independent nerds, excited, usually heavy ADHD, technical programmers and artists who were just like, ‘hey, I wrote a couple lines of code, I’m not really much of a programmer, but it did this.’ And, it’s like, that’s groundbreaking!” 

Zach Evans

It was through his participation in the A.I. image generation communities that Zach was introduced to the founder of Stability AI, Emad Mostaque. At the time, Mostaque was supporting A.I. research by offering access to expensive, high-powered GPUs. Zach was deep in the machine learning community at that point, even receiving some mentorship on optimizing his models from Katherine Crowson, whom he calls the “Oracle” of the space. Eventually, Zach had the “glass-shattering, world-breaking” realization that his tweaking and fine-tuning in Colab notebooks was actually cutting-edge research.

After a conversation with Mostaque, Zach got the support he needed to leave his job at Microsoft, as well as a directive from Mostaque to go out and build a community of artists and developers that could help them apply the technology they were using for image generation to generate music instead. And so, Harmonai was born. 

Dance Diffusion

Harmonai’s most significant contribution to A.I. for music is their Dance Diffusion model. Dance Diffusion can generate new variations of music that it’s been trained on. Enter Jonathan Mann, who holds the Guinness World Record for writing and recording a song every day for over 5,000 consecutive days. Mann was looking to join the young Midjourney server on Discord, so he wrote a song about it, posted it to YouTube, and got the invite. Zach was a member of the server already since some of the people he’d worked with on Disco Diffusion had joined Midjourney. 

Comparing himself to Danny Ocean in Ocean’s Eleven, Zach enjoyed collaborating with Mann, who reintroduced Zach to Dadabots. Zach ended up recruiting CJ Carr from Dadabots to be on the Harmonai team. Mann’s biggest impact, however, was allowing Zach to use his collection of thousands of songs to train the Dance Diffusion model. The “J Mann” model was the first trained version of Dance Diffusion, but anybody can get their own copy of the model and train it on their own music or samples they own to produce new samples in the same style.

Open-source distribution is a core tenet of Harmonai and Stability AI’s business. Zach says that free access to their models is critical to unleashing the full potential of artistic expression that A.I. technology can enable. 

“If you don’t have it open source and available to people, then all of the outputs and the expression are controlled by corporations. And their impetus is going to be to keep it as banal and safe as possible. It’s more about PR, than about enabling expression. And that’s going to happen for any system that is behind a paywall.”

Zach Evans

Regulating access to A.I. and preventing bad actors from abusing A.I.-enabled voice cloning or deepfakes are currently hot topics for stakeholders at every level. Zach sees less risk for A.I.-generated music though, reasoning that “you could make bad music, but you could do that without A.I. too.”

The thornier question for A.I.’s application to the music industry is how it will affect individual artists’ equity. The technology isn’t advanced enough to replace artists for now, but could that be a risk one day? At a high level, Zach says that A.I. could reduce the skill barriers and empower more people to create more diverse music. As for the existing musicians, Zach says much of the fear about A.I. imagines a dynamic of severe corporate control that already exists to a certain extent. 

“When was music popularity ever about the music? It’s not about the actual artifact you put out. There’s no correlation, after a certain point, between subjective quality and fame and fortune. It’s all business at that point, marketing, branding, whatever. But the nuclear energy I see is that now artists are able to get out their ideas more easily.”

Zach Evans

Future of A.I. Music and Challenges Along the Way  

Zach sees A.I. as a new tool for musicians with the potential to expand the depth of the music landscape we know today. He says he wants to see creatives use the technology “wrong” to explore what else is possible in music. He compares it to the night sky: all the stars we can see are all the music that already exists, and all the space between the stars is where he and his colleagues want to explore. He thinks a lot about an A.I. model that can mimic the natural ability of a brilliant musician to hear and compose new sounds that exist between and outside the sounds we’ve already heard.

“I’m fascinated by the concept of, can these models learn some abstraction of music theory, that’s not specifically Western music theory or Eastern music theory. And then be able to explore that kind of mind model of creativity, exploring that space is what really interests me.” 

Zach Evans

In terms of the technical music production process, Zach thinks A.I. can serve as a creative collaborator to handle the aspects of the process that might not interest every musician. For instance, a songwriter might not be a great instrumentalist, but maybe A.I. can generate beats or melodies to accompany their lyrics. As Zach sees it, A.I. could help musicians focus on what they do best and delegate the tasks that might slow down their process. 

“I don’t think I have some heartfelt song to write to change America, that’s not going to be my impact on art. But as a technological person, I think that this has a strong potential to create new scenes, create new genres, create something different. And that I think is what I’m most excited about, just what creative people will do with these tools.” 

Zach Evans

While there’s already a lot of great work being made with A.I. right now, one of the biggest challenges in developing more powerful models is the size of the data and the amount of computing power it takes to process high-quality audio. Digital audio is commonly recorded at a sample rate of 48kHz, which refers to how many times per second the analog signal is measured and stored as a number. 48kHz means 48,000 numbers per second for every channel of audio, requiring significant processing power that might be financially out of reach for those looking to dabble in music production.
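To get a rough feel for the scale involved, the arithmetic can be sketched in a few lines of Python. This is only an illustration and not anything from Harmonai’s code: the function name, the three-minute track length, the stereo channel count, and the 16-bit depth are all assumptions chosen for the example.

```python
def raw_audio_size(seconds, sample_rate=48_000, channels=2, bit_depth=16):
    """Estimate how much raw, uncompressed audio data a clip contains.

    All defaults are illustrative assumptions: 48 kHz sample rate,
    stereo (2 channels), 16 bits per sample.
    """
    samples_per_channel = int(seconds * sample_rate)   # numbers per channel
    total_values = samples_per_channel * channels      # numbers across all channels
    size_bytes = total_values * bit_depth // 8         # 8 bits per byte
    return samples_per_channel, total_values, size_bytes


# A hypothetical three-minute track (180 seconds).
per_channel, total, size = raw_audio_size(180)
print(f"Samples per channel: {per_channel:,}")
print(f"Total values (stereo): {total:,}")
print(f"Raw size: {size / 1_000_000:.2f} MB")
```

Under these assumptions, a single three-minute stereo track is already more than 17 million numbers, which helps explain why generating long, high-quality sequences is so demanding.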

That’s why Harmonai is supporting volunteer research projects and hosting production challenges for artists, trying to push the limits of the technology in pursuit of longer sequence lengths. The long-term goal for Zach and others in his space is a music version of text-to-image generators such as Stable Diffusion or Midjourney. 

Continue the Conversation

Thank you, Zach, for being our guest on Creativity Squared.

This show is produced and made possible by the team at PLAY Audio Agency: https://playaudioagency.com.  

Creativity Squared is brought to you by Sociality Squared, a social media agency that understands the magic of bringing people together around what they value and love: http://socialitysquared.com.

Because it’s important to support artists, 10% of all revenue Creativity Squared generates will go to ArtsWave, a nationally recognized non-profit that supports over 150 arts organizations, projects, and independent artists.

Join Creativity Squared’s free weekly newsletter and become a premium supporter here.