Generative AI for Music - Is Stable Audio Ready for Prime Time (Or Even Waltz Time)?
Will musicians be the next victims of AI?
Generative tools for images have taken off in the past year, with the various releases of Stable Diffusion, DALL-E 2, and others.
And consequently there are heated debates over such images and their uses.
Also rising in the spotlight are tools for audio. Voice imitation tools are now common.
And consequently there are heated debates over such generated voice “recordings”.
Music is more complex than spoken words, though, and music generation by computer is still rather limited.
This week the people at Stability AI made Stable Audio available to the public.
They claim:
AI music by musicians, for musicians.
And that may be true.
But will musicians really embrace this?
I decided to give Stable Audio a try. Here are three examples:
1) I prompted for a “cool Foxtrot in swing time”. Here is the result:
Hmm… sounds like someone trying to tango on saws…
2) Then I prompted for an “Uplifting Viennese Waltz on cellos”, which resulted in this:
That really sounds more like accordions than cellos, but it is at least kind of in waltz time.
3) Finally I asked for “Cello Solo, andante” and thus we have:
We can say that “andante” was interpreted closely enough, and while the sound is similar to a cello, it seems there is more than one playing.
Can we draw any conclusions?
To give Stable Audio some grace, let’s acknowledge that this is the first public version.
Everyone has to start somewhere.
But my examples tell me that Stable Audio still has a long way to go.
And even though I am not a musician, I suspect tools like Stable Audio will meet quite a bit of resistance in some circles.
The fear of AI tools is already feeding the strikes in the corporate entertainment industry. If a producer can generate a soundtrack without hiring any musicians or composers, the temptation under budget pressures may be too great to resist.
Final thought: Are musicians immune to being made redundant?