Here's a thought experiment: go open your Spotify listening history from the past five years. Pick a random song, maybe that weird indie track you listened to exactly once at 3 AM, or that guilty-pleasure pop hit you'd never admit to. Now imagine that song, along with 12 million others, has been fed into a black box that generates new music without paying the original artists a dime.
That's not hypothetical anymore. According to www.theverge.com, Atlantic reporter Alex Reisner has compiled and published a searchable database of the music used to train AI models. And when I say "database," I mean four separate datasets: two of them absolutely enormous, at 12 million and 9 million tracks respectively, plus two smaller but still significant collections.
I spent an afternoon poking through this thing last week. Honestly? It's unsettling. Not because AI music generation is some kind of existential threat to creativity; we've been through that panic with synthesizers, drum machines, and autotune. No, what's unsettling is the sheer scale of it. We're talking about a collection that spans the recorded history of popular music, from the Beatles to Billie Eilish, from obscure 78 RPM records to last week's SoundCloud uploads. And somewhere in a server farm, an AI is learning to mimic every single one of them.
The 12 Million Track Elephant in the Room
Let's start with the big one. The first dataset, which Reisner calls Dataset A, contains roughly 12 million tracks. To put that in perspective: Spotify's entire catalog is around 100 million songs. So we're looking at a collection about 12% the size of the planet's biggest streaming catalog. That's not a small sample. That's a broad cross-section of modern music consumption.
The second dataset, Dataset B, clocks in at 9 million tracks. Combined, that's 21 million songs. Think about the sheer logistics of processing that much audio. If you were to listen to all of it back-to-back, without sleeping or eating, it would take you about 136 years, assuming an average track runs a little under three and a half minutes. And yet, some AI model has already ingested every single one of those tracks, analyzed the waveforms, learned the patterns, and can now generate something that sounds eerily similar.
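If you want to check the back-of-the-envelope math yourself, here is a minimal sketch in Python. The track counts and the 100 million catalog figure come from the numbers above; the average track length of 3.4 minutes is my own assumption (the datasets' real average isn't stated anywhere I've seen), and the function names are purely illustrative.

```python
# Rough sanity check of the scale figures quoted above.
# ASSUMPTION: an average track length of 3.4 minutes. This value is not
# taken from the datasets; it is simply the figure that makes the
# "about 136 years" estimate work out.

DATASET_A_TRACKS = 12_000_000      # Dataset A, per the article
DATASET_B_TRACKS = 9_000_000       # Dataset B, per the article
SPOTIFY_CATALOG = 100_000_000      # approximate catalog size, per the article
AVG_TRACK_MINUTES = 3.4            # assumed, see note above

MINUTES_PER_YEAR = 60 * 24 * 365.25


def share_of_catalog(tracks: int, catalog: int) -> float:
    """Return the dataset size as a fraction of the catalog size."""
    return tracks / catalog


def listening_years(tracks: int, avg_minutes: float) -> float:
    """Return years of nonstop listening needed to hear every track once."""
    return tracks * avg_minutes / MINUTES_PER_YEAR


if __name__ == "__main__":
    combined = DATASET_A_TRACKS + DATASET_B_TRACKS
    share = share_of_catalog(DATASET_A_TRACKS, SPOTIFY_CATALOG)
    years = listening_years(combined, AVG_TRACK_MINUTES)
    print(f"Dataset A relative to catalog: {share:.0%}")
    print(f"Combined tracks: {combined:,}")
    print(f"Nonstop listening time: {years:.1f} years")
```

Nudge the assumed track length up or down and the total moves with it: at four minutes per track, the same 21 million songs would take closer to 160 years.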
According to www.theverge.com, the datasets include music from all major labels, independent artists, and even public domain recordings. The Atlantic made the database searchable so anyone can check whether their favorite artist, or their own music, ended up in the training set. I searched for a few obscure bands I love. Most were there.
Who Gave Permission? (Spoiler: No One)
Here's where it gets legally thorny. The current legal framework for AI training data is basically the Wild West. There's no clear precedent on whether scraping publicly available music to train a commercial AI model constitutes copyright infringement. The music industry is already gearing up for a fight: the Recording Industry Association of America (RIAA) has filed comments with the U.S. Copyright Office arguing that AI training on copyrighted music is not fair use.
But here's the thing: the datasets Reisner uncovered weren't necessarily scraped from Spotify or Apple Music. Some appear to come from other sources: YouTube rips, CD rips from private collections, or academic research datasets that were never intended for commercial use. The problem is that once data is out there, it's nearly impossible to control where it ends up.
I spoke with a friend who works at a major label (they asked not to be named because they're not authorized to talk to press). Their take: "We know our catalog is in there. We just don't know exactly which songs, or how they're being used. It's like finding out someone photocopied your entire book collection without asking, and now they're using it to write their own books."
The Smaller Datasets: Where It Gets Personal
The two smaller datasets are, in some ways, more interesting than the giants. One contains about 150,000 tracks, a manageable size for independent researchers or smaller AI startups. The other has around 30,000. These aren't random samples either. Reisner's analysis suggests they were curated with specific goals in mind: genre representation, temporal diversity, or maybe even audio quality.
I found something personal in one of the smaller sets. A track by a local band from my hometown that broke up 15 years ago. They had maybe 200 fans at their peak. And there it was, sitting in a training dataset for AI music generation. The song was never on a label, never on streaming services; it was uploaded to a defunct music blog in 2009. Somehow, it ended up here.
That's the reality of the internet age. Once you put something online, it becomes part of the collective digital fabric. You can't take it back. And now, every song you've ever loved, or created, might be teaching machines how to replace you.
How the Database Works (And Why You Should Care)
The Atlantic's searchable interface is surprisingly straightforward. You can search by artist name, song title, or even partial lyrics. Results show which dataset the track appears in, along with metadata like duration and file format. There's no audio preview; this isn't a listening tool. It's a transparency tool.
And transparency is exactly what's been missing from the AI music conversation. Companies like OpenAI, Stability AI, and others have been notoriously opaque about their training data. When pressed, they often cite trade secrets or competitive advantage. Meanwhile, musicians are left wondering if their work is being used without compensation or consent.
Reisner's database doesn't answer every question. It doesn't tell us which specific AI models used which datasets. It doesn't prove that any particular song was actually used to train a commercial product. But it gives us something almost as valuable: a starting point for accountability.
The Creative Conundrum: Is AI Music Actually Good?
Let's take a step back from the legal and ethical morass and ask a simpler question: is the music these models produce any good?
I've listened to dozens of AI-generated songs over the past year. Some are impressive in a technical sense: the production quality is there, the chord progressions make sense, the vocals are eerily human. But something is always missing. It's hard to pin down. Maybe it's the lack of intentionality. A human musician makes choices based on emotion, experience, and instinct. An AI makes choices based on statistical probability.
Great music surprises you. It breaks the rules. It takes a wrong turn that somehow becomes the best part of the song. AI, by its nature, optimizes for the expected. It's the musical equivalent of a perfectly competent cover band: technically flawless, emotionally hollow.
But here's the scary part: most listeners don't care. The average person streaming music on a commute isn't analyzing harmonic complexity or lyrical depth. They want something that sounds good enough to nod along to. And AI is already there.
What This Means for the Future of Music
I don't think AI will kill human-made music. But I do think it will fundamentally change the economics of the industry. If a streaming service can generate an infinite library of AI-produced background music for a fraction of the cost of licensing real songs, they will. That's not a prediction; it's already happening. Services like Endel and Aimi are creating algorithmic soundtracks for focus, sleep, and relaxation.
The real impact will be on the middle class of musicians, the ones who make a living not from stadium tours, but from licensing their music for TV, film, advertising, and streaming playlists. AI-generated music is perfect for those use cases. It's cheap, it's customizable, and it doesn't come with the licensing headaches of working with human artists.
Meanwhile, the mega-stars will be fine. Taylor Swift's fans aren't going to switch to AI-generated Taylor Swift impersonations. But the indie artist trying to get their song placed in a Netflix show? They're competing against an algorithm that can produce 10,000 variations of "upbeat indie folk" in an afternoon.
The Atlantic's Database Is a Wake-Up Call
Reisner's work is more than a data dump. It's a challenge to the music industry, to policymakers, and to listeners. How much do we value the human element in music? Are we willing to trade away artist livelihoods for the convenience of infinite, cheap content?
I don't have a clean answer. I'm typing this on a laptop, using software that was built by thousands of engineers I've never met. I'm not fundamentally opposed to technology making art more accessible. But there's a difference between using AI as a tool and using it as a replacement.
When I searched the database and found that old hometown band's song, I felt a weird mix of emotions. Pride that their music was deemed worthy of inclusion. Anger that they'll never see a dime from it. And resignation, because this is the world we've built. The internet runs on copying. AI just made that copying more sophisticated.
The Atlantic's database is now public. You can search it yourself. Find your favorite songs. Find your own songs, if you've ever released anything. And then ask yourself: is this the future of music we want?
Because the answer isn't going to come from Silicon Valley or Nashville. It's going to come from us: the listeners, the creators, the people who actually care about what music means. And we need to start paying attention before the algorithm writes its own ending.

Originally reported by www.theverge.com. Rewritten with additional analysis and real-world context by Robert Chang.




