
Your Spotify Wrapped Is Now an AI Training Playlist: The Atlantic Just Proved It

The Atlantic has released a searchable database of music used to train AI models, including one dataset of roughly 12 million tracks. We break down what this means for musicians, listeners, and the future of music.

June 23, 2026

Here’s a thought experiment: go open your Spotify listening history from the past five years. Pick a random song—maybe that weird indie track you listened to exactly once at 3 AM, or that guilty-pleasure pop hit you’d never admit to. Now imagine that song, along with 12 million others, has been fed into a black box that generates new music without paying the original artists a dime.

That’s not hypothetical anymore. According to The Verge, Atlantic reporter Alex Reisner has compiled and published a searchable database of the music used to train AI models. And when I say "database," I mean four separate datasets: two enormous ones, with roughly 12 million and 9 million tracks respectively, plus two smaller but still significant collections.

I spent an afternoon poking through this thing last week. Honestly? It’s unsettling. Not because AI music generation is some kind of existential threat to creativity; we’ve been through that panic with synthesizers, drum machines, and Auto-Tune. What’s unsettling is the sheer scale of it. These collections span a huge swath of recorded popular music, from the Beatles to Billie Eilish, from obscure 78 RPM records to last week’s SoundCloud uploads. And somewhere in a server farm, AI models are learning to mimic them.

The 12 Million Track Elephant in the Room

Let’s start with the big one. The first dataset, which Reisner calls Dataset A, contains roughly 12 million tracks. For perspective, Spotify’s catalog holds around 100 million songs, so Dataset A alone is equivalent to about 12% of the entire catalog of the planet’s biggest streaming service. That’s not a sample. That’s a comprehensive cross-section of modern music consumption.

The second dataset, Dataset B, clocks in at roughly 9 million tracks. Combined, that’s up to 21 million songs (fewer if the two overlap). Think about the logistics of processing that much audio. If you listened to all of it back-to-back, without sleeping or eating, it would take you about 136 years, assuming the average track runs about 3.4 minutes. And yet AI models have been trained on collections like these, analyzing the waveforms, learning the patterns, and learning to generate something that sounds eerily similar.
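For readers who want to check the arithmetic, here is a minimal Python sketch. The average track lengths are assumptions (the reported figures give only track counts), and the sketch treats the two large datasets as non-overlapping, so the combined total is an upper bound; both choices are labeled in the comments.

```python
# Back-of-the-envelope check of the listening-time and catalog-share figures.
# Assumptions (not from The Atlantic's data): the average track lengths tried
# below, and zero overlap between Dataset A and Dataset B.

MINUTES_PER_YEAR = 60 * 24 * 365.25  # average year, leap days included
SPOTIFY_CATALOG = 100_000_000        # approximate catalog size cited above


def nonstop_listening_years(track_count: int, avg_minutes_per_track: float) -> float:
    """Years needed to play track_count songs back-to-back, around the clock."""
    total_minutes = track_count * avg_minutes_per_track
    return total_minutes / MINUTES_PER_YEAR


dataset_a = 12_000_000            # roughly 12 million tracks
dataset_b = 9_000_000             # roughly 9 million tracks
combined = dataset_a + dataset_b  # upper bound: assumes no shared tracks

print(f"Dataset A share of catalog: {dataset_a / SPOTIFY_CATALOG:.0%}")
for avg in (3.0, 3.4, 4.0):
    years = nonstop_listening_years(combined, avg)
    print(f"{avg} min/track -> {years:.0f} years")
```

Running it prints a 12% share for Dataset A, then 120, 136, and 160 years for average lengths of 3.0, 3.4, and 4.0 minutes. The 136-year figure corresponds to roughly 3.4 minutes per track; any overlap between the datasets would shrink every number.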

According to The Verge, the datasets include music from all major labels, independent artists, and even public-domain recordings. The Atlantic made the database searchable so anyone can check whether their favorite artist, or their own music, ended up in a training set. I searched for a few obscure bands I love. Most were there.

Who Gave Permission? (Spoiler: No One)

Here’s where it gets legally thorny. The current legal framework for AI training data is basically the Wild West. There’s no clear precedent on whether scraping publicly available music to train a commercial AI model constitutes copyright infringement. The music industry is already gearing up for a fight—the Recording Industry Association of America (RIAA) has filed comments with the U.S. Copyright Office arguing that AI training on copyrighted music is not fair use.

But here’s the thing: the datasets Reisner uncovered weren’t necessarily scraped from Spotify or Apple Music. Some appear to come from other sources—YouTube rips, CD rips from private collections, or academic research datasets that were never intended for commercial use. The problem is that once data is out there, it’s nearly impossible to control where it ends up.

I spoke with a friend who works at a major label (they asked not to be named because they’re not authorized to talk to press). Their take: “We know our catalog is in there. We just don’t know exactly which songs, or how they’re being used. It’s like finding out someone photocopied your entire book collection without asking, and now they’re using it to write their own books.”

The Smaller Datasets: Where It Gets Personal

The two smaller datasets are, in some ways, more interesting than the giants. One contains about 150,000 tracks—a manageable size for independent researchers or smaller AI startups. The other has around 30,000. These aren’t random samples either. Reisner’s analysis suggests they were curated with specific goals in mind: genre representation, temporal diversity, or maybe even audio quality.

I found something personal in one of the smaller sets. A track by a local band from my hometown that broke up 15 years ago. They had maybe 200 fans at their peak. And there it was, sitting in a training dataset for AI music generation. The song was never on a label, never on streaming services—it was uploaded to a defunct music blog in 2009. Somehow, it ended up here.

That’s the reality of the internet age. Once you put something online, it becomes part of the collective digital fabric. You can’t take it back. And now, every song you’ve ever loved—or created—might be teaching machines how to replace you.

How the Database Works (And Why You Should Care)

The Atlantic’s searchable interface is surprisingly straightforward. You can search by artist name, song title, or even partial lyrics. Results show which dataset the track appears in, along with metadata like duration and file format. There’s no audio preview—this isn’t a listening tool. It’s a transparency tool.
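The Atlantic hasn’t published how its search works, but the core idea of a metadata-only lookup is simple enough to sketch. The minimal Python example below is hypothetical throughout: the `TrackRecord` fields, the `search` helper, and the sample rows are invented for illustration, and it matches only artist and title, not lyrics. It shows how one query can surface the same song in more than one dataset, with duration and format but no audio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackRecord:
    """Hypothetical row shape; The Atlantic's actual schema is not public."""
    artist: str
    title: str
    dataset: str      # which training collection lists the track
    duration_s: int   # track length in seconds
    file_format: str  # e.g. "mp3" or "flac"


def search(records: list[TrackRecord], query: str) -> list[TrackRecord]:
    """Case-insensitive substring match on artist or title (hypothetical helper)."""
    q = query.casefold()
    return [r for r in records if q in r.artist.casefold() or q in r.title.casefold()]


# Invented sample rows for illustration only.
catalog = [
    TrackRecord("Example Band", "Night Drive", "Dataset A", 214, "mp3"),
    TrackRecord("Example Band", "Night Drive", "Dataset B", 214, "flac"),
    TrackRecord("Another Artist", "Slow Morning", "Dataset A", 187, "mp3"),
]

for hit in search(catalog, "night drive"):
    print(hit.dataset, hit.artist, hit.title, hit.duration_s, hit.file_format)
```

The query returns two rows, one per dataset: where a track appears and what its file looks like, not what it sounds like.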

And transparency is exactly what’s been missing from the AI music conversation. Companies like OpenAI, Stability AI, and others have been notoriously opaque about their training data. When pressed, they often cite trade secrets or competitive advantage. Meanwhile, musicians are left wondering if their work is being used without compensation or consent.

Reisner’s database doesn’t answer every question. It doesn’t tell us which specific AI models used which datasets. It doesn’t prove that any particular song was actually used to train a commercial product. But it gives us something almost as valuable: a starting point for accountability.

The Creative Conundrum: Is AI Music Actually Good?

Let’s take a step back from the legal and ethical morass and ask a simpler question: is the music these models produce any good?

I’ve listened to dozens of AI-generated songs over the past year. Some are impressive in a technical sense—the production quality is there, the chord progressions make sense, the vocals are eerily human. But something is always missing. It’s hard to pin down. Maybe it’s the lack of intentionality. A human musician makes choices based on emotion, experience, and instinct. An AI makes choices based on statistical probability.

Great music surprises you. It breaks the rules. It takes a wrong turn that somehow becomes the best part of the song. AI, by its nature, optimizes for the expected. It’s the musical equivalent of a perfectly competent cover band—technically flawless, emotionally hollow.

But here’s the scary part: most listeners don’t care. The average person streaming music on a commute isn’t analyzing harmonic complexity or lyrical depth. They want something that sounds good enough to nod along to. And AI is already there.

What This Means for the Future of Music

I don’t think AI will kill human-made music. But I do think it will fundamentally change the economics of the industry. If a streaming service can generate an infinite library of AI-produced background music for a fraction of the cost of licensing real songs, it will. That isn’t entirely a prediction: services like Endel and Aimi already create algorithmic soundtracks for focus, sleep, and relaxation.

The real impact will be on the middle class of musicians—the ones who make a living not from stadium tours, but from licensing their music for TV, film, advertising, and streaming playlists. AI-generated music is perfect for those use cases. It’s cheap, it’s customizable, and it doesn’t come with the legal headaches of human artists.

Meanwhile, the mega-stars will be fine. Taylor Swift’s fans aren’t going to switch to AI-generated Taylor Swift impersonations. But the indie artist trying to get their song placed in a Netflix show? They’re competing against an algorithm that can produce 10,000 variations of "upbeat indie folk" in an afternoon.

The Atlantic’s Database Is a Wake-Up Call

Reisner’s work is more than a data dump. It’s a challenge to the music industry, to policymakers, and to listeners. How much do we value the human element in music? Are we willing to trade away artist livelihoods for the convenience of infinite, cheap content?

I don’t have a clean answer. I’m typing this on a laptop, using software that was built by thousands of engineers I’ve never met. I’m not fundamentally opposed to technology making art more accessible. But there’s a difference between using AI as a tool and using it as a replacement.

When I searched the database and found that old hometown band’s song, I felt a weird mix of emotions. Pride that their music was deemed worthy of inclusion. Anger that they’ll never see a dime from it. And resignation—because this is the world we’ve built. The internet runs on copying. AI just made that copying more sophisticated.

The Atlantic’s database is now public. You can search it yourself. Find your favorite songs. Find your own songs, if you’ve ever released anything. And then ask yourself: is this the future of music we want?

A screenshot of The Atlantic's searchable database interface showing a search bar and results listing artists and song titles

Because the answer isn’t going to come from Silicon Valley or Nashville. It’s going to come from us: the listeners, the creators, the people who actually care about what music means. And we need to start paying attention before the algorithm writes its own ending.


Originally reported by www.theverge.com. Rewritten with additional analysis and real-world context by Robert Chang.