Hacker News | koheripbal's comments

To add a little context, this suspension comes immediately after Anna's Archive publicly implicated themselves in the Spotify scraping "hack", in which they downloaded nearly the entire content library of Spotify and were preparing to release it publicly (~300TB worth) via torrent.

They published a blog post outlining their plans.


Did the operators _want_ to poke the well-connected and well-funded bear with a history of anger problems?

No, but they were never going to hold back, given that their mission is to archive all cultural content, by hook or by crook.

Archiving it and publishing it are different things.

More importantly, they may sabotage their own mission: if Spotify shuts them down, their existing archives, and especially future archives, may effectively be lost.


I guess I should say more accurately: Their mission is to both archive it and publish it. They seem to be explicitly against copyright, on principle. Which I greatly respect.

It's time to abolish copyright. It creates more problems (stifles innovation, creates rents) than it solves (rewards innovation).

It doesn't create problems for large companies that make AI systems.

Yeah, it seems to only be a problem when you're a human being remixing the culture you grew up with.

Meta can admit to soullessly scraping books they don't own for their for-profit AI datasets [1], and it's not a problem because they're Meta. But if you're an artist? Nope. Sampling in hip hop songs, for example, is in a "complex legal gray area" (translation: "it's illegal but we don't want to admit that out loud") [2].

[1] https://futurism.com/the-byte/facebook-trained-ai-pirated-bo...

[2] https://urbanspook.com/copyright-laws-2025-impact-on-hip-hop...


Fortunately, Spotify does not have that power. Anna's Archive is not based in US or EU jurisdictions. They can make access a bit harder for ordinary people, but not shut it down.

(Edited for clarity)


> Fortunately, Spotify does not have that power. They are not based in US or EU jurisdictions.

Perhaps I misunderstood something, but according to my understanding

1. Spotify is registered in Luxembourg and has its operational headquarters in Sweden (Stockholm). Both are EU countries.

2. I guess it won't be Spotify that sues, but the individual music labels (very likely united).


Anna's Archive is not based in the EU (sorry for being unclear), so EU law has limited means to enforce a ban. In Germany it is already "banned" by ISPs, but only via DNS.

But the real servers are hosted in Kazakhstan or Russia, I think, and those countries do not cooperate much with EU courts.

So unless the EU installs a Great Firewall like China's, it cannot really shut it down.


> But the real servers are hosted in kazachstan or russia I think. And they do not cooperate so much with EU courts.

I believe the "official" AA servers only host the website + source code. The actual copyrighted content is stored by volunteers who seed the torrents.


Exactly, this is why the 'Hydra' is difficult to take down.

Presumably the opposing party resides in non-US-or(and? depends on the order of evaluation)-EU territory, but I might be mistaken. "They" refers to both sides in the parent comment.

I'm not sure archiving and publishing are different things.

They are, but archiving without publishing is pointless.

I occasionally wonder how many enormous collections of culture like that of Marion Stokes[1] have been lost because their curators made no effort to realize the value of their collection.

1. https://en.wikipedia.org/wiki/Marion_Stokes


Most archives - the ones in libraries, etc. - are not published, except they are available to qualified people who physically travel there. Most are not even fully indexed - nobody knows all of what's there.

My perspective is compatible with this fact. An archive that approximately nobody can access and/or nobody knows what it contains has no value to society at large, except the potential that it may some day be published.

The good news is I'd guess the number of (nonreligious/nonproprietary) institutionally managed pointless archives is dwindling.


> They are, but archiving without publishing is pointless.

One may collect/archive now (when the data is, well, "available"), and publish later, when copyright expires and the material will likely be harder to obtain.


Both are illegal. If you just hoard, you will never know whether what you have is useful. The only way to judge that is by letting people use it.

I can save a copy of my friend's book on my computer, archiving it. Nobody else could see it unless I publish it.

They stated that they would pass the data on to other archivists and public/private trackers, no? They obviously have backups, since multiple users are seeding GBs and even TBs of data. Mirrors can be created as well, as with TPB.

No, because it is all backed up on torrents. Good luck getting those "shut down" on the DHT.

They didn’t come anywhere close to the entire content library, the 300TB represents about 33% of Spotify, though it is close to 100% of the played music.

Kind of nuts that 66% of their library is virtually unplayed. It’s hard to make it as a musician.

It is ridiculously easy to create an album with Suno and push it to Spotify. I'm surprised it's only 66%, TBH.

Anna's archive has a great analysis of the Spotify data.

They identify a huge surge in tracks that few people listen to, starting after generative AI took off.

The analysis is worth reading. The distribution is roughly (Pareto)^3: ~99% of plays go to ~1% of the catalogue.
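With per-track play counts in hand, a "~99% of plays in ~1% of the catalogue" claim reduces to one ratio. A minimal Python sketch; the `top_share` helper and the toy numbers are hypothetical, not AA's method or data:

```python
def top_share(play_counts, fraction=0.01):
    """Share of all plays that go to the most-played `fraction` of tracks."""
    if not play_counts:
        raise ValueError("need at least one track")
    counts = sorted(play_counts, reverse=True)
    k = max(1, round(len(counts) * fraction))  # size of the top slice, at least one track
    total = sum(counts)
    return sum(counts[:k]) / total if total else 0.0

# Toy catalogue (hypothetical numbers): one hit and 99 tracks with a single play each.
catalogue = [9900] + [1] * 99
print(f"{top_share(catalogue):.1%}")  # 99.0%
```

For a catalogue with hundreds of millions of rows, `heapq.nlargest(k, counts)` avoids sorting the whole list.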


1. Generate slop music nobody will ever listen to

2. ????

3. Profit

It's actually:

1. Generate slop music no _human_ will ever listen to

2. Use a botnet to "play" this music en masse

3. Profit

This is a whole arms race, with companies (such as Beatdapp) specializing in detecting fraudulent plays.

Source: I work for a niche music retailer that struggles with the same issues on a smaller scale.


According to a stat I saw years ago, about the same proportion of apps on the iOS App Store have never been downloaded.

To be completely fair, I am not certain what it means for a track to be "virtually unplayed".

First off, it was striking to me how little of the "top 10 000" they published back on Christmas I recognize. I'm not sure what I expected, but 10 000 sounds like a big number, so it seemed likely to me that if I picked a random song from my playlist, I could find it there. It turned out I could hardly find an artist I recognized. OK, I can recall a song by Lady Gaga and even Billie Eilish, and I've heard of Bruno Mars (though I can't recall any song), but I have no idea what "Bad Bunny", "Doechii" or "Drake" are. I mean, I think I do have a pretty good idea what these things are (abstractly), and I probably wouldn't want to listen to them. And while I knew that all this stuff is very popular, I didn't quite realize how little room in the top 10 000 it leaves for the music I (and everyone I know) actually listen to.

I didn't download the metadata they released (it would be hard to process on my laptop anyway), but now I wonder how much of my 3 TB music collection is in the Spotify top 100 000, or heck, even the top 1M, or on Spotify at all.

I am also sometimes surprised by how few scrobbles some tracks get. I didn't bother to find out what this means or how many people still scrobble to Last.fm or ListenBrainz, but it is surprising to see that a track I didn't consider obscure was scrobbled only about 50 times this year.

So what I'm saying is that the music world seems to be terribly fragmented, even more than I imagined. So the very premise of AA backing up 97% of Spotify (by number of plays) may be a much smaller achievement in "preserving culture" than it sounds. And of course we are about 8 years too late to back up everything, since by now half of it must be generative NN bullshit. And I'm not even sure that stuff is confined to the leftover 3% (bots listen to bot-generated music too, right?).


> It turned out I hardly can find an artist I recognize

I've heard of 9 of the top 10 and 15 of the top 20 at https://chartmasters.org/most-monthly-listeners-on-spotify/

You might not listen to them, but surely you have heard of Taylor Swift, Justin Bieber, Ariana Grande, Ed Sheeran, Coldplay and, of course, the Christmas staples Mariah Carey and Wham?


First off, this is not the chart we are talking about; AA provided its own[0]. I am not sure why it matters which names exactly I've heard of, but if you are that curious: I don't know who Ed Sheeran and Wham are (though I can't vouch that I've never heard their music in a supermarket), but I definitely remember "Coldplay" being mentioned in a joke onstage by a NIN member[1], though I didn't bother to check out who they are. I can picture the faces of Taylor Swift and Justin Bieber but cannot name any of their songs, and I'm sure I've heard Mariah Carey somewhere, since that name has been around longer than Rihanna. I do have a song or two by Ariana Grande in my playlist, though.

Edit: Ok, I've finally googled "Coldplay". Yeah, definitely heard "Clocks" somewhere.

[0] https://annas-archive.li/blog/spotify/spotify-top-10k-songs-...

[1] https://www.youtube.com/watch?v=qboe5CebixA


You're a (waaay) outlier.

Are you sure? See, my point is a conjecture (based on the reasonable assumption that I cannot be that special): there must be a lot of us "outliers" out there (so many that I'm not even sure it's reasonable to call us that).

Let's reiterate. I am well aware that more people listen to that Bad Rabbit, Taylor Swift or Justin Bieber than to <random name from my playlist>; it's not really a surprise. There even is a special name for people like that: "celebrity". In fact, that's probably how most people who are into music (including myself, I might say) would categorize them, as "celebrities", not as "musicians" (though, mind you, of course they are musicians, as everyone who ever sang a song is; it's just that when I hear the word "musician" I don't necessarily think of Taylor Swift). Hence such people excuse themselves for not knowing who these stars are by explaining that "they are not into celebrities".

And it's no surprise that a lot of people listen to celebrities. I mean, if Trump released a song right now, it would become #1 on Spotify in no time (for a very short time, but still). Well, maybe not #1, but close.

But I also suppose there are a lot of people who are into music. Maybe not as many as there are people who are into celebrities, but still a lot. And after seeing that top 10 000, I suddenly find it very plausible that a lot of tracks these people call "massive hits" may turn out to be "virtually unplayed", and hence not among the "97% of Spotify (by # of plays)" that AA archived. I am not even claiming it; I'm just saying that it doesn't seem impossible.

For instance, any DnB fan would say that "everyone knows Noisia and Black Sun Empire". It would be an absolutely laughable attempt at "preserving human culture" not to include them. Surely all of their tracks must be at least in the top 5M, right? Well, after seeing the top 10K I'm not so sure anymore.

Maybe you've never heard of them, but surely you've heard of Prodigy. Not a single Prodigy track is in the top 10K. Or the Chemical Brothers. Or Burial, or Placebo, or Nightwish, or King Crimson. These are very famous names in their respective circles. There are 2 tracks from Massive Attack, both featured in super-famous movies and trending on TikTok right now. For God's sake, there are only 8 tracks from Madonna in the top 10K, versus 26 from Imagine Dragons and 124 from "Bad Bunny", whatever that is. How do you like Madonna as an obscure artist?

So, my point is that there may be a lot of people listening almost exclusively to "virtually unplayed" music. Entire discographies of (niche) cult-artists may turn out to be buried in these 66% of "virtually unplayed" tracks.

I guess I should just get the metadata and check, but I'm pretty sure that would be beyond the capabilities of the hardware I have on hand, so I'm not sure how to go about it.


The metadata torrent is only ~200GB, which should be well within your capabilities.

https://annas-archive.li/torrents/spotify

Anyway, I think you should keep in mind 2 things:

1) 10,000 tracks really is not a lot. It sounds like a lot, but isn't. My own - relatively small - collection is nearly double that.

2) 10,000 tracks... out of 256,000,000 that AA archived.

I'd be very interested to see some more analysis done on this, particularly as it relates to, say, Last.fm statistics - but I suspect the missing music is not as significant as you think.

In any case, even if every one of those "niche" artists you list are missing from this collection, I don't think it's fair to say it's a "laughable attempt" - it's certainly better than nothing, even if it's not perfect.


The funny thing is, since the advent of streaming I no longer listen to the radio. I listen to new music, but little pop music, and I have never heard a single track from Swift, Bieber, Grande or Sheeran. Coldplay is the only act I like on that list, and the streaming services are pretty good at only playing what I like.

If they were pre-streaming artists I probably would have heard a lot of their catalog because radio played it over and over. Unfortunately you just can’t get away from the Christmas music.


Sure, but I'm sure you've heard of Taylor Swift and Justin Bieber.

Traditional radio mostly sucks, but Soma.fm and KEXP are both great for discovering new music.

Very hard if you have little talent.


> For now this is a torrents-only archive aimed at preservation, but if there is enough interest, we could add downloading of individual files to Anna’s Archive. Please let us know if you’d like this.

If it is torrents only, what difference does unregistering the domain make?


Ideally, if AA doesn't have any public web presence it's a lot harder to publicly disseminate those torrents.

Realistically, it's just a way for someone to say something is being done about this, even if it's not going to actually make a difference.


Establishing a position Anna's opponents may consider an advantage.

And there is a site idea!

Annasopponents.news --> Can inform passersby on anything related to Anna's Archive along with activism related material, how to's and the like.


Yeah, obviously I don't know if it is actually related, but my first thought when I couldn't open it today was "Told you so"...

Spotify was created from a library of pirated music... the irony.

Came here to say that.

A while back, another site started with a pile of pirated music: allofmp3.com. Remember those peeps?

Their business model was to sell music by selling bandwidth. Basically it was all the music you wanted, charged by the size of the download.

Pop titles were $0.10 to $0.25. A whole album at 256 kbps was roughly $3, give or take.
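Since pricing was tied to download size, the bitrate largely determined what an album cost. A rough Python sketch of the size arithmetic for a constant-bitrate file, assuming a hypothetical 45-minute album and ignoring tags and container overhead (the per-megabyte price isn't stated here, so it isn't computed):

```python
def cbr_size_mb(bitrate_kbps, minutes):
    """Approximate size in megabytes (10**6 bytes) of a constant-bitrate stream.

    Ignores container overhead and ID3 tags, so real files are slightly larger.
    """
    bits = bitrate_kbps * 1000 * minutes * 60  # kilobits/s -> total bits over the duration
    return bits / 8 / 1_000_000                # bits -> bytes -> megabytes

print(cbr_size_mb(256, 45))  # 86.4
print(cbr_size_mb(128, 45))  # 43.2
```

Halving the bitrate halves the download, which under size-based pricing also halves the cost.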

What really got me thinking was how great the UX was. At the time, few sites came close.

The end of that site was packaged up with Russia's entry into the WTO.

I seem to remember hearing about huge torrents out there too. The right infohash can point a person to huge archives of various kinds: books, video, academic papers, music, and the WikiLeaks insurance files, which are password protected, as perhaps all of these are.


As someone who grew up poor in an ex-Eastern Bloc country, allofmp3.com was a godsend.

I think he means it would need to be calibrated on Mars, as the exact ground density and composition aren't known from Earth.


Correct. The easiest way to calibrate a GPR is to stick a metal plate in the ground and cover it with a few feet of earth dug up on location. Can’t do that with some awkward rovers and an experimental helicopter.

NASA can do some fancy signal processing to get useful data, but until it's properly calibrated, any interpretation of that data, especially visual, should be taken with a Phobos-sized grain of salt.
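The metal-plate calibration described above boils down to one measurement: a reflector at a known depth gives the wave velocity in the local soil, and that velocity converts every echo's two-way travel time into depth. A minimal Python sketch, assuming a low-loss, non-magnetic medium (v = c / sqrt(eps_r)); the 1.0 m plate depth, the 20 ns echo time, the guessed eps_r = 4 and the function names are all hypothetical:

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def calibrate_velocity(plate_depth_m, two_way_time_s):
    """Radar wave speed in the soil, from an echo off a plate buried at a known depth."""
    return 2.0 * plate_depth_m / two_way_time_s  # the pulse travels down and back

def relative_permittivity(velocity_m_s):
    """Low-loss, non-magnetic approximation: v = c / sqrt(eps_r)."""
    return (C / velocity_m_s) ** 2

def depth_from_time(velocity_m_s, two_way_time_s):
    """Convert an echo's two-way travel time to depth using a given velocity."""
    return velocity_m_s * two_way_time_s / 2.0

# Hypothetical calibration: plate at 1.0 m, echo after 20 ns.
v_true = calibrate_velocity(1.0, 20e-9)   # about 1.0e8 m/s
eps_r = relative_permittivity(v_true)     # about 9.0

# An uncalibrated guess (eps_r = 4) scales every depth by the same factor,
# so the layering is preserved but the depths are wrong.
v_guess = C / math.sqrt(4.0)
for t in (30e-9, 60e-9):
    print(f"{t * 1e9:.0f} ns: calibrated {depth_from_time(v_true, t):.2f} m, "
          f"guessed {depth_from_time(v_guess, t):.2f} m")
```

A wrong velocity only stretches the depth axis uniformly in this idealized model; it does not capture side-lobe clutter from surface objects, which is a separate problem.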


I still don't understand. Even if you are off about density, aren't you studying the differences in density, so that the image you generate would still be showing where those differences are located relative to each other -- even if scale might be somewhat off if you have your base density off? It doesn't seem like it would be abject failure, but more like incrementally less useful. It sounds like you are saying it is almost at abject failure on the scale of usefulness.


Yes and no. The radar isn’t only looking down into the ground. The antenna pattern has side lobes, which can generate large echoes in the radargram, e.g. from rocks on the surface. You only know that there is something at some distance (or rather, time delay).

The useful signal is extremely weak anyway, and clutter from the surface hides it in many cases unless you have really strong scatterers (large and highly reflective) buried in the ground.


Plenty of commercial GPR devices operate on Earth just fine with the explicit goal of detecting changes in the subsurface's dielectric properties. It doesn't matter if you're on Mars or here; GPR works the same way, and I'm pretty sure that the antenna and the signal processing have been designed for the purpose, possibly even more meticulously than the antennas of commercial GPR pushcarts. Your comment makes something simple sound highly involved and problematic.


Your understanding is correct. It's about detecting variations in dielectric properties across layer interfaces. GPR works just fine for that, whether here or on Mars. The other commenter's negativity and theorized worries about side lobes and reflectors are unwarranted.


I know someone who works as a social worker in a homeless shelter.

She is the sweetest young woman in the world.

It is a hellish job. I cannot even imagine her managing belligerent addicts. She would quit in a heartbeat.

These rules are in place for the simple reason that workers demand them.


> Even those of us dabbling in stocks must find this increase mind-boggling

No, those of us invested in tech stocks have also seen large double-digit returns this year alone.


> new ideas for advancing equity and inclusion

It means providing research containing useful narratives that can be used by others to promote DEI political objectives.

For example, in math it means publishing commentary on how math is widely used for racial discrimination and should generally not be trusted. See the popular book "Math Is Racist". Several mathematicians are cited in the book.


YYYY-MM-DD is the international standard more of us should use.


That casual readers know nothing of. I've always used 12-July-2024. It's the only unambiguous date format that takes no thought to understand.


In a perfect world everyone would agree, but regardless of my personal thoughts on ISO 8601 I think pointing to it as a silver bullet kind of misses the point.

Fundamentally, storing and displaying dates serve two completely different purposes, but a format like 3/4/2023 is not suitable for either.


If you display it as 2024-12-03 then you can ignore locale date formatting.
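To make the ambiguity concrete: the same slash-formatted string can name two different days, while ISO 8601 strings stay distinct and also sort chronologically as plain text. A small Python sketch using the standard `datetime` module (the `render` helper is just for illustration):

```python
from datetime import date

def render(d: date) -> dict:
    """Render one date in ISO 8601 and two common slash formats."""
    return {
        "iso": d.isoformat(),          # YYYY-MM-DD, independent of locale
        "us": d.strftime("%m/%d/%Y"),  # MM/DD/YYYY
        "eu": d.strftime("%d/%m/%Y"),  # DD/MM/YYYY
    }

march_12 = date(2024, 3, 12)
december_3 = date(2024, 12, 3)

# The same slash string names two different days depending on the reader's convention...
print(render(march_12)["us"], render(december_3)["eu"])    # 03/12/2024 03/12/2024
# ...while the ISO strings stay distinct.
print(render(march_12)["iso"], render(december_3)["iso"])  # 2024-03-12 2024-12-03

# ISO strings also sort chronologically as plain text.
days = [december_3, date(2023, 7, 12), march_12]
assert sorted(d.isoformat() for d in days) == [d.isoformat() for d in sorted(days)]
```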


> regardless of my personal thoughts on ISO 8601 I think pointing to it as a silver bullet kind of misses the point.

What is your point?

Is it to not use MM/DD/YYYY format and avoid any concrete recommendation for disambiguation?

Forgive my lack of acuity when missing your point. I may have been distracted by the dissonance between your evasive pedantry and your misuse of “ambivalent/ambivalence” when your semantic context calls for “ambiguous/ambiguity”.


A simple Airthings or Awair air quality monitor shows you the VOC level in the home, and it is easy to see how a fan keeps the levels at zero, or how even just opening a window drops the pollutant level to almost zero very quickly after cooking.

I trust my data over internet opinions.


That's why furnaces have a chimney.


As far as I know, they're still bad. Especially when everyone in the neighborhood has one.


Non-Western powers are doing the exact same thing online.

... and have been caught and demonstrated to be doing it explicitly.

Not sure why you focus on the less-demonstrated case of Western powers doing it.


> Non-Western powers are doing the exact same thing online.

Yeah, I'm saying it used to be a clear distinction.

Most of what we were told about living in the USSR or China is true here now.


...lots of things SHOULD be true, but are not in practice.


Emission reductions and/or efficiency improvements are also possible outcomes if the tax is higher than the cost of these changes.


Only if you can finance/afford it. If not, it could even delay change as it reduces available funds.


The target is really industry. If no one has to pay the cost of emissions no one has any incentive to change.

Look to the oil crisis of the 70s for examples: it was a bad time, but the cost of fuel spurred a surge in sales of small cars because fuel efficiency finally mattered. On another front, it spurred the cycling culture of the Netherlands: they didn’t take up cycling and build infrastructure to support it out of altruism; they did it because their fuel supply was nearly cut off entirely.

Innovation only happens when there is a reason to innovate. If carbon emissions don’t cost the emitter anything then there’s no reason to invest in ways to emit less.


If the costs can be passed on, then there is potentially no incentive to innovate either. It might also still be more cost-efficient to reduce costs by innovating less than to actually innovate to reduce costs. Predicting where these things go is quite tricky.


But if the costs can be passed on, then that’s a means for emitters to differentiate on cost. If manufacturer A emits more and passes the tax on to the consumer, manufacturer B can undercut them on price if B is more efficient.

Right now there’s minimal financial benefit to being more carbon efficient. If anything it’s disincentivised because efficiency is itself costly, so it’s cheaper to just emit.
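The undercutting argument is simple arithmetic under one strong assumption: the tax is passed through in full to the price. A Python sketch with made-up costs and emissions (the producers, numbers and helper names are hypothetical), where efficiency makes the cleaner producer more expensive until the tax exceeds a break-even rate:

```python
def unit_price(base_cost, emissions_per_unit, carbon_tax):
    """Price per unit if the producer passes the full tax on to the buyer (assumption)."""
    return base_cost + emissions_per_unit * carbon_tax

def break_even_tax(a, b):
    """Tax per tonne at which producers a and b charge the same price.

    a and b are (base_cost, tonnes_co2_per_unit) tuples; b is the cleaner
    but costlier producer, so b has the higher base cost and lower emissions.
    """
    (base_a, em_a), (base_b, em_b) = a, b
    return (base_b - base_a) / (em_a - em_b)

dirty = (10.0, 2.0)   # hypothetical: cheaper to make, 2 t CO2 per unit
clean = (12.0, 0.5)   # hypothetical: efficiency adds 2.0 per unit, 0.5 t CO2 per unit

print(break_even_tax(dirty, clean))  # about 1.33 per tonne
for tax in (0.0, 2.0):
    print(f"tax {tax}: dirty {unit_price(*dirty, tax)}, clean {unit_price(*clean, tax)}")
```

At zero tax the dirtier producer wins on price, matching the point that it's currently cheaper to just emit; above the break-even rate the cleaner one can undercut. Real markets with switching costs or oligopolies need not follow this neat picture.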


But only if undercutting isn't too expensive/pays back fast enough. The issue is that from a revenue perspective things always look simple, from a profit perspective it gets more complicated, especially if things have switching costs/limited fungibility, are oligopolies and other structured markets. It might work like you outline but I would not be surprised if there are a lot of unexpected or undesirable results, too.

