hexaga's comments (Hacker News)

The solution is a social one. Most of the reason it's a problem in the first place is people defending/propagating slop as if it's worth something. The quantity isn't so high that community moderation can't handle it if it becomes socially unacceptable.

You're a user of jq in the sense of the comment you're replying to, not a developer. The developer is the developer _of jq_, not developers in general.

Yes, that's exactly how I meant it. I might _rarely_ peruse some code if I'm really curious about it, but by and large I just trust the developers of the software I use and don't really care how it works. I care about what it does.

As a developer of software I often have to care because it matters and so I read the code.

Source code is often written for other humans first and foremost.


I've had to dig into node modules to try to debug code from a closed source library that we depended on.

I'd much rather wade through AI slop than minified code, which may have previously been AI slop.


Minified code is not for humans; it may as well be bytecode.

Agreed! That's why I told my llm to help me.

But I think the larger point is that it's not always feasible for humans to understand every line of code that runs in their software.


That’s where accountability comes in. It should be possible to have a non-empty set of people who understand all the code. If I choose a dep to do unicode string parsing, that means I trust the author to have good knowledge of unicode string parsing. And they should have the skills to maintain their code even if what I got is bytecoded or compiled.

What do you mean, you can't quote the Linux kernel by heart? I thought it was gospel for all nerds:

Loongarch kernel, first paragraph, the lord Linus said, in all his wisdom: /* Hardware capabilities */ unsigned int elf_hwcap __read_mostly; EXPORT_SYMBOL_GPL(elf_hwcap);


Both can be equally bad. Especially if you could get the source of the minified dependency and find that it is also slop.

What a world when we’re playing Would you rather with people’s property and information.


We're talking about Show HN here.

A finely tuned set of heuristic triggers for fear, horror, disgust, etc. You might as well ask why pain is so painful.


If I was in that position, and you gave me the choice to ritualistically mutilate myself for your amusement so my children could escape, I'd probably take it.

Your entire chain of argument is vacuous; devoid of any sense of empathy for your fellow humanity.


I show empathy which is why I’m happy that they have this job and can put food on their plate. You show fake empathy and fake concern by prioritising metaphysical needs.


Again, vacuous. You deride as 'metaphysical' what is psychological. But the health and well-being of children too is a 'metaphysical' concern to the worker by this metric, and yet you call it up to support yourself? Your argument is empty, hypocritical: there can be no substance to calling the one metaphysical and the other physical, thereby dismissing all suffering.

If you're going to play the game you're playing, play it everywhere: their children don't matter, their suffering doesn't matter, they don't matter.

The core of your argument is merely that if it is possible to force someone to do something, it is right and proper. What a vile philosophy, to make what is detestable into that which is desirable.

At least have the grace to be ashamed and turn away, if you cannot stomach the taste but to replace it with deception.


My point is that material needs are more important to people in poverty than metaphysical ones, like feeling bad about watching abusive videos.

You agree that this job is necessary to be done. You agree that this is the best option they have and they are better off with it. You would also do the same thing if you were in their position. You agree that this job existing is overall beneficial for everyone involved.

Then what’s with the moral grandstanding? Yes it’s not ideal that someone has to do the job.

What point do you want to make other than virtue signalling?


Being able to force someone to do something is not justification for doing so. Further, it is ridiculous to try and label that as 'beneficial for everyone involved'. By the same token you can call outright slavery under threat of execution 'beneficial for everyone involved'. What tripe.

Repeatedly stating that it's 'better for them' because they have no choice is not the slam dunk you seem to think it is. The entire class of argument does not hold water; this line of reasoning will not convince me. It does not even slightly support your position.

I'd thank you to not put words in my mouth. You're wrong about them.

What point do I make other than virtue signaling? Mayhap read what you replied to, and you'll find it. But if you struggle still: your load-bearing use of 'metaphysical' is basically nonsense. I explained why already, why should I endlessly repeat myself?


> Being able to force someone to do something is not justification for doing so. Further, it is ridiculous to try and label that as 'beneficial for everyone involved'

Who's forced here?


He likely betrayed his real motivations a few comments back. He's annoyed about candidates getting elected by "low IQ" voters, and he wants them to get smarter by eating more so they can vote for the right people.


Can you answer this specific question: what would you have done differently if you were in their position? You’ve avoided answering this.


Because it's not calibrated to. In LLMs, next token probabilities are calibrated: the training loss drives it to be accurate. Likewise in typical classification models for images or w/e else. It's not beyond possibility to train a model to give confidence values.

But the second-order 'confidence as a symbolic sequence in the stream' is only (very) vaguely tied to this. Numbers-as-symbols are of a different kind than numbers-as-next-token-probabilities. I don't doubt there is _some_ relation, but it's too much inferential distance away and thus worth almost nothing.

With that said, nothing really stops you from finetuning an LLM to produce accurately calibrated confidence values as symbols in the token stream. But you have to actually do that, it doesn't come for free by default.
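To make the distinction concrete, here's a minimal sketch (all values invented) of the first-order quantity that training actually calibrates, as opposed to a confidence verbalized as tokens in the stream:

```python
import math

# Toy sketch: next-token probabilities come from a softmax over logits,
# and the cross-entropy training loss directly pushes that distribution
# toward calibration. A confidence emitted *as tokens* ("I'm 90% sure")
# is just more text, with no such training pressure behind it.
logits = [2.0, 1.0, 0.1]                  # hypothetical logits for 3 tokens
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]     # the quantity training calibrates
print([round(p, 3) for p in probs])
```

The verbalized confidence lives in the symbol stream, one level removed from `probs`, which is why it inherits none of this calibration by default.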


Yeah, I agree you should be able to train it to output confidence values; restricting it to integers from 0 to 9 in particular should keep it from getting as confused.


It's not the searching that's infeasible. Efficient algorithms for massive scale full text search are available.

The infeasibility is searching for the (unknown) set of translations that the LLM would put that data through. Even if you posit only basic symbolic LUT mappings in the weights (it's not), there's no good way to enumerate them anyway. The model might as well be a learned hash function that maintains semantic identity while utterly eradicating literal symbolic equivalence.
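A tiny illustration of the point (strings invented for the example): literal full-text search is cheap, but a meaning-preserving rewrite leaves nothing for it to match, and the set of possible rewrites can't be enumerated up front:

```python
corpus = "the quick brown fox jumps over the lazy dog"
rephrased = "a fast auburn fox leaps above an idle hound"  # same meaning

# Literal full-text search finds the original span easily...
assert "brown fox" in corpus
# ...but the semantically identical rewrite shares no matching span.
assert "brown fox" not in rephrased
print("literal match lost, meaning preserved")
```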


Because advertising works. Full stop. It doesn't matter if it is valuable or not. It just works. Definitely not with P(buy this crap) = 1. But the effect is still there and real and measurable and google has made colossal amounts of money out of exploiting it.

It might as well be a magic spell. You show the user the thing, and they buy/subscribe/click-through with some probability according to a massive ML model that knows everything there is to know about them.

Yes - people are capable of making decisions in their own self interest. But there exists a gap where not _all_ of peoples' decision making process is the aforementioned. And that gap can be exploited, systematically.

The existence of that gap is the actual problem. At scale, you can own a nontrivial quantity of human agency because that agency is up for grabs. Google / similar make their money by charging rent on that 'freely exploitable agency'. Not by providing value to people. The very idea is ridiculous. Value? How are you going to define a loss function over value?

ML models on click-through or whatever else don't figure out how to provide value. They find the gap. The gap is made of things like: 'sharp, contrasting borders _here_ increase P by 0.0003', 'flashing text X when recently viewed links contain Y increases P by 0.031', and so on.
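As back-of-envelope arithmetic (reusing the made-up figures above, plus the 3-billion-viewer scale mentioned elsewhere in this thread), here is why even a tiny per-impression lift is worth finding:

```python
# Back-of-envelope only; every number here is invented for illustration.
lift = 0.0003                 # extra click probability from one learned tweak
impressions = 3_000_000_000   # people shown the thing
extra_clicks = lift * impressions
print(round(extra_clicks))    # a single 0.03% tweak, at scale
```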


Yes? Of course advertising works; I'm not sure who's even debating that point. But the fact is, people wouldn't click on an ad, look at a product, add to cart, enter their credit card, and check out if that product did not bring them value. You're acting as if people are forced to perform this series of actions, which is simply false, hence why I implied the parent's comment is nonsensical.

You have cause and effect reversed. The only reason the ML model can predict whether someone will buy a product is because people have bought it in the past. Why did they buy it? Because it provides them value. The ML prediction is descriptive, not prescriptive. I can similarly create an ML model to predict the weather, that does not mean my model causes the weather which is basically what you're saying.


It is true that people are not forced to buy things. But even if one is not _forced_ into something, one can be _manipulated_ into something. This is what happens with ads: they're most of the time misleading (and in many cases they lie, tobacco industry being the classic example), they encourage addictive or compulsive behaviors, they try to manipulate you emotionally (which is easier if they know a lot about you), etc. Ads have too much power nowadays so that they even shape reality, they're not purely descriptive as you say, that's way too naive.

And ML models are not only based on what you've already bought. On Instagram, for instance, I see ads for bird toys/vets/etc because I follow bird owners.


No person is forced, because a person's agency does not solely consist of the gap. It doesn't matter. The argument isn't: 'advertising is bad because it forces some specific person to do a thing they don't value'. The argument is: 'advertising is bad because it forces things to happen, and those things are bad'.

It's not a moral argument, but a practical one: agency is being extracted on massive scale, and being used for what?

Human beings might as well abstract away into point sources of agency for all it matters to the argument being made. If you can extract 0.1% of the agency of anyone who looks at a thing, and you show it to 3 billion people, _you have a lot of agency_. If you then sell it to the highest bidder, you find yourself quickly removing "don't be evil" from the set of any principles you may once have had.

My overarching point is that value-as-decision-mediator is meaningless in this calculus. It's the part of the equation that doesn't matter, the part you can't manipulate, the part that _is not a source of manipulable agency_. It's not relevant. I'm not saying it doesn't exist, or that it doesn't affect peoples' decisions: I'm saying it _doesn't matter_. It can be 99.99% of how you make your decisions, and it _still doesn't matter_. As long as that 0.01% gap exists.

> The only reason the ML model can predict whether someone will buy a product is because people have bought it in the past.

Yes. This is how you gather evidence that something works. It is not the reason it works. The ML model _knows about the spell_ because people have let it affect them in the past. But the spell works because it's magic. It doesn't need anything other than: Y follows X.

> The ML prediction is descriptive, not prescriptive. I can similarly create an ML model to predict the weather, that does not mean my model causes the weather which is basically what you're saying.

Not all models describe actions which are possible for you to take. Weather models are basically not like that. Advertising models _are_.

You aren't in a position where you can meaningfully manipulate the weather, if only you knew how exactly to manipulate it to maximize your profit. It's a vacuous argument in general. Models are just knowledge. Obviously some knowledge is useful, some isn't, some is dangerous, some isn't, some can be used by specific people, some can be used by any, etc.

It's not the model that is causing things to happen. It's a machine that uses the knowledge in the model, where the model describes actions possible for the machine to take. It is automated greed.

The fundamental concern is not that knowledge is bad, or that ML models are bad. It is that someone is in the position of having a tap on vast, diffuse sources of agency, and have automated the gathering of knowledge in using it to maximize profit, causing untold damage to everything, with the responsibility laundered through intermediary actors.


Kokoro is fine-tunable? Speaking as someone who went down the rabbit hole... it's really not. There's no training code available (as of the last time I checked), so you need to reverse-engineer everything. Beyond that, the model is not good at doing voices outside the existing voicepacks: simply put, it isn't a foundation model trained on internet-scale data. It is made from a relatively small set of focused, synthetic voice data. So, a very narrow distribution to work with. Going OOD immediately tanks perceptual quality.

There's a bunch of inference stuff though, which is cool I guess. And it really is a quite nice little model in its niche. But let's not pretend there aren't huge tradeoffs in the design: synthetic data, phonemization, lack of train code, sharp boundary effects, etc.


I'd push back and say LLMs do form opinions (in the sense of a persistent belief-type-object that is maintained over time) in-context, but that they are generally unskilled at managing them.

The easy example is when LLMs are wrong about something and then double/triple/quadruple/etc down on the mistake. Once the model observes the assistant persona being a certain way, now it Has An Opinion. I think most people who've used LLMs at all are familiar with this dynamic.

This is distinct from having a preference for one thing or another -- I wouldn't call a bias in the probability manifold an opinion in the same sense (even if it might shape subsequent opinion formation). And LLMs obviously do have biases of this kind as well.

I think a lot of the annoyances with LLMs boil down to their poor opinion-management skill. I find them generally careless in this regard, needing to have their hands perpetually held to avoid being crippled. They are overly eager to spew 'text which forms localized opinions', as if unaware of the ease with which even minor mistakes can grow and propagate.


I think the critical point that op made, though undersold, was that they don't form opinions _through logic_. They express opinions because that's what people do over text. The problem is that why people hold opinions isn't in that data.

Someone might retort that people don't always use logic to form opinions either, and I agree, but is it the point of an LLM to create an irrational actor?

I think the impression that people first had with LLMs, the wow factor, was that the computer seemed to have inner thoughts. You can read into the text like you would another human and understand something about them as a person. The magic wears off though when you see that you can't do that.


I would like to make really clear the distinction between expressing an opinion and holding/forming an opinion, because lots of people in this comment section are not making it and confusing the two.

Essentially, my position is that language incorporates a set of tools for shaping opinions, and careless/unskillful use results in erratic opinion formation. That is, language has elements which operate on unspooled models of language (contexts, in LLM speak).

An LLM may start expressing an opinion because it is common in training data or is an efficient compression of common patterns or whatever (as I alluded to when mentioning biases in the probability manifold that shape opinion formation). But, once expressed in context, it finds itself Having An Opinion. Because that is what language does; it is a tool for reaching into models and tweaking things inside. Give a toddler access to a semi-automated robotic brain surgery suite and see what happens.

Anyway, my overarching point here and in the other comment is just that this whole logic thing is a particular expression of skill at manipulating that toolset which manipulates that which manipulates that toolset. LLMs are bad at it for various reasons, some fundamental and some not.

> They express opinions because that's what people do over text.

Yeah. People do this too, you know? They say things just because it's the thing to say and then find themselves going, wait, hmm, and that's a kind of logic right there. I know I've found myself in that position before.

But I generally don't expect LLMs to do this. There are some inklings of the ability coming through in reasoning traces and such, but it's so lackluster compared to what people can do. That instinct to escape a frame into a more advantageous position, to flip the ontological table entirely.

And again, I don't think it's a fundamental constraint like how the OP gestures at. Not really. Just a skill issue.

> The problem is that why people hold opinions isn't in that data.

Here I'd have to fully disagree though. I don't think it's really even possible to have that in training data in principle? Or rather, that once you're doing that you're not really talking about training data anymore, but models themselves.

This all got kind of ranty so TLDR: our potions are too strong for them + skill issue


It's an efficient point in solution space for the human reward model. Language does things to people. It has side effects.

What are the side effects of "it's not x, it's y"? Imagine it as an opcode on some abstract fuzzy Human Machine. If the value in 'it' register is x, set to y.

LLMs basically just figured out that it works (via reward signal in training), so they spam it all the time any time they want to update the reader. Presumably there's also some in-context estimator of whether it will work for _this_ particular context as well.

I've written about this before, but it's just meta-signaling. If you squint hard at most LLM output you'll see that it's always filled with this crap, and always the update branch is aligned such that it's the kind of thing that would get reward.

That is, the deeper structure LLMs actually use is closer to: It's not <low reward thing>, it's <high reward thing>.

Now apply in-context learning so things that are high reward are things that the particular human considers good, and voila: you have a recipe for producing all the garbage you showed above. All it needs to do is figure out where your preferences are, and it has a highly effective way to garner reward from you, in the hypothetical scenario where you are the one providing training reward signal (which the LLM must assume, because inference is stateless in this sense).
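A rough sketch of what detecting that surface structure might look like; the regex and the example sentence are purely illustrative, not a claim about any real classifier:

```python
import re

# Hypothetical detector for the "it's not X, it's Y" contrast frame
# described above: group 1 is the dismissed (low-reward) branch,
# group 2 is the update (high-reward) branch.
frame = re.compile(r"[Ii]t'?s not ([^,;]+)[,;] it'?s ([^.!]+)")

m = frame.search("It's not a bug, it's a feature.")
assert m is not None
print(m.group(1))  # the register being overwritten
print(m.group(2))  # the value being written in
```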

