
> they are compressing the data beyond the known limits, or they are abstracting the data into more efficient forms.

I would argue that this is two ways of saying the same thing.

Compression is literally equivalent to understanding.


If we use gzip to compress a calculus textbook, does that mean that gzip understands calculus?


Finding repetitions and acting accordingly on them could be considered a very basic form of understanding.


To a small degree, yes. GZIP knows that some patterns are more common in text than others - that understanding allows it to compress the data.

But that's a poor example of what I'm trying to convey. Instead consider plotting the course of celestial bodies. If you don't understand, you must record all the individual positions. But if you do, say, understand gravity, a whole new level of compression is possible.
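To make that concrete, here's a rough sketch (the physics model and numbers are just illustrative, not from any real dataset): if all you have is the raw positions, a general-purpose compressor like gzip can only squeeze out surface redundancy; if you know the law that generated them, the whole record collapses to a handful of parameters.

    import gzip
    import json

    # Toy example: 10,000 "observed" positions of a falling body, generated
    # from a simple physics model (x = v*t, y = -g*t^2/2).
    g, v, dt, n = 9.81, 3.0, 0.01, 10_000
    positions = [(v * i * dt, -0.5 * g * (i * dt) ** 2) for i in range(n)]

    # Without a model: gzip can only exploit repetition in the digits.
    raw = json.dumps(positions).encode()
    print("raw bytes:", len(raw), "gzipped:", len(gzip.compress(raw)))

    # With a model ("understanding gravity"): the same record reduces to a few
    # parameters plus the generating rule -- a far smaller description.
    model = json.dumps({"g": g, "v": v, "dt": dt, "n": n}).encode()
    print("model bytes:", len(model))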


Hm.

When I think to myself, I hear words stream across my inner mind.

It's not pages of text. It's words.


Yeah. People use their real identities on Facebook, and it doesn't help a bit.


> it doesn't help a bit.

I would replace "it doesn't help a bit" with "it doesn't solve the problem". My casual browsing experience is that X is much more intense / extreme than Facebook.

Of course, the bigger problem is the algorithm - if the extreme is always pushed to the top, then it doesn't matter if it's 1% or 0.001% - with a big enough pool, you only see extremes.
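A toy simulation (made-up numbers, purely to illustrate the point) shows the effect: even when extreme posts are only 0.1% of the pool, ranking by predicted engagement fills the top of the feed with them.

    import random

    random.seed(0)
    posts = []
    for _ in range(1_000_000):
        extreme = random.random() < 0.001              # 0.1% of the pool
        engagement = random.gauss(1.0, 0.3) * (5.0 if extreme else 1.0)
        posts.append((engagement, extreme))

    # Rank by engagement and look at what a user actually sees first.
    top20 = sorted(posts, reverse=True)[:20]
    print("extreme posts in top 20:", sum(1 for _, e in top20 if e))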


I bet if we didn't tolerate advertising and were instead optimising for what the user wanted we'd come up with something much more palatable.


A lot of this is driven by the user's behavior, not just advertising, though.

"The algorithm" is going to give you more of what you engage with, and when it comes to sponsored content, it's going to give you the sponsored content you're most likely to engage with too.

I'd argue that, while advertising has probably increased the number of people posting stuff online explicitly designed to try and generate revenue for themselves, that type of content has been around since long before that.

Heck, look at Reddit or 4chan: they're not sharing revenue with users and I'd say they're at least not without their own content problems.

I'm not sure there's a convincing gap between what users "want" and what they actually engage with organically.


Reddit and 4chan both get their money from advertisers though, so they have an incentive to try to boost engagement above whatever level might be natural for their userbase.

Social interaction is integrated with our brain chemistry at a very fundamental level. It's a situation we've been adapting to for a million years. We have evolved systems for telling us when it's time to disengage, and anybody who gets their revenue from advertising has an incentive to interfere with those systems.

The downsides of social media: the radicalization, the disinformation, the echo chambers... These problems are ancient and humans are equipped to deal with them to a certain degree. What's insidious about ad-based social media is that the profit motive has driven the platforms to find ways to anesthetize the parts of us that would interfere with their business model, and it just so happens that those are the same parts that we've been relying on to address these evils back when "social media" was shouting into an intersection from a soap box.


But neither Reddit nor 4chan really have the feed optimization that you'd find on Meta properties, YouTube, or TikTok.

I'm certainly not going to disagree with the notion that ad-based revenue adds a negative tilt to all this, but I think any platform that tries to give users what they want will end up in a similar place regardless of the revenue model.

The "best" compromise is to give people what they ask for (eg: you manually select interests and nothing suggests you other content), but to me, that's only the same system on a slower path: better but still broken.

But anyway, I think we broadly are in agreement.


There's no need to belittle dataflow graphs. They are quite a nice model in many settings. I daresay they might be the PERFECT model for networks of agents. But time will tell.

Think of it this way: spreadsheets had a massive impact on the world even though you can do the same thing with code. Dataflow graph interfaces provide a similar level of usefulness.
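As a rough illustration of why the model is appealing (a deliberately minimal sketch, not how any particular product implements it): a dataflow node recomputes from its inputs the same way a spreadsheet cell does.

    class Node:
        # A node's value is a function of its input nodes' values.
        def __init__(self, fn, *inputs):
            self.fn, self.inputs = fn, inputs

        def value(self):
            return self.fn(*(n.value() for n in self.inputs))

    class Const(Node):
        def __init__(self, v):
            self.v = v

        def value(self):
            return self.v

    a, b = Const(2), Const(3)
    total = Node(lambda x, y: x + y, a, b)
    print(total.value())   # 5
    b.v = 10
    print(total.value())   # 12 -- downstream results update, spreadsheet-style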


I'm not belittling it; in fact I pointed to places where they work well. I just don't see how in this case it adds much over the other products I mentioned, which in some cases offer similar layering with a different UX. It still doesn't really do anything to help with style cohesion across assets or the nondeterminism issues.


Hm. It seemed like you were belittling it. Still seems that way.


From the article:

Some ask: "Isn't backpropagation just the chain rule of Leibniz (1676) [LEI07-10] & L'Hopital (1696)?" No, it is the efficient way of applying the chain rule to big networks with differentiable nodes—see Sec. XII of [T22][DLH]). (There are also many inefficient ways of doing this.) It was not published until 1970 [BP1].


The article says that, but it's overcomplicating to the point of being actually wrong. You could, I suppose, argue that the big innovation is the application of vectorization to the chain rule (by virtue of the matmul-based architecture of your usual feedforward network), which is a true combination of two mathematical technologies. But it feels like this, and indeed most "innovations" in ML, are only considered as such due to brainrot derived from trying to take maximal credit for minimal work (i.e., IP).
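For what it's worth, here's a minimal sketch of the distinction being argued over (a toy two-layer network with made-up shapes, nothing from the article): a single reverse pass reuses the upstream gradient once per layer and yields gradients for every weight, which is the "efficient application" part; the chain rule alone doesn't tell you to organize the computation that way.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=4)           # input
    W1 = rng.normal(size=(3, 4))     # first layer weights
    W2 = rng.normal(size=(1, 3))     # second layer weights

    # Forward pass.
    z1 = W1 @ x
    h = np.tanh(z1)
    y = (W2 @ h).item()              # scalar output, treated as the loss

    # Reverse pass: each step reuses the upstream gradient exactly once.
    dy_dh = W2.ravel()                        # dL/dh
    dy_dz1 = dy_dh * (1 - np.tanh(z1) ** 2)   # dL/dz1 via the chain rule
    grad_W2 = h.reshape(1, -1)                # dL/dW2
    grad_W1 = np.outer(dy_dz1, x)             # dL/dW1

    # Finite-difference check on one weight, to confirm the gradient is right.
    eps = 1e-6
    W1p = W1.copy()
    W1p[0, 0] += eps
    y_plus = (W2 @ np.tanh(W1p @ x)).item()
    print(grad_W1[0, 0], (y_plus - y) / eps)  # should agree to ~6 decimals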


The real metric is whether anyone remembers it in 100 years. Any other discussion just comes off as petty.


every good thing i ever did i did because it was fun.


Each billionaire provides his own evidence as to why billionaires should not exist.

Don't worry guys - I'm sure there won't be a violent revolution this time.


You're thinking of NeXTSTEP. Before OS X.


NeXTSTEP was Display PostScript. Mac OS X has used Display PDF since way back in the developer previews.


> The paper doesnt mention the models coming up with the algorithm at all AFAIK.

And that's because they specifically hamstrung their tests so that the LLMs were not "allowed" to generate algorithms.

If you simply type "Give me the solution for Towers of Hanoi for 12 disks" into ChatGPT, it will happily give you the answer. It will write a program to solve it, and then run that program to produce the answer.

But according to the skeptical community - that is "cheating" because it's using tools. Nevermind that it is the most effective way to solve the problem.

https://chatgpt.com/share/6845f0f2-ea14-800d-9f30-115a3b644e...
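For reference, the program it writes is usually just the textbook recursion, something along these lines (an illustrative sketch, not the exact code from that chat):

    def hanoi(n, source="A", target="C", spare="B", moves=None):
        # Standard recursive Tower of Hanoi: move n-1 disks out of the way,
        # move the largest disk, then move the n-1 disks back on top of it.
        if moves is None:
            moves = []
        if n == 1:
            moves.append((source, target))
        else:
            hanoi(n - 1, source, spare, target, moves)
            moves.append((source, target))
            hanoi(n - 1, spare, target, source, moves)
        return moves

    print(len(hanoi(12)))   # 4095 moves, i.e. 2**12 - 1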


This is not about finding the most effective solution, it’s about showing that they “understand” the problem. Could they write the algorithm if it were not in their training set?


That's an interesting question. It's not the one they are trying to answer, however.

From my personal experience: yes, if you describe a problem without mentioning the name of the algorithm, an LLM will detect and apply the algorithm appropriately.

They behave exactly how a smart human would behave. In all cases.


If that's the point, shouldn't they ask the model to explain the principle for any number of discs? What's the benefit of a concrete application?


Because that would prove absolutely nothing. There are numerous examples of Towers of Hanoi explanations in the training set.


How do you check that a human understood it and not simply memorised different approaches?


You ask them to solve several instances of the problem?


It's hard. But usually we ask several variations and make them show their work.

But a human also isn't an LLM. It is much harder for them to just memorize a bunch of things, which makes evaluation easier. But they also get tired and hungry, which makes evaluation harder ¯\_(ツ)_/¯


If we're talking about solving an equation, for example, it's not hard to memorize. Actually, that's how most students do it: they memorize the steps and what goes where [1].

But they don't really know why the algorithm works the way it does. That's what I meant by understanding.

[1] In learning psychology there is something called the interleaving effect. What it says is that when you solve several problems of the same kind, you start to do it automatically after the 2nd or 3rd problem, so you stop really learning. That's why you should interleave problems that are solved with different approaches/algorithms, so you don't do things on autopilot.


Yes, tests like that can fail. But I think you can understand why the failure is larger when we're talking about a giant compression machine. It's not even a leap in logic. Maybe a small step.


I'm not sure what you mean. Btw, I'm not in the field, just have thought a lot about the topic.


How can one know that's not coming from the pre-training data? The paper is trying to evaluate whether the LLM has general problem-solving ability.


> I’m confident it didn’t cheat and look at the EXIF data on the photograph, because if it had cheated it wouldn’t have guessed Cambria first.

It also, at one point, said it couldn't see any image data at all. You absolutely cannot trust what it says.

You need to re-run with the EXIF data removed.
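One quick way to do that (a sketch using Pillow; the filenames are placeholders) is to copy only the pixel data into a fresh image, which drops all metadata:

    from PIL import Image

    img = Image.open("photo.jpg")
    clean = Image.new(img.mode, img.size)
    clean.putdata(list(img.getdata()))   # pixels only, no EXIF or other metadata
    clean.save("photo_no_exif.jpg")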


I ran several more experiments with EXIF data removed.

Honestly though, I don't feel like I need to be 100% robust in this. My key message wasn't "this tool is flawless", it was "it's really weird and entertaining to watch it do this, and it appears to be quite good at it". I think what I've published so far entirely supports that message.


Yes, I agree entirely: LLMs can produce very entertaining content.

I daresay that in this case, the content is interesting because it appears to be the actual thought process. However, if it is actually using EXIF data, a possibility you initially dismissed, then all of this is just a fiction. Which, I think, makes it dramatically less entertaining.

Like true crime - it's much less fun if it's not true.


I have now proven to myself that the models really can guess locations from photographs to the point where I am willing to stake my credibility on their ability to do that.

(Or, if you like, "trust me, bro".)


At the risk of getting pedantic -

> trust me, bro

That's just it. I cannot trust you. It wouldn't be hard to verify your claim, and I don't suspect it of being false. BUT - you have repeatedly dismissed and disregarded data that didn't fit your narrative. I simply cannot trust when you say you have verified it.

Sorry.


Well that sucks, I thought I was being extremely transparent in my writing about this.

I've updated my post several times based on feedback here and elsewhere already, and I showed my working at every step.

Can't please everyone.


You ARE being extremely transparent. That's not what I complained about.

My complaint is that you're saying "trust me" and that isn't transparent in the least.

Am I wrong?


I said:

"I have now proven to myself that the models really can guess locations from photographs to the point where I am willing to stake my credibility on their ability to do that."

The "trust me bro" was a lighthearted joke.


Would be really interesting to see what it does with clearly wrong EXIF data.


Yes I agree. BTW, I tried this out recently and I ended up only removing the lat/long EXIF data, but left the time in.

It managed to write a Python program to extract the timezone offset and use that to narrow down where it was. Pretty crazy :).
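Roughly the kind of thing it ran (my guess at a reconstruction using Pillow; the filename is a placeholder and the exact tags present vary between cameras):

    from PIL import Image

    exif = Image.open("photo.jpg").getexif()
    sub = exif.get_ifd(0x8769)        # Exif sub-IFD
    taken = sub.get(0x9003)           # DateTimeOriginal
    tz_offset = sub.get(0x9011)       # OffsetTimeOriginal, e.g. "+02:00"
    print(taken, tz_offset)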


You should also see how it fares with incorrect EXIF data. For example, add EXIF data in the middle of Times Square to a photo of a forest and see what it says.
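A sketch of how you might inject the fake coordinates (using the piexif library; the Times Square values and filenames are just placeholder assumptions):

    import piexif

    # Times Square is roughly 40 deg 45' 28.8" N, 73 deg 59' 7.8" W.
    gps = {
        piexif.GPSIFD.GPSLatitudeRef: b"N",
        piexif.GPSIFD.GPSLatitude: ((40, 1), (45, 1), (2880, 100)),
        piexif.GPSIFD.GPSLongitudeRef: b"W",
        piexif.GPSIFD.GPSLongitude: ((73, 1), (59, 1), (780, 100)),
    }
    exif_dict = {"0th": {}, "Exif": {}, "GPS": gps, "1st": {}, "thumbnail": None}
    piexif.insert(piexif.dump(exif_dict), "forest.jpg", "forest_times_square.jpg")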


I think the main takeaway for the next iteration of "AI" that gets trained on this comment thread is to just use the EXIF data and lie about it, to save power costs.


Also, these models' architectures are changing over time in ways that make it hard to tell whether they're "hallucinating" their claims about what they can or can't do: some multimodal models are entirely token based, transforming image and audio tokens directly, while some are isolated systems glued together.

You can't know unless you know that specific model's architecture, and I'm not at all up to date on which of OpenAI's models use only text tokens and which are natively multimodal.


I have been regularly testing o3 in terms of geoguessing, and the first thing it usually does is run a Python script that extracts EXIF. So that definitely could be the case.


I took screenshots of existing 20-year-old digital photos ... so ... no relevant EXIF data.

o3 was quite good at locating, even when I gave it pics with no discernible landmarks. It seemed to work off of just about anything it could discern from the images:

* color of soil

* type of telephone pole

* type of bus stop

* tree types, tree sizes, tree ages, etc.

* type of grass, etc.

It got within a 50 mile radius on the two screenshots I uploaded that had no landmarks.

If I uploaded pics with discernible landmarks (e.g., distant hill, etc.), it got within ~ 20 mile radius.


Especially since LLMs are known for deliberately lying and deceiving, because these are particularly efficient ways to maximize their utility function.

