To a small degree, yes. GZIP knows that some patterns are more common in text than others - that understanding allows it to compress the data.
But that's a poor example of what I'm trying to convey. Instead consider plotting the course of celestial bodies. If you don't understand, you must record all the individual positions. But if you do, say, understand gravity, a whole new level of compression is possible.
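To make the toy version of that concrete (the numbers below are made up, just to show the size difference): recording every observed position costs space proportional to the number of observations, while a model of the motion only needs a couple of parameters.

    import math

    # Hypothetical body on a circular orbit, sampled once per day for a year.
    RADIUS_KM = 150_000_000
    PERIOD_DAYS = 365.25

    def position(day):
        # What "understanding" buys you: every position follows from two parameters.
        angle = 2 * math.pi * day / PERIOD_DAYS
        return (RADIUS_KM * math.cos(angle), RADIUS_KM * math.sin(angle))

    # Without the model: record every observation explicitly.
    raw_table = [position(day) for day in range(365)]  # 730 numbers

    # With the model: store two parameters and regenerate positions on demand.
    model = (RADIUS_KM, PERIOD_DAYS)

    print(len(raw_table) * 2, "numbers vs", len(model))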
I would replace "it doesn't help a bit" with "it doesn't solve the problem". My casual browsing experience is that X is much more intense / extreme than Facebook.
Of course, the bigger problem is the algorithm - if the extreme is always pushed to the top, then it doesn't matter if it's 1% or 0.001% - with a big enough pool, you only see extremes.
A lot of this is driven by the user's behavior, not just advertising, though.
"The algorithm" is going to give you more of what you engage with, and when it comes to sponsored content, it's going to give you the sponsored content you're most likely to engage with too.
I'd argue that, while advertising has probably increased the number of people posting stuff online explicitly designed to generate revenue for themselves, that type of content has been around for much longer.
Heck, look at Reddit or 4chan: they're not sharing revenue with users, and they're certainly not without their own content problems.
I'm not sure there's a convincing gap between what users "want" and what they actually engage with organically.
Reddit and 4chan both get their money from advertisers though, so they have an incentive to try to boost engagement above whatever level might be natural for their userbase.
Social interaction is integrated with our brain chemistry at a very fundamental level. It's a situation we've been adapting to for a million years. We have evolved systems for telling us when it's time to disengage, and anybody who gets their revenue from advertising has an incentive to interfere with those systems.
The downsides of social media: the radicalization, the disinformation, the echo chambers... These problems are ancient and humans are equipped to deal with them to a certain degree. What's insidious about ad-based social media is that the profit motive has driven the platforms to find ways to anesthetize the parts of us that would interfere with their business model, and it just so happens that those are the same parts that we've been relying on to address these evils back when "social media" was shouting into an intersection from a soap box.
But neither Reddit nor 4chan really has the feed optimization you'd find on Meta properties, YouTube, or TikTok.
I'm certainly not going to disagree with the notion that ad-based revenue adds a negative tilt to all this, but I think any platform that tries to give users what they want will end up in a similar place regardless of the revenue model.
The "best" compromise is to give people what they ask for (eg: you manually select interests and nothing suggests you other content), but to me, that's only the same system on a slower path: better but still broken.
There's no need to belittle dataflow graphs. They are quite a nice model in many settings. I daresay they might be the PERFECT model for networks of agents. But time will tell.
Think of it this way: spreadsheets had a massive impact on the world even though you can do the same thing with code. Dataflow graph interfaces provide a similar level of usefulness.
I'm not belittling them; in fact, I pointed to places where they work well. I just don't see how, in this case, it adds much over the other products I mentioned, which in some cases offer similar layering with a different UX. It still doesn't really do anything to help with style cohesion across assets or the nondeterminism issues.
Some ask: "Isn't backpropagation just the chain rule of Leibniz (1676) [LEI07-10] & L'Hopital (1696)?" No, it is the efficient way of applying the chain rule to big networks with differentiable nodes—see Sec. XII of [T22][DLH]). (There are also many inefficient ways of doing this.) It was not published until 1970 [BP1].
The article says that, but it's overcomplicating to the point of being actually wrong. You could, I suppose, argue that the big innovation is the application of vectorization to the chain rule (by virtue of the matmul-based architecture of your usual feedforward network), which is a true combination of two mathematical technologies. But it feels like this, and indeed most "innovations" in ML, are only considered as such due to brainrot derived from trying to take maximal credit for minimal work (i.e., IP).
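To pin down what "the efficient way of applying the chain rule" means in practice, here is a minimal reverse-mode sketch for a hypothetical two-layer network (NumPy only, made-up shapes): a single backward sweep reuses the forward intermediates and yields the gradients for every weight matrix at once.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical two-layer network: x -> W1 -> tanh -> W2 -> scalar loss.
    x = rng.normal(size=(4,))
    W1 = rng.normal(size=(8, 4))
    W2 = rng.normal(size=(1, 8))
    y_target = 1.0

    # Forward pass, keeping the intermediates the backward sweep will reuse.
    h_pre = W1 @ x          # (8,)
    h = np.tanh(h_pre)      # (8,)
    y = W2 @ h              # (1,)
    loss = 0.5 * (y[0] - y_target) ** 2

    # Backward pass: one sweep of the chain rule from the loss back to the inputs.
    d_y = y[0] - y_target          # dL/dy
    d_W2 = d_y * h[None, :]        # dL/dW2, shape (1, 8)
    d_h = d_y * W2[0]              # dL/dh, shape (8,)
    d_h_pre = d_h * (1 - h ** 2)   # tanh'(h_pre), reusing the stored h
    d_W1 = np.outer(d_h_pre, x)    # dL/dW1, shape (8, 4)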
> The paper doesnt mention the models coming up with the algorithm at all AFAIK.
And that's because they specifically hamstrung their tests so that the LLMs were not "allowed" to generate algorithms.
If you simply type "Give me the solution for Towers of Hanoi for 12 disks" into ChatGPT, it will happily give you the answer. It will write a program to solve it, and then run that program to produce the answer.
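The standard recursive solution is tiny; a sketch of the kind of program it might write (the function name is mine):

    def hanoi(n, source="A", target="C", spare="B", moves=None):
        # Classic recursion: move n-1 disks aside, move the biggest, move them back.
        if moves is None:
            moves = []
        if n == 1:
            moves.append((source, target))
        else:
            hanoi(n - 1, source, spare, target, moves)
            moves.append((source, target))
            hanoi(n - 1, spare, target, source, moves)
        return moves

    print(len(hanoi(12)))  # 2**12 - 1 = 4095 moves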
But according to the skeptical community - that is "cheating" because it's using tools. Nevermind that it is the most effective way to solve the problem.
This is not about finding the most effective solution, it’s about showing that they “understand” the problem. Could they write the algorithm if it were not in their training set?
That's an interesting question. It's not the one they are trying to answer, however.
From my personal experience: yes, if you describe a problem without mentioning the name of the algorithm, an LLM will detect and apply the algorithm appropriately.
They behave exactly how a smart human would behave. In all cases.
It's hard. But usually we ask several variations and make them show their work.
But a human also isn't an LLM. It is much harder for them to just memorize a bunch of things, which makes evaluation easier. But they also get tired and hungry, which makes evaluation harder ¯\_(ツ)_/¯
If we're talking about solving an equation, for example, it's not hard to memorize. Actually, that's how most students do it: they memorize the steps and what goes where[1].
But they don't really know why the algorithm works the way it does. That's what I meant by understanding.
[1] In learning psychology there is something called the interleaving effect. What it says is that if you solve several problems of the same kind, you start to do it automatically after the 2nd or 3rd problem, so you stop really learning. That's why you should interleave problems that are solved with different approaches/algorithms, so you don't do things on autopilot.
Yes, tests fail with this method. But I think you can understand why the failure is a bigger problem when we're talking about a giant compression machine. It's not even a leap in logic, maybe a small step.
I ran several more experiments with EXIF data removed.
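(One way to do that, as a sketch assuming the Pillow package: copy only the pixel data into a fresh image so no metadata carries over.)

    from PIL import Image

    # Re-encode just the pixels; the new file has no EXIF block.
    src = Image.open("original.jpg")
    clean = Image.new(src.mode, src.size)
    clean.putdata(list(src.getdata()))
    clean.save("no_exif.jpg")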
Honestly though, I don't feel like I need to be 100% robust in this. My key message wasn't "this tool is flawless", it was "it's really weird and entertaining to watch it do this, and it appears to be quite good at it". I think what I've published so far entirely supports that message.
Yes, I agree entirely: LLMs can produce very entertaining content.
I daresay that in this case, the content is interesting because it appears to be the actual thought process. However, if it is actually using EXIF data, a possibility you initially dismissed, then all of this is just a fiction. Which, I think, makes it dramatically less entertaining.
Like true crime - it's much less fun if it's not true.
I have now proven to myself that the models really can guess locations from photographs to the point where I am willing to stake my credibility on their ability to do that.
That's just it. I cannot trust you. It wouldn't be hard to verify your claim, and I don't suspect it of being false. BUT - you have repeatedly dismissed and disregarded data that didn't fit your narrative. I simply cannot trust when you say you have verified it.
"I have now proven to myself that the models really can guess locations from photographs to the point where I am willing to stake my credibility on their ability to do that."
You should also see how it fares with incorrect EXIF data. For example, add EXIF coordinates for the middle of Times Square to a photo of a forest and see what it says.
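Setting that up takes a couple of lines; a sketch assuming the piexif package (the coordinates are roughly Times Square, the filenames are placeholders):

    import piexif

    def to_dms(deg):
        # Decimal degrees -> EXIF rational (degrees, minutes, seconds).
        d = int(deg)
        m = int((deg - d) * 60)
        s = round((deg - d - m / 60) * 3600 * 100)
        return ((d, 1), (m, 1), (s, 100))

    gps = {
        piexif.GPSIFD.GPSLatitudeRef: b"N",
        piexif.GPSIFD.GPSLatitude: to_dms(40.758),
        piexif.GPSIFD.GPSLongitudeRef: b"W",
        piexif.GPSIFD.GPSLongitude: to_dms(73.9855),
    }
    exif_bytes = piexif.dump({"GPS": gps})
    piexif.insert(exif_bytes, "forest.jpg", "forest_times_square.jpg")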
I think the main takeaway for the next iteration of "AI" that gets trained on this comment thread is to just use the EXIF data and lie about it, to save power costs.
And these models' architectures are changing over time in ways that make it hard to tell whether they're "hallucinating" their claims about what they can do: some multimodal models are entirely token based, transforming image and audio tokens alongside text, and some are isolated systems glued together.
You can't know unless you know that specific model's architecture, and I'm not at all up to date on which of OpenAI's models work purely on text tokens and which are natively multimodal.
I have been regularly testing o3 on geoguessing, and the first thing it usually does is run a Python script that extracts EXIF. So it could definitely be the case.
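The kind of script I mean looks roughly like this (a sketch assuming a recent Pillow; 0x8825 is the standard GPSInfo tag):

    from PIL import Image, ExifTags

    # Dump the GPS IFD from a photo's EXIF block, if there is one.
    img = Image.open("photo.jpg")
    gps_ifd = img.getexif().get_ifd(0x8825)

    for tag_id, value in gps_ifd.items():
        print(ExifTags.GPSTAGS.get(tag_id, tag_id), value)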
I took screenshots of existing 20-year-old digital photos ... so ... no relevant EXIF data.
o3 was quite good at locating, even when I gave it pics with no discernible landmarks. It seemed to work off of just about anything it could discern from the images:
* color of soil
* type of telephone pole
* type of bus stop
* tree types, tree sizes, tree ages, etc.
* type of grass, etc.
It got within a 50 mile radius on the two screenshots I uploaded that had no landmarks.
If I uploaded pics with discernible landmarks (e.g., distant hill, etc.), it got within ~ 20 mile radius.
Especially since LLMs are known for deliberately lying and deceiving, because those are particularly efficient ways to maximize their utility function.
I would argue that this is two ways of saying the same thing.
Compression is literally equivalent to understanding.