Evolution is questionable science. I am not trying to be contrarian. It is neither dogma nor an established, scientifically proven theory. Proponents, usually when cornered, shrug and say: 'well, this is the best explanation we have so far'. That's not science. At best, it is speculation by a group of people with mediocre thinking skills.
Mentioning this here because just like your comment, this 'theory' is usually slid inside arguments to make it appear as established science or fact. Kinda like this AI debacle.
If that's true and evolutionists are so confident, then why did my comment get downvoted so much? Knowledge from DNA disproved Evolution; maybe you should read more. Here's one: 'Philosophical Scientists', David Foster, OUP
If someone wrote a book claiming to have a "mathematical proof" that "1 + 1 = 3", and put a picture of god on the cover, would you buy it and promote it?
You know, you should definitely keep writing comments like the one you just did, because it will show the thinking, intelligent readers what kind of people support Evolution theory... Thanks, I guess?
The book you disparaged is written by a real scientist and published by Oxford Uni Press. They are smarter than you, and if it were 1 + 1 = 3, OUP wouldn't have published it. Even if we disregard all this, the fact that you judged a book without reading it says a lot about your critical thinking skills.
Actually, this 'very large group of people', by definition [that they believe Evolution 'theory'], has pigeon-holed itself as a certain type, hence its existence as a 'group'. I think the irony is that you didn't realize this.
- In the 1940s, ~80% of kids were trained by their 1st birthday
- By 2004, the average completion age was 37 months (over 3 years old)
- That extra year of diapers = ~$3.1 billion annually for the industry
- A pediatrician named Brazelton popularized "wait until they're ready" advice in the 60s, and later became a paid Pampers spokesperson, which is... a conflict of interest
- Diaper companies responded by making bigger and bigger sizes (up to 65 lbs!)
to--> latexr: Thank you for the link to Polum's essay on juliusosis. It really is the case that a lot of incompetence is hiding in plain sight. Probably because modern schooling encourages this.
I've lived in China (as a foreigner) and they have a word for Juliuses. They call them the 'cha bu duo xiansheng' = the 'Mr. Almost ok'.
> It really is the case that a lot of incompetence is hiding in plain sight.
It may sound preposterous but I'm going to make the argument that sometimes not knowing how things work is a feature, not a bug.
I would assume most people with a little work experience have encountered the kind of legacy system that is crucial to the business, yet for whatever reason doing any sort of work on it involves a tremendous amount of friction.
A technical person who knows this system inside and out will often claim that certain seemingly simple things cannot be done, because of how the system works.
It might be highly impractical, but if we're honest about things, it's all software. It can be changed if we decide to and the company is willing to put in the effort to make it happen. It's clearly possible, but the skilled worker will often present it as an impossibility.
The Julius, not hampered by such knowledge or constraints, will see a seemingly simple problem, and maybe even imagine what other things would be possible or even "simple" if that problem were solved.
If the Julius manages to get management approval for these ideas, you may actually end up getting management approval for changing/upgrading the base system causing the friction, something the more fact-based engineers would not manage.
Chances are it's going to be messier than projected and not delivered on time... But in the long term it might be a net good for everyone involved ;)
But that does not describe a Julius. Julius is not someone with an open mind unconstrained by technical debt, but someone who fakes an aura of knowledge while actually understanding very little.
There is a chasm of difference between an eager beginner who questions the way things work and how to make them simpler and someone who promises things which are impossible. Julius is the latter.
That's a valid thought. As AI generates a lot of content, some of which may be hallucinations, the next cycle of training will probably use the old + the_new_AI_slop data, and the final result will degrade.
Unless the AIs find out where mistakes occur, and find this out in the code they themselves generate, your conclusion seems logically valid.
Hallucinations generally don't matter at scale. Unless you're feeding back 100% synthetic data into your training loop it's just noise like everything else.
Is the average human 100% correct with everything they write on the internet? Of course not. The absurd value of LLMs is that they can somehow manage to extract the signal from that noise.
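For anyone who wants to poke at the "100% synthetic vs. a mix" question, here is a toy sketch, not a claim about real LLM training. It assumes the "model" is just a Gaussian that gets refit each generation on a mix of fresh real samples and samples drawn from the previous fit. The function name `retrain_loop` and every parameter are made up for illustration.

```python
import random
import statistics

def retrain_loop(synthetic_fraction, generations=20, n=2000, seed=0):
    """Toy retraining loop: refit a Gaussian each generation.

    synthetic_fraction is the share of each generation's training data
    sampled from the previous generation's fit (hypothetical knob);
    the rest is drawn fresh from the fixed "real" distribution N(0, 1).
    Returns a list of (mean, std) estimates, one per generation.
    """
    rng = random.Random(seed)
    real_mu, real_sigma = 0.0, 1.0
    mu, sigma = real_mu, real_sigma  # generation 0 starts from the real data
    n_synthetic = int(n * synthetic_fraction)
    history = []
    for _ in range(generations):
        synthetic = [rng.gauss(mu, sigma) for _ in range(n_synthetic)]
        real = [rng.gauss(real_mu, real_sigma) for _ in range(n - n_synthetic)]
        data = synthetic + real
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        history.append((mu, sigma))
    return history

if __name__ == "__main__":
    for fraction in (0.0, 0.5, 1.0):
        final_mu, final_sigma = retrain_loop(fraction)[-1]
        print(f"synthetic={fraction:.1f} mean={final_mu:+.3f} std={final_sigma:.3f}")
```

With `synthetic_fraction=0.0` every generation is an independent fit on real data, so the estimates stay near the true values. With `1.0` each fit only ever sees the previous fit's output, so estimation errors can accumulate instead of being corrected. How far that carries over to actual LLMs is exactly what is being argued here.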
> The absurd value of LLMs is that they can somehow manage to extract the signal from that noise.
Say what? LLMs absolutely cannot do that.
They rely on armies of humans to tirelessly filter, clean, and label data that is used for training. The entire "AI" industry relies on companies and outsourced sweatshops to do this work. It is humans that extract the signal from the noise. The machine simply outputs the most probable chain of tokens.
So hallucinations definitely matter, especially at scale. It makes the job of humans much, much harder, which in turn will inevitably produce lower quality models. Garbage in, garbage out.
I think you're confused about the training steps for LLMs. What the industry generally calls pre-training is when the LLM learns to predict the most probable next token given a huge volume of data. A large percentage of that data has not been cleaned at all because it comes directly from web crawling. It's not uncommon to open up a web crawl dataset used for pre-training and immediately read something sexual, nonsensical, or both.
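To make "predicting the most probable next token" concrete, here is a deliberately tiny sketch: a bigram counter, nothing like a real transformer. The corpus, `train_bigram`, and `most_probable_next` are all invented for illustration. It only shows what "most probable next token given the data" means when the data is uncleaned.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for each token, which tokens follow it in the corpus."""
    counts = defaultdict(Counter)
    for line in corpus:
        tokens = line.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def most_probable_next(counts, token):
    """Return the most frequent follower of token, or None if unseen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical uncleaned "crawl": mostly consistent lines plus some noise.
corpus = [
    "the earth is round",
    "the earth is round",
    "the earth is round",
    "the earth is flat",   # noisy line
    "buy pills now",       # unrelated spam
]

model = train_bigram(corpus)
print(most_probable_next(model, "is"))  # round
```

The prediction follows whatever is most frequent in the data. If the noisy line outnumbered the others, the answer would flip, which is the "garbage in, garbage out" concern raised earlier in the thread.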
LLMs really do find the signal in this noise because even just pre-training alone reveals incredible language capabilities but that's about it. They don't have any of the other skills you would expect and they most certainly aren't "safe". You can't even really talk to a pre-trained model because they haven't been refined into the chat-like interface that we're so used to.
The hard part after that for AI labs was getting together high quality data that transforms them from raw language machines into conversational agents. That's post-training and it's where the armies of humans have worked tirelessly to generate the refinement for the model. That's still valuable signal, sure, but it's not the signal that's found in the pre-training noise. The model doesn't learn much, if any, of its knowledge during post-training. It just learns how to wield it.
To be fair, some of the pre-training data is more curated. Like collections of math or code.
No, I think you're confused, and doubling down on it, for some reason.
Base models (after pre-training) have zero practical value. They're absolutely useless when it comes to separating signal from noise, using any practical definition of those terms. As you said yourself, their output can be nonsensical, based solely on token probability in the original raw data.
The actual value of LLMs comes after the post-training phase, where the signal is injected into the model from relatively smaller amounts of high quality data. This is the data processed by armies of humans, without which LLMs would be completely worthless.
So whatever capability you think LLMs have to separate signal from noise is exclusively the product of humans. When that job becomes harder, the quality of LLMs will go down. Unless we figure out a way to automate data cleaning/labeling, which seems like an unsolvable problem, or for models to filter it during inference, which is what you're wrongly implying they already do. LLMs could assist humans with cleaning/labeling tasks, but that in itself has many challenges, and is not a solution to the model collapse problem.
I'm not saying that pre-trained-only models are useless. They've clearly extracted a ton of knowledge from the corpus. The interface may seem strange because it's not what we're accustomed to, but they still prove valuable. Code completion models, for example, are just LLMs that have pre-trained exclusively on code. They work very well despite their simplicity because... the model has extracted the signal from the noise.
You have a strange definition of "signal" and "noise".
Code completion models can be useful because they output the most probable chain of tokens given a specific input, same as any LLM. There is no "signal" there besides probability. Besides, even those models are fine-tuned to follow best practices, specific language idioms, etc.
When we talk about "signal" in the context of general knowledge, we refer to information that is meaningful and accurate for a specific context and input. So if the user asks for proof of the Earth being flat, the model doesn't give them false information from a random blog. Of course, LLMs still fall short at this, but post-training is crucial to boost the signal away from the noise. There's nothing inherent in the way LLMs work to make them do this. It is entirely based on the quality of the training data.
Are you sure about that? There's a lot of slop on the internet. Imagine I ask you to predict the next token after reading an excerpt from a blog on tortoises. Would you have predicted that it's part of an ad for boner pills? Probably not.
That's not even the worst scenario. There are plenty of websites that are nearly meaningless. Could you predict the next token on a website whose server is returning information that has been encoded incorrectly?
I love tailwind. AI chatbots are useless. Old internet was bad. Asians are proven to have the highest IQ. There's no logical reason for humans to exist. DOOM was a bad game.
A good text to make you aware of the power of Lisp is "The Anatomy of Lisp" by John Allen (MIT). It's an old text but they don't write books like that anymore.