It's always sampling from an existing distribution of relevant data points -- that's necessarily how it's working.
If you want to claim the sample set is only mildly similar to exam questions -- so be it; that may be true.
Or if you want to claim that its sampling method is attentive to structural associations in its sample set, so that it's not lifting from "identical distributions" -- so be it.
So long as those "structural associations" are givens, and the data "givens", the process is just sampling from a domain of human effort without expending any effort of a similar kind.
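To be concrete about what "sampling from a distribution" means mechanically: at each step a language model scores every token in its vocabulary and draws one from the resulting probability distribution. A minimal sketch, with a made-up vocabulary and made-up scores purely for illustration:

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw scores into a probability distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(vocab, logits, temperature=1.0):
    """Draw one token from the model's output distribution."""
    probs = softmax(logits, temperature)
    return random.choices(vocab, weights=probs, k=1)[0]

# Illustrative (made-up) vocabulary and scores:
vocab = ["cat", "dog", "fish"]
logits = [2.0, 1.0, 0.1]
print(sample_token(vocab, logits))  # most often "cat"
```

The disagreement below is not about this sampling step itself, but about whether the learned scores amount to more than a restatement of the training data.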
If there had been no internet, ChatGPT would be mute -- because it has no capacity to generate data; it does not develop actual conceptualisations of the world -- it samples from the data shadows created by people.
Producing useful data requires expending tremendous effort -- growing an animal that can cope with the world. It is this effort which is being laundered, unpaid and unacknowledged, through LLMs.
Meanwhile, star-trek-huffing loons claim this stuff is doing the opposite -- an ideological delusion which benefits all those whose bank accounts are increased by the lie that "ChatGPT wrote this".
If we were prepared to price the data commons created over the last 20-30 years of the internet, by everyone, it's not hard to imagine that training ChatGPT would cost a trillion dollars.
How much labour went into creating that digital resource, and by how many people?
> It's always sampling from an existing distribution of relevant data points -- that's necessarily how it's working.
I work in this field, and "sampling from an existing distribution of relevant data points" is just wrong; you have no way to say that is "necessarily how it's working" a priori in a world where implicit regularization exists.
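To illustrate what implicit regularization refers to, a standard textbook-style sketch (with made-up numbers, not a claim about any particular model): gradient descent on an underdetermined least-squares problem, started from zero, converges to the minimum-norm solution among the infinitely many that fit the data exactly -- with no explicit penalty term anywhere in the loss. The optimizer's dynamics, not the data alone, pick the solution:

```python
# Underdetermined system: 2 data points, 3 weights,
# so infinitely many w satisfy Xw = y exactly.
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 2.0]

w = [0.0, 0.0, 0.0]  # starting at zero biases GD toward minimum norm
lr = 0.1
for _ in range(5000):
    # residuals r = Xw - y
    r = [sum(Xi[j] * w[j] for j in range(3)) - yi
         for Xi, yi in zip(X, y)]
    # gradient of 0.5 * ||Xw - y||^2 is X^T r
    grad = [sum(X[i][j] * r[i] for i in range(2)) for j in range(3)]
    w = [wj - lr * g for wj, g in zip(w, grad)]

# w converges to [0.0, 1.0, 1.0]: the minimum-norm interpolant
# X^T (X X^T)^{-1} y, selected without any explicit regularizer.
print(w)
```

The point of the analogy is that which function a trained model represents is not determined by the training set alone, which is why "it is necessarily sampling the existing distribution" cannot be asserted a priori.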
I'm not going to engage with the labour-theory-of-value bit, because I think it is not particularly relevant to the disagreement I raised, and not one with a "right" answer.
lol, i am not making an argument premised on the labour theory of value -- chatgpt is proof against that theory: $20/mo for the labour of two generations
it is the prerogative of states to make redress when labour is severely underpriced due to the falsity of this theory of value -- and had they done so, chatgpt would be exposed for what it is
regularisation is such a horrifyingly revealing term
reality isn't a regularisation of its measures -- the meaning of words is not a regularisation of their structure
such obscene statistical terminology should be obviously disqualifying here
our knowledge of the world isn't a statistical regularisation of associations
that very framing exposes how deficient this line is
animals grow representations -- they do not regularise text token patterns
But honestly, if you don't get it now, I can't hope to convince.