> As a single human, you don't notice, as the training material is greater than everything we could ever learn.
This bias is real. Current-gen AI works proportionally better the more familiar the topic is: the more training data, the better the performance. When we ask something very specific, we have the impression that it’s niche. But there is plenty of training data on many niche topics too, which essentially enhances the magic trick – it looks like sophisticated reasoning. Whenever you truly go off the beaten path, you get responses that (a) are nonsensical (illogical) and (b) pull you back towards a “mainstream center point”, so to speak. Anecdotally, of course.
I’ve noticed this with software architecture discussions. I would have some pretty standard thing (like session-based auth) but with a specific and unusual requirement (like hybrid device- and user identity), and it happily spits out good-sounding but nonsensical ideas. Combining and interpolating entirely in the linguistic domain is clearly powerful, but ultimately not enough.
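To make the requirement concrete: by “hybrid device- and user identity” I mean a session that is bound to a (user, device) pair rather than the user alone. A minimal sketch of that idea, with all names hypothetical (not from any real framework):

```python
import secrets
from dataclasses import dataclass


@dataclass
class Session:
    token: str
    user_id: str
    device_id: str


class SessionStore:
    """Sessions are minted and validated against BOTH identities."""

    def __init__(self) -> None:
        self._sessions: dict[str, Session] = {}
        self._registered_devices: set[str] = set()

    def register_device(self, device_id: str) -> None:
        self._registered_devices.add(device_id)

    def create(self, user_id: str, device_id: str) -> str:
        # Refuse to mint a session for an unknown device: the session
        # identity is the (user, device) pair, not the user alone.
        if device_id not in self._registered_devices:
            raise PermissionError("unregistered device")
        token = secrets.token_urlsafe(32)
        self._sessions[token] = Session(token, user_id, device_id)
        return token

    def validate(self, token: str, device_id: str) -> bool:
        # A stolen token presented from a different device fails.
        s = self._sessions.get(token)
        return s is not None and s.device_id == device_id
```

The point isn’t the code itself but that the design question (where does device identity live relative to the session?) is exactly the kind of off-center detail where the model’s answers drifted back towards plain user-only sessions.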