
Not pixels, but percels. Pixels are points in the image, while a "percel" is a unit of perceptual information. It might be a pixel with an associated sound, in a given moment of time. In the case of humans, percels include other senses as well, and they can also be annotated with your own thoughts (i.e. percels can also include tokens or embeddings).

Of course, NNs like LLMs never process a percel in isolation, but always as a group of neighboring percels (aka context), with an initial focus on one of the percels.
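For concreteness, here is a minimal sketch of a percel as a data structure (the channel names and shapes are my own illustration, not anything standard; a missing channel is simply missing information):

    # A hypothetical "percel": one unit of perceptual information.
    from dataclasses import dataclass
    from typing import Optional
    import numpy as np

    @dataclass
    class Percel:
        t: float                               # moment in time
        rgb: Optional[np.ndarray] = None       # pixel value, shape (3,)
        audio: Optional[np.ndarray] = None     # a few audio samples
        thought: Optional[np.ndarray] = None   # token/embedding annotation

    # A context: a group of neighboring percels, with focus on one of them.
    context = [Percel(t=0.0, rgb=np.array([0.1, 0.5, 0.9])) for _ in range(9)]
    focus = context[4]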



I had written up a proposal for a research grant to work on exactly this idea.

It got reviewed by 2 ML scientists and one neuroscientist.

Got totally slammed (and thus rejected) by the ML scientists due to „lack of practical application“ and highly endorsed by the neuroscientist.

There’s so much unused potential in interdisciplinary research but nobody wants to fund it because it doesn’t „fit“ into one of the boxes.


Make sure the ML scientists don't take credit for your work. Sometimes they reject a paper so they can work on it on their own.


Grant reviews are blind - so you don’t know. Also - and even worse - there is no rebuttal process. It gets rejected without you having a chance to clarify / convince reviewers.

Instead you’d need to resubmit and start the entire process from scratch. What a waste of resources …

It’s the final nail that made me quit pursuing a scientific career path, despite having good pubs & a PhD w/ honours.

Unfortunately it’s what I enjoy the most.


That's unfortunate. My personal sense is that while agentic LLMs are not going to get us close to AGI, a few relatively modest architectural changes to the underlying models might actually do it, and I do think mimicry of our own self-referential attention is a very important component of that.

While the current AI boom is a bubble, I actually think that AGI nut could get cracked quietly by a company with even modest resources if they get lucky on the right fundamental architectural changes.


I agree - and I think an interdisciplinary approach is going to increase the odds here. There is a ton of useful knowledge in related disciplines - often just named differently - that turns out to be investigating the same problem from a different angle.


Sounds like those ML "scientists" were actually just engineers.


A lot of progress is made through engineering challenges

This is also "science"


I love this idea, but can't find anything about it. Is this a neologism you just coined? If so, is there any particular paper or work that led you to think about it in those terms?


Yes, I just coined the neologism. It was supposed to be partly sarcastic (why stay at pixels, why not just go fully multimodal and treat the missing channels as missing information?), so I am kind of surprised it got so upvoted.

(IME, my comments that I think are deep often get ignored, while silly things, where I was thinking "this is too much trolling or too obvious", get upvoted; but don't take it the wrong way, I am flattered you like it.)


Assuming channels can be effectively merged into a single percel vector, that would open up interesting channels even beyond human perception, e.g. lidar. Or it would be interesting to train a model that feels at home in 4D space.
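One way such a merge might work - just a sketch, with made-up channel widths, where an absent sense becomes a zero block (i.e. explicit missing information):

    import numpy as np

    # Illustrative fixed widths per channel; lidar goes beyond human senses.
    WIDTHS = {"rgb": 3, "audio": 4, "lidar": 2}

    def merge(channels):
        # Concatenate channels into one percel vector; a missing
        # channel becomes a zero block ("missing information").
        parts = [channels.get(name, np.zeros(w)) for name, w in WIDTHS.items()]
        return np.concatenate(parts)

    v = merge({"rgb": np.array([0.1, 0.5, 0.9]),
               "lidar": np.array([12.0, 0.3])})  # no audio at this moment
    print(v.shape)  # (9,)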


I think there's a decent chance you may have just created the ideal name for what will become one of the most important concepts ever. Bravo!


Deep things often, though not always, take more attention to appreciate than the superficial. Attention is a precious resource people are seldom disposed to allocate when headline-surfing HN.


Should future attributions in white papers go to js8 from HN?


Isn't this effectively what the latent space is? A bunch of related vectors that all bundle together?


No, latent space doesn't have to be made of percels, just like not every 2D array of 3-element vectors is an image made of pixels. Percels are tied to your sensors, components of what you perceive, in totality.

Of course there is an interesting paradox - each layer of the NN doesn't know whether it's connected to the sensors directly, or what kind of abstractions it works with in the latent space. So the boundary between the mind and the sensor is blurred and to some extent a subjective choice.


“Percel” is still a way cooler and arguably more descriptive term than “token” though.


This is an interesting thought. Trying to imagine how you represent that as a vector.

You still need to map percels to a latent space. But perhaps with some number of dimensions devoted to modes of perception? E.g. audio, visual, etc.
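A minimal sketch of that idea, assuming a reserved latent block per modality plus a presence flag (the projections here are random stand-ins for learned ones; all names and sizes are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    D = 16  # latent width per modality (illustrative)

    # One projection per mode of perception (random stand-ins here).
    proj = {"audio": rng.normal(size=(4, D)),
            "visual": rng.normal(size=(3, D))}

    def embed(percel):
        # Reserve a block of latent dimensions for each modality,
        # plus one indicator dimension marking whether it was present.
        blocks = []
        for name, W in proj.items():
            x = percel.get(name)
            z = x @ W if x is not None else np.zeros(D)
            blocks.append(np.append(z, float(x is not None)))
        return np.concatenate(blocks)

    z = embed({"visual": np.array([0.1, 0.5, 0.9])})  # no audio
    print(z.shape)  # (34,) = 2 * (16 + 1)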


I'm not an ML expert or practitioner, so someone might need to correct me.

I believe the percel's components together, as a whole, would capture the state of the audio+visual+time. However, I don't think the state of one particular mode (e.g. audio, visual, or time) is encoded by a specific subset of the percel's components. Rather, each component of the percel would itself represent a mixture (or a portion of a mixture) of the audio+visual+time, so you couldn't isolate just the audio, visual, or time state by looking at any specific subset of components.

I think the classic analogy is that if river 1 and river 2 combine to form river 3, you cannot take a cup of water from river 3 and separate out the portions from river 1 and river 2; they're irreversibly mixed.
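To illustrate the mixing (a toy sketch, not how any particular model does it): with a dense joint projection, every output component is a weighted sum over all input modalities, so no subset of components is "just audio":

    import numpy as np

    rng = np.random.default_rng(0)
    audio = rng.normal(size=4)
    visual = rng.normal(size=3)
    time = np.array([0.5])
    x = np.concatenate([audio, visual, time])  # (8,)

    # Dense projection: each output dim mixes ALL inputs.
    W = rng.normal(size=(8, 16))
    percel = x @ W

    # The audio contribution alone is smeared across every component.
    audio_only = np.concatenate([audio, np.zeros(4)]) @ W
    print(np.count_nonzero(audio_only))  # 16: audio touches all 16 dims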


I was going to say toxel


Like a tokenized 3D voxel?


Tokenized pixel. I understand now that's not what js8 was talking about, so my original comment doesn't really make sense



