My wish is they would move on to the next phase. The whole deal with SSMs look r...

curious_cat_163 · on May 20, 2024

IMO, SSMs are an optimization. They don't represent enough of a fundamental departure from the kinds of things Transformers can _do_. So, while I like the idea of saving on energy costs, I speculate that such savings can be obtained with other optimizations while staying with transformer blocks. Hence, the motivation to change is a bit of an uphill battle here. I would love to hear counter-arguments to this view. :)

Furthermore, I think a replacement will require that we _understand_ what the current crop of models are doing mechanically. Some of it was motivated in [1].

[1] https://openaipublic.blob.core.windows.net/neuron-explainer/...

inciampati · on May 20, 2024

Quadratic vs linear is not an optimization. It's a completely new game. With selective SSMs (Mamba) the win is that training can be parallelized via an associative scan, which takes a logarithmic number of parallel steps. So you go from something quadratic wrt input sequence length to linear work with logarithmic depth. If that's just an optimization, it's a huge one.
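To make the associative-scan point concrete, here is a minimal sketch in plain Python. It assumes a scalar linear recurrence `h_t = a_t * h_{t-1} + b_t` as a stand-in for an SSM state update; the function names are illustrative, and this is not Mamba's actual kernel, which works on input-dependent vector and matrix quantities on GPU. The idea it shows is that each step is an affine map, composing affine maps is associative, and so all prefix states can be computed in about log2(n) rounds where every position in a round could be updated in parallel.

```python
import math


def combine(left, right):
    """Compose two affine maps h -> a*h + b, applying `left` first."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def sequential_scan(a, b):
    """Reference recurrence h_t = a_t * h_{t-1} + b_t with h_{-1} = 0 (n steps)."""
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def log_depth_scan(a, b):
    """Hillis-Steele inclusive scan: ceil(log2(n)) rounds.

    Each round is a list comprehension here, but every position in a round
    is independent, so on parallel hardware a round is one step.
    """
    elems = list(zip(a, b))
    n = len(elems)
    stride, rounds = 1, 0
    while stride < n:
        # Position i absorbs the partial result ending `stride` places to its left.
        elems = [
            combine(elems[i - stride], elems[i]) if i >= stride else elems[i]
            for i in range(n)
        ]
        stride *= 2
        rounds += 1
    # With h_{-1} = 0, the state is the offset term of the composed map.
    return [b_t for _, b_t in elems], rounds


a = [0.5, 0.9, 0.1, 0.7, 0.3]
b = [1.0, 2.0, 3.0, 4.0, 5.0]
states, rounds = log_depth_scan(a, b)
reference = sequential_scan(a, b)
print(all(math.isclose(x, y) for x, y in zip(states, reference)), rounds)
```

Note that this simple variant does O(n log n) total work; a work-efficient (Blelloch-style) scan keeps the logarithmic depth with O(n) work. Either way the comparison is with attention's quadratic cost in sequence length.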

curious_cat_163 · on May 20, 2024

Okay. Respect your point of view. I am curious, what applications do you think SSMs enable that a Transformer cannot? I have always seen it as a drop-in replacement (like for like) but maybe there is more to it.

Personally, I think going linear instead of quadratic for a core operation that a system needs to do is by definition an optimization.

throwawaymaths · on May 19, 2024

There's something about a transformer being at its core based on a differentiable hash table data structure that makes them special.

I think its dominance is not going to substantially change any time soon. Don't you know, the solution to all leetcode interviews is a hash table?
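One way to read the "differentiable hash table" remark is as single-query dot-product attention: instead of an exact key match, the query is scored against every key and the values are blended by softmax weights. The sketch below is an illustration under that assumption, in plain Python with a hypothetical `soft_lookup` helper and a `temperature` parameter that is not part of standard attention (which scales by the square root of the key dimension instead).

```python
import math


def soft_lookup(query, keys, values, temperature=1.0):
    """Blend `values` by softmax(query . key / temperature) over all keys."""
    scores = [
        sum(q * k for q, k in zip(query, key)) / temperature for key in keys
    ]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [0.0, 1.0, 0.0]

# A low temperature sharpens the softmax toward an exact lookup of values[1].
print(soft_lookup(query, keys, values, temperature=0.01))
# A higher temperature returns a smooth blend of all values.
print(soft_lookup(query, keys, values, temperature=1.0))
```

Because every step is a smooth function of the query, keys, and values, gradients flow through the lookup, which is what makes the table trainable.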

tysam_and · on May 19, 2024

Heyo! Have been doing this for a while. SSMs certainly are flashy (most popular topics-of-the-year are), and it would be nice to see if they hit a point of competitive performance with transformers (and if they stand the test of time!)

There are certainly tradeoffs to both. The general transformer motif scales very well on a number of axes, so it may be the dominant algorithm for a while to come, though almost certainly it will change and evolve as time goes along (and who knows? something else may come along as well <3 :')))) ).

smel · on May 20, 2024

The solution to AGI is not deep learning. Maybe with more compute and a shitload of engineering it can work as a kind of baby AGI.

My bet would be on something other than gradient descent and backprop, but really I don't wish for any company or country to reach AGI or any sophisticated AI ...

inciampati · on May 20, 2024

Magical thinking. Nature uses gradient descent to evolve all of us and our companions on this planet. If something better were out there, we would see it at work in the natural world.

mopierotti · on May 20, 2024

Are you also saying that thoughts are formed using gradient descent? I don't think gradient descent is an accurate way to describe either process in nature. Also, we don't know that we "see" everything that is happening; we don't even understand the brain yet.

psychoslave · on May 20, 2024

Maybe it's there, but in an ethereal form that is ungraspable to mere conscious forms such as ourselves? :P