
This is trivially not true.

Pick, e.g., x → sin(1/x) around zero and its derivatives.

The small modifications you're talking about are on the argument. These can lead to huge changes in the values.

The stability is more likely due to the diffusive nature of the models and well-executed training.
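To make that concrete (a sketch of my own, with arbitrarily chosen numbers): near x = 1e-5, a perturbation of only 1e-9 in the argument shifts the phase 1/x by about ten radians, so the perturbed value of sin(1/x) is essentially unrelated to the original one.

    import math

    # Illustration with arbitrary numbers: near zero, d/dx sin(1/x) = -cos(1/x)/x**2
    # blows up, so a tiny additive perturbation of the argument swings the value wildly.
    x = 1e-5
    eps = 1e-9                        # a "small modification" of the argument
    print(math.sin(1 / x))            # value at x
    print(math.sin(1 / (x + eps)))    # value after the perturbation: ~10 radians of phase shift
    print(abs(math.cos(1 / x)) / x**2)  # derivative magnitude at x, here of order 1/x**2 = 1e10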



I don't recall SD or variants using discontinuous terms like 1/x. Sigmoid, softmax, and SiLU are going to be what you're looking for.
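For reference, a minimal sketch of those three maps (generic textbook definitions, not pulled from any SD codebase), showing that a small input perturbation produces a comparably small output change:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def silu(x):
        # SiLU / swish: x * sigmoid(x), smooth with bounded derivative on bounded inputs
        return x * sigmoid(x)

    def softmax(x):
        z = np.exp(x - np.max(x))     # subtract the max for numerical stability
        return z / z.sum()

    x = np.array([0.3, -1.2, 2.0])
    eps = 1e-4
    print(np.abs(silu(x + eps) - silu(x)))                                 # small, same order as eps
    print(np.abs(softmax(x + eps * np.array([1.0, 0, 0])) - softmax(x)))   # also small, of order eps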


They don't use them, indeed. I was replying to the general idea about the additions.

OTOH, Gaussian kernels smooth out almost everything. Maybe it would be stable even with sin(1/x) as an “activation”.
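As a rough numerical check of that intuition (parameters chosen arbitrarily, nothing to do with an actual diffusion model): mollify sin(1/x) with a discrete Gaussian kernel and compare how much adjacent samples can jump before and after.

    import numpy as np

    # Sample sin(1/x) on a grid approaching zero, then convolve with a Gaussian kernel.
    x = np.linspace(1e-3, 1e-1, 20001)
    f = np.sin(1.0 / x)

    sigma = 50                            # kernel width in samples (illustrative choice)
    t = np.arange(-4 * sigma, 4 * sigma + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    f_smooth = np.convolve(f, kernel, mode="same")

    # Max jump between adjacent samples, before and after mollification.
    print(np.max(np.abs(np.diff(f))))         # large: raw sin(1/x) oscillates wildly near 1e-3
    print(np.max(np.abs(np.diff(f_smooth))))  # much smaller after Gaussian smoothing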


If you want to use a counterexample to refute the general idea about additions, you need to pick one that satisfies the preconditions, like being differentiable. x → sin(1/x) is not differentiable at 0, and at any other value, where it is differentiable, there's a small ε and a linear function L such that for all a and b with |a|, |b| < ε, sin(1/(x + a + b)) = sin(1/x) + L(a + b) + O(ε²), and because L is linear, L(a + b) = L(a) + L(b). The wrinkle is that ε might have to be extremely small indeed.
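A small numeric check of that expansion, using the explicit linear term L(h) = -cos(1/x)/x² · h (the values of x and h below are my own, purely illustrative): the remainder does scale like h², but its constant grows roughly like 1/x⁴, which is exactly why ε has to shrink so fast near zero.

    import math

    # Compare sin(1/(x+h)) against its first-order Taylor approximation at x.
    def linearisation_error(x, h):
        exact = math.sin(1.0 / (x + h))
        linear = math.sin(1.0 / x) - math.cos(1.0 / x) / x**2 * h
        return abs(exact - linear)

    for x in (1e-1, 1e-2, 1e-3):
        for h in (1e-6, 1e-5, 1e-4):
            # Quadratic in h for fixed x, but the error constant explodes as x -> 0.
            print(f"x={x:g}  h={h:g}  error={linearisation_error(x, h):.2e}")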


Around zero, not at zero.

Recalling the definition of exact differentiability is irrelevant.

Instead, take for example the smallest interval you can represent in fp32, not too far from zero. Take a few values in that interval (which still contains infinitely many reals) and check the behaviour of said monstrous function.

This is a “trivial” example when studying, e.g., distribution theory.

Said differently, you need to assess how smooth the differential operator itself is.
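For example (my own probe, with an arbitrary choice of x): stepping through adjacent fp32-representable values near x = 1e-7 already moves the phase 1/x by roughly 0.7 radians per ULP, so sin(1/x) jumps around even between neighbouring representable arguments.

    import numpy as np

    # Walk across a few adjacent fp32-representable values near a small x.
    # The ULP there is about 7e-15, so 1/x moves by ~ulp/x**2 ≈ 0.7 radians per step,
    # and sin(1/x) bounces around between neighbouring representable arguments.
    xi = np.float32(1e-7)
    for _ in range(5):
        print(repr(xi), np.sin(1.0 / np.float64(xi)))
        xi = np.nextafter(xi, np.float32(1.0))   # next representable fp32 value above xi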



