According to the press release, "we achieved an impressive Time-to-First-Token o...

happyopossum · 2025-06-04T19:32:31

> Imagine, you have a very small weak model, and you have to wait 20 seconds for your request.

For your first request, after having scaled to 0 while it wasn’t in use. For a lot of use cases, that sounds great.

steren · 2025-06-04T20:21:34

Also, a GPU instance needs 5s to start. The rest depends on how large the model is, so a "very small weak model" can load much faster than 20s.
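A rough way to see the point about model size: the cold start is a fixed instance start plus a weight-loading time that grows with the model. The sketch below uses the 5s instance start quoted in the comment; the load throughput and model sizes are purely hypothetical numbers chosen for illustration, not measured values for any platform.

```python
# Hypothetical cold-start estimate: fixed GPU instance start (5 s, the figure
# quoted in the comment) plus time to load model weights, which scales with size.
GPU_INSTANCE_START_S = 5.0


def estimate_cold_start_s(model_size_gb: float, load_throughput_gb_per_s: float) -> float:
    """Rough cold-start time in seconds.

    load_throughput_gb_per_s is an assumed, illustrative value; real throughput
    depends on storage, network, and runtime, none of which are specified here.
    """
    if load_throughput_gb_per_s <= 0:
        raise ValueError("load throughput must be positive")
    return GPU_INSTANCE_START_S + model_size_gb / load_throughput_gb_per_s


if __name__ == "__main__":
    # Illustrative only: assume 1 GB/s effective load throughput.
    for size_gb in (2.0, 8.0, 15.0):
        print(f"{size_gb:>5.1f} GB model -> ~{estimate_cold_start_s(size_gb, 1.0):.1f} s")
```

Under these assumed numbers, a small model stays well under 20s, and only a much larger one approaches it.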

infecto · 2025-06-04T12:18:59

Imagine running a production, client-facing API and not overprovisioning it.