
Even regular Cloud Run can take a lot of time to boot (~3 to 30 seconds), so this can be a problem when scaling to 0


That's not my experience, using Go. Never measured, but it goes to 0 all the time, so I would definitely have noticed more than a couple of seconds.


It depends on whether you're on gen1 or gen2 Cloud Run; the default execution environment is `default` which means "you have no idea because GCP selects for you" (not joking).

Counterintuitively (again, not joking): gen2 suffers from really bad startup speeds, because it's more like a full-on Linux VM/container than whatever weird shim environment gen1 runs. My gen2 containers basically never start up faster than 3 seconds. Gen1 is much faster.

Note that gen1 and gen2 Cloud Run execution environments are an entirely different concept than first generation and second generation Cloud Functions. First gen Cloud Functions are their own thing. Second generation Cloud Functions can be either first generation or second generation Cloud Run workloads, because they default to the default execution environment. Believe it or not, humans made this.
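For what it's worth, you don't have to leave it at `default`; the execution environment can be pinned at deploy time. A sketch (service, project, and image names are placeholders):

```shell
# Pin a Cloud Run service to the gen1 execution environment
# instead of letting GCP pick via `default`.
gcloud run deploy my-service \
  --image gcr.io/my-project/my-image \
  --execution-environment gen1
```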


I’m looking at logs for a service I run on Cloud Run right now which scales to zero. Boot times are approximately 200ms for a Dart backend.


Not to mention, if it's an ML workload, you'll also have to factor in downloading the weights and loading them into memory, which can double that time or more.
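As a rough illustration (all numbers below are assumptions, not measurements): a ~4B-parameter model quantized to 4 bits is a few GB of weights, so downloading and loading them can easily dwarf the container boot itself.

```python
# Back-of-envelope cold-start estimate for an ML workload on Cloud Run.
# Every input figure here is an illustrative assumption, not a measurement.

def cold_start_seconds(container_boot_s: float, weights_gb: float,
                       network_gbps: float, disk_to_mem_gbps: float) -> float:
    """Sum container boot + weight download + load-into-memory time."""
    download_s = weights_gb * 8 / network_gbps      # GB -> gigabits / Gbps
    load_s = weights_gb * 8 / disk_to_mem_gbps
    return container_boot_s + download_s + load_s

# Hypothetical figures: 3 s container boot, 3 GB of weights,
# 10 Gbps network, 32 Gbps disk-to-memory path.
total = cold_start_seconds(3.0, 3.0, 10.0, 32.0)
print(f"estimated cold start: {total:.1f} s")
```

Even with generous bandwidth assumptions, the weight transfer roughly doubles the boot-only figure, which is the point being made above.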


According to the press release, "we achieved an impressive Time-to-First-Token of approximately 19 seconds for a gemma3:4b model"

Imagine, you have a very small weak model, and you have to wait 20 seconds for your request.


> Imagine, you have a very small weak model, and you have to wait 20 seconds for your request.

For your first request, after having scaled to 0 while it wasn’t in use. For a lot of use cases, that sounds great.


Also, a GPU instance needs 5s to start. The rest depends on how large the model is. So a "very small weak model" can load much faster than 20s


Imagine running a production client-facing API and not overprovisioning it.



