Imagine, you have a very small weak model, and you have to wait 20 seconds for your request.
For your first request, after having scaled to 0 while it wasn’t in use. For a lot of use cases, that sounds great.
Imagine, you have a very small weak model, and you have to wait 20 seconds for your request.