The front end was entirely Python AWS Lambdas, with a message queue that talked to the (slow) GPU backend.
The "Lambda tax" was about 100 ms on average; given that an average request took around 4 seconds, that seemed acceptable.
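A quick back-of-the-envelope check of that trade-off, using the averages quoted above (100 ms of overhead against a 4-second request):

```python
# Rough sanity check: how much of a typical request is Lambda overhead?
# Figures are the averages from the text, not measured here.
lambda_tax_s = 0.100   # average per-request Lambda overhead
avg_request_s = 4.0    # average end-to-end request time

overhead = lambda_tax_s / avg_request_s
print(f"Lambda overhead: {overhead:.1%} of a typical request")  # → 2.5%
```

At roughly 2.5% of end-to-end latency, the overhead is lost in the noise next to the GPU work.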