I'm writing a streaming S3 object archiving tool that collects all old objects (in my case they are near-zero in size, but each occupies a 4KB block). In total that is 10GB of data daily, so I have to stream the whole process to avoid holding that much in RAM.
This is audit data, such as requests/responses to external systems, kept for possible investigations. Archiving it saves a lot of space. It was initially written in Python; now I'm practicing with Rust. The container image is only 2.2MB :)
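To illustrate the streaming idea, here is a minimal std-only sketch, not the actual tool. It assumes each object arrives as a reader plus a known length (as an S3 listing would report), and it uses a made-up length-prefixed record format instead of a real archive format like tar. `Entry`, `append_record`, and `archive_all` are hypothetical names.

```rust
use std::io::{self, Read, Write};

/// One object to archive. `key` and `len` would come from an S3 listing;
/// `body` is any streaming reader (a stand-in for an S3 GET body).
pub struct Entry<R: Read> {
    pub key: String,
    pub len: u64,
    pub body: R,
}

/// Appends one record in an assumed, illustrative layout:
/// [key_len: u32 LE][key bytes][body_len: u64 LE][body bytes]
pub fn append_record<R: Read, W: Write>(out: &mut W, entry: Entry<R>) -> io::Result<()> {
    let key = entry.key.as_bytes();
    if key.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "key too long"));
    }
    out.write_all(&(key.len() as u32).to_le_bytes())?;
    out.write_all(key)?;
    out.write_all(&entry.len.to_le_bytes())?;

    // io::copy moves data through a fixed-size internal buffer, so memory
    // use does not grow with the size of the object being copied.
    let copied = io::copy(&mut entry.body.take(entry.len), out)?;
    if copied != entry.len {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "body shorter than declared length",
        ));
    }
    Ok(())
}

/// Streams every entry into `out` one at a time and returns how many
/// records were written. Only one object body is open at any moment.
pub fn archive_all<R, W, I>(entries: I, out: &mut W) -> io::Result<u64>
where
    R: Read,
    W: Write,
    I: IntoIterator<Item = Entry<R>>,
{
    let mut count = 0;
    for entry in entries {
        append_record(out, entry)?;
        count += 1;
    }
    out.flush()?;
    Ok(count)
}
```

In a real setup `out` would presumably be a compressor wrapped around an upload stream rather than a `Vec<u8>`, and `entries` a lazy iterator over listed keys, so nothing close to the daily 10GB ever sits in memory at once.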
- Single-digit TPS on the rare chance it responds, and frequent complete hangs (maybe 1 in 20 requests even completes)
- 4k input token cap (vs. the native 128k context window)
- No pricing
- Unstated rate limits
It genuinely seems like they spun up a single H100 cluster to enable the headline of this post and help form a narrative, then left it at that. It is definitely not meant to provide access to R1 in any serious way.