Played around with the code to implement a little bit of SIMD. Was able to squeeze out a decent improvement: ~250 fps avg, ~140 low, ~333 high (on an M4). Looks pretty straightforward to add threading as well. Cool stuff! Could be a way to bring more GPU work back down to the CPU.
Did you exhaust the five-hour usage limit already? As I understand it, the "additional usage" refers to anything beyond the standard five-hour usage limit.
According to the providers I keep track of, Cumulus is typically pretty price-competitive, with two exceptions: MiniMax, where DeepInfra and Together are much cheaper, and GLM-5, where DeepInfra and z.AI's own hosting are much cheaper.
(Also, technically Qwen3 8B on Novita takes first place, but barely.)
Can we get context length / output length docs? You mention a "Max tokens (chat)" of 128k, but it's unclear what that means. Also, your docs page looks out of date compared to your playground page.
Also, a piece of feedback: it kind of sucks to have GLM/MiniMax/Kimi on separate API endpoints. I assume that's a game you play to get lower routing latency for popular models, but from a consumer perspective it's not great.