
zaat · 2026-05-24T22:24:50 1779661490

How is that any different from the pre-LLM days, when Jim was using Stack Overflow to build the largest crypto exchange in the world? Where's Stack Overflow's accountability?

zaat · 2026-05-24T22:14:24 1779660864

At least for me, the answer is that despite the mistakes and the sheer annoyance the prose causes me, they are unbelievably useful. In the last two years I accomplished multiple major achievements that most probably wouldn't have been possible at all otherwise, and surely not within that timeframe.

zaat · 2026-05-24T22:05:52 1779660352

The idea is that by the time you have time and remember, the clothes might be smelly and wrinkled. The issue is with the genius product manager who decided the washing machine should have the most annoying beep possible, repeating every minute whether you like it or not, until turned off. Luckily, some manufacturers do employ better product managers.

zaat · 2026-04-02T17:41:12 1775151672

Thank you for your work.

You have an answer on your page regarding "Should I pick 26B-A4B or 31B?", but can you please clarify if, assuming 24GB VRAM, I should pick a full precision smaller model or 4 bit larger model?

petu · 2026-04-02T20:06:51 1775160411

Try 26B first. 31B seems to have very heavy KV cache (maybe bugged in llama.cpp at the moment; 16K takes up 4.9GB).

edit: 31B cache is not bugged, there's a static SWA cost of 3.6GB, so IQ4_XS at 15.2GB seems like a reasonable pairing, but even then it's barely enough for 64K on 24GB VRAM. Maybe 8 bit KV quantization is fine now after https://github.com/ggml-org/llama.cpp/pull/21038 got merged, so 100K+ is possible.

> I should pick a full precision smaller model or 4 bit larger model?

4 bit larger model. You have to use a quant either way -- even if by full precision you mean 8 bit, it's gonna be 26GB + overhead + chat context.

Try UD-Q4_K_XL.
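
The 31B numbers quoted above can be checked with a rough back-of-the-envelope budget. This is only a sketch: the figures (15.2GB weights, 3.6GB static SWA cost, 4.9GB cache at 16K) come from the comment, while the linear growth of the remaining cache with context, the `kv_scale` knob, and the choice to ignore compute buffers and other overhead are all assumptions made for illustration.

```python
# Rough VRAM budget: quantized weights plus cache, ignoring other overhead.
# Figures are the ones quoted in the comment; the growth model is an assumption.

WEIGHTS_GB = 15.2        # IQ4_XS weights for 31B, as quoted
STATIC_SWA_GB = 3.6      # static sliding-window cache cost, as quoted
CACHE_AT_16K_GB = 4.9    # total cache observed at 16K context, as quoted
VRAM_GB = 24.0

# Assumption: the non-static part of the cache grows linearly with context.
PER_1K_TOKENS_GB = (CACHE_AT_16K_GB - STATIC_SWA_GB) / 16


def cache_gb(context_k: float, kv_scale: float = 1.0) -> float:
    """Estimated cache size in GB for `context_k` thousand tokens.

    kv_scale is a hypothetical knob: 1.0 for the default cache type,
    0.5 as a crude stand-in for 8 bit KV quantization (assumed here to
    shrink the whole cache, static part included).
    """
    return (STATIC_SWA_GB + PER_1K_TOKENS_GB * context_k) * kv_scale


def total_gb(context_k: float, kv_scale: float = 1.0) -> float:
    """Estimated weights plus cache in GB."""
    return WEIGHTS_GB + cache_gb(context_k, kv_scale)


if __name__ == "__main__":
    for context_k, kv_scale in [(16, 1.0), (64, 1.0), (100, 0.5)]:
        need = total_gb(context_k, kv_scale)
        verdict = "fits" if need <= VRAM_GB else "does not fit"
        print(f"{context_k}K ctx, kv_scale={kv_scale}: {need:.1f} GB ({verdict})")
```

Under these assumptions 64K lands right at the 24GB limit, which matches "barely enough", and halving the cache leaves room for 100K. Real usage will be higher once compute buffers and the rest of the system are counted.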

danielhanchen · 2026-04-02T20:12:31 1775160751

Yes UD-Q4_K_XL works well! :)

mixtureoftakes · 2026-04-02T20:25:01 1775161501

what is the main difference between "normal" quants and the UD ones?

car · 2026-04-02T20:58:11 1775163491

They explain it here:

https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs

For the best quality reply, I used the Gemma-4 31B UD-Q8_K_XL quant with Unsloth Studio to summarize the URL with web search. It produced 4.9 tok/s (including web search) on a MacBook Pro M1 Max with 64GB.

Here is an excerpt in its own words:

Unsloth Dynamic 2.0 Quantization

Dynamic 2.0 is not just a "bit-reduction" but an intelligent, per-layer optimization strategy.

- Selective Layer Quantization: Instead of making every layer 4-bit, Dynamic 2.0 analyzes every single layer and selectively adjusts the quantization type. Some critical layers may be kept at higher precision, while less critical layers are compressed more.

- Model-Specific Tailoring: The quantization scheme is custom-built for each model. For example, the layers selected for quantization in Gemma 3 are completely different from those in Llama 4.

- High-Quality Calibration: They use a hand-curated calibration dataset of >1.5M tokens specifically designed to enhance conversational chat performance, rather than just optimizing for Wikipedia-style text.

- Architecture Agnostic: While previous versions were mostly effective for MoE (Mixture of Experts) models, Dynamic 2.0 works for all architectures (both MoE and non-MoE).

danielhanchen · 2026-04-02T18:00:18 1775152818

Thank you!

I presume 26B is somewhat faster since it's only 4B activated - 31B is quite a large dense model so more accurate!

ryandrake · 2026-04-02T19:44:50 1775159090

This is one of the more confusing aspects of experimenting with local models as a noob. Given my GPU, which model should I use, which quantization of that model should I pick (unsloth tends to offer over a dozen!) and what context size should I use? Overestimate any of these, and the model just won't load and you have to trial-and-error your way to finding a good combination. The red/yellow/green indicators on huggingface.co are kind of nice, but you only know for sure when you try to load the model and allocate context.

danielhanchen · 2026-04-02T19:57:12 1775159832

Definitely Unsloth Studio can help - we recommend specific quants (like Gemma-4) and also auto calculate the context length etc!

ryandrake · 2026-04-02T20:05:37 1775160337

Will have to try it out. I always thought that was more for fine-tuning and less for inference.

danielhanchen · 2026-04-02T20:12:19 1775160739

Oh yes sadly we partially mis-communicated haha - there's both, plus synthetic data generation and exporting!

zaat · on March 25, 2025

Just to make sure, opkssh supports OpenID for sftp as well?

EthanHeilman · on March 25, 2025

It should, opkssh just creates ssh public keys. The integration tests don't currently cover that case, so I created an issue to add it to the integration tests:

https://github.com/openpubkey/opkssh/issues/40

EthanHeilman · on March 30, 2025

Tested that sftp works and created an integration test for it.

zaat · on March 7, 2025

One can still hold that it's a better link, it's a matter of preference. Anyways, if you scroll the original article to the very end it does contain an impressive set of photos. It is beautiful.

zaat · on Sept 17, 2024

I can't find any viable alternative. A keyboard is much faster than those click-and-release interfaces. Keyboards also have repeat keys: when you hold a character key down for a long time, you can press and release the shift key and see the change in the line of characters being input. This is an extremely useful feature in games, graphic design software and other applications.

Generating keystrokes on the up event, besides being incompatible with repeating keys for long strokes, will slow down typing significantly, as it requires tracking the timing of key presses over a longer duration. I'm pretty sure that this isn't only an effect of my being used to tracking keypress timing on the way down, but an unavoidable result of the duration of the action.

Waiting for the up event on a contemporary GUI is sensible when that UI is a sluggish, fit-to-nothing, dirty touchscreen in a public kiosk. When you know an interface will yield more errors than intended input, it is only sensible to assume that any input is a mistake unless the user is making an effort to validate it.

kazinator · on Sept 17, 2024

Keyboard repeat is only useful in ANSI terminal games on Unix, and games on some old 8 bit home computers that didn't have up and down events (Apple II+, ...).

A game written for an IBM PC and everything after that can know exactly which keys are being held down and when they are released by intercepting the "scan codes" (or abstract keyboard events in a GUI event loop).

All that is missing is synthesizer-like velocity and pressure info. :)

> will slow down typing significantly

Only for people who have to look at the screen. :)

The hunt-and-peck beginners who look at their fingers are not affected, and neither are those who can look at something else or close their eyes.

A serious concern affecting even those people is that using the release event could reorder things, causing mistakes. Like say if someone types the sequence wh (involving two hands) such that they release the w later than h.
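
The reordering concern can be shown with a tiny simulation. This is a minimal sketch with made-up timings, and `typed_text` is a hypothetical helper rather than any real input API: it just reads off the characters in the order their "down" or "up" events arrive.

```python
from typing import List, Tuple

# (time_ms, key, kind), where kind is "down" or "up". Timings are invented.
Event = Tuple[int, str, str]


def typed_text(events: List[Event], trigger: str) -> str:
    """Return the characters produced if input is generated on `trigger` events."""
    ordered = sorted(events, key=lambda event: event[0])
    return "".join(key for _, key, kind in ordered if kind == trigger)


# Typing "wh" with two hands: w goes down first but is released after h.
events: List[Event] = [
    (0, "w", "down"),
    (40, "h", "down"),
    (90, "h", "up"),
    (120, "w", "up"),
]

if __name__ == "__main__":
    print(typed_text(events, "down"))  # wh
    print(typed_text(events, "up"))    # hw
```

Generating characters on the down event preserves the intended order, while generating them on the up event swaps the two letters whenever the key presses overlap like this.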

zaat · on Sept 15, 2024

Do you use the community edition or SaaS?

zaat · on July 9, 2024

While I didn't find the joke funny, it does thematically match the piece - the hacker who supposedly sees the possibility of getting free internet as a viable opportunity. Later in the piece the author does distance himself from that image, revealing that the tone in the opening was merely a stylistic choice, a writer's device, as clearly he is not the kind of person who would in practice exploit the airline systems.

zaat · on May 30, 2024

Assuming you are a man, I'm not sure you would say the same thing if the arbitrary lingual fact were the opposite, and on your door it were written in bold "chairwoman", or if you were being introduced as a policewoman.

I'm not sure I'm for changing the language all over, but I don't think dismissing issues that disturb a group that you don't belong to is a manner that fits a gentleman.