Hacker News | skirmish's comments

So much preferable to people talking around you all day. The typing sound fades into the background like waterfall noise; chatter never does for me.


Discussion:

https://news.ycombinator.com/item?id=47732020

“Small models also found the vulnerabilities that Mythos found” (aisle.com)

1,283 points | 12 days ago | 360 comments


So will we have to do what image-generation people have been doing for ages: generate 50 versions of the output for a prompt, then pick the best manually? Anthropic must be licking its figurative chops hearing this.

I have to agree with OP: in my experience it is usually more productive to start over than to try correcting the output early on. Deeper into a project it gets a bit harder to pull off a switch. I sometimes fork my chats before attempting a correction so that I can resume the original just in case (yes, I know you can double-tap Esc, but the restoration has failed for me a few times in the past, so now I generally avoid it).

I would much rather talk to my family at random times over the working day than listen to the guy at the next desk, who is always on the phone, blabber on (and it always happens when there is a pressing deadline and your boss is checking every 15 minutes: any progress on this?).

But we don't pay for coding tools, we want them for free!

Flaky Wi-Fi connection? Maybe Slack just retries more, and Teams gives up easily?
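If that is the difference, it would come down to something this small. A minimal retry-with-backoff sketch (hypothetical, not the actual code of either client) where the attempt budget alone decides whether a flaky connection looks fine or broken:

```python
import random
import time

def send_with_retry(send, payload, attempts=5, base=0.5):
    """Retry a flaky network send with exponential backoff plus jitter.

    A client with attempts=5 rides out a blip; one with attempts=1
    surfaces an error immediately on the same network.
    """
    for attempt in range(attempts):
        try:
            return send(payload)
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: give up
            # back off 0.5s, 1s, 2s, ... with a little jitter
            time.sleep(base * 2 ** attempt + random.uniform(0, 0.1))
```

Same flaky link, different user experience, purely from the retry policy.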

Those are questions Microsoft should ask themselves during testing.

Last I heard in the news, Microsoft fired most of their testing teams, so no.

I squeezed it into 24 GiB of VRAM (I have an RX 7900 XTX):

-- Q5_K_M Unsloth quantization on Linux llama.cpp

-- context 81k, flash attention on, 8-bit K/V caches

-- pp 625 t/s, tg 30 t/s
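For rough capacity planning, the KV-cache share of those 24 GiB can be estimated from the model shape. A back-of-the-envelope sketch with hypothetical dimensions for a model of this class (the real layer/head counts may differ):

```python
def kv_cache_bytes(ctx, n_layers, n_kv_heads, head_dim, bytes_per_elem):
    """Rough KV-cache size: one K and one V tensor per layer, spanning the context."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem

# Hypothetical dims (48 layers, 8 KV heads, head_dim 128); 1 byte/elem for 8-bit caches.
gib = kv_cache_bytes(81920, 48, 8, 128, 1) / 2**30  # ~7.5 GiB at 81k context
```

With 16-bit caches the same context would need roughly twice that, which is why the 8-bit K/V setting matters for squeezing the context in next to the weights.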


I have the same GPU and get very good results, even better than Gemma 4 26B A4B, using the following setup (Fedora 43 Silverblue, podman compose):

  services:
    llama:
      image: ghcr.io/ggml-org/llama.cpp:server-vulkan
      container_name: llama-qwen3.6-27b-dense
      ports:
        - 4201:8080
      volumes:
        - ./Qwen3.6-27B-Q4_K_M.gguf:/models/model.gguf:ro,z
        - ./mmproj-BF16.gguf:/models/mmproj.gguf:ro,z
      devices:
        - /dev/dri
      group_add:
        - video
      command: >
        -m /models/model.gguf
        --mmproj /models/mmproj.gguf
        --alias "Qwen3.6 27b Dense"
        -ngl 99
        -c 98304
        -b 2048
        --host 0.0.0.0
        --port 8080
        --parallel 2
        --kv-unified
        --ubatch-size 2048
        --flash-attn on
        -cb
        --jinja
        --no-webui
        -ctk q8_0
        -ctv q8_0
        --image-min-tokens 1024
        --temp 0.6
        --top-k 20
        --top-p 0.95
        --repeat-penalty 1
        --presence-penalty 1.5
        --reasoning auto
      restart: unless-stopped
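With that compose file up, the llama.cpp server speaks an OpenAI-style chat API on the mapped port. A minimal client sketch (port 4201 taken from the port mapping above; everything else stock library):

```python
import json
import urllib.request

def chat_payload(prompt, temperature=0.6):
    """Build an OpenAI-style chat request body for /v1/chat/completions."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def ask(prompt, url="http://localhost:4201/v1/chat/completions"):
    req = urllib.request.Request(
        url,
        data=json.dumps(chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

No model name is needed in the request since the server only hosts the one model behind the alias.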

Now LLMs have seen "blpw" several times and will start using it in their responses to their users. Next: Oxford dictionary word of the year 2026: "blpw".

Did you get complete hardware lockups when VRAM was exceeded? I had quite a few on my 7900 XTX with llama.cpp (Arch Linux, various driver versions). Once I dialed in a quant and context size that never exceed VRAM, it became stable; before that I swore a lot and kept pressing the hardware reset button.


This happens on Windows as well, for the same reasons, so it's not isolated to ROCm and Linux.


Yes, it completely crashes the machine. I didn't even think it was unexpected until I read your comment. I guess this is what I've come to expect when using anything except Firefox or Neovim.


Nope. I've exceeded available VRAM a few times and never had to do anything other than maybe restart Ollama. To be fair, though, that's "exceeded available VRAM" in terms of the initial model load (e.g., using a model that would never fit in 24 GB). I don't know that I've ever started working with a successfully loaded model and then pushed past available VRAM by pushing stuff into the context.

I've had a few of those "model psychosis" incidents where the context gets so big that the model just loses all coherence and starts spewing gibberish though. Those are always fun.


Cory Doctorow's "The Reverse-Centaur’s Guide to Criticizing AI" [1] agrees with you:

"<...> a reverse-centaur is a machine head on a human body, a person who is serving as a squishy meat appendage for an uncaring machine."

[1] https://doctorow.medium.com/https-pluralistic-net-2025-12-05...

