onlyrealcuzzo · 2026-05-30T21:13:35 1780175615

I've been building a memory safe language that transpiles to Zig with a Go-like runtime that can run interpreted (no GC) or compiled - high-level that feels like Ruby but with incremental typing like TypeScript.

The work the Zig team has done between 0.16 and this release has really made me glad I chose Zig as the target instead of Rust - which probably would've been a lot easier to target (since it's already memory safe).

I believed it had the best build system design and was the best transpilation target, and I still really believe that 6 months later.

The main reason I wanted no GC is because I think aliasing is the root of all evil, and I want a language with zero global complexity (but doesn't require a PhD to use).

keithasaurus · 2026-05-30T22:03:48 1780178628

Working on something kinda similar. No GC, Python feel, managed memory, performance approaching C. It's here: https://blorp-lang.org if you want to compare approaches.

onlyrealcuzzo · 2026-05-30T22:19:04 1780179544

It looks pretty cool!

It's not clear how much concurrency is part of what you're trying to solve.

All I could find is this: https://blorp-lang.org/docs/concurrency/ - which doesn't give me much as to how you handle shared memory, safety, deadlocks, etc.

Definitely down to chat more - looks like you've got some traction, which is impressive and awesome!

I'd love to pick your brain as it appears you're further along than I am.

keithasaurus · 2026-05-30T23:32:57 1780183977

Yeah, concurrency in blorp doesn't allow shared mutable references, so deadlocks aren't really a concern. Otherwise it's meant to be simple-ish -- virtual threads, channels, no async/await. Pure functions allow safe parallelism naturally, so that's fairly straightforward, though the API is still incomplete, for example the "Parallel" section here: https://blorp-lang.org/docs/lists/. It's still under heavy development (working on it right now).

What are the over-arching goals of your language?

onlyrealcuzzo · 2026-05-30T23:42:30 1780184550

Right on.

1) I want to minimize global complexity, which by definition maximizes local reasoning.

2) I want to make the vast majority of bugs simply unrepresentable - taking it past Rust, and even past Pony - WHILE allowing shared mutable memory, but without requiring a PhD to use.

The goal is that in EASY mode, it's barely harder to use than Ruby or Python (just the occasional pedantic compiler error that has automatic options to fix itself most of the time). You don't even have to supply types or compile. It has a REPL, etc.

When you bump to DEFAULT mode and then to STRICT mode, all the annotation is automatic - your code just might look "ugly" if you like having no types anywhere etc.

But DEFAULT & STRICT mode give people and LLMs everything they need to know to understand the effects of an individual function.

keithasaurus · 2026-05-30T23:54:07 1780185247

I have some similar goals. Have you considered leaning more into inference than gradual typing? One pattern I like is allowing the compiler to develop a more complex mental model, but keeping it straightforward for users -- you can do that with inference, ownership, purity, effect types, etc. What I actually think is really tantalizing is using tooling to fill in some of those gaps -- for instance, the editor could know types, required capabilities etc, without the user ever needing to type anything, but when the user needs it, they can find it, query it, test against it.

onlyrealcuzzo · 2026-05-31T01:15:28 1780190128

Cool - it sounds pretty similar. It's interesting that it looks so different. I'll have to investigate more.

WRT inference, yes. I infer everything in EASY mode.

And the compiler gives the user an autofix choice when a type is ambiguous (I don't default to huge union types - I assume no one wants that, so I make them choose a type - though I may potentially add AutoUnion to allow it).

I couldn't tell if you're using affine ownership, but I assume so if you don't have a GC. If they try to create an alias - they get a use after move error, and the compiler tells them they need to either COPY (auto-fix) or create a RefCount (usually auto-fix) - they pick.

onlyrealcuzzo · 2026-05-30T20:31:10 1780173070

The whole point of destroying a country is to raid it of its wealth and to get out before it burns to the ground...

No one has the honor to ride a ship into the ground.

These days the captains would jump ship at the sight of an iceberg and leave everyone to a certain death if there was a chance they'd lose 1 dollar.

culi · 2026-05-31T01:48:50 1780192130

The longer we go without a wealth tax the more we're allowing ourselves to be raided

onlyrealcuzzo · 2026-05-30T20:14:10 1780172050

The interesting thing is...

There may be a lot of demand for do-nothing services.

A lot of corporate work is just do-nothing box-ticking.

Boss: get me a report about X, so I can give that report to my boss who won't read it.

You: E&Y, please get me a report. Here's $200k.

bombcar · 2026-05-30T20:19:50 1780172390

This underlies much of the non-coding AI revolution (and some of the coding perhaps) - so much corporate activity is write-only and never read.

fragmede · 2026-05-30T20:56:24 1780174584

The trope about external consultants is that your VP brings them in to review the company, and they talk to everybody and write a report on how to improve the business, and the report says exactly what you've been telling your VP but they've been ignoring you.

2fff · 2026-05-30T21:26:02 1780176362

You are closer to the truth :)

they are not simply paid to do nothing. They are paid to do dirty work.

mapontosevenths · 2026-05-30T22:20:43 1780179643

They are paid to justify decisions executives have already made. It's often referred to as due diligence, but in practice these reports mostly just allow executives to tell the board it wasn't their fault if it goes wrong.

onlyrealcuzzo · 2026-05-30T17:17:52 1780161472

Ah, yes, grants should definitely be tied to how much you want to brownnose for the current political team.

What could go wrong?

Definitely not more corruption.

Definitely not more uncertainty that kills gross fixed capital formation.

onlyrealcuzzo · 2026-05-30T15:10:02 1780153802

> We’ll be releasing 0.17.0 within a couple weeks from now.

This is amazing. Didn't 0.16 take >1 year?

I was not expecting such a fast 0.17 release, but am very pleased to find this out today.

peesem · 2026-05-30T16:05:17 1780157117

mainly because this build system change, along with upgrading to LLVM 22, are the only major changes for 0.17.0: https://ziglang.org/download/0.16.0/release-notes.html#Roadm...

onlyrealcuzzo · 2026-05-30T15:07:18 1780153638

> RAII is great; I wish they'd use some light (optional) RAII for strings and containers etc.

Is it not possible to build a wrapper that does this? It seems like it should be.

Arch485 · 2026-05-30T17:36:59 1780162619

It is. I definitely agree that strings in Zig can be tedious, but the upside is that if you need it, you can build a string library that does everything you want it to do, in the way you want.

For comparison, while Rust offers a very rich string library, it's also very strict about what you can/cannot do with strings, so if your use case falls outside of that you're out of luck. With Zig, you can pretty easily roll your own and make it do what you want. (and when Zig is post 1.0, I imagine there will be some very nice pre-made string libraries by the community etc.)

steveklabnik · 2026-05-30T18:17:49 1780165069

You can also roll your own strings in Rust just fine. Take the bstr crate, for example.

ngrilly · 2026-05-30T18:53:39 1780167219

But I don't think you can implement RAII in Zig?

onlyrealcuzzo · 2026-05-30T13:52:14 1780149134

The actual cost is going to drop 99% in ~4 years.

How much of that makes it into enterprise pricing is TBD, since none of the hyperscalers are making money yet off selling AI inference.

Almost all businesses are jumping the gun. For most of their use cases, AI is either not yet good enough on its own, or good enough but too expensive.

No one wants to get left behind, so everyone's trying to get onto it now, even though it's not ready for what most enterprises want to do with it.

It's easy for them to look at a small startup without billions of lines of legacy business logic debt, see them having success, and wonder why they can't have just as much - or more. They're bigger, so they should have better and more success, right???

Wrong...

But when it gets ~99% cheaper for local inference over the next 4 years, at the same time the price per watt improves 4x -> a lot of those cases will start to pencil out.

BearOso · 2026-05-30T14:27:46 1780151266

Going from Opus 4.5 to 4.7 secretly required 6x more compute to run. 4.8 is apparently 30% more on top. I haven't seen any optimizations lately aside from distillation. Nobody's optimizing, they're just scaling up.

rescbr · 2026-05-30T14:43:59 1780152239

> Nobody's optimizing

The Chinese, since they lack computing hardware due to US export controls, are.

trollbridge · 2026-05-30T14:51:46 1780152706

And our export controls are going to turn China into a winner in the AI arms race if we're not careful.

rented_mule · 2026-05-30T19:02:48 1780167768

I retired a few years ago, but I still write a fair bit of code. I was using Copilot's code completion before I retired, but coding agents hadn't come around yet. I've been wanting to try them, but I kept putting it off, and now the price increases make it hard to justify.

So I just started trying CodeWhale (https://github.com/Hmbown/CodeWhale) with DeepSeek V4. I expected to be impressed by the abilities (which still require plenty of oversight). I didn't expect to be completely shocked by how cheap it is. After most of a week of using it 4-8 hours a day, which would amount to a full week of coding in many jobs after you account for non-coding activities, I'm about to hit $3 in total usage. So we're talking $10-20 per month for single-agent use by a full time software developer? And I'm sure some of my usage is waste as I'm still getting my head around things like compaction. If I take a break for a few weeks, I pay nothing because there is no subscription.
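As a back-of-envelope check on that monthly figure, here is a minimal sketch. It assumes the ~$3 covers roughly one working week and that usage stays flat from week to week; both are assumptions for illustration, not billing data, and `monthly_cost` is a made-up helper name.

```python
# Back-of-envelope: extrapolate ~$3 of usage per working week to a month.
# Assumptions (not from any billing data): the $3 covers about one week,
# and usage stays flat from week to week.
WEEKS_PER_MONTH = 52 / 12  # about 4.33


def monthly_cost(weekly_usd: float) -> float:
    """Scale a weekly spend to an average month."""
    return weekly_usd * WEEKS_PER_MONTH


if __name__ == "__main__":
    # 3.0 is the figure from the comment; 4.5 is a hypothetical heavier week.
    for weekly in (3.0, 4.5):
        print(f"${weekly:.2f}/week -> ${monthly_cost(weekly):.2f}/month")
```

Under those assumptions a $3 week lands inside the $10-20 per month range, and it takes about 50% more usage to reach the top of it.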

If DeepSeek and Xiaomi MiMo stay within a few months of the US-based models in terms of capabilities and US companies don't figure out how to drastically cut prices, I can't see how China hasn't already won. Protectionism would be one reason, but that might be ceding 50-90% of the total addressable market, and bring us closer to moving knowledge work out of the US the same way we did with manufacturing because it's too expensive in the US.

sgc · 2026-05-31T00:21:59 1780186919

How are you using it? More to complete specific functions or scripts, or for larger architectural design and longer implementation runs?

zzleeper · 2026-05-30T21:51:05 1780177865

Holy F.. $3 .. once I'm done with my base cursor allocation, each nontrivial question costs $5 . And yes, I'm now switching to a mix of codex and ds4pro

trollbridge · 2026-05-30T14:51:31 1780152691

DeepSeek and Alibaba would like to have a word.

krona · 2026-05-30T13:56:33 1780149393

> The actual cost is going to drop 99%

Do you mean the marginal cost by the producer, or the cost on the consumer? I can't see the price of electricity falling much, and the demand curve is apparently exponential if the hype is to be believed.

trollbridge · 2026-05-30T14:51:21 1780152681

DeepSeek V4 Pro is 99% cheaper than similarly performing models were 2 years ago (if such a model even existed).

Computing has always been about how to wring out more efficiency. The ENIAC was 150,000 watts, with 3 phase 240 volt power, and cost about $500,000.

My day to day laptop (a year old) is 35 watts, with 1 phase 20 volt power, and cost $1,000, so that's 99.98% less power consumption, 99.8% cheaper, and it has about 10 orders of magnitude more computing power, all over a time span of 80 years.
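Those two percentages follow directly from the quoted figures. A quick sketch to check them, using only the numbers in this comment (nominal dollars, no inflation adjustment; `percent_reduction` is just an illustrative helper):

```python
# Check the ENIAC-vs-laptop percentages using the figures quoted above.
def percent_reduction(old: float, new: float) -> float:
    """Percentage by which `new` is lower than `old`."""
    return (1 - new / old) * 100


power = percent_reduction(150_000, 35)      # watts
price = percent_reduction(500_000, 1_000)   # dollars, not inflation-adjusted

print(f"power: {power:.2f}% less")
print(f"price: {price:.1f}% cheaper")
```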

cratermoon · 2026-05-30T15:46:00 1780155960

Moore’s law is dead.

HappMacDonald · 2026-05-30T23:29:00 1780183740

It died before AI came around and today's coding agents are somewhere upwards of twice as competent as whatever the state of the art of automatic coding was in 2020. 8I

mrandish · 2026-05-31T01:20:13 1780190413

A good chunk of that was one-time gains from shifting GPU and memory architectures to better match what LLMs need at scale as well as some algorithmic improvements. Most of the low-hanging architecture optimization has already been harvested. We'll certainly have more algorithmic gains but the consensus is they'll generally be smaller and less frequent.

There's always a chance we'll have some dramatic gains far larger than DeepSeek's optimizations a year ago, but it hasn't happened again yet at even that scale. It would be nice but I certainly wouldn't count on it.

packetlost · 2026-05-30T13:56:48 1780149408

I don't see how this is even remotely true. Unless there's some super breakthrough into a fundamentally different architecture, there's not really a path to a 50% reduction in price, much less a 99% reduction.

kilroy123 · 2026-05-30T15:52:13 1780156333

In fairness, I think _current_ capabilities will be cheaper. So the models of today will be run drastically cheaper in 4 years.

onlyrealcuzzo · 2026-05-30T14:34:58 1780151698

And yet 90% drops for the same level of quality every 18 months have happened like clockwork...

And the technology already exists on the algorithmic front TODAY to lock in another 10x gain -> when, typically, algorithmic gains only account for ~30% of that drop and the other ~70% comes from better data (often synthetic) and knowledge distillation from frontier models.

Just look at DeepSeek's pricing...

datakan · 2026-05-30T13:53:45 1780149225

What makes you think prices will drop? Everyone I’ve spoken to believes they will only skyrocket. Genuinely curious

onlyrealcuzzo · 2026-05-30T13:59:24 1780149564

The technology already exists now on the algorithmic front for the next 10x drop between everyone adopting DeepSeek's MLA, MoE (mostly already done), Medusa (a better version of Google's speculative decoding), Kimi's Attn Residuals, and Mimo's Sliding Window Attn, and (possibly) Microsoft's 1.58b (this may be a nothing burger).

Historically, every 18 months, the price for the same level of quality has gone down ~90%.

See: https://www.reddit.com/r/LocalLLaMA/comments/1gpr2p4/llms_co...

And Chart 13 here: https://www.rdworldonline.com/ais-great-compression-20-chart...

And here: https://epoch.ai/data-insights/llm-inference-price-trends
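The compounding is easy to sanity-check. A minimal sketch, assuming the 90%-per-18-months rate simply continues unchanged (that is the claim under debate, not a given; `remaining_fraction` is an illustrative helper):

```python
# Compound a "90% cheaper every 18 months" trend over a given horizon.
# The 90% / 18-month rate is the claim from the comment, not a measured constant.
def remaining_fraction(months: float, drop: float = 0.90, period: float = 18.0) -> float:
    """Fraction of the original price left after `months`."""
    return (1 - drop) ** (months / period)


if __name__ == "__main__":
    for years in (1.5, 3, 4, 6):
        left = remaining_fraction(years * 12)
        print(f"{years} years: {100 * (1 - left):.2f}% cheaper")
```

If the trend held, the 99% mark would be reached at 3 years (two full periods), so 4 years leaves some slack for the rate slowing down.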

Historically, algorithmic gains are only ~30% of the pie, but there's enough out there to get to 10x, with just what's available already. The other ~70% of the pie is better training data (often synthetic) and distilling frontier knowledge. There's no sign we are tapped out on that front.

Additionally, GRAM (from ~10 days ago) is likely to be a 5-10x on its own (if not substantially more for smaller models). It's unlikely within 4 years LeCun's JEPA ideas and similar ideas like GRAM applied to LLMs have ZERO impact. The preliminary results are absolutely astounding (5000x better reasoning - this is not peanuts).

Further, that's not even counting that cost per watt is still dropping ~2x every 2 years on its own on the hardware front.

If you look at the "cost" of inference. People think it's electricity - but it's currently almost ~80% hardware amortization. The memory shortage is not going to last, nor are Nvidia's ~80-90% margins.

The human brain is still 8-10 orders of magnitude more efficient than the best LLMs of today. With ~1/10th of global capex riding on AI, if you don't think they're going to knock off 2 orders of magnitude more, when it's this obvious and easy... I don't know what to tell you...

Sure, it might take 6 years instead of 4. My crystal ball isn't perfect.

HarHarVeryFunny · 2026-05-30T14:41:04 1780152064

Sure, the price will come down a lot, even if we can argue about the timeline.

I think what will also happen, once we get past this current CEO AI FOMO mania, is that companies will start to look at AI spending more rationally like any other company expense, and will revert to more rational decision making.

Even if the cost comes down considerably over the next few years, that's plenty of time for companies to look at their financial results and question why AI expenditure isn't resulting in an increase in revenue and/or profitability.

datakan · 2026-05-30T14:25:48 1780151148

This is great food for thought, thank you

onlyrealcuzzo · 2026-05-30T14:40:13 1780152013

Additionally, on the context front -> all the labs are aware that for many tasks you can get 10x+ increases in output quality by feeding better context.

See https://arxiv.org/abs/2604.04364.

This won't really show up in benchmarks, but it will impact real world usage on the most common use cases.

I'm doing a study right now on the impacts of better context for small models to fix bugs.

A very dumb algorithm can make small models perform at 10x+ model sizes. I'll be surprised if it can't get to 20x+

rednb · 2026-05-30T14:39:04 1780151944

I didn't take you seriously initially but after reading this, I think you are the real deal.

Thank you for sharing this and for having the intellectual courage to hold to a sound reasoning that may be unpopular initially.

Nimitz14 · 2026-05-30T14:44:36 1780152276

This is mostly slop. But you may be directionally correct

mrandish · 2026-05-31T01:03:20 1780189400

> The actual cost is going to drop 99% in ~4 years.

We have little visibility into current frontier model costs at mass scale. As a broad historical trend, tech costs tend to fall over longer time periods but your claim far exceeds Moore's Law rates in its heyday - and that heyday is long gone.

In 2021 TSMC announced it was increasing its price per gate for new nodes for the first time in its history. In the past five years cutting edge nodes have delivered ~8-15% real-world performance gains on average at costs at least 10-20% more than the last node. If you're positing a string of unprecedented efficiency breakthroughs in LLM algorithms - such extraordinary claims require extraordinary evidence.

AllegedAlec · 2026-05-30T21:48:21 1780177701

> The actual cost is going to drop 99% in ~4 years.

And fusion power is just 2 decades into the future!

jjav · 2026-05-30T22:06:50 1780178810

Full self driving guaranteed here before the end of the year (every year).

bakugo · 2026-05-30T13:56:47 1780149407

Prices have been very obviously trending up, not down. Even open weights models are becoming more expensive with every release. Computer hardware is ballooning in price.

onlyrealcuzzo · 2026-05-30T14:44:10 1780152250

Prices are going up for BETTER quality -> not for the SAME level of quality.

People are willing to pay more for BETTER quality.

You obviously haven't seen DeepSeek v4 Pro's pricing if you think pricing only goes up...

abalashov · 2026-05-30T14:24:40 1780151080

Just wait for the next model and the next model architecture. Just wait for it, bro.

onlyrealcuzzo · 2026-05-30T17:31:41 1780162301

Gemini 3.5 flash is 25% cheaper than 3.1 pro, and outperforms it on almost every benchmark, most by a pretty wide margin...

bigstrat2003 · 2026-05-31T02:23:56 1780194236

There has never yet been a new model which actually improved over the previous ones. They suck just as much, and in the same ways, as the models of 3 years ago.

abalashov · 2026-05-30T19:46:40 1780170400

Cool.

trollbridge · 2026-05-30T14:55:03 1780152903

Grab a 5090 and run Qwen 3.6 35b on it (6 parameter seems to work best for me).

Then buy $10 (or $2, if you're cheap, and they take PayPal) of DeepSeek credits.

Whilst you're at it spring for a Claude subscription too and GPT.

Switch models between Qwen, DeepSeek Flash, DeepSeek Pro, and you can meet 99% of your code generation needs.

Hop over to Opus 4.7 (or 4.8, but I haven't really used it yet) and GPT-5.5 when doing very complex architecture/design or troubleshooting something where DeepSeek Pro is getting stuck.

It is ridiculous how cheap this stuff is now. It's affordable at third world prices.

onlyrealcuzzo · 2026-05-29T22:48:43 1780094923

I just tested this on a bug fixing benchmark I'm working on.

It did not perform as well as I expected. Qwen2.5-Coder-3B (2 years old) outperformed it by a wide margin -> fixing ~50% of bugs whereas this model only fixed ~12%.

Granted, it's not a coder specific model, but given its benchmark performance relative to Gemma models, and that it's two years newer, and that it's an MoE with 8B total params, I expected it to be more competitive.

walrus01 · 2026-05-30T01:40:03 1780105203

I personally find any model smaller than something like Qwen 3.6 35B-A3B (8-bit quantization, about 49GB memory usage when loaded into llama.cpp) to be too "stupid" for reliable use.
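For a rough sense of where a figure like that comes from, here is a sketch of the weight-only footprint. The assumptions are mine: 35e9 parameters, about 8.5 bits per weight for an 8-bit GGUF quant (8-bit values plus per-block scales), and about 4.5 bits per weight for a basic 4-bit quant. KV cache, any vision projector, and runtime buffers are not modeled, and they would account for the rest of the loaded size.

```python
# Rough weight-memory estimate for a quantized model.
# Assumptions: 35e9 parameters; ~8.5 bits/weight for an 8-bit GGUF quant and
# ~4.5 bits/weight for a basic 4-bit quant (values plus per-block scales).
# KV cache and runtime buffers are NOT included; they depend on context
# length and server settings.
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


q8 = weight_gb(35e9, 8.5)
q4 = weight_gb(35e9, 4.5)
print(f"Q8: ~{q8:.1f} GB, Q4: ~{q4:.1f} GB")
```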

I would much rather not run the model on my local laptop hardware and offload that to some system sitting under my desk in my home office, accessible via VPN, than take the risk of using an unreliable and flaky tool for the convenience of having it on the same hardware on my lap.

I pay very little attention to 8 billion or whatever (or even much smaller) models these days and I don't feel like I'm missing much.

satvikpendem · 2026-05-30T02:31:05 1780108265

Qwen 3.6 27B dense is much better than the 35B MoE model for coding, not sure if you've tried that yet.

walrus01 · 2026-05-30T02:33:37 1780108417

yes, I have, I use both. 27B slower in tok/s due to density, obviously, 35B-A3B for speed on simpler tasks.

intothemild · 2026-05-30T09:59:49 1780135189

You should enable MTP now that it's available.

llama.cpp has had some massive updates in the last week or so.

npodbielski · 2026-05-30T14:05:01 1780149901

Yes, Qwen 3.6 MoE is hitting like 80-90t/s on Strix Halo. On the R9700 I had like 170t/s. It was not possible to keep up. But the MoE goes in circles very often. I then switch to the dense model and get 20-30t/s, but it is able to solve quite a lot of tasks.

alfiedotwtf · 2026-05-30T17:45:38 1780163138

For those speeds, I’m assuming Q4?

intothemild · 2026-05-30T14:32:00 1780151520

I get 50-60t/s tg on my r9700 with the dense, unsloth MTP quant UD-Q5_K_XL, K@8/V@4 256k context.

Using Vulkan backend.

```
llama-server -fa on -t 7 -ngl 999 --mlock --fit off --kv-offload \
  --no-webui --metrics \
  --chat-template-kwargs '{"preserve_thinking": true}' \
  -b 2048 -ub 1024 \
  -m /mnt/models/unsloth/Qwen3.6-27B-MTP-GGUF/Qwen3.6-27B-UD-Q5_K_XL.gguf \
  --mmproj /mnt/models/unsloth/Qwen3.6-27B-MTP-GGUF/mmproj-F16.gguf \
  -c 262144 --kv-unified -ctk q8_0 -ctv q4_0 \
  --spec-type draft-mtp --spec-draft-n-max 3 --spec-draft-ngl 99 \
  --alias unsloth/Qwen3.6-27B-MTP-GGUF \
  --temp 0.60 --top-k 20 --top-p 0.95 --min-p 0.00 \
  --presence-penalty 0.00 --repeat-penalty 1.00
```

sheeshkebab · 2026-05-30T15:40:21 1780155621

27b is slow as molasses vs 35b on local stuff I have (m5 max). MTP doesn’t make any difference either.

theanonymousone · 2026-05-30T06:35:02 1780122902

Have you seen the 8bit quantisation matter a lot? The "consensus" in r/LocalLlama is that up to 4 bits the loss is tolerable.

walrus01 · 2026-05-30T06:55:50 1780124150

Absolutely. Difference in Q6 vs Q8 is not as immediately noticeable, but if I test by starting from a blank slate context and giving it the same complicated task with Q4 vs a Q8 GGUF file loaded, the difference is apparent. The Q4 will struggle or do 'stupid' things with even simple bash or python. Q4 might not be as noticeable for conversational purely text one on one interaction with an LLM, but when you dig deeper into something that's more esoteric in a training dataset than a chat conversation, absolutely a big gap there.

I think some of the folks in the local llm social media communities are using them for things like company-hosted customer service chat bots, or purely english text writing stuff where Q4 will probably not cause a problem. For more discrete technical work I stick pretty much exclusively to Q8.

theanonymousone · 2026-05-30T09:45:49 1780134349

Thanks a lot. How about Q8 vs FP16/BF16? Have you checked them too?

walrus01 · 2026-05-30T21:32:38 1780176758

I have not spent a lot of time running FP16 'full precision' versions of some things, but as the other commenter says, it's not much difference. There's a really wide array of benchmarks and tests from a lot of third parties unrelated to the trainer of the AI models that shows at most a two percent difference in score and capability between BF16 and Q8.

bradfa · 2026-05-30T12:35:33 1780144533

Q8 quant is very minimal fall off in terms of KLD against the lab 16 bit. If you have the memory for BF16 KV-cache (which is usually easier to stomach) then the Q8 is very close. But even Q8 quant model with Q8 KV-cache is very close.

Smaller quants for the model start to fall off but more importantly, smaller KV-cache quants fall off much faster so avoid less than Q8 there.

alfiedotwtf · 2026-05-30T17:47:28 1780163248

It’s not a general rule, and depends highly on the model and the quantisation used. Don’t guess: Unsloth sometimes publishes graphs in their tutorials showing the error rate vs file size… sometimes Q4 is great, other times I go for Q6

thot_experiment · 2026-05-30T02:39:05 1780108745

q6 is fine for that qwen with ctx @ q8, and the dense models of that size are solid at q4 with q8 ctx

h14h · 2026-05-30T15:07:43 1780153663

That's not all that surprising, IMO. From what I understand, LiquidAI is focusing pretty narrowly on building models that operate as the "agentic core" of a larger system.

If I were going to use this model, I'd be looking to use it more as the primary chat interface of a larger system, and having it orchestrate & delegate tasks to other places via tool calls. It's not quite as exciting on the surface as a local "do it all" model, but it does enable some pretty neat use-cases, IMO.

I'm imagining a local agent that is super low latency, works entirely offline, and capable of queuing up complex tasks for larger/smarter cloud agents which execute them asynchronously.

onlyrealcuzzo · 2026-05-30T18:13:11 1780164791

Interesting...

Two of the other responses speak about it being abysmal at tool calling.

Overall, I'm pretty impressed a model this small can find/fix ~12% of bugs with crappy context - even if they're about as easy as possible to fix.

I just assumed it would perform better, given all the advancements in the space.

It's possible 1B active parameters is just not enough - even if it has 8B params of knowledge to reason through bugs.

Playing around with the context I fed it, it was able to fix up to ~34% of bugs vs ~46% for Qwen2.5-Coder-3B and ~54% for Qwen2.5-Coder-7B.

debazel · 2026-05-30T00:17:03 1780100223

I tried it with OpenCode and it is borderline incapable of using tool calls, so that might be why it is doing so badly on your test.

peder · 2026-05-30T00:48:43 1780102123

I just did the same. Absolutely awful. I assume OpenCode's heavy context is a problem, and it's probably better to use Liquid's own OpenCode alternative for this.

solarkraft · 2026-05-30T09:24:03 1780133043

Where can I find that agent harness? A look at their Docs and asking Gemini yielded no results.

Edit: Is it this? https://github.com/Liquid4All/cookbook/tree/main/examples/lo...

FYI: Opencode is very well tuned for Qwen models, but I haven’t found it that rare for niche models to perform badly in it.

mike_hearn · 2026-05-30T15:40:14 1780155614

It's not intended to be a coding model, however.

XCSme · 2026-05-30T00:33:10 1780101190

I will test it when it's accessible via OpenRouter, but the previous LFM2 model (lfm-2-24b-a2b) didn't do well on my tests, it got only 1/20 questions/tasks right, way below Gemma 31B or Qwen 35b-a3b (those get like 10/20 right)

BoorishBears · 2026-05-30T13:57:57 1780149477

I tested it against Gemma 4 31B and it's expectedly not favorable for world knowledge.

But even against E4B it's shaky, which is surprising given how many tokens they trained on. I guess it was on a lot of synthetic data.

HanClinto · 2026-05-29T22:56:02 1780095362

Some of the coding-specific fine-tunes were really impressive boosts. Qwen2.5-3B-Instruct is also available [0] -- if it's not too much to ask, I'd be curious how more general models stack up in your benchmark?

[0] - https://huggingface.co/Qwen/Qwen2.5-3B-Instruct

onlyrealcuzzo · 2026-05-29T18:56:44 1780081004

It's almost as if Postgres isn't perfect, and one size shoe doesn't fit all.

Some people want some of the benefits you get from SQLite.

SQLite is obviously not perfect, but it's an incredible piece of software, and people regularly find good ways to make use of excellent pieces of software.

onlyrealcuzzo · 2026-05-29T18:53:23 1780080803

> SQLite is surprisingly performant for single node applications even when comparing to Postgres.

In the context of SQLite being understood to be a quite excellent piece of software - shouldn't we expect it to be?

In the context of a single-node, Postgres is overkill. It should not be expected to be competitive with SQLite.

This is almost like benchmarking an in-memory HashMap against Redis and being surprised that it performs well in ideal conditions.

shukantpal · 2026-05-29T18:59:33 1780081173

Yes, agreed on SQLite/Postgres. But I'm going to benchmark RocksDB next and see what the performance characteristics are. I suspect the LSM tree storage engine of RocksDB might perform better since agents are so write heavy when running highly concurrent workloads. After all, you are streaming LLM tokens into disk and fanning them out to subscribed clients.

onlyrealcuzzo · 2026-05-29T19:01:05 1780081265

You might want to start here: https://docs.cozodb.org/en/latest/releases/v0.3.html

password4321 · 2026-05-30T10:07:21 1780135641

Thank you for sharing a benchmark pretty much exactly like the parent comment is planning to do.

Also thanks for the incidental exposure to a DB I'd never heard of before... with a browser-based demo CozoDB may be a good way to start experimenting with Datalog.

andriy_koval · 2026-05-29T19:15:39 1780082139

That project has 0 commits for 2 years.

onlyrealcuzzo · 2026-05-29T20:02:34 1780084954

What does that have to do with their research on the exact topic OP was looking into?

andriy_koval · 2026-05-29T20:09:38 1780085378

Abandoned research of unknown quality is a strong signal to deprioritize that direction

recursive · 2026-05-29T19:37:33 1780083453

Sounds pretty stable

password4321 · 2026-05-30T09:58:55 1780135135

v0.7 with the following disclaimer:

> Versions before 1.0 do not promise syntax/API stability or storage compatibility.