
As a Chinese user, I can say that many people use Kimi, even though I personally don’t use it much. China’s open-source strategy has many significant effects—not only because it aligns with the spirit of open source. For domestic Chinese companies, it also prevents startups from making reckless investments to develop mediocre models. Instead, everyone is pushed to start from a relatively high baseline. Of course, many small companies in the U.S., Japan, and Europe are also building on Qwen. Kimi is similar: before DeepSeek and others emerged, their model quality was pretty bad. Once the open-source strategy was set, these companies had no choice but to adjust their product lines and development approaches to improve their models.

Moreover, the ultimate competition between models will eventually become a competition over energy. China’s open-source models have major advantages in energy consumption, and China itself has a huge advantage in energy resources. They may not necessarily outperform the U.S., but they probably won’t fall too far behind either.





One thing to add: the most popular AI product in China is not Kimi, I think; it should be Doubao by ByteDance (TikTok's owner) and Yuanbao by Tencent. They have a better UI and feature set, and you can also select the DeepSeek model from them. Kimi still has a lot of users, but I think in the long term it may still not do well. So is it still a win for closed models?

There are a lot of indications that we're currently brute-forcing these models. There's honestly no reason they have to be 1T parameters and cost an insane amount to train and run inference on.

What we're going to see is that, as energy becomes a problem, they'll simply shift to more effective and efficient architectures in both physical hardware and model design. I suspect they can also simply charge more for the service, which would reduce usage for senseless applications.


There are also elements of stock-price hype and geopolitical competition involved. The major U.S. tech giants are all hitched to the same bandwagon — they have to maintain this cycle: buy chips → build data centers → release new models → buy more chips.

It might only stop once the electricity problem becomes truly unsustainable. Of course, I don’t fully understand the specific situation in the U.S., but I even feel that one day they might flee the U.S. altogether and move to the Middle East to secure resources.


Sundar is talking about fleeing earth to secure photons and cooling in space.

Good luck. Space has lots of photons but really quite poor cooling resources.

> There's honestly no reason they have to be 1T parameters and cost an insane amount to train and run inference on.

Kimi K2 Thinking is rumored to have cost $4.6m to train - according to "a source familiar with the matter": https://www.cnbc.com/2025/11/06/alibaba-backed-moonshot-rele...

I think the most interesting recent Chinese model may be MiniMax M2, which is just 200B parameters but benchmarks close to Sonnet 4, at least for coding. That's small enough to run well on ~$5,000 of hardware, as opposed to the 1T models which require vastly more expensive machines.


That number is as real as the $5.5 million to train DeepSeek. Maybe it's real if you're only counting the literal final training run, but with total costs, including the huge number of failed runs and everything else, accounted for, it's several hundred million to train a model that's usually still worse than Claude, Gemini, or ChatGPT. It took $1B+ ($500 million on energy and chips ALONE) for Grok to get into the "big 4".

By that logic, one can even argue that the real cost needs to include the infrastructure: total investment in the semiconductor industry, the national electricity grid, education, and even defence.

Correct! You do have to account for all of these things! Unironically correct! :)

That's baked into the cost of the hardware though.

The Chinese government has been heavily subsidizing electricity.

> That's small enough to run well on ~$5,000 of hardware...

Honestly curious where you got this number, unless you're talking about extremely small quants. Even just a Q4 quant gguf is ~130GB. Am I missing a relatively cheap way to run models this large well?

I suppose you might be referring to a Mac Studio, but (while I don't have one to be a primary source of information) it seems like there's some argument to be made about whether they run models "well"?
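(For what it's worth, napkin math backs up the ~130GB figure. A sketch, assuming ~4.5 effective bits per weight for a Q4_K-style quant and that the model is closer to 230B total parameters than the 200B headline:)

    # Rough GGUF size estimate. Assumptions: ~230B total params and
    # ~4.5 effective bits per weight (Q4_K-style quant plus overhead).
    params = 230e9
    bits_per_weight = 4.5
    print(f"{params * bits_per_weight / 8 / 1e9:.0f} GB")  # ~129 GB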


Yes, I mean a Mac Studio with MLX.

An M3 Ultra with 256GB of RAM is $5599. That should just about be enough to fit MiniMax M2 at 8bit for MLX: https://huggingface.co/mlx-community/MiniMax-M2-8bit

Or maybe run a smaller quantized one to leave more memory for other apps!

Here are performance numbers for the 4bit MLX one: https://x.com/ivanfioravanti/status/1983590151910781298 - 30+ tokens per second.
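If you want to try it yourself, here's a minimal sketch using the mlx-lm Python package (the API may have shifted between releases, so treat the exact calls as an approximation):

    # pip install mlx-lm  (Apple Silicon only)
    from mlx_lm import load, generate

    # First run downloads ~230GB of 8-bit weights from Hugging Face,
    # which is why 256GB of unified memory is "just about" enough.
    model, tokenizer = load("mlx-community/MiniMax-M2-8bit")

    prompt = "Write a Python function that merges two sorted lists."
    print(generate(model, tokenizer, prompt=prompt, max_tokens=512))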


It’s kinda misleading to omit the generally terrible prompt processing speed on Macs

30 tokens per second looks good until you have to wait minutes for the first token


The tweet I linked to includes that information in the chart.

Thanks for the info! Definitely much better than I expected.

Running in CPU RAM works fine. It's not hard to build a machine with a terabyte of RAM.

Admittedly I've not often tried running on system RAM, but every time I have, it's been abysmally slow (< 1 T/s) with something like KoboldCpp or Ollama. Is there any particular method required to run them faster, or is it just "get faster RAM"? I fully admit my DDR3 system has quite slow RAM...
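Napkin math on my own question, assuming decode speed is roughly memory bandwidth divided by bytes read per token (all figures are ballpark guesses, not measurements):

    # Decode speed ~= memory_bandwidth / bytes_read_per_token
    ddr3 = 13   # GB/s, dual-channel DDR3-1600 (my system, roughly)
    ddr5 = 90   # GB/s, dual-channel DDR5-5600
    dense_q4 = 130  # GB touched per token by a dense ~230B Q4 model
    moe_q4 = 6      # GB per token if only ~10B params are active (MoE)
    print(ddr3 / dense_q4, ddr5 / dense_q4)  # ~0.1 and ~0.7 tok/s
    print(ddr3 / moe_q4, ddr5 / moe_q4)      # ~2.2 and ~15 tok/s

So "get faster RAM" really is most of the answer, plus MoE models that only read a fraction of their weights per token.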

I assume that $4.6 mil is just the cost of the electricity?

Hard to be sure because the source of that information isn't known, but generally when people talk about training costs like this they include more than just the electricity but exclude staffing costs.

Other reported training costs tend to include rental of the cloud hardware (or equivalent if the hardware is owned by the company), e.g. NVIDIA H100s are sometimes priced out in cost-per-hour.


Citation needed on "generally when people talk about training costs like this they include more than just the electricity but exclude staffing costs".

It would be simply wrong to exclude the staffing costs. When each engineer costs well over 1 million USD in total costs year over year, you sure as hell account for them.


If you have 1,000 researchers working for your company and you constantly have dozens of different training runs on the go, overlapping each other, how would you split those salaries between those different runs?

Calculating the cost in terms of GPU-hours is a whole lot easier from an accounting perspective.

The papers I've seen that talk about training cost all do it in terms of GPU hours. The gpt-oss model card said 2.1 million H100-hours for gpt-oss:120b. The Llama 2 paper said 3.31M GPU-hours on A100-80G. They rarely give actual dollar costs and I've never seen any of them include staffing hours.
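To turn GPU-hours into dollars you have to pick a rental rate yourself; a sketch with assumed rates (the rates are mine, not from the papers):

    # GPU-hours are from the papers; hourly rates are assumptions
    runs = {
        "gpt-oss:120b (H100-hours)": (2.1e6, 2.50),  # ~$2-3/hr typical
        "Llama 2 (A100-80G-hours)": (3.31e6, 1.50),  # ~$1-2/hr typical
    }
    for name, (hours, rate) in runs.items():
        print(f"{name}: ~${hours * rate / 1e6:.1f}M")
    # ~$5.2M and ~$5.0M - the same ballpark as Kimi K2's rumored $4.6M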


Do they include the costs of dead-end runs?

No, they don't! That's why the "5.5 million" DeepSeek V3 number as read by American investors was total bullshit (because investors ignored the asterisk saying "only final training run").

Yeah, that's one of the most frustrating things about these published numbers. Nobody ever wants to share how much money they spent on runs that didn't produce a useful model.

As with staffing costs though it's hard to account for these against individual models. If Anthropic run a bunch of training experiments that help them discover a new training optimization, then use that optimization as part of the runs for the next Opus and Sonnet and Haiku (and every subsequent model for the lifetime of the company) how should the cost of that experimental run be divvied up?


No, because what people are generally trying to express with numbers like these is how much compute went into training. Perhaps another measure, like zettaFLOPs or something, would have made more sense.


Can confirm MiniMax M2 is very impressive!

> What we're going to see is that, as energy becomes a problem

This is much more likely to be an issue in the US than in China. https://fortune.com/2025/08/14/data-centers-china-grid-us-in...


Disagree. Part of the reason China produces more power (and pollution) is due to China manufacturing for the US.

https://www.brookings.edu/articles/how-do-china-and-america-...

The sources of China's energy are more fragile than those of the US.

> Coal is by far China’s largest energy source, while the United States has a more balanced energy system, running on roughly one-third oil, one-third natural gas, and one-third other sources, including coal, nuclear, hydroelectricity, and other renewables.

Also, China's economy is a bit less efficient in terms of energy used per unit of GDP. China relies on coal and imports.

> However, China uses roughly 20% more energy per unit of GDP than the United States.

Remember, China still suffers from blackouts when manufacturing demand doesn't match supply. The Fortune article seems like a fluff piece.

https://www.npr.org/2021/10/01/1042209223/why-covid-is-affec...

https://www.bbc.com/news/business-58733193


These stories are from 2021.

China has been adding something like a 1GW coal plant’s worth of solar generation every eight hours in the past year, and the rate is accelerating. The US is no longer a serious competitor for China when it comes to energy production.


The reason it happened in 2021, I think, might be that China absorbed the production capacity gap caused by COVID shutdowns in other parts of the world. The short-term surge in production led to a temporary imbalance in the supply and demand of electricity.

This was very surprising to me, so I just fact-checked this statement (using Kimi K2 Thinking, natch), and it's presently off by a factor of 2-4. In 2024 China installed 277 GW of solar, so 0.25 GW / 8 hours. In the first half of 2025 they installed 210 GW, so 0.39 GW / 8 hours.

Not quite at 1 GW / 8 hrs, but approaching that figure rapidly!

(I'm not sure where the coal plant comes in - really, those numbers should be derated relative to a coal plant, which can run 24/7)
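(The arithmetic, in case anyone wants to check my checking:)

    # China solar additions, converted to GW per 8-hour block
    print(277 / 365 / 3)  # 2024: 277 GW over the year  -> ~0.25 GW/8h
    print(210 / 181 / 3)  # H1 2025: 210 GW in 181 days -> ~0.39 GW/8h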


> (I'm not sure where the coal plant comes in - really, those numbers should be derated relative to a coal plant, which can run 24/7)

It works both ways: you have to derate the coal plant somewhat due to the transmission losses, whereas with a lot of solar power being generated and consumed on/in the same building the losses are practically nil.

Also, pricing for new solar with battery is below the price of building a new coal plant and still dropping; it's approaching the point where it's economical to demolish existing coal plants and replace them with solar.


China's breakneck development is difficult for many in the US to grasp (root causes: baselining on sluggish domestic growth, and a condescending view of China). This article offers a far more accurate picture of how China is doing right now: https://archive.is/wZes6

Eye-opening summary... I knew China was ahead, but wow. Thanks for sharing that article.

Thank you for sharing this article. Eye opening.

As counterpoints to illustrate China's current development:

* China produced more PV panel capacity in the first half of this year than the US has installed in its entire history

* China alone has installed PV capacity of over 1000 GW today

* China has installed battery electrical storage of about 100 GW / 300 GWh today and aims to have 180 GW by 2027


> Part of the reason China produces more power (and pollution) is due to China manufacturing for the US.

Presumably they'd stop doing that once AI becomes a more beneficial use for the energy though.


I don't remember many details about the situation in 2021. But China is in a period of technological explosion—many things are changing at incredible speed. In just a few years, China may have completely transformed in various fields.

Western media still carry strong biases toward China’s political system, and they have done far too little to portray the country’s real situation. The narrative remains the same old one: “China succeeded because it’s capitalist,” or “China is doomed because it’s communist.”

But in reality, barely a few days go by without some new technological breakthrough or innovation happening in China. The pace of progress is so fast that even people inside the country don't always keep up with it. For example, just since the start of November, we've seen China's space station crew doing a barbecue in orbit, researchers in Hefei making new progress on an artificial sun, and a team discovering a safe and efficient method for preparing aromatic amines. Apart from the space station bit—which got some attention—the others barely made a ripple. Also, China's first electromagnetic-catapult aircraft carrier has officially entered service.

About a year ago, I started using Reddit intensively. What I mostly read there are reports related to electricity, because the topic involves environmental protection, hatred of Trump, etc. There are too many leftists, so the discussions are somewhat biased, but the news reports and the underlying data are real. China reached its carbon peak in 2025, and this year it has truly become a powerhouse in electricity. National data centers are continuously being built, but residential electricity prices have never been and will never be affected. China still has a lot of coal-fired power plants, but it keeps upgrading them technologically. At the same time, wind, solar, nuclear, and other sources are all advancing steadily. China is the only country that is not controlled by ideology and is increasing its electricity capacity in a scientific way.

(Maybe people here prefer to talk about the AI field. It's not only Kimi releasing a new model; XPeng also showed a new robot that drew some attention. These all happened within a few days.)


> China is the only country that is not controlled by ideology and is increasing its electricity capacity in a scientific way.

Have recently noticed a lot of pro-CCP propaganda on social media (especially Instagram and TikTok), but strangely also on HN; kind of interesting. To anyone making the (trivially false) claim that China is not controlled by ideology, I'm not quite sure how you'd convince them of the opposite. I'm not a doomer, but as China ramps up their aggression towards Taiwan (and the US will inevitably have to intervene), this will likely not end well in the next 5-10 years.


I also think that one claim is dubious, but do you really have to focus on only that part to the exclusion of everything else? All the progress made is real, regardless of your opinion on the existence of ideology.

I meant only on this specific topic: electricity. Arguing about other things is pointless, since HN has the same political leaning as Reddit, so I'll pass.

What's your Reddit username? I'm interested in reading your posts there.

I don't have one now. I used to post lots of comments on China stuff, but I got banned once, and every time I registered a new account it would get banned soon after. I guess they banned all my IPs, so I only browse anonymously now.

It's absolutely impressive to see China's development. I'm happy my country is slowly but surely moving to China's orbit of influence, especially economically.

If it's improving living standards for the people, then it surely is a good thing.

"Not controlled by ideology" is a pretty bold statement to make about a self-declared Communist single-party country. There is always an ideology. You just happen to agree with whatever this one is (Controlled-market Communism? I don't know what the precise term is).

I cannot edit this now, so I want to add some clarification: I meant only on this specific topic, electricity. China doesn't act like the US or Germany and abandon wind or nuclear; it acts only based on science.

Having larger models is nice because they have a much wider sphere of knowledge to draw on. Not in the sense of using them as encyclopedias. More in the sense that I want a model that is going to be able to cross reference from multiple domains that I might not have considered when trying to solve a problem.

You guys will outperform the US, no doubt.

First, energy generation at multiples of what the US is producing. What does AI need? Energy.

Second, the open-source nature of the models means, as you said, a high baseline to start from, and faster iteration.


> will outperform

does outperform

China is absolutely winning innovation in the 21st century. I'm so impressed. For an example from just this morning, there was an article that they're developing thorium reactor-powered cargo ships. I'm blown away.


I remember this thing. The tech is from America actually, decades ago. (Thorium). But they gave up, and China continued the work in recent years.

> The tech is from America actually, decades ago. (Thorium).

I guess it depends on how you see it, but regardless, the people putting it to use today don't seem to be in the US.

FWIW:

> Thorium was discovered in 1828 by the Swedish chemist Jöns Jacob Berzelius during his analysis of a new mineral [...] In 1824, after more deposits of the same mineral in Vest-Agder, Norway, were discovered [...] While thorium was discovered in 1828 its first application dates only from 1885, when Austrian chemist Carl Auer von Welsbach invented the gas mantle [...] Thorium was first observed to be radioactive in 1898, by the German chemist Gerhard Carl Schmidt

For being an American discovery, it sure has a lot of European people involved in it :) (I've said it elsewhere but it's worth repeating; trying to track down where a technology/invention actually comes from is a fool's errand, and there is always something earlier that led to today, so it doesn't serve much purpose except nationalism, it seems to me).



"The tech is from America actually, decades ago... But they give up and china continues the work"

Many such cases...


Rare earths...

Jm2c, but I really dislike those winners/losers narratives. They lack any nuance, are juvenile, and ultimately contribute little but noise, like the endless, pointless "who's better, Jordan or LeBron?" debates.

Maybe, or maybe the current models are just a massive waste of energy because trying to run the economy on tokens is a stupid idea.

Going on a tangent, is Europe even close? Mistral has been underwhelming.

I don't know how close Europe is, but I'm sufficiently whelmed by Mistral that I don't need to look elsewhere yet. It's kind of like having a Toyota Corolla while everybody else is driving around in smart cars, but it gets it done. On top of that, there's a loyal community that (maybe because I'm not looking) I don't see with other products. It probably depends on your uses, but if I spent all my time chasing the latest chat models (like Kimi K2, for instance) I wouldn't actually get anything done.

> I don't know how close Europe is, but I'm sufficiently whelmed by Mistral that I don't need to look elsewhere yet. It's kind of like having a Toyota Corolla while everybody else is driving around in smart cars, but it gets it done.

My problem was that it really doesn't: none of the models out there are that great at agentic coding when you care about maintainability. Sonnet 4.5 sometimes struggles and is only okay with some steering; same for Gemini 2.5 Pro. GPT-5 recently seems closer to "just working" with high reasoning, but it's still expensive and slow. Cerebras recently started offering GLM-4.6 and it's roughly on par with Sonnet 4, so not great, but 24M tokens per day for 50 USD seems like good value even with the 128k context limitation.

I don't think there is a single model that is good enough and dependable enough in my experience out there yet, I'll probably keep jumping around for the next 5-10 years (assuming the models keep improving until we hit diminishing returns so hard that it all evens out, hopefully after they've reached a satisfying baseline usefulness).

Don't get me wrong, all of those models can already provide value; it's just that they're pretty finicky a lot of the time, some of it inherent to how LLMs work, but some of it also because they should just be trained better and more. And the tools they're given should be better. And the context should be managed better. And I shouldn't see something as simple as diffs fail to apply repeatedly just because I'm asking for 100% accuracy in the search/replace to avoid them messing up the brackets or whatever else.


Coding isn't the only use case.

Neither is being bleeding edge.

I use Mistral's models. I've built an entire internal-knowledge pipeline of sorts using Mistral's products (which involved everything from OCR, to summarization, to linking stuff across different services like Jira or Teams, etc.) and I've been very happy with it.

We did consider alternatives, and truth be told, none was as cost-effective, fast, and satisfying (and also our company does not trust US AI companies not to do stuff with our data).


My god, the cost, right? It's so much less than any of the competition that just feeding off an API key (for coding, yeah) works great.

But as you say the rest of it is good too. I use it for research and to me it does a great job, all for a fraction of the price and the carbon of the U.S. players.


So you're not able to trust inference providers like Google Cloud w/ ZDR etc with your data?

My EU-based clients are unwilling to do so, as we see all clouds as black boxes where you have no real idea what you're getting into.

Most of our hosting is also on European providers; my team's the only one that deploys some services on Azure.


Probably cuz you aren't looking, yeah. Anthropic seems to be leading the "loyalty" war in the US.

Yeah and I'll probably end up going that way as work locks down the models we're allowed to use, saving Mistral for personal projects.

You have to try the latest Corolla then. Really smart: lane and collision assistance, ... Unlike my old Corolla, which is totally dumb. It doesn't even turn the lights off when I leave the car.

Hah! I need to update my analogies.

Not anywhere near close.

Europe doesn't have the infrastructure (legal or energy) and US companies offer far better compensation for talent.

But hey, at least we have AI regulation! (sad smile :))


How can I use it?

Google Play or the App Store? Or https://www.kimi.com/en/


