What lengths will Chinese companies go to get an Nvidia A100 chip? (chinai.substack.com)
60 points by yorwba on Feb 26, 2024 | 41 comments


If money is not a problem for them, I don't see how we can deny Chinese companies access to those cards.

The Soviets were able to smuggle whole mainframes and minicomputers during the 70s. Giant fucking machines that you could only buy through very specific sales channels. It was not like you could eBay yourself an S/360, and yet they did it.


It's not a question of "if", but "how many". And that's largely what matters, same as in the case of the USSR.


Wait why? Do you really think the number could be reduced below the requisite amount to do good reverse engineering? That number can’t be more than a couple dozen, and good luck preventing that.


Sure, they can reverse engineer it, at great cost. The production costs will then be much higher than what TSMC charges as well. In the end everything can be reduced to the question of "how much will it cost?"

It was the same in the USSR. They also managed to reverse engineer many Western chips and produce clones. But the production wasn't economical, and they were a couple of years behind anyway.


Asianometry on YouTube has a ton of videos on Soviet computing, including ones discussing things like the failed semiconductor industry in East Germany. The Soviets certainly smuggled and tried to clone a lot of machines, but they were always behind the United States.


I love that channel. It's a mix of topics that I find very, very interesting.


Yet they fell hopelessly behind in networks, satellite tech, and even consumer electronics.


Here's an interesting question: why are there lots of startups claiming they can create better bespoke DL hardware accelerators than Nvidia's offerings?

I understand how a bespoke architecture could in theory accommodate larger models and offer better throughput despite being based on older generations of MOSFET nodes. But if that were the case, wouldn't China simply create their own bespoke hardware accelerators? So what's stopping them?


> wouldn't China simply create their own bespoke hardware accelerators? So what's stopping them?

Fabrication.

SMEE is supposedly able to mass produce 28nm lithography machines, but most modern (2016/Pascal onwards) GPUs are fabricated using 16nm lithography or lower (eg. Ampere uses a 7nm process, and there are multiple newer architectures in the pipeline that leverage 3nm fabrication processes at Samsung, TSMC, and Intel).

Chinese companies like SMEE are trying, but it will take 3-5 years to reach 16nm lithography at scale.

Also, GPUs are being limited for simulation (aka Nuclear weapons testing) reasons - not "AI" - as just about every country except North Korea honors the Comprehensive Nuclear Test Ban Treaty, forcing countries to test using HPC (edit: also used for Jet Turbine simulation a la Autodesk Federal).

A 28nm process is more than enough to make EWS, avionics, and precision weapons; it's the process Russia uses to manufacture the Elbrus-8S chipset domestically.

This is partially related to why companies like Nvidia have begun moving fabrication to Samsung fabs over TSMC in the short term, as South Korea has a formal defense agreement with the US, unlike Taiwan.


>as just about every country except North Korea honors the Comprehensive Nuclear Test Ban Treaty

I don't think this is correct - https://www.nti.org/education-center/treaties-and-regimes/co... - there are 8 other countries that have not ratified it, so why would they honor the treaty?

India in particular.


> india in particular

The last time India (and Pakistan) tested a live nuclear weapon was 1998, and both faced SEVERE sanctions at the time - which was a major reason both countries stopped buying American weapons systems and switched to Israeli and Chinese vendors respectively.

The only country left that tests nuclear weapons live is North Korea.


> and was a major reason both countries stopped buying American weapons systems and switched to Israeli and Chinese vendors respectively.

India stopped buying American weapons systems after the 1965 war. The Soviet Union was already supplying more than 80% of all weapons used by India even before the 1971 war.


From extended family and/or family friends in the South Block: in the 90s, after the Cold War ended, the IAF began using DEC Alpha machines and was in talks with the Clinton admin about importing AEW&C systems from Grumman to replace the jugaad Hawker Siddeleys.

After the Pokhran-II tests, the radar+AEW&C sales ended and DEC/Compaq cancelled their Indian federal deals at the time, and India-based sales reps for American firms switched to IAI, Elbit, and Rafael, which ended up winning IAF deals from then on (some corruption played a role in this at the beginning as well).

A lot of the wing commanders from that era work as lobbyists for Israeli firms now, and most of the cream of the IAF, DRDO, and SCL from the 1980s-90s now work in mid-to-upper-level engineering or product management roles in the US, having left in the 90s-2000s like my parents' generation.


Yes, but couldn't the countries that want to do simulations just buy through India/Saudi Arabia etc.?


It doesn't scale out logistically.

While some lossage is expected, you can't build a Tianhe-2 type supercomputer by smuggling Nvidia (or AMD) GPUs - and a nuclear program needs dozens.

China, Russia, NK, and Iran have very severe hardware import restrictions circa 2024.

This is why Russia, China, and Iran have been building out domestic fabrication capabilities (Russia in the early 2010s, China in the early 2020s, and Iran presently).


Interesting, thank you for answering!


Pretty funny that US sanctions (according to this thread) won't let you buy hardware from the US, but you totally absolutely can from our client states.

Man, wouldn't it be great if we operated a real empire, instead of this optical delusion?


> GPUs are being limited for simulation (aka Nuclear weapons testing) reasons - not "AI"

That information seems dated in 2024.


Not really.

You don't need bespoke cutting edge hardware or models for most defense applications (aka to kill people) today.

For example, C-RAMs are using Maxwell level hardware at most.

The biggest driver for GPU, FPGA, and CPU development has been nuclear research, which is a major reason why the top supercomputers and HPC programs globally are usually linked with nuclear weapons labs (eg. LLNL, LBL, Argonne, Oak Ridge, NSC Guangzhou).

It just so happens that you use the same math for nuclear simulations as you would for "ML", bioinformatics, and computer graphics.

It's all Numerical Analysis and Optimization Theory at the end of the day.
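To make the "same math" point concrete, here's a purely illustrative sketch (not taken from any particular lab code): the same dense matrix product is the inner loop of both a physics-style linear solver step and a neural-network layer.

  import numpy as np

  rng = np.random.default_rng(0)
  A = rng.standard_normal((1024, 1024))   # e.g. a discretized linear operator
  x = rng.standard_normal((1024, 256))    # state vectors / activation batch

  # One step of an explicit linear solver: x_{t+1} = A @ x_t
  x_next = A @ x

  # One dense NN layer: y = relu(W @ x + b) -- the same GEMM underneath
  W, b = rng.standard_normal((1024, 1024)), rng.standard_normal((1024, 1))
  y = np.maximum(W @ x + b, 0.0)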


I don't know. I thought Gina specifically said the recent ban is for AI. [0]

> "What we cannot allow them to ship is the most sophisticated, highest-processing power AI chips, which would enable China to train their frontier models," she added.

[0] https://www.reuters.com/technology/us-talks-with-nvidia-abou...


These bans originated in 2016-17, with Intel Xeon processors being restricted after the Chinese NSC program was found to be using some of its HPC infrastructure for nuclear weapons simulations.

No one wants to say the "Ne" word as it causes some pretty severe domestic political blowback.

China has already been looking at countering American second-strike capabilities with its nuclear weapons buildup over the past decade.

It also doesn't help that, unlike most post-Mao Chinese leaders, Xi Jinping started his administrative career in the PLA.


Xeon Phi bans were targeted and successful. It shows that you don't need to issue a country-wide ban to prevent China from using a chip to build their supercomputers.

And then Sunway happened; it's built for military-use supercomputers. Do we even know whether it's 40nm, 28nm, or 14nm, or who on earth fabricated them? And nothing changed in how export control works after that, so that's certainly not the trigger.

GPU bans before ChatGPT happened were also targeted, similar to how the BIS Entity List works.

Let's face it: the recent ban-entire-China movement was just about "AI", not HPC/simulations; its only purpose is to deny China NVIDIA GPU access and ensure they can't compete on SOTA language models.


> Do we even know whether it's 40nm, 28nm or 14nm or who on the earth fabricated them

Based on the network bandwidth and clock speed of the SW26010P, my hunch is that it's most likely 40nm [0]

> ensure they can't compete on SOTA language models

I disagree.

For commercial NLP applications, a leading-edge GPU like an A100 can help (due to cost constraints), but the true value is unlocked in Monte Carlo simulations (heavily used in nuclear physics to simulate interactions among particles), as benchmarks have shown order-of-magnitude gains for GPUs over CPUs in MC and MCMC simulations [1], even with commercial, untuned hardware alone.
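As a rough illustration of why Monte Carlo maps so well to GPUs (a toy pi estimator, not the benchmark from [1]): millions of independent samples with one reduction at the end, and the identical array code runs on a GPU via CuPy if it happens to be installed.

  import numpy as np

  def mc_pi(xp, n=10_000_000):
      # xp is either numpy (CPU) or cupy (GPU); the kernel is identical
      x = xp.random.random(n)
      y = xp.random.random(n)
      inside = (x * x + y * y) <= 1.0
      return 4.0 * float(inside.astype(xp.float64).mean())

  print("CPU:", mc_pi(np))
  try:
      import cupy as cp   # optional GPU backend
      print("GPU:", mc_pi(cp))
  except ImportError:
      pass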

This was a major reason Nvidia has been working with the DoE since the beginning of the Exascale program [2], which itself was driven by NatSec priorities [3].

Raimondo's ban also came after the WSJ reported how CAEP and other sanctioned entities were accessing Nvidia GPUs despite an export ban in place since 1997 [4][5].

Of course this is dual use, as math is critical to everything, but at the end of the day nuclear weapons are the driving factor: China's ability to erode American nuclear deterrence [6], and the fact that China has had a "launch-on-warning" posture [7] since 2022 instead of its earlier posture, have severely spooked the US.

[0] - https://sc23.supercomputing.org/proceedings/tech_paper/tech_...

[1] - https://indico.cern.ch/event/1106990/contributions/4991264/a...

[2] - http://helper.ipam.ucla.edu/publications/nmetut/nmetut_19423...

[3] - https://www.exascaleproject.org/research-group/national-secu...

[4] - https://www.wsj.com/articles/chinas-top-nuclear-weapons-lab-...

[5] - https://www.federalregister.gov/documents/2022/06/30/2022-14...

[6] - https://direct.mit.edu/isec/article/47/4/147/115920/The-Dyna...

[7] - https://media.defense.gov/2023/Oct/19/2003323409/-1/-1/1/202...


To what degree do you discount proclamations of the official propaganda organ? In what world, and under what set of motivations, would they ever say what their internal truth is?

Far safer, intellectually, to assume everything they omit to be a ploy, and to try to understand (with what limited information we have, comrade), the Real Juice.


Huawei has created the "Ascend" series of AI chips. Reportedly it has 80% of the A100's theoretical performance.

Rumor is that they had trouble selling it against the A100 due to worse price-to-performance and worse software integration. But the US sanctions have now created a huge market for it.


It's still fabricated by TSMC though.

Domestic fabrication is still at the 28nm phase at most.

That said, the Ascend series is a fairly massive leap, as it means domestic EDA capabilities have grown massively in China.

Then again, I remember they had massive design labs in both SV and Delhi+Bangalore that poached from the Samsung, Nvidia, and Intel labs in town before they were kicked out of the US and India, so idk how much of the design was done domestically in China.


> Domestic fabrication is still at the 28nm phase at most.

Allegedly SMIC has both 14nm and 7nm fabs as of last year.


Allegedly but with low yield rates and leveraging ASML+Nikon DUV products.

Not to say that Chinese vendors won't eventually crack that nut, but it'll take several (3-7) years. Also, domestic design capacity seems to be somewhat lacking (though rapidly changing)

The most cutting-edge domestic lithography tool China has is SMEE's 28nm one - that's still a MASSIVE accomplishment, but still a long way from 20/16/14/7nm processes.

Even Russia had domestic capabilities for 28nm fabrication at scale in the 2010s.

That said, for most military use cases, this is more than enough.


> Even Russia had domestic capabilities for 28nm fabrication at scale in the 2010s.

Edit: this is wrong. They have design capabilities but domestic fabrication is limited to 90nm.


hint: they are


Because most of them only implement inference, duh.


Interesting, I didn't know this was so sought after.

I actually have one for sale (the 40GB PCIe one), but I haven't gotten around to listing it on eBay yet due to lack of time (and because I didn't think there was so much interest in it).

To be sincere, maybe for DL this really is much better than the alternatives, but for some simulations and parallelizing some radiative transfer code, it was not that much better than an RTX 4090, with the extra hassle that it's more difficult to cool.


The A100 is comparable to the 3090 but with more memory. The H100 is the one comparable to the 4090.

The advantage of these is access to the larger memory. And they can be linked together such that they all share the same memory via NVLink. This makes them scalable for processing large datasets and holding the models for larger-scale LLMs and other NN/ML-based models.


>And they are able to be linked together such that they all share the same memory via NVLink. This makes them scalable for processing the large data and holding the models for the larger scale LLMs and other NN/ML based models.

GPUs connected with NVLink do not exactly share memory. They don't look like a single logical GPU. One GPU can issue loads or stores to a different GPU's memory using "GPUDirect Peer-To-Peer", but you cannot have a single buffer or a single kernel that spans multiple GPUs. This is easier to use and more powerful than the previous system of explicit copies from device to device, perhaps, but a far cry from the way multiple CPU sockets "just work". Even if you could treat the system as one big GPU you wouldn't want to. The performance takes a serious hit if you constantly access off-device memory.

NVLink doesn't open up any functionality that isn't available over PCIe, as far as I know. It's "merely" a performance improvement. The peer-to-peer technology still works without NVLink.
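For what it's worth, here is roughly what this looks like from PyTorch on a two-GPU box (a minimal sketch, assuming CUDA and PyTorch; not Nvidia's own sample code): you still juggle two separate devices and explicit cross-device copies, with P2P used under the hood when it's available.

  import torch

  assert torch.cuda.device_count() >= 2, "needs two GPUs"

  # Can device 0 directly read/write device 1's memory? (P2P, over NVLink
  # if present, otherwise PCIe.)
  print(torch.cuda.can_device_access_peer(0, 1))

  a = torch.randn(1024, 1024, device="cuda:0")
  b = torch.empty_like(a, device="cuda:1")
  b.copy_(a)  # explicit device-to-device copy; uses P2P when available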

NVidia's docs are, as always, confusing at best. There are several similarly-named technologies. The main documentation page just says "email us for more info". The best online documentation I've found is in some random slides.

https://developer.nvidia.com/gpudirect

https://developer.download.nvidia.com/CUDA/training/cuda_web...


Interesting. So that would mean that you would still need a 40 or 80 GB card to run the larger models (30B LLM, 70B LLM, 8x7B LLM) and to train them.

Or would it be possible to split the model layers between the cards like you can between RAM and VRAM? I suppose in that case each card would be able to evaluate the results of the layers in its own memory and then pass those results to the other card(s) as necessary.


You don't need NVLink for inference with models that need to be split across multiple cards. I'm using a laptop with a 3080 Ti mobile and a 3090 in an eGPU enclosure to run LLM models over 24GB.
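A minimal sketch of the layer-splitting idea (assuming PyTorch and two CUDA devices; not my exact setup): half the layers live on each card and only the activations hop between devices.

  import torch
  import torch.nn as nn

  layers = [nn.Linear(4096, 4096) for _ in range(8)]
  first_half = nn.Sequential(*layers[:4]).to("cuda:0")
  second_half = nn.Sequential(*layers[4:]).to("cuda:1")

  @torch.no_grad()
  def forward(x):
      h = first_half(x.to("cuda:0"))
      return second_half(h.to("cuda:1"))  # move activations, not weights

  out = forward(torch.randn(1, 4096))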


Have you seen an actual A100?

They are massive; I can't imagine them being comparable to a 3090 at all.


A reference 3090 is longer by 69mm, wider by 29mm, and thicker by a slot than a PCIe A100.

Though I think the comment you're replying to was talking about them both using the same Nvidia GPU architecture, Ampere.


As someone who used various types of GPUs in graduate school: for most simulations, and even machine learning (unless you need the VRAM), you are generally better off going with a consumer card. There are generally about the same number of CUDA cores, and the higher clock speeds will generally net you better performance overall.

Simulations where this isn't true are any that need double-precision floating point (which you previously were able to get in the Titan series of consumer-ish cards). And where it is super important for DL is the VRAM: it allows you to use much larger models. Plus the added feature of being able to string them together and share memory, which has been left off consumer cards (honestly in a way that makes sense, because SLI has been dumb for some time).
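A hedged illustration of the double-precision point (assuming PyTorch on a CUDA machine; actual numbers vary wildly by card): consumer GeForce parts run FP64 at a small fraction of their FP32 rate, while A100/H100-class cards don't, so a kernel like this is where the gap shows up.

  import time
  import torch

  def time_matmul(dtype, n=4096, device="cuda"):
      # Time an n x n matrix multiply at the given precision
      a = torch.randn(n, n, device=device, dtype=dtype)
      b = torch.randn(n, n, device=device, dtype=dtype)
      torch.cuda.synchronize()
      t0 = time.time()
      for _ in range(10):
          a @ b
      torch.cuda.synchronize()
      return (time.time() - t0) / 10

  print("fp32 s/matmul:", time_matmul(torch.float32))
  print("fp64 s/matmul:", time_matmul(torch.float64))  # much slower on consumer cards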


How did you end up cooling it? I have an A40 and it's been interesting testing all kinds of methods, from two 40mm fans to a 3A 9030 centrifugal blower with a 3D-printed duct.


I have seen some reports of modded GPUs in Brazil and China where they convert earlier-gen consumer cards (3000 series) into 12GB or 16GB variants using GDDR6 from Samsung.

Seems the GPU boot code is not hard-limited but reads capacity from whatever memory is installed.



