JPEG XL provides the best migration path from JPEG, with lossless recompression of existing files. It also supports arbitrary HDR bit depths (up to 32 bits per channel), unlike AVIF, and its HDR support is generally much better than AVIF's. Other operating systems and applications were making strides towards adopting the format, but until now Google was stubbornly holding the web back with its refusal to support JPEG XL in favour of AVIF, which it was pushing. I'm glad to hear they're finally reconsidering. Let's hope this leads to resources being dedicated to building and maintaining a performant and memory-safe decoder (in Rust?).
AVIF is just better for typical web image quality: it produces better-looking images, and its artifacts aren't as annoying (smoothing instead of blocking and ringing around sharp edges).
You also get it basically for free because it's just an AV1 key frame. Every browser needs an AV1 decoder already, unless it's willing to forego users who would like to be able to watch Netflix and YouTube.
I don't understand what you're trying to say. Mozilla said over a year ago that they would support JXL as soon as there's a fast memory safe decoder that will be supported.
Google, on the other hand, never expressed any desire to support JXL at all, regardless of the implementation. Only now, after the PDF Association announced that PDF will be using JXL, did they decide to support JXL on the web.
> AVIF is just better for typical web image quality: it produces better-looking images, and its artifacts aren't as annoying (smoothing instead of blocking and ringing around sharp edges).
AVIF is certainly better for the level of quality that Google wants you to use, but in reality, images on the web are much higher quality than that.
And JXL is pretty good if you want smoothing, in fact libjxl's defaults have gotten so overly smooth recently that it's considered a problem which they're in the process of fixing.
> I don't understand what you're trying to say. Mozilla said over a year ago that they would support JXL as soon as there's a fast memory safe decoder that will be supported.
Did they actually say that? All the statements I've seen from them have been much more guarded and vague. More of a "maybe we'll think about it if that happens."
> If they successfully contribute an implementation that satisfies these properties and meets our normal production requirements, we would ship it.
That's what they said a year ago. And a couple of Mozilla devs have been in regular contact with the JXL devs ever since then, helping with the integration. The patches to use jxl-rs with Firefox already exist, and will be merged as soon as a couple of prerequisite issues in Gecko are fixed.
Their standards position is still neutral; what changed a year ago was that they said they would be open to shipping an implementation that met their requirements. The tracking bug hasn't been updated[2], and the patches you mention are still part of the intent to prototype (behind a flag), similar to the earlier implementation that was removed in Chrome.
They're looking at the same signals as Chrome of a format that's actually getting use, has a memory safe implementation, and that will stick around for decades to justify adding it to the web platform, all of which seem more and more positive since 2022.
I disagree about the image quality at typical sizes - I find JPEG XL is generally similar to or better than AVIF at any reasonable compression ratio for web images. See this for example: https://tonisagrista.com/blog/2023/jpegxl-vs-avif/
AVIF only comes out as superior at extreme compression ratios, at much lower bit rates than are typically used for web images, and the images generally look like smothered messes at those ratios.
If that's the case, let it be a feature of image editing packages that can output formats that are for the web. It's a web standard we're talking about here, not a general-purpose image format, so asking browsers to carry that big code load seems unreasonable when existing formats do most of what we need and want for the web.
People generally expect browsers to display general-purpose image formats. It's why they support formats like classical JPEG, instead of just GIF and PNG.
Turns out people really like being able to just drag-and-drop an image from their camera into a website - being forced to re-encode it first isn't exactly popular.
> Turns out people really like being able to just drag-and-drop an image from their camera into a website - being forced to re-encode it first isn't exactly popular.
That’s a function of the website, not the browser.
> That’s a function of the website, not the browser.
That's hand-waving away quite a lot. The task changes from serving a copy of a file on disk, as with every other image format in common use, to needing a transcoding pipeline more akin to sites like YouTube. Technically possible, but lots of extra complexity in return for what gain?
Even though AVIF decoding support is fairly widespread by now, it is still not ubiquitous like JPEG/PNG/GIF. So services typically store or generate the same image in multiple formats, including AVIF for bandwidth optimization and JPEG for universal client support. Browser Accept headers help determine compatibility, but it's still fairly complicated to implement, and users end up dealing with different platforms supporting different formats when they are served WebP or AVIF and want to reupload an image somewhere else that doesn't accept those formats.

As far as I can tell, JXL solves that issue for most websites, since it is backwards-compatible: a recompressed JPEG can be losslessly reconstructed and served as JPEG when a client does not support JXL. I would happily give up a few percent in compression efficiency to get back to a single all-purpose lossy image format.
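The negotiation described above can be sketched server-side. This is a hypothetical helper (the function name and preference order are assumptions, not any particular framework's API), assuming the client's Accept header is available:

```python
def pick_image_format(accept_header):
    """Pick the most efficient image format the client says it accepts.

    Hypothetical sketch: the preference order is an assumption.
    Falls back to JPEG, which every client can decode.
    """
    # Strip quality parameters like ";q=0.8" and collect the bare types.
    accepted = {part.split(";")[0].strip() for part in accept_header.split(",")}
    for preferred in ("image/avif", "image/webp"):  # newest first
        if preferred in accepted:
            return preferred
    return "image/jpeg"  # universal fallback
```

A real deployment also has to store or generate each variant, which is exactly the operational complexity described above.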
It's almost as if Google had an interest in increased storage and bandwidth. Of course they don't, but as a paying Drive user, I'm overcharged for the same thing.
I have no previous first-hand knowledge of this, but I vaguely remember discussions of AVIF in Google Photos on Reddit a while back, so FWIW I just tried uploading some AVIF photos and it handled them just fine.
They're listed as AVIF in file info and download as the original file, though inspecting the network in the web frontend, it serves versions of them as JPG and WebP, so there's obviously still transcoding going on.
I'm not sure when they added support - the consumer documentation seems to be more landing site than docs, unless I'm completely missing the right page - but the API docs list AVIF support[1], and according to the Wayback Machine, "AVIF" was added to that page some time between August and November 2023.
You are correct that it is possible to upload AVIF files into Google Photos. But you lose the ability to view them, and of course the thumbnail, defeating the whole purpose of putting them into Photos.
Given it's an app, they didn't even need Google Chrome to add support. AVIF is supported natively on Android.
> You are correct it is possible to upload avif files into Google Photo. But you lose the view and of course the thumbnail.
I'm not sure what you mean. They appear to act like any other photo in the interface. You can view them and they're visible in the thumbnail view, but maybe I'm misinterpreting what you mean?
I take a photo; the format is JPEG. It backs up to Google Photos, and the Google Photos app on Android renders the photo just fine.
I then convert that photo (via a local converter) to AVIF, and Google backs it up. I can see the file in Google Photos on Android, but it doesn't render the image - full size or thumbnail, all I get is a grayed-out square. So I concluded the app doesn't support AVIF rasterizing.
I then gave up on the automation that converted all my JPEGs into AVIF, which would have saved hundreds of gigabytes, given I have 10 years' worth of photos.
The experiment was done about 3 months ago; as of 2025, the latest version of Google Photos on Android would not render my AVIF photos.
See the cousin comment: it accepts AVIF files. If they at least rendered in the app, that would be enough for many. As it stands, it accepts the format and renders nothing at all.
The killer feature of JXL is that most websites already have a whole bunch of images in JPEG format, and converting those to JXL shrinks them by about 30% without introducing any new artifacts.
> Mozilla has no desire to introduce a barely supported massive C++ decoder for marginal gains
On a slightly related note, I wanted to have an HDR background image in Windows 11. Should be a breeze in 2025, right?
Well, Windows 11 only supports JPEG XR[1] for HDR background images. And my commonly used tools either did not support JPEG XR (GIMP, for example) or did not work correctly (ImageMagick).
So I had a look at the JPEG XR reference implementation, which was hosted on Codeplex but has been mirrored on GitHub[2]. And boy, I sure hope that isn't the code that lives in Windows 11...
OK, most of the gunk is in the encoder/decoder wrapper code, but still, for something that's supposedly still in active use by Microsoft... Though the fact that they don't even host their own copy of the reference implementation is telling enough, I suppose.
Another JPEG XR user is Zeiss. It saves both grayscale and color microscope images with JPEG XR compression in a container format, and has released a C++ library (libczi) using the reference JPEG XR implementation to read/write these images. Zeiss is moving away from JPEG XR, though - its newer microscope control software saves with zstd compression by default.
> AVIF is just better for typical web image quality,
What does "typical web image quality" even mean? I see lots of benchmarks with very low BPPs, like 0.5 or even lower, and that's where video-based image codecs shine.
However, I just visited CNN.com and these are the BPPs of the first 10 images my browser loaded: 1.40, 2.29, 1.88, 18.03 (PNG "CNN headlines" logo), 1.19, 2.01, 2.21, 2.32, 1.14, 2.45.
I believe people are underestimating the BPP values that are actually used on the web. I'm not saying that low-BPP images don't exist, but clearly it isn't hard to find examples of higher-quality images in the wild.
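For concreteness, BPP here is just the compressed file size in bits divided by the pixel count. A quick sketch (the example dimensions and file size are illustrative, not measurements from CNN):

```python
def bits_per_pixel(file_size_bytes, width, height):
    """Compressed bits per pixel: total bits divided by pixel count."""
    return file_size_bytes * 8 / (width * height)

# An illustrative 230 kB JPEG at 1100x619 pixels:
# 230_000 * 8 / (1100 * 619) ~= 2.70 bpp, in line with the values above.
```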
> Can AVIF display 10 bit HDR with larger color gamut that any modern phone nowadays is capable of capturing?
Sure, 12-bit too, with HDR transfer functions (PQ and HLG), wide-gamut primaries (BT.2020, P3, etc.), and high-dynamic-range metadata (ITU/CTA mastering metadata, content light level metadata).
JPEG XL matches or exceeds these capabilities on paper, but not in practice. The reality is that the world is going to support the JPEG XL capabilities that Apple supports, and probably not much more.
Typical web image quality is like it is partly because of lack of support. It’s literally more difficult to show a static HDR photo than a whole video!
HDR should not be "typical web" anything. It's insane that websites are allowed to override my system brightness setting through HDR media. There's so much stuff out there that literally hurts my eyes if I've set my brightness such that pure white (SDR FFFFFF) is a comfortable light level.
I want JXL in web browsers, but without HDR support.
What does that achieve? Isn't it simpler to just not support HDR than to support HDR but tone map away the HDR effect?
Anyway, which web browsers have a setting to tone map HDR images such that they look like SDR images? (And why should "don't physically hurt my eyes" be an opt-in setting anyway instead of just the default?)
Because then a user who wants to see the HDR image in all its full glory can do so. If the base image is not HDR, then there is nothing they can do about it.
> And why should "don't physically hurt my eyes" be an opt-in setting anyway instead of just the default?
While I very much support more HDR in the online world, I fully agree with you here.
However, I suspect the reason will boil down to what it usually does: almost no users change the default settings ever. And so, any default which goes the other way will invariably lead to a ton of support cases of "why doesn't this work".
However, web browsers are dark-mode aware; they could be HDR aware too and do what you prefer based on that.
That video is clearly not encoded correctly. If it were the levels would match the background, given there is no actual HDR content visible in that video frame.
Anyway, even if the video was of a lovely nature scene in proper HDR, you might still find it jarring compared to the surrounding non-HDR desktop elements. I might too, depending on the specifics.
However, like I said, it's up to the browser to handle this.
One suggestion I saw mentioned by some browser devs was to make the default to tone map HDR if the page is not viewed in fullscreen mode, and switch to full HDR range if it is fullscreen.
Even if that doesn't become the default, it could be a behavior the browser could let the user select.
Actually I forgot about auto-HDR conversion of SDR videos which some operating systems do. So it might not be the video itself, but rather the OS and video driver ruining things in this case.
Just because we're in the infancy of wide HDR adoption and thus experience some niggling issues while software folks work out the kinks isn't a good reason to just wholesale forego the feature in such a crucial piece of infrastructure.
Sure, if you don't want HDR in the browser I do think there should be a browser option to let you achieve that. I don't want to force it on everyone out there.
Keep in mind the screenshot you showed is how things looked on my Windows until I changed the auto-HDR option. It wasn't the browser that did it, it was completely innocent.
It was just so long ago I completely forgot I had changed that OS configuration.
If you want to avoid eye pain then you want caps on how much brightness can be in what percent of the image, not to throw the baby out with the bathwater and disable it entirely.
And if you're speaking from iphone experience, my understanding is the main problem there isn't extra bright things in the image, it's the renderer ignoring your brightness settings when HDR shows up, which is obviously stupid and not a problem with HDR in general.
> If the brightness cap of the HDR image is full SDR brightness, what value remains in HDR?
If you set #ffffff to be a comfortable max, then that would be the brightness cap for HDR flares that fill the entire screen.
But filling the entire screen like that rarely happens. Smaller flares would have a higher cap.
For example, let's say an HDR scene has an average brightness that's 55% of #ffffff, but a tenth of the screen is up at 200% of #ffffff. That should give you a visually impressive boosted range without blinding you.
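That area-dependent cap could be sketched like this; every threshold and multiplier here is an illustrative assumption, not from any display standard:

```python
def hdr_peak_cap(bright_area_fraction, sdr_white=1.0):
    """Cap HDR peak brightness based on how much of the screen is bright.

    All numbers are made up for illustration: full-screen content gets no
    boost over the user's configured SDR white, up to a tenth of the
    screen may go to 2x, and small specular highlights get the most
    headroom.
    """
    if bright_area_fraction >= 1.0:
        return sdr_white          # full-screen flares: no boost at all
    if bright_area_fraction >= 0.10:
        return 2.0 * sdr_white    # large bright regions: modest boost
    return 4.0 * sdr_white        # small speculars: the most headroom
```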
I don't want the ability for 10% of the screen to be so bright it hurts my eyes. That's the exact thing I want to avoid. I don't understand why you think your suggestion would help. I want SDR FFFFFF to be the brightest any part of my screen goes to, because that's what I've configured to be at a comfortable value using my OS brightness controls.
I just don't want your "in between" "only hurt my eyes a little" solution. I don't see how that's so hard to understand. I set my brightness so that SDR FFFFFF is a comfortable max brightness. I don't understand why web content should be allowed to go brighter than that.
Yes, it's uncomfortable to have it get "ridiculously" bright.
But there's a level that is comfortable that is higher than what you set for FFFFFF.
And the comfortable level for 1% of the screen is even higher.
HDR could take advantage of that to make more realistic scenes without making you uncomfortable - if it were coded right to respect your limits. Which it probably isn't right now. But it could be.
I severely doubt that I could ever be comfortable with 10% of my screen getting much brighter than the value I set as max brightness.
But say you're right. Now you've achieved images looking completely out of place. You've achieved making the surrounding GUI look grey instead of white. And the screen looks broken when it suddenly dims after switching tabs away from one with an HDR video. What's the point? Even ignoring the painful aspects (which is a big thing to ignore, since my laptop currently physically hurts me at night with no setting to make it not hurt me, which I don't appreciate), you're just making the experience of browsing the web worse. Why?
It actually is somewhat an HDR problem, because the HDR standards made some dumb choices. SDR standardizes relative brightness, but HDR uses absolute brightness, even though that's an obviously dumb idea, and in practice no one with a brain actually implements it.
In a modern image chain, capture is more often than not HDR.
These images are then graded for HDR or SDR. I.e., sacrifices are made on the image data such that it is suitable for a display standard.
If you have an HDR image, it's relatively easy to tone-map that into SDR space, see e.g. BT.2408 for an approach in Video.
The underlying problem here is that the Web isn't ready for HDR at all, and I'm almost 100% confident browsers don't do the right things yet. HDR displays have enormous variance, from "slightly above SDR" to experimental displays at Dolby Labs. So to display an image correctly, you need to render it properly to the display's capabilities; likewise if you want to display an HDR image on an SDR monitor. I.e., tone mapping is a required part of the solution.
A correctly graded HDR image taken of the real world will have something like 95% of its pixel values falling within the typical SDR (Rec.709/sRGB) range. You only use the "physically hurt my eyes" values sparingly, and you take the room conditions into consideration when designing the peak value.

As an example: cinemas using DCI-P3 peak at 48 nits, because the cinema is completely dark and 48 nits is more than enough for a pure white in that environment. But take that image and put it on a display sitting inside during the day, and it's not nearly enough for white. Add HDR peaks into this, and it's easy to see that in a cinema you probably shouldn't peak at 1000 nits (which is about 4.x stops of light above the DCI-P3 peak). In short: rendering to the display's capabilities requires that you probe the light conditions in the room.
It's also why you shouldn't be able to manipulate brightness on an HDR display. We need that to be part of the image rendering chain such that the right decisions can be made.
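To illustrate the shape of such a mapping, here is a simple extended-Reinhard-style tone-mapping operator. This is a toy sketch, far simpler than the display-adaptive mapping BT.2408 actually describes:

```python
def tonemap(luminance, peak=10.0):
    """Extended Reinhard operator: maps HDR luminance (expressed in
    multiples of SDR white) into [0, 1], with `peak` mapping to 1.0.

    Illustrative only; not what BT.2408 specifies.
    """
    return luminance * (1 + luminance / (peak * peak)) / (1 + luminance)
```

Near-black values pass through with little change, while values approaching `peak` are compressed into the remaining SDR headroom.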
You asked “which web browsers have a setting to tone map HDR images such that they look like SDR images?”; I answered. Were you not actually looking for a solution?
Wanted to note https://issues.chromium.org/issues/40141863 on making the lossless JPEG recompression a Content-Encoding, which provides a way that, say, a CDN could deploy it in a way that's fully transparent to end users (if the user clicks Save it would save a .jpg).
(And: this is great! I think JPEG XL has chance of being adopted with the recompression "bridge" and fast decoding options, and things like progressive decoding for its VarDCT mode are practical advantages too.)
The last discussion in libjxl about this seemingly took the stance that it wasn't necessary since JXL has "native HDR", which fails to understand the problem space entirely.
Also, just because there's a spec for using gainmaps with JPEG doesn't mean that it works well. With only 8 bits of precision, it really sucks for HDR, gainmap or no gainmap. You just get too much banding. JXL otoh is completely immune to banding, with or without gainmaps.
> With only 8 bits of precision, it really sucks for HDR, gainmap or no gainmap. You just get too much banding.
This is simply not true. In fact, you get less banding than you do with 10-bit BT.2020 PQ.
> JXL otoh is completely immune to banding
Nonsense. It has a lossy mode (which is its primary mode so to speak), so of course it has banding. Only lossless codecs can plausibly be claimed to be "immune to banding".
> The JXL spec already has gainmaps...
Ah, looks like they added that sometime last year but decided to call it "JHGM", made almost no mention of it in the issue tracker, and didn't bother updating the previous feature requests asking for this, which are still open.
> Nonsense. It has a lossy mode (which is its primary mode so to speak), so of course it has banding. Only lossless codecs can plausibly be claimed to be "immune to banding".
color banding is not a result of lossy compression*, it results from not having enough precision in the color channels to represent slow gradients. VarDCT, JPEG XL's lossy mode, encodes values as 32-bit floats. in fact, image bit depth in VarDCT is just a single value that tells the decoder what bit depth it should output to, not what bit depth the image is encoded as internally. optionally, the decoder can even blue-noise dither it for you if your image wants to be displayed in a higher bit depth than your display or software supports
this is more than enough precision to prevent any color banding (assuming of course the source data that was encoded into a JXL didn't have any banding either). if you still want more precision for whatever reason, the spec just defines that the values in XYB color channels are a real number between 0 and 1, and the header supports signaling an internal depth up to 64 bit per channel
* technically color banding could result from "lossy compression" if high bit depth values are quantized to lower bit depth values, however with sophisticated compression, higher bit depths often compress better because transitions are less harsh and as such need fewer high-frequency coefficients to be represented. even in lossless images, slow gradients can be compressed better if they're high bit depth, because frequent consistent changes in pixel values can be predicted better than sudden occasional changes (like suddenly transitioning from one color band to another)
Isn't this exactly the case that wuffs [1] is built for? I had the vague (and, looking into it now, probably incorrect) impression that Google was going to start building all their decoders with that.
Those ratios seem way off if you're referring to the M1 Max and not the base M1. If we use Geekbench CPU performance, the Ryzen 9 7945HX (which is from 2023) is around 12% faster single core and 32% faster multicore than the M1 Max (which is from 2021). If you look at the 2024 M4 Max, it's substantially faster than the Ryzen and Intel you mentioned.
153 GB/s is not bad at all for a base model; the Nvidia DGX Spark has only 273 GB/s memory bandwidth despite being billed as a desktop "AI supercomputer".
Models like Qwen 3 30B-A3B and GPT-OSS 20B, both quite decent, should be able to run at 30+ tokens/sec at typical (4-bit) quantizations.
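A rough way to sanity-check those numbers: decoding is usually memory-bandwidth-bound, so an upper bound on tokens/sec is bandwidth divided by the bytes of active weights streamed per token. The parameters below are assumptions (roughly 3B active parameters for such MoE models, 4-bit weights):

```python
def est_decode_tps(bandwidth_gb_s, active_params_billions, bits_per_weight=4):
    """Back-of-envelope ceiling on decode tokens/sec: every generated
    token must stream all active weights from memory once. Ignores the
    KV cache and compute limits, so real speeds are meaningfully lower.
    """
    bytes_per_token = active_params_billions * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

# 153 GB/s with ~3B active params at 4-bit gives a ~100 tok/s ceiling,
# so 30+ tok/s in practice is plausible.
```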
Even at 1.8x the base memory bandwidth and 4x the memory capacity, Nvidia spent a lot of time talking about how you can pair two DGXs together with the 200G NIC to slowly run quantized versions of the models everyone was actually interested in.
Neither product actually qualifies for the task IMO, and that doesn't change just because two companies advertised them as such instead of just one. The absolute highest end Apple Silicon variants tend to be a bit more reasonable, but the price advantage goes out the window too.
It's colleges that they have been clamping down on, as colleges were bringing in absolutely massive numbers of mostly Indian students who were coming mainly to work in low-end jobs and get out of India rather than to legitimately study.
The number of graduate students being allowed in hasn't changed significantly, and undergraduate university students are also continuing to be brought in at rates similar to pre-pandemic times.
Mistral models are largely along the lines of what you were asking for. However, Grok (any version) absolutely is not a "don't say gay" model; it talks about sexuality of all forms quite openly and fairly, and is happy to produce creative content of any level of explicitness about these topics. It's the least censored unmodified model I've encountered on any topic. People dismiss Grok as a Nazi model based on Musk's politics without using it themselves.
In general, I agree. However, many older cars were small, light, simple, and raw - characteristics that have largely disappeared from modern cars. Automatic transmissions from the mid-90s and earlier generally sucked, though good old manual transmissions are not much different from good modern ones.
As an example, I owned a W126 S class from the late 80s, and it was fun in its unique way; no modern car replicates its experience. It had a somewhat heavy and very feedback-rich steering feel and Porsche-like firm, tactile pedal feel, while having a super supple ride over the most awful roads, with SUV-like ground clearance and tremendous suspension travel. The car was also super simple and reliable; my 300SE had nearly 400k km on its all-original powertrain when I sold it, it never let me down, and it weighed less than a modern A class or CLA. While not as safe as modern cars, it was exceptionally safe for its era and comparable to normal cars of the early 2000s for crash structure safety.
The W140 (I used to own one too) had a much better powertrain, but it lost the raw, tactile, scrappy nature of its predecessor, and it couldn't handle super awful potholed roads as well as the W126 either. There are no modern cars that combine the rich, raw, tactile control feel and super supple ride the W126 had.
Look at cars like the BMW E30, or Mercedes-Benz 190E (W201), or the superbly engineered workhorses that the W123 and W124 were. There are no modern cars that replicate the genuinely delightful driving experience of those.
Oh yes, preach the gospel of the W126. I had a 1986 300SD for a while, and I’d own one again in a heartbeat. I’ve never felt safer, or cooler, driving a car. You had a gasser, which I bet was faster than mine, but the sound of that diesel spinning up the turbo was something else.
I agree about the W123 as well. I’ve owned half a dozen of those. For a couple generations there it seemed that Mercedes had cars just about solved.
My daily is a W126 with the OM603. It's getting harder every day to find parts (when I need them, which is infrequent) but it's worth the hassle because like you say there is nothing modern that has the same combination of feel and ride quality. Or visibility! I can parallel park this car (long wheelbase too) in tiny spots easier than a modern compact because you can actually see.
I've got a W140 with the M120 and a W123 with the OM616 and a 4-speed too, and while they have their charms (especially the W123), nothing tops the W126. It truly was not just the finest production sedan Mercedes ever made, but the finest ever made by anyone. (Other contenders being the W100, the W140, and the Lexus LS.)
> However, many older cars were small, light, simple, and raw - characteristics that have largely disappeared from modern cars.
I feel parent's point still stands.
Sure, you won't be able to go to a random Ford dealership and go home with a small, light, and simple car, but there are plenty of modern cars accessible with a modicum of effort. Even buying something new abroad and bringing it back home will probably be less hassle than restoring an old car.
I wonder if buying a kit car would still be simpler, for still better results.
Aside from the Mazda MX-5 (which isn't the most practical car), almost all small, simple, and light cars made today are econoboxes. They're not designed to have the rich control feel, balanced and satisfying handling near the limits, responsiveness, material quality, suspension sophistication, etc. of, say, German luxury compact cars of the 1980s (BMW E30 or M-B W201). Even cars like 90s Hondas, while front-wheel drive and built to a much lower price point, had a rich control feel, liveliness, and agility that modern cars don't give.
Modern luxury cars from essentially all brands around the world have become huge, heavy, numb, and over-complicated. They're much faster and quieter than, say, the old Benzes and BMWs of the 80s, but they don't have the fun raw feel, small size, light weight, tossability, and simplicity of the old cars.
A BMW E30 or M-B W201 weighs somewhere between a Mazda MX-5 and a Subaru BRZ, but is far more practical than either for passengers and cargo, despite being around the same width and only slightly longer.
The only modern cars with similar size and weight are some European market compact cars and econoboxes like the Mitsubishi Mirage, Nissan Micra, and Chevy Spark (which are also disappearing from North America). For steering feel, handling, general raw and connected driving feel, powertrain responsiveness, and interior quality, these modern economy cars can’t compete. Some of the European market specific B-segment cars come closest to those older compact luxury cars, but they still don’t match them for the qualities I described.
Kit cars generally suck from a practical perspective compared to well-engineered 80s/90s cars, and aren't a realistic option either.
> They’re not designed to have the rich control feel, balanced and satisfying handling near the limits, responsiveness, material quality, suspension sophistication, etc.
Sounds to me like you're looking for a Lotus or a 911 at budget prices. I agree with you that's pretty far from the "small, simple, light" vehicle, and it's fully in the hobby realm.
If you're that deep into cars, I'd say more power to you, and spending ungodly amounts of money, time, and effort on vintage cars is probably a pleasure as well.
That’s the thing - old German compact luxury sedans from the 80s had the control feel, balance, and light weight you get from a Porsche, while also being practical family cars. There’s nothing like that made today. They were also decently safe and comfortable and reliable and generally just good.
Also the bigger ones like the W126, while not as light and agile as a Porsche or Lotus, still had similar control feel, very comfortable and spacious interiors, and could glide over the worst most broken and potholed roads better than any modern car I’ve driven. They’re also much simpler than any modern luxury cars, much less to break, and they just keep going and going as long as you take basic care. From personal experience, a much younger used W220 or W221 S class needs far more maintenance and repair than an old W126.
The more powerful but still reliable engines and nicer transmissions of the late W140 or W220 would be nice to have in a W126 though. My problem with the newer S classes is the complexity and fragility of the rest of the car.
Of course, these are 40 year old cars and need more care and maintenance than a new car, but they’re not too bad either as long as you get a good example of the car. They’re pretty reliable once sorted, and can last a very long time and very high mileage as long as they’re at least somewhat cared for.
While cloud models are of course faster and smarter, I've been pretty happy running Qwen 3 Coder 30B-A3B on my M4 Max MacBook Pro. It has been a pretty good coding assistant for me with Aider, and it's also great for throwing code at and asking questions. For coding specifically, it feels roughly on par with SOTA models from mid-late 2024.
At small contexts with llama.cpp on my M4 Max, I get 90+ tokens/sec generation and 800+ tokens/sec prompt processing. Even at large contexts like 50k tokens, I still get fairly usable speeds (22 tok/s generation).
Privacy, both personal and for corporate data protection, is a major reason. Unlimited usage, allowing offline use, supporting open source, not worrying about a good model being taken down/discontinued or changed, and the freedom to use uncensored models or model fine-tunes are other benefits (though this OpenAI model is super-censored - "safe").
I don’t have much experience with local vision models, but for text questions the latest local models are quite good. I’ve been using Qwen 3 Coder 30B-A3B a lot to analyze code locally and it has been great. While not as good as the latest big cloud models, it’s roughly on par with SOTA cloud models from late last year in my usage. I also run Qwen 3 235B-A22B 2507 Instruct on my home server, and it’s great, roughly on par with Claude 4 Sonnet in my usage (but slow of course running on my DDR4-equipped server with no GPU).
Add big law to the list as well. There are at least a few firms here that I am just personally aware of running their models locally. In reality, I bet there are way more.
A ton of EMR systems are cloud-hosted these days. There’s already patient data for probably a billion humans in the various hyperscalers.
Totally understand that approaches vary, but beyond EMR there’s work to augment radiologists with computer vision for better diagnosis, all sorts of cloudy things.
It’s here. It’s growing. Perhaps in your jurisdiction it’s prohibited? If so I wonder for how long.
In the US, HIPAA requires that health care providers complete a Business Associate Agreement with any other orgs that receive PHI in the course of doing business [1]. It basically says they understand HIPAA privacy protections and will work to fulfill the contracting provider's obligations regarding notification of breaches and deletion. Obviously any EMR service will include this by default.
Most orgs charge a huge premium for this. OpenAI offers it directly [2]. Some EMR providers are offering it as an add-on [3], but last I heard, it's wicked expensive.
I'm pretty sure the LLM services of the big general-purpose cloud providers do (I know for sure that Amazon Bedrock is a HIPAA Eligible Service, meaning it is covered within their standard Business Associate Addendum [their name for the Business Associate Agreement as part of an AWS contract]).
Sorry to edit snipe you; I realized I hadn't checked in a while so I did a search and updated my comment. It appears OpenAI, Google, and Anthropic also offer BAAs for certain LLM services.
In the US, it would be unthinkable for a hospital to send patient data to something like ChatGPT or any other public services.
It might be possible with certain specific regions/environments of Azure though, because IIRC they have a few that support government-confidentiality type requirements, and some that tout HIPAA compliance as well. Not sure about the details of those, though.
Possibly stupid question, but does this apply to things like M365 too? Because just like with inference providers, the only thing keeping them from reading/abusing your data is a pinky-promise contract.
Basically, isn't your data as safe/unsafe in a sharepoint folder as it is sending it to a paid inference provider?
I do think devs are among the genuine users of local models going forward. No price hikes or random caps dropped in the middle of the night, and in many instances I think local agentic coding is going to be faster than the cloud. It’s a great use case.
I am extremely cynical about this entire development, but even I think that I will eventually have to run stuff locally; I've done some of the reading already (and I am quite interested in the text to speech models).
(Worth noting that "run it locally" is already Canva/Affinity's approach for Affinity Photo. Instead of a cloud-based model like Photoshop, their optional AI tools run using a local model you can download. Which I feel is the only responsible solution.)
I agree totally. My only problem is that local models running on my old Mac mini are very much slower than, for example, Gemini-2.5-flash. I have my Emacs set up so I can switch between a local model and one of the much faster commercial models.
Someone else responded to you about working for a financial organization and not using public APIs - another great use case.
These being mixture of expert (MOE) models should help. The 20b model only has 3.6b params active at any one time, so minus a bit of overhead the speed should be like running a 3.6b model (while still requiring the RAM of a 20b model).
Here's the ollama version (4.6bit quant, I think?) run with --verbose
  total duration:       21.193519667s
  load duration:        94.88375ms
  prompt eval count:    77 token(s)
  prompt eval duration: 1.482405875s
  prompt eval rate:     51.94 tokens/s
  eval count:           308 token(s)
  eval duration:        19.615023208s
  eval rate:            15.70 tokens/s
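For reference, the rates ollama reports are just the token counts divided by the durations, so the figures check out (numbers copied from the run above):

```python
# Recompute ollama's reported rates from its counts and durations.
prompt_rate = 77 / 1.482405875    # prompt eval: tokens / seconds
eval_rate = 308 / 19.615023208    # generation: tokens / seconds
print(f"{prompt_rate:.2f} tokens/s, {eval_rate:.2f} tokens/s")
# → 51.94 tokens/s, 15.70 tokens/s
```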
15 tokens/s is pretty decent for a low-end MacBook Air (M2, 24 GB of RAM). Yes, it's not the ~250 tokens/s of 2.5-flash, but for my use case anything above 10 tokens/sec is good enough.
On my M4 Max MacBook Pro, with MLX, I get around 70-100 tokens/sec for Qwen 3 30B-A3B (depending on context size), and around 40-50 tokens/sec for Qwen 3 14B. Of course they’re not as good as the latest big models (open or closed), but they’re still pretty decent for STEM tasks, and reasonably fast for me.
I have 128 GB RAM on my laptop, and regularly run multiple VMs and several heavy applications and many browser tabs alongside LLMs like Qwen 3 30B-A3B.
Of course there’s room for hardware to get better, but the Apple M4 Max is a pretty good platform for running local LLMs performantly on a laptop.
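As a rough budget for why 128 GB is comfortable alongside everything else: a ~4-bit 30B model plus its KV cache is only a small slice of total RAM. The footprint figures below are assumptions for illustration (quantization level, KV cache size, other workload), not measurements:

```python
# Hypothetical RAM budget for a 30b model on a 128 GB laptop.
# All footprint figures are assumed for illustration, not measured.

TOTAL_RAM_GB = 128.0

def model_weights_gb(params_billions: float, bits_per_param: float = 4.5) -> float:
    """Approximate weight bytes in GB at an assumed quantization level."""
    return params_billions * bits_per_param / 8

weights = model_weights_gb(30.0)   # ~17 GB for 30b params at ~4.5 bpw
kv_cache = 5.0                     # assumed KV cache at a large context
other_work = 40.0                  # assumed VMs, browser tabs, heavy apps

headroom = TOTAL_RAM_GB - (weights + kv_cache + other_work)
print(f"model ~{weights:.0f} GB, free headroom ~{headroom:.0f} GB")
```

Even with generous allowances for other workloads, roughly half the machine is still free under these assumptions, which matches my experience of running everything side by side.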