Your point is dogmatic, so I can see how you'd read an article that boils down to "they gave us the most important pieces and explained every missing part in such detail that we feel confident we can reproduce it" and still protest "but they didn't give us everything!".
But the reality is that by giving us the model weights alone they'd already be providing an immense boon, given how important cheaply producing reasoning traces is for distillation. They went even further and documented the exact path they took in immense detail, and that write-up is already being digested and extended by the community. Releasing R1-Zero and explaining where it fits in the puzzle wasn't necessary, yet it helped leapfrog open attempts at reasoning models and will likely influence future closed models from other providers too.
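To make the distillation point concrete, here's a rough sketch of harvesting reasoning traces from an open-weights model to build fine-tuning data for a smaller student. The model name and prompt are just placeholders, not anyone's actual recipe:

    # Sketch: generate reasoning traces locally from an open-weights model,
    # then save them as supervised fine-tuning data for a smaller student.
    import json
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # placeholder: any open-weights reasoner
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    def trace(question: str) -> str:
        msgs = [{"role": "user", "content": question}]
        ids = tok.apply_chat_template(msgs, add_generation_prompt=True,
                                      return_tensors="pt").to(model.device)
        out = model.generate(ids, max_new_tokens=2048, do_sample=True, temperature=0.6)
        # Keep only the newly generated tokens (the reasoning trace + answer).
        return tok.decode(out[0][ids.shape[-1]:], skip_special_tokens=True)

    # Each record becomes one distillation training example.
    with open("traces.jsonl", "w") as f:
        for q in ["If x + 3 = 7, what is x?"]:
            f.write(json.dumps({"prompt": q, "response": trace(q)}) + "\n")

The whole point is that once the weights are local, producing traces like this costs nothing but compute, with no API terms forbidding it.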
They've given us far more than any closed-source provider has, at a time when the leading closed-source provider won't even show thinking traces for fear of competitive distillation.
-
Also "these people" are people who just write prompts and call a REST API so it doesn't matter if the weights are available or not. At most they might replay some requests to a different REST API that returns a finetuned model for them. Sounds like what you do?
I rely on models I've post-trained with custom vocabularies, quantized with AWQ for my downstream task, and run with custom samplers built for that same task. All things an actually closed model can't do.
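To give a flavor of the sampler piece, here's a minimal sketch using transformers' LogitsProcessor hook. The checkpoint name is a placeholder, and it assumes a pre-quantized AWQ model that transformers can load; the min-p rule is just one example of a custom sampling policy:

    # Sketch: plug a custom sampler into generation via a LogitsProcessor.
    import torch
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              LogitsProcessor, LogitsProcessorList)

    class MinPProcessor(LogitsProcessor):
        """Drop tokens whose probability is below min_p * p(most likely token)."""
        def __init__(self, min_p: float = 0.05):
            self.min_p = min_p
        def __call__(self, input_ids, scores):
            probs = scores.softmax(dim=-1)
            keep = probs >= self.min_p * probs.max(dim=-1, keepdim=True).values
            return scores.masked_fill(~keep, float("-inf"))

    model_id = "some-org/my-model-awq"  # placeholder: a pre-quantized AWQ checkpoint
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    ids = tok("The capital of France is", return_tensors="pt").input_ids.to(model.device)
    out = model.generate(ids, max_new_tokens=20, do_sample=True,
                         logits_processor=LogitsProcessorList([MinPProcessor(0.1)]))
    print(tok.decode(out[0], skip_special_tokens=True))

None of that is possible against a hosted API that only exposes prompts and completions.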
In other words, I derive enough value from "open source" models not to quibble over definitions. I find that the people who do quibble aren't doing anything with the capabilities these releases unlock, so they have the time to argue about just how open things are, but ironically wouldn't do anything different even if the models were open by their absolutist definitions.
> In other words, I derive enough value from "open source" models to not quibble over definitions
I have never said _anything_ about whether there is or isn't value. I've noted, correctly, that it isn't open source.
I'm sorry you feel triggered by this, but both things can be true at the same time. It can be useful, come with a recipe, and still not be open source.
> At most they might replay some requests to a different REST API that returns a finetuned model for them. Sounds like what you do?
Please refrain from ad-hominem statements.
What I am saying is that redefining existing, well-understood terms for marketing purposes muddies the waters.
On top of that, if there are truly open-source models that provide training data + training code + inference code, what do we call them? Extra open source?
> I rely on models I've post-trained with custom vocabularies, quantized with AWQ for my downstream task, and run with custom samplers built for that same task. All things an actually closed model can't do.
Ironic that your comment contains the only problematic ad-hominem statement in this conversation...
You jumped straight to "you're triggered" when all I did was reject the idea that people who aren't familiar with where the value resides in the model pipeline should get to define which parts of that pipeline need to be shared to count as open source.
My HF profile has over 50 post-trained models available for anyone to download, by the way, and I've had sampler options upstreamed to multiple inference projects.
> Technically, R1 is “open” in that the model is permissively licensed, which means it can be deployed largely without restrictions. However, R1 isn’t “open source” by the widely accepted definition because some of the tools used to build it are shrouded in mystery. Like many high-flying AI companies, DeepSeek is loathe to reveal its secret sauce.
and
> You are right about the lack of data information for DeepSeek, which is a requirement from the OSAID.
That quote is straight up wrong in claiming DeepSeek is "loathe to reveal its secret sauce".
The source of all the excitement is exactly how much they revealed, and I feel like that thread as a whole emphasizes why people who aren't deeply familiar with the pipeline should not get to define these things.
There is a lot of detail about the nature of the data used and the exact steps needed to reproduce their findings with your own data. They even provide R1-Zero to demonstrate things that might be dead ends just in case someone can continue them. That should be enough to satisfy any useful definition of open source.
Even in the same thread you linked:
> Just a curiosity, according to the Model Openness Framework from the Linux Foundation, DeepSeek-R1 classifies as an Open Model:
At the end of the day this is as good as it needs to be for LLMs: by their nature, much of the data used to train them cannot or should not be openly shared, but describing the shape of and motivation behind that data can push others very far along the path to reproduction and iteration.