The model's source code (the training data) runs to hundreds of GB and is much harder to transfer. The compiling (training) process is also very costly. This is very different from the Linux case.
Only big tech companies have enough resources to make these things happen.
I like looneysquash's view on the definition of open source AI. All the parts involved need to be open-sourced for a model to be "open", not just the weights:
> The trained model is object code. Think of it as Java byte code. You have some sort of engine that runs the model. That's like the JVM, and the JIT. And you have the program that takes the training data and trains the model. That's your compiler, your javac, your Makefile and your make. And you have the training data itself, that's your source code.
> Each of the above pieces has its own source code. And the training set is also source code. All those pieces have to be open to have a fully open system. If only the training data is open, that's like having the source, but the compiler is proprietary. If everything but the training set is open, well, that's like giving me gcc and calling it Microsoft Word.
https://news.ycombinator.com/item?id=41952722