They detail the energy used and therefore the estimated carbon emissions, which is interesting. When I estimate the raw electricity cost using 7-20 cents per kWh for US commercial rates, we are only talking about $16-50k for electricity, which seems pretty small! Is my math wrong?
Is there any information on how much the computing costs were for renting the clusters?
Is the barrier to entry for a 7B model only a couple $100K?
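For what it's worth, the arithmetic checks out as an order-of-magnitude estimate. A quick sketch, assuming a total training energy of roughly 230 MWh (an illustrative figure chosen to be consistent with the $16-50k range above, not a number from the paper):

```python
# Back-of-envelope check of the electricity cost estimate above.
# energy_kwh is an ASSUMED figure (~230 MWh) consistent with the
# quoted $16-50k range; substitute the paper's actual number.
energy_kwh = 230_000               # assumed total training energy, kWh
low_rate, high_rate = 0.07, 0.20   # US commercial rates, $/kWh

low_cost = energy_kwh * low_rate
high_cost = energy_kwh * high_rate
print(f"${low_cost:,.0f} - ${high_cost:,.0f}")  # roughly $16,100 - $46,000
```

So yes: at commercial rates, raw electricity is a small fraction of the total cost; the GPUs themselves (or their rental) dominate.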
I feel like we are trying to optimize what we measure. No such measurements happen for other industries. How much electricity does Las Vegas use for its extravagant displays of lights, water shows, and so on?
Despite the typical complaints about "X new thing harming the environment!!!", LLMs are about as friendly as it gets. They:
1. Consume a minor amount of electricity (data centers account for only about 2% of US electricity use, and AI is currently maybe 5-10% of that). It's trivial compared to, say, metal smelting.
2. Consume water for cooling.
That's it; there is zero direct pollution generated from AI, and even the water use is very minor compared to, say, farming, and can be improved via more water-efficient cooling tech.
The main concern is the scaling speed. As LLMs scale up 10x, 100x, 1000x, those previously minor electricity costs could quickly become grid-impacting within a decade.
I can't buy this kind of argument anymore. How about the external effect of AI steering the entire semiconductor industry to increase GPU/NPU capacity?
> we are only talking about $16-50k for electricity, that seems pretty small
I suppose this depends greatly on how you view the utility of LLMs. In a capitalist sense, sure—there's great utility here persuading VCs to part with their coins and jobs to be replaced with correspondingly larger profit margins. But the opportunity cost of not solving major problems most of humanity can agree on seems nearly incalculably large. Not that capitalists give a shit.
This is a process of exploring new technology. Research is expensive and probably doesn't always yield immediate returns, but when it does you get outsized returns.
Imagine how non-obvious the first machines must have seemed at the start of the industrial revolution. You only have to feed a man and he can work, but a machine requires iron, oil, water, fuel, engineers, operators. The up-front cost of exploring early digging machines must have been absurd. And I'm sure some people at the time thought: "Wow, we could be spending this money on bread for the poor instead."
If we had spent the money on bread for the poor instead, we wouldn't be facing an existential threat created by our lack of understanding of the consequences of our actions, and our collective inability to respond to that effectively.
You really want to have 10 kids and have 50% or more of them die before age 10? You want a world before penicillin and antibiotics? No computers? No travel. Women married off at 15 and immediately pregnant. Most of the world in absolute poverty, destroyed by a single bad season. Mass famines, plagues, tribal warfare that sweeps over your village. No clean water and soap. Malnutrition.
These are just non problems for huge portions of the planet now.
What if investing in AI tech like LLMs eventually allows knowledge workers to be more productive with fewer resources, and therefore ultimately frees up more people to focus on the so-called major problems?
Maybe we can invest more human hours in speeding up the path to zero emissions and energy abundance, or re-planting deserts, or cleaning up forever chemicals / microplastics, or helping at-risk kids, etc etc.
Not disagreeing, but as a side note: many of the issues causing the human problems may be semi-orthogonal to the level of technology going forward. We already have enough resources for the poor and hungry and homeless. It's behavioural issues we don't know how to fix. How to bootstrap a crackhead into a bank teller, so to speak.
I hope the bottom 10% rung of a Dyson-sphere society doesn't just look like hungry homeless people, but on a space station.
Or view quality education and healthcare for children, and keeping their parents out of survival mode, as a much better investment for everyone than funding the adventures of overly war-happy presidents.
I am enthusiastically agreeing with you. Behavioral changes at the top and bottom of society are most of the problem - not tech.
I live in Japan, and they have implemented what you are requesting. I've recently been to a relative's house here where they live off government handouts despite having jobs. The government pays the woman for having children; as you can imagine, this is a perverse incentive. She and her four-ish children have jobs. Despite collectively having more than $100k a year to work with, they live in essentially a dirty crack house with a toilet that hasn't worked for years. They fight over money and emotionally blackmail family to get $1-10k at a time, and never pay it back.
Being poor like this is not a money problem. It's a behaviour problem.
You can't fix this by giving them money. They just spend it on alcohol and cigarettes.
Having interacted with them, I know they aren't obviously stupid, and they are educated. My wife attended the same strict Japanese school. Very high quality compared to an average American school; they made it up through calculus as high schoolers. She still remembers Riemann sums 15 years later.
Your current perception of the world isn't quite right. It sounds like you've got this magic fix in your head, but in reality it just wouldn't work. You're ignoring the thing you profess to actually care about... the people.
> Behavioral changes at the top and bottom of society are most of the problem
(Added emphasis)
I agree with everything you say: lots of irresponsible people and culture. But that isn't the whole story.
The wealthy and asset owners also tilt the economy toward themselves and away from labor and the less wealthy in many ways.
Poor outcomes for young people correlate strongly, with good causal support, with low-income districts that have poor health and education resources, poor safety, and poverty-level parents. That is a circular problem created by treating the education, health, and safety of children as a "local" issue instead of what it obviously is: a national issue.
Also, housing is a problem for many working people, and the rich magnify the problem by using the limited availability of real estate as a financial instrument to park money, making profitable returns based on exclusivity and on productive economic growth elsewhere, which drives further investment in land even when the land is underutilized.
This is due to the perverse incentive of taxing total land-plus-development value instead of just the land. (Development on land should be encouraged, not taxed; other development and property isn't "wealth"-taxed. The underlying land, by contrast, is limited, so taxing those who make it unavailable to others is a community-neutral bargain, and it makes the underutilization of land unprofitable.)
This goes on and on... regulatory capture; personal loans against personal property giving wealthy asset owners liquidity events that fund lavish lifestyles without any associated taxes; taxes on labor rising beyond the tax rates on capital and corporations; etc.
The rich and asset-owning classes use government policy to actively tilt things their way, throughout society, on the backs of those whose primary "asset" is their labor.
Who will be the first to do a useful Instruct-trained variant?
It's a pity the Mistral 7B Instruct 0.2 dataset isn't available, because I've found it to be of much higher quality than any of the finetunes around, and I suspect we'll have to rely on the same groups doing finetunes for this one.
And Capybara be lookin' fiiine for tuning too. Seriously, though, you're right. These are some of the highest quality generative datasets in existence, and I'm surprised more isn't being done with them.
I'm sorry, I don't understand the exact contribution here. There are many tutorials on how to train a language model. If it's a repository of SOTA techniques for training, it will be outdated in at most 3 months, and anyway the ground shifts under you in this field, so you might as well read arXiv all day if your intention is to keep up with SOTA.
It looks like this team gave us everything we need to reproduce their models: the actual artifacts, not just a description. As far as I can tell, they share the data and every step along the way to the final model.
What do you mean by "cross-check each other's work"? Surely just reporting the final loss is good enough if that's the intention. The final goal is lower loss anyway, so it's not even a bad metric.
If you read around, training a 7B model costs on the order of $85,000; the Stable Diffusion 1.4 release cost around $600,000 to train.
You don't see a lot of 70B or larger models being released for the same reason; it's expensive.
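That $85k order of magnitude falls out of simple GPU-rental arithmetic. A sketch with assumed numbers (the GPU count, rental rate, and wall-clock time below are illustrative, not from any particular training report):

```python
# Rough illustration of where an "~$85k for a 7B model" figure can come
# from. ALL three inputs are assumptions for illustration only.
gpus = 64                 # assumed number of rented A100-class GPUs
rate_per_gpu_hour = 2.00  # assumed cloud rental rate, $/GPU-hour
days = 28                 # assumed wall-clock training time

gpu_hours = gpus * 24 * days
cost = gpu_hours * rate_per_gpu_hour
print(f"{gpu_hours:,} GPU-hours -> ${cost:,.0f}")  # 43,008 GPU-hours -> $86,016
```

Scale the GPU count up for a 70B-class run and the rental bill lands in the millions, which is why those releases are rare.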
We should just be grateful for what we're getting right now: basically, people are spending hundreds of thousands of dollars on training and giving the results away for free. Hugging Face is hosting them for free. Ollama is hosting them for free. People are writing free inference engines (e.g. llama.cpp) and giving them away.
Don't complain. We've got it pretty damn good right now.
You’ve just described Google, which derives most of its ad revenue from ads it places on the search engine that’s crawling the public web. It has always been thus: derivative products that provide a meaningful transformation of the input are a wholly separate work for copyright purposes.
No, this is very different. Google will link you to the NYT; you read there and see ads. If GPT eats the web and paywalls it, they are 100% free-riding.
Now, I also think the Google model has proven at this point to be a bad one, since the web is 90% ads and SEO dogshit. They strip-mined the value; it took them a while, but it's nearly decimated.
The value of ChatGPT isn't that it regurgitates the NYT. The value is that it will read the NYT and the Washington Post and Fox News and The Guardian and everything else for you and synthesise a new view from it all that represents the viewpoint you ask for.
That's completely different to Google, and completely different to anything done before. It's as transformative as a human expert news analyst giving you a new perspective on a story.
Of course, that is partially my point: if OpenAI et al. want to argue that anything online is fair game, then they should release the weights. If not, they have no leg to stand on.
Whether that’s true or not, the fact remains that a lot of people are spending real money in astonishingly large amounts and not asking for anything in return.
Seriously, complaining that they haven’t spent enough money, or didn’t spend $600k making you exactly the model you wanted, is…
Let’s just say, ungracious.
Got some cake for my birthday, but it wasn’t the chocolate deluxe cream cake I wanted.
…just remember, the cake is pretty good, and it’s free. :)
Over time the cost of training models will come down and bigger open models will turn up, eventually.
Yes, the size is different, but training a diffusion model and a language model are really different things, like how RL models can be small but still take a long time to train as well.
Open source means I have the documentation to reproduce the same results. That is only true for TinyLlama and this model. The other models (Llama, Mistral) are free to use but not open source.
There are few that are >1B params, competitive, and "open source" in the sense that the necessary ingredients to re-train are available. Models like Llama and thus its descendants (including Mistral's public models) have weights available but not the training data.
...
Weights & Biases logs for our training runs."
That's amazing. I've never seen that before in a paper of this quality. Or in any paper at all.