To this list I'd add pulling the jaw backward/inward.
Moving the jaw forward and then to the right has the biggest effect for me, causing the ringing in the left ear to increase. It's asymmetric in that moving the jaw to the front left has only a very small effect on the right ear.
The moratorium on new large training runs needs to be indefinite and worldwide. There can be no exceptions, including for governments or militaries. If the policy starts with the U.S., then China needs to see that the U.S. is not seeking an advantage but rather trying to prevent a horrifically dangerous technology which can have no true owner and which will kill everyone in the U.S. and in China and on Earth.
[...]
Shut down all the large GPU clusters (the large computer farms where the most powerful AIs are refined). Shut down all the large training runs. Put a ceiling on how much computing power anyone is allowed to use in training an AI system, and move it downward over the coming years to compensate for more efficient training algorithms. No exceptions for governments and militaries. Make immediate multinational agreements to prevent the prohibited activities from moving elsewhere. Track all GPUs sold. If intelligence says that a country outside the agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike.
Since when are military spooks and political opportunists better at deciding on our technological future than startups and corporations? The degree of global policing and surveillance necessary to fully prevent secret labs from working on AI would be mind-boggling. How would you ensure all government actors are sticking to the same safety standards rather than seizing power by implementing AI hastily? This problem has long been known as quis custodiet ipsos custodes: "who guards the guards themselves?"
Training a SOTA model is expensive, but you only need to do it once, and then you can fine-tune it a thousand times for various purposes.
And it's not even that expensive compared to the cost of building other large-scale projects. How much is a dam, or a subway station? There are also corporations that would profit from making models widely available, such as chip makers; they would commoditise the complement.
Once you have a very capable, open-sourced model that runs locally on phones and laptops, fine-tuning is almost free.
This is not make-believe. A few recent fine-tunes of Mistral-7B, for example, are excellent quality and run surprisingly fast on a five-year-old GPU (around 40 tokens/s). I foresee a new era of grassroots empowerment and privacy.
In a few years we will have more powerful phones and laptops with specialised LLM chips, better pre-trained models, and better fine-tuning datasets distilled from the SOTA models of the day. We might have good-enough AI on our own terms.
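For a concrete sense of what running such a model locally looks like, here is a minimal sketch using llama-cpp-python with a quantized GGUF build of a Mistral-7B fine-tune; the file name and settings are placeholders, not a specific recommendation.

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical quantized Mistral-7B fine-tune; any GGUF file works the same way.
llm = Llama(
    model_path="mistral-7b-finetune.Q4_K_M.gguf",
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

prompt = "Explain what fine-tuning a language model means, in two sentences."

start = time.perf_counter()
out = llm(prompt, max_tokens=128)
elapsed = time.perf_counter() - start

print(out["choices"][0]["text"])
tokens = out["usage"]["completion_tokens"]
print(f"~{tokens / elapsed:.1f} tokens/s")  # rough throughput, like the 40 T/s figure above
```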
> Once you have a very capable, open-sourced model that runs locally on phones and laptops, fine-tuning is almost free.
Hence the idea to ban development of more capable models.
(We're really pretty lucky that LLM-based AGI might be the first type of AGI made; it seems much lower risk and lower power than some of the other possibilities.)
Of course each emergency is one too many, but I wonder what the sample sizes are. Are the counts statistically significantly different from each other? Another thing to consider: emergencies are rare events, so a small difference in circumstances can make the outcomes vary substantially. Is a well-oiled manufacturing pipeline like an Audi factory comparable to a new factory that hasn't rounded off all the sharp corners yet?
> Is a well-oiled manufacturing pipeline like an Audi factory comparable to a new factory that hasn't rounded off all the sharp corners yet?
I wonder about this as well. It would make sense that a brand new factory that is rapidly growing would experience more safety issues. No idea if this is a reasonable increase though.
Well... or the owner of the company operates on Zuck's quote of "Move fast and break things". When "break" does not mean software or money, then this is no longer a fun quote.
I would expect fewer safety issues, since learnings from previous years can be taken into account during design, which isn't easy to do in existing factories.
Both effects could be in place at the same time, in which case you would expect a higher rate of workplace accidents initially, and a lower rate in the mid- to long-term.
Until either of us finds studies on this topic it's meaningless speculation, because I honestly can't imagine more accidents, even initially, if things are done as they should be.
3x as many emergencies per employee (not total, per employee) would be at least moderately alarming even if the numbers were 0.003 accidents per employee at Tesla and 0.001 accidents per employee at Audi.
I wouldn't get too hung up on statistical significance here. Whenever someone brings up statistical significance, I always like to ask: "significant with respect to what? by what standard?"
However, it might be interesting to consider whether 3x is normal for all new production facilities. But that's a separate question.
Even though p-values can be hacked, they are very useful when they aren't. At p = 0.1 I'd ignore the finding, because a difference that size would show up by chance alone about 10% of the time even if there were no real effect. p = 0.01 would pique my interest. At p < 0.001 I'd accept it as true, but I'd still watch out for systematic biases such as comparing new to old factories.
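To make the sample-size question concrete, here is a minimal sketch of one way to check whether two such counts differ significantly; the counts are invented to match the hypothetical 0.003 vs 0.001 per-employee rates mentioned above, and Fisher's exact test is just one reasonable choice.

```python
from scipy.stats import fisher_exact

# Hypothetical counts: 30 emergencies among 10,000 employees at one factory
# versus 10 among 10,000 at another (rates of 0.003 and 0.001).
table = [[30, 10_000 - 30],
         [10, 10_000 - 10]]

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio: {odds_ratio:.2f}, p-value: {p_value:.4f}")
# A small p-value says a 3x gap is unlikely if the underlying rates were equal,
# but it says nothing about *why* they differ (new vs. mature factory, etc.).
```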
Right, if you're at the point of constructing some kind of principled estimate of variation in the data then I think you have a pass to at least talk about "significance". But in that case I'm sure you're aware that this requires a particular hypothesis test in mind, not just an abstract notion of "significance", and that p-values interpreted as "strength of evidence" are problematic.
In this study, recycled glass bottles were no better when it comes to microplastics. It's everywhere: in lids, detergents, etc. In fact, single-use plastic bottles did best!
Getting a custom URL for a Discord server (e.g. discord.gg/hackernews) requires 14 "server boosts" which cost $35/year each, so nearly $500/year. There's a discount if you have their premium Nitro package, but even then it's something on the order of $300.
Meanwhile discord.io is free and you won't lose your URL to a crypto scam server when someone forgets to renew their boost. Kind of inevitable that such a service would pop up.
> to reproduce, distribute, and create derivative works of the Software Products solely for your non-commercial research purposes
I wouldn't call these terms permissive. It's in line with the recent trend in released AI models, but fairly restrictive in what you're actually allowed to do with it.
The Instruct model has that non-commercial restriction, but I'm not sure why. They say it was trained with Alpaca-formatted questions and responses, but I'm not sure if that includes the original Alpaca dataset (which is itself non-commercial, since it was generated with OpenAI models).
I believe that is more related to how the default Hugging Face inference UI is prompting. Running locally with the correct prompt template, it gives decent completions, e.g.:
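(For illustration, the standard Alpaca-style template looks roughly like the sketch below; whether this exact wording is what this particular Instruct model expects is an assumption, so check the model card.)

```python
# Standard Alpaca-style prompt template (no-input variant). The exact template
# the model was trained on is an assumption here.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

prompt = ALPACA_TEMPLATE.format(instruction="Write a haiku about open-source models.")
print(prompt)  # feed this string to whatever local inference backend you use
```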
I guess you could come up with a thousand example prompts and pay some students to pick which output is better, but I can also see why you wouldn't bother. It probably depends on language, type of prompt, etc.
Sure it's easy -- you can use benchmarks like HumanEval, which Stability did. They just didn't compare to Codex or GPT-4. Of course such benchmarks don't capture all aspects of an LLM's capabilities, but they're a lot better than nothing!
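For context, HumanEval results are reported as pass@k: the estimated fraction of problems for which at least one of k sampled completions passes the unit tests, so a score of 26 on pass@1 would correspond to roughly 26% of problems solved on the first attempt. A minimal sketch of the standard unbiased estimator (n samples per problem, c of them correct):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k samples,
    drawn from n generated samples of which c are correct, passes the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 200 samples per problem, 52 of them correct -> pass@1 = 0.26 (i.e. 26%)
print(pass_at_k(200, 52, 1))
```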
One could team up with Hackerrank/leetcode, let the model code in the interface (maybe there's an API for that already, no idea), execute their code verbatim, and see how many test cases they get right the first time around. Then, like for humans, give them a clue about one of the tests not passing (or the code not working, being too slow, etc.). Give points based on the difficulty of the question and the number of clues needed.
I guess the obvious caveat is that these models are probably overfitted on these types of questions. But a specific benchmark could be made containing questions kept secret from the models. Time to build "Botrank", I guess.
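As a rough illustration of that scoring idea, a toy rule could combine difficulty, test results, and clues used; all the weights below are made up for the sketch.

```python
def score(difficulty: int, tests_passed: int, tests_total: int, clues_used: int) -> float:
    """Toy scoring rule: base points scale with question difficulty and the
    fraction of test cases passed; each clue costs a fixed share of the points.
    The 0.25 penalty per clue is an arbitrary illustrative choice."""
    base = difficulty * (tests_passed / tests_total)
    penalty = max(0.0, 1.0 - 0.25 * clues_used)
    return base * penalty

# e.g. a hard (difficulty 10) question, 8/10 tests passed after 1 clue:
print(score(10, 8, 10, 1))  # -> 6.0
```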
> On HumanEval, Copilot is 40+ on pass@1 comparing to 26 for stable code 3b.
Can you put those numbers into context for those who haven't done HumanEval? Are those percentages, so that 40+ means 40+% and 26 means 26%? If so, does that imply both would be failing scores?
From personal experience, I need a regular cleaning every year, and since I stopped cheaping out on this the way I did as a student, my teeth-related health problems went to zero. I think it is useful if you aren't that disciplined about taking care of your teeth yourself. I forget to floss every other time, or simply forget to brush when I'm in a longer coding session and eat during it but then forget to brush afterwards. The procedure always removes plenty of scale/tartar.
Over 2-3 years, which isn't insignificant, and does suggest cleanings can be done less frequently, but it isn't particularly long term.
I'm curious if that still holds over, say, 5 years, and how it intersects with other practices. For example, are people who don't floss religiously more likely to benefit from more frequent cleanings?
The first one is about just 1 tooth they found in a pool of 1000 extracted teeth. The second one says "there was a high association between gingivitis and plaque status with calculus accumulation" in the abstract, thus contradicting your earlier comment.
If by "this" you mean "no reflowing", yes. If the maths is laid out client-side at run time, the exact dimensions of the formula are not known until the Latex code is interpreted.
Thanks. I was hoping there was a way to manipulate the DOM on the client before it is rendered, but this seems to be impossible with JS. Currently, I render various things during page load with a MutationObserver. It mostly performs well on modern devices, but it can get sluggish on very large HTML pages (5 MB+ of HTML), so it looks like I will have to resort to infinite scrolling with automatic unloading.
> Moving the jaw forward and then to the right has the biggest effect for me, causing the ringing in the left ear to increase.
Moving the ears backwards has no effect for me.