More

harisec · 2025-04-16T10:25:06 1744799106

$44M/year?

https://www.usaspending.gov/award/CONT_AWD_70RCSJ23FR0000015...

londons_explore · 2025-04-16T12:03:16 1744804996

Totally worth cancelling then.

Some volunteer will set up a GitHub pages and mailing list to fulfill the same duties.

Peanuts99 · 2025-04-16T14:19:36 1744813176

Or 10 people will create that list and nobody will use any of them. The whole point here is that the CVE program had the network effect of being the defacto list of issues but now that's been pissed away.

harisec · 2025-04-06T17:04:56 1743959096

Yes, during training multiple checkpoints are created, you can distill from any checkpoint you want.

harisec · 2025-02-18T13:06:38 1739883998

Anybody can try Grok3 on Chatbot Arena (even if you are in Europe). Select Direct Chat and select the model early-grok-3. https://lmarena.ai/

harisec · on Nov 17, 2024

It doesn’t matter much how many users Bluesky is gaining, it matters how many of them will be using Bluesky in a few months. We will see.

harisec · on Nov 14, 2024

Congrats, good luck with your new company!

I have one question regarding your ARC Prize competition: The current leader from the leaderboard (MindsAI) seems not to be following the original intention of the competition (fine tune a model with millions of tasks similar with the ARC tasks). IMO this is against the goal/intention of the competition, the goal being to find a novel way to get neural networks to generalize from a few samples. You can solve almost anything by brute-forcing it (fine tunning on millions of samples). If you agree with me, why is the MindsAI solution accepted?

versteegen · on Nov 15, 2024

> the goal being to find a novel way to get neural networks to generalize from a few samples

Remove "neural networks". Most ARC competitors aren't using NNs or even machine learning. I'm fairly sure NNs aren't needed here.

> why is the MindsAI solution accepted?

I hope you're not serious. They obviously haven't broken any rule.

ARC is a benchmark. The point of a benchmark is to compare differing approaches. It's not rigged.

Borchy · on Nov 15, 2024

I also don't understand why MindsAI is included. ARC is supposed to grade LLMs on their ability to generalize i.e. the higher score the more useful they are. If MindsAI scores x2 than the current SOTA then why are we wasting our $20 on inferior LLMs like ChatGPT adn Claude when we could be using the one-true-god MindsAI? If the answer is "because it's not a general-purpose LLM" then why is ARC marketed as the ultimate benchmark, the litmus test for AGI (I know I know, passing ARC doesn't mean AGI, but the opposite is true, I know)?

fchollet · on Nov 15, 2024

ARC was never supposed to grade LLMs! I designed the ARC format back when LLMs weren't a thing at all. It's a test of AI systems' ability to generalize to novel tasks.

fchollet · on Nov 15, 2024

I believe the MindsAI solution does feature novel ideas that do indeed lead to better generalization (test-time fine-tuning). So it's definitely the kind of research that ARC was supposed to incentivize -- things are working as intended. It's not a "hack" of the benchmark.

And yes, they do use a lot of synthetic pretraining data, which is much less interesting research-wise (no progress on generalization that way...) but ultimately it's on us to make a robust benchmark. MindsAI is playing by the rules.

harisec · on Nov 12, 2024

Recraft’s image generation service could leak its internal system prompts due to its unique architecture combining Claude (an AI language model) with a diffusion model. Unlike other image generators, Recraft could perform calculations and answer questions, which led to the discovery that carefully crafted prompts could expose the system’s internal instructions.

harisec · on Oct 24, 2024

These are toys but in 2 years they will probably be full projects and 2 years later people will ask "why do i need a software developer?"

simonw · on Oct 24, 2024

I just don't think that's true.

If all someone does is write code based on specifications handed over by someone else then yes, they have cause to be worried - but in my career as a software engineer the "typing code into a computer" bit has only ever been 10-20% of the work that I do.

The big challenge of software development has always been turning human needs into working software. That requires a great depth of experience in terms of what's possible, what isn't possible, how software works and how to architect and design software to deliver value today while still staying flexible for future development.

LLMs can accelerate that process a bit, but I don't think they can replace it. Someone still has to drive the LLMs. I think people with software development skills are best placed to do that.

harisec · on Oct 24, 2024

That's a good point and I agree with you. However, would you agree that in a few years we will need far less developers than we need right now?

simonw · on Oct 24, 2024

I had a podcast conversation about this recently: https://newsletter.pragmaticengineer.com/p/ai-tools-for-soft...

I think LLMs mean developers can build stuff faster, which reduces the cost of developing software.

My optimistic scenario is that this expands the market for custom software, a lot. Companies that would never have considered developing their own software - because they'd need six developers working for twelve months - can now afford to do so, because they need two developers for three months instead.

The result is more jobs for engineers, and engineers become more valuable because they'd can get more done.

I'm not an economist so I won't pretend I'm confident this will happen, but it's my optimistic scenario.

palata · on Oct 24, 2024

Not only it has the potential to increase productivity: it has the potential to lower the overall quality of software (by making it more accessible to people who don't really understand how to write good code).

I believe that we can already observe that modern tools/languages have made programming a lot more accessible, and that the average quality of software has decreased dramatically (not that all software is bad: just that this new accessibility brought a lot more bad software than good software).

Your example is interesting: it says "it's good because people will be able to produce more", not "developers will have more time to focus on fixing bugs and optimizing their code".

chadcmulligan · on Oct 24, 2024

All these sort of statements assume growth is linear without justification, its more likely exponential. ie it took 2 years to get here, so it will take 2 more to get to this point, but in reality it may be 100 years to get to the next point. No one knows, and if in 2 years it is able to write that sort of code then the singularity is very close indeed.

harisec · on Oct 24, 2024

Actually, i think it will take less than 2 years. I've been using Aider + Claude 3.5 Sonnet almost daily for a long time and the progress is very fast. We will see.

harisec · on Oct 24, 2024

If you want to really get depressed about the future of software developers try aider.

harisec · on Oct 16, 2024

Qwen 2.5 models are better than Llama and Mistral.

speedgoose · on Oct 16, 2024

I disagree. I tried the small ones but they too frequently output Chinese when the prompt is English.

harisec · on Oct 16, 2024

I never had this problem but i guess it depends on the prompt.

harisec · on Oct 3, 2024

There are no "LLM-generated images". These images are generated using diffusion models. LLMs are very different from diffusion models.

https://en.m.wikipedia.org/wiki/Diffusion_model