swaraj's comments | Hacker News

Exactly


Gotta love HN, the only popular site in the world where a list of hex codes on a nicely formatted website gets to the top ranking.


*oklch codes

And trust me, I didn't expect to get more than 10 upvotes max. I'm slightly bewildered.


I love the color choices, but what's the reason for defining each color twice, once with a "raw"?


> If you want more granularity (read: opacity), you can call the raw version of the color (append -raw to the variable name) which will return the OKLCH values so you can do something like this: oklch(var(--uchu-gray-1-raw) / 40%).

via https://github.com/NeverCease/uchu/blob/primary/documentatio...


technically, all websites are just a bunch of nicely formatted hex codes.


You should try the ARC-AGI puzzles yourself, and then tell me you think these things aren't intelligent

https://arcprize.org/blog/openai-o1-results-arc-prize

I wouldn't say it's full AGI or anything yet, but these things can definitely think in a very broad sense of the word.


[Back in the 1980s]

You should try to play chess yourself, and then tell me you think these things aren't intelligent.


While I agree that we should be skeptical about the reasoning capabilities of LLMs, comparing them to chess programs misses the point. Chess programs were specifically created to play chess. That's all they could do. They couldn't generalize and play other board games, even related games like Shogi and Xiangqi, the Japanese and Chinese versions of chess. LLMs are amazing at being able to do things they were never programmed to do, simply by accident.


Are they though? They’ve been shown to generalize poorly to tasks where you switch up some of the content.


Here's an example. I'm interested in obscure conlangs like Volapük. I can feed an LLM (which had no idea what Volapük was) an English-language grammar of Volapük, and suddenly it can translate to and from the language. That couldn't work with a chess program: I couldn't give it a rule book for Shogi and have it play that.


That's not true


They don’t generalize well on logic puzzles

https://huggingface.co/blog/yuchenlin/zebra-logic


Apologies, I was a bit curt because this is a well-worn interaction pattern.

I don't mean anything by the following either, other than that the goalposts have moved:

- This doesn't say anything about generalization, nor does it claim to.

- The occurrences of the prefix general* refer to "Can fine-tuning with synthetic logical reasoning tasks improve the general abilities of LLMs?"

- This specific suggestion was accomplished publicly to some acclaim in September

- To wit, the benchmark the article is centered around hasn't been updated since September, because the preview of the large model accomplishing that blew it out of the water: the best score on the full set at the time was 33%, and it scored 71%: https://huggingface.co/spaces/allenai/ZebraLogic

- These aren't supposed to be easy; they're constraint-satisfaction problems, which the authors point out are used on the LSAT.

- The other major form of this argument is the Apple paper, which shows a 5-point drop, from 87% to 82%, on a home-cooked model.


LLMs don't do too well on those ARC-AGI problems, even though they're pretty easy for a person.



Let me know when they can perform that well without a 300-shot. Or that well on unseen ARC-AGI-2.


Two years give or take 6 months.


Give a group of "average humans" two years, give or take 6 months, and they will also saturate the benchmark, and some of them would probably beat the SOTA LLM/RLM.

People tend to do so all the time, with games for example.


Average humans cannot be copy-pasted.


Average companies also don't pay humans to complete a benchmark consisting of a fixed set of problems.


Done (link says 6 samples?)


> OpenAI shared they trained the o3 we tested on 75% of the Public Training set.

I'm talking about transfer learning and generalization. A human who has never seen the problem set can be told the rules of the problem domain and then get 85+% on the rest. o3 high compute requires 300 examples using SFT to perform similarly. An impressive feat, but obviously not enough to just give an agent instructions and let it go. 300 examples for human-level performance on the specific task, but that's still impressive compared to the SOTA of 2 years ago. It will be interesting to see performance on ARC-AGI-2.


I spent 10 hrs this week upgrading our pandas/Snowflake libs to the latest versions because there was apparently a critical vulnerability in the version we used (which we had to fix because a security cert we need requires it). The latest versions are not major upgrades, but they completely changed the types of params accepted. An enormous waste of time delivering 0 value to our business.


Security updates are probably the only type of updates that I wouldn't ever call a waste of time. It sucks when they are conflated with feature updates or arbitrary changes, but on their own I don't understand calling them a waste of time.


They are when the only reason they're flagged as security updates is that a single group deems a very rare, obscure edge case a HIGH-severity vuln when in practice it rarely is => this leads to having to upgrade a minor version of a library, which ends up causing breaking changes.

This is the rabbit hole I'm currently down: Pandas 2.2 broke SQLAlchemy backwards compatibility: https://stackoverflow.com/questions/38332787/pandas-to-sql-t... + https://github.com/pandas-dev/pandas/issues/57049#issuecomme...
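
For context, a minimal sketch of the kind of call that breaks across these upgrades; the connection string, table, and data are made up, and pairing pandas 2.x with a matching SQLAlchemy 2.x install is my reading of the linked issue, not gospel:

    # Hypothetical to_sql round trip; the DSN and table name are illustrative only.
    # Assumes pandas 2.x alongside SQLAlchemy 2.x, since mixing major versions of
    # the two libraries is what tends to produce the breakage discussed above.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("mysql+pymysql://user:pass@localhost/app_db")

    df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
    df.to_sql("example_table", engine, if_exists="replace", index=False)

    print(pd.read_sql("SELECT COUNT(*) AS n FROM example_table", engine))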


Nailed it


Like the top comment, my first exposure to programming was BASIC on a TI-86 (better than the 83, but quickly outdone by the 84 shortly after).

My first program was doubly cheating: not only did I have a program for solving the quadratic equation, but I copied the BASIC off the internet in true open-source fashion.

When I told my dad I copied the code from the internet, he was so disappointed and thought I had 0 skills. Now we pip/npm/etc. install anything and are heroes for choosing to "buy", not "build".


We're all on https://www.sigmacomputing.com/ because we don't like hosting/managing/provisioning essential tools like this, plus this seems more complicated to configure.

I would recommend a simpler setup like Metabase Docker (which I re-evaluated recently): https://www.metabase.com/docs/latest/installation-and-operat...


Appreciate the feedback! We'll keep this in mind.

There is nothing to host/provision, so it's simple in that sense. You just run it locally with your credentials and connect directly to your database.

It is definitely not the easiest to set up, especially when thinking as a team, so we'll keep that in mind.


Always happy to see new stuff on the block, but it's hard to leave pandas and the Python ecosystem for this.

Not sure where this fits into any workflow tbh; with sufficiently large datasets, you will inevitably need Spark (which has the same API as pandas).
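
For what it's worth, the pandas-style API on Spark lives in pyspark.pandas (Spark 3.2+); a rough sketch, with the file path and column names made up:

    # Rough illustration of the pandas-like API on Spark (pyspark.pandas).
    # The CSV path and column names are hypothetical.
    import pyspark.pandas as ps

    df = ps.read_csv("/data/events.csv")             # distributed read, pandas-style call
    summary = df.groupby("user_id")["amount"].sum()  # familiar groupby/aggregate surface
    print(summary.sort_values(ascending=False).head(10))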


Love Signal over Telegram, Wickr, etc.


I haven't used Codespaces, but how does this work with databases? A common problem we have during onboarding is getting your local database (MySQL) set up properly: run all the migrations => load some sample, de-identified, production-like data => update certain rows to allow for personal development (e.g. for our texting service, make sure you text your own phone number instead of someone else's).

What's the workflow for this?

A related issue for us is being able to test another developer's pull request with database migrations without wiping out your current database state. Is there a Devpods workflow for this?


If you can script your setup steps, you can also run them in a devcontainer, either by using docker-compose[1] to bake them into the workspace image or by using lifecycle hooks to run some scripts after creating the workspace[2].

[1]http://blog.pamelafox.org/2022/11/running-postgresql-in-devc... [2]https://github.com/pascalbreuninger/devpod-react-server-comp...
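
As a rough illustration of the lifecycle-hook route, a post-create script for the workflow described above could look something like this; the migration tool, database name, seed file, and contacts table are all assumptions, not something devcontainers prescribe:

    # Hypothetical script run by a devcontainer lifecycle hook (e.g. postCreateCommand):
    # run migrations, load de-identified sample data, then personalize a few rows.
    import os
    import subprocess

    def run(cmd: str) -> None:
        print(f"+ {cmd}")
        subprocess.run(cmd, shell=True, check=True)

    run("alembic upgrade head")                          # migration tool is an assumption
    run("mysql app_dev < seed/deidentified_sample.sql")  # production-like, de-identified data

    # Point the texting service at the developer's own number instead of someone else's.
    phone = os.environ.get("DEV_PHONE_NUMBER", "+15555550100")
    run(f"mysql app_dev -e \"UPDATE contacts SET phone = '{phone}' WHERE is_dev_target = 1\"")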


We've built www.snaplet.dev to introduce the exact workflow that you're describing. Unfortunately we're PostgreSQL only at the moment.

We give you a serverless PostgreSQL database per branch in your code [via Neon.tech]. Each time you branch your code we grab the latest snapshot of your production database, which is de-identified, transformed [transformations are via TypeScript], and a subset of the original.

If a coworker and yourself are coding against the same branch you're coding against the same database.

Your devs only run a single command `snaplet dev` and all this happens automatically in the background.


This looks amazing; we have written a similar script for both our local dev DBs and our staging env. Would love MySQL support.


We'll add it as soon as we've figured out the core experience; we're close. V1.0 is just around the corner, and then we'll add MySQL and SQLite.


This exact workflow is why we have an integrated Cloud IDE with full-stack branch preview (both of which know about database seeds/migrations) at Coherence (withcoherence.com) [disclaimer: I'm a cofounder]. You can also integrate other seeding tools like Snaplet, mentioned in a sibling comment here, which is an awesome solution to this problem!

Would be happy to discuss getting a PoC set up to see if it helps in your case, or to answer any questions; feel free to reach out.

