Hacker News | mathisfun123's comments

this shit is so normcore that i'm honestly embarrassed

"think different"

> It replicated and trained a credible baseline

...

> The prospect of recursive self improvement feels more real to me all of a sudden

you really don't understand why these are two completely different tasks?


> The main problem is good [doctors] have no need to sit through your 12 [years of school]. It actively selects only for the most desperate or money driven people (if you pay very well).

do you agree with this?


Absolutely. Witnessed it directly in the form of med students paying other people to take their tests for them.

Except Doctors have to do that regardless. They can't choose a hospital that will hire them with 6 years of school instead of 12.

A good engineer is likely to find an equivalent job with a shorter or less bureaucratic interview process.


don't you people get tired of reposting this take?

don't you realize it's exactly like

"attractive women reject the wrong suitors"

???


I think "I find someone sexy and fun" is obviously more a matter of personal feeling, while "I am looking for someone who will help me meet our goals for Q4 with a high degree of technical excellence" seems like something that should be measurable and not left up to a feeling in the moment.

So I guess I don't realize it's exactly like that; personally, I feel it is significantly different.


Lol you've turned down offers or recruiter reach outs? Two very different things lololol.

I turned down a 285k/yr +MM RSU staff engr offer from Google in 2021.

I’d probably still be in that job and would have a few million in the bank (instead of $10,000) if I had taken it, but I would have sold out my principles.

So yes some of us live by principles


this is the pcmasterrace equivalent of being all upper body and with scrawny legs lol

Actually not that crazy of a spread. E.g. I have 48 GB + 32 GB in my gaming PC because if you go beyond 48 GB you start having to trade off more and more performance to keep the memory controller from falling over, so you really have to have a good reason to want to load more. On server platforms, like Epyc, it tends not to matter as much because you have so many channels for bandwidth and a beefier memory controller to handle them. Then on the VRAM side it's more about what makes sense for the GPU and how you plan on using it there (games or AI or modeling or whatever), and for a lot of cases the 5090 is just a good card to get for one reason or another (it just has a ton of compute + bandwidth for a consumer part).

What's this trade off about?

I thought it was simply that 2 DIMMs are probably better than 4, but I'm unsure how you'd ever land on 48?


DRAM chips aren't always manufactured in power of two sizes. It's been common for years to have non power of two capacities for LPDDR used in phones, and has started to show up in other DRAM types with the current generation standards: DDR5 for desktops/servers and GDDR7 for GPUs. That's how there have been 24GB single-rank DIMMs and 48GB dual-rank DIMMs for desktops and 96GB RDIMMs for servers for a few years, and how a mobile RTX 5090 has 24GB VRAM vs mobile RTX 5080 having only 16GB VRAM despite both GPUs being different bins of the same silicon and both configurations using a 256-bit memory bus.
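To make the arithmetic concrete, here is a rough sketch of how those capacities fall out of the device density. `capacity_gb` is a made-up helper, and the device widths are assumptions on my part (32-bit GDDR7 devices, x8 DDR5 devices on desktop DIMMs, x4 on RDIMMs, ECC devices not counted), so treat it as an illustration rather than a spec reference.

```cpp
#include <cassert>

// Hypothetical helper (not from any library): total capacity in GB of a
// memory bus populated with identical DRAM devices.
//   bus_bits    - data width of the bus (e.g. 256 for the GPU, 64 for a DIMM)
//   device_bits - data width of one DRAM device (assumed: 32 for GDDR7,
//                 8 for desktop DDR5 DIMMs, 4 for server RDIMMs)
//   device_gbit - density of one device in Gbit (16 or 24 here)
//   ranks       - how many sets of devices share the bus
inline int capacity_gb(int bus_bits, int device_bits, int device_gbit, int ranks = 1) {
    assert(bus_bits % device_bits == 0);
    const int devices_per_rank = bus_bits / device_bits;
    // Gbit -> GB is a division by 8.
    return devices_per_rank * ranks * device_gbit / 8;
}
```

Under those assumptions, `capacity_gb(256, 32, 16)` and `capacity_gb(256, 32, 24)` give the 16GB and 24GB mobile GPU configurations on the same 256-bit bus, `capacity_gb(64, 8, 24, 1)` and `capacity_gb(64, 8, 24, 2)` give the 24GB single-rank and 48GB dual-rank DIMMs, and `capacity_gb(64, 4, 24, 2)` gives the 96GB RDIMM.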

Not that simple. 4 DIMMs were getting higher clocks on 2 CCD Ryzen models (12 & 16 cores) compared to those with one CCD. Motherboard topology is a factor too.

But there is no single configuration where having 4 DIMMs populated gives higher speeds than when 2 DIMMs are populated on the same configuration. This is because, while the higher-end parts tend to have the higher-binned components, they still only have 1 shared memory die between the CCDs, and the motherboard topology is simply that it either has 4 slots or it doesn't. No matter how they are run, it's still better to only use 1 rank of each channel.

More capacity is also harder to drive, even on the same number of channels, but needing to go from 2 to 4 channels is also a (bigger) drag.

You can go up to 64 GB per DIMM on the current consumer offerings (max of 256 GB total across 4 DIMMs). So you could run 128 GB over 2 DIMMs, but it's still going to perform worse than 2x24 GB or 2x16 GB.


I’ve got 64GB with a 3950x working great, although the speeds are not high. Just 3200MHz, IIRC.

Exactly, that's the tradeoff. I have one consumer machine running 192 GB but the latency and bandwidth are terrible compared to when it runs 48 GB.

It's fine for dense models where you need them in VRAM, less so for MoE where you're offloading layers to ram. But 32/32 is pretty good for both in the popular ~30b range right now.

running 5090 on 32GB RAM is just weird, still

> I don't even know what vector addition should look like.

I think you're trying to imply you're inventing something new and racket enables you to explore... But what I read (as someone with a PhD in deep learning that has worked on sparsity) is you actually don't know the prior art and you're using racket as an excuse to reinvent a whole bunch of stuff that already exists in plenty of mature libraries in more mundane languages (including python/pytorch). Which is of course fine for personal growth but please don't oversell racket as a "superpower" - to wit I can manipulate any part of my stack too because it's all written in cpp.


I once replaced IEEE 754 floating point numbers in a model by balanced ternary floating point numbers.

It took me 20 minutes.

Tell me how you'd do that in cpp?


lol the same way we implement all of the reduced precision fp8, fp4 types today: by storing them in the corresponding uint:

https://github.com/ggml-org/llama.cpp/discussions/15095
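For illustration only, a minimal sketch of what "storing them in the corresponding uint" could look like for balanced ternary digits. This is not llama.cpp's code and not a tekum implementation; `Trits`, `pack_trits`, `unpack_trits`, `trits_value` and `negate` are hypothetical names, and the layout (5 trits per byte, since 3^5 = 243 fits in 8 bits) is just one assumed choice.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical layout: 5 balanced trits (each -1, 0 or +1) per byte.
constexpr int kTritsPerByte = 5;
using Trits = std::array<std::int8_t, kTritsPerByte>;

// Pack trits into one byte as a base-3 number; t[0] is the least
// significant trit and each trit is stored biased as 0, 1 or 2.
inline std::uint8_t pack_trits(const Trits& t) {
    int code = 0;
    for (int i = kTritsPerByte - 1; i >= 0; --i) {
        assert(t[i] >= -1 && t[i] <= 1);
        code = code * 3 + (t[i] + 1);
    }
    return static_cast<std::uint8_t>(code);  // always in [0, 242]
}

// Inverse of pack_trits: peel off base-3 digits and remove the bias.
inline Trits unpack_trits(std::uint8_t code) {
    assert(code < 243);
    Trits t{};
    int rest = code;
    for (int i = 0; i < kTritsPerByte; ++i) {
        t[i] = static_cast<std::int8_t>(rest % 3 - 1);
        rest /= 3;
    }
    return t;
}

// Integer value of the trit string: sum of t[i] * 3^i.
inline int trits_value(const Trits& t) {
    int v = 0;
    for (int i = kTritsPerByte - 1; i >= 0; --i) v = v * 3 + t[i];
    return v;
}

// Negation in balanced ternary flips every trit.
inline Trits negate(Trits t) {
    for (auto& x : t) x = static_cast<std::int8_t>(-x);
    return t;
}
```

With this layout, flipping every trit of a packed byte is the same as computing `242 - code`, so negation never needs to unpack. A mantissa/exponent format on top of this would still need its own arithmetic routines, which this sketch does not attempt.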


Balanced ternary fp is not a reduced precision type of binary fp: https://arxiv.org/abs/2512.10964

>Unlike their binary counterparts, posits and takums, tekums simultaneously accommodate both ∞ and NaR, while retaining the simplicity of negation by flipping the underlying trit string. Perhaps most strikingly, tekums enable rounding by truncation, a property that eradicates at a stroke some notorious problems of rounding in binary arithmetic: double rounding errors, cascading carries in hardware, and the attendant inefficiencies.


> Balanced ternary fp is not a reduced precision type of binary fp

Yes I can read very well - can you?

> ... by storing them in the corresponding uint


You do realise that you need to store arbitrary binary blobs which don't nicely align to memory words?

And that once you can store them you need to write custom functions that do bitwise manipulation on those arbitrary blocks of memory?

The stuff that's done in hardware for you on all binary fp?

Meanwhile in racket I got arbitrary balanced ternary mantissa and exponent precision in less time than it took to write this post. Something that's not available in C/Cpp even for binary fp?


> You do realise that you need to store arbitrary binary blobs which don't nicely align to memory words? And that once you can store them you need to write custom functions that do bitwise manipulation on those arbitrary blocks of memory?

Yes what part of my response to you gave you the impression that I did not?

> Meanwhile in racket I got arbitrary balanced ternary mantissa and exponent precision in less time than it took to write this post.

Your claim was that it could not be done in cpp, not that it was faster/simpler/whatever-new-goalpost-you're-now-presenting in racket.

> less time than it took to write this post

An interpreted language with a runtime and a GC is easier to use than a systems language? I think this novel discovery is worth a turing award indeed! I'll be sure to refer you for one. Maybe even an honorary doctorate at my alma mater.


> Your claim was that it could not be done in cpp, not that it was faster/simpler/whatever-new-goalpost-you're-now-presenting in racket.

You need to reread the reply they made. They asked how you would do it in cpp, not that it’s not possible.

> I think this novel discovery is worth a turing award indeed! I'll be sure to refer you for one. Maybe even an honorary doctorate at my alma mater.

You could have made your point sufficiently without being condescending.


Racket is not interpreted. It is compiled to machine code. You don't know what you're talking about.

I interviewed at Google last year and they said something similarly magnanimous: that they rejected people who wouldn't have been successful at Google and that the rejects actually thanked them for the wisdom. My eyes rolled all the way back in my head. I cancelled the rest of my loop and went to a different FAANG. When I sent the cancellation email I thanked the recruiter for sharing his wisdom.


Every layer thinks they're the most important, most highly specialized, most highly skilled layer. Every layer is wrong because every layer is built on top of the abstractions of the layer beneath. Take it all the way down to the physics and the math and you'll notice that even the set theorists assume some axioms (no one knows what the logicians are doing...)

