> The main problem is good [doctors] have no need to sit through your 12 [years of school]. It actively selects only for the most desperate or money-driven people (if you pay very well).
I think "I find someone sexy and fun" is obviously more of a personal feeling, while "I am looking for someone who will help me meet our goals for Q4 with a high degree of technical excellence" seems like something that should be measurable and not left up to a feeling in the moment.
So I guess I don't see them as exactly alike; personally, I feel they are significantly different.
I turned down a 285k/yr +MM RSU staff engr offer from Google in 2021.
I’d probably still be in that job and would have a few million in the bank (instead of $10,000) if I had taken it, but I would have sold out my principles.
Actually not that crazy of a spread. E.g. I have 48 GB of RAM + 32 GB of VRAM in my gaming PC, because if you go beyond 48 GB of RAM you start having to trade off more and more performance to keep the memory controller from falling over, so you really need a good reason to want more. On server platforms like Epyc it tends not to matter as much, because you have so many channels for bandwidth and a beefier memory controller to handle them. On the VRAM side it's more about what makes sense for the GPU and how you plan to use it (games, AI, modeling, or whatever), and for a lot of cases the 5090 is just a good card to get for one reason or another (it has a ton of compute and bandwidth for a consumer part).
DRAM chips aren't always manufactured in power-of-two sizes. Non-power-of-two capacities have been common for years in the LPDDR used in phones, and have started to show up in other DRAM types with the current-generation standards: DDR5 for desktops/servers and GDDR7 for GPUs. That's why there have been 24GB single-rank DIMMs and 48GB dual-rank DIMMs for desktops and 96GB RDIMMs for servers for a few years, and why a mobile RTX 5090 has 24GB of VRAM vs. 16GB on a mobile RTX 5080, despite both GPUs being different bins of the same silicon and both configurations using a 256-bit memory bus.
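To make the arithmetic concrete, here is a rough sketch of how chip density, chip interface width, bus width, and rank count combine into capacity. It is a simplified model (ECC chips ignored, every chip on a rank assumed identical), and the function name and the specific chip widths in the examples (x8/x4 DDR5 dies, 32-bit GDDR7 devices, 24 Gbit and 16 Gbit parts) are illustrative assumptions, not details from the comment.

```python
def module_capacity_gb(bus_bits: int, chip_bits: int, chip_gbit: int, ranks: int = 1) -> float:
    """Rough data capacity in GB of memory built from identical DRAM chips.

    bus_bits:  data width each rank must fill (64 for a DDR5 DIMM, ECC ignored;
               256 for the GPU memory bus in the comment)
    chip_bits: interface width of one chip (e.g. x4/x8 for DDR5, 32 for GDDR7)
    chip_gbit: density of one chip in gigabits (e.g. 16 or 24)
    ranks:     number of ranks, i.e. sets of chips that share the bus
    """
    chips_per_rank = bus_bits // chip_bits
    # Total gigabits across all chips, converted to gigabytes.
    return chips_per_rank * chip_gbit * ranks / 8


# Assumed configurations built from 24 Gbit (3 GB) and 16 Gbit (2 GB) chips:
print(module_capacity_gb(64, 8, 24))           # 24.0: single-rank DIMM, eight x8 chips
print(module_capacity_gb(64, 8, 24, ranks=2))  # 48.0: dual-rank DIMM
print(module_capacity_gb(64, 4, 24, ranks=2))  # 96.0: dual-rank x4 RDIMM
print(module_capacity_gb(256, 32, 24))         # 24.0: 256-bit bus, eight 3 GB chips
print(module_capacity_gb(256, 32, 16))         # 16.0: 256-bit bus, eight 2 GB chips
```

In this model the bus width fixes the chip count (eight 32-bit devices on a 256-bit bus), so moving from 16 GB to 24 GB on the same bus comes down to using a 24 Gbit die instead of a 16 Gbit one.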
Not that simple. 4 DIMMs were getting higher clocks on 2-CCD Ryzen models (12 & 16 cores) compared to those with one CCD. Motherboard topology is a factor too.
But there is no single configuration where populating 4 DIMMs gives higher speeds than populating 2 DIMMs on the same configuration. While the higher-end parts tend to have higher-binned components, they still only have one shared I/O die (which holds the memory controller) across the CCDs, and the motherboard topology either has 4 slots or it doesn't. No matter how they are run, it's still better to use only one rank per channel.
More capacity is also harder to drive, even on the same number of channels, but needing to go from 2 to 4 DIMMs is a (bigger) drag.
You can go up to 64 GB per DIMM on the current consumer offerings (max of 256 GB total across 4 DIMMs). So you could run 128 GB over 2 DIMMs, but it's still going to perform worse than 2x24 GB or 2x16 GB.
It's fine for dense models where you need them in VRAM, less so for MoE where you're offloading layers to RAM. But 32/32 is pretty good for both in the popular ~30B range right now.
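A back-of-the-envelope way to see why ~30B models suit a 32 GB VRAM / 32 GB RAM split: weight storage is roughly parameter count times bits per weight divided by 8. This sketch assumes 1 GB = 10^9 bytes and ignores KV cache and runtime overhead apart from a fixed reserve; the helper names and the 4 GB reserve are hypothetical, chosen only for illustration.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and runtime overhead; real quantized
    formats often spend a bit more than their nominal bits per weight.
    """
    return params_billion * bits_per_weight / 8


def split_offload(total_gb: float, vram_gb: float, reserve_gb: float = 4.0) -> tuple[float, float]:
    """Hypothetical placement: fill VRAM minus a reserve (for KV cache etc.)
    first, then spill whatever is left to system RAM."""
    on_gpu = min(total_gb, max(vram_gb - reserve_gb, 0.0))
    return on_gpu, total_gb - on_gpu


for bits in (4, 8, 16):
    size = weights_gb(30, bits)
    gpu, ram = split_offload(size, vram_gb=32)
    print(f"30B @ {bits}-bit: {size:.0f} GB -> {gpu:.0f} GB VRAM + {ram:.0f} GB RAM")
```

Under these assumptions, 4-bit weights fit entirely in 32 GB of VRAM, while 8-bit weights spill only a couple of GB to system RAM.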
> I don't even know what vector addition should look like.
I think you're trying to imply you're inventing something new and that Racket enables you to explore... But what I read (as someone with a PhD in deep learning who has worked on sparsity) is that you actually don't know the prior art and you're using Racket as an excuse to reinvent a whole bunch of stuff that already exists in plenty of mature libraries in more mundane languages (including Python/PyTorch). Which is of course fine for personal growth, but please don't oversell Racket as a "superpower": I can manipulate any part of my stack too, because it's all written in C++.
>Unlike their binary counterparts, posits and takums, tekums simultaneously accommodate both ∞ and NaR, while retaining the simplicity of negation by flipping the underlying trit string. Perhaps most strikingly, tekums enable rounding by truncation, a property that eradicates at a stroke some notorious problems of rounding in binary arithmetic: double rounding errors, cascading carries in hardware, and the attendant inefficiencies.
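The two trit-level properties in the quote can be checked on plain integers. This is a toy sketch of balanced-ternary integers, not the tekum format itself (no exponent, no ∞ or NaR encoding), and the helper names are hypothetical. Negation is just flipping every trit, and because a dropped tail of k trits is worth at most (3^k - 1)/2 in either direction, which is less than half of 3^k, truncating those trits already rounds to the nearest multiple of 3^k, with no ties possible.

```python
def to_balanced_ternary(n: int) -> list[int]:
    """Integer -> balanced-ternary trits in {-1, 0, +1}, least significant first."""
    trits = []
    while n != 0:
        r = n % 3              # Python's % always yields 0, 1, or 2 here
        if r == 2:
            r = -1             # write digit 2 as -1 and carry 1 into the next trit
        trits.append(r)
        n = (n - r) // 3       # exact division
    return trits


def from_balanced_ternary(trits: list[int]) -> int:
    return sum(t * 3**i for i, t in enumerate(trits))


def negate(trits: list[int]) -> list[int]:
    # Negation flips each trit (+1 <-> -1); no carries, unlike two's complement.
    return [-t for t in trits]


def round_by_truncation(n: int, k: int) -> int:
    # Drop the k least significant trits. The result is n rounded to the
    # nearest multiple of 3**k, since the dropped tail is worth at most
    # (3**k - 1) / 2 in either direction.
    return from_balanced_ternary(to_balanced_ternary(n)[k:]) * 3**k


print(to_balanced_ternary(5))                                  # [-1, -1, 1]
print(from_balanced_ternary(negate(to_balanced_ternary(5))))   # -5
print(round_by_truncation(5, 1))                               # 6
```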
You do realise that you need to store arbitrary binary blobs which don't nicely align to memory words?
And that once you can store them you need to write custom functions that do bitwise manipulation on those arbitrary blocks of memory?
The stuff that's done in hardware for you on all binary fp?
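For a sense of what that custom storage code involves, here is one common dense layout: five trits per byte, since 3^5 = 243 fits within the 256 values a byte can hold. This is an illustrative Python sketch with hypothetical names and a -1/0/+1 trit convention; a real C++ or Racket implementation would make its own choices about layout and word alignment.

```python
def pack_trits(trits: list[int]) -> bytes:
    """Pack trits in {-1, 0, +1} five per byte (3**5 = 243 <= 256)."""
    out = bytearray()
    for i in range(0, len(trits), 5):
        value = 0
        # Base-3 digits, with the first trit of the group least significant.
        for t in reversed(trits[i:i + 5]):
            value = value * 3 + (t + 1)   # map -1/0/+1 to 0/1/2
        out.append(value)                 # always <= 242
    return bytes(out)


def unpack_trits(data: bytes, count: int) -> list[int]:
    """Inverse of pack_trits; count is needed because the last byte may be partial."""
    trits = []
    for byte in data:
        value = byte
        for _ in range(5):
            trits.append(value % 3 - 1)
            value //= 3
    return trits[:count]


trits = [1, 0, -1, -1, 1, 0, 0, 1]
packed = pack_trits(trits)
print(len(packed))                                 # 2
print(unpack_trits(packed, len(trits)) == trits)   # True
```

The `count` argument is the kind of out-of-band bookkeeping the comment alludes to: the packed bytes no longer line up with trit boundaries, so the caller has to track the length separately.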
Meanwhile in Racket I got arbitrary balanced ternary mantissa and exponent precision in less time than it took to write this post. Something that's not available in C/C++ even for binary FP?
> You do realise that you need to store arbitrary binary blobs which don't nicely align to memory words?
And that once you can store them you need to write custom functions that do bitwise manipulation on those arbitrary blocks of memory?
Yes. What part of my response to you gave you the impression that I did not?
> Meanwhile in Racket I got arbitrary balanced ternary mantissa and exponent precision in less time than it took to write this post.
Your claim was that it could not be done in C++, not that it was faster/simpler/whatever-new-goalpost-you're-now-presenting in Racket.
> less time than it took to write this post
An interpreted language with a runtime and a GC is easier to use than a systems language? I think this novel discovery is worth a Turing Award indeed! I'll be sure to refer you for one. Maybe even an honorary doctorate at my alma mater.
I interviewed at Google last year and they said something similarly magnanimous: that they rejected people who wouldn't have been successful at Google and that the rejects actually thanked them for the wisdom. My eyes rolled all the way back in my head. I cancelled the rest of my loop and went to a different FAANG. When I sent the cancellation email I thanked the recruiter for sharing his wisdom.
Every layer thinks they're the most important, most highly specialized, most highly skilled layer. Every layer is wrong because every layer is built on top of the abstractions of the layer beneath. Take it all the way down to the physics and the math and you'll notice that even the set theorists assume some axioms (no one knows what the logicians are doing...)