> (~1TB / SSD_read_speed + computation_time_per_chunk_in_RAM) = a few minutes per ~word or punctuation.
You have to divide the size of the active parameters (~16GB at 4-bit quantization) by the SSD read speed, rather than using the entire model size. If you are lucky, you might get around one token per second with speculative decoding, but I agree with the general point that it will be very slow.
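As a rough back-of-the-envelope sketch (the ~5 GB/s SSD speed, the parameter sizes, and the speculative-decoding acceptance rate below are illustrative assumptions, not measurements):

```python
# Back-of-the-envelope estimate of generation speed when streaming weights from SSD.
# All numbers are illustrative assumptions, not measurements.

total_model_bytes  = 1e12   # ~1 TB: full model on disk (the dense worst case)
active_param_bytes = 16e9   # ~16 GB: active parameters per token at 4-bit (MoE)
ssd_read_bytes_sec = 5e9    # ~5 GB/s: a fast NVMe SSD's sequential read speed

def seconds_per_token(bytes_read_per_token: float, read_speed: float) -> float:
    """Time to stream the weights needed for one token from SSD (ignoring compute)."""
    return bytes_read_per_token / read_speed

dense_time = seconds_per_token(total_model_bytes, ssd_read_bytes_sec)
moe_time   = seconds_per_token(active_param_bytes, ssd_read_bytes_sec)

print(f"Dense (whole model per token): {dense_time:.0f} s/token (~{dense_time/60:.1f} min)")
print(f"MoE (active experts only):     {moe_time:.1f} s/token")

# Speculative decoding can accept several draft tokens per pass over the weights;
# with an assumed ~3 accepted tokens per pass you approach ~1 token/s.
accepted_per_pass = 3
print(f"With speculative decoding:     ~{accepted_per_pass / moe_time:.1f} tokens/s")
```

This reproduces both estimates: streaming the full ~1TB per token lands in the "few minutes per word" range, while reading only the ~16GB of active parameters gets you to a few seconds per token, and speculative decoding can push that toward one token per second.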
Yeah, thanks for calling that out. I kind of panicked when I reached that part of the explanation and was stuck on whether or not to go into dense models vs. MoE. The question was about 'big stuff like that', which almost certainly means MoE, and I even chose an MoE as my example, but then there are giant dense models like Llama. That's not what was asked, although it wasn't not asked, because 'also big league stuff'… anyway, I basically thought "you're welcome" and "no problem", then said "you're problem".