Actually, we can significantly boost raw single-threaded speed. It doesn't help with the memory wall, though, because the approach is based on MOS current-mode logic (MCML), which speeds up the logic, not the memory.
We can build 20 GHz CPUs now with passive cooling, but they don't beat cutting-edge CMOS cores on memory-bound single-threaded workloads. They do reach 2-cycle add and roughly 3-cycle mul latency, though. I hope someone just plops down a RISC-V core with that kind of design, paired with explicit preloading into a tiny cache that gets 2- or 3-cycle load latency into registers.
I'm sure some computations would do well on that sort of very fast, shallow-pipeline core: highly sequential stuff like SAT/SMT solvers and other inherently divide-and-conquer algorithms.
The effective throughput is the same whether you get it by upping the clock rate or by deepening the pipeline: consuming half the data per cycle at twice the clock rate puts the same pressure on memory.
> which you can use with like explicit preloading into a tiny cache
That will kill it. As soon as you put it on compiler writers or programmers to do something special to realize the performance benefits, you're going to lose to architectures that don't require it.
Sure, compiler writers and programmers will optimize for your architecture... if it's popular and widely used. So you have a chicken-and-egg problem: to get adoption in the first place, you need to run existing workloads faster as-is.
> We can build 20 GHz CPUs now with passive cooling,
Citation? Like for real, that's cool and I'd like to read about it!