I had a similar experience with even simpler tools: classic memoization, caching, and some basic process- and instruction-level parallelism produced a stack that did the job in seconds instead of 10+ minutes.
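For anyone wondering what that combination looks like in practice, here is a rough Python sketch of memoization plus process-level parallelism. It is an illustration only, not the code from that project: `expensive`, `process_chunk`, and `run` are made-up names, and it assumes the slow step is a pure function that sees repeated inputs.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import lru_cache


@lru_cache(maxsize=None)
def expensive(n: int) -> int:
    # Hypothetical stand-in for a slow, pure computation.
    return sum(i * i for i in range(n))


def process_chunk(chunk: list[int]) -> int:
    # Repeated inputs within a worker hit the cache instead of recomputing.
    return sum(expensive(n) for n in chunk)


def run(chunks: list[list[int]], workers: int = 4) -> int:
    # Chunks are spread across worker processes; each process has its own cache.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_chunk, chunks))


if __name__ == "__main__":
    data = [[10_000, 10_000, 20_000]] * 8
    print(run(data))
```

Because the cache lives per process, grouping similar inputs into the same chunk is what lets the memoization pay off.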
It's amazing how much "production" code can be optimized if you take the time to find the hot loops. I really wish it was some black magic only I could do (would help with salary negotiations), but it's mundane diligence and saying "no" to anything else for a month or two.