Curious, in what cases might this help? The compute would have to be python-bound (not C library-bound); and the frequency of module lookups would have to be in the ballpark of other dictionary lookups. I wonder if cases like this exist in the real world.
The case where I've seen those tricks used with measurable effect was a Python debugger - specifically, its implementation of the sys.settrace callback. Since that gets executed on every line of code, every little bit helps.
(It's much faster if you implement the callback in native code, but then that doesn't work on IronPython, Jython etc.)