There's more to the lock than just the write. Some of this will be a lot better in future Mongo releases, but at the moment a safe write means the lock is held longer whenever there's network latency of any kind. Local updates are shockingly faster than updates over even a local network.
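To put rough numbers on the lock point, here's a back-of-the-envelope sketch in Python. It takes the claim at face value and assumes a single lock is held for the write itself plus the acknowledgement round trip, so writes are strictly serialized. The function name and all timings are made up for illustration; they are not measurements of any Mongo release.

```python
def max_locked_writes_per_second(write_seconds: float, round_trip_seconds: float) -> float:
    """Upper bound on writes/sec when one lock is held for the whole safe write.

    Assumed model: hold time = time to apply the write + network round trip
    for the acknowledgement. Real servers are more complicated than this.
    """
    hold_seconds = write_seconds + round_trip_seconds
    if hold_seconds <= 0:
        raise ValueError("hold time must be positive")
    return 1.0 / hold_seconds


if __name__ == "__main__":
    WRITE_SECONDS = 0.0002  # hypothetical 0.2 ms to apply one update
    # Hypothetical round-trip times for three client placements.
    placements = [("same host", 0.0), ("local network", 0.0005), ("remote", 0.005)]
    for label, round_trip in placements:
        rate = max_locked_writes_per_second(WRITE_SECONDS, round_trip)
        print(f"{label:>13}: at most {rate:,.0f} writes/sec")
```

Under this model even half a millisecond of round trip cuts the ceiling to less than a third of the same-host figure, which is the shape of the effect described above.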
Honestly, the entire point of Mongo isn't that it's fast; it's that it's fast enough and can scale a reasonably flexible schema through the life of an app.
A theory: perhaps one mongod instance per core on the same box, with a mongos instance to coordinate? That seems like a complicated way to run a "simple" database engine, though, just to get the write rate up.
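For what the per-core layout might look like, here's a minimal Python sketch that only builds the command lines, one mongod per core, each with its own port and data directory. The helper name, base port, and data paths are hypothetical. `--shardsvr`, `--port`, and `--dbpath` are standard mongod options, but a working cluster also needs a config server and the mongos itself, and recent releases expect each shard to be a replica set, so treat this as a layout sketch rather than a deployment recipe.

```python
from typing import List


def shard_commands(cores: int, base_port: int = 27018,
                   data_root: str = "/data/shard") -> List[List[str]]:
    """Build one mongod argv per core (hypothetical ports and paths)."""
    if cores < 1:
        raise ValueError("need at least one core")
    return [
        ["mongod", "--shardsvr",
         "--port", str(base_port + i),      # unique port per instance
         "--dbpath", f"{data_root}{i}"]     # unique data directory per instance
        for i in range(cores)
    ]


if __name__ == "__main__":
    # Print the commands instead of launching anything.
    for argv in shard_commands(4):
        print(" ".join(argv))
```

Even in sketch form it shows the bookkeeping involved: every extra core means another process, port, and data directory to manage before the router is even in the picture.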
mongos has extremely high overhead: 700% CPU under moderate load on a webserver. The more shards, the higher that load, and eventually you run out of local CPU to mongos overhead alone. Then you're looking at a remote mongos, which makes it even more complicated.
Also, if it's single-threaded, why does adding more cores increase the update rate?
Isn't the entire point of MongoDB that it's fast? If it can only do 1500 increments a second, it's pretty useless...