I think the code is correct. To implement smoothing, you want to add 1 to the count of every word, regardless of whether it appears in the training data or not.
That is to say, a word that appears once should get a count of 2, and a word that doesn't appear at all should get a count of 1.
This is a perfect example of where a (short) code comment would be helpful. The "lambda: 1" is a notable piece of code, but it's hard to tell that at a glance.
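For what it's worth, here's a minimal sketch of the idiom being discussed (the function and variable names are mine, not from the original code), with the kind of short comment suggested above:

```python
from collections import defaultdict

def train(words):
    # Add-one (Laplace) smoothing: defaultdict(lambda: 1) starts every
    # word's count at 1, so an unseen word gets a count of 1 and a word
    # seen once in training ends up with a count of 2.
    counts = defaultdict(lambda: 1)
    for w in words:
        counts[w] += 1
    return counts

model = train(["the", "cat", "the"])
print(model["cat"])     # seen once  -> 2
print(model["unseen"])  # never seen -> 1
```

With the comment in place, a reader doesn't have to puzzle out why the default is 1 rather than 0.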
Nonsense; any software dev worth the title can follow a lambda that adds 1.
When Norvig talks of regular folks, he means people with 1/10th his IQ, which is still the top 1%. Norvig is so far out there on the IQ scale that I find it funny when some noob says he's found a bug!