Fwiw, I believe the best estimator has been improved on since HyperLogLog. A more recent result is provably optimal in space, and asymptotically slightly better because it drops the log log n factor from the space bound. Perhaps more importantly, it processes each stream update in O(1) worst-case time and can report an estimate at any point mid-stream, also in O(1) worst-case time: http://people.seas.harvard.edu/~minilek/papers/f0.pdf
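For reference, here are roughly the two space bounds being compared. They assume a stream of indices from {1, ..., n} and a (1 ± ε)-approximation, with constants and the success-probability parameter left out. HyperLogLog keeps m = Θ(ε^-2) registers. Each register stores the largest leading-one position seen among its hashed items, which is a number up to about log n, so each register costs about log log n bits. The optimal result removes that per-register factor.

```latex
\[
\underbrace{O\!\left(\varepsilon^{-2}\log\log n + \log n\right)}_{\text{HyperLogLog}}
\quad\text{vs.}\quad
\underbrace{O\!\left(\varepsilon^{-2} + \log n\right)}_{\text{optimal}}
\quad\text{bits of space}
\]
```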