It turns out that deflate can be much faster when implemented specifically for PNG data instead of for general-purpose compression (while still remaining 100% standard-compatible).
Note that he also expects worse compression as a tradeoff. As far as I can tell, he implements RLE using Deflate matches inside a standard zlib stream:
[...] Deflate compressor which was optimized for simplicity over high ratios. The "parser" only supports RLE matches using a match distance of 3/4 bytes, [...]
https://github.com/richgel999/fpng
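To make the quoted idea concrete, here is a minimal Python sketch (not fpng's actual code) of a parser that emits only literal bytes or Deflate-style (length, distance) matches whose distance is fixed at the pixel size, which I'm assuming is 3 bytes for RGB and 4 for RGBA. The names `rle_parse` and `decode` are hypothetical, and the sketch stops at the token stream: Huffman coding and zlib framing are left out. It respects Deflate's match-length limits of 3 to 258 bytes.

```python
def rle_parse(data: bytes, bpp: int, min_len: int = 3, max_len: int = 258):
    """Greedy parse into literals (int) or (length, distance) matches.

    Hypothetical sketch: the only match distance ever tried is bpp
    (assumed 3 for RGB, 4 for RGBA), so no hash table or search is needed.
    min_len/max_len are Deflate's match-length limits.
    """
    tokens = []
    i, n = 0, len(data)
    while i < n:
        length = 0
        if i >= bpp:
            # Extend while each byte equals the one bpp bytes earlier. This also
            # covers overlapping copies (length > bpp), which Deflate allows.
            while (i + length < n and length < max_len
                   and data[i + length] == data[i + length - bpp]):
                length += 1
        if length >= min_len:
            tokens.append((length, bpp))
            i += length
        else:
            tokens.append(data[i])  # literal byte
            i += 1
    return tokens


def decode(tokens) -> bytes:
    """Inverse of rle_parse: copy byte by byte, as an LZ77 decoder does."""
    out = bytearray()
    for t in tokens:
        if isinstance(t, int):
            out.append(t)
        else:
            length, dist = t
            for _ in range(length):
                out.append(out[-dist])
    return bytes(out)


row = bytes([10, 20, 30] * 5 + [1, 2, 3])  # five identical RGB pixels, then one different
tokens = rle_parse(row, bpp=3)
print(tokens)  # [10, 20, 30, (12, 3), 1, 2, 3]
assert decode(tokens) == row
```

Because each match only ever looks exactly one pixel back, the parser does no match searching at all, which is presumably where much of the speed comes from. The flip side is that repeats further back (e.g. the previous scanline) are never found, which fits the lower compression ratio he mentions.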