I cannot run the demo on Firefox, but you might get better performance with an 8x4 thread group.
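To make the 8x4 suggestion concrete, here is a rough sketch of the host-side dispatch math. It assumes the demo is a WebGPU compute shader (I couldn't confirm that, since I couldn't run it); `workgroupCounts` is a hypothetical helper name, not something from your code. An 8x4 group is 32 invocations, which lines up with one NVIDIA warp.

```typescript
// Hypothetical helper: how many workgroups are needed to cover a
// width x height image when each workgroup is groupX x groupY invocations.
function workgroupCounts(
  width: number,
  height: number,
  groupX: number = 8,
  groupY: number = 4,
): [number, number] {
  // Round up so partial tiles at the right/bottom edges are still covered.
  return [Math.ceil(width / groupX), Math.ceil(height / groupY)];
}

// Assumed shader side (WGSL):  @compute @workgroup_size(8, 4) fn main(...)
// Assumed host side:           pass.dispatchWorkgroups(countX, countY);
const [countX, countY] = workgroupCounts(1920, 1080);
console.log(countX, countY); // prints "240 270"
```

Because the counts are rounded up, the shader needs a bounds check against the image size whenever the dimensions are not multiples of 8 and 4.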
Also, megakernels of this sort are generally bad for occupancy, since every thread pays for the register footprint of the whole path tracer. Wavefront path tracing improves this at the expense of additional memory IO and a more involved implementation. https://research.nvidia.com/publication/2013-07_megakernels-...
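In case the wavefront idea is unfamiliar, here is a toy CPU-side sketch of the control flow only. It is not taken from the linked paper or from your code: `PathState`, `wavefrontTrace`, and the termination rule are all made up for illustration. Each trip through the outer loop stands in for one small kernel launch over every active path, and survivors are compacted into the next queue. Writing and re-reading that queue is the extra IO I mentioned.

```typescript
// Hypothetical per-path state that would live in a GPU buffer between kernels.
interface PathState {
  pixel: number;      // which pixel this path contributes to
  bounces: number;    // bounces taken so far
  throughput: number; // accumulated path weight
}

// Toy wavefront loop: one pass = one "kernel launch" over all active paths.
function wavefrontTrace(
  pixelCount: number,
  maxBounces: number,
): { radiance: number[]; passes: number } {
  const radiance: number[] = new Array<number>(pixelCount).fill(0);
  let queue: PathState[] = Array.from({ length: pixelCount }, (_, pixel) => ({
    pixel,
    bounces: 0,
    throughput: 1,
  }));
  let passes = 0;

  while (queue.length > 0) {
    const next: PathState[] = [];
    for (const path of queue) {
      // Stand-in for the intersect + shade stage: each bounce halves throughput.
      path.bounces += 1;
      path.throughput *= 0.5;
      // Made-up termination rule: pixel i escapes to a unit-radiance "sky"
      // after (i % maxBounces) + 1 bounces.
      if (path.bounces >= (path.pixel % maxBounces) + 1) {
        radiance[path.pixel] += path.throughput;
      } else {
        next.push(path); // compaction: only live paths go to the next pass
      }
    }
    queue = next;
    passes += 1;
  }
  return { radiance, passes };
}

const result = wavefrontTrace(4, 3);
console.log(result.passes, result.radiance);
```

The point of the structure is that each pass only touches paths that are still alive, so a real GPU version would keep its threads doing the same kind of work instead of idling while the longest path in the group finishes.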
Overall your code looks easy to read. Too bad I couldn't run it.