Hacker News

It was a bit of a surprise to wake up and see myself on HN! I'm the presenter in the video - I can answer any questions you might have here.


A bit off-topic, but the video player does not load when I have the referer disabled with these about:config settings:

    network.http.sendSecureXSiteReferrer;true
    network.http.sendRefererHeader;2
    network.http.referer.trimmingPolicy;0
    network.http.referer.spoofSource;true
    network.http.referer.XOriginPolicy;1




Thanks.


In the video you mentioned that you worked on some RTL for the codec. I am curious about how the codec can be accelerated. You didn't talk much about it, so I am interested in what you learned. Do you have any info on where most of the CPU time is spent in the codec? What did you try?


My RTL accelerated the transform stage of the codec. Because Daala has no pixel domain dependencies, the frequency domain coefficients can just be offloaded to hardware, which returns the pixel domain result. There is more information on timing on my website [1].
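Because the transform stage only needs frequency-domain coefficients as input, its interface can be as narrow as "coefficients in, pixels out." The minimal C sketch below illustrates that idea under stated assumptions. The `transform_backend` struct, `cpu_inverse_batch`, and `reconstruct_frame` are hypothetical names, and the 4-point Walsh-Hadamard transform is only a stand-in, not Daala's actual transform. A hardware backend could replace the CPU fallback without the rest of the decoder changing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in transform: unnormalized 4-point Walsh-Hadamard.
 * Applying it twice scales the input by exactly 4. Not Daala's transform. */
static void wht4(const int32_t in[4], int32_t out[4]) {
  int32_t a = in[0] + in[1];
  int32_t b = in[0] - in[1];
  int32_t c = in[2] + in[3];
  int32_t d = in[2] - in[3];
  out[0] = a + c;
  out[1] = b + d;
  out[2] = a - c;
  out[3] = b - d;
}

/* Hypothetical backend interface: coefficients in, pixels out.
 * Blocks are processed independently, so a whole batch can be handed off. */
typedef struct {
  const char *name;
  void (*inverse_batch)(const int32_t *coeffs, int32_t *pixels, size_t nblocks);
} transform_backend;

/* Software fallback. A hardware backend would instead submit `coeffs` to the
 * device and copy the returned pixel-domain samples into `pixels`. */
static void cpu_inverse_batch(const int32_t *coeffs, int32_t *pixels,
                              size_t nblocks) {
  for (size_t b = 0; b < nblocks; b++) {
    int32_t tmp[4];
    wht4(coeffs + 4 * b, tmp);
    /* Undo the factor of 4 from applying the WHT twice; the division is
     * exact when the coefficients were produced by wht4. */
    for (int i = 0; i < 4; i++) pixels[4 * b + i] = tmp[i] / 4;
  }
}

static const transform_backend cpu_backend = {"cpu", cpu_inverse_batch};

/* Decoder side: it never needs to know which backend did the work. */
static void reconstruct_frame(const transform_backend *be,
                              const int32_t *coeffs, int32_t *pixels,
                              size_t nblocks) {
  assert(be != NULL && be->inverse_batch != NULL);
  be->inverse_batch(coeffs, pixels, nblocks);
}
```

In this sketch, adding a hardware path would mean defining one more `transform_backend` value. The decoder loop that calls `reconstruct_frame` stays the same.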

The transforms no longer take as much CPU time as they used to, due to having better SIMD accelerated versions. Much of the time is now spent on the PVQ decoder, which is not optimized for speed at the moment.

One thing I learned after writing large fixed-function transform hardware is that it doesn't take much extra hardware to turn it into a microcoded programmable pipeline, much like a GPU. In fact, this approach is very common inside hardware video decoders, though the firmware is not exposed to the user. One notable exception is Broadcom's VideoCore IV, which is quite an interesting architecture [2].

Also, with the latest mobile processors having 8 or more ARM cores, we can exploit CPU parallelism in much the same way. I feel it is very important for the codec to perform well on the CPU alone, as not everyone will have hardware that can decode it right away. This is something I would like to play with a lot more.

[1] http://thomasdaede.com/wordpress/?cat=9
[2] http://www.broadcom.com/docs/support/videocore/VideoCoreIV-A...


Cool stuff, thanks a bunch. I agree with you that software optimization takes precedence, but I have also seen optimized crypto code in assembly that was completely undocumented and unreadable. I hope Daala won't fall into that trap.


All functions in Daala have a C version, along with corresponding assembly versions which have tests to make sure that the assembly version matches the C reference.
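Here is a minimal C sketch of that testing pattern, under stated assumptions. `hadamard4_c` plays the role of the C reference. `hadamard4_alt` stands in for an optimized version, which in a real codebase would be SIMD or assembly. Both names and the Walsh-Hadamard kernel are hypothetical and not taken from Daala's source. The checker feeds both versions the same pseudo-random inputs and counts the blocks whose outputs differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical example kernel: unnormalized 4-point Walsh-Hadamard
 * transform. A stand-in only; not Daala's transform. */
typedef void (*hadamard4_fn)(const int32_t in[4], int32_t out[4]);

/* Reference C version, written as butterflies. */
static void hadamard4_c(const int32_t in[4], int32_t out[4]) {
  int32_t a = in[0] + in[1];
  int32_t b = in[0] - in[1];
  int32_t c = in[2] + in[3];
  int32_t d = in[2] - in[3];
  out[0] = a + c;
  out[1] = b + d;
  out[2] = a - c;
  out[3] = b - d;
}

/* Stand-in for an optimized (SIMD/assembly) version: same math, but
 * computed as a matrix product so the code path differs. */
static void hadamard4_alt(const int32_t in[4], int32_t out[4]) {
  static const int32_t h[4][4] = {
      {1, 1, 1, 1}, {1, -1, 1, -1}, {1, 1, -1, -1}, {1, -1, -1, 1}};
  for (int i = 0; i < 4; i++) {
    int32_t acc = 0;
    for (int j = 0; j < 4; j++) acc += h[i][j] * in[j];
    out[i] = acc;
  }
}

/* Runs both versions on the same pseudo-random blocks and returns the
 * number of blocks whose outputs differ (0 means a bit-exact match). */
static int count_mismatches(hadamard4_fn ref, hadamard4_fn opt, int trials,
                            unsigned seed) {
  assert(ref != NULL && opt != NULL);
  srand(seed);
  int mismatches = 0;
  for (int t = 0; t < trials; t++) {
    int32_t in[4], out_ref[4], out_opt[4];
    /* Inputs in [-2048, 2047]; the range is arbitrary for this sketch. */
    for (int j = 0; j < 4; j++) in[j] = (int32_t)(rand() % 4096) - 2048;
    ref(in, out_ref);
    opt(in, out_opt);
    for (int j = 0; j < 4; j++) {
      if (out_ref[j] != out_opt[j]) {
        mismatches++;
        break;
      }
    }
  }
  return mismatches;
}
```

The check requires exact equality rather than agreement within a tolerance. This matters for a codec: decoded frames are typically used as references for later frames, so even small mismatches between the C and assembly paths could accumulate.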


Given this is funded by Mozilla, I'm surprised it's not written in Rust. Any particular reason?


ARM and its partners will probably start supporting HSA in 2016 and beyond, so you might want to look at that, too.



