Considering Intel, AMD, ARM, Nvidia, Apple, and other hardware companies are all members of the Alliance for Open Media - the consortium behind AV1 - I'd expect work to support it in hardware is already well underway.
In the case of hardware implementations it doesn't really matter even if there are shared components -- hardware decoders/encoders are typically fixed-function blocks. They are not designed for flexibility or reuse between formats; they are designed for exactly one function and nothing else.
Even if AV1 and VP9 share structurally similar components, no VP9 decoder is going to work out of the box. It may mean that people who produce AV1-capable encoding/decoding hardware have a better starting point, though (provided they had VP9 hardware beforehand).
AFAIK a lot of "hardware" codecs are DSPs with some extra instructions that are good for video (like sum of absolute differences). In theory maybe they could support new algorithms with a firmware update. In reality, such firmware may simply not be developed or the DSP may not be fast enough to run a new codec.
Most DSPs I've worked with have an assortment of really specific extra instructions added for the particular application. I suspect these DSPs are similar - adding a new codec probably means adding a few more instructions, or even the odd hardware block dangling off the side.
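To make the "sum of absolute differences" point concrete, here's a scalar C sketch of what that primitive does; on a video-oriented DSP the inner loop (or a full row of it) collapses into a single instruction, and x86 exposes the same idea as PSADBW. The function name and 8x8 block size are just illustrative choices, not any particular chip's interface:

    #include <stdint.h>
    #include <stdlib.h>

    /* Sum of absolute differences over an 8x8 block: the inner loop of
     * motion estimation. A video DSP typically does a whole row of this
     * (or the whole block) in one instruction. */
    static unsigned sad_8x8(const uint8_t *cur, const uint8_t *ref, int stride)
    {
        unsigned sad = 0;
        for (int y = 0; y < 8; y++) {
            for (int x = 0; x < 8; x++)
                sad += abs(cur[x] - ref[x]);
            cur += stride;
            ref += stride;
        }
        return sad;
    }

An encoder runs this against many candidate positions for every block, which is why one dedicated instruction for it pays off so heavily.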
Do you think the Mill family of processors could find a niche here? (If it ever sees the light of day; I hope so, because the technical videos are super interesting and educational.)
If you don't want your MacBook to burn a hole in your lap, potentially. I'm sure most recent processors will be able to handle playback, just not efficiently, and at high CPU load.
I wonder if AV1 could work with an OpenCL decoder, since it does block prediction. It would almost certainly need too much per-frame synchronization to work, though...
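The dependency problem shows up even in a toy model: intra-predicted blocks need the already-reconstructed pixels of their left and top neighbours, so the most parallelism you can extract inside a frame is one anti-diagonal "wavefront" of blocks at a time, with a synchronization point between wavefronts. A rough C sketch of that ordering, where decode_block is a hypothetical stand-in for the real per-block work:

    /* Hypothetical per-block work: intra prediction reads reconstructed
     * pixels from the blocks above and to the left of (x, y). */
    static void decode_block(int x, int y) { (void)x; (void)y; }

    /* Blocks on the same anti-diagonal (x + y == d) don't depend on each
     * other, so each diagonal could be one GPU dispatch -- but every
     * diagonal must wait for the previous one to finish. */
    void decode_frame(int cols, int rows)
    {
        for (int d = 0; d < cols + rows - 1; d++) {
            for (int y = 0; y < rows; y++) {
                int x = d - y;
                if (x >= 0 && x < cols)
                    decode_block(x, y);
            }
            /* an OpenCL version would need a full sync here, once per diagonal */
        }
    }

With thousands of small dispatches per frame, the launch and sync overhead can easily eat whatever the GPU gains on the arithmetic.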
Yet if you bought an Nvidia GTX 960 about 1.5 years ago like I did (because it supported HEVC, and I had upgraded my 10-year-old plasma to a 4K TV), it can do it for you. My HTPC happily plays back high-bitrate 4K HEVC content without so much as breaking a sweat.
Sadly, Netflix doesn't see it that way and demands that I upgrade my perfectly functional i7-2500 system to a newer one to play back their 4K content.
FPGAs will never be able to compete with optimized dedicated silicon on size, cost, or performance per watt. You're not going to find an FPGA capable of being programmed to encode video on a mobile device.
On the performance-per-watt scale they sit between dedicated silicon and general-purpose CPUs, right? If so, they could be useful as a stop-gap to keep older devices relevant longer as new standards are released that didn't make it into the silicon yet.
In theory, yes, but in practice there are two points that probably make it impractical.
First, the reprogrammability of FPGAs means lots of unused gates and lower density -- wasted space. With flash technology nowadays it doesn't waste as much power, but the footprint is just so large it's not worth it.
Second, a lot of what makes custom silicon fast can also be found in GPUs: ALU, MAC, FFT, FIR, SIMD, and other DSP slices. Sure, there's a whole additional layer of optimization possible with custom silicon, but the computational powerhouses already exist. It's mostly (not entirely) a matter of reprogramming the memory movements from block to block, delegating certain operations to the CPU, and general optimization. Most new codec algorithms can probably be run pretty well on the GPUs in phones these days.
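For what it's worth, the MAC/FIR pattern mentioned above is simple enough to show in a few lines of C; it's the shape of computation that DSP slices and GPU fused-multiply-add units are built to churn through, which is why so much of a codec maps onto them:

    /* A plain FIR filter: one multiply-accumulate (MAC) per tap. DSP
     * slices and GPU FMA units exist to do many of these per cycle;
     * custom silicon mostly just strips away the instruction-fetch and
     * memory-movement overhead around the same arithmetic. */
    void fir(const float *x, const float *h, float *y, int n, int taps)
    {
        for (int i = 0; i + taps <= n; i++) {
            float acc = 0.0f;
            for (int t = 0; t < taps; t++)
                acc += h[t] * x[i + t];
            y[i] = acc;
        }
    }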
And unfortunately, cell phone companies aren't interested in keeping older HW relevant :( Other industries might, though. My friend said lots of military radar projects he worked on used FPGAs.
The power of FPGAs is their flexibility, not their raw performance.
Dedicated, fixed silicon will outperform FPGAs in speed and energy use practically every time.
For rare or uncommon use cases, having an FPGA you can adapt to your algorithm is fantastic, but for a use case as common and everyday as decoding video, a dedicated chip is a far better fit.
https://aomedia.org