H.266/Versatile Video Coding (VVC) (fraunhofer.de)
463 points by caution on July 6, 2020 | 426 comments


It's interesting that they are able to continue improving video compression. You'd think that it would have all been figured out by now.

Is this continued improvement related to the improvement of technology? Or just coincidental?

Like, why couldn't H.266 have been invented 30 years ago? Is it because the computers back in the day wouldn't have been fast enough to realistically use it?

Do we have algorithms today that can compress way better but would be too slow to encode/decode?


Video compression is a calculus of IO capacity, memory, and algorithmic complexity. Take the MPEG-1 codec, for instance: it was new about 30 years ago. While today most people think of MPEG-1 videos as low quality, the spec provides the ability to handle bit rates up to 100Mb/s and resolutions up to 4095x4095. That was way higher than the hardware of the time supported.

One of MPEG-1's design goals was to get VHS-quality video at a bitrate that could stream over T1/E1 lines or 1x CD-ROMs. The limit on bitrate led to increased algorithmic complexity. It was well into the Pentium/PowerPC era before desktop systems could play back VCD-quality MPEG-1 video in software.

Later MPEG codecs increased their algorithmic complexity to squeeze better quality video into low bit rates. A lot of those features existed on paper 20-30 years ago but weren't practical on hardware of the time, even custom ASICs. Even within a spec features are bound to profiles so a file/stream can be handled by less capable decoders/hardware.

There's plenty of video codecs or settings for them that can choke modern hardware. It also depends on what you mean by "modern hardware". There's codecs/configurations a Threadripper with 64GB of RAM in a mains powered jet engine sounding desktop could handle in software that would kill a Snapdragon with 6GB of RAM in a phone. There's also codecs/configurations the Snapdragon in the phone could play using hardware acceleration that would choke a low powered Celeron or Atom decoding in software.


Are there codecs that require high compute (Threadripper) for encode but can be easily decoded on a Snapdragon?


Yes — many codecs can be optimized for decoding at the expense of encoding. This is appropriate for any sort of broadcast (YouTube, television, etc).

Also, in many applications, it’s suitable to exchange time for memory / compute. You can spend an hour of compute time optimally encoding a 20-minute YouTube video, with no real downside.
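To make that trade concrete, here's a minimal sketch of a deliberately slow, two-pass encode driven from Python (assuming an ffmpeg build with libx264; the filenames and bitrate are placeholders): the encode can take far longer than the clip's runtime, but the resulting file decodes cheaply on just about anything.

    import subprocess

    SRC, OUT, BITRATE = "talk.mp4", "talk_x264.mp4", "3000k"

    # Pass 1: analyze the whole clip and write rate-control statistics; the
    # encoded output itself is thrown away.
    subprocess.run([
        "ffmpeg", "-y", "-i", SRC,
        "-c:v", "libx264", "-preset", "veryslow", "-b:v", BITRATE,
        "-pass", "1", "-an", "-f", "null", "-",
    ], check=True)

    # Pass 2: re-encode using those statistics so the bit budget goes where
    # the content needs it most.
    subprocess.run([
        "ffmpeg", "-y", "-i", SRC,
        "-c:v", "libx264", "-preset", "veryslow", "-b:v", BITRATE,
        "-pass", "2", "-c:a", "copy", OUT,
    ], check=True)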

Neither of these approaches are suitable for things like video conferencing, where there is a small number of receivers for each encoded stream and latency is critical. At 60fps, you have less than 17ms to encode each frame.

Interestingly, for a while, real-time encoders were going in a massively parallel direction, in which an ASIC chopped up a frame and encoded different regions in parallel. This was a useful optimization for a while, but now, common GPUs can handle encoding an entire 1080p frame (and sometimes even 4K) within that 17ms budget. Encoding the whole frame at once is way simpler from an engineering standpoint, and you can get better compression and / or fewer artifacts since the algorithm can take into account all the frame data rather than just chopped up bits.


Surely videoconferencing doesn’t actually use 60 FPS...


Why not? It's not full-motion video, it's literally talking heads. Talking heads are easy to push to 60fps on a relatively low bitrate.


Some web conferencing would want to do 60fps. There's also realtime streaming like Twitch, PS Now, and Google's Stadia.


Twitch isn't real time.


Yes it is. The delay on a Twitch stream doesn't mean they don't have to deal with encoding and transmitting frames at full speed. If Twitch wasn't real time, you'd only be able to watch live streams slowed down!


That's a different definition than most people mean when they say "real-time".


Not me. All realtime systems have some latency, but what makes them realtime is that they must maintain throughput, processing data as quickly as it comes in. You can subdivide to hard-realtime and soft-realtime depending on how strict your latency requirements are, but it is still realtime.


really? I think not. Let's use speech synthesis as an example: I would call speech synthesis real time if it takes less than one second to produce one second of synthesized speech. I think you're probably thinking of the word "live". There's always going to be a small delay when re-encoding. Real time doesn't mean 0ms delay, that's impossible. Twitch has a small delay, but it's still re-encoded in real time (encoding 1 second takes ≤ 1 second for Twitch).


The delay (configurable) isn't due to the encoder. And the encoder has to process everything in real time, otherwise you start skipping frames or fall behind.


Pretty much all of them. Encode complexity for most codecs is way higher than decode complexity (on purpose).

This has been an issue with AV1: it has relatively high decode complexity and there's not a lot of hardware acceleration available. The encode complexity is the real problem though; encoding is very slow even on very powerful hardware, less than 1fps, so ~30 hours to encode a one hour video. Even Intel's highly optimized AV1 encoder can't break 10fps (three hours to encode an hour of video) while their h.265 encoder can hit 300fps on the same hardware.


A lot of video codecs are NP hard to encode optimally, so rely on heuristics. So you could certainly say that some approaches take a lot of compute power to encode, but are much more easily decodable.


The codecs aren't NP hard. Rather, the "perfect" encode is. That's where the heuristics are coming into play. The codec just specifies what the stream can look like, the encoders have to pick out how to write that language and the decoders how to read it.

Decoders are relatively simple book keepers/transformers. Encoders are complex systems with tons of heuristics.

This is also why hardware decoders tend to be in everything and are relatively cheap with equal quality to software counterparts. On the flip side, hardware encoders are almost always worse than their software counterparts when it comes to the quality of the output (while being significantly faster).


> The codecs aren't NP hard. Rather, the "perfect" encode is.

That's what I meant by my first sentence.

And I'll throw out there that the vast majority of 'hardware codecs' are in fact software codecs running on a pretty general purpose DSP. You could absolutely reach the same quality as a high quality encoder given the right organizational impetus of the manufacturer; they simply are focused on reaching a specific real time bitrate for resolution rather than overall quality. By the time they've hit that, there's a new SoC with its own DSPs and its own Jira cards that needs attention. If these cores were more open, I'm sure you'd see less real time focused encoder software targeting them as well.


I wonder why all of the MPEG1 encoders of the day enforced a maximum of 320x240?


While the spec allowed for outrageous settings, playback wouldn't have been possible. Most hardware decoders were meant for (or derived from) VCD playback. The VCD spec covered CIF video, which meant QVGA would fall into the supported macroblock rate for hardware decoders.

In MPEG-1's heyday there wouldn't have been a lot of point in encoding presets producing content common hardware decoders couldn't handle.

There were several other video codecs in the same era that didn't have hardware decode requirements. Cinepak was widely used and could be readily played on a 68030, 486, and even CD-ROM game consoles. As I recall Cinepak encoders had more knobs and dials since the output didn't need to hit a hardware decoder limitation.


PAL/NTSC resolutions.


Here is a hint:

> Because H.266/VVC was developed with ultra-high-resolution video content in mind, the new standard is particularly beneficial when streaming 4K or 8K videos on a flat screen TV.

Compressing video is very different from gzipping a file. It's more about human perception than algorithms, really. The question is "what data can we delete without people noticing?", and it makes sense that answer is different for an 8k video than a 480p video.


So compressing a 1080p video with H266 will not result in similar file size/quality improvements as a 4k video? How much are we looking at for 1080p, 10%?


yup, that matches an example I remember (I read it through a link on HN but cannot find it in a quick search, I wish I could link it):

if you film (a still shot, no movement) a MacBook Pro from top to bottom in h264/MP4 at roughly 1024p resolution, and you also take a photo of it with the same camera,

the results will be shocking:

the 5-10 second video will take less storage than the single image. But when you inspect the video carefully you will see the tiny details are missing: the edges of the metal body are not as sharp, the gloss of the metal is a bit different, the tiny speaker holes above the keyboard are clear in the image and can be individually examined while in the video they are fuzzy and pixelated, and so on.

So, the end result: a 5 second video at tens of frames per second is smaller than a single image taken from the same camera.


You're thinking of "h264 is magic"

https://sidbala.com/h-264-is-magic/

GREAT article


yes, that is the article I was referring to.


Let's also be clear, the still image will be the full resolution of the sensor. The video taken on the same camera is usually a cropped section of the sensor. You're also comparing a spatial compression (still image) vs a temporal compression (video), and at what compression levels are each image taken?


Additionally the images are not very compressed (edit: you mentioned that in your comment, sorry). While the RAW files can be a couple dozen megabytes and the losslessly compressed PNGs are still 5-15 mb, good cameras normally set the JPEG quality factor to a high amount and so even with a JPEG you're getting pretty close to a lossless image. Whereas in video you can often plainly notice the compression artifacts & softness. A more fair comparison would be the video file to a JPEG with equivalent visual quality.


For more details, see https://sidbala.com/h-264-is-magic/

I know that's not a fair comparison, but imagine if clever compression hadn't been invented and you had to download terabytes of data to view a small movie.


What question are you trying to answer? Nobody asked what is compression and why do we use it. I have been encoding videos since VideoCDs were a thing, so I have a pretty good understanding of how compression works. The fact that I differentiated between spatial and temporal compression should have been a clue. All I was pointing out was that compressing a postage stamp sized video and comparing its filesize to a large megapixel image isn't a fair comparison. (yes, I'm jaded by calling 1080p frame size a postage stamp. I work in 4K and 8K resolutions all day.)


>How much are we looking at for 1080p, 10%?

We don't know yet. There are no public technical details (that I know of) for H266 yet, but if I recall H265 made the same 50% reduction in bandwidth claims, and for years people stuck with H264 because it was higher quality, dropping fewer of the subtle parts of the video you really want to see. Only in the last couple of years has H265 really started to become embraced and used by piracy groups. Frankly, I don't know what changed. I wouldn't be surprised if there was some sort of H265 feature addition that improved the codec.


H.265 was always better from a technical perspective but that's not everything which factors into a video codec decision. H.264 was supported everywhere, including hardware support on most platforms. You could generate one file and have it work with great experience everywhere, whereas switching to anything else likely required adding additional tools to your workflow and trying to balance bandwidth savings against both client compatibility and performance — if you guess wrong on the latter case, users notice their fans coming on / battery draining in the best case and frames dropping in the worst cases.

Encoder maturity is also a big factor: when H.265 first came out, people were comparing the very mature tools like x264 to the first software encoders which might have had bugs which could affect quality and were definitely less polished. It especially takes time for people to polish those tools and develop good settings for various content types.


> H.264 was supported everywhere

this. our TV (a few years old, "smart") can play videos from a network drive, but doesn't support H.265. reencoding a season's worth of episodes to H.264 takes a while...


H.264 will become the "mp3" of video I think. Universally supported, and the patents will run out much sooner than the newer formats.


I think you’re right - for a lot of people it was the first to hit the “good enough” threshold: going from MPEG 1/2 to the various Windows Media / Real / QuickTime codecs you saw very noticeable improvements in playback quality with each new release, especially in things like high motion scenes or with sharp borders.

That didn’t stop, of course, but I generally don’t notice the improvements if I’m not looking for them. Someone with a 4K or better home theater will 100% benefit from newer codecs’ many improvements on all those extra pixels but if you’re the other 95% of people watching on a phone or tablet, lower-end TV with the underpowered SoC the manufacturer could get for $15, etc. you probably won’t notice much difference and convenience will win out for years longer.


Reencoding will compound H264 artifacts with H265 artifacts (and the psychovisual optimizations for one with the psychovisual optimizations for the other). Unless you can reencode from the source, don't do that.


i appreciate your concern :) but honestly, is that a practical problem, or a theoretical one? the end result was fine to watch in our case. (and upload itself wasn't of great quality anyway)


It can be a practical problem depending on the source: if you're moving from a relatively higher-resolution / less-compressed video it won't be noticeable but if you're starting from video which has already been compressed fairly aggressively it can be fairly noticeable.

One area where this can be important to remember is when comparing codecs: a fair number of people will make the mistake where they'll take a relatively heavily compressed video, recompress it with something else, and get a size reduction which is a lot more dramatic than what you'd get if you compared both codecs starting from a source video which has most of the original information.


Another thing to consider, exactly the same thing happened when h.264 was first released.

Even though x264 quickly started seeing better results compared to DivX and XVid, you didn't see pirate encodes switch to x264 for years.


Yes and x264 is such an improvement that it was worth it. HW support became nearly universal. DivX and Xvid didn't have that.

So it was kind of a magical codec upgrade and now it seems more incremental to me.


The scene switched to x264 almost immediately. It did not take years.


It was def hardware support that changed the piracy groups' policies.


Having GPU or hardware support to speed up encoding can make a big difference in adoption.


H.265 is patent laden. H.264 is much better in avoiding getting sued.


That's patently wrong[1]

Money quote :

"H.264 is a newer video codec. The standard first came out in 2003, but continues to evolve. An automatically generated patent expiration list is available at H.264 Patent List based on the MPEG-LA patent list. The last expiration is US 7826532 on 29 nov 2027 ( note that 7835443 is divisional, but the automated program missed that). US 7826532 was first filed in 05 sep 2003 and has an impressive 1546 day extension. It will be a while before H.264 is patent free."

(emphasis mine)

[1] https://www.osnews.com/story/24954/us-patent-expiration-for-...


not an expert, but from what I understand it's more that they "extend" the codec with techniques that are more effective on higher resolution content, or with new "profiles" (parameters) that are more effective for higher resolution content (a bit like how you can use different parameters when you zip a file).

These new techniques can also be used for 1080p video (for example), but with lower gains. Also, the "old" algorithms/systems are generally still used, but they may be improved/extended.


>Like, why couldn't H.266 have been invented 30 years ago?

It is all a matter of trade offs and engineering.

For MPEG / H.26x codecs, the committees start the project by asking for or defining the target encoding and decoding complexities. If you only read Reddit or HN, most comments' world view is that video codecs are only for Internet video, completely disregarding other video delivery platforms, which all have their own trade offs and limitations. There is also a cost in decoding silicon die size and power usage. If more video is being consumed on mobile and battery is a limitation, can you expect hardware decoding energy usage to stay within that of the previous codec? Does it scale with adding more transistors, is there an Amdahl's law bottleneck somewhere, etc. It is easy to just say add more transistors, but ultimately there is a cost to hardware vendors.

The vast majority of the Internet seems to think most people working on MPEG video codecs are patent trolls and idiots, and pays little to no respect to the engineering. As a matter of fact a video codec is thousands of small tools within the spec, plus a pretty much insane amount of trial and error. It may not be at 3GPP / 5G levels of complexity, but it is still a lot of work. Getting something to compress better while doing it efficiently is hard. And as Moore's Law is slowing down, no one can continue to just throw transistors at the problem.


I don't know much about H.266, but some of the advances in H.265 depended on players having enough RAM to hold a bunch of previous decoded frames, so they could be referred to by later compressed data. Newer codecs tend to have a lot more options for the encoder to tune, so they need a combination of faster CPUs and smarter heuristics to explore the space of possible encodings quickly.


I wonder if instead of heuristics, machine learning could be used to figure out the best parameters.


In a somewhat-related topic, you might be interested in DLSS [0], where machine learning is being used in graphics rendering in games to draw the games at a lower resolution, then upscale the image to the monitor's resolution using a neural network to fill in the missing data. I imagine a similar thing could be done with video rendering, though you'd need some crazy computing power to train the neural network for each video, just like DLSS requires training for each game.

[0] https://en.wikipedia.org/wiki/Deep_learning_super_sampling


That seems likely at least.

You could actually use ML for all of the video decoding, but that research is still in its early stages. It has been done rather well with still images [1], so I'm sure it'll eventually be done with video too.

Those ML techniques are still a little slow and require large networks (the one in [1] decodes to PNG at 0.7 megapixels/s and its network is 726MB) so more optimizations will be needed before they can see any real-world use.

[1] https://hific.github.io/ HN thread: https://news.ycombinator.com/item?id=23652753


That's already done today. Most modern codecs support variable bitrate encoding so more of the data budget can be given to high complexity scenes. Source video can also be processed multiple times with varied parameters and that output then compared structurally to the source to find the parameters that best encode the scene. This is beyond the more typical multi pass encoding where a first pass over the source just provides some encoding hints for the second pass. It takes several times longer to encode (though is embarrassingly parallel) but the output ends up higher quality for a given data rate than a more naïve approach.
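A rough sketch of that search loop (not any particular vendor's pipeline; it assumes ffmpeg with libx265, and `vmaf_score` is a hypothetical helper standing in for whatever perceptual metric you have available, e.g. VMAF or SSIM):

    import os, subprocess

    def encode_scene(src, crf):
        # Encode one scene/shot at a given CRF (constant quality) setting.
        out = f"{src}.crf{crf}.mp4"
        subprocess.run(["ffmpeg", "-y", "-i", src, "-c:v", "libx265",
                        "-crf", str(crf), "-an", out], check=True)
        return out

    def pick_encode(src, crfs=(20, 23, 26, 29), target=93.0):
        best = None
        for crf in crfs:
            cand = encode_scene(src, crf)
            score = vmaf_score(src, cand)   # hypothetical: higher = closer to the source
            size = os.path.getsize(cand)
            if score >= target and (best is None or size < best[0]):
                best = (size, cand)
        # Fall back to the highest-quality candidate if nothing met the target.
        return best[1] if best else encode_scene(src, min(crfs))

In a real pipeline this runs per scene or shot, which is also why the whole thing parallelizes so easily.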


Any apps you can recommend for giving this a spin?


x264 with 2-pass encoding? "Machine learning" doesn't only mean deep neural networks.


In the limit case, compression and AI are identical.

Once you get to an AI that has full comprehension of what humans perceive to be reality, you can just give them a rough outline of a story, add some information on casting, writers, and Spielberg's mood during production, and they'll fill in the (rather large) blanks.

That's a bit exaggerated, but I remember reading about one such algorithm a few days ago (by Netflix, maybe?). It was image compression that had internal representations such as "there is an oak tree on the left".

It would then run the "decompression", find the differences to the original, and add further hints where neccessary.


Sure. Machine Learning is just "heuristics we don't understand."


Or more typically Monte Carlo heuristics.


Reinforcement learning uses Monte Carlo a lot but traditional machine learning or deep learning don't.


I was hoping H266 was going to be a neural network based approach, but it looks like that might end up being H267.

Right now neural networks allow for higher compression for tailored content, so you need to ship a decoder with the video, or have several categories of decoders. The future is untold and it might end up not being done this way.


I think the number of previous frames for typical settings went from about 4 to about 6 as we went from H.264 to H.265. And the actual max in H.264 was 16. So that doesn't seem like a huge factor.


Computers wouldn't have been fast enough. Moore's law is a hell of a drug.

In the mid '90s, PCs often weren't fast enough to decode DVDs, which were typically 720x480 24FPS MPEG2. DVD drives were often shipped with accelerator cards that decoded MPEG2 in hardware. I had one. My netbook is many orders of magnitude faster than my old Pentium Pro. But it's not fast enough to decode 1080p 30fps H.265 or VP9 in software. It must decode VP9/H.265 on the GPU or not at all. MPEG2 is trivial to decode by comparison. I would expect a typical desktop PC of the mid '90s to take seconds to decode a frame of H.265, if it even had enough RAM to be able to do it at all.

It's an engineering tradeoff between compression efficiency of the codec and the price of the hardware which is required to execute it. If a chip which is capable of decoding the old standard costs $8, and a chip which is capable of decoding the new standard costs $9, sure, the new standard will get lots of adoption. But if a chip which is capable of decoding the new standard costs $90, lots of vendors will balk.


Indeed. The brand-new fancy Blue & White Power Mac G3's from early 1999 were the first Macs that shipped with a DVD drive, and they could play video DVD's but they had an obvious (and strange) additional decoder mezzanine card on the already unusual Rage128 PCI video card.

By the end of that year the G4 Power Macs were just barely fast enough to play DVD's with software decoding and assistance from the PCI or later AGP video card. And after a while (perhaps ~ 2002?), even the Blue G3's could do it in software even if you got a different video card, as long as you also upgraded to a G4 CPU (they were all in ZIF sockets).

It was very taxing on computers at y2k!

Later autumn 2000 G3 iMacs could also play DVD's but I think they needed special help from a video co-processor.


From what I've heard (would love to hear more expertise on this), it's incredibly hard to invent a new video compression algorithm without breaking an existing set of patents, and there's also no easy way to even know whether you're breaking anything as you develop the algo. Thus the situation we're in is not that it's too hard to develop better codecs, but that you've very disincentivized to do so.


Which then begs the question - why are video compression standards developed in the US at all? MPEG is obviously US based but Xiph is also a US nonprofit. The software patents should be hugely crippling the ability for Americans to develop competitive video codecs when every other nation doesn't have such nonsense. Why hasn't Europe invested in and developed better codecs that combine the techniques impossible to mix in the states?

Is it just basically the same mechanism that leads to so much drug development happening in the US despite how backwards its medical system is, because those regressive institutions create profit incentives not available elsewhere (to develop drugs or video codecs for profit) and thus the US already has capitalists throwing money at what could be profitable whereas everyone else would look at it as an investment cost for basically research infrastructure.


MPEG is not US-based.

https://en.wikipedia.org/wiki/Moving_Picture_Experts_Group

The article we are all commenting on is by a German research organization that has been a major contributor to video coding standards.

Perhaps you're confused by the patent issue? European companies are happy to file for US patents and collect the money.


This sounds like a somewhat obvious way to side-step the patent mechanism, so I would assume patents prevent this kind of a thing, when you develop patent-breaking technology abroad and then "just use" it wherever you want. You're probably not allowed to use the patented technology in any of the products you're building.


It's about the assumptions made during the standardization.

Compared to 30 years ago, we now have better knowledge and statistics about what low level primitives are useful in a codec.

E.g. jpeg operates on fixed 8×8 blocks independently, which makes it less efficient for very large images than a codec with variable block size. But variable block size adds overhead for very small images.

Another reason can be common hardware. As hardware evolves, different hardware accelerated encoding/decoding techniques become feasible, and these get folded into the new standards.


Something that I learned about 10 years ago when bandwidth was still expensive is that you can make a very large version of an image, set the jpeg compression to a ridiculously high value, then scale the image down in the browser. The artifacts aren't as noticeable when the image is scaled down, and the file size is actually smaller than what it would be if you encoded a smaller image with less compression.
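If you want to test that yourself, here is a small sketch using Pillow (assuming "photo.png" is any reasonably large source image; whether the oversized low-quality JPEG actually wins depends on the content):

    from io import BytesIO
    from PIL import Image

    src = Image.open("photo.png").convert("RGB")
    w, h = src.size

    def jpeg_size(img, quality):
        # Encode to JPEG in memory and return the size in bytes.
        buf = BytesIO()
        img.save(buf, format="JPEG", quality=quality)
        return buf.tell()

    small_hq = jpeg_size(src.resize((w // 2, h // 2)), quality=85)  # the "normal" approach
    large_lq = jpeg_size(src, quality=30)                           # huge JPEG, heavy compression
    print(f"half-size q85: {small_hq} B, full-size q30: {large_lq} B")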


This “huge JPEG at low quality” technique has been widely known for years. But it is typically avoided by larger sites and CDNs, as it requires a lot more memory and processing on the client.

Depending on the client or the number of images on the site the huge JPEG could be a crippling performance issue, or even a “site doesn’t work at all” issue.


Interesting. I've never heard of this, but it makes some sense: the point of lossy image compression is to provide a better quality/size ratio than downscaling.


> But variable block size adds overhead for very small images.

What kind of overhead?

The extra code shouldn't make a big difference if nothing is running it.

And space-wise, it should only cost a few bits to say that the entire image is using the smallest block size.

Is the worry just about complicating hardware decoders?


I still remember when my pc would take 24 hours to compress a dvd to a high quality h264 mkv. Sure, you could squeeze it down with fast presets in handbrake, but the point was transparency. Now I'm sure for most normal pc's the time to compress at the same quality with h.265 is the same 24 hours, and in 4k even longer. I'm sure h266 would take more than twice as long easily.

Early pc's had separate and very expensive mpeg decode boards just to decode dvd (creative sold a set); the cpu simply couldn't even handle mpeg 2. I know it's hard to believe, but there was a time when playing back an mp3 was a big ask. All these algorithms could have been made long ago, but they would have been impractical fantasy. Only now are we seeing real (partial, cheated-resolution) ray tracing in modern high end gaming hardware, which is a good comparison: ray tracing has been with us for a long time, and only hardware advancement over decades has made it viable.

It amused me that they claimed 4k uhd h265 is now 10GB for a movie. That's a garbage bitrate, they always ask too much of these codecs.


> I know it's hard to believe but there was a time when playing back an mp3 was a big ask

can confirm. audio playback would stutter on my 486dx if one dared to multitask.


Good compression is quite complex and can go wrong in an unimaginable variety of ways. Remember when Xerox copiers would copy numbers incorrectly due to compression? The numbers would look clear, they just wouldn't always be the same numbers that you started with.

https://www.theregister.com/2013/08/06/xerox_copier_flaw_mea...


The Xerox problem stemmed from simple replacement of "recognized" numbers with entries from a learned dictionary. A good implementation would use the learned symbol atlas as a supplement, encoding the difference between the guess and the source image. That way even a predicted 0 instead of an 8 wouldn't be catastrophic, with the encoder filling in the missing detail.


> Like, why couldn't H.266 have been invented 30 years ago? Is it because the computers back in the day wouldn't have been fast enough to realistically use it?

Here's something to consider:

In 1995, a typical video stream was 640 x 480 x 24fps. That's 7,372,800 pixels per second.

In 2020, we have some content that's 7680 x 4320 x 120fps. That's 3,981,312,000 pixels per second, or a 540 fold increase in 25 years.
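(Just re-deriving those figures as a quick sanity check:)

    sd_1995 = 640 * 480 * 24        # 7,372,800 pixels per second
    uhd_2020 = 7680 * 4320 * 120    # 3,981,312,000 pixels per second
    print(uhd_2020 / sd_1995)       # -> 540.0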

The massive increase in image size actually makes it easier to use high compression ratios. I found this out the hard way recently, when I was trying to compress and email a video of a powerpoint presentation that a coworker had given. In a nutshell, the powerpoint deck, with its sharp edges and its low resolution, was difficult to compress.

Increased framerate plays a factor too; thanks to decades of research on motion interpolation, algorithms have become quite good at guessing what content can be eliminated from a moving stream.


Compression is AI. It’s never going to be “all” figured out.


Another way of saying it is that compression is understanding.


Lossy compression is, I feel compelled to add.


Actually both! Arithmetic coding works over any kind of predictor.


End credits are just text. So it should be possible to put them through OCR and save only the text, positions, and fonts. And the text itself could be further compressed with a dictionary.


Credits also contain logos/symbols (near the end), and often have stylistic flairs as well. Video compression is based on making predictions and then adding information (per Shannon's definition) for the deltas from those predictions. The pattern of credits statically sliding at a consistent rate is exactly the sort of prediction codecs are optimized for; for instance, the same algorithms will save space by predicting repeated pixel patterns during a slow camera pan.

Still, I've often thought it would be nice if text were a more first-class citizen within video codecs. I think it's more a toolchain/workflow problem than a shortcoming in video compression technology as such. Whoever is mastering a Blu-Ray or prepping a Hollywood film for Netflix is usually not the same person cutting and assembling the original content. For innumerable reasons (access to raw sources, low return on time spent, chicken-egg playback compatibility), it just doesn't make sense to (for instance) extract the burned-in stylized subtitles and bake them into the codec as text+font data, as opposed to just merging them into the film as pixels and calling it a day.

Fun fact: nearly every Pixar Blu-Ray is split into multiple forking playback paths for different languages, such that if you watch it in French, any scenes with diegetic text (newspapers, signs on buildings) are re-rendered in French. Obviously that's hugely inefficient; yet at 50GB, there's storage to spare, so why not? The end result is a nice touch and a seamless experience.


Text with video is difficult to do correctly for a few different reasons. Just rendering text well is a complicated task that's often done poorly. Allowing arbitrary text styling leads to more complexity. However for the sake of accessibility (and/or regulations) you need some level of styling ability.

This is all besides complexity like video/audio content synced text or handling multiple simultaneous speakers. Even that is besides workflow/tooling issues that you mentioned.

The MPEG-4 spec kind of punted on text and supports fairly basic timed text subtitles. Text essentially has a timestamp where it appears and a duration. There's minimal ability to style the text and there are limits on the availability of fonts, though it does allow for Unicode so most languages are covered. It's possible to do tricks where you style words at timestamps to give a karaoke effect or identify speakers, but that's all on the creation side and is very tricky.

The Matroska spec has a lot more robust support for text but it's more of just preserving the original subtitle/text encoding in the file and letting the player software figure out what to do with that particular format and then displaying it as an overlay on the video.

It's unfortunate text doesn't get more first class love from multimedia specs. There's a lot that could be done, titles and credits as you mention, but also better integration of descriptive or reference text or hyperlink-able anchors.


MPEG 4 (taken as the whole body of standards, not as two particular video codecs) actually has provisions for text content, vector video layers and even rudimentary 3D objects. On the other hand I'm almost sure that there are no practical implementations of any of that.


Oh, and that's only the beginning. The MPEG-4 standard also includes some pretty wacky kitchen-sink features like animated human faces and bodies (defined in MPEG-4 part 2 as "FBA objects"), and an XML format for representing musical notation (MPEG-4 part 23, SMR).


Don't forget Java bytecode tracks!


Scene releases often had optimized compression settings for credits (low keyframes, b&w, aggressive motion compensation, etc.)


The text, positions and fonts could very well take up more space than the compressed video. And then with fonts, you have licensing issues as well.


Recognizing text and using it to increase compression ratios is possible. I believe that's what this 1974 paper is about:

https://www.semanticscholar.org/paper/A-Means-for-Achieving-...


True, but end credits take very little space compared to the rest of the movie.


x264 is kinda absurdly good at compressing screencasts, even a nearly lossless 1440p screencast will only have about 1 Mbit/s on average. The only artifacts I can see are due to 4:2:0 chroma subsampling (i.e. color bleed on single-pixel borders and such), but that has nothing to do with the encoder, and would almost certainly not happen in 4:4:4, which is supported by essentially nothing as far as distribution goes.
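For anyone who wants to try it, a hedged example of the settings involved (assumes ffmpeg with libx264; the filenames are placeholders, and note that the 4:4:4 output from the second command won't play on most hardware decoders, which is exactly the distribution problem mentioned above):

    import subprocess

    # Near-lossless screencast encode with the usual 4:2:0 subsampling (plays everywhere).
    subprocess.run(["ffmpeg", "-y", "-i", "screencast.mkv", "-c:v", "libx264",
                    "-preset", "slower", "-crf", "18", "-pix_fmt", "yuv420p",
                    "out_420.mkv"], check=True)

    # Same encode in 4:4:4 to avoid chroma bleed on single-pixel UI edges (limited playback support).
    subprocess.run(["ffmpeg", "-y", "-i", "screencast.mkv", "-c:v", "libx264",
                    "-preset", "slower", "-crf", "18", "-pix_fmt", "yuv444p",
                    "out_444.mkv"], check=True)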


Why not use deep learning to recognize actor face patterns in scenes and build entire movies from AI models?


I'm not super strong on theory, but if I'm not mistaken, doesn't Kolmogorov complexity (https://en.wikipedia.org/wiki/Kolmogorov_complexity) say we can't even know if it is all figured out?

The way I understand it is that one way to compress a document would be to store a computer program and, at the decompression stage, interpret the program so that running it outputs the original data.

So suppose you have a program of some size that produces the correct output, and you want to know if a smaller-sized program can also. You examine one of the possible smaller-sized programs, and you observe that it is running a long time. Is it going to halt, or is it going to produce the desired output? To answer that (generally), you have to solve the halting problem.

(This applies to lossless compression, but maybe the idea could be extended to lossy as well.)
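(For reference, the quantity being discussed has a standard compact definition relative to a fixed universal machine U, and the relevant theorem is that K is not computable, which is exactly the halting-problem obstruction described above:)

    K_U(x) = \min \{\, |p| \;:\; U(p) = x \,\}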


I really ain't a theorist either, but:

If you are looking at Kolmogorov complexity you are right, we can't ever know. But Kolmogorov complexity is about single points in the space of possible outputs. It basically says "there might be possible outputs that do look random, but are actually produced by a very short encoding". One example would be the digits of pi.

But if you look at the overall statistics of possible output streams, and at their averages, there is a lower bound for compression on average. As soon as the bitlength of the compressed stream matches the entropy of the uncompressed stream in bits, you've reached maximum compression. There will be some individual streams that don't conform to those statistics, but the averages will.
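(In symbols, this is the usual source-coding bound: no uniquely decodable lossless code can have an expected length below the source entropy,)

    \mathbb{E}[\ell(X)] \;\ge\; H(X) \;=\; -\sum_{x} p(x)\,\log_2 p(x) \quad \text{bits per symbol.}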

However, we are somewhat far away from matched entropy equilibrium for video compression. And even then, improvements can be made, not in compression ratio but in time, ops and energy needed for de/encoding.


> It's interesting that they are able to continue improving video compression. You'd think that it would have all been figured out by now.

Would you? AV1 was only officially released 2 years ago, h.265 7, h.264 14, …


Ten-year software video compression engineer here:

TL;DR: it's partly because we're using higher video resolutions. A non-negligible part of the improvement stems from adapting existing algorithms to the now-doubled-resolution.

Almost all video compression standards split the input frame into fixed-size square blocks, aka "macroblocks". To put it simply, the macroblock is the coarsest granularity level at which compression happens.

- H.264 and MPEG-2 Video use 16x16 macroblocks (ignoring MBAFF).

- H.265 uses configurable quad-tree-like macroblocks, with a frame-level configurable size up to 64x64.

- AV1 makes this block-size configurable up to 128x128.

Which means:

Compressing to H.264 an SD video (720x576, used by DVDs) results in 1620 macroblocks/frame.

Compressing to H.265 an HD video (1920x1080) results in at least 506 macroblocks/frame.

Compressing to AV1 a 4K video (3840x2160) results in at least 506 macroblocks/frame.

But compressing to H.264 a 4K video (3840x2160) will result in 32400 macroblocks/frame.

The problem is, there are constant bitcosts per-macroblock ((mostly) regardless of the input picture). So using H.264 to compress 4K video will be inefficient.
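Reproducing those block counts (a quick check; the "at least 506" figures above are the raw area ratios, e.g. 3840·2160/128² ≈ 506, while rounding each dimension up to whole blocks gives 510 — same point either way):

    from math import ceil

    def blocks_per_frame(width, height, block):
        # Count of block x block macroblocks covering a width x height frame.
        return ceil(width / block) * ceil(height / block)

    print(blocks_per_frame(720, 576, 16))     # H.264, SD  -> 1620
    print(blocks_per_frame(1920, 1080, 64))   # H.265, HD  -> 510
    print(blocks_per_frame(3840, 2160, 128))  # AV1, 4K    -> 510
    print(blocks_per_frame(3840, 2160, 16))   # H.264, 4K  -> 32400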

When you take an old compression standard to encode recent-resolution content, you're using the compression standard outside of the resolution domain for which it was optimized.

> Is this continued improvement related to the improvement of technology? Or just coincidental?

Of course, there are also "real" improvements (in the sense of "qualitative improvements that would have benefited the compression of old video resolutions, if only we had invented them sooner").

For example:

- the context-adaptive arithmetic coding from H.264, which is a net improvement over classic variable-length huffman coding used by MPEG-2 (and H.264 baseline profile).

- the entropy coding used by AV1, which is a net improvement over H.264's CABAC.

- integer DCT (introduced by H.264), which allows bit-accuracy checking and much easier and smaller hardware implementations (compared to the floating point DCT used by MPEG-2).

- loop filters: H.264 pioneered the idea of a normative post-processing step, whose output could be used to predict next frames. H.264 had 1 loop filter ("deblocking"). HEVC had 2 loop filters: "deblocking" and "SAO". AV1 has 4 loop filters.

All of these are real improvements, brought to us by time and by extremely clever and dedicated people. However, the compression gains of these improvements are nowhere near the "50% less bitrate" that is used to sell each new advanced-high-efficiency-versatile-nextgen video codec. Without increasing - a lot - the frame resolution, selling a new video compression standard would be a lot harder.

Besides, now that resolutions seem to have settled around 4K/8K (and that "high definition" has become the lowest resolution we might have to deal with :D), things are going to get interesting ... provided that we don't start playing the same game with framerates!


I hope the next target is a VR-optimized codec.


H.266 VVC includes tools specifically for VR use cases like doing a motion vector wrap around at the boundaries of 360 equirectangular video or better support for independently coded tiles (subpictures in VVC lingo) which are used in viewport-dependent streaming of 360 content.


"You'd think that it would have all been figured out by now."

Would you? Video compression is one of the few things that we will work on for the next 1000 years and still be nowhere near finished. The best video compression would be to know the state of the universe at the big bang, have a timestamp of the beginning and end of your clip and spatial coordinates defining your viewport. Then some futuristic quantum computer would just simulate the content of your clip...

So yeah, sure we are done with video compression :). This is of course an extreme example of constant time compression that may or may not be ever feasible (if we live in a computer simulation of an alien race, then it is already happening).

But the gist is the same. Video compression is mostly about inferring the world and computing movement, not about storing the content of the image.

For instance by taking a snapshot of the world, decomposing it into geometric shapes (pretty much the opposite of 3D rendering) and then computing the next frames by morphing these shapes + some diff data that snaps these approximations back in line with the actual data.

We are all but in the very infancy of video compression. What should surprise you is why it takes us so long to get anywhere.


The way I read the release was that it's not a lossless compression, it reads like it's downscaling 4k+ video to a lower format with 'no perceptible loss of quality.' Since this is also seemingly targeted at mobile, I'm guessing the lack of perceptible loss of quality is a direct function of screen size and pixel density on a smaller mobile devices.

For me, this is another pointless advance in video technology. 720p or 1080p is fantastic video resolution, especially on a mobile phone. Less than 1% of the population cares or wants higher resolution.

What new technologies are doing now is re-setting the patent expiration clock. As long as new video tech comes out every 5-10 years, HW manufacturers get to sell new chips, phone manufacturers get to sell new phones, TV manufacturers get to sell new TVs, rinse, repeat.


> 720p or 1080p is fantastic video resolution, especially on a mobile phone.

720p is far from fantastic. It's noticeably blurry, even on mobile.

1080p is minimally acceptable, and is now over 10 years old.

> Less than 1% of the population cares or wants higher resolution.

That's a very bold claim. Have any studies or polls to back that up?


> 720p is far from fantastic. It's noticeably blurry, even on mobile. 1080p is minimally acceptable, and is now over 10 years old.

Not OP. I would say these are far-fetched claims that need defending. Most blurriness of mobile video comes from the low bitrate it's encoded at. Basically nobody is watching Bluray quality 720p or 1080p materials on a phone - and that's the problem, not the resolution.

My guess is that a typical middle or even middle-upper class family is going to have a TV that is less than 70 inches, and is 10+ feet from most viewers. Even with 20-20 vision, the full quality provided by 1080p is not even visible at that distance! (You'd need to go all the way up to 78 inches at 10 feet, or sit 7 feet from your 55 inch set to even get the full benefit from a 1080p set.) See this very helpful chart: http://s3.carltonbale.com/resolution_chart.html

Most of the benefit in 4k video comes from recent advances in HDR presentation and better codecs, not from the resolution. Sure, if you're a real stickler for quality, you might be sitting 6 feet from your 80 inch OLED set, and 4k is definitely for you in that case, but it's really not that important to the average person. In my case, I can barely distinguish between 720p and 1080p on my set even with glasses on.

Now granted, it's great to have laptops and tablets at higher resolutions, because your face is smashed up against them and you're often trying to read fine text. But that's not the video case that's being talked about here.


I don't need to get the full benefit to care about the difference. And in my experience there's a lot of screens closer than 10 feet to couches.

On mobile, 720p starts to get shoddy once your screen hits 5 inches across.

So while 4k is situational, and encoding quality is more important than the extra resolution most of the time, 1080 vs. 720 is pretty clear-cut; 1080 should be considered the minimum for most content.


Note that I was talking about the full benefit of 1080p, not 4k. My points are that 4k is usually pointless, and therefore that 1080p is usually better than just "minimally acceptable", since most of the time we don't even get the full benefit of it.

> On mobile, 720p starts to get shoddy once your screen hits 5 inches across.

Even assuming you're right about this, I guess I really just have a hard time caring. Anything you watch on your phone is at best something you don't give a shit about, artistically speaking, and it's hard to imagine 1080p vs 720p making any kind of difference to the experience. (I suppose I might be biased since my screen is "only" 5.2 inches diagonally - I get the smallest one I can whenever I buy a new phone.)

And for what it's worth, using the same math as for my previous comment, you can't even get the full benefit of 720p on a 5 inch diag screen unless you're holding it less than a foot from your face. Granted, you can get the full benefit of 1080p at a little under 8 inches, but I'm suffering even imagining trying to watch a video this way. Even at this distance, I would dispute using "shoddy" to describe how 720p will look.

The math is actually pretty simple: for a 720p screen, there are sqrt(1280² + 720²) pixels on the 5 inch diagonal, so a distance of 5 inches / sqrt(1280² + 720²) per pixel. 20/20 visual acuity can resolve roughly 1 arc minute, or pi/10800 radians. By the arc length formula, the distance we calculated subtends that angle at (5 inches / sqrt(1280² + 720²)) * (10800/pi), or 11.7 inches.
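The same arithmetic, generalized (a sketch using the 1-arcminute / 20-20 baseline; real visual acuity is often better than that, as noted elsewhere in the thread):

    from math import sqrt, pi

    def full_benefit_distance_in(diag_in, px_w, px_h, arcmin=1.0):
        """Farthest viewing distance at which one pixel still subtends `arcmin` arcminutes."""
        pixel_pitch = diag_in / sqrt(px_w ** 2 + px_h ** 2)   # inches per pixel
        return pixel_pitch / (arcmin * pi / 10800)            # small-angle approximation

    print(full_benefit_distance_in(5, 1280, 720))         # ~11.7 in
    print(full_benefit_distance_in(5, 1920, 1080))        # ~7.8 in
    print(full_benefit_distance_in(55, 1920, 1080) / 12)  # ~7 ft for a 55" 1080p TV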

> And in my experience there's a lot of screens closer than 10 feet to couches.

Note that I addressed this point. If you have a pretty typical 50 inch TV, you've got to have it closer than 7 feet from your couch for 4k to make any difference at all.


> Note that I was talking about the full benefit of 1080p

Yes, so am I.

> therefore that 1080p is usually better than just "minimally acceptable", since most of the time we don't even get the full benefit of it.

If most of your users would be limited by 720p, and 1080p is standard and easy to do, then I'm comfortable calling 1080p the minimum.

> I'm suffering even imagining trying to watch a video this way.

The official "Retina" numbers have a phone 10 inches from your face. And that's about where I hold it when I have my glasses on.

> 20/20 visual acuity can resolve roughly 1 arc minute, or pi/10800 radians.

That's a good baseline number, but a lot of people can beat it by a significant fraction.


Average visual acuity for young people is about 0.7 arcminutes, best being around 0.5.

There seems to be a disconnect about how people are consuming media. For me, holding a 2280x1080 6.3" phone ~8" from my eyeballs is a natural viewing distance, and I can see the full resolution without difficulty. And at least from my point of view a 65" TV is also a pretty typical size.


> That's a very bold claim. Have any studies or polls to back that up?

From 1:

> In Japan, only 14 percent of households will own a 4K TV in 2019 because most households already have relatively new TVs, IHS said.

Let's dissect the reasoning. Most households already have a relatively new TV, thus a low adoption rate. Implying that needing a new TV generally, not the desire to upgrade resolution, is the primary motivating factor in purchasing a TV. In fact, most 4K TVs are already as cheap as the rest of the market.

I truly believe that almost everyone does not care about 4K whatsoever. In fact, even if they do 'care' it's not because they know what they're talking about. Most of the enhancements that 4K TVs bring are an artifact of better display technology, rather than increased resolution. See 2.

Streaming 4K+ video is a waste of resources with no tangible benefit to anyone other than marketing purposes. Netflix streams '4K' because everyone has a '4K' TV now and they demand it.

[1]: https://www.twice.com/research/us-4k-tv-penetration-hit-35-2...

[2]: https://www.cnet.com/news/why-ultra-hd-4k-tvs-are-still-stup...


I agree with most of what you're saying, but this is actually wrong.

> Streaming 4K+ video is a waste of resources with no tangible benefit to anyone other than marketing purposes. Netflix streams '4K' because everyone has a '4K' TV now and they demand it.

It's not inherently wrong, just practically so. The difference is that Netflix (and often other streaming services) max out at a much higher bitrate for their 4k streaming, and are using a better codec (H.265) as well. By comparison, Netflix's bitrate for 1080p is severely limited and so if you compare the two, even watching at the 1080p level of detail, streaming in 4k will often be a vastly superior experience.

So it's not inherent (not a result of the resolution), but still, streaming 4k is not pointless at present.


Your first link includes an extra detail which is important:

> In addition, “with the Japanese consumer preference for smaller TV screens, it will be more difficult for 4K TV to expand its household penetration in the country"

With a smaller TV screen, yeah, you don't need 4K.

But in the USA, larger screens are desirable. And that's seen in the expected 34% 4K adoption rate in the USA your article describes.

I still use a 1080p TV, but it's also only 46" and is 10 years old. I'll probably be buying a 4K 70-75" OLED later this year.

My computer monitor is 1440p. I could have bought 4K when I upgraded, but I'm primarily a gamer and I wanted 144 hz, and 4K 144 hz monitors didn't exist yet.


Can we all just agree on using AV1 instead of another patent encumbered format?


No, because the market is more than happy to pay a few cents or dollars per device to get better compression and lower transmission bandwidth. This observation has held true consistently in the 3 decades since compressed digital media was invented.


> No, because the market is more than happy to pay [...]

Is it? Because Google/YouTube, Amazon/Twitch, Netflix, Microsoft, Apple, Samsung, Facebook, Intel, AMD, ARM, Nvidia, Cisco, etc, are all part of AO Media:

* https://aomedia.org/membership/members/

The main major tech player I don't see is Qualcomm.


The use cases for video are significantly broader than a few tech companies, e.g. broadcast.

And most of those companies are also part of MPEG as well.


What would be a list of non-tech companies prevalent in the broadcast space?

They're part of MPEG because of legacy reasons in having to deal with H.264.


The market is rather unhappy. E.g. Win10 doesn't ship an H265 codec because it's too expensive.


> Win10 doesn't ship an H265 codec

Very few Win10 users would want a CPU-targeted HEVC codec.

Intel, nVidia and AMD have that codec in their hardware. They are probably paying for a license to use these patents, they ship Win10 drivers for their hardware, and Microsoft publishes those drivers on Windows Update.


At 99 cents for the add on, the decision to charge users smells more like a political decision than an economic one. The cost to Microsoft is undoubtedly far less than that. 99 cents is basically the bare minimum you can charge when you accept credit cards as a method of payment.


https://en.wikipedia.org/wiki/High_Efficiency_Video_Coding#P...

$0.20 for MPEG-LA, $0.40 for HEVC Advance, and "call us" for Technicolor and Velos Media. $0.99 doesn't sound far off.


That's 1% of the cost of windows, for a video codec when there's already dozens of free alternatives


There’s a fixed cap on the royalty rate, so even if we assume Microsoft paid for a license for each copy of Windows sold to customers, on a per-copy basis it would be much less than 1%.


The constantly growing list of H265 patent pool organizations with various licensing plans made even Apple join the AV1 bandwagon.


Apple is also on the list of organizations behind H.266 so it’s difficult to conclude anything beyond wanting to bet on all the horses.


that's not why Apple joined AV1. They joined AV1 because they were forced to. Netflix, Google, Amazon are all on the AV1 bandwagon. If Youtube, Netflix and Prime Video all use AV1 for future higher quality streams, Apple cannot avoid supporting it.


It works both ways: if popular devices don’t support it in decode, encoding your library in a format that is slow(er) to decode and/or more taxing on battery life isn’t a clear win.


Youtube has been vp9 only above 1080p for a long time, but Apple is adding support for VP9 only in the upcoming OS releases.


Why are we assuming H.266 is better than AV1? Or at least better enough to warrant all the trouble and cost of licensing?


I am not a fan of patent encumbered technologies, but here the assumption of being better has merit. There isn't much sense from a business perspective for a research team to publish a commercial solution that would exhibit inferior or even comparable performance to an existing free solution. As for the problem of convincing people to swallow the cost, just organize a show-off campaign and leave the rest to the sellers. That's how (at least) Apple does it.


I dunno, there are a lot of reasons companies choose technologies. If Fraunhofer gets this thing into hardware (by whatever means it takes), that could be the end of the debate. There's also probably a bit of "no one ever got fired for choosing Fraunhofer" going on as well.


I wouldn't evaluate the pros and cons of a technology based on decisions made by business executives. I would look at objective third-party comparisons of the encoding and decoding before deciding which is better: H.266 or AV1.


Now audio uses open codecs for the most part, and new video will too.


That is because Audio Encoding hasn't seen much improvement as compared to video.

Nearly 30 years after MP3, the only audio codec that could rival MP3 at the standard rate of 128kbps at a significantly lower bitrate was Opus at 96kbps.

And MP3 is still by far the most popular codec due to compatibility reasons.

This is similar to JPEG, although things are about to change.


> Nearly 30 years after MP3, the only audio codec that could rivals mp3 at the standard rate of 128kbps at a significant lower bitrate was Opus at 96Kbps.

AAC and Vorbis were doing this for years before Opus was on the scene. Opus is a further improvement on audio codecs, but not an unprecedented one.


Opus and vorbis have both improved on mp3, flac has improved in the lossless space, and there are other codecs that do better at very low bitrates (think 20-30 kbps).


I don't have the Netflix / Disney+ / etc containers to analyze but Youtube has pretty much totally purged mp3 from every video on the site. Billions of watch hours a day of Opus audio there at least.


Nah, Dolby excels at injecting itself where not needed. Blurays support 8-channel LPCM (uncompressed) audio, yet Dolby managed to push its proprietary junk codecs in there.


It's not that the market is happy to pay more, it's that there is essentially no choice.


A captive market is a happy market, no?


Not for consumers, obviously?


This is not a captive market. If someone were to invent a free codec that performed similarly and had both software and hardware reference implementations, the market would adopt it very quickly.

This is a market that is voluntarily paying for perceived value.


Please show me where I can pick between similar consumer devices whose supported codecs are easy to find, or, better yet, where I can pay extra for non-free codecs.


The few HNers who actually care about these things do not make a market that vendors think is worth serving, most likely because it would be unprofitable. There’s not going to be a market if sellers don’t find it profitable.


That's exactly my point: those options don't exist, so there is no hard data on what people prefer. It's silly to say the market decided when vendors have only speculated about the most profitable path; that doesn't make it the only profitable one.


There have been many attempts to make a market for free hardware, particularly in the mobile phone market. All have failed thus far.


Libre hardware is not the same as a device supporting only royalty free or otherwise libre codecs.


You’re splitting hairs. Customers, by and large, just do not care.


The market has plenty of choices from VP8 to Theora.


Please show me where I can purchase a Roku-style device where I can easily choose between codec support options or pay extra for non-free codecs.


That's the direction everyone is going, so I think h.266 is an attempt to give vendors pause before moving to av1.

AV1 will probably win in most circumstances (big tech) but is unlikely to win where there are big gains to be had by reducing file size (broadcasters with gigantic libraries).

Broadcasters are also used to paying a lot and not getting much.


Honestly, the first step in this is getting the ffmpeg av1 library to a good usable place. It's currently so slow as to be near unviable. I'd happily switch when it becomes a usable option.


Is there any indication that rav1e is so inefficient as to have a substantial double-digit percentage speedup left to be realized? Because even if you cut encode times in half, they already take hours per minute of video on a quad core.

AV1 is unlikely to ever be practical for "muggle" encode use, at least in this decade. It will only be worth committing that much compute workload to making a smaller file if the recipients will number in at least what, millions?

I'd be really curious what a hardware realtime AV1 encoder would even look like. How much silicon would that take? That kind of chip would have to be colossal even if it sacrifices huge amounts of efficiency to spit out frames in reasonable time (in the same way hardware hevc and vp9 encoders kind of suck).


The HEVC encoder on 20-series Nvidia cards is actually really good


Ice Lake's also great


There is no ffmpeg av1 library, i.e. no native decoder or encoder. ffmpeg has wrappers for libaom and dav1d/rav1e. Third-party scripts also add wrappers for svt-av1.
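
For reference, a minimal sketch of driving the libaom-av1 wrapper from Python (assumptions: an ffmpeg build linked against libaom, placeholder filenames). The main speed/quality knob is -cpu-used (0 = slowest/best, 8 = fastest); older builds may additionally need "-strict experimental":

    import subprocess

    subprocess.run([
        "ffmpeg", "-i", "input.mp4",
        "-c:v", "libaom-av1",
        "-crf", "30", "-b:v", "0",   # constant-quality mode
        "-cpu-used", "4",            # raise toward 8 for tolerable encode times
        "-row-mt", "1",              # enable row-based multithreading
        "av1_out.mkv",
    ], check=True)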


Isn't that largely dependent on hardware acceleration from CPU manufacturers? Or is ffmpeg always software encoding?


FFmpeg isn’t “always software encoding”, that statement doesn’t make much sense since FFmpeg/libavcodec is more of an interface and you can add support for any external encoder/decoder, hardware accelerated or not. However, FFmpeg’s builtin encoders and the most popular external encoders including x264 and x265 are all software encoders. There are hardware accelerated encoders from GPU vendors, e.g. the nvenc encoder for H.264 and H.265 is available for use on Nvidia GPUs if your FFmpeg is compiled against CUDA SDK. It’s a lot faster than x264 and x265 on comparable settings but results are usually a bit worse.
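
To make that concrete, here is a minimal, non-authoritative sketch of kicking off a software x265 encode versus an nvenc hardware encode through ffmpeg from Python. It assumes an ffmpeg build with both libx265 and hevc_nvenc enabled; filenames and settings are placeholders:

    import subprocess

    src = "input.mp4"  # placeholder

    # Software HEVC encode: slow, but usually the best quality per bit.
    subprocess.run(["ffmpeg", "-i", src, "-c:v", "libx265",
                    "-preset", "medium", "-crf", "24", "sw_out.mp4"], check=True)

    # Hardware HEVC encode on an Nvidia GPU: much faster, typically a bit
    # worse at comparable settings.
    subprocess.run(["ffmpeg", "-i", src, "-c:v", "hevc_nvenc",
                    "-b:v", "4M", "hw_out.mp4"], check=True)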

You were probably thinking about hardware decoding though.


I'm curious why hardware encoding is generally worse? All my experiments with it (h264/h265) have led to significantly lower quality output, to the point that I've avoided it for any final outputs, but I always assumed I was doing something wrong.


Hardware encoding cares about consistent real-time output above all else, with power usage as the runner-up. It can focus on that because software encoders are just flat-out better for use cases involving quality - they have access to more flexible/accurate math, they don't necessarily have a real-time/low-latency constraint, they can be updated when new encoding tricks are discovered, they can be massively more complex without ballooning the cost of a device, etc.

They're complementary options rather than competing. Each does something well, and it sounds like you want the thing that software encoding does.


I’ve written various applications on top of FFmpeg but never looked into encoder technicalities, so I don’t know if and why GPU encoding has fundamental limitations. I heard nvenc has vastly improved on RTX cards; I’m still rocking an old GPU so can’t verify that, but presumably that means there’s no fundamental weakness and GPU encoding is just playing catch up?


ffmpeg supports CPU, CPU-Accelerated and GPU-accelerated encoding and decoding.


Just found av1 is about 20 to 30% more efficient than h265. I guess there was no reason to use patented algs, but h265 is now significantly more efficient than av1.

I would still take freedom over patented software


> Just found av1 is about 20 to 30% more efficient than h265

> […] but h265 is now significantly more efficient than av1.

What did you mean?


Not OP but I think he meant "h266 is now significantly more efficient than av1"


AV1 beats h.265 by 20-30%, h.266 beats h.265 by 50%. Honestly for that little of an improvement I'll go with AV1.


The audience for h266, and who will "win" the codec war, isn't individual consumers, even powerusers who know what a codec is.

H266 will be adopted by broadcasting and archival and will make MPEG tankers of money. Whatever the next generation of physical home media is after blu-ray will use it, the player for it will read it, and your TV cable box will take h266 signals in to decode. The costs of paying MPEG will be in the cost of the discs, the cost of the cable package, etc.

The real win we should... hope? For is that h266 never sees a personal computer hardware decoder from Qualcomm / Samsung / Intel /AMD / Nvidia / etc. If online video is exclusively distributed with AV1 then none of these companies need touch the festering MPEG patent hell and consumers avoid that parasite leeching money out of their computer purchases.

Because the cable box and physical media player are dying. You can generally opt out of them and avoid filling the MPEG coffers with software patent money. And the big web companies that have the power to dictate what computers are using for the next decade and beyond are all way favoring AV1 with the exception of Apple.


I expect the H.266 patent fees won't be as expensive as you fear.


Really? Because by that metric, H.266 is as far ahead of AV1 as AV1 is ahead of h.265.


Well, it depends: if x is 30% better than y and z is 50% better than y, then z is only ~15% better than x.
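
A rough sketch of that arithmetic, plugging in the ~30% and ~50% figures from upthread; which reading of "X% better" the marketing intends is my assumption either way:

    av1_gain, vvc_gain = 0.30, 0.50  # claimed gains over H.265

    # Reading 1: "better" as an efficiency multiplier (1.3x vs 1.5x)
    print((1 + vvc_gain) / (1 + av1_gain) - 1)  # ~0.15 -> H.266 ~15% ahead of AV1

    # Reading 2: "better" as a bitrate reduction at equal quality
    print(1 - (1 - vvc_gain) / (1 - av1_gain))  # ~0.29 -> ~29% smaller files than AV1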

Also, these kinds of performance claims are 100% hot air. Real-world benchmarks talk.


If that kills h.26x why not?


So they themselves claim. Will wait for empirical tests with available codecs to really evaluate this. Also note that for a given bitstream specification, there will be "room to grow" for encoder implementations (AV1 is designed to be flexible in this way). I am not aware of any precise method for evaluating how much room exactly there theoretically could be ex ante.


The second h265 was probably meant to be h266.


I wonder what would happen if ffmpeg made the choice to not implement a decoder for it. Or maybe just not an encoder?

Can we agree not to work on such projects? I feel that the lack of a good open-source encoder/decoder would spell the death of most codecs nowadays. That would also teach Fraunhofer a lesson.

Of course, everyone is free to scratch their itch. And the bigger the void, the more itchy it gets. Luckily, we still have AV1.


How does x264 etc get away with it?


They are compliant with the MPEG patent licenses - they distribute source code that you have to compile yourself, and only for non-commercial use.

If you build x264 or other open source implementation of h.264/h.265, and embed it for example in commercial video conferencing software/appliance, you have to pay patent licensing fees for that product.

It's also why Firefox downloads a blob from Cisco to handle MPEG-4 video - Cisco covers the licensing for distribution et al.


How much does Cisco pay (to MPEG?) for that?


If I understand correctly, around 10 million USD per year (according to https://www.mpegla.com/wp-content/uploads/avcweb.pdf that's the cap, and from what I've heard Cisco is selling enough actual products to hit that cap, so providing their software for free to everyone else doesn't cost them any extra in licensing fees, just hosting and such)


That is such a nice service. Benefits everyone and near-zero cost for them.


Why is Cisco being nice about it? (Besides why not?)


I don't recall the specific timing of the release, so this might not line up, and I have no inside knowledge, just public information.

Cisco has some products which use compressed video in a browser setting. It would be useful if all browsers supported a good codec. Individually downloaded codec plugins suck, because installing is iffy. Therefore, give something away which doesn't cost licensing money to make your existing licensed products more usable.

And get some good feels on the interwebs.


Because several years ago, there was a fight over mandatory to implement video codecs in WebRTC. It was VP8 vs H.264. The biggest thing VP8 had going for it was no royalty payments. Cisco wanted H.264 because all of their devices supported H.264 and none supported VP8 and they already paid the royalties. So Jonathan Rosenberg, then CTO of the division of Cisco that managed this part of the business arranged to have Cisco cover the royalty payments for anyone implementing the WebRTC standards.

That wasn't enough, and WebRTC requires both VP8 and H.264 as MTI codecs.


Because it forces their competitors (in the video conferencing business) to take similar costs.


Could Firefox download x264 and compile it on demand?


x264 is for encoding only.


They don't get away with it. There are two separate issues: the software copyright license and the H.264 patent license. x264 itself is licensed under the GPL:

http://www.videolan.org/developers/x264.html

But if you use an H.264 encoder or decoder in a country that recognizes software patents then you need to buy a patent license if your usage comes under the terms of the license:

https://www.mpegla.com/programs/avc-h-264/


Software patents are not valid everywhere; x264 is developed by VideoLAN in France, where software patents don't apply, like in the rest of the EU.


So if a company in france writes open source software that infringes on a US software patent, and a company in the US bundles that source code in a product, is the US company liable for damages?


Yes. The company distributing the work has to make sure it has licenses for any required parts.


Yes, that's exactly how it works, the US company is responsible for following the US laws.


They only sort of half apply in Sweden too.


What do you mean by half valid?


Software patents are valid only if part of a larger invention, the algorithm itself cannot be patented.

However, patent clerks have also from time to time registered algorithmic patents.


Naively hoped I'd read 'this will be released to the community under a GPL license' or similar. Instead found the words 'patent' and 'transparent licensing model'.

I appreciate that it costs money and time to develop these algorithms, but when you're backed by multi-billion dollar "partners from industry including Apple, Ericsson, Intel, Huawei, Microsoft, Qualcomm, and Sony" perhaps they could swallow the costs? It is 2020 after all.


That's Fraunhofer for you.

In the early days of MP3, all MP3 rippers and players were built off of their implementation.

Hardware and software companies had to license it in order to play MP3 files. As such there was no native support for MP3s for quite some time.

In the late 90s, right around the explosion of MP3s on the internet, Fraunhofer was going after companies that shipped MP3 support without a license.

In my humble opinion, that license mess set back innovation in the portable audio space by a good 5 years.


> In my humble opinion, that license mess set back innovation in the portable audio space by a good 5 years.

Seeing all this, I'm convinced that copyright in general and the patent system in particular do more harm than good by slowing down the technical progress of humanity as a whole for the sake of some already rich people becoming a bit richer.

The initial idea behind patent system was sensible, but the way it's abused now... I mean, it could work in today's world as intended if patents lasted a year or two, not what is effectively eternity.


> Seeing all this, I'm convinced that copyright in general and the patent system in particular do more harm than good by slowing down the technical progress of humanity as a whole for the sake of some already rich people becoming a bit richer

There are plenty of societies that don't respect intellectual property and copyright. And those societies don't innovate at the same rate as those who do.

There are certainly abuses in the copyright, trademark, and patent systems. But throwing out the baby with the bathwater is not the answer. Identifying the abuses and improving the system is the answer.

Perhaps that progress won't happen at the rate you'd prefer, but it's significantly better than burning the whole thing to the ground.


> There are plenty of societies that don't respect intellectual property and copyright. And those societies don't innovate at the same rate as those who do.

That's not necessarily so - for example, during the industrial revolution, Germany largely overtook the British in mechanical engineering skill during a period where the British had copyright but before the Germans got it: https://www.wired.com/2010/08/copyright-germany-britain/

Also, it's quite possible the causation goes the other way: rather than a society succeeding at innovation because of IP laws, it's just as possible that because a society has succeeded at innovation, it passes IP legislation in order to 'kick away the ladder'. But just like regulatory capture shows in general, legislation that helps yesterday's winners seek rent is not necessarily the same as (and is often in fact the opposite of) legislation to help tomorrow's winners see the light of day.


> There are certainly abuses in the copyright, trademark, and patent systems. But throwing out the baby with the bathwater is not the answer. Identifying the abuses and improving the system is the answer.

That sounds nice and reasonable, in theory.

In practice, all the money is behind expanding the copyright and patent systems. When is the last time the duration of copyright terms was shortened? When is the last time the patent system was adapted to be less draconian and less protective of those poor, poor multinational corporations that somehow end up holding all those patents?

Spoiler alert - that has happened exactly never. Instead, all we get is 'harmonization' which always means extending terms and giving those laws more teeth, to match the strictest law implemented anywhere in the world.

Steamboat Willie's copyright expires January 1st, 2024. How much do you want to bet that Disney will be pushing for another copyright term extension before then?

> Perhaps that progress won't happen at the rate you'd prefer, but it's significantly better than burning the whole thing to the ground.

The current systems only ever get stricter. Where are the much shorter terms? Where is the recognition that cooperation and remixing fosters innovation and progress and as such should be encouraged, not punished? Where is the PTO following the actual law that says math and logic (i.e. software) are not patentable? Etc, etc, etc.

The abuses have been very well documented over the past several decades. There has been zero progress on incremental improvements. Tell me again how you propose we improve the system without a drastic overhaul?


> Perhaps that progress won't happen at the rate you'd prefer, but it's significantly better than burning the whole thing to the ground.

Changing patent length (pc's suggestion) hardly seems like burning the whole thing to the ground.

In the US, if I understand correctly, there are ways to defend against patent abuse, though it typically involves costly legal fees. Fees many cannot justify in spending.

Addressing patent challenges may be a way to protect innovators, both those who should gain reward for their innovations, and those who seek to gain reward by building upon innovations.


> In the US, if I understand correctly, there are ways to defend against patent abuse

That's the key: in the U.S. Good luck stopping someone infringing on your patent in a good number of other countries.


> There are plenty of societies that don't respect intellectual property and copyright. And those societies don't innovate at the same rate as those who do.

For one, this claim suffers from a correlation/causation issue. But also, do you have an actual citation for research which shows this is true?


Chapter 8 [1] of Against Intellectual Monopoly, "Does Intellectual Monopoly Increase Innovation?" looks into this, including comparing countries with stronger patent laws vs weaker laws. The results seem pretty decidedly mixed.

[1] http://www.dklevine.com/papers/ip.ch.8.m1004.pdf


> There are plenty of societies that don't respect intellectual property and copyright. And those societies don't innovate at the same rate as those who do.

What it does do though is allow someone to start from zero, and catch up to the rest of the competition very fast and cheap. They can then offer their "product" cheaper in order to gain market share. As long as they are getting/keeping customers, there's no need to innovate. You can have a viable business without spending tons of cash on R&D, and if you're making money that way, who cares?

*I am in no way endorsing this kind of business model, but it exists and does well.


> There are plenty of societies that don't respect intellectual property and copyright. And those societies don't innovate at the same rate as those who do.

Comparable examples being?


> There are plenty of societies that don't respect intellectual property and copyright. And those societies don't innovate at the same rate as those who do.

I'm not 100% sure about this. Just look at how hard China laughs at intellectual property and copyright in general and tell me if you still think the same.

Not being burdened by expensive license fees is a competitive advantage.

But then again, so is not having to care about workers' rights...


Uh, China is doing pretty awesome, so that's one big massive outlier distorting the entire supposition.


isn’t China still mostly piggybacking on foreign innovation?


I wouldn't say mostly. It happens and is a predictable thing that will happen with software and hardware. People also build their own things there. It is a very large market with multiple tech hubs, servicing itself.


They have invested in manufacturing, solar, nuclear and a number of other areas to the extent that they are now world leaders. You can debate relative merits, but if you rank countries it certainly is doing better than most.


You've flipped the causal relationship - societies capable of rapid invention are also going to be more likely to be legalistic.


US patent laws sensibly state that only individuals, not corporations, may be awarded patents.

Unfortunately, most companies require that any patent awarded to an engineer in their employ is automatically assigned to the company.

Get rid of that loophole and employees will be able to license their patents as they see fit. Of course this is fraught with practical difficulties, but some kind of compromise could be reached.

Never happen, obviously.


Do you really want to try to get individual licenses from every single person who worked on video codecs from in the last 20 years so you can legally compress video?


While that would be somewhat harder, I bet most engineers are much less greedy than companies that employ them.


I see you are unfamiliar with the Wright brothers patent war: https://en.wikipedia.org/wiki/Wright_brothers_patent_war


> I mean, it could work in today's world as intended if patents lasted a year or two, not what is effectively eternity.

This needs to vary based on field. Some areas, like drugs, take forever to get to market and have exorbitant development costs to recover. (Though, there, other abuses need to get fixed, like renewing patent lifetime with slightly different applications or formulations.)


A probable reason is the length of time patents and especially copyright law apply, at least in the US. Although we are already seeing issues with 70+ year copyright terms with extensions after the creator's death, digital technology just moves way too fast compared to classical technology. By the time a classic patent expired, other firms may have built the infrastructure to create a generic copy or even advanced the science, as with light bulbs. Digital tech moves quickly, and software that can be distributed across the planet in seconds and make billions in its first few years is ripe for subversion and knock-offs. Maybe if digital patents applied to much more specific applications, then competitors could build their own version using a new code base and compete?


> In my humble opinion, that license mess set back innovation in the portable audio space by a good 5 years.

Or did it push it forward? If not for the licensing at the time, would it have been developed? Would it be allowed to be used by anyone just paying for the technology?


That’s not edgy enough for HN ;)


On the other hand those licenses finance the development in the first place. MP3 by the way was done by another Fraunhofer institute.


It's even worse than that. The main driving force behind VVC is AFAIK the Heinrich Hertz Institute, which used to be independent and has now been sucked into the Fraunhofer mothership, which is one of the big research organisations in Germany.

Fraunhofer has a budget of over 2 billion Euros, and 30% of their money comes from public funding. They run over 70 institutes, so they do much more than this.

The root problem here is that nowadays public research is funded with industry money, which means there has to be a return on investment, hence the patents. In fact, this has metastasized into universities being graded by their patent portfolio volume. So I would expect there to be patents even if 100% of the funding came from the tax payer.

It would have been possible to do the whole process just with public money and zero patents. In fact, I would love it if some research team collated all the patent tax payments across the population of Germany and compared the bottom-line cost for the country.

I wager it would have been cheaper without patents, too.


Disclaimer: I work at a Fraunhofer institute, though not Fraunhofer HHI, which developed this codec, and I have no intimate knowledge on the financing of that institute or this project in particular. But some basic principles apply to all institutes the same.

Fraunhofer gets roughly 30% of its funding from public sources, the remainder is raised on a per-project basis. It's a fair assumption that those industry partners provided some funds towards the development here. Maybe they even covered all the payroll costs for the involved scientists for the duration of the project.

And yet more income means more money for other research projects. Maybe ones that are not as commercially interesting, or for which a partner decides to terminate a contract rather unexpectedly. While I am also a fan of OSS and would love for work like this to either have no patents or a liberal patent grant, I can also appreciate the desire to fund your research institute.


Public sources as in the taxpayer?

Why would the taxpayer fund anything that isn’t open and free. Crazy.


Yes, public sources as in the tax payer. It is worth pointing out that Fraunhofer is itself a non-profit organization.

The argument that anything funded by tax money must be open is a very fair stance. Though the line gets very blurry when you mix various sources of funding like this. To the best of my recollection I have yet to be paid from any public funding (rather than project specific funding raised from the industry, for example).

Personally, I have no qualms with the funding model, but other points of view are presumably equally valid.


Unfortunately, non-profit does not necessarily mean that someone in the organisation isn't amply lining their pockets.


You're right, but I'm not aware of any egregious salaries or bonuses. The president and all the chairmen are professors, which generally means they are paid by their university, not Fraunhofer. The same applies to many directors of the individual institutes. The salaries for employees are based on the "Tarifvertrag öffentlicher Dienst" ("labor agreement for public service"). Unless there's straight up fraud somewhere, there shouldn't be too much lining of pockets.


Public transport also is just subsidized and not free.


The dirty secret of video codecs is that you can't make a modern video codec that isn't patent encumbered, which in turn makes it so that even if they wanted to be open, they go for defensive patents, which in turn perpetuate the situation.

At least the patent licenses usually used with MPEG mean that private use of open source implementations is free.


> The dirty secret of video codecs is that you can't make a modern video codec that isn't patent encumbered

The existence of Theora, VP8, VP9, and now AV1 seems to contradict that theory.

You could argue that they infringe on some unknown patents, but that is also arguably true of patent cabals like MPEG (you just hope that the cabal is big enough that there aren't any patentholders lurking outside). The only difference is that with a patent cabal you have the fun of having to obey the restrictions of everyone who showed up with a possibly-related-in-some-way patent and joined the cabal.

Not to mention that it isn't necessary for a patent pool to be a cabal. AOMedia has a similar structure to a patent cabal except it doesn't act like a cabal (its patent pool is royalty-free in the style of the W3C). So even if the argument is that a patent pool is a good idea (and video codecs cannot be developed without them), there isn't a justification behind turning the patent pool into a cabal.

> At least the patent licenses usually used with MPEG mean that private use of open source implementations is free.

You say that, but there's a reason why some distributions (openSUSE for one) still can't ship x264 (even though the code itself is free software). Not to mention the need for Cisco's OpenH264 in Firefox (which you cannot recompile or modify otherwise you lose the patent rights to use it). The existence of the MPEG patent cabal isn't a good thing, and any minor concessions you get from them do not justify their actions.


Yeah. It's a mess.

The video patents aren't just "patent troll" patents, either. They are highly enforceable, and were registered by corporations like Ampex.

I have been trying to write a simple app to stream RTSP (security cameras), and that has been a pain.

I need to basically use either proprietary (paid) or GPL software to do it.

Video software is not for the faint of heart. Much as I grouse about the licensing, I am not about to develop my own codec.

I did write this one app, which is an ffmpeg wrapper, to convert RTSP to HLS (Which is not -currently- suitable for realtime streaming): https://github.com/RiftValleySoftware/RVS_MediaServer

It's GPL, because I need to use the GPL ffmpeg H.264 codec.
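
For the curious, the core of that kind of RTSP-to-HLS conversion boils down to a single ffmpeg invocation. A minimal sketch driven from Python (not the actual RVS_MediaServer code); the camera URL, segment settings, and output path are placeholders, and the camera is assumed to already emit H.264:

    import subprocess

    subprocess.run([
        "ffmpeg",
        "-rtsp_transport", "tcp",   # TCP tends to be more reliable than UDP here
        "-i", "rtsp://camera.local/stream",
        "-c", "copy",               # remux only, no re-encode
        "-f", "hls",
        "-hls_time", "4",           # segment length in seconds
        "-hls_list_size", "6",
        "stream.m3u8",
    ], check=True)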


> I need to basically use either proprietary (paid) or GPL software to do it.

> It's GPL, because I need to use the GPL ffmpeg H.264 codec.

I don't understand the problem?

I can see the problem with people patenting things and preventing you from writing your own implementation, but it seems you just want other people to do the hard work of implementing it so you can wrap a skin around it and do what?


I wasn't complaining. My apologies for coming across that way.

It was a statement of fact.

I've been writing open-source software for well over 20 years (actually, well over 30 years –Where does the time go?).

I think I understand the issues involved.


You can license x264 to not have to worry about the GPL. Though that costs money and you may find yourself signing MPEG LA contracts.

edit: Your project reminds me, https://github.com/arut/nginx-rtmp-module is super worth checking out and might be helpful to you.


Yeah, but it was a free app.

One of the frustrating things about implementing video software, even licensing it, is pretty much everything out there is an expression of ffmpeg, which is a really, really good system, but does have some baggage.


As others mentioned, VP8, VP9 and AV1 all are patent encumbered - they just happen to have licenses that are royalty-free and as someone mentioned, include retaliation clauses which help fighting off submarine patents - but that's essentially a possibly high-stakes game of chicken.


While they are patent-encumbered in the strictest sense of the word, they are not as aggressively licensed as H.266 -- which is what GP was lamenting. Licensing H.266 under the GPL(v3) wouldn't erase the existence of the patents and it would still be "patent-encumbered" in the same sense that VP8/9 and AV1 are.

And retaliation clauses are present in basically every free software license that has clauses dealing with patents (including Apache-2.0).


A big part is how expensive you can make said retaliation.

I'm more and more fond of calling it all a big game of chicken, a MAD without nukes.


Hah I love this analogy! Guess that's societal progress for you; we use currency to replace munitions.


Wouldn't Apache 2 be a better license for this since it is explicit on patents?


GPL is the example GP used, which is why I referenced it. It's not really important what free software license it would be (though GPLv3 does have an entire section (s11) which is all about patents -- that's why it's compatible with Apache-2.0).


> but that is also arguably true of patent cabals like MPEG (you just hope that the cabal is big enough that there aren't any patentholders lurking outside)

Apparently, they aren't. There are at least 2 other patent pools that claim patents for HEVC, and I think I saw 3 in some other article before:

https://streaminglearningcenter.com/codecs/hevc-ip-mess-wors...


Software patents aren't a thing in ~~Europe~~ a few European countries. Sure it's difficult to ignore the American market for a company, but an independent developer could specify a state of the art video codec without thinking about patents.

Edited because I didn't know that some European countries accept software patents.


Software patents are a very complex thing.

For example, many countries in the EU do not allow patents on software, but that's not something you can claim to be true for all of them - at least before Brexit, since iirc the UK was pretty happy to grant software patents.

Then there's the case where, if you're really willing, you can, as far as I understand, force a patent dispute through the WTO, with the possibility of a patent valid in the USA being enforced, for example, in Poland, despite the fact that the patent is invalid in Poland (it doesn't matter if your software is part of a physical solution in Poland; algorithms of any kind are not patentable there).


What really annoys me is that Article 52 of the EPC explicitly excludes software from patentability. It couldn't be more clear in its language (to me). But when I talked to a patent lawyer about this a while back, he said it didn't really mean that, and software can be and is easily patented.

https://en.wikipedia.org/wiki/Software_patents_under_the_Eur...


One more benefit of Brexit for mainland Europe! I didn't know about using the WTO to dispute invalid patents. I guess a WTO dispute can make sense for a Boeing software patent used by Airbus. I wonder whether the WTO would care about an independent developer. It's not very good press, but perhaps it's fine for them.


WTO doesn't care, WTO serves as a forum to get it through.

The question is, does a "practicing entity" holding the patent care enough to go through the hardest route to get the patent enforced using the WTO as a forum? One needs to compare costs and benefits. It's why patent trolling involved pretty much a few counties in Texas, because that's where the costs were lowest compared to the benefits.


It's not just the UK that allows software patents in practice - as the discussion hints at, Germany does too, and if I remember rightly pretty much the last patent affecting MP3 was a German patent on zero-padding held by Philips, who enforced it aggressively against manufacturers of devices like music players.


> Software patents aren't a thing in ~~Europe~~ a few European countries.

Look at all those European patents in the MPEG-LA license pool!

https://www.mpegla.com/wp-content/uploads/avc-att1.pdf


Software patents can exist, but are almost always unenforceable, is my understanding of it from researching this for my own media-related project.


> Software patents can exist, but are almost always unenforceable

Wanna take the risk? You might end up winning the lawsuit, but by then there's a good chance you'd already be out of business.


As I understand it there has been no successful enforcement of a software patent so far, and it's ridiculously unlikely that I (as opposed to any other actor in this space) will be the first case. So yeah, I'll take that risk.


I would love that you be right. However, here are examples of successfully enforced software patents:

- in Europe: http://www.bailii.org/ew/cases/EWCA/Civ/2002/1702.html

- in US: https://web.archive.org/web/20061205050434/http://eolas.com/...

But as I said, if your startup is being sued by Dolby, whether the enforcement is successful or not is actually irrelevant. Showing that your work doesn't infringe a patent, or that Dolby's patent is invalid, is a money and time-consuming process (unsurprisingly, patents are not generally written to facilitate re-implementation or defense).

(Moreover, in the US, in some cases, the patent owner might even get a preliminary injunction ( https://www.tms.org/pubs/journals/jom/matters/matters-9712.h... ), which might seriously and immediately harm your business. I don't know if such a thing exists in Europe).

Big tech companies like Dolby and IBM use a preventive racket-looking technique ; it involves trying to sell to potential infringers a "protective" subscription, but there's no preliminary analysis of whether there actually is any patent being infringed.

During broadcasting tech events like IBC or NAB, Dolby actually sends people to other company's booths for this ; and there's a famous story about IBM against small-at-this-time SUN : https://www.forbes.com/asap/2002/0624/044.html , whose gist is:

> "OK," [the IBM lawyer] said, "maybe you don't infringe these seven patents. But we have 10,000 U.S. patents.

> Do you really want us to go back to Armonk [IBM headquarters in New York] and find seven patents you do infringe?

> Or do you want to make this easy and just pay us $20 million?"


I meant in Europe, and that case is not successful enforcement of a software patent (merely a preliminary question as to whether some jurisdiction weirdness could be a reason that William Hill did not infringe on a patent); in fact there was no question to the court as to whether the patent was valid, and the case ended with an answer to the question asked.

This is a quirk of some UK courts, where you can literally just start a case to ask a question on some detail of the law and get an answer.

The question was:

> "Is it a defence to the claim under s.60(2) of the Patents Act 1977, if otherwise good, that the host computer claimed in the patent in suit is not present in the UK, but is connected to the rest of the apparatus claimed in the patent."

From Wikipedia:

> Questions of validity were never considered by the court.


I think Europe does allow software patents of a sort: See https://www.epo.org/law-practice/legal-texts/html/guidelines... Or you might have a particular definition of the term in mind that excludes these, but I think when most folks hear the term, they'd include what the EPO permits.


The European patent office is not an official thing. It's an independent private organization that will gladly take your money to submit patents.


The European Patent Organisation is an intergovernmental organisation (https://www.epo.org/about-us/foundation.html).

Not a private organization.


Alright, I was wrong. Still strange that they issue patents that are obviously invalid.


I know a patent examiner there quite well; there is a lot of politics and incompetence in appointing people, and a lot of pressure to accept most of the patents they receive. If I understood correctly, part of the reason is politics (if you refuse patents, even stupid ones, influential people will get upset) and part is the idea of having as many patents as possible as a measure of European creativity and innovation and ... bla-bla. Universities and research institutes are measured by the number of patents, and if they don't invent something good, they will patent something stupid just to get the numbers right.


The reasoning is that individual states have Opinions about the validity of software patents and they're going to keep issuing them until somebody with the proper authority clarifies that the law means what the law says, although as far as I can tell none have actually been successfully enforced.


In practice, the EU does allow software patents. The software invention just needs to be disguised as a machine. Industrial property lawyers know very well how to do that, and it's unfortunately very common.

> Edited because I didn't know that some European countries accept software patents.

European patents are granted at the European Patent Office (individual european countries also have their own patent offices, whose patents can only be enforced in their home country).


Enforceability of EPO patents varies; that's why there was a big fight over software patents in the EU parliament not so long ago. I'm not sure about the current status, however - that needs to be checked, I guess.


Are AV1 and VP9 not modern video codecs? Or are you suggesting the patent claims have substance?


AV1 and VP9 are covered by plenty of patents. One of the differences here is that there is a retaliation clause in the licenses for AV1 and VP9. https://aomedia.org/license/patent-license/

"1.3. Defensive Termination. If any Licensee, its Affiliates, or its agents initiates patent litigation or files, maintains, or voluntarily participates in a lawsuit against another entity or any person asserting that any Implementation infringes Necessary Claims, any patent licenses granted under this License directly to the Licensee are immediately terminated as of the date of the initiation of action unless 1) that suit was in response to a corresponding suit regarding an Implementation first brought against an initiating entity, or 2) that suit was brought to enforce the terms of this License (including intervention in a third-party action by a Licensee)."

This makes it much harder for practicing entities or their licensees to assert claims against other practicing entities over the formats.


There are patents that cover AV1 and VP9; they are just licensed without royalties: https://aomedia.googlesource.com/aom/+/refs/heads/master/PAT...


They are modern video codecs which are playing chicken with possible submarine patents.

Ultimately, it's a question of how much you're gonna risk to get where you want, and how much power/influence/wealth you can bring to squash a possible lawsuit.


But H.265 and H.266 have the same risk - worse actually, as multiple submarine patent claims have been raised against H.265, and there are now multiple licensing organizations that all claim they need to be paid if you want to use H.265, and over a decade later their claims still haven't been legally settled.

To completely avoid risk your only choice is to use old technology where all the patents have expired (20 years in the US), like MPEG-2. The next lowest risk is to use H.264 and VP9, which have been out for a while and whose patent pools have stabilized over the years (and the original parts of the standards will have their patents expire soon - but not some of the newer profiles). After that I would argue that AV1 is less risky than H.265 and H.266, as a lot of work was put into intentionally avoiding patented technology that was not part of the pool, and no one outside the pool has yet made patent claims against it.


>To completely avoid risk your only choice is to to use old technology where all the patents have expired

EVC baseline is basically that, only using techniques from H.264 whose patents are already expired or soon to expire, plus patented techniques from companies that are giving them away to the standard.

In some way EVC is even more exciting than VVC.


> The dirty secret of video codecs is that you can't

You can, but fraunhofer certainly isn't trying!

> At least the patent licenses usually used with MPEG mean that private use of open source implementations is free.

0_o that is not at all the truth.


> The dirty secret of video codecs is that you can't make a modern video codec that isn't patent encumbered

Why is that?


Pretty much every method involved in high-efficiency video compression has a patent on it, no matter how small a part it plays.

Now compound this by the fact that a) trying to make an exhaustive patent search to get a verifiable claim that you don't infringe on any patent is very problematic b) known patent pools like MPEG-LA are known not to cover everything.

So you can make a reasonable bet that you avoid infringing patents by avoiding patents from MPEG-LA and few other better known groups, but you can't actually guarantee that you're not infringing on any patents.

This resolves, sort of, into a game of chicken and depends heavily on whether a lesser-known patent holder decides it's worth the bother of enforcing against you... but even if they don't, unless they come out with a royalty-free license, the possibility of a patent claim is a sword of Damocles hanging over your codec.


"make an exhaustive patent search to get a verifiable claim that you don't infringe on any patent"

Is there even such a thing?

Isn't the problem that one has to actually go to court to get the answer to this question?


In theory you can do such a search. The problem is that doing so is impractically expensive (and at least in some jurisdictions it can actually raise your liability in case of patent infringement).


Does such a search indemnify one of liability though, is the big question?


No.

In fact, I heard more than once that current advice is to explicitly avoid searching :/


Because there are companies with a lot of money in this space which spend all their time trying new things and patenting anything they come up with, even if it doesn't make it into a published codec, basically.

The way to get around this is to exist in the EU and avoid providing anything to the US.


The field is littered with companies holding submarine patents just waiting for that big payday when something using some tiny part that they patented gets popular and deployed to millions of devices so they can surprise everybody with their pricey licensing terms and huge lawsuit.


It would be hard to build a new algorithm without stepping on other established algorithms that are patented.


That's what AV1 is.


The situation with AV1 is "we are in middle of minefield and nobody got mine-clearing gear".

They can be reasonably sure they do not infringe known patents from certain Patent Pools and patents declared as part of MPEG-LA bundles. They can't provide reasonable data that they do not infringe on any submarine patent, something that killed 3 attempts by MPEG-LA to provide a royalty-free codec for the web - all that was required to kill it was a note from a company that they "might" have patents covering things they tried to release, or that they decided not to allow royalty-free license for their known patent.

Meanwhile patent search is complex enough that it's unreasonable to impossible to make a statement that you definitely don't infringe any unless you keep clear of anything invented within last ~20 years.


> The situation with AV1 is "we are in middle of minefield and nobody got mine-clearing gear".

That's true of every new video codec. It didn't stop the use of H.264 or VP9 or even HEVC.

Multiple companies have now rolled out AV1 into production. We'll see what happens.


When I wrote that I didn't know about retaliation clauses in AV1 patent pool license. That said, a huge chunk of patents involved in all MPEG standards (AVC, HEVC, etc.) meant that if you tried to go after them, you might have lost licenses necessary elsewhere, so it's all a question of risk analysis for someone who wants to torpedo AV1 with a patent.

What's wrong is claiming that AV1, VP9, VP8 are "patent-free". They are not.



I think that's more a case of Sisvel trying their luck.

I don't think they'll be successful. The Alliance for Open Media was careful to avoid potential patent problems during AV1 development. So, unless AOMedia seriously failed in that effort, AV1 will be alright.


Being careful and being successful are two different things, especially given how hard it is to do an exhaustive search over patents to ensure that no, no claim in any patent filing touches your code, especially given that there are regimes where patent filings for software get pretty much a rubber stamp and are based around "first to file" even if there's prior art.


I'm confused. How is Sisvel able to sell a license for AV1 patents when they're not a member of AOMedia? Do they actually have AV1 patents, or are they trying to trick companies that would rather pay up than risk violating patents?


Basically, unlike copyright, you don't even have to know that someone else patented something in order to infringe on a patent and have them come out of the woodwork later. It's just even more expensive if you do know.

The patent system was not designed to have every tiny little technique patented, and this is its failure mode.


> It's just even more expensive if you do know.

In the USA. Wilful and unwilful infringement of the patent costs the same in Europe.


Luckily, software patents are unenforceable in the EU, so for many of us on this site that's irrelevant.


That's what patent trolling is


Copyright != patents. A copyright license for a codec implementation or a spec text doesn't grant you patents for the ideas it contains.

A GPL implementation doesn't guarantee a patent grant. Even if you wrote the code yourself, even just for your personal use, your own work could still be illegal to use due to lacking a patent license from the original patent holders.

Be careful about using implementations of H.26x codecs, because in countries that recognize software patents the code may be illegal to use, regardless whether you've got a license for the code or not. Even when a FLOSS license says something about patent grants, it's still meaningless if the code author didn't own the patents to grant.


Of course, they can't do that, because the source technologies they've put together are themselves patent-encumbered. An AV-codec is a lot like a modern pop song: a piece of IP entirely made up of licensed samples of other people's IP.

I think a more subtle "open-sourcing" of this IP could still be possible, though. Maybe one that still requires that large corporate players that are going to sell their derivative products, acquire a license the traditional way (this is, after all, what the contributors to the codec's patent-pool and R&D efforts based their relative-R&D-labor-contribution negotiations around: that each contributor would end up paying for the devices of theirs that run the codec.)

Maybe there could be a foundation created under the stewardship of the patent-pool itself, which nominally pays the same per-seat/per-device licensing costs as every other member, but where this money doesn't come from revenue but rather is donated by those other members; and where this foundation then grants open-source projects an automatic but non-transferrable license to use the technology.

So, for example, a directly open-source project (e.g. ffmpeg) would be granted an automatic license (for its direct individual users); but that license wouldn't transfer to software that embeds it. Instead, other open-source software (e.g. Handbrake, youtube-dl, etc.) that embeds ffmpeg would acquire its own automatic license (and thus be its own line-item under the foundation); while closed-source software that embeds ffmpeg would be stuck needing a commercial license.

Is there already a scheme that works like this?


GPL would seem like a very weird choice -- that would mean it couldn't be put into any closed source product?


"or similar" - I mean an open-source free license to use, that is not encumbered by patents. GPL was the first license name to pop into my old-man brain.


It could still serve as a reference implementation.


Not only a GPL reference implementation, GPLv3 would preclude patents on any improvements.


I don’t see how that follows?


GPLv3 has a clause that says if you release anything under the license then you have to provide a cost-free license to anything that is patented; if you aren't able to provide a license, then you aren't allowed to release it under GPLv3. Any modifications to a GPLv3-licensed product have to be released under GPLv3.

So yes you are correct, but in effect it might as well be patent free as far as a 3rd party end user is concerned as they have been provided what is in effect a safe harbour.


It could still be useful if the patents were part of a defence patent portfolio. (Ideally I'd abolish software patents altogether, or set a hard 10 year limit on them together with some kind of mechanism to curb the impact of add-on and submarine patents.)


Yes, it can be used defensively. If you breach the GPLv3 then in effect you also lose your patent license. The GPLv3 also forbids you making an opposing patent claim, so in effect it is defensive.


Notice that the missing party here was 'Google'. These are the folks who are really competing with VP9, the royalty free codec that limited the uptake of H.265.


Patents and open/closed source are independent things. OpenH264 is a BSD-licensed H.264 implementation, for example.


Yes, but what does " backed by " mean?

Does it mean money, or not? Because if not, then Fraunhofer does not exist.

But there is absolutely a problem here, because said 'mega businesses' actually should have a strategic imperative to want to make internet technologies more widespread.

Why on earth would MS want to limit their main line of business for a tiny bit of IP-related revenue?

It would seem to me, that G, MS, Huawei and all of the various patent holders should be trying their best to remove any and all barriers to adoption. There are enough bureaucratic hurdles in the way to worry about, let alone legal concerns.

Even if MS or whoever had to buy out some laggard IP owners who didn't want to play ball, it would probably still make sense for them.

Fraunhofer or anyone else are not in that situation, but the behemoths running vast surpluses are, it just seems shortsighted for them to hamstring any of this.


> will be released to the community under a GPL license' or similar

Both h264 and h265 have these implementations; I think the FFmpeg library has both under the terms of the GPLv2.

The decoders are almost completely useless. The video codec, at least the decoder, needs to be in the hardware, not in software.

Mobile devices just don't have the resources to run the decoders on the CPU. The code works on a PC but consumes too much electricity and thermal budget. Even GPGPUs are not good enough for the job; a couple of generations ago AMD tried to use shader cores for video codecs, it didn't work well enough, and they switched to dedicated silicon like the rest of them.
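
As a quick sanity check of what a given machine can actually offload, ffmpeg can list the hardware acceleration methods its build exposes. A minimal sketch, assuming ffmpeg is on PATH:

    import subprocess

    # Prints the hwaccel methods compiled into this ffmpeg build
    # (e.g. cuda, vaapi, videotoolbox, d3d11va).
    out = subprocess.run(["ffmpeg", "-hide_banner", "-hwaccels"],
                         capture_output=True, text=True)
    print(out.stdout)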


> The decoders are almost completely useless.

You'd be surprised at how often these are used.


Indeed. From an article here on HN, there is a lack of VA-API support in Firefox and, I think, Chrome on Linux. Furthermore there are special profiles (looking at you, H.264) that only work reliably in software decoders.


Mozilla added VA-API support under Wayland sessions in Firefox 76, and it'll be coming to Xorg sessions in Firefox 80.


I wonder why?

Even Raspberry Pi has a hardware decoder: https://github.com/Const-me/Vrmac/tree/master/VrmacVideo

Linux Kernel support for HEVC is WIP but I’m pretty sure they’ll integrate eventually.


Buying VC-1/MPEG-2 licenses to use hardware decoding on pre-4 Raspberry Pis was annoying: https://codecs.raspberrypi.org/license-keys/

Is this because of licensing/copyright?


Right, now I remember these keys.

But the very first Pi already had a hardware h264 decoder (and even an encoder!) which didn't need any extra keys to work. No idea how they did it; maybe the license was included in the $25 price. The Pi 1 was launched in 2013, when h264 was already widespread while mpeg-2 use was declining.

I think that's why they did not include the license. It would increase the price for all users but only be useful for the very few of them who connected a USB DVD or Blu-ray drive to their Pis.


MIT or FreeBSD licenses would also do.


Some never learn. Also surprising to see supporters of this approach in this thread. They are still around apparently.


Is H.265 released under a GPL-like license? If not, how does software like Handbrake use it?


They use ffmpeg, which is developed by people who do not care about software patents because they don't apply to them.


Why don't the patents apply to them?


Because the developers of x265 (which is the real codec and ffmpeg is a wrapper around it) are located in France.

http://www.videolan.org/developers/x265.html

https://en.wikipedia.org/wiki/VideoLAN


I don't know for all contributors, but the creator is French and there is no software patents in France.


All of the French patents listed in this license pool care to disagree: https://www.mpegla.com/wp-content/uploads/avc-att1.pdf


Patents are almost always approved, fwiw, and the FR cases here seem to be linked to worldwide patents. It is not the patent office's job to test the validity of patents. That occurs in a court when someone challenges the patent. This is true of every country, France included. So whilst you have a list of software patents filed worldwide (including France), that's not really relevant to the enforceability of the patent in France.

This is also why you see articles from time to time highlighting a stupid patent as if it's an outrage that the patent office allowed it. It's not the patent office's job to enforce patents. You can literally go ahead and patent swinging on a swing (1) and the patent office would approve it if the paperwork is in order. The media would then likely pick up on this with outrage as if that's an enforceable patent. The truth is that it's simply not the patent office's job. Patents are meant to be enforced by courts.

1: https://patents.google.com/patent/US6368227B1/en


> It is not the patent offices job to test the validity of patents.

Actually, it is (at least in the US). The USPTO can deny patents on the basis of nonpatentability, and its general refusal to do so after the State St. decision is often cited as one of the problems of the modern patent system.

Broadly speaking, however, if the argument is that software patents are invalid in Europe because they'll be found so by the courts, it should be noted that SCOTUS is actually pretty likely to rule software unpatentable were it to hear a software patent case. A little background is in order:

In Parker v Flook (1978), SCOTUS said that mathematical algorithms (i.e., basically software) are unpatentable. In Diamond v Diehr (1981), they said that part of the patent being software doesn't make the entire thing invalid. The big decision is State St (1998), which is a CAFC decision holding that anything was patentable so long as it produced a "useful, concrete, tangible" result and basically broke the patent office. When SCOTUS decided Bilski v Kappos (2008), they emphatically (and unanimously!) called out State St as wrong, but declined to endorse any guidelines as to what the limits of patentability should be. The later Mayo (2012) and Alice (2014) decisions again unanimously and unambiguously laid out what wasn't patentable: natural processes, and "do it on a computer" steps.

A few years ago, we had a patent attorney at work tell us (paraphrasing somewhat) that Alice made it really hard to figure out how to write a software patent that wouldn't be invalidated. Their continued existence (and pretense to their enforceability) is less because it's secure and more because no one wants to spend the money to litigate it to the highest level (see also the Google v Oracle case, which is exactly the sort of thing a software patent case history would entail).


They are not valid in France.


x265 ffmpeg


What you're looking for is the AV1 codec, which was finalized a year and a half ago and will likely see much wider adoption, simply because none of the members want to pay royalties.

https://aomedia.org/av1/

https://aomedia.org/membership/members/

AV1 decoding has been in Chrome and Firefox for at least a year. We're just waiting for hardware decoding and encoding support now, which should start appearing this year.

The next version of Chrome will also support the AV1-based AVIF image format this month:

https://chromium-review.googlesource.com/c/chromium/src/+/22...

YouTube, Netflix, and Amazon/Twitch are also likely to not support VVC (and some of them don't support h.265 either) for their streaming services.


Your comment was deaded. I vouched for it because I don't see anything that looks obviously inflammatory/incorrect, but I noticed that a large fraction of your comment history is also dead. You might want to look into that, since most of the dead comments also looked fine to me.


>The next version of Chrome will also support the AV1-based AVIF image format this month:

FWIW, Firefox supports it already (behind the image.avif.enabled knob).


> perhaps they could swallow the costs? It is 2020 after all.

Should they swallow the costs so that people come back and again say they are doing it to kill the competition? I just received a bill for hundreds of dollars from a doctor's visit; I guess things are not going to be free in 2020 after all.


Ironically in Germany, where Fraunhofer is located, you would have received no such bill.


Can anyone verify whether this is a real number? It's sometimes possible to make surprising claims (such as 50% lower size) by relying on unusual or unrealistic situations. I would rather they quoted these percentages against a standard set of test videos with different content and resolutions, using some objective measure of fidelity to the original. But if the 50% number is real, then that is truly remarkable. I wonder how many more CPU instructions are required per second of decoded video compared to HEVC.


If one trusted numbers like this and followed a chain of en vogue codecs back through history, you'd expect a modern codec to produce file sizes around 3% of MPEG-2's on the same input data. It's all spin.

I'm sure it does better, but I'm equally sure it'll turn out to be an incremental benefit in practice.

> I wonder how many more CPU instructions are required per second of decoded video compared to HEVC.

CPU cycles are cheap. The real cost is the addition of yet another ?!@$!!#@ video codec block on every consumer SoC shipped over the coming decade.

Opinionated bile: video encoding is a Solved Problem in the modern world, no matter how much the experts want it to be exciting. The low hanging fruit has been picked, and we should just pick something and move on. JPEG-2000 and WebP failed too, but at least there it was only some extra forgotten software. Continuing to bang on the video problem is wasting an absolutely obscene amount of silicon.


MPEG-2 doesn't support video larger than 1920x1152, so it's hard to compare on 4K video, let alone 8K. But according to https://www.researchgate.net/publication/321412719_Subjectiv... H.265 can achieve similar visual quality at 10% of the bit rate of MPEG-2 even on SD video (832x480). [Edit: not 720p]


832x480 is ~480p video (720x480 is standard widescreen DVD); 720p is 1280x720.


Video sizes are not shrinking. 8K is coming, with even bigger sizes on the way. That matters at scale.

As long as someone sees the benefit, people will keep pursuing it. Compression (like all tech) is a moving target, with the platforms regularly improving on many axes.


The video codec battle is not about file size, it is about streaming bandwidth.


Are those different? If I chop up a video file into chunks, I'm streaming it, and if I save a stream I have a file. With a buffer, I would expect the sizes involved to be identical. (Although without a buffer I'd expect streaming to be worse)


Yes. For the most part, you need to be able to encode to a low and constant bandwidth at 30/60+ fps, possibly even with limited latency. Then there are also some lesser, but still important, aspects, such as the need to be able to start in the middle of a stream, handle lost packets, etc.


Latency will become more important with video conferencing (already now) and AR/VR.


>Are those different?

No. In the end you transmitted a file that has a certain size. It doesn't really matter if you save that file or just use a volatile buffer.


>some objective measure of fidelity to the original

This is part of the problem. What is an "objective" measure of perceptual fidelity to the original?


A measure that has consistently high agreement when evaluated by humans, repeatable across laboratories. Image quality tests can be (and are) conducted in a structured fashion.


No such measure exists, at least not one that can’t be gamed.


No: if you allow / don't control for attempts to game it, then no such metric can be made. One attempt to counter this is to use multiple independent labs.


SSIM



It's not perfect but much better than nothing


Usually several test sets are created periodically to handle any type of situation, in a variety of resolutions, depending on which part of the codec people are working on. This process is very complicated and takes a lot of time, to ensure that all kinds of content are represented. The metric is not optimal for video content, but it gives an idea of whether they are going in the right direction. Ultimately, blind tests with people (experts and non-experts) are done on the content to add subjective measures to the objective ones.


This has been going on since the birth of video codecs. So for anyone who is new to this: the 50% is under a best-case scenario. They will compare at 4K resolution, where VVC has the highest advantage over HEVC, using measurements that fit those numbers (PSNR or SSIM).

For lower resolutions you will see lower percentages. You see the same claims from other codecs as well.


Those numbers are valid for a specific PSNR... but marketing and PR obviously prefer to skip that point.

For comparison, HEVC claimed 50 to 60% compared to AVC. You can compare with reality...


PSNR is a terrible way to measure video quality. SSIM is better but not great. VMAF is good, but it's skewed towards the types of material Netflix has.

Objective video quality measurement is a tricky thing to do, and you can easily come out with a codec that's great at fine detail on foreground people but terrible at high-speed panning over water and trees. Where you want the quality to go depends on the material.
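
For the curious, here is a minimal numpy sketch of what PSNR and a single-window SSIM boil down to. Real implementations (ffmpeg's psnr/ssim filters, Netflix's VMAF) are far more elaborate; the function names and the whole-frame SSIM shortcut here are just for illustration.

    import numpy as np

    def psnr(ref, dist, peak=255.0):
        # Mean squared error between reference and distorted frames,
        # expressed in decibels relative to the peak signal value.
        mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
        return float('inf') if mse == 0 else 10 * np.log10(peak ** 2 / mse)

    def global_ssim(ref, dist, peak=255.0):
        # Single-window SSIM over the whole frame; production SSIM slides a
        # small (often Gaussian) window across the image and averages the
        # local scores, which is what makes it sensitive to local structure.
        x, y = ref.astype(np.float64), dist.astype(np.float64)
        c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
        mx, my = x.mean(), y.mean()
        cov = ((x - mx) * (y - my)).mean()
        return ((2 * mx * my + c1) * (2 * cov + c2)) / (
            (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2))

Neither metric knows anything about motion or where a viewer actually looks, which is part of why codec comparisons built only on them are easy to game.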


Nobody mentioning EVC? Worth a read for anyone concerned about patent licensing:

https://en.wikipedia.org/wiki/Essential_Video_Coding

There are 3 video coding formats expected out of (former) MPEG this year:

https://www.streamingmedia.com/Articles/Editorial/Featured-A...

So this isn't necessarily the successor to HEVC (except that it is, in terms of development and licensing methods).


> A uniform and transparent licensing model based on the FRAND principle (i.e., fair, reasonable, and non-discriminatory) is planned to be established for the use of standard essential patents related to H.266/VVC.

Maybe. On the other hand, maybe not. Leonardo Chiariglione, founder and chairman of MPEG, thinks MPEG has for all practical purposes ceased to be:

https://blog.chiariglione.org/a-future-without-mpeg/

The disorganised and fractured licensing around HEVC contributed to that. And, so far, VVC's licensing looks like it's headed down the same path as HEVC.

Maybe AV1's simple, royalty-free licensing will motivate them to get their act together with VVC licensing.


Shouldn't deep learning based video codecs take over dedicated hardware video decoders as more tensor cores become available in all new hardware?

NVIDIA's DLSS 2.0 supersampling is already moving into that direction.


Instead of a video file or stream, that would be more like shipping a program that recreates the video. It might be cool, but it's not really feasible to play back that kind of thing on normal TV hardware.


I'm not sure what you mean. There are already multiple research articles that show that deep neural network based video compression can be competitive, here's an example:

https://papers.nips.cc/paper/9127-deep-generative-video-comp...


> it's not really feasible

How do you know this?

TV hardware is on par with browsers. Anything is a program.


It's surprisingly difficult to guarantee high quality on every kind of video and format using a neural network. Furthermore, the network has to be able to handle all the corner cases (think about color profiles alone...).


The fact that the underground scene is still pumping out 264 instead of 265 (I'd estimate a 90/10 split, optimistically) tells me the real world is not quite ready for 266.

So I guess it comes down to 266 hw support. Or powerful CPUs that can push sw decoding?


What I don't understand is why international standardization organizations allow patent-encumbered technologies to become de jure standards.

MPEG, WiFi, GSM…

IMHO, international standards must be implementable without any patent fees, or they are very bad standards.


There's no law requiring wifi - "de facto". And they're standards because they're quite good! They have hardware support and parallelization and account for all use cases, even the marginal ones, and have reference implementations and support. Standards orgs don't care about patents because they're not relevant. This isn't a case of trolling - this is literally a software patent being used for its intended purpose by its developer, to extract profit by coming up with a new idea, and letting others use it.


International standardization organizations (like ISO) are not governmental organizations. They are private entities, which sometimes become too involved in official standards. But they are controlled by whoever funds them.


Standards organisations are older than publicly-available software. The concept of "reasonable and non-discriminatory" patent licensing was what they went with, and it seemed sensible at a time when goods were physical and the idea of giving away a product for which standards would be relevant would be ludicrous.


A quote from an Ecma presentation

    "ECMA for instance has made all the standards for DVD and optical disks. There were 5 recording formats. So there you are a little bit uneasy, of course. And again after a few beers I can ask the people in the room. Why do you want to have 5 formats? Do you still call that standardization? The answer is always the same: You are well paid. Shut up"
https://youtu.be/wITyO71Et6g?t=226


H.265 went absolutely nowhere


I'm no expert when it comes to video codecs, but I'm surprised that we're still seeing such strong claims of algorithmic improvement over h264, and now over h265. I'm also aware of how patent-encumbered this whole field is, and I suspect this is just a money grab.

This is really just a press release; what's actually new? Can it be implemented efficiently in hardware?


Your skepticism is very healthy, especially in this arena. With video codecs, information theory is ultimately the devil you must answer to at the end of the day. No amount of patents, specifications or algorithmic fantasy can get you away from fundamental constraints.

It seems like the major trade-off being taken right now is along the lines of using more memory to buffer additional frames. This can help you in certain scenarios, but in the general case you can never guarantee that a prior frame of video has any bearing on future frames; it is just exceedingly likely that most frames of video look much like prior ones. So you can certainly play this game to a point, but you will quickly find yourself on the other end of the bell curve.
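
To make that concrete, here is a toy exhaustive block-matching search in numpy, roughly the kind of interframe prediction being described. It is nothing like the optimized motion search a real encoder uses; the block size, search radius and function name are made up for the sketch.

    import numpy as np

    def best_match(prev, cur, by, bx, bsize=16, radius=8):
        # Exhaustive search: find the block in the previous frame, within
        # +/- radius pixels, that best predicts the current 16x16 block
        # (lowest sum of absolute differences).
        block = cur[by:by + bsize, bx:bx + bsize].astype(np.int32)
        best = (0, 0, np.inf)
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = by + dy, bx + dx
                if y < 0 or x < 0 or y + bsize > prev.shape[0] or x + bsize > prev.shape[1]:
                    continue
                cand = prev[y:y + bsize, x:x + bsize].astype(np.int32)
                sad = int(np.abs(block - cand).sum())
                if sad < best[2]:
                    best = (dy, dx, sad)
        return best  # (dy, dx) motion vector and its SAD

When the scene is static or pans smoothly, the SAD is tiny and the encoder only has to code a motion vector plus a small residual; film confetti or sensor noise and every block's best SAD stays large, so the buffered frames buy you almost nothing.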

You can also play games with ML, but I argue that you are going even further from the fundamental "truth" of your source data with this kind of technique, even if it appears to be a better aesthetic result in isolation of any other concern.

There are also lots of one-off edge cases that have always been impossible to address with any interframe video compression scheme. Just look at the slow-mo guys on YouTube dumping confetti in front of a 4K camera. No algorithm except the dumbest intraframe techniques (i.e. JPEG-style coding) can faithfully reproduce scenes with information this dense, and usually only at the expense of dramatic bandwidth increases.

Bandwidth is cheap and ubiquitous. I say we just use the algorithms that are the fastest and most efficient for our devices. We aren't in 2010 sucking 3G or EDGE through a straw anymore. Most people can get 20+ Mbps on their smartphones in decently populated areas.


The advantage of the H-series of codecs is strong support for hardware implementation. This has been a selling point since H.262. You can get an H.265 IP core from Xilinx, Intel, and other major vendors, so the actual runtime cost for H.266 (once a core is available) will be very low and constant (and comparable to current codecs). Bandwidth and storage space are real costs, despite the handwaving around them, and reducing those requirements without reducing visual quality is an important step.

As for "information-dense scenes": pathological cases such as the HBO intro screen are encoded by modern codecs as noise and regenerated client-side, because there's no actual information there. These scenes are either engineered or pure noise.


That sounds great, but this is a press release with no real technical details. Can anyone in the know add some context? For instance, what's the trade-off? I assume more CPU?

WebRTC-based video chats are all still using h264; did they not adopt 265 for technical or licensing reasons? What is the likelihood of broad browser support for h266 anytime soon?


H.265/HEVC takes about ten times as much computation to encode as H.264 [1], so H.264 still has legitimate technical use cases, even with licensing/patents aside.

This makes it great for a company like Netflix or YouTube, but less good for one-to-one and/or battery sensitive use cases like video calls. However, specialized chips help, and some mobile devices can record in HEVC in real time (mine from 2019 can). I believe current smartphones have HEVC encoding hardware, but I'm struggling to find a source for that right now.

I haven't seen the details of this new codec yet, but it's quite possible it also has a large encoding cost which will make it better suited to particular use cases, as opposed to a blanket upgrade.

[1] http://www.praim.com/en/news/advanced-protocols-h264-vs-h265...


iPhone 7 onwards[1], Qualcomm Snapdragon 610 onwards[2], and Intel Skylake and later CPUs[3] can all encode and decode H.265 in hardware to varying profile levels.

1: https://support.apple.com/en-gb/HT207022

2: https://www.qualcomm.com/snapdragon/processors/comparison

3: https://trac.ffmpeg.org/wiki/Hardware/QuickSync


QuickSync is actually a feature of Intel's integrated GPU. Parts without GPUs, no matter how recent, don't have QuickSync.


> webrtc based video chats are all still using h264, did they not adopt 265 yet for technical or licensing reasons?

Is that with x265 built into both browsers? I build it into mine, but I don't think it is the default for ffmpeg.


Not sure why this was downvoted. These all seem like very reasonable questions that others here might be able to answer.


WebRTC aims to adopt AV1. H.### codecs are a dead end.


Huh. I wonder how encoding speeds compare. I rarely chose h265 over h264 because similar levels of visual quality took massively more time.


My guess is, encoding speed will be worse. Video codecs for non-realtime applications are optimized for size and acceptably cheap playback. Encoding performance doesn't really matter since you encode only once but play and store more often.


But for most situations you encode once and play multiple times. Wouldn't it be better to reduce storage and bandwidth costs with a smaller file (assuming the same quality)?


It's a trade-off. When I have a batch of 40 videos, encoding h264 at 20 minutes per video versus h265 at 4 hours per video means the difference between 13 days and 160 days.

The latter isn't practical, I'll eat the couple hundred MB in order to save a lot of time.


I think by "13 days and 160 days" you mean "13 hours and 160 hours".


At some point the compressed version of "Joker" will be 45 chars:

"Sequel to Dark Night starring Joaquin Phoenix"

Of course, we will not have to film movies in the first place then. We will just put a description into a compressor and start watching.


8 MiB Shrek is kind of an AV1 meme at this point.


H.265 is still not mainstream, and not used to the full extent of its capabilities.

I'm not sure 265 is worth spending effort on now, when 266 is about to crash the party and will be adopted at least "equally poorly".


H.265 seems pretty mainstream by now. Older devices obviously don't support it in hardware, but pretty much all newer ones seem to, no?

It's just a slow percolation throughout the ecosystem as people buy new hardware and video servers selectively send the next-generation streams to those users.

The effort on h.265 has already been spent. Now it looks like h.266 is the next generation. It's going to be years before chips for it will be in devices. That's just how each new generation works.


H.266 will take many years to become a usable standard. H.265 is quite mainstream already: even some cheap smartphones shoot it, and many modern DSLRs / mirrorless cameras shoot it.


H.265 seems to be gaining traction more slowly because many older devices, including laptops and some smart TVs, don't support it. H.264 became ubiquitous for piracy since it offered small file sizes and worked on older devices, making it the perfect choice for those in poorer countries where tech isn't the first priority in a household. I wonder if H.266 will run into the same problems as H.265.


Question is, how does it compare to AV1?


I guess only time will tell.

AV1 is supposed to be 30% better than HEVC, and they claim H.266 is 50% better than HEVC. That would make H.266 roughly 30% better than AV1. By "better" I'm always referring to the bandwidth/space needed.

But take this with more than a grain of salt, since bandwidth/space is only one of many things that matter, and these comparisons also depend on many factors such as resolution, material (animated/real), etc.


I don't think your math adds up. Is 150 30% better than 130? It is only 14% better.

Regardless, these early performance claims are most likely complete bullshit.


It's about size: 50 is about 30% smaller than 70, which is 30% smaller than 100.
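
A quick sanity check of that compounding, treating the quoted percentages as bitrate reductions at equal quality (the round numbers are the marketing claims, not measurements):

    hevc = 100.0                 # reference bitrate
    av1 = hevc * (1 - 0.30)      # "30% better than HEVC" -> 70
    vvc = hevc * (1 - 0.50)      # "50% better than HEVC" -> 50
    print(vvc / av1)             # ~0.71, i.e. VVC ~29% smaller than AV1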


> AV1 is supposed to be 30% better than HEVC

Source? If I recall correctly, HEVC outperforms AV1.


It is mildly amusing that the very simple vector art "VVC" logo on their webpage is displayed by sending the viewer a 711 KB .jpg file.


Will it be used? It's probably the last one that does not use some sort of AI compression. See this for image compression: https://hific.github.io/ In the next 10 years AI compression will be everywhere. The problem will be standardisation. Classic compression algorithms can't beat AI ones.


AI compression is super, super cool... but while standardization is certainly a major issue, isn't the model size a much larger one?

Given that model sizes for decoding seem like they'll be on the order of many gigabytes, it will be impossible to run AI decompression in software; it will need chips, and chips that are a lot more complex (expensive?) than today's.

I think AI compression has a good chance of coming eventually, but in 10 years it will still be in research labs. There is absolutely no way it will have made it into consumer chips by then.


"Isn't the model size a much larger one?" yap It will probably be different, and systems will have to download the weights and network model, as new models come in, I don't think that we will have a fixed model with fixed weights, the evolution is too fast. Decoding will take place using the AI chip on the device aka "AI accelerator"


I wonder how small one of those 700 MB DivX/Xvid movies would be if compressed with this new encoding method.


They talk about saving 50% of bits over h.265, but also talk about it being designed especially for 4K/8K video.

Are normal 1080p videos going to see this fabled 50% savings over h.265? Or is the 50% only for 4K/8K, while 1080p gets maybe only 10-20% savings?

The press release unfortunately seems rather ambiguous about this.


Among other things, I have worked with and developed technology in the uncompressed professional imaging domain for decades. One of the things I always watch out for is precisely the terminology and language used in this release:

"for equal perceptual quality"

Put a different way: We can fool your eyes/brain into thinking you are looking at the same images.

For most consumer use cases where the objective is to view images --rather than process them-- this is fine. The human vision system (HVS, eyes + brain processing) is tolerant of and can handle lots of missing or distorted data. However, the minute you get into having to process the images in hardware or software things can change radically.

Take, as an example, color sub-sampling. You start with a camera with three distinct sensors. Each sensor has a full frame color filter. They are optically coupled to see the same image through a prism. This means you sample the red, green and blue portions of the visible spectrum at full spatial resolution. If we are talking about a 1K x 1K image, you are capturing one million pixels of each, red, green and blue.

BTW, I am using "1K" to mean one thousand, not 1024.

Such a camera is very expensive and impractical for consumer applications. Enter the Bayer filter [0].

You can now use a single sensor to capture all three color components. However, instead of having one million samples for each component, you have 250K red, 500K green and 250K blue. That's still a million samples total (the resolution of the sensor), yet you've sliced it up into three components.

This can be reconstructed into a full one million samples per color component through various techniques, one of them being the use of polyphase FIR (Finite Impulse Response) filters looking across a range of samples. Generally speaking, the wider the filter the better the results; however, you'll always have issues around the edges of the image. There are also more sophisticated solutions that apply FIR filters diagonally as well as temporally (using multiple frames).

You are essentially trying to reconstruct the original image by guessing or calculating the missing samples. By doing so you introduce spatial (and even temporal) frequency domain issues that would not have been present in the case of a fully sampled (3 sensor) capture system.
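
As a rough illustration of that guessing step, here is a naive bilinear demosaic of an RGGB mosaic using numpy/scipy. It is far cruder than the polyphase FIR approaches described above, and the layout assumptions and edge handling are simplified for the sketch.

    import numpy as np
    from scipy.signal import convolve2d

    def demosaic_bilinear(raw):
        # raw: 2-D mosaic from an RGGB Bayer sensor (R at (0,0), B at (1,1)).
        # Each channel keeps only its own sparse samples; the missing sites
        # are filled with the average of the known neighbours in a 3x3
        # window -- the crudest possible version of the reconstruction.
        h, w = raw.shape
        rgb = np.zeros((h, w, 3))
        mask = np.zeros((h, w, 3))
        rgb[0::2, 0::2, 0] = raw[0::2, 0::2]; mask[0::2, 0::2, 0] = 1  # red
        rgb[0::2, 1::2, 1] = raw[0::2, 1::2]; mask[0::2, 1::2, 1] = 1  # green
        rgb[1::2, 0::2, 1] = raw[1::2, 0::2]; mask[1::2, 0::2, 1] = 1  # green
        rgb[1::2, 1::2, 2] = raw[1::2, 1::2]; mask[1::2, 1::2, 2] = 1  # blue
        k = np.ones((3, 3))
        out = np.empty_like(rgb)
        for c in range(3):
            num = convolve2d(rgb[:, :, c], k, mode='same')
            den = convolve2d(mask[:, :, c], k, mode='same')
            filled = num / np.maximum(den, 1)
            # Keep the real samples, interpolate only the missing ones.
            out[:, :, c] = np.where(mask[:, :, c] == 1, rgb[:, :, c], filled)
        return out

Every interpolated sample is an educated guess, which is where the spatial (and temporal) frequency-domain issues described above creep in.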

In a typical transmission chain the reconstructed RGB data is eventually encoded into the YCbCr color space [1]. I think of this as the first step in the perceptual "let's see what we can get away with" encoding process. YCbCr is about what the HVS sees. "Y" is the "luma", or intensity component. "Cb" and "Cr" are color difference samples for blue and red.

However, it doesn't stop there. The next step is to, again, subsample some of it in order to reduce data for encoding, compression, storage and transmission. This is where you get into the concept of chroma subsampling [2] and terminology such as 4:4:4, 4:2:2, etc.

Here, again, we reduce data by throwing away (well, not quite) color information. It turns out your brain can deal with irregularities in color far better than with irregularities in the luma, or intensity, portion of an image. And so "4:4:4" means we keep every sample of the YCbCr-encoded image, while "4:2:2" means we cut the Cb and Cr samples down by half.
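
A minimal sketch of those two steps, using the BT.601 matrix as one example (real pipelines add offsets, clamping, and proper filtering before subsampling; this assumes an even image width):

    import numpy as np

    def rgb_to_ycbcr_422(rgb):
        # rgb: float array of shape (h, w, 3) with values in [0, 1].
        r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
        # BT.601 luma weights; Cb/Cr are scaled blue/red difference signals.
        y = 0.299 * r + 0.587 * g + 0.114 * b
        cb = (b - y) * 0.564      # 0.5 / (1 - 0.114)
        cr = (r - y) * 0.713      # 0.5 / (1 - 0.299)
        # 4:2:2 -- keep every luma sample, but average each horizontal pair
        # of chroma samples, halving the chroma data (and its resolution).
        cb_422 = (cb[:, 0::2] + cb[:, 1::2]) / 2
        cr_422 = (cr[:, 0::2] + cr[:, 1::2]) / 2
        return y, cb_422, cr_422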

There's an additional step that encodes the image in a nonlinear fashion, which, again, is a perceptual trick. This is where Y' (Y prime), properly called "luma", is distinguished from linear-light "luminance". It turns out that your HVS is far more sensitive to minute detail in the low-lights (the darker portions of the image, say from 50% down to black) than in the highlights. You can have massive errors in the highlights and your HVS just won't see them, particularly if things are blended through wide FIR filters during display. [3]
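
A tiny numerical illustration of why the nonlinear encoding helps (the plain 1/2.2 power here is just a stand-in for a real transfer function): quantize a linear-light value to 8 bits directly and after gamma encoding, and compare the effective step size in a shadow tone versus a highlight.

    import numpy as np

    def step_size(encode, decode, level):
        # Size, in linear light, of one 8-bit code step around 'level'.
        code = np.round(encode(level) * 255)
        return decode((code + 1) / 255) - decode(code / 255)

    linear = (lambda x: x, lambda x: x)
    gamma = (lambda x: x ** (1 / 2.2), lambda x: x ** 2.2)

    for name, (enc, dec) in [("linear", linear), ("gamma", gamma)]:
        print(name, step_size(enc, dec, 0.02), step_size(enc, dec, 0.8))
        # Gamma encoding gives roughly 4x finer steps in the shadows and
        # coarser ones in the highlights, matching what the HVS cares about.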

Throughout this chain of optical and mathematical wrangling you are highly dependent on the accuracy of each step in the process. How much distortion is introduced depends on a range of factors, not the least of which is the way math is done in software or chips that touch every single sample's data. With so much math in the processing chain you have to be extremely careful about not introducing errors by truncation or rounding.

We then introduce compression algorithms. In the case of motion video they will typically compress a reference frame as a still and then encode the difference with respect to that frame for subsequent frames. They divide an image into blocks of pixels and then spatially process these blocks to develop a dictionary of blocks to store, transmit, etc.

The key technology in compression is the Discrete Cosine Transform (DCT) [4]. This bit of math transforms the image from the spatial domain to the frequency domain. Once again, we are trying to trick the eye: reduce information the HVS might not perceive. We are not as sensitive to fine detail (high spatial frequencies), which means it's safe to remove some of it. That's what DCT-based coding is about.
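
In sketch form, that amounts to projecting each 8x8 block onto cosine basis functions and then quantizing the higher-frequency coefficients more coarsely. The quantization rule below is a made-up illustration, not any codec's actual table.

    import numpy as np

    N = 8
    # Orthonormal DCT-II basis: C @ block @ C.T turns an 8x8 block of pixels
    # into 8x8 spatial-frequency coefficients; C.T @ coeffs @ C inverts it.
    n = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    C[0, :] /= np.sqrt(2.0)

    def dct_quantize_roundtrip(block, coarseness=4.0):
        coeffs = C @ (block - 128.0) @ C.T            # forward 2-D DCT
        # Illustrative quantization: the step grows with frequency, so fine
        # detail (high-frequency coefficients) is rounded away most heavily.
        step = 1.0 + coarseness * (n[:, None] + n[None, :])
        q = np.round(coeffs / step)
        return C.T @ (q * step) @ C + 128.0           # decoder's reconstruction

The difference between block and dct_quantize_roundtrip(block) is exactly the detail the codec has decided your eyes will not miss.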

So, we started with a 3-sensor full-sampling camera, reduced it to a single sensor, and threw away 75% of the red samples, 50% of the green samples and 75% of the blue samples. We then reconstruct the full RGB data mathematically, perceptually encode it to YCbCr, apply gamma encoding if necessary, apply the DCT to reduce high-frequency information based on agreed-upon perceptual thresholds, and then store and transmit the final result. For display on an RGB display we reverse the process. Errors are introduced every step of the way, the hope and objective being to trick the HVS into seeing an acceptable image.

All of this is great for watching a movie or a TikTok video. However, when you work in machine vision or any domain that requires high quality image data, the issues with the processing chain presented above can introduce problems with consequences ranging from the introduction of errors (Was that a truck in front of our self driving car or something else?) to making it impossible to make valid use of the images (Is that a tumor or healthy tissue?).

While H.266 sounds fantastic for TikTok or Netflix, I fear that the constant effort to find creative ways to trick the HVS might introduce issues in machine vision, machine learning and AI that most in the field will not realize. Unless someone has a reasonable depth of expertise in imaging they might very well assume the technology they are using is perfectly adequate for the task. Imagine developing a training data set consisting of millions of images without understanding the images have "processing damage" because of the way they were acquired and processed before they even saw their first learning algorithm.

Having worked in this field for quite some time --not many people take a 20x magnifying lens to pixels on a display to see what the processing is doing to the image-- I am concerned about the divergence between HVS trickery, which, again, is fine for TikTok and Netflix and MV/ML/AI. A while ago there was a discussion on HN about ML misclassification of people of color. While I haven't looked into this in detail, I am convinced, based on experience, that the numerical HVS trickery I describe above has something to do with this problem. If you train models with distorted data you have to expect errors in classification. As they say, garbage-in, garbage-out.

Nothing wrong with H.266, it sounds fantastic. However, I think MV/ML/AI practitioners need to be deeply aware of what data they are working with and how it got to their neural network. It is for this reason that we've avoided using off-the-shelf image processing chips to the extent possible. When you use an FPGA to process images with your own processing chain you are in control of what happens to every single pixel's data and, more importantly, you can qualify and quantify any errors that might be introduced in the chain.

[0] https://en.wikipedia.org/wiki/Bayer_filter

[1] https://en.wikipedia.org/wiki/YCbCr

[2] https://en.wikipedia.org/wiki/Chroma_subsampling

[3] https://en.wikipedia.org/wiki/Gamma_correction

[4] https://www.youtube.com/watch?v=P7abyWT4dss


Don't you think that the iterated convolution process in neural networks is, to a degree, able to overlook this kind of 'visual trickery'? I can imagine that the network is not able to perform well if you change the color profile of the input relative to the one you trained on, but small texture attenuations, diminished chroma components, etc. may not be as important when the image is downsampled and split a huge number of times (wondering)


Here's one way to look at it: Contrast and well defined edges can be important in feature extraction. Our vision system, on the other hand, can do just fine with less information in the high frequencies (where edges live).
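
As a toy illustration (not anything from a real machine-vision pipeline), a Sobel gradient, one of the most common first steps in feature extraction, responds exactly to the high-frequency content that perceptual codecs are allowed to throw away:

    import numpy as np
    from scipy.signal import convolve2d

    def edge_energy(img):
        # Mean Sobel gradient magnitude: a crude proxy for the edge features
        # a machine-vision pipeline might extract. Smoothing away high
        # frequencies (as perceptual compression tends to) lowers this value
        # even when the image still "looks the same" to a human.
        kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
        gx = convolve2d(img, kx, mode='same')
        gy = convolve2d(img, kx.T, mode='same')
        return np.hypot(gx, gy).mean()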


I see your point, and I'm not particularly defending neural networks, but IMO nothing prevents a network from generating a kernel able to detect 'fuzzy edges' and refining it to an edge after some convolutions. So if the input images are always consistent among themselves and with the input images used for inference, I think the problem may be diminished (?), even if, as you say, we introduce some misclassification error. Obviously, guaranteeing that all the input images are generated in the same way is a very strong condition that is difficult to achieve.


From my perspective, the only way to get there is if AI practitioners make a paradigm shift towards encoding understanding rather than making classifier systems trained with massive data sets. The classification approach has a very real asymptotic limit on what can be achieved. You can train NN's using large data sets on some domains but not all domains. Just think about what a dog can do, even just a puppy. We are nowhere near to that. Not even close. This is because our AI classifies without understanding.

I have books on AI that are thirty years old. I think I can say they cover somewhere between 80% and 90% (if not more) of what AI is today. The difference is computing that is thousands, millions, of times faster, massive amounts of storage, etc. In other words, one could very well argue we haven't done much in 30 years other than build faster computers.


I don't think that neural networks are the right framework to achieve general purpose artificial intelligence (AGI). And indeed, the AI field may need a paradigm shift to achieve higher classification goals. I believe that probabilistic neural networks may be an interesting extension toward general purpose AI, even though this kind of networks need even more data.

If we take the example of a puppy, it seems to generalize pretty well using something like one-shot learning, but is it? I cannot confirm for sure how much data a puppy has already digested before being able to do what we could call "one shot learning". So maybe, the exposure to data is already there, waiting for a specialization toward a particular task.

Giving a network the ability to be probabilistic enables it to do inference under uncertainty, which is clearly a neat feature when you are gravitating toward AGI for scene understanding.

In the case of video compression, scene understanding may introduce more artifacts IMO: even if the scene is captured with high-end cameras, at the pixel level the edges will never be perfectly neat. I think this will decrease the ability of any network to "understand" which object is at the edges; this results in low classification rates on them, resulting in bad compression/decompression quality (?) for features that are important to the human eye.

All in all, I'm not sure that NNs are the right tool for this kind of problem. But we are diverging from the main subject, VVC. Thanks for the very interesting comments :)


Really informative comment.


50% is very impressive. It's not just a gold rush of low-hanging fruit anymore; they did real work and created real benefits. I'm willing to pay a little tax on my devices or software for this.


My 2012 Mac Mini has quickly become much less useful since YouTube switched from H.264 (AVC) to VP9 for videos larger than 1080p a couple of years ago (Apple devices have hardware decoders). I've tested 4K h.264 videos and they play wonderfully thanks to the hardware.

My internet connection speeds and hard drive space have increased much faster than my CPU speeds (internet being basically a free upgrade).

So I don't appreciate new codecs coming out and obsoleting my hardware to save companies a few cents on bandwidth. H.264 got a good run in, but there isn't a "universal" replacement for it where I can buy hardware with decoding support that will work for at least 5-10 years.


Honestly, the expectation that a 2012 computer will play 4K video seems a little unreasonable, no? 4K video virtually didn't even exist back then. I'm actually amazed it even handles it in h.264.

This isn't about saving companies a few cents on bandwidth. It's about halving internet traffic, about doubling the number of videos you can store on your phone. That's pretty huge. You can still get h.264 video in 1080p on YouTube so your computer is still meeting the expectations it was manufactured for.


It's not so much about it being able to handle 4K; YouTube already uses too low a bitrate for 1080p (resulting in MPEG artifacts, color banding, etc.), so I like to watch at 2.7K or 4K downsampled, since at least I get a higher bitrate.

The bigger problem that I didn't mention is videoconferencing: FaceTime is hardware-accelerated and has no issues with 720p, but anything WebRTC seems to prefer the VP8 or VP9 codecs, which fail on my Mini and strain my 2015 MBP. Feels like a waste of perfectly good hardware to me.


Maybe you can force the YouTube web app to use the H.264 version?

https://github.com/erkserkserks/h264ify

Or is there no longer an H.264 4k version?


I have that installed and it works really well! But H.264 only goes up to 1080p on YouTube as of a couple of years ago. As mentioned in other comments, I prefer 2.7K or 4K and downsample to get higher bitrate, but this isn't possible without downloading and converting. (I only do that for the rare video where I really want the best quality; most of the time I tolerate the YouTube 1080p with h264ify).


Your 2012 Mac Mini's video output doesn't go higher than 2560x1600 anyway, so just staying at 1080p doesn't seem like a huge sacrifice.


True. I just addressed this in a reply above this one as well: Youtube already has too low of a bitrate for 1080p (resulting in MPEG artifacts, color banding, etc). So I like to watch in 2.7K or 4K downsampled, since at least I get a higher bitrate.

Can't do that anymore without either buying a new computer or using youtube-dl and recompressing the bigger version (which takes hours for minutes of video on my poor machine).


I first got into computers in the mid 90s. Back then, clock speeds were doubling about every 2 years. Combined with architecture improvements with each CPU generation, it really meant that computers were almost completely obsolete in less than 5 years, as new hardware would literally be over 4 times faster in all applications. So with this in mind, I find it puzzling that you'd think 8 year old hardware should still run today's software and algorithms.

But FWIW, H.266 isn't going to be in any sort of wide use for a few years. Buy something that supports H.265 and you'll probably be good for at least 5 years.


>So I don't appreciate new codecs coming out and obsoleting my hardware

You could say something similar about any other technological advance.


100%, I was worried I'd seem like an old 35-year-old man when I posted my comment. And it's true, I upgrade my hardware much less frequently than I did when I was younger.

I don't mind replacing a machine after 8 years of service, but h.265 still isn't supported by Google/YouTube, and Apple refuses to add hardware decoding for VP8/VP9, so there's no universal codec that will work as efficiently as h.264 did on my Mini and multitude of MacBooks and iPhones all this time.


Yes, there is one, it's called AVC.


Download the format you need using youtube-dl


I'd rather have slightly larger files (i.e. h264) that don't require hardware acceleration available only on modern CPUs to decode without dying. Streaming is creating incentives for bad video codecs that only do one thing well: stream. Other aspects are neglected.

And it's not like any actual 4K content (besides porn, real, nature, or otherwise) actually exists. Broadcast and movie media is done in 2K then extrapolated and scaled to "4K" for streaming services.


Huh? TV and movies are widely shot with 4K cameras these days.

What is 2K? I've never even heard of a "2K" camera. Where did you get the idea things are being filmed in "2K" and being scaled to 4K?

Genuinely curious where you're getting this information from. Or are you confused because 1080p refers to the vertical resolution while 4K refers to the horizontal resolution?


https://www.engadget.com/2019-06-19-upscaled-uhd-4k-digital-... is one easily found example but it wasn't where I had read it. I'm pretty sure I've seen it on HN itself.

edit: here's another https://old.reddit.com/r/cordcutters/comments/9x3v4e/just_le...


OK, so by 2K you mean 1080p. That's a very unusual nomenclature but I see what you mean, thanks.

The top link in the reddit thread disproves what you're saying though:

https://4kmedia.org/real-or-fake-4k/

Somewhere between a third and a half of films are listed as "real 4K".

So there is actually tons of real 4K content. (And the list is just films -- there are plenty of streaming TV shows in real 4K too, like Mrs Maisel.)

There might be another reason for the misperception -- it's true that film editing is generally done in something lower-quality like compressed 1080p, but that's just for speed/space while you work. All the clips "point" to the 4K originals, so when the final master is produced, it's still produced out of that "real" 4K.


BTW: I really dislike calling a horizontal resolution of 2660px "2K". It's even closer to 3K than 2K; it should be called 2.5K.


Will devices need new hardware? Also, I thought companies were all on board with royalty-free VP9.


So how does it achieve this compression from a laypersons perspective?


How many weeks does it take to encode a 1-minute video on an average (non-gaming, I mean without a huge fancy GPU card or an i9/Threadripper CPU) PC?


Anyone know how this compares to AV1?


It will be first adopted by pirates for sure


H264 seems to still be the preferred codec in this space, even though H265 gives smaller file sizes.

That's largely due to the CPU overhead of H265, though I am not sure why more people do not use GPU encoding over CPU encoding; I have never been able to notice the difference visually.


From what I can tell from a cursory look at some popular tv show torrents, everything 1080p and below is still h.264, with everything above that running on h.265


Nvidia's NVENC doesn't support CRF, one of the more popular methods of rate control during encoding.


NVENC is too low quality for the scene.


Not for 4K rips. I see tons of x265 10 bit encodes, just look for UHD or 2160p copies.


With its patenting, Fraunhofer has probably done more harm than good to humanity. At least its software division has.


Another patent encumbered monstrosity? No, thanks. Enough of this junk. Some just never learn.



