H.266/Versatile Video Coding (VVC) (fraunhofer.de)
463 points by caution on July 6, 2020 | 426 comments


It's interesting that they are able to continue improving video compression. You'd think that it would have all been figured out by now.

Is this continued improvement related to the improvement of technology? Or just coincidental?

Like, why couldn't H.266 have been invented 30 years ago? Is it because the computers back in the day wouldn't have been fast enough to realistically use it?

Do we have algorithms today that can compress way better but would be too slow to encode/decode?


Video compression is a calculus of IO capacity, memory, and algorithmic complexity. Take the MPEG-1 codec, for instance: it was new about 30 years ago. While today most people think of MPEG-1 videos as low quality, the spec provides the ability to handle bit rates up to 100Mb/s and resolutions up to 4095x4095. That was way higher than the hardware of the time supported.

One of MPEG-1's design goals was to get VHS-quality video at a bitrate that could stream over T1/E1 lines or 1x CD-ROMs. The limit on bitrate led to increased algorithmic complexity. It was well into the Pentium/PowerPC era before desktop systems could play back VCD-quality MPEG-1 video in software.

Later MPEG codecs increased their algorithmic complexity to squeeze better quality video into low bit rates. A lot of those features existed on paper 20-30 years ago but weren't practical on hardware of the time, even custom ASICs. Even within a spec features are bound to profiles so a file/stream can be handled by less capable decoders/hardware.

There's plenty of video codecs or settings for them that can choke modern hardware. It also depends on what you mean by "modern hardware". There's codecs/configurations a Threadripper with 64GB of RAM in a mains powered jet engine sounding desktop could handle in software that would kill a Snapdragon with 6GB of RAM in a phone. There's also codecs/configurations the Snapdragon in the phone could play using hardware acceleration that would choke a low powered Celeron or Atom decoding in software.


Are there codecs that require high compute (Threadripper) for encode but can be easily decoded on a Snapdragon?


Yes — many codecs can be optimized for decoding at the expense of encoding. This is appropriate for any sort of broadcast (YouTube, television, etc).

Also, in many applications, it’s suitable to exchange time for memory / compute. You can spend an hour of compute time optimally encoding a 20-minute YouTube video, with no real downside.
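To make that trade concrete, here's a minimal sketch of a deliberately slow, two-pass encode driven from Python (assuming an ffmpeg build with libx264; the filenames and bitrate are placeholders): the encode can take far longer than the clip's runtime, but the resulting file decodes cheaply on just about anything.

    import subprocess

    SRC, OUT, BITRATE = "talk.mp4", "talk_x264.mp4", "3000k"

    # Pass 1: analyze the whole clip and write rate-control statistics; the
    # encoded output itself is thrown away.
    subprocess.run([
        "ffmpeg", "-y", "-i", SRC,
        "-c:v", "libx264", "-preset", "veryslow", "-b:v", BITRATE,
        "-pass", "1", "-an", "-f", "null", "-",
    ], check=True)

    # Pass 2: re-encode using those statistics so the bit budget goes where
    # the content needs it most.
    subprocess.run([
        "ffmpeg", "-y", "-i", SRC,
        "-c:v", "libx264", "-preset", "veryslow", "-b:v", BITRATE,
        "-pass", "2", "-c:a", "copy", OUT,
    ], check=True)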

Neither of these approaches are suitable for things like video conferencing, where there is a small number of receivers for each encoded stream and latency is critical. At 60fps, you have less than 17ms to encode each frame.

Interestingly, for a while, real-time encoders were going in a massively parallel direction, in which an ASIC chopped up a frame and encoded different regions in parallel. This was a useful optimization for a while, but now, common GPUs can handle encoding an entire 1080p frame (and sometimes even 4K) within that 17ms budget. Encoding the whole frame at once is way simpler from an engineering standpoint, and you can get better compression and / or fewer artifacts since the algorithm can take into account all the frame data rather than just chopped up bits.


Surely videoconferencing doesn’t actually use 60 FPS...


Why not? It's not full-motion video, it's literally talking heads. Talking heads are easy to push to 60fps on a relatively low bitrate.


Some web conferencing would want to do 60fps. There's also realtime streaming like Twitch, PS Now, and Google's Stadia.


Twitch isn't real time.


Yes it is. The delay on a Twitch stream doesn't mean they don't have to deal with encoding and transmitting frames at full speed. If Twitch wasn't real time, you'd only be able to watch live streams slowed down!


That's a different definition than most people mean when they say "real-time".


Not me. All realtime systems have some latency, but what makes them realtime is that they must maintain throughput, processing data as quickly as it comes in. You can subdivide to hard-realtime and soft-realtime depending on how strict your latency requirements are, but it is still realtime.


really? I think not. Let's use speech synthesis as an example: I would call speech synthesis real time if it takes less than one second to produce one second of synthesized speech. I think you're probably thinking of the word "live". There's always going to be a small delay when re-encoding. Real time doesn't mean 0ms delay, that's impossible. Twitch has a small delay, but it's still re-encoded in real time (encoding 1 second takes ≤ 1 second for Twitch).


The delay (configurable) isn't due to the encoder. And the encoder has to process everything in real time, otherwise you start skipping frames or fall behind.


Pretty much all of them. Encode complexity for most codecs is way higher than decode complexity (on purpose).

This has been an issue with AV1: it has relatively high decode complexity and there's not a lot of hardware acceleration available. The encode complexity is the real problem though; encoding is very slow even on very powerful hardware, less than 1fps, so ~30 hours to encode a one hour video. Even Intel's highly optimized AV1 encoder can't break 10fps (three hours to encode an hour of video) while their h.265 encoder can hit 300fps on the same hardware.


A lot of video codecs are NP hard to encode optimally, so rely on heuristics. So you could certainly say that some approaches take a lot of compute power to encode, but are much more easily decodable.


The codecs aren't NP hard. Rather, the "perfect" encode is. That's where the heuristics are coming into play. The codec just specifies what the stream can look like, the encoders have to pick out how to write that language and the decoders how to read it.

Decoders are relatively simple book keepers/transformers. Encoders are complex systems with tons of heuristics.

This is also why hardware decoders tend to be in everything and are relatively cheap with equal quality to software counterparts. On the flip side, hardware encoders are almost always worse than their software counterparts when it comes to the quality of the output (while being significantly faster).


> The codecs aren't NP hard. Rather, the "perfect" encode is.

That's what I meant by my first sentence.

And I'll throw out there that the vast majority of 'hardware codecs' are in fact software codecs running on a pretty general purpose DSP. You could absolutely reach the same quality as a high quality encoder given the right organizational impetus of the manufacturer; they simply are focused on reaching a specific real time bitrate for resolution rather than overall quality. By the time they've hit that, there's a new SoC with its own DSPs and its own Jira cards that needs attention. If these cores were more open, I'm sure you'd see less real time focused encoder software targeting them as well.


I wonder why all of the MPEG1 encoders of the day enforced a maximum of 320x240?


While the spec allowed for outrageous settings, playback wouldn't have been possible. Most hardware decoders were meant for (or derived from) VCD playback. The VCD spec covered CIF video, which meant QVGA would fall into the supported macroblock rate for hardware decoders.

In MPEG-1's heyday there wouldn't have been a lot of point in encoding presets producing content common hardware decoders couldn't handle.

There were several other video codecs in the same era that didn't have hardware decode requirements. Cinepak was widely used and could be readily played on a 68030, 486, and even CD-ROM game consoles. As I recall Cinepak encoders had more knobs and dials since the output didn't need to hit a hardware decoder limitation.


PAL/NTSC resolutions.


Here is a hint:

> Because H.266/VVC was developed with ultra-high-resolution video content in mind, the new standard is particularly beneficial when streaming 4K or 8K videos on a flat screen TV.

Compressing video is very different from gzipping a file. It's more about human perception than algorithms, really. The question is "what data can we delete without people noticing?", and it makes sense that answer is different for an 8k video than a 480p video.


So compressing a 1080p video with H266 will not result in similar file size/quality improvements as a 4k video? How much are we looking at for 1080p, 10%?


yup, that matches an example I remember (I read it through a link on HN but cannot find it in a quick search, I wish I could link it):

if you film (a still shot, no movement) a MacBook Pro from top to bottom in h264/MP4 at roughly 1024p resolution, and you also take a photo of it with the same camera,

the results will be shocking:

the 5-10 second video will take less storage than the single image. But when you inspect the video carefully you will see the tiny details are missing: the edges of the metal body are not as sharp, the gloss of the metal is a bit different, the tiny speaker holes above the keyboard are clear in the image and can be individually examined while in the video they are fuzzy and pixelated, and so on.

So, the end result: a 5 second video at tens of frames per second is smaller than a single image taken from the same camera.


You're thinking of "h264 is magic"

https://sidbala.com/h-264-is-magic/

GREAT article


yes, that is the article I was referring to.


Let's also be clear, the still image will be the full resolution of the sensor. The video taken on the same camera is usually a cropped section of the sensor. You're also comparing a spatial compression (still image) vs a temporal compression (video), and at what compression levels are each image taken?


Additionally the images are not very compressed (edit: you mentioned that in your comment, sorry). While the RAW files can be a couple dozen megabytes and the losslessly compressed PNGs are still 5-15 mb, good cameras normally set the JPEG quality factor to a high amount and so even with a JPEG you're getting pretty close to a lossless image. Whereas in video you can often plainly notice the compression artifacts & softness. A more fair comparison would be the video file to a JPEG with equivalent visual quality.


For more details, see https://sidbala.com/h-264-is-magic/

I know that's not a fair comparison, but imagine if clever compression hadn't been invented and you had to download terabytes of data to view a small movie.


What question are you trying to answer? Nobody asked what is compression and why do we use it. I have been encoding videos since VideoCDs were a thing, so I have a pretty good understanding of how compression works. The fact that I differentiated between spatial and temporal compression should have been a clue. All I was pointing out was that compressing a postage stamp sized video and comparing its filesize to a large megapixel image isn't a fair comparison. (yes, I'm jaded by calling 1080p frame size a postage stamp. I work in 4K and 8K resolutions all day.)


>How much are we looking at for 1080p, 10%?

We don't know yet. There are no public technical details (that I know of) for H266 yet, but if I recall H265 made the same 50% reduction in bandwidth claims, and for years people stuck with H264 because it was higher quality, dropping fewer of the subtle parts of the video you really want to see. Only in the last couple of years has H265 really started to become embraced and used by piracy groups. Frankly, I don't know what changed. I wouldn't be surprised if there was some sort of H265 feature addition that improved the codec.


H.265 was always better from a technical perspective but that's not everything which factors into a video codec decision. H.264 was supported everywhere, including hardware support on most platforms. You could generate one file and have it work with great experience everywhere, whereas switching to anything else likely required adding additional tools to your workflow and trying to balance bandwidth savings against both client compatibility and performance — if you guess wrong on the latter case, users notice their fans coming on / battery draining in the best case and frames dropping in the worst cases.

Encoder maturity is also a big factor: when H.265 first came out, people were comparing the very mature tools like x264 to the first software encoders which might have had bugs which could affect quality and were definitely less polished. It especially takes time for people to polish those tools and develop good settings for various content types.


> H.264 was supported everywhere

this. our TV (a few years old, "smart") can play videos from a network drive, but doesn't support H.265. reencoding a season's worth of episodes to H.264 takes a while...


H.264 will become the "mp3" of video I think. Universally supported, and the patents will run out much sooner than the newer formats.


I think you’re right - for a lot of people it was the first to hit the “good enough” threshold: going from MPEG 1/2 to the various Windows Media / Real / QuickTime codecs you saw very noticeable improvements in playback quality with each new release, especially in things like high motion scenes or with sharp borders.

That didn’t stop, of course, but I generally don’t notice the improvements if I’m not looking for them. Someone with a 4K or better home theater will 100% benefit from newer codecs’ many improvements on all those extra pixels but if you’re the other 95% of people watching on a phone or tablet, lower-end TV with the underpowered SoC the manufacturer could get for $15, etc. you probably won’t notice much difference and convenience will win out for years longer.


Reencoding will compound H264 artifacts with H265 artifacts (and the psychovisual optimizations for one with the psychovisual optimizations for the other). Unless you can reencode from the source, don't do that.


i appreciate your concern :) but honestly, is that a practical problem, or a theoretical one? the end result was fine to watch in our case. (and upload itself wasn't of great quality anyway)


It can be a practical problem depending on the source: if you're moving from a relatively higher-resolution / less-compressed video it won't be noticeable but if you're starting from video which has already been compressed fairly aggressively it can be fairly noticeable.

One area where this can be important to remember is when comparing codecs: a fair number of people will make the mistake where they'll take a relatively heavily compressed video, recompress it with something else, and get a size reduction which is a lot more dramatic than what you'd get if you compared both codecs starting from a source video which has most of the original information.


Another thing to consider, exactly the same thing happened when h.264 was first released.

Even though x264 quickly started seeing better results compared to DivX and XVid, you didn't see pirate encodes switch to x264 for years.


Yes and x264 is such an improvement that it was worth it. HW support became nearly universal. DivX and Xvid didn't have that.

So it was kind of a magical codec upgrade and now it seems more incremental to me.


The scene switched to x264 almost immediately. It did not take years.


It was def hardware support that changed the piracy groups' policies.


Having GPU or hardware support to speed up encoding can make a big difference in adoption.


H.265 is patent laden. H.264 is much better in avoiding getting sued.


That's patently wrong[1]

Money quote :

"H.264 is a newer video codec. The standard first came out in 2003, but continues to evolve. An automatically generated patent expiration list is available at H.264 Patent List based on the MPEG-LA patent list. The last expiration is US 7826532 on 29 nov 2027 ( note that 7835443 is divisional, but the automated program missed that). US 7826532 was first filed in 05 sep 2003 and has an impressive 1546 day extension. It will be a while before H.264 is patent free."

(emphasis mine)

[1] https://www.osnews.com/story/24954/us-patent-expiration-for-...


not an expert, but from what I understand it's more that they "extend" the codec with techniques that are more effective on higher resolution content, or with new "profiles" (parameters) that are more effective for higher resolution content (a bit like how you can use different parameters when you zip a file).

These new techniques can also be used for 1080p video (for example), but with lower gains. Also, the "old" algorithms/systems are generally still used, but they may be improved/extended.


>Like, why couldn't H.266 have been invented 30 years ago?

It is all a matter of trade offs and engineering.

For MPEG / H.26x codecs, the committees start the project by asking for or defining the target encoding and decoding complexities. If you only read Reddit or HN, most comments' world view is that video codecs are only for Internet video, completely disregarding other video delivery platforms, which all have their own trade offs and limitations. There is also a cost in decoding silicon die size and power usage. If more video is being consumed on mobile and battery is a limitation, can you expect hardware decoding energy usage to stay within that of the previous codec? Does it scale with adding more transistors, is there an Amdahl's law bottleneck somewhere, etc. It is easy to just say add more transistors, but ultimately there is a cost to hardware vendors.

The vast majority of the Internet seems to think most people working on MPEG video codecs are patent trolls and idiots, and pays little to no respect to the engineering. As a matter of fact a video codec is thousands of small tools within the spec, plus a pretty much insane amount of trial and error. It may not be at 3GPP / 5G levels of complexity, but it is still a lot of work. Getting something to compress better while doing it efficiently is hard. And as Moore's Law is slowing down, no one can continue to just throw transistors at the problem.


I don't know much about H.266, but some of the advances in H.265 depended on players having enough RAM to hold a bunch of previous decoded frames, so they could be referred to by later compressed data. Newer codecs tend to have a lot more options for the encoder to tune, so they need a combination of faster CPUs and smarter heuristics to explore the space of possible encodings quickly.


I wonder if instead of heuristics, machine learning could be used to figure out the best parameters.


In a somewhat-related topic, you might be interested in DLSS [0], where machine learning is being used in graphics rendering in games to draw the games at a lower resolution, then upscale the image to the monitor's resolution using a neural network to fill in the missing data. I imagine a similar thing could be done with video rendering, though you'd need some crazy computing power to train the neural network for each video, just like DLSS requires training for each game.

[0] https://en.wikipedia.org/wiki/Deep_learning_super_sampling


That seems likely at least.

You could actually use ML for all of the video decoding, but that research is still in its early stages. It has been done rather well with still images [1], so I'm sure it'll eventually be done with video too.

Those ML techniques are still a little slow and require large networks (the one in [1] decodes to PNG at 0.7 megapixels/s and its network is 726MB) so more optimizations will be needed before they can see any real-world use.

[1] https://hific.github.io/ HN thread: https://news.ycombinator.com/item?id=23652753


That's already done today. Most modern codecs support variable bitrate encoding so more of the data budget can be given to high complexity scenes. Source video can also be processed multiple times with varied parameters and that output then compared structurally to the source to find the parameters that best encode the scene. This is beyond the more typical multi pass encoding where a first pass over the source just provides some encoding hints for the second pass. It takes several times longer to encode (though is embarrassingly parallel) but the output ends up higher quality for a given data rate than a more naïve approach.
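A rough sketch of that search loop (not any particular vendor's pipeline; it assumes ffmpeg with libx265, and `vmaf_score` is a hypothetical helper standing in for whatever perceptual metric you have available, e.g. VMAF or SSIM):

    import os, subprocess

    def encode_scene(src, crf):
        # Encode one scene/shot at a given CRF (constant quality) setting.
        out = f"{src}.crf{crf}.mp4"
        subprocess.run(["ffmpeg", "-y", "-i", src, "-c:v", "libx265",
                        "-crf", str(crf), "-an", out], check=True)
        return out

    def pick_encode(src, crfs=(20, 23, 26, 29), target=93.0):
        best = None
        for crf in crfs:
            cand = encode_scene(src, crf)
            score = vmaf_score(src, cand)   # hypothetical: higher = closer to the source
            size = os.path.getsize(cand)
            if score >= target and (best is None or size < best[0]):
                best = (size, cand)
        # Fall back to the highest-quality candidate if nothing met the target.
        return best[1] if best else encode_scene(src, min(crfs))

In a real pipeline this runs per scene or shot, which is also why the whole thing parallelizes so easily.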


Any apps you can recommend for giving this a spin?


x264 with 2-pass encoding? "Machine learning" doesn't only mean deep neural networks.


In the limit case, compression and AI are identical.

Once you get to an AI that has full comprehension of what humans perceive to be reality, you can just give them a rough outline of a story, add some information on casting, writers, and Spielberg's mood during production, and they'll fill in the (rather large) blanks.

That's a bit exaggerated, but I remember reading about one such algorithm a few days ago (by Netflix, maybe?). It was image compression that had internal representations such as "there is an oak tree on the left".

It would then run the "decompression", find the differences to the original, and add further hints where neccessary.


Sure. Machine Learning is just "heuristics we don't understand."


Or more typically Monte Carlo heuristics.


Reinforcement learning uses Monte Carlo a lot but traditional machine learning or deep learning don't.


I was hoping H266 was going to be a neural network based approach, but it looks like that might end up being H267.

Right now neural networks allow for higher compression for tailored content, so you need to ship a decoder with the video, or have several categories of decoders. The future is untold and it might end up not being done this way.


I think the number of previous frames for typical settings went from about 4 to about 6 as we went from H.264 to H.265. And the actual max in H.264 was 16. So that doesn't seem like a huge factor.


Computers wouldn't have been fast enough. Moore's law is a hell of a drug.

In the mid '90s, PCs often weren't fast enough to decode DVDs, which were typically 720x480 24FPS MPEG2. DVD drives were often shipped with accelerator cards that decoded MPEG2 in hardware. I had one. My netbook is many orders of magnitude faster than my old Pentium Pro. But it's not fast enough to decode 1080p 30fps H.265 or VP9 in software. It must decode VP9/H.265 on the GPU or not at all. MPEG2 is trivial to decode by comparison. I would expect a typical desktop PC of the mid '90s to take seconds to decode a frame of H.265, if it even had enough RAM to be able to do it at all.

It's an engineering tradeoff between compression efficiency of the codec and the price of the hardware which is required to execute it. If a chip which is capable of decoding the old standard costs $8, and a chip which is capable of decoding the new standard costs $9, sure, the new standard will get lots of adoption. But if a chip which is capable of decoding the new standard costs $90, lots of vendors will balk.


Indeed. The brand-new fancy Blue & White Power Mac G3's from early 1999 were the first Macs that shipped with a DVD drive, and they could play video DVD's but they had an obvious (and strange) additional decoder mezzanine card on the already unusual Rage128 PCI video card.

By the end of that year the G4 Power Macs were just barely fast enough to play DVD's with software decoding and assistance from the PCI or later AGP video card. And after a while (perhaps ~ 2002?), even the Blue G3's could do it in software even if you got a different video card, as long as you also upgraded to a G4 CPU (they were all in ZIF sockets).

It was very taxing on computers at y2k!

Later autumn 2000 G3 iMacs could also play DVD's but I think they needed special help from a video co-processor.


From what I've heard (would love to hear more expertise on this), it's incredibly hard to invent a new video compression algorithm without breaking an existing set of patents, and there's also no easy way to even know whether you're breaking anything as you develop the algo. Thus the situation we're in is not that it's too hard to develop better codecs, but that you've very disincentivized to do so.


Which then begs the question - why are video compression standards developed in the US at all? MPEG is obviously US based but Xiph is also a US nonprofit. The software patents should be hugely crippling the ability for Americans to develop competitive video codecs when every other nation doesn't have such nonsense. Why hasn't Europe invested in and developed better codecs that combine the techniques impossible to mix in the states?

Is it just basically the same mechanism that leads to so much drug development happening in the US despite how backwards its medical system is, because those regressive institutions create profit incentives not available elsewhere (to develop drugs or video codecs for profit) and thus the US already has capitalists throwing money at what could be profitable whereas everyone else would look at it as an investment cost for basically research infrastructure.


MPEG is not US-based.

https://en.wikipedia.org/wiki/Moving_Picture_Experts_Group

The article we are all commenting on is by a German research organization that has been a major contributor to video coding standards.

Perhaps you're confused by the patent issue? European companies are happy to file for US patents and collect the money.


This sounds like a somewhat obvious way to side-step the patent mechanism, so I would assume patents prevent this kind of a thing, when you develop patent-breaking technology abroad and then "just use" it wherever you want. You're probably not allowed to use the patented technology in any of the products you're building.


It's about the assumptions made during the standardization.

Compared to 30 years ago, we now have better knowledge and statistics about what low level primitives are useful in a codec.

E.g. jpeg operates on fixed 8×8 blocks independently, which makes it less efficient for very large images than a codec with variable block size. But variable block size adds overhead for very small images.

Another reason can be common hardware. As hardware evolves, different hardware accelerated encoding/decoding techniques become feasible, and these get folded into the new standards.


Something that I learned about 10 years ago when bandwidth was still expensive is that you can make a very large version of an image, set the jpeg compression to a ridiculously high value, then scale the image down in the browser. The artifacts aren't as noticeable when the image is scaled down, and the file size is actually smaller than what it would be if you encoded a smaller image with less compression.
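If you want to test that yourself, here is a small sketch using Pillow (assuming "photo.png" is any reasonably large source image; whether the oversized low-quality JPEG actually wins depends on the content):

    from io import BytesIO
    from PIL import Image

    src = Image.open("photo.png").convert("RGB")
    w, h = src.size

    def jpeg_size(img, quality):
        # Encode to JPEG in memory and return the size in bytes.
        buf = BytesIO()
        img.save(buf, format="JPEG", quality=quality)
        return buf.tell()

    small_hq = jpeg_size(src.resize((w // 2, h // 2)), quality=85)  # the "normal" approach
    large_lq = jpeg_size(src, quality=30)                           # huge JPEG, heavy compression
    print(f"half-size q85: {small_hq} B, full-size q30: {large_lq} B")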


This “huge JPEG at low quality” technique has been widely known for years. But it is typically avoided by larger sites and CDNs, as it requires a lot more memory and processing on the client.

Depending on the client or the number of images on the site the huge JPEG could be a crippling performance issue, or even a “site doesn’t work at all” issue.


Interesting. I've never heard of this, but it makes some sense: the point of lossy image compression is to provide a better quality/size ratio than downscaling.


> But variable block size adds overhead for very small images.

What kind of overhead?

The extra code shouldn't make a big difference if nothing is running it.

And space-wise, it should only cost a few bits to say that the entire image is using the smallest block size.

Is the worry just about complicating hardware decoders?


I still remember when my pc would take 24 hours to compress a dvd to a high quality h264 mkv. Sure, you could squeeze it down with fast presets in handbrake, but the point was transparency. Now I'm sure for most normal pc's the time to compress at the same quality with h.265 is the same 24 hours, and in 4k even longer. I'm sure h266 would take more than twice as long easily.

Early pc's had separate and very expensive mpeg decode boards just to decode dvd (creative sold a set); the cpu simply couldn't even handle mpeg 2. I know it's hard to believe, but there was a time when playing back an mp3 was a big ask. All these algorithms could have been made long ago, but they would have been impractical fantasy. Only now are we seeing real (partial, cheated-resolution) ray tracing in modern high end gaming hardware, which is a good comparison: ray tracing has been with us for a long time, and only hardware advancement over decades has made it viable.

It amused me that they claimed 4k uhd h265 is now 10GB for a movie. That's a garbage bitrate, they always ask too much of these codecs.


> I know it's hard to believe but there was a time when playing back an mp3 was a big ask

can confirm. audio playback would stutter on my 486dx if one dared to multitask.


Good compression is quite complex and can go wrong in an unimaginable variety of ways. Remember when Xerox copiers would copy numbers incorrectly due to compression? The numbers would look clear, they just wouldn't always be the same numbers that you started with.

https://www.theregister.com/2013/08/06/xerox_copier_flaw_mea...


The Xerox problem stemmed from simple replacement of "recognized" numbers with entries from a learned dictionary. A good implementation would use the learned symbol atlas as a supplement, encoding the difference between the guess and the source image. That way even a predicted 0 instead of an 8 wouldn't be catastrophic, with the encoder filling in the missing detail.


> Like, why couldn't H.266 have been invented 30 years ago? Is it because the computers back in the day wouldn't have been fast enough to realistically use it?

Here's something to consider:

In 1995, a typical video stream was 640 x 480 x 24fps. That's 7,372,800 pixels per second.

In 2020, we have some content that's 7680 x 4320 x 120fps. That's 3,981,312,000 pixels per second, or a 540 fold increase in 25 years.
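(Just re-deriving those figures as a quick sanity check:)

    sd_1995 = 640 * 480 * 24        # 7,372,800 pixels per second
    uhd_2020 = 7680 * 4320 * 120    # 3,981,312,000 pixels per second
    print(uhd_2020 / sd_1995)       # -> 540.0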

The massive increase in image size actually makes it easier to use high compression ratios. I found this out the hard way recently, when I was trying to compress and email a video of a powerpoint presentation that a coworker had given. In a nutshell, the powerpoint deck, with its sharp edges and its low resolution, was difficult to compress.

Increased framerate plays a factor too; thanks to decades of research on motion interpolation, algorithms have become quite good at guessing what content can be eliminated from a moving stream.


Compression is AI. It’s never going to be “all” figured out.


Another way of saying it is that compression is understanding.


Lossy compression is, I feel compelled to add.


Actually both! Arithmetic coding works over any kind of predictor.


End credits are just text. So it should be possible to put them through OCR and save only the text, positions, and fonts. And the text itself could be further compressed with a dictionary.


Credits also contain logos/symbols (near the end), and often have stylistic flairs as well. Video compression is based on making predictions and then adding information (per Shannon's definition) for the deltas from those predictions. The pattern of credits statically sliding at a consistent rate is exactly the sort of prediction codecs are optimized for; for instance, the same algorithms will save space by predicting repeated pixel patterns during a slow camera pan.

Still, I've often thought it would be nice if text were a more first-class citizen within video codecs. I think it's more a toolchain/workflow problem than a shortcoming in video compression technology as such. Whoever is mastering a Blu-Ray or prepping a Hollywood film for Netflix is usually not the same person cutting and assembling the original content. For innumerable reasons (access to raw sources, low return on time spent, chicken-egg playback compatibility), it just doesn't make sense to (for instance) extract the burned-in stylized subtitles and bake them into the codec as text+font data, as opposed to just merging them into the film as pixels and calling it a day.

Fun fact: nearly every Pixar Blu-Ray is split into multiple forking playback paths for different languages, such that if you watch it in French, any scenes with diegetic text (newspapers, signs on buildings) are re-rendered in French. Obviously that's hugely inefficient; yet at 50GB, there's storage to spare, so why not? The end result is a nice touch and a seamless experience.


Text with video is difficult to do correctly for a few different reasons. Just rendering text well is a complicated task that's often done poorly. Allowing arbitrary text styling leads to more complexity. However for the sake of accessibility (and/or regulations) you need some level of styling ability.

This is all besides complexity like video/audio content synced text or handling multiple simultaneous speakers. Even that is besides workflow/tooling issues that you mentioned.

The MPEG-4 spec kind of punted on text and supports fairly basic timed text subtitles. Text essentially has a timestamp where it appears and a duration. There's minimal ability to style the text and there are limits on the availability of fonts, though it does allow for Unicode so most languages are covered. It's possible to do tricks where you style words at timestamps to give a karaoke effect or identify speakers, but that's all on the creation side and is very tricky.

The Matroska spec has a lot more robust support for text but it's more of just preserving the original subtitle/text encoding in the file and letting the player software figure out what to do with that particular format and then displaying it as an overlay on the video.

It's unfortunate text doesn't get more first class love from multimedia specs. There's a lot that could be done, titles and credits as you mention, but also better integration of descriptive or reference text or hyperlink-able anchors.


MPEG 4 (taken as the whole body of standards, not as two particular video codecs) actually has provisions for text content, vector video layers and even rudimentary 3D objects. On the other hand I'm almost sure that there are no practical implementations of any of that.


Oh, and that's only the beginning. The MPEG-4 standard also includes some pretty wacky kitchen-sink features like animated human faces and bodies (defined in MPEG-4 part 2 as "FBA objects"), and an XML format for representing musical notation (MPEG-4 part 23, SMR).


Don't forget Java bytecode tracks!


Scene releases often had optimized compression settings for credits (low keyframes, b&w, aggressive motion compensation, etc.)


The text, positions and fonts could very well take up more space than the compressed video. And then with fonts, you have licensing issues as well.


Recognizing text and using it to increase compression ratios is possible. I believe that's what this 1974 paper is about:

https://www.semanticscholar.org/paper/A-Means-for-Achieving-...


True, but end credits take very little space compared to the rest of the movie.


x264 is kinda absurdly good at compressing screencasts, even a nearly lossless 1440p screencast will only have about 1 Mbit/s on average. The only artifacts I can see are due to 4:2:0 chroma subsampling (i.e. color bleed on single-pixel borders and such), but that has nothing to do with the encoder, and would almost certainly not happen in 4:4:4, which is supported by essentially nothing as far as distribution goes.
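For anyone who wants to try it, a hedged example of the settings involved (assumes ffmpeg with libx264; the filenames are placeholders, and note that the 4:4:4 output from the second command won't play on most hardware decoders, which is exactly the distribution problem mentioned above):

    import subprocess

    # Near-lossless screencast encode with the usual 4:2:0 subsampling (plays everywhere).
    subprocess.run(["ffmpeg", "-y", "-i", "screencast.mkv", "-c:v", "libx264",
                    "-preset", "slower", "-crf", "18", "-pix_fmt", "yuv420p",
                    "out_420.mkv"], check=True)

    # Same encode in 4:4:4 to avoid chroma bleed on single-pixel UI edges (limited playback support).
    subprocess.run(["ffmpeg", "-y", "-i", "screencast.mkv", "-c:v", "libx264",
                    "-preset", "slower", "-crf", "18", "-pix_fmt", "yuv444p",
                    "out_444.mkv"], check=True)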


Why not use deep learning to recognize actor face patterns in scenes and build entire movies from AI models?


I'm not super strong on theory, but if I'm not mistaken, doesn't Kolmogorov complexity (https://en.wikipedia.org/wiki/Kolmogorov_complexity) say we can't even know if it is all figured out?

The way I understand it is that one way to compress a document would be to store a computer program and, at the decompression stage, interpret the program so that running it outputs the original data.

So suppose you have a program of some size that produces the correct output, and you want to know if a smaller-sized program can also. You examine one of the possible smaller-sized programs, and you observe that it is running a long time. Is it going to halt, or is it going to produce the desired output? To answer that (generally), you have to solve the halting problem.

(This applies to lossless compression, but maybe the idea could be extended to lossy as well.)
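(For reference, the quantity being discussed has a standard compact definition relative to a fixed universal machine U, and the relevant theorem is that K is not computable, which is exactly the halting-problem obstruction described above:)

    K_U(x) = \min \{\, |p| \;:\; U(p) = x \,\}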


I really ain't a theorist either, but:

If you are looking at Kolmogorov complexity you are right, we can't ever know. But Kolmogorov complexity is about single points in the space of possible outputs. It basically says "there might be possible outputs that do look random, but are actually produced by a very short encoding". One example would be the digits of pi.

But if you look at the overall statistics of possible output streams, and at their averages, there is a lower bound for compression on average. As soon as the bitlength of the compressed stream matches the entropy of the uncompressed stream in bits, you've reached maximum compression. There will be some individual streams that don't conform to those statistics, but the averages will.
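(In symbols, this is the usual source-coding bound: no uniquely decodable lossless code can have an expected length below the source entropy,)

    \mathbb{E}[\ell(X)] \;\ge\; H(X) \;=\; -\sum_{x} p(x)\,\log_2 p(x) \quad \text{bits per symbol.}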

However, we are somewhat far away from matched entropy equilibrium for video compression. And even then, improvements can be made, not in compression ratio but in time, ops and energy needed for de/encoding.


> It's interesting that they are able to continue improving video compression. You'd think that it would have all been figured out by now.

Would you? AV1 was only officially released 2 years ago, h.265 7, h.264 14, …


Ten-year software video compression engineer here:

TL;DR: it's partly because we're using higher video resolutions. A non-negligible part of the improvement stems from adapting existing algorithms to the now-doubled-resolution.

Almost all video compression standards split the input frame into fixed-size square blocks, aka "macroblocks". To put it simply, the macroblock is the coarsest granularity level at which compression happens.

- H.264 and MPEG-2 Video use 16x16 macroblocks (ignoring MBAFF).

- H.265 uses configurable quad-tree-like macroblocks, with a frame-level configurable size up to 64x64.

- AV1 makes this block-size configurable up to 128x128.

Which means:

Compressing to H.264 an SD video (720x576, used by DVDs) results in 1620 macroblocks/frame.

Compressing to H.265 an HD video (1920x1080) results in at least 506 macroblocks/frame.

Compressing to AV1 a 4K video (3840x2160) results in at least 506 macroblocks/frame.

But compressing to H.264 a 4K video (3840x2160) will result in 32400 macroblocks/frame.

The problem is, there are constant bitcosts per-macroblock ((mostly) regardless of the input picture). So using H.264 to compress 4K video will be inefficient.
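Reproducing those block counts (a quick check; the "at least 506" figures above are the raw area ratios, e.g. 3840·2160/128² ≈ 506, while rounding each dimension up to whole blocks gives 510 — same point either way):

    from math import ceil

    def blocks_per_frame(width, height, block):
        # Count of block x block macroblocks covering a width x height frame.
        return ceil(width / block) * ceil(height / block)

    print(blocks_per_frame(720, 576, 16))     # H.264, SD  -> 1620
    print(blocks_per_frame(1920, 1080, 64))   # H.265, HD  -> 510
    print(blocks_per_frame(3840, 2160, 128))  # AV1, 4K    -> 510
    print(blocks_per_frame(3840, 2160, 16))   # H.264, 4K  -> 32400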

When you take an old compression standard to encode recent-resolution content, you're using the compression standard outside of the resolution domain for which it was optimized.

> Is this continued improvement related to the improvement of technology? Or just coincidental?

Of course, there are also "real" improvements (in the sense of "qualitative improvements that would have benefited the compression of old video resolutions, if only we had invented them sooner").

For example:

- the context-adaptive arithmetic coding from H.264, which is a net improvement over classic variable-length huffman coding used by MPEG-2 (and H.264 baseline profile).

- the entropy coding used by AV1, which is a net improvement over H.264's CABAC.

- integer DCT (introduced by H.264), which allows bit-accuracy checking and much easier and smaller hardware implementations (compared to the floating point DCT used by MPEG-2).

- loop filters: H.264 pioneered the idea of a normative post-processing step, whose output could be used to predict next frames. H.264 had 1 loop filter ("deblocking"). HEVC had 2 loop filters: "deblocking" and "SAO". AV1 has 4 loop filters.

All of these are real improvements, brought to us by time and by extremely clever and dedicated people. However, the compression gains of these improvements are nowhere near the "50% less bitrate" that is used to sell each new advanced-high-efficiency-versatile-nextgen video codec. Without increasing - a lot - the frame resolution, selling a new video compression standard would be a lot harder.

Besides, now that resolutions seem to have settled around 4K/8K (and that "high definition" has become the lowest resolution we might have to deal with :D), things are going to get interesting ... provided that we don't start playing the same game with framerates!


I hope the next target is a VR-optimized codec.


H.266 VVC includes tools specifically for VR use cases like doing a motion vector wrap around at the boundaries of 360 equirectangular video or better support for independently coded tiles (subpictures in VVC lingo) which are used in viewport-dependent streaming of 360 content.


"You'd think that it would have all been figured out by now."

Would you? Video compression is one of the few things that we will work on for the next 1000 years and still be nowhere near finished. The best video compression would be to know the state of the universe at the big bang, have a timestamp of the beginning and end of your clip and spatial coordinates defining your viewport. Then some futuristic quantum computer would just simulate the content of your clip...

So yeah, sure we are done with video compression :). This is of course an extreme example of constant time compression that may or may not be ever feasible (if we live in a computer simulation of an alien race, then it is already happening).

But the gist is the same. Video compression is mostly about inferring the world and computing movement, not about storing the content of the image.

For instance by taking a snapshot of the world, decomposing it into geometric shapes (pretty much the opposite of 3D rendering) and then computing the next frames by morphing these shapes + some diff data that snaps these approximations back in line with the actual data.

We are all but in the very infancy of video compression. What should surprise you is why it takes us so long to get anywhere.


The way I read the release was that it's not a lossless compression, it reads like it's downscaling 4k+ video to a lower format with 'no perceptible loss of quality.' Since this is also seemingly targeted at mobile, I'm guessing the lack of perceptible loss of quality is a direct function of screen size and pixel density on a smaller mobile devices.

For me, this is another pointless advance in video technology. 720p or 1080p is fantastic video resolution, especially on a mobile phone. Less than 1% of the population cares or wants higher resolution.

What new technologies are doing now is re-setting the patent expiration clock. As long as new video tech comes out every 5-10 years, HW manufacturers get to sell new chips, phone manufacturers get to sell new phones, TV manufacturers get to sell new TVs, rinse, repeat.


> 720p or 1080p is fantastic video resolution, especially on a mobile phone.

720p is far from fantastic. It's noticeably blurry, even on mobile.

1080p is minimally acceptable, and is now over 10 years old.

> Less than 1% of the population cares or wants higher resolution.

That's a very bold claim. Have any studies or polls to back that up?


> 720p is far from fantastic. It's noticeably blurry, even on mobile. 1080p is minimally acceptable, and is now over 10 years old.

Not OP. I would say these are far-fetched claims that need defending. Most blurriness of mobile video comes from the low bitrate it's encoded at. Basically nobody is watching Bluray quality 720p or 1080p materials on a phone - and that's the problem, not the resolution.

My guess is that a typical middle or even middle-upper class family is going to have a TV that is less than 70 inches, and is 10+ feet from most viewers. Even with 20-20 vision, the full quality provided by 1080p is not even visible at that distance! (You'd need to go all the way up to 78 inches at 10 feet, or sit 7 feet from your 55 inch set to even get the full benefit from a 1080p set.) See this very helpful chart: http://s3.carltonbale.com/resolution_chart.html

Most of the benefit in 4k video comes from recent advances in HDR presentation and better codecs, not from the resolution. Sure, if you're a real stickler for quality, you might be sitting 6 feet from your 80 inch OLED set, and 4k is definitely for you in that case, but it's really not that important to the average person. In my case, I can barely distinguish between 720p and 1080p on my set even with glasses on.

Now granted, it's great to have laptops and tablets at higher resolutions, because your face is smashed up against them and you're often trying to read fine text. But that's not the video case that's being talked about here.


I don't need to get the full benefit to care about the difference. And in my experience there's a lot of screens closer than 10 feet to couches.

On mobile, 720p starts to get shoddy once your screen hits 5 inches across.

So while 4k is situational, and encoding quality is more important than the extra resolution most of the time, 1080 vs. 720 is pretty clear-cut; 1080 should be considered the minimum for most content.


Note that I was talking about the full benefit of 1080p, not 4k. My points are that 4k is usually pointless, and therefore that 1080p is usually better than just "minimally acceptable", since most of the time we don't even get the full benefit of it.

> On mobile, 720p starts to get shoddy once your screen hits 5 inches across.

Even assuming you're right about this, I guess I really just have a hard time caring. Anything you watch on your phone is at best something you don't give a shit about, artistically speaking, and it's hard to imagine 1080p vs 720p making any kind of difference to the experience. (I suppose I might be biased since my screen is "only" 5.2 inches diagonally - I get the smallest one I can whenever I buy a new phone.)

And for what it's worth, using the same math as for my previous comment, you can't even get the full benefit of 720p on a 5 inch diag screen unless you're holding it less than a foot from your face. Granted, you can get the full benefit of 1080p at a little under 8 inches, but I'm suffering even imagining trying to watch a video this way. Even at this distance, I would dispute using "shoddy" to describe how 720p will look.

The math is actually pretty simple: for a 720p screen, there are sqrt(1280² + 720²) pixels on the 5 inch diagonal, so a distance of 5 inches / sqrt(1280² + 720²) per pixel. 20/20 visual acuity can resolve roughly 1 arc minute, or pi/10800 radians. By the arc length formula, the distance we calculated subtends that angle at (5 inches / sqrt(1280² + 720²)) * (10800/pi), or 11.7 inches.
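The same arithmetic, generalized (a sketch using the 1-arcminute / 20-20 baseline; real visual acuity is often better than that, as noted elsewhere in the thread):

    from math import sqrt, pi

    def full_benefit_distance_in(diag_in, px_w, px_h, arcmin=1.0):
        """Farthest viewing distance at which one pixel still subtends `arcmin` arcminutes."""
        pixel_pitch = diag_in / sqrt(px_w ** 2 + px_h ** 2)   # inches per pixel
        return pixel_pitch / (arcmin * pi / 10800)            # small-angle approximation

    print(full_benefit_distance_in(5, 1280, 720))         # ~11.7 in
    print(full_benefit_distance_in(5, 1920, 1080))        # ~7.8 in
    print(full_benefit_distance_in(55, 1920, 1080) / 12)  # ~7 ft for a 55" 1080p TV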

> And in my experience there's a lot of screens closer than 10 feet to couches.

Note that I addressed this point. If you have a pretty typical 50 inch TV, you've got to have it closer than 7 feet from your couch for 4k to make any difference at all.


> Note that I was talking about the full benefit of 1080p

Yes, so am I.

> therefore that 1080p is usually better than just "minimally acceptable", since most of the time we don't even get the full benefit of it.

If most of your users would be limited by 720p, and 1080p is standard and easy to do, then I'm comfortable calling 1080p the minimum.

> I'm suffering even imagining trying to watch a video this way.

The official "Retina" numbers have a phone 10 inches from your face. And that's about where I hold it when I have my glasses on.

> 20/20 visual acuity can resolve roughly 1 arc minute, or pi/10800 radians.

That's a good baseline number, but a lot of people can beat it by a significant fraction.


Average visual acuity for young people is about 0.7 arcminutes, best being around 0.5.

There seems to be a disconnect about how people are consuming media. For me, holding a 2280x1080 6.3" phone ~8" from my eyeballs is a natural viewing distance, and I can see the full resolution without difficulty. And at least from my point of view a 65" TV is also a pretty typical size.


> That's a very bold claim. Have any studies or polls to back that up?

From 1:

> In Japan, only 14 percent of households will own a 4K TV in 2019 because most households already have relatively new TVs, IHS said.

Let's dissect the reasoning. Most households already have a relatively new TV, thus a low adoption rate. Implying that needing a new TV generally, not the desire to upgrade resolution, is the primary motivating factor in purchasing a TV. In fact, most 4K TVs are already as cheap as the rest of the market.

I truly believe that almost everyone does not care about 4K whatsoever. In fact, even if they do 'care' it's not because they know what they're talking about. Most of the enhancements that 4K TVs bring are an artifact of better display technology, rather than increased resolution. See 2.

Streaming 4K+ video is a waste of resources with no tangible benefit to anyone other than marketing purposes. Netflix streams '4K' because everyone has a '4K' TV now and they demand it.

[1]: https://www.twice.com/research/us-4k-tv-penetration-hit-35-2...

[2]: https://www.cnet.com/news/why-ultra-hd-4k-tvs-are-still-stup...


I agree with most of what you're saying, but this is actually wrong.

> Streaming 4K+ video is a waste of resources with no tangible benefit to anyone other than marketing purposes. Netflix streams '4K' because everyone has a '4K' TV now and they demand it.

It's not inherently wrong, just practically so. The difference is that Netflix (and often other streaming services) max out at a much higher bitrate for their 4k streaming, and are using a better codec (H.265) as well. By comparison, Netflix's bitrate for 1080p is severely limited and so if you compare the two, even watching at the 1080p level of detail, streaming in 4k will often be a vastly superior experience.

So it's not inherent (not a result of the resolution), but still, streaming 4k is not pointless at present.


Your first link includes an extra detail which is important:

> In addition, “with the Japanese consumer preference for smaller TV screens, it will be more difficult for 4K TV to expand its household penetration in the country"

With a smaller TV screen, yeah, you don't need 4K.

But in the USA, larger screens are desirable. And that's seen in the expected 34% 4K adoption rate in the USA your article describes.

I still use a 1080p TV, but it's also only 46" and is 10 years old. I'll probably be buying a 4K 70-75" OLED later this year.

My computer monitor is 1440p. I could have bought 4K when I upgraded, but I'm primarily a gamer and I wanted 144 hz, and 4K 144 hz monitors didn't exist yet.


Can we all just agree on using AV1 instead of another patent encumbered format?


No, because the market is more than happy to pay a few cents or dollars per device to get better compression and lower transmission bandwidth. This observation has held true consistently in the 3 decades since compressed digital media was invented.


> No, because the market is more than happy to pay [...]

Is it? Because Google/YouTube, Amazon/Twitch, Netflix, Microsoft, Apple, Samsung, Facebook, Intel, AMD, ARM, Nvidia, Cisco, etc, are all part of AO Media:

* https://aomedia.org/membership/members/

The main major tech player I don't see is Qualcomm.


The use cases for video are significantly broader than a few tech companies, e.g. broadcast.

And most of those companies are also part of MPEG as well.


What would be a list of non-tech companies prevalent in the broadcast space?

They're part of MPEG because of legacy reasons in having to deal with H.264.


The market is rather unhappy. E.g. Win10 doesn't ship an H265 codec because it's too expensive.


> Win10 doesn't ship an H265 codec

Very few Win10 users would want a CPU-targeted HEVC codec.

Intel, nVidia and AMD have that codec in their hardware. They are probably paying for a license to use these patents, they ship Win10 drivers for their hardware, and Microsoft publishes those drivers on Windows Update.


At 99 cents for the add on, the decision to charge users smells more like a political decision than an economic one. The cost to Microsoft is undoubtedly far less than that. 99 cents is basically the bare minimum you can charge when you accept credit cards as a method of payment.


https://en.wikipedia.org/wiki/High_Efficiency_Video_Coding#P...

$0.20 for MPEG-LA, $0.40 for HEVC Advance, and "call us" for Technicolor and Velos Media. $0.99 doesn't sound far off.


That's 1% of the cost of windows, for a video codec when there's already dozens of free alternatives


There’s a fixed cap on the royalty rate, so even if we assume Microsoft paid for a license for each copy of Windows sold to customers, on a per-copy basis it would be much less than 1%.


The constantly growing list of H265 patent pool organizations with various licensing plans made even Apple join the AV1 bandwagon.


Apple is also on the list of organizations behind H.266 so it’s difficult to conclude anything beyond wanting to bet on all the horses.


that's not why Apple joined AV1. They joined AV1 because they were forced to. Netflix, Google, Amazon are all on the AV1 bandwagon. If Youtube, Netflix and Prime Video all use AV1 for future higher quality streams, Apple cannot avoid supporting it.


It works both ways: if popular devices don’t support it in decode, encoding your library in a format that is slow(er) to decode and/or more taxing on battery life isn’t a clear win.


Youtube has been vp9 only above 1080p for a long time, but Apple is adding support for VP9 only in the upcoming OS releases.


Why are we assuming H.266 is better than AV1? Or at least better enough to warrant all the trouble and cost of licensing?


I am not a fan of patent encumbered technologies, but here the assumption of being better has merit. There isn't much sense from a business perspective for a research team to publish a commercial solution that would exhibit inferior or even comparable performance to an existing free solution. As for the problem of convincing people to swallow the cost, just organize a show-off campaign and leave the rest to the sellers. That's how (at least) Apple does it.


I dunno, there are a lot of reasons companies choose technologies. If Fraunhofer gets this thing into hardware (by whatever means it takes), that could be the end of the debate. There's also probably a bit of "no one ever got fired for choosing Fraunhofer" going on as well.


I wouldn't evaluate the pros and cons of a technology based on decisions made by business executives. I would look at objective third-party comparisons of the encoding and decoding before deciding which is better: H.266 or AV1.


Now audio uses open codecs for the most part, and new video will too.


That is because Audio Encoding hasn't seen much improvement as compared to video.

Nearly 30 years after MP3, the only audio codec that could rival MP3 at the standard rate of 128kbps at a significantly lower bitrate was Opus at 96kbps.

And MP3 is still by far the most popular codec due to compatibility reasons.

This is similar to JPEG, although things are about to change.


> Nearly 30 years after MP3, the only audio codec that could rivals mp3 at the standard rate of 128kbps at a significant lower bitrate was Opus at 96Kbps.

AAC and Vorbis were doing this for years before Opus was on the scene. Opus is a further improvement on audio codecs, but not an unprecedented one.


Opus and vorbis have both improved on mp3, flac has improved in the lossless space, and there are other codecs that do better at very low bitrates (think 20-30 kbps).


I don't have the Netflix / Disney+ / etc containers to analyze but Youtube has pretty much totally purged mp3 from every video on the site. Billions of watch hours a day of Opus audio there at least.


Nah, Dolby excels at injecting itself where not needed. Blurays support 8-channel LPCM (uncompressed) audio, yet Dolby managed to push its proprietary junk codecs in there.


It's not that the market is happy to pay more, it's that there is essentially no choice.


A captive market is a happy market, no?


Not for consumers, obviously?


This is not a captive market. If someone were to invent a free codec that performed similarly and had both software and hardware reference implementations, the market would adopt it very quickly.

This is a market that is voluntarily paying for perceived value.


Please show me where I can pick between similar consumer devices whose supported codecs are easy to find, or, better yet, where I can pay extra for non-free codecs.


The few HNers who actually care about these things do not make a market that vendors think is worth serving, most likely because it would be unprofitable. There’s not going to be a market if sellers don’t find it profitable.


That's exactly my point: those options don't exist, so there is no hard data on what people prefer. It's silly to say the market decided when vendors have only speculated about the most profitable path; that doesn't make it the only profitable one.


There have been many attempts to make a market for free hardware, particularly in the mobile phone market. All have failed thus far.


Libre hardware is not the same as a device supporting only royalty free or otherwise libre codecs.


You’re splitting hairs. Customers, by and large, just do not care.


The market has plenty of choices from VP8 to Theora.


Please show me where I can purchase a Roku-style device where I can easily choose between codec support options or pay extra for non-free codecs.


That's the direction everyone is going, so I think h.266 is an attempt to give vendors pause before moving to av1.

AV1 will probably win in most circumstances (big tech) but is unlikely to win where there are big gains to be had by reducing file size (broadcasters with gigantic libraries).

Broadcasters are also used to paying a lot and not getting much.


Honestly, the first step in this is getting the ffmpeg av1 library to a good usable place. It's currently so slow as to be near unviable. I'd happily switch when it becomes a usable option.


Is there any indication that rav1e is so inefficient as to have a substantial double-digit percentage speedup left to be realized? Because even if you cut encode times in half, they already take hours per minute of video on a quad core.

AV1 is unlikely to ever be practical for "muggle" encode use, at least in this decade. It will only be worth committing that much compute workload to making a smaller file if the recipients will number in at least what, millions?

I'd be really curious what a hardware realtime AV1 encoder would even look like. How much silicon would that take? That kind of chip would have to be colossal even if it sacrifices huge amounts of efficiency to spit out frames in reasonable time (in the same way hardware hevc and vp9 encoders kind of suck).


The HEVC encoder on 20-series Nvidia cards is actually really good


Ice Lake's also great


There is no ffmpeg av1 library, i.e. no native decoder or encoder. ffmpeg has wrappers for libaom and dav1d/rav1e. Third-party scripts also add wrappers for svt-av1.
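
For reference, a minimal sketch of driving the libaom-av1 wrapper from Python (assumptions: an ffmpeg build linked against libaom, placeholder filenames). The main speed/quality knob is -cpu-used (0 = slowest/best, 8 = fastest); older builds may additionally need "-strict experimental":

    import subprocess

    subprocess.run([
        "ffmpeg", "-i", "input.mp4",
        "-c:v", "libaom-av1",
        "-crf", "30", "-b:v", "0",   # constant-quality mode
        "-cpu-used", "4",            # raise toward 8 for tolerable encode times
        "-row-mt", "1",              # enable row-based multithreading
        "av1_out.mkv",
    ], check=True)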


Isn't that largely dependent on hardware acceleration from CPU manufacturers? Or is ffmpeg always software encoding?


FFmpeg isn’t “always software encoding”, that statement doesn’t make much sense since FFmpeg/libavcodec is more of an interface and you can add support for any external encoder/decoder, hardware accelerated or not. However, FFmpeg’s builtin encoders and the most popular external encoders including x264 and x265 are all software encoders. There are hardware accelerated encoders from GPU vendors, e.g. the nvenc encoder for H.264 and H.265 is available for use on Nvidia GPUs if your FFmpeg is compiled against CUDA SDK. It’s a lot faster than x264 and x265 on comparable settings but results are usually a bit worse.
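
To make that concrete, here is a minimal, non-authoritative sketch of kicking off a software x265 encode versus an nvenc hardware encode through ffmpeg from Python. It assumes an ffmpeg build with both libx265 and hevc_nvenc enabled; filenames and settings are placeholders:

    import subprocess

    src = "input.mp4"  # placeholder

    # Software HEVC encode: slow, but usually the best quality per bit.
    subprocess.run(["ffmpeg", "-i", src, "-c:v", "libx265",
                    "-preset", "medium", "-crf", "24", "sw_out.mp4"], check=True)

    # Hardware HEVC encode on an Nvidia GPU: much faster, typically a bit
    # worse at comparable settings.
    subprocess.run(["ffmpeg", "-i", src, "-c:v", "hevc_nvenc",
                    "-b:v", "4M", "hw_out.mp4"], check=True)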

You were probably thinking about hardware decoding though.


I'm curious why hardware encoding is generally worse? All my experiments with it (h264/h265) have led to significantly lower quality output, to the point that I've avoided it for any final outputs, but I always assumed I was doing something wrong.


Hardware encoding cares about consistent real-time output above all else, with power usage as the runner-up. It can focus on that because software encoders are just flat-out better for use cases involving quality - they have access to more flexible/accurate math, they don't necessarily have a real-time/low-latency constraint, they can be updated when new encoding tricks are discovered, they can be massively more complex without ballooning the cost of a device, etc.

They're complementary options rather than competing. Each does something well, and it sounds like you want the thing that software encoding does.


I’ve written various applications on top of FFmpeg but never looked into encoder technicalities, so I don’t know if and why GPU encoding has fundamental limitations. I heard nvenc has vastly improved on RTX cards; I’m still rocking an old GPU so can’t verify that, but presumably that means there’s no fundamental weakness and GPU encoding is just playing catch up?


ffmpeg supports CPU, CPU-Accelerated and GPU-accelerated encoding and decoding.


Just found av1 is about 20 to 30% more efficient than h265. I guess there was no reason to use patented algs, but h265 is now significantly more efficient than av1.

I would still take freedom over patented software


> Just found av1 is about 20 to 30% more efficient than h265

> […] but h265 is now significantly more efficient than av1.

What did you mean?


Not OP but I think he meant "h266 is now significantly more efficient than av1"


AV1 beats h.265 by 20-30%, h.266 beats h.265 by 50%. Honestly for that little of an improvement I'll go with AV1.


The audience for h266, and who will "win" the codec war, isn't individual consumers, even powerusers who know what a codec is.

H266 will be adopted by broadcasting and archival and will make MPEG tankers of money. Whatever the next generation of physical home media is after blu-ray will use it, the player for it will read it, and your TV cable box will take h266 signals in to decode. The costs of paying MPEG will be in the cost of the discs, the cost of the cable package, etc.

The real win we should... hope? For is that h266 never sees a personal computer hardware decoder from Qualcomm / Samsung / Intel /AMD / Nvidia / etc. If online video is exclusively distributed with AV1 then none of these companies need touch the festering MPEG patent hell and consumers avoid that parasite leeching money out of their computer purchases.

Because the cable box and physical media player are dying. You can generally opt out of them and avoid filling the MPEG coffers with software patent money. And the big web companies that have the power to dictate what computers are using for the next decade and beyond are all way favoring AV1 with the exception of Apple.


I expect the H.266 patent fees won't be as expensive as you fear.


Really? Because by that metric, H.266 is as far ahead of AV1 as AV1 is ahead of h.265.


Well, it depends: if x is 30% better than y and z is 50% better than y, then z is only ~15% better than x.
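
A rough sketch of that arithmetic, plugging in the ~30% and ~50% figures from upthread; which reading of "X% better" the marketing intends is my assumption either way:

    av1_gain, vvc_gain = 0.30, 0.50  # claimed gains over H.265

    # Reading 1: "better" as an efficiency multiplier (1.3x vs 1.5x)
    print((1 + vvc_gain) / (1 + av1_gain) - 1)  # ~0.15 -> H.266 ~15% ahead of AV1

    # Reading 2: "better" as a bitrate reduction at equal quality
    print(1 - (1 - vvc_gain) / (1 - av1_gain))  # ~0.29 -> ~29% smaller files than AV1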

Also, these kinds of performance claims are 100% hot air. Real-world benchmarks talk.


If that kills h.26x why not?


So they themselves claim. Will wait for empirical tests with available codecs to really evaluate this. Also note that for a given bitstream specification, there will be "room to grow" for encoder implementations (AV1 is designed to be flexible in this way). I am not aware of any precise method for evaluating how much room exactly there theoretically could be ex ante.


The second h265 was probably meant to be h266.


I wonder what would happen if ffmpeg made the choice to not implement a decoder for it. Or maybe just not an encoder?

Can we agree not to work on such projects? I feel that the lack of a good open-source encoder/decoder would spell the death of most codecs nowadays. That would also teach Fraunhofer a lesson.

Of course, everyone is free to scratch their itch. And the bigger the void, the more itchy it gets. Luckily, we still have AV1.


How does x264 etc get away with it?


They are compliant with the MPEG patent licenses - they distribute source code that you have to compile yourself, and only for non-commercial use.

If you build x264 or other open source implementation of h.264/h.265, and embed it for example in commercial video conferencing software/appliance, you have to pay patent licensing fees for that product.

It's also why Firefox downloads a blob from Cisco to handle MPEG-4 video - Cisco covers the licensing for distribution et al.


How much does Cisco pay (to MPEG?) for that?


If I understand correctly, around 10 million USD per year (according to https://www.mpegla.com/wp-content/uploads/avcweb.pdf that's the cap, and from what I've heard Cisco is selling enough actual products to hit that cap, so providing their software for free to everyone else doesn't cost them any extra in licensing fees, just hosting and such)


That is such a nice service. Benefits everyone and near-zero cost for them.


Why is Cisco being nice about it? (Besides why not?)


I don't recall the specific timing of the release, so this might not line up, and I have no inside knowledge, just public information.

Cisco has some products which use compressed video in a browser setting. It would be useful if all browsers supported a good codec. Individually downloaded codec plugins suck, because installing is iffy. Therefore, give something away which doesn't cost licensing money to make your existing licensed products more usable.

And get some good feels on the interwebs.


Because several years ago, there was a fight over mandatory to implement video codecs in WebRTC. It was VP8 vs H.264. The biggest thing VP8 had going for it was no royalty payments. Cisco wanted H.264 because all of their devices supported H.264 and none supported VP8 and they already paid the royalties. So Jonathan Rosenberg, then CTO of the division of Cisco that managed this part of the business arranged to have Cisco cover the royalty payments for anyone implementing the WebRTC standards.

That wasn't enough, and WebRTC requires both VP8 and H.264 as MTI codecs.


Because it forces their competitors (in the video conferencing business) to take similar costs.


Could Firefox download x264 and compile it on demand?


x264 is for encoding only.


They don't get away with it. There are two separate issues: the software copyright license and the H.264 patent license. x264 itself is licensed under the GPL:

http://www.videolan.org/developers/x264.html

But if you use an H.264 encoder or decoder in a country that recognizes software patents then you need to buy a patent license if your usage comes under the terms of the license:

https://www.mpegla.com/programs/avc-h-264/


Software patents are not valid everywhere; x264 is developed by VideoLAN in France, where software patents don't apply, like in the rest of the EU.


So if a company in france writes open source software that infringes on a US software patent, and a company in the US bundles that source code in a product, is the US company liable for damages?


Yes. The company distributing the work has to make sure it has licenses for any required parts.


Yes, that's exactly how it works, the US company is responsible for following the US laws.


They only sort of half apply in Sweden too.


What do you mean by half valid?


Software patents are valid only if part of a larger invention, the algorithm itself cannot be patented.

However, patent clerks have also from time to time registered algorithmic patents.


Naively hoped I'd read 'this will be released to the community under a GPL license' or similar. Instead found the words 'patent' and 'transparent licensing model'.

I appreciate that it costs money and time to develop these algorithms, but when you're backed by multi-billion dollar "partners from industry including Apple, Ericsson, Intel, Huawei, Microsoft, Qualcomm, and Sony" perhaps they could swallow the costs? It is 2020 after all.


That's Fraunhofer for you.

In the early days of MP3, all MP3 rippers and players were built off of their implementation.

Hardware and software companies had to license it in order to play MP3 files. As such there was no native support for MP3s for quite some time.

In the late 90s, right around the explosion of MP3s on the internet, Fraunhofer was going after companies that shipped MP3 support without a license.

In my humble opinion, that license mess set back innovation in the portable audio space by a good 5 years.


> In my humble opinion, that license mess set back innovation in the portable audio space by a good 5 years.

Seeing all this, I'm convinced that copyright in general and the patent system in particular do more harm than good by slowing down the technical progress of humanity as a whole for the sake of some already rich people becoming a bit richer.

The initial idea behind patent system was sensible, but the way it's abused now... I mean, it could work in today's world as intended if patents lasted a year or two, not what is effectively eternity.


> Seeing all this, I'm convinced that copyright in general and the patent system in particular do more harm than good by slowing down the technical progress of humanity as a whole for the sake of some already rich people becoming a bit richer

There are plenty of societies that don't respect intellectual property and copyright. And those societies don't innovate at the same rate as those who do.

There are certainly abuses in the copyright, trademark, and patent systems. But throwing out the baby with the bathwater is not the answer. Identifying the abuses and improving the system is the answer.

Perhaps that progress won't happen at the rate you'd prefer, but it's significantly better than burning the whole thing to the ground.


> There are plenty of societies that don't respect intellectual property and copyright. And those societies don't innovate at the same rate as those who do.

That's not necessarily so - for example, during the industrial revolution, Germany largely overtook the British in mechanical engineering skill during a period where the British had copyright but before the Germans got it: https://www.wired.com/2010/08/copyright-germany-britain/

Also, it's quite possible the causation goes the other way: rather than a society succeeding at innovation because of IP laws, it's just as possible that because a society has succeeded at innovation, it passes IP legislation in order to 'kick away the ladder'. But just like regulatory capture shows in general, legislation that helps yesterday's winners seek rent is not necessarily the same as (and is often in fact the opposite of) legislation to help tomorrow's winners see the light of day.


> There are certainly abuses in the copyright, trademark, and patent systems. But throwing out the baby with the bathwater is not the answer. Identifying the abuses and improving the system is the answer.

That sounds nice and reasonable, in theory.

In practice, all the money is behind expanding the copyright and patent systems. When is the last time the duration of copyright terms was shortened? When is the last time the patent system was adapted to be less draconian and less protective of those poor, poor multinational corporations that somehow end up holding all those patents?

Spoiler alert - that has happened exactly never. Instead, all we get is 'harmonization' which always means extending terms and giving those laws more teeth, to match the strictest law implemented anywhere in the world.

Steamboat Willie's copyright expires January 1st, 2024. How much do you want to bet that Disney will be pushing for another copyright term extension before then?

> Perhaps that progress won't happen at the rate you'd prefer, but it's significantly better than burning the whole thing to the ground.

The current systems only ever get stricter. Where are the much shorter terms? Where is the recognition that cooperation and remixing fosters innovation and progress and as such should be encouraged, not punished? Where is the PTO following the actual law that says math and logic (i.e. software) are not patentable? Etc, etc, etc.

The abuses have been very well documented over the past several decades. There has been zero progress on incremental improvements. Tell me again how you propose we improve the system without a drastic overhaul?


> Perhaps that progress won't happen at the rate you'd prefer, but it's significantly better than burning the whole thing to the ground.

Changing patent length (pc's suggestion) hardly seems like burning the whole thing to the ground.

In the US, if I understand correctly, there are ways to defend against patent abuse, though it typically involves costly legal fees. Fees many cannot justify in spending.

Addressing patent challenges may be a way to protect innovators, both those who should gain reward for their innovations, and those who seek to gain reward by building upon innovations.


> In the US, if I understand correctly, there are ways to defend against patent abuse

That's the key: in the U.S. Good luck stopping someone infringing on your patent in a good number of other countries.


> There are plenty of societies that don't respect intellectual property and copyright. And those societies don't innovate at the same rate as those who do.

For one, this claim suffers from a correlation/causation issue. But also, do you have an actual citation for research which shows this is true?


Chapter 8 [1] of Against Intellectual Monopoly, "Does Intellectual Monopoly Increase Innovation?" looks into this, including comparing countries with stronger patent laws vs weaker laws. The results seem pretty decidedly mixed.

[1] http://www.dklevine.com/papers/ip.ch.8.m1004.pdf


> There are plenty of societies that don't respect intellectual property and copyright. And those societies don't innovate at the same rate as those who do.

What it does do though is allow someone to start from zero, and catch up to the rest of the competition very fast and cheap. They can then offer their "product" cheaper in order to gain market share. As long as they are getting/keeping customers, there's no need to innovate. You can have a viable business without spending tons of cash on R&D, and if you're making money that way, who cares?

*I am in no way endorsing this kind of business model, but it exists and does well.


> There are plenty of societies that don't respect intellectual property and copyright. And those societies don't innovate at the same rate as those who do.

Comparable examples being?


> There are plenty of societies that don't respect intellectual property and copyright. And those societies don't innovate at the same rate as those who do.

I'm not 100% sure about this. Just look at how hard China laughs at intellectual property and copyright in general and tell me if you still think the same.

Not being burdened by expensive license fees is a competitive advantage.

But then again, so is not having to care about workers' rights...


Uh, China is doing pretty awesome, so that's one big massive outlier distorting the entire supposition.


isn’t China still mostly piggybacking on foreign innovation?


I wouldn't say mostly. It happens and is a predictable thing that will happen with software and hardware. People also build their own things there. It is a very large market with multiple tech hubs, servicing itself.


They have invested in manufacturing, solar, nuclear and a number of other areas to the extent that they are now world leaders. You can debate relative merits, but if you rank countries it certainly is doing better than most.


You've flipped the causal relationship - societies capable of rapid invention are also going to be more likely to be legalistic.


US patent laws sensibly state that only individuals, not corporations, may be awarded patents.

Unfortunately, most companies require that any patent awarded to an engineer in their employ is automatically assigned to the company.

Get rid of that loophole and employees will be able to license their patents as they see fit. Of course this is fraught with practical difficulties, but some kind of compromise could be reached.

Never happen, obviously.


Do you really want to try to get individual licenses from every single person who worked on video codecs from in the last 20 years so you can legally compress video?


While that would be somewhat harder, I bet most engineers are much less greedy than companies that employ them.


I see you are unfamiliar with the Wright brothers patent war: https://en.wikipedia.org/wiki/Wright_brothers_patent_war


> I mean, it could work in today's world as intended if patents lasted a year or two, not what is effectively eternity.

This needs to vary based on field. Some areas, like drugs, take forever to get to market and have exorbitant development costs to recover. (Though, there, other abuses need to get fixed, like renewing patent lifetime with slightly different applications or formulations.)


A probable reason is the length of time patents and especially copyright law apply, at least in the US. Although we are already seeing issues with 70+ year copyright terms with extensions after the creator's death, digital technology just moves way too fast compared to classical technology. By the time a classic patent expired, other firms may have built the infrastructure to create a generic copy or even advanced the science, as with light bulbs. Digital tech moves quickly, and software that can be distributed across the planet in seconds and make billions in its first few years is ripe for subversion and knock-offs. Maybe if digital patents applied to much more specific applications, then competitors could build their own version using a new code base and compete?


> In my humble opinion, that license mess set back innovation in the portable audio space by a good 5 years.

Or did it push it forward? If not for the licensing at the time, would it have been developed? Would it be allowed to be used by anyone just paying for the technology?


That’s not edgy enough for HN ;)


On the other hand those licenses finance the development in the first place. MP3 by the way was done by another Fraunhofer institute.


It's even worse than that. The main driving force behind VVC is AFAIK the Heinrich Hertz Institute, which used to be independent and has now been sucked into the Fraunhofer mothership, which is one of the big research organisations in Germany.

Fraunhofer has a budget of over 2 billion Euros, and 30% of their money comes from public funding. They run over 70 institutes, so they do much more than this.

The root problem here is that nowadays public research is funded with industry money, which means there has to be a return on investment, hence the patents. In fact, this has metastasized into universities being graded by their patent portfolio volume. So I would expect there to be patents even if 100% of the funding came from the tax payer.

It would have been possible to do the whole process just with public money and zero patents. In fact, I would love it if some research team collated all the patent tax payments across the population of Germany and compared the bottom-line cost for the country.

I wager it would have been cheaper without patents, too.


Disclaimer: I work at a Fraunhofer institute, though not Fraunhofer HHI, which developed this codec, and I have no intimate knowledge on the financing of that institute or this project in particular. But some basic principles apply to all institutes the same.

Fraunhofer gets roughly 30% of its funding from public sources, the remainder is raised on a per-project basis. It's a fair assumption that those industry partners provided some funds towards the development here. Maybe they even covered all the payroll costs for the involved scientists for the duration of the project.

And yet more income means more money for other research projects. Maybe ones that are not as commercially interesting, or for which a partner decides to terminate a contract rather unexpectedly. While I am also a fan of OSS and would love for work like this to either have no patents or a liberal patent grant, I can also appreciate the desire to fund your research institute.


Public sources as in the taxpayer?

Why would the taxpayer fund anything that isn’t open and free. Crazy.


Yes, public sources as in the tax payer. It is worth pointing out that Fraunhofer is itself a non-profit organization.

The argument that anything funded by tax money must be open is a very fair stance. Though the line gets very blurry when you mix various sources of funding like this. To the best of my recollection I have yet to be paid from any public funding (rather than project specific funding raised from the industry, for example).

Personally, I have no qualms with the funding model, but other points of view are presumably equally valid.


Unfortunately, non-profit does not necessarily mean that someone in the organisation isn't amply lining their pockets.


You're right, but I'm not aware of any egregious salaries or bonuses. The president and all the chairmen are professors, which generally means they are paid by their university, not Fraunhofer. The same applies to many directors of the individual institutes. The salaries for employees are based on the "Tarifvertrag öffentlicher Dienst" ("labor agreement for public service"). Unless there's straight up fraud somewhere, there shouldn't be too much lining of pockets.


Public transport also is just subsidized and not free.


The dirty secret of video codecs is that you can't make a modern video codec that isn't patent encumbered, which in turn makes it so that even if they wanted to be open, they go for defensive patents, which in turn perpetuate the situation.

At least the patent licenses usually used with MPEG mean that private use of open source implementations is free.


> The dirty secret of video codecs is that you can't make a modern video codec that isn't patent encumbered

The existence of Theora, VP8, VP9, and now AV1 seems to contradict that theory.

You could argue that they infringe on some unknown patents, but that is also arguably true of patent cabals like MPEG (you just hope that the cabal is big enough that there aren't any patentholders lurking outside). The only difference is that with a patent cabal you have the fun of having to obey the restrictions of everyone who showed up with a possibly-related-in-some-way patent and joined the cabal.

Not to mention that it isn't necessary for a patent pool to be a cabal. AOMedia has a similar structure to a patent cabal except it doesn't act like a cabal (its patent pool is royalty-free in the style of the W3C). So even if the argument is that a patent pool is a good idea (and video codecs cannot be developed without them), there isn't a justification behind turning the patent pool into a cabal.

> At least the patent licenses usually used with MPEG mean that private use of open source implementations is free.

You say that, but there's a reason why some distributions (openSUSE for one) still can't ship x264 (even though the code itself is free software). Not to mention the need for Cisco's OpenH264 in Firefox (which you cannot recompile or modify otherwise you lose the patent rights to use it). The existence of the MPEG patent cabal isn't a good thing, and any minor concessions you get from them do not justify their actions.


Yeah. It's a mess.

The video patents aren't just "patent troll" patents, either. They are highly enforceable, and were registered by corporations like Ampex.

I have been trying to write a simple app to stream RTSP (security cameras), and that has been a pain.

I need to basically use either proprietary (paid) or GPL software to do it.

Video software is not for the faint of heart. Much as I grouse about the licensing, I am not about to develop my own codec.

I did write this one app, which is an ffmpeg wrapper, to convert RTSP to HLS (Which is not -currently- suitable for realtime streaming): https://github.com/RiftValleySoftware/RVS_MediaServer

It's GPL, because I need to use the GPL ffmpeg H.264 codec.
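
For the curious, the core of that kind of RTSP-to-HLS conversion boils down to a single ffmpeg invocation. A minimal sketch driven from Python (not the actual RVS_MediaServer code); the camera URL, segment settings, and output path are placeholders, and the camera is assumed to already emit H.264:

    import subprocess

    subprocess.run([
        "ffmpeg",
        "-rtsp_transport", "tcp",   # TCP tends to be more reliable than UDP here
        "-i", "rtsp://camera.local/stream",
        "-c", "copy",               # remux only, no re-encode
        "-f", "hls",
        "-hls_time", "4",           # segment length in seconds
        "-hls_list_size", "6",
        "stream.m3u8",
    ], check=True)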


> I need to basically use either proprietary (paid) or GPL software to do it.

> It's GPL, because I need to use the GPL ffmpeg H.264 codec.

I don't understand the problem?

I can see the problem with people patenting things and preventing you from writing your own implementation, but it seems you just want other people to do the hard work of implementing it so you can wrap a skin around it and do what?


I wasn't complaining. My apologies for coming across that way.

It was a statement of fact.

I've been writing open-source software for well over 20 years (actually, well over 30 years –Where does the time go?).

I think I understand the issues involved.


You can license x264 to not have to worry about the GPL. Though that costs money and you may find yourself signing MPEG LA contracts.

edit: Your project reminds me, https://github.com/arut/nginx-rtmp-module is super worth checking out and might be helpful to you.


Yeah, but it was a free app.

One of the frustrating things about implementing video software, even licensing it, is pretty much everything out there is an expression of ffmpeg, which is a really, really good system, but does have some baggage.


As others mentioned, VP8, VP9 and AV1 all are patent encumbered - they just happen to have licenses that are royalty-free and as someone mentioned, include retaliation clauses which help fighting off submarine patents - but that's essentially a possibly high-stakes game of chicken.


While they are patent-encumbered in the strictest sense of the word, they are not as aggressively licensed as H.266 -- which is what GP was lamenting. Licensing H.266 under the GPL(v3) wouldn't erase the existence of the patents and it would still be "patent-encumbered" in the same sense that VP8/9 and AV1 are.

And retaliation clauses are present in basically every free software license that has clauses dealing with patents (including Apache-2.0).


A big part is how expensive you can make said retaliation.

I'm more and more fond of calling it all a big game of chicken, a MAD without nukes.


Hah I love this analogy! Guess that's societal progress for you; we use currency to replace munitions.


Wouldn't Apache 2 be a better license for this since it is explicit on patents?


GPL is the example GP used, which is why I referenced it. It's not really important what free software license it would be (though GPLv3 does have an entire section (s11) which is all about patents -- that's why it's compatible with Apache-2.0).


> but that is also arguably true of patent cabals like MPEG (you just hope that the cabal is big enough that there aren't any patentholders lurking outside)

Apparently, they aren't. There are at least 2 other patent pools that claim patents for HEVC, and I think I saw 3 in some other article before:

https://streaminglearningcenter.com/codecs/hevc-ip-mess-wors...


Software patents aren't a thing in ~~Europe~~ a few European countries. Sure it's difficult to ignore the American market for a company, but an independent developer could specify a state of the art video codec without thinking about patents.

Edited because I didn't know that some European countries accept software patents.


Software patents are a very complex thing.

For example, many countries in the EU do not allow patents on software, but that's not something you can claim to be true for all of them - at least before Brexit, since iirc the UK was pretty happy to grant software patents.

Then there's the case where, if you're really willing, you can, as far as I understand, force a patent dispute through the WTO, with the possibility of a patent valid in the USA being enforced, for example, in Poland, despite the fact that the patent is invalid in Poland (it doesn't matter if your software is part of a physical solution in Poland; algorithms of any kind are not patentable there).


What really annoys me is that Article 52 of the EPC explicitly excludes software from patentability. It couldn't be more clear in its language (to me). But when I talked to a patent lawyer about this a while back, he said it didn't really mean that, and software can be and is easily patented.

https://en.wikipedia.org/wiki/Software_patents_under_the_Eur...


One more benefit of Brexit for mainland Europe! I didn't know about using the WTO to dispute invalid patents. I guess a WTO dispute can make sense for a Boeing software patent used by Airbus. I wonder whether the WTO would care about an independent developer. It's not very good press, but perhaps it's fine for them.


WTO doesn't care, WTO serves as a forum to get it through.

The question is, does a "practicing entity" holding the patent care enough to go through the hardest route to get the patent enforced using the WTO as a forum? One needs to compare costs and benefits. It's why patent trolling involved pretty much a few counties in Texas, because that's where the costs were lowest compared to the benefits.


It's not just the UK that allows software patents in practice - as the discussion hints at, Germany does too, and if I remember rightly pretty much the last patent affecting MP3 was a German patent on zero-padding held by Philips, who enforced it aggressively against manufacturers of devices like music players.


> Software patents aren't a thing in ~~Europe~~ a few European countries.

Look at all those European patents in the MPEG-LA license pool!

https://www.mpegla.com/wp-content/uploads/avc-att1.pdf


Software patents can exist, but are almost always unenforceable, is my understanding of it from researching this for my own media-related project.


> Software patents can exist, but are almost always unenforceable

Wanna take the risk? You might end up winning the lawsuit, but by then there's a good chance you'd already be out of business.


As I understand it there has been no successful enforcement of a software patent so far, and it's ridiculously unlikely that I (as opposed to any other actor in this space) will be the first case. So yeah, I'll take that risk.


I would love that you be right. However, here are examples of successfully enforced software patents:

- in Europe: http://www.bailii.org/ew/cases/EWCA/Civ/2002/1702.html

- in US: https://web.archive.org/web/20061205050434/http://eolas.com/...

But as I said, if your startup is being sued by Dolby, whether the enforcement is successful or not is actually irrelevant. Showing that your work doesn't infringe a patent, or that Dolby's patent is invalid, is a money and time-consuming process (unsurprisingly, patents are not generally written to facilitate re-implementation or defense).

(Moreover, in the US, in some cases, the patent owner might even get a preliminary injunction ( https://www.tms.org/pubs/journals/jom/matters/matters-9712.h... ), which might seriously and immediately harm your business. I don't know if such a thing exists in Europe).

Big tech companies like Dolby and IBM use a preventive racket-looking technique ; it involves trying to sell to potential infringers a "protective" subscription, but there's no preliminary analysis of whether there actually is any patent being infringed.

During broadcasting tech events like IBC or NAB, Dolby actually sends people to other company's booths for this ; and there's a famous story about IBM against small-at-this-time SUN : https://www.forbes.com/asap/2002/0624/044.html , whose gist is:

> "OK," [the IBM lawyer] said, "maybe you don't infringe these seven patents. But we have 10,000 U.S. patents.

> Do you really want us to go back to Armonk [IBM headquarters in New York] and find seven patents you do infringe?

> Or do you want to make this easy and just pay us $20 million?"


I meant in Europe, and that case is not successful enforcement of a software patent (merely a preliminary question as to whether some jurisdiction weirdness could be a reason that William Hill did not infringe on a patent); in fact there was no question to the court as to whether the patent was valid, and the case ended with an answer to the question asked.

This is a quirk of some UK courts, where you can literally just start a case to ask a question on some detail of the law and get an answer.

The question was:

> "Is it a defence to the claim under s.60(2) of the Patents Act 1977, if otherwise good, that the host computer claimed in the patent in suit is not present in the UK, but is connected to the rest of the apparatus claimed in the patent."

From Wikipedia:

> Questions of validity were never considered by the court.


I think Europe does allow software patents of a sort: See https://www.epo.org/law-practice/legal-texts/html/guidelines... Or you might have a particular definition of the term in mind that excludes these, but I think when most folks hear the term, they'd include what the EPO permits.


The European patent office is not an official thing. It's an independent private organization that will gladly take your money to submit patents.


The European Patent Organisation is an intergovernmental organisation (https://www.epo.org/about-us/foundation.html).

Not a private organization.


Alright, I was wrong. Still strange that they issue patents that are obviously invalid.


I know a patent examiner there quite well; there is a lot of politics and incompetence in appointing people, and a lot of pressure to accept most of the patents they receive. If I understood correctly, part of the reason is politics (if you refuse patents, even stupid ones, influential people will get upset) and part is the idea of having as many patents as possible as a measure of European creativity and innovation and ... bla-bla. Universities and research institutes are measured by the number of patents, and if they don't invent something good, they will patent something stupid just to get the numbers right.


The reasoning is that individual states have Opinions about the validity of software patents and they're going to keep issuing them until somebody with the proper authority clarifies that the law means what the law says, although as far as I can tell none have actually been successfully enforced.


In practice, the EU does allow software patents. The software invention just needs to be disguised as a machine. Industrial property lawyers know very well how to do that, and it's unfortunately very common.

> Edited because I didn't know that some European countries accept software patents.

European patents are granted at the European Patent Office (individual european countries also have their own patent offices, whose patents can only be enforced in their home country).


Enforceability of EPO patents varies; that's why there was a big fight over software patents in the EU parliament not so long ago. I'm not sure about the current status, however - that needs to be checked, I guess.


Are AV1 and VP9 not modern video codecs? Or are you suggesting the patent claims have substance?


AV1 and VP9 are covered by plenty of patents. One of the differences here is that there is a retaliation clause in the licenses for AV1 and VP9. https://aomedia.org/license/patent-license/

"1.3. Defensive Termination. If any Licensee, its Affiliates, or its agents initiates patent litigation or files, maintains, or voluntarily participates in a lawsuit against another entity or any person asserting that any Implementation infringes Necessary Claims, any patent licenses granted under this License directly to the Licensee are immediately terminated as of the date of the initiation of action unless 1) that suit was in response to a corresponding suit regarding an Implementation first brought against an initiating entity, or 2) that suit was brought to enforce the terms of this License (including intervention in a third-party action by a Licensee)."

This makes it much harder for practicing entities or their licensees to assert claims against other practicing entities over the formats.


There are patents that cover AV1 and VP9; they are just licensed without royalties: https://aomedia.googlesource.com/aom/+/refs/heads/master/PAT...


They are modern video codecs which are playing chicken with possible submarine patents.

Ultimately, it's a question of how much you're gonna risk to get where you want, and how much power/influence/wealth you can bring to squash a possible lawsuit.


But H.265 and H.266 have the same risk - worse actually, as multiple submarine patent claims have been raised against H.265, and there are now multiple licensing organizations that all claim they need to be paid if you want to use H.265, and over a decade later their claims still haven't been legally settled.

To completely avoid risk your only choice is to use old technology where all the patents have expired (20 years in the US), like MPEG-2. The next lowest risk is to use H.264 and VP9, which have been out for a while and whose patent pools have stabilized over the years (and the original parts of the standards will have their patents expire soon - but not some of the newer profiles). After that I would argue that AV1 is less risky than H.265 and H.266, as a lot of work was put into intentionally avoiding patented technology that was not part of the pool, and no one outside the pool has yet made patent claims against it.


>To completely avoid risk your only choice is to to use old technology where all the patents have expired

EVC baseline is basically that, only using techniques from H.264 whose patents are already expired or soon to expire, plus patented techniques from companies that are giving them away to the standard.

In some way EVC is even more exciting than VVC.


> The dirty secret of video codecs is that you can't

You can, but fraunhofer certainly isn't trying!

> At least the patent licenses usually used with MPEG mean that private use of open source implementations is free.

0_o that is not at all the truth.


> The dirty secret of video codecs is that you can't make a modern video codec that isn't patent encumbered

Why is that?


Pretty much every method involved in high-efficiency video compression has a patent on it, no matter how small a part it plays.

Now compound this by the fact that a) trying to make an exhaustive patent search to get a verifiable claim that you don't infringe on any patent is very problematic b) known patent pools like MPEG-LA are known not to cover everything.

So you can make a reasonable bet that you avoid infringing patents by avoiding patents from MPEG-LA and few other better known groups, but you can't actually guarantee that you're not infringing on any patents.

This resolves, sort of, into a game of chicken and depends heavily on whether a lesser-known patent holder decides it's worth the bother of enforcing against you... but even if they don't, unless they come out with a royalty-free license, the possibility of a patent claim is a sword of Damocles hanging over your codec.


"make an exhaustive patent search to get a verifiable claim that you don't infringe on any patent"

Is there even such a thing?

Isn't the problem that one has to actually go to court to get the answer to this question?


In theory you can do such a search. The problem is that doing so is impractically expensive (and at least in some jurisdictions it can actually raise your liability in case of patent infringement).


Does such a search indemnify one of liability though, is the big question?


No.

In fact, I heard more than once that current advice is to explicitly avoid searching :/


Because there are companies with a lot of money in this space which spend all their time trying new things and patenting anything they come up with, even if it doesn't make it into a published codec, basically.

The way to get around this is to exist in the EU and avoid providing anything to the US.


The field is littered with companies holding submarine patents just waiting for that big payday when something using some tiny part that they patented gets popular and deployed to millions of devices so they can surprise everybody with their pricey licensing terms and huge lawsuit.


It would be hard to build a new algorithm without stepping on other established algorithms that are patented.


That's what AV1 is.


The situation with AV1 is "we are in middle of minefield and nobody got mine-clearing gear".

They can be reasonably sure they do not infringe known patents from certain Patent Pools and patents declared as part of MPEG-LA bundles. They can't provide reasonable data that they do not infringe on any submarine patent, something that killed 3 attempts by MPEG-LA to provide a royalty-free codec for the web - all that was required to kill it was a note from a company that they "might" have patents covering things they tried to release, or that they decided not to allow royalty-free license for their known patent.

Meanwhile patent search is complex enough that it's unreasonable to impossible to make a statement that you definitely don't infringe any unless you keep clear of anything invented within last ~20 years.


> The situation with AV1 is "we are in middle of minefield and nobody got mine-clearing gear".

That's true of every new video codec. It didn't stop the use of H.264 or VP9 or even HEVC.

Multiple companies have now rolled out AV1 into production. We'll see what happens.


When I wrote that I didn't know about retaliation clauses in AV1 patent pool license. That said, a huge chunk of patents involved in all MPEG standards (AVC, HEVC, etc.) meant that if you tried to go after them, you might have lost licenses necessary elsewhere, so it's all a question of risk analysis for someone who wants to torpedo AV1 with a patent.

What's wrong is claiming that AV1, VP9, VP8 are "patent-free". They are not.



I think that's more a case of Sisvel trying their luck.

I don't think they'll be successful. The Alliance for Open Media was careful to avoid potential patent problems during AV1 development. So, unless AOMedia seriously failed in that effort, AV1 will be alright.


Being careful and being successful are two different things, especially given how hard it is to do an exhaustive search over patents to ensure that no, no claim in any patent filing touches your code, especially given that there are regimes where patent filings for software get pretty much a rubber stamp and are based around "first to file" even if there's prior art.


I'm confused. How is Sisvel able to sell a license for AV1 patents when they're not a member of AOMedia? Do they actually have AV1 patents, or are they trying to trick companies that would rather pay up than risk violating patents?


Basically, unlike copyright, you don't even have to know that someone else patented something in order to infringe on a patent and have them come out of the woodwork later. It's just even more expensive if you do know.

The patent system was not designed to have every tiny little technique patented, and this is its failure mode.


> It's just even more expensive if you do know.

In the USA. Wilful and unwilful infringement of the patent costs the same in Europe.


Luckily, software patents are unenforceable in the EU, so for many of us on this site that's irrelevant.


That's what patent trolling is


Copyright != patents. A copyright license for a codec implementation or a spec text doesn't grant you patents for the ideas it contains.

A GPL implementation doesn't guarantee a patent grant. Even if you wrote the code yourself, even just for your personal use, your own work could still be illegal to use due to lacking a patent license from the original patent holders.

Be careful about using implementations of H.26x codecs, because in countries that recognize software patents the code may be illegal to use, regardless whether you've got a license for the code or not. Even when a FLOSS license says something about patent grants, it's still meaningless if the code author didn't own the patents to grant.


Of course, they can't do that, because the source technologies they've put together are themselves patent-encumbered. An AV-codec is a lot like a modern pop song: a piece of IP entirely made up of licensed samples of other people's IP.

I think a more subtle "open-sourcing" of this IP could still be possible, though. Maybe one that still requires that large corporate players that are going to sell their derivative products, acquire a license the traditional way (this is, after all, what the contributors to the codec's patent-pool and R&D efforts based their relative-R&D-labor-contribution negotiations around: that each contributor would end up paying for the devices of theirs that run the codec.)

Maybe there could be a foundation created under the stewardship of the patent-pool itself, which nominally pays the same per-seat/per-device licensing costs as every other member, but where this money doesn't come from revenue but rather is donated by those other members; and where this foundation then grants open-source projects an automatic but non-transferrable license to use the technology.

So, for example, a directly open-source project (e.g. ffmpeg) would be granted an automatic license (for its direct individual users); but that license wouldn't transfer to software that embeds it. Instead, other open-source software (e.g. Handbrake, youtube-dl, etc.) that embeds ffmpeg would acquire its own automatic license (and thus be its own line-item under the foundation); while closed-source software that embeds ffmpeg would be stuck needing a commercial license.

Is there already a scheme that works like this?


GPL would seem like a very weird choice -- that would mean it couldn't be put into any closed source product?


"or similar" - I mean an open-source free license to use, that is not encumbered by patents. GPL was the first license name to pop into my old-man brain.


It could still serve as a reference implementation.


Not only a GPL reference implementation, GPLv3 would preclude patents on any improvements.


I don’t see how that follows?


GPLv3 has a clause that says if you release anything under the license then you have to provide a cost-free license to anything that is patented; if you aren't able to provide a license, then you aren't allowed to release it under GPLv3. Any modifications to a GPLv3-licensed product have to be released under GPLv3.

So yes you are correct, but in effect it might as well be patent free as far as a 3rd party end user is concerned as they have been provided what is in effect a safe harbour.


It could still be useful if the patents were part of a defence patent portfolio. (Ideally I'd abolish software patents altogether, or set a hard 10 year limit on them together with some kind of mechanism to curb the impact of add-on and submarine patents.)


Yes, it can be used defensively. If you breach the GPLv3 then in effect you also lose your patent license. The GPLv3 also forbids you making an opposing patent claim, so in effect it is defensive.


Notice that the missing party here was 'Google'. These are the folks who are really competing with VP9, the royalty free codec that limited the uptake of H.265.


Patents and open/closed source are independent things. OpenH264 is a BSD-licensed H.264 implementation, for example.


Yes, but what does " backed by " mean?

Does it mean money, or not? Because if not, then Fraunhofer does not exist.

But there is absolutely a problem here, because said 'mega businesses' actually should have a strategic imperative to want to make internet technologies more widespread.

Why on earth would MS want to limit their main line of business for a tiny bit of IP-related revenue?

It would seem to me, that G, MS, Huawei and all of the various patent holders should be trying their best to remove any and all barriers to adoption. There are enough bureaucratic hurdles in the way to worry about, let alone legal concerns.

Even if MS or whoever had to buy out some laggard IP owners who didn't want to play ball, it would probably still make sense for them.

Fraunhofer or anyone else are not in that situation, but the behemoths running vast surpluses are, it just seems shortsighted for them to hamstring any of this.


> will be released to the community under a GPL license' or similar

Both h264 and h265 have these implementations; I think the FFmpeg library has both under the terms of the GPLv2.

The decoders are almost completely useless. The video codec, at least the decoder, needs to be in the hardware, not in software.

Mobile devices just don't have the resources to run the decoders on the CPU. The code works on a PC but consumes too much electricity and thermal budget. Even GPGPUs are not good enough for the job; a couple of generations ago AMD tried to use shader cores for video codecs, it didn't work well enough, and they switched to dedicated silicon like the rest of them.
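
As a quick sanity check of what a given machine can actually offload, ffmpeg can list the hardware acceleration methods its build exposes. A minimal sketch, assuming ffmpeg is on PATH:

    import subprocess

    # Prints the hwaccel methods compiled into this ffmpeg build
    # (e.g. cuda, vaapi, videotoolbox, d3d11va).
    out = subprocess.run(["ffmpeg", "-hide_banner", "-hwaccels"],
                         capture_output=True, text=True)
    print(out.stdout)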


> The decoders are almost completely useless.

You'd be surprised at how often these are used.


Indeed. From an article here on HN, there is a lack of VA-API support in Firefox and, I think, Chrome on Linux. Furthermore there are special profiles (looking at you, H.264) that only work reliably in software decoders.


Mozilla added VA-API support under Wayland sessions in Firefox 76, and it'll be coming to Xorg sessions in Firefox 80.


I wonder why?

Even Raspberry Pi has a hardware decoder: https://github.com/Const-me/Vrmac/tree/master/VrmacVideo

Linux Kernel support for HEVC is WIP but I’m pretty sure they’ll integrate eventually.


Buying VC-1/MPEG-2 licenses to use hardware decoding on pre-4 Raspberry Pis was annoying: https://codecs.raspberrypi.org/license-keys/

Is this because of licensing/copyright?


Right, now I remember these keys.

But the very first Pi already had a hardware h264 decoder (and even an encoder!) which didn't need any extra keys to work. No idea how they did it; maybe the license was included in the $25 price. The Pi 1 was launched in 2013, when h264 was already widespread while mpeg-2 use was declining.

I think that's why they did not include the license. It would increase the price for all users but only be useful for the very few of them who connected a USB DVD or Blu-ray drive to their Pis.


MIT or FreeBSD licenses would also do.


Some never learn. Also surprising to see supporters of this approach in this thread. They are still around apparently.


Is H.265 released under a GPL-like license? If not, how does software like Handbrake use it?


They use ffmpeg, which is developed by people who do not care about software patents because they don't apply to them.


Why don't the patents apply to them?


Because the developers of x265 (which is the real codec and ffmpeg is a wrapper around it) are located in France.

http://www.videolan.org/developers/x265.html

https://en.wikipedia.org/wiki/VideoLAN


I don't know for all contributors, but the creator is French and there is no software patents in France.


All of the French patents listed in this license pool care to disagree: https://www.mpegla.com/wp-content/uploads/avc-att1.pdf


Patents are almost always approved, fwiw, and the FR cases here seem to be linked to worldwide patents. It is not the patent office's job to test the validity of patents. That occurs in a court when someone challenges the patent. This is true of every country, France included. So whilst you have a list of software patents filed worldwide (including France), that's not really relevant to the enforceability of the patent in France.

This is also why you see articles from time to time highlighting a stupid patent as if it's an outrage that the patent office allowed it. It's not the patent office's job to enforce patents. You can literally go ahead and patent swinging on a swing (1) and the patent office would approve it if the paperwork is in order. The media would then likely pick up on this with outrage as if that's an enforceable patent. The truth is that it's simply not the patent office's job. Patents are meant to be enforced by courts.

1: https://patents.google.com/patent/US6368227B1/en


> It is not the patent offices job to test the validity of patents.

Actually, it is (at least in the US). The USPTO can deny patents on the basis of nonpatentability, and its general refusal to do so after the State St. decision is often cited as one of the problems of the modern patent system.

Broadly speaking, however, if the argument is that software patents are invalid in Europe because they'll be found so by the courts, it should be noted that SCOTUS is actually pretty likely to rule software unpatentable were it to hear a software patent case. A little background is in order:

In Parker v Flook (1978), SCOTUS said that mathematical algorithms (i.e., basically software) are unpatentable. In Diamond v Diehr (1981), they said that part of the patent being software doesn't make the entire thing invalid. The big decision is State St (1998), which is a CAFC decision holding that anything was patentable so long as it produced a "useful, concrete, tangible" result and basically broke the patent office. When SCOTUS decided Bilski v Kappos (2008), they emphatically (and unanimously!) called out State St as wrong, but declined to endorse any guidelines as to what the limits of patentability should be. The later Mayo (2012) and Alice (2014) decisions again unanimously and unambiguously laid out what wasn't patentable: natural processes, and "do it on a computer" steps.

A few years ago, we had a patent attorney at work tell us (paraphrasing somewhat) that Alice made it really hard to figure out how to write a software patent that wouldn't be invalidated. Their continued existence (and pretense to their enforceability) is less because it's secure and more because no one wants to spend the money to litigate it to the highest level (see also the Google v Oracle case, which is exactly the sort of thing a software patent case history would entail).


They are not valid in France.


x265 ffmpeg


What you're looking for is the AV1 codec, which was finalized a year and a half ago and will likely see much wider adoption, simply because none of the members want to pay royalties.

https://aomedia.org/av1/

https://aomedia.org/membership/members/

AV1 decoding has been in Chrome and Firefox for at least a year. We're just waiting for hardware decoding and encoding support now, which should start appearing this year.

The next version of Chrome will also support the AV1-based AVIF image format this month:

https://chromium-review.googlesource.com/c/chromium/src/+/22...

YouTube, Netflix, and Amazon/Twitch are also likely to not support VVC (and some of them don't support h.265 either) for their streaming services.


Your comment was deaded. I vouched for it because I don't see anything that looks obviously inflammatory/incorrect, but I noticed that a large fraction of your comment history is also dead. You might want to look into that, since most of the dead comments also looked fine to me.


>The next version of Chrome will also support the AV1-based AVIF image format this month:

FWIW, Firefox supports it already (behind the image.avif.enabled knob).


> perhaps they could swallow the costs? It is 2020 after all.

Should they swallow the costs so that people come back and again say they are doing it to kill the competition? I just received a bill for hundreds of dollars from a doctor's visit; I guess things are not going to be free in 2020 after all.


Ironically in Germany, where Fraunhofer is located, you would have received no such bill.


Can anyone verify whether this is a real number? It's sometimes possible to make surprising claims (such as 50% lower size) by relying on unusual or unrealistic situations. I would rather they quoted these percentages against a standard set of test videos with different content and resolutions, using some objective measure of fidelity to the original. But if the 50% number is real, then that is truly remarkable. I wonder how many more CPU instructions are required per second of decoded video compared to HEVC.


If one trusted numbers like this and followed a chain of en vogue codecs back through history, you'd expect a modern codec to produce file sizes around 3% of MPEG-2's on the same input data. It's all spin.

I'm sure it does better, but I'm equally sure it'll turn out to be an incremental benefit in practice.

> I wonder how many more CPU instructions are required per second of decoded video compared to HEVC.

CPU cycles are cheap. The real cost is the addition of yet another ?!@$!!#@ video codec block on every consumer SoC shipped over the coming decade.

Opinionated bile: video encoding is a Solved Problem in the modern world, no matter how much the experts want it to be exciting. The low hanging fruit has been picked, and we should just pick something and move on. JPEG-2000 and WebP failed too, but at least there it was only some extra forgotten software. Continuing to bang on the video problem is wasting an absolutely obscene amount of silicon.


MPEG-2 doesn't support video larger than 1920x1152, so it's hard to compare on 4K video, let alone 8K. But according to https://www.researchgate.net/publication/321412719_Subjectiv... H.265 can achieve similar visual quality at 10% of the bit rate of MPEG-2 even on SD video (832x480). [Edit: not 720p]


832x480 is ~480p video (720x480 is standard widescreen DVD); 720p is 1280x720.


Video sizes are not shrinking. 8K is coming, with even bigger sizes on the way. That matters at scale.

As long as someone sees the benefit, people will keep pursuing it. Compression (like all tech) is a moving target, with the platforms regularly improving on many axes.


The video codec battle is not about file size, it is about streaming bandwidth.


Are those different? If I chop up a video file into chunks, I'm streaming it, and if I save a stream I have a file. With a buffer, I would expect the sizes involved to be identical. (Although without a buffer I'd expect streaming to be worse)


Yes. For the most part, you need to be able to encode to a low and constant bandwidth at 30/60+ fps, possibly even with limited latency. Then there are also some lesser, but still important, aspects, such as the need to be able to start in the middle of a stream, handle lost packets, etc.


Latency will become more important with video conferencing (already now) and AR/VR.


>Are those different?

No. In the end you transmitted a file that has a certain size. It doesn't really matter if you save that file or just use a volatile buffer.


>some objective measure of fidelity to the original

This is part of the problem. What is an "objective" measure of perceptual fidelity to the original?


A measure that has consistently high agreement when evaluated by humans, repeatable across laboratories. Image quality tests can be (and are) conducted in a structured fashion.


No such measure exists, at least not one that can’t be gamed.


No: if you allow / don't control for attempts to game it, then no such metric can be made. One attempt to counter this is to use multiple independent labs.


SSIM



It's not perfect but much better than nothing


Usually several test sets are created periodically to handle any type of situation, in a variety of resolutions, depending on which part of the codec people are working on. This process is very complicated and takes a lot of time, to ensure that all kinds of content are represented. The metric is not optimal for video content, but it gives an idea of whether they are going in the right direction. Ultimately, blind tests with people (experts and non-experts) are done on the content to add subjective measures to the objective ones.


This has been going on since the birth of video codecs. So for anyone who is new to this: the 50% is under a best-case scenario. They will compare at 4K resolution, where VVC has the highest advantage over HEVC, using measurements that fit those numbers (PSNR or SSIM).

For lower resolutions you will see lower percentages. You see the same claims from other codecs as well.


Those numbers are valid for a specific PSNR... but marketing and PR obviously prefer to skip that point.

For comparison, HEVC claimed 50 to 60% compared to AVC. You can compare with reality...


PSNR is a terrible way to measure video quality. SSIM is better but not great. VMAF is good, but it's skewed towards the types of material Netflix has.

Objective video quality measurement is a tricky thing to do, and you can easily come out with a codec that's great at fine detail on foreground people but terrible at high-speed panning over water and trees. Where you want the quality to go depends on the material.
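
For the curious, here is a minimal numpy sketch of what PSNR and a single-window SSIM boil down to. Real implementations (ffmpeg's psnr/ssim filters, Netflix's VMAF) are far more elaborate; the function names and the whole-frame SSIM shortcut here are just for illustration.

    import numpy as np

    def psnr(ref, dist, peak=255.0):
        # Mean squared error between reference and distorted frames,
        # expressed in decibels relative to the peak signal value.
        mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
        return float('inf') if mse == 0 else 10 * np.log10(peak ** 2 / mse)

    def global_ssim(ref, dist, peak=255.0):
        # Single-window SSIM over the whole frame; production SSIM slides a
        # small (often Gaussian) window across the image and averages the
        # local scores, which is what makes it sensitive to local structure.
        x, y = ref.astype(np.float64), dist.astype(np.float64)
        c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2
        mx, my = x.mean(), y.mean()
        cov = ((x - mx) * (y - my)).mean()
        return ((2 * mx * my + c1) * (2 * cov + c2)) / (
            (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2))

Neither metric knows anything about motion or where a viewer actually looks, which is part of why codec comparisons built only on them are easy to game.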


Nobody mentioning EVC? Worth a read for anyone concerned about patent licensing:

https://en.wikipedia.org/wiki/Essential_Video_Coding

There are 3 video coding formats expected out of (former) MPEG this year:

https://www.streamingmedia.com/Articles/Editorial/Featured-A...

So this isn't necessarily the successor to HEVC (except that it is, in terms of development and licensing methods).


> A uniform and transparent licensing model based on the FRAND principle (i.e., fair, reasonable, and non-discriminatory) is planned to be established for the use of standard essential patents related to H.266/VVC.

Maybe. On the other hand, maybe not. Leonardo Chiariglione, founder and chairman of MPEG, thinks MPEG has for all practical purposes ceased to be:

https://blog.chiariglione.org/a-future-without-mpeg/

The disorganised and fractured licensing around HEVC contributed to that. And, so far, VVC's licensing looks like it's headed down the same path as HEVC.

Maybe AV1's simple, royalty-free licensing will motivate them to get their act together with VVC licensing.


Shouldn't deep learning based video codecs take over dedicated hardware video decoders as more tensor cores become available in all new hardware?

NVIDIA's DLSS 2.0 supersampling is already moving into that direction.


Instead of a video file or stream, that would be more like shipping a program that recreates the video. It might be cool, but it's not really feasible to play back that kind of thing on normal TV hardware.


I'm not sure what you mean. There are already multiple research articles that show that deep neural network based video compression can be competitive, here's an example:

https://papers.nips.cc/paper/9127-deep-generative-video-comp...


> it's not really feasible

How do you know this?

TV hardware is on par with browsers. Anything is a program.


It's surprisingly difficult to guarantee high quality on every kind of video and format using a neural network. Furthermore, the network has to be able to handle all the corner cases (think about color profiles alone...).


The fact that the underground scene is still pumping out 264 instead of 265 (I'd estimate a 90/10 split, optimistically) tells me the real world is not quite ready for 266.

So I guess it comes down to 266 hw support. Or powerful CPUs that can push sw decoding?


What I don't understand is why international standardization organizations allow patent-encumbered technologies to become de jure standards.

MPEG, WiFi, GSM…

IMHO, international standards must be implementable without any patent fees, or they are very bad standards.


There's no law requiring wifi - "de facto". And they're standards because they're quite good! They have hardware support and parallelization and account for all use cases, even the marginal ones, and have reference implementations and support. Standards orgs don't care about patents because they're not relevant. This isn't a case of trolling - this is literally a software patent being used for its intended purpose by its developer, to extract profit by coming up with a new idea, and letting others use it.


International standardization organizations (like ISO) are not governmental organizations. They are private entities, which sometimes become too involved in official standards. But they are controlled by whoever funds them.


Standards organisations are older than publicly-available software. The concept of "reasonable and non-discriminatory" patent licensing was what they went with, and it seemed sensible at a time when goods were physical and the idea of giving away a product for which standards would be relevant would be ludicrous.


A quote from an Ecma presentation

    "ECMA for instance has made all the standards for DVD and optical disks. There were 5 recording formats. So there you are a little bit uneasy, of course. And again after a few beers I can ask the people in the room. Why do you want to have 5 formats? Do you still call that standardization? The answer is always the same: You are well paid. Shut up"
https://youtu.be/wITyO71Et6g?t=226


H.265 went absolutely nowhere


I'm no expert when it comes to video codecs, but I'm surprised that we're still seeing such strong claims of algorithmic improvement over h264, and now over h265. I'm also aware of how patent-encumbered this whole field is, and I suspect this is just a money grab.

This is really just a press release; what's actually new? Can it be implemented efficiently in hardware?


Your skepticism is very healthy, especially in this arena. With video codecs, information theory is ultimately the devil you must answer to at the end of the day. No amount of patents, specifications or algorithmic fantasy can get you away from fundamental constraints.

It seems like the major trade-off being taken right now is along the lines of using more memory to buffer additional frames. This can help you in certain scenarios, but in the general case you can never guarantee that a prior frame of video has any bearing on future frames; it is just exceedingly likely that most frames of video look much like prior ones. So you can certainly play this game to a point, but you will quickly find yourself on the other end of the bell curve.
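
To make that concrete, here is a toy exhaustive block-matching search in numpy, roughly the kind of interframe prediction being described. It is nothing like the optimized motion search a real encoder uses; the block size, search radius and function name are made up for the sketch.

    import numpy as np

    def best_match(prev, cur, by, bx, bsize=16, radius=8):
        # Exhaustive search: find the block in the previous frame, within
        # +/- radius pixels, that best predicts the current 16x16 block
        # (lowest sum of absolute differences).
        block = cur[by:by + bsize, bx:bx + bsize].astype(np.int32)
        best = (0, 0, np.inf)
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = by + dy, bx + dx
                if y < 0 or x < 0 or y + bsize > prev.shape[0] or x + bsize > prev.shape[1]:
                    continue
                cand = prev[y:y + bsize, x:x + bsize].astype(np.int32)
                sad = int(np.abs(block - cand).sum())
                if sad < best[2]:
                    best = (dy, dx, sad)
        return best  # (dy, dx) motion vector and its SAD

When the scene is static or pans smoothly, the SAD is tiny and the encoder only has to code a motion vector plus a small residual; film confetti or sensor noise and every block's best SAD stays large, so the buffered frames buy you almost nothing.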

You can also play games with ML, but I argue that you are going even further from the fundamental "truth" of your source data with this kind of technique, even if it appears to be a better aesthetic result in isolation of any other concern.

There are also lots of one-off edge cases that have always been impossible to address with any interframe video compression scheme. Just look at the slow-mo guys on YouTube dumping confetti in front of a 4K camera. No algorithm except the dumbest intraframe techniques (i.e. JPEG-style coding) can faithfully reproduce scenes with information this dense, and usually only at the expense of dramatic bandwidth increases.

Bandwidth is cheap and ubiquitous. I say we just use the algorithms that are the fastest and most efficient for our devices. We aren't in 2010 sucking 3G or EDGE through a straw anymore. Most people can get 20+ Mbps on their smartphones in decently populated areas.


The advantage of the H-series of codecs is strong support for hardware implementation. This has been a selling point since H.262. You can get an H.265 IP core from Xilinx, Intel, and other major vendors, so the actual runtime cost for H.266 (once a core is available) will be very low and constant (and comparable to current codecs). Bandwidth and storage space are real costs, despite the handwaving around them, and reducing those requirements without reducing visual quality is an important step.

As for "information-dense scenes": pathological cases such as the HBO intro screen are encoded by modern codecs as noise and regenerated client-side, because there's no actual information there. These scenes are either engineered or pure noise.


That sounds great, but this is a press release with no real technical details. Can anyone in the know add some context? For instance, what's the trade-off? I assume more CPU?

WebRTC-based video chats are all still using h264; did they not adopt 265 for technical or licensing reasons? What is the likelihood of broad browser support for h266 anytime soon?


H.265/HEVC takes about ten times as much computation to encode as H.264 [1], so H.264 still has legitimate technical use cases, even with licensing/patents aside.

This makes it great for a company like Netflix or YouTube, but less good for one-to-one and/or battery sensitive use cases like video calls. However, specialized chips help, and some mobile devices can record in HEVC in real time (mine from 2019 can). I believe current smartphones have HEVC encoding hardware, but I'm struggling to find a source for that right now.

I haven't seen the details of this new codec yet, but it's quite possible it also has a large encoding cost which will make it better suited to particular use cases, as opposed to a blanket upgrade.

[1] http://www.praim.com/en/news/advanced-protocols-h264-vs-h265...


iPhone 7 onwards[1], Qualcomm Snapdragon 610 onwards[2], and Intel Skylake and later CPUs[3] can all encode and decode H.265 in hardware to varying profile levels.

1: https://support.apple.com/en-gb/HT207022

2: https://www.qualcomm.com/snapdragon/processors/comparison

3: https://trac.ffmpeg.org/wiki/Hardware/QuickSync


QuickSync is actually a feature of Intel's integrated GPU. Parts without GPUs, no matter how recent, don't have QuickSync.


> webrtc based video chats are all still using h264, did they not adopt 265 yet for technical or licensing reasons?

Is that with x265 built into both browsers? I build it into mine, but I don't think it is the default for ffmpeg.


Not sure why this was downvoted. These all seem like very reasonable questions that others here might be able to answer.


WebRTC aims to adopt AV1. H.### codecs are a dead end.


Huh. I wonder how encoding speeds compare. I rarely chose h265 over h264 because similar levels of visual quality took massively more time.


My guess is, encoding speed will be worse. Video codecs for non-realtime applications are optimized for size and acceptably cheap playback. Encoding performance doesn't really matter since you encode only once but play and store more often.


But for most situations you encode once and play multiple times. Wouldn't it be better to reduce storage and bandwidth costs with a smaller file (assuming the same quality)?


It's a trade-off. When I have a batch of 40 videos, encoding h264 at 20 minutes per video versus h265 at 4 hours per video means the difference between 13 days and 160 days.

The latter isn't practical, I'll eat the couple hundred MB in order to save a lot of time.


I think by "13 days and 160 days" you mean "13 hours and 160 hours".


At some point the compressed version of "Joker" will be 45 chars:

"Sequel to Dark Night starring Joaquin Phoenix"

Of course, we will not have to film movies in the first place then. We will just put a description into a compressor and start watching.


8 MiB Shrek is kind of an AV1 meme at this point.


H.265 is still not mainstream, and not used to the full extent of its capabilities.

I'm not sure 265 is worth spending effort on now, when 266 is about to crash the party and will be adopted at least "equally poorly".


H.265 seems pretty mainstream by now. Older devices obviously don't support it in hardware, but pretty much all newer ones seem to, no?

It's just a slow percolation throughout the ecosystem as people buy new hardware and video servers selectively send the next-generation streams to those users.

The effort on h.265 has already been spent. Now it looks like h.266 is the next generation. It's going to be years before chips for it will be in devices. That's just how each new generation works.


H.266 will take many years to become a usable standard. H.265 is quite mainstream already: even some cheap smartphones shoot it, and many modern DSLRs / mirrorless cameras shoot it.


H.265 seems to be gaining traction more slowly because many older devices, including laptops and some smart TVs, don't support it. H.264 became ubiquitous for piracy since it offered small file sizes and worked on older devices, making it the perfect choice for those in poorer countries where tech isn't the first priority in a household. I wonder if H.266 will run into the same problems as H.265.


Question is, how does it compare to AV1?


I guess only time will tell.

AV1 is supposed to be 30% better than HEVC, and they claim H.266 is 50% better than HEVC. That would make H.266 roughly 30% better than AV1. By "better" I'm always referring to the bandwidth/space needed.

But take this with more than a grain of salt, since bandwidth/space is only one of many things that matter, and these comparisons also depend on many factors such as resolution, material (animated/real), etc.


I don't think your math adds up. Is 150 30% better than 130? It is only 14% better.

Regardless, these early performance claims are most likely complete bullshit.


It's about size: 50 is about 30% smaller than 70, which is 30% smaller than 100.
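
A quick sanity check of that compounding, treating the quoted percentages as bitrate reductions at equal quality (the round numbers are the marketing claims, not measurements):

    hevc = 100.0                 # reference bitrate
    av1 = hevc * (1 - 0.30)      # "30% better than HEVC" -> 70
    vvc = hevc * (1 - 0.50)      # "50% better than HEVC" -> 50
    print(vvc / av1)             # ~0.71, i.e. VVC ~29% smaller than AV1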


> AV1 is supposed to be 30% better than HEVC

Source? If I recall correctly, HEVC outperforms AV1.


It is mildly amusing that the very simple vector art "VVC" logo on their webpage is displayed by sending the viewer a 711 KB .jpg file.


Will it be used? It's probably the last one that does not use some sort of AI compression. See this for image compression: https://hific.github.io/ In the next 10 years AI compression will be everywhere. The problem will be standardisation. Classic compression algorithms can't beat AI ones.


AI compression is super, super cool... but while standardization is certainly a major issue, isn't the model size a much larger one?

Given that model sizes for decoding seem like they'll be on the order of many gigabytes, it will be impossible to run AI decompression in software; it will need chips, and chips that are a lot more complex (expensive?) than today's.

I think AI compression has a good chance of coming eventually, but in 10 years it will still be in research labs. There is absolutely no way it will have made it into consumer chips by then.


"Isn't the model size a much larger one?" yap It will probably be different, and systems will have to download the weights and network model, as new models come in, I don't think that we will have a fixed model with fixed weights, the evolution is too fast. Decoding will take place using the AI chip on the device aka "AI accelerator"


I wonder how small one of those 700 MB DivX/Xvid movies would be if compressed with this new encoding method.


They talk about saving 50% of bits over h.265, but also talk about it being designed especially for 4K/8K video.

Are normal 1080p videos going to see this fabled 50% savings over h.265? Or is the 50% only for 4K/8K, while 1080p gets maybe only 10-20% savings?

The press release unfortunately seems rather ambiguous about this.


Among other things, I have worked with and developed technology in the uncompressed professional imaging domain for decades. One of the things I always watch out for is precisely the terminology and language used in this release:

"for equal perceptual quality"

Put a different way: We can fool your eyes/brain into thinking you are looking at the same images.

For most consumer use cases where the objective is to view images --rather than process them-- this is fine. The human vision system (HVS, eyes + brain processing) is tolerant of and can handle lots of missing or distorted data. However, the minute you get into having to process the images in hardware or software things can change radically.

Take, as an example, color sub-sampling. You start with a camera with three distinct sensors. Each sensor has a full frame color filter. They are optically coupled to see the same image through a prism. This means you sample the red, green and blue portions of the visible spectrum at full spatial resolution. If we are talking about a 1K x 1K image, you are capturing one million pixels of each, red, green and blue.

BTW, I am using "1K" to mean one thousand, not 1024.

Such a camera is very expensive and impractical for consumer applications. Enter the Bayer filter [0].

You can now use a single sensor to capture all three color components. However, instead of having one million samples for each component, you have 250K red, 500K green and 250K blue. That's still a million samples total (the resolution of the sensor), yet you've sliced it up into three components.

This can be reconstructed into a full one million samples per color component through various techniques, one of them being the use of polyphase FIR (Finite Impulse Response) filters looking across a range of samples. Generally speaking, the wider the filter the better the results; however, you'll always have issues around the edges of the image. There are also more sophisticated solutions that apply FIR filters diagonally as well as temporally (using multiple frames).

You are essentially trying to reconstruct the original image by guessing or calculating the missing samples. By doing so you introduce spatial (and even temporal) frequency domain issues that would not have been present in the case of a fully sampled (3 sensor) capture system.
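
As a rough illustration of that guessing step, here is a naive bilinear demosaic of an RGGB mosaic using numpy/scipy. It is far cruder than the polyphase FIR approaches described above, and the layout assumptions and edge handling are simplified for the sketch.

    import numpy as np
    from scipy.signal import convolve2d

    def demosaic_bilinear(raw):
        # raw: 2-D mosaic from an RGGB Bayer sensor (R at (0,0), B at (1,1)).
        # Each channel keeps only its own sparse samples; the missing sites
        # are filled with the average of the known neighbours in a 3x3
        # window -- the crudest possible version of the reconstruction.
        h, w = raw.shape
        rgb = np.zeros((h, w, 3))
        mask = np.zeros((h, w, 3))
        rgb[0::2, 0::2, 0] = raw[0::2, 0::2]; mask[0::2, 0::2, 0] = 1  # red
        rgb[0::2, 1::2, 1] = raw[0::2, 1::2]; mask[0::2, 1::2, 1] = 1  # green
        rgb[1::2, 0::2, 1] = raw[1::2, 0::2]; mask[1::2, 0::2, 1] = 1  # green
        rgb[1::2, 1::2, 2] = raw[1::2, 1::2]; mask[1::2, 1::2, 2] = 1  # blue
        k = np.ones((3, 3))
        out = np.empty_like(rgb)
        for c in range(3):
            num = convolve2d(rgb[:, :, c], k, mode='same')
            den = convolve2d(mask[:, :, c], k, mode='same')
            filled = num / np.maximum(den, 1)
            # Keep the real samples, interpolate only the missing ones.
            out[:, :, c] = np.where(mask[:, :, c] == 1, rgb[:, :, c], filled)
        return out

Every interpolated sample is an educated guess, which is where the spatial (and temporal) frequency-domain issues described above creep in.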

In a typical transmission chain the reconstructed RGB data is eventually encoded into the YCbCr color space [1]. I think of this as the first step in the perceptual "let's see what we can get away with" encoding process. YCbCr is about what the HVS sees. "Y" is the "luma", or intensity component. "Cb" and "Cr" are color difference samples for blue and red.

However, it doesn't stop there. The next step is to, again, subsample some of it in order to reduce data for encoding, compression, storage and transmission. This is where you get into the concept of chroma subsampling [2] and terminology such as 4:4:4, 4:2:2, etc.

Here, again, we reduce data by throwing away (well, not quite) color information. It turns out your brain can deal with irregularities in color far better than with irregularities in the luma, or intensity, portion of an image. And so "4:4:4" means we keep every sample of the YCbCr-encoded image, while "4:2:2" means we cut the Cb and Cr samples down by half.
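
A minimal sketch of those two steps, using the BT.601 matrix as one example (real pipelines add offsets, clamping, and proper filtering before subsampling; this assumes an even image width):

    import numpy as np

    def rgb_to_ycbcr_422(rgb):
        # rgb: float array of shape (h, w, 3) with values in [0, 1].
        r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
        # BT.601 luma weights; Cb/Cr are scaled blue/red difference signals.
        y = 0.299 * r + 0.587 * g + 0.114 * b
        cb = (b - y) * 0.564      # 0.5 / (1 - 0.114)
        cr = (r - y) * 0.713      # 0.5 / (1 - 0.299)
        # 4:2:2 -- keep every luma sample, but average each horizontal pair
        # of chroma samples, halving the chroma data (and its resolution).
        cb_422 = (cb[:, 0::2] + cb[:, 1::2]) / 2
        cr_422 = (cr[:, 0::2] + cr[:, 1::2]) / 2
        return y, cb_422, cr_422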

There's an additional step that encodes the image in a nonlinear fashion, which, again, is a perceptual trick. This is where Y' (Y prime), properly called "luma", is distinguished from linear-light "luminance". It turns out that your HVS is far more sensitive to minute detail in the low-lights (the darker portions of the image, say from 50% down to black) than in the highlights. You can have massive errors in the highlights and your HVS just won't see them, particularly if things are blended through wide FIR filters during display. [3]
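
A tiny numerical illustration of why the nonlinear encoding helps (the plain 1/2.2 power here is just a stand-in for a real transfer function): quantize a linear-light value to 8 bits directly and after gamma encoding, and compare the effective step size in a shadow tone versus a highlight.

    import numpy as np

    def step_size(encode, decode, level):
        # Size, in linear light, of one 8-bit code step around 'level'.
        code = np.round(encode(level) * 255)
        return decode((code + 1) / 255) - decode(code / 255)

    linear = (lambda x: x, lambda x: x)
    gamma = (lambda x: x ** (1 / 2.2), lambda x: x ** 2.2)

    for name, (enc, dec) in [("linear", linear), ("gamma", gamma)]:
        print(name, step_size(enc, dec, 0.02), step_size(enc, dec, 0.8))
        # Gamma encoding gives roughly 4x finer steps in the shadows and
        # coarser ones in the highlights, matching what the HVS cares about.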

Throughout this chain of optical and mathematical wrangling you are highly dependent on the accuracy of each step in the process. How much distortion is introduced depends on a range of factors, not the least of which is the way math is done in software or chips that touch every single sample's data. With so much math in the processing chain you have to be extremely careful about not introducing errors by truncation or rounding.

We then introduce compression algorithms. In the case of motion video they will typically compress a reference frame as a still and then encode the difference with respect to that frame for subsequent frames. They divide an image into blocks of pixels and then spatially process these blocks to develop a dictionary of blocks to store, transmit, etc.

The key technology in compression is the Discrete Cosine Transform (DCT) [4]. This bit of math transforms the image from the spatial domain to the frequency domain. Once again, we are trying to trick the eye: reduce information the HVS might not perceive. We are not as sensitive to fine detail (high spatial frequencies), which means it's safe to remove some of it. That's what DCT-based coding is about.
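
In sketch form, that amounts to projecting each 8x8 block onto cosine basis functions and then quantizing the higher-frequency coefficients more coarsely. The quantization rule below is a made-up illustration, not any codec's actual table.

    import numpy as np

    N = 8
    # Orthonormal DCT-II basis: C @ block @ C.T turns an 8x8 block of pixels
    # into 8x8 spatial-frequency coefficients; C.T @ coeffs @ C inverts it.
    n = np.arange(N)
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    C[0, :] /= np.sqrt(2.0)

    def dct_quantize_roundtrip(block, coarseness=4.0):
        coeffs = C @ (block - 128.0) @ C.T            # forward 2-D DCT
        # Illustrative quantization: the step grows with frequency, so fine
        # detail (high-frequency coefficients) is rounded away most heavily.
        step = 1.0 + coarseness * (n[:, None] + n[None, :])
        q = np.round(coeffs / step)
        return C.T @ (q * step) @ C + 128.0           # decoder's reconstruction

The difference between block and dct_quantize_roundtrip(block) is exactly the detail the codec has decided your eyes will not miss.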

So, we started with a 3-sensor full-sampling camera, reduced it to a single sensor, and threw away 75% of the red samples, 50% of the green samples and 75% of the blue samples. We then reconstruct the full RGB data mathematically, perceptually encode it to YCbCr, apply gamma encoding if necessary, apply the DCT to reduce high-frequency information based on agreed-upon perceptual thresholds, and then store and transmit the final result. For display on an RGB display we reverse the process. Errors are introduced every step of the way, the hope and objective being to trick the HVS into seeing an acceptable image.

All of this is great for watching a movie or a TikTok video. However, when you work in machine vision or any domain that requires high quality image data, the issues with the processing chain presented above can introduce problems with consequences ranging from the introduction of errors (Was that a truck in front of our self driving car or something else?) to making it impossible to make valid use of the images (Is that a tumor or healthy tissue?).

While H.266 sounds fantastic for TikTok or Netflix, I fear that the constant effort to find creative ways to trick the HVS might introduce issues in machine vision, machine learning and AI that most in the field will not realize. Unless someone has a reasonable depth of expertise in imaging they might very well assume the technology they are using is perfectly adequate for the task. Imagine developing a training data set consisting of millions of images without understanding the images have "processing damage" because of the way they were acquired and processed before they even saw their first learning algorithm.

Having worked in this field for quite some time --not many people take a 20x magnifying lens to pixels on a display to see what the processing is doing to the image-- I am concerned about the divergence between HVS trickery, which, again, is fine for TikTok and Netflix and MV/ML/AI. A while ago there was a discussion on HN about ML misclassification of people of color. While I haven't looked into this in detail, I am convinced, based on experience, that the numerical HVS trickery I describe above has something to do with this problem. If you train models with distorted data you have to expect errors in classification. As they say, garbage-in, garbage-out.

Nothing wrong with H.266, it sounds fantastic. However, I think MV/ML/AI practitioners need to be deeply aware of what data they are working with and how it got to their neural network. It is for this reason that we've avoided using off-the-shelf image processing chips to the extent possible. When you use an FPGA to process images with your own processing chain you are in control of what happens to every single pixel's data and, more importantly, you can qualify and quantify any errors that might be introduced in the chain.

[0] https://en.wikipedia.org/wiki/Bayer_filter

[1] https://en.wikipedia.org/wiki/YCbCr

[2] https://en.wikipedia.org/wiki/Chroma_subsampling

[3] https://en.wikipedia.org/wiki/Gamma_correction

[4] https://www.youtube.com/watch?v=P7abyWT4dss


Don't you think that the iterated convolution process in neural networks is, to a degree, able to overlook this kind of 'visual trickery'? I can imagine that the network is not able to perform well if you change the color profile of the input relative to the one you trained on, but small texture attenuations, diminished chroma components, etc. may not be as important when the image is downsampled and split a huge number of times (wondering)


Here's one way to look at it: Contrast and well defined edges can be important in feature extraction. Our vision system, on the other hand, can do just fine with less information in the high frequencies (where edges live).
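
As a toy illustration (not anything from a real machine-vision pipeline), a Sobel gradient, one of the most common first steps in feature extraction, responds exactly to the high-frequency content that perceptual codecs are allowed to throw away:

    import numpy as np
    from scipy.signal import convolve2d

    def edge_energy(img):
        # Mean Sobel gradient magnitude: a crude proxy for the edge features
        # a machine-vision pipeline might extract. Smoothing away high
        # frequencies (as perceptual compression tends to) lowers this value
        # even when the image still "looks the same" to a human.
        kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
        gx = convolve2d(img, kx, mode='same')
        gy = convolve2d(img, kx.T, mode='same')
        return np.hypot(gx, gy).mean()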


I see your point, and I'm not particularly defending neural networks, but IMO nothing prevents a network from generating a kernel able to detect 'fuzzy edges' and refining it to an edge after some convolutions. So if the input images are always consistent among themselves and with the input images used for inference, I think the problem may be diminished (?), even if, as you say, we introduce some misclassification error. Obviously, guaranteeing that all the input images are generated in the same way is a very strong condition that is difficult to achieve.


From my perspective, the only way to get there is if AI practitioners make a paradigm shift towards encoding understanding rather than making classifier systems trained with massive data sets. The classification approach has a very real asymptotic limit on what can be achieved. You can train NN's using large data sets on some domains but not all domains. Just think about what a dog can do, even just a puppy. We are nowhere near to that. Not even close. This is because our AI classifies without understanding.

I have books on AI that are thirty years old. I think I can say they cover somewhere between 80% and 90% (if not more) of what AI is today. The difference is computing that is thousands, millions, of times faster, massive amounts of storage, etc. In other words, one could very well argue we haven't done much in 30 years other than build faster computers.


I don't think that neural networks are the right framework to achieve general purpose artificial intelligence (AGI). And indeed, the AI field may need a paradigm shift to achieve higher classification goals. I believe that probabilistic neural networks may be an interesting extension toward general purpose AI, even though this kind of networks need even more data.

If we take the example of a puppy, it seems to generalize pretty well using something like one-shot learning, but is it? I cannot confirm for sure how much data a puppy has already digested before being able to do what we could call "one shot learning". So maybe, the exposure to data is already there, waiting for a specialization toward a particular task.

Giving a network the ability to be probabilistic enables it to do inference under uncertainty, which is clearly a neat feature when you are gravitating toward AGI for scene understanding.

In the case of video compression, scene understanding may introduce more artifacts IMO: even if the scene is captured with high-end cameras, at the pixel level the edges will never be perfectly neat. I think this will decrease the ability of any network to "understand" which object is at the edges; this results in low classification rates on them, resulting in bad compression/decompression quality (?) for features that are important to the human eye.

All in all, I'm not sure that NNs are the right tool for this kind of problem. But we are diverging from the main subject, VVC. Thanks for the very interesting comments :)


Really informative comment.


50% is very impressive. It's not just a gold rush of low-hanging fruit anymore; they did real work and created real benefits. I'm willing to pay a little tax on my devices or software for this.


My 2012 Mac Mini has quickly become much less useful since YouTube switched from H.264 (AVC) to VP9 for videos larger than 1080p a couple of years ago (Apple devices have hardware decoders). I've tested 4K h.264 videos and they play wonderfully thanks to the hardware.

My internet connection speeds and hard drive space have increased much faster than my CPU speeds (internet being basically a free upgrade).

So I don't appreciate new codecs coming out and obsoleting my hardware to save companies a few cents on bandwidth. H.264 got a good run in, but there isn't a "universal" replacement for it where I can buy hardware with decoding support that will work for at least 5-10 years.


Honestly, the expectation that a 2012 computer will play 4K video seems a little unreasonable, no? 4K video virtually didn't even exist back then. I'm actually amazed it even handles it in h.264.

This isn't about saving companies a few cents on bandwidth. It's about halving internet traffic, about doubling the number of videos you can store on your phone. That's pretty huge. You can still get h.264 video in 1080p on YouTube so your computer is still meeting the expectations it was manufactured for.


It's not so much about it being able to handle 4K; YouTube already uses too low a bitrate for 1080p (resulting in MPEG artifacts, color banding, etc.), so I like to watch at 2.7K or 4K downsampled, since at least I get a higher bitrate.

The bigger problem that I didn't mention is videoconferencing: FaceTime is hardware-accelerated and has no issues with 720p, but anything WebRTC seems to prefer the VP8 or VP9 codecs, which fail on my Mini and strain my 2015 MBP. Feels like a waste of perfectly good hardware to me.


Maybe you can force the YouTube web app to use the H.264 version?

https://github.com/erkserkserks/h264ify

Or is there no longer an H.264 4k version?


I have that installed and it works really well! But H.264 only goes up to 1080p on YouTube as of a couple of years ago. As mentioned in other comments, I prefer 2.7K or 4K and downsample to get higher bitrate, but this isn't possible without downloading and converting. (I only do that for the rare video where I really want the best quality; most of the time I tolerate the YouTube 1080p with h264ify).


Your 2012 Mac Mini's video output doesn't go higher than 2560x1600 anyway, so just staying at 1080p doesn't seem like a huge sacrifice.


True. I just addressed this in a reply above this one as well: Youtube already has too low of a bitrate for 1080p (resulting in MPEG artifacts, color banding, etc). So I like to watch in 2.7K or 4K downsampled, since at least I get a higher bitrate.

Can't do that anymore without either buying a new computer or using youtube-dl and recompressing the bigger version (which takes hours for minutes of video on my poor machine).


I first got into computers in the mid 90s. Back then, clock speeds were doubling about every 2 years. Combined with architecture improvements with each CPU generation, it really meant that computers were almost completely obsolete in less than 5 years, as new hardware would literally be over 4 times faster in all applications. So with this in mind, I find it puzzling that you'd think 8 year old hardware should still run today's software and algorithms.

But FWIW, H.266 isn't going to be in any sort of wide use for a few years. Buy something that supports H.265 and you'll probably be good for at least 5 years.


>So I don't appreciate new codecs coming out and obsoleting my hardware

You could say something similar about any other technological advance.


100%, I was worried I'd seem like an old 35-year-old man when I posted my comment. And it's true, I upgrade my hardware much less frequently than I did when I was younger.

I don't mind replacing a machine after 8 years of service, but h.265 still isn't supported by Google/YouTube, and Apple refuses to add hardware decoding for VP8/VP9, so there's no universal codec that will work as efficiently as h.264 did on my Mini and multitude of MacBooks and iPhones all this time.


Yes, there is one, it's called AVC.


Download the format you need using youtube-dl


I'd rather have slightly larger files (i.e. h264) that don't require hardware acceleration available only on modern CPUs to decode without dying. Streaming is creating incentives for bad video codecs that only do one thing well: stream. Other aspects are neglected.

And it's not like any actual 4K content (besides porn, real, nature, or otherwise) actually exists. Broadcast and movie media is done in 2K then extrapolated and scaled to "4K" for streaming services.


Huh? TV and movies are widely shot with 4K cameras these days.

What is 2K? I've never even heard of a "2K" camera. Where did you get the idea things are being filmed in "2K" and being scaled to 4K?

Genuinely curious where you're getting this information from. Or are you confused because 1080p refers to the vertical resolution while 4K refers to the horizontal resolution?


https://www.engadget.com/2019-06-19-upscaled-uhd-4k-digital-... is one easily found example but it wasn't where I had read it. I'm pretty sure I've seen it on HN itself.

edit: here's another https://old.reddit.com/r/cordcutters/comments/9x3v4e/just_le...


OK, so by 2K you mean 1080p. That's a very unusual nomenclature but I see what you mean, thanks.

The top link in the reddit thread disproves what you're saying though:

https://4kmedia.org/real-or-fake-4k/

Somewhere between a third and a half of films are listed as "real 4K".

So there is actually tons of real 4K content. (And the list is just films -- there are plenty of streaming TV shows in real 4K too, like Mrs Maisel.)

There might be another reason for the misperception -- it's true that film editing is generally done in something lower-quality like compressed 1080p, but that's just for speed/space while you work. All the clips "point" to the 4K originals, so when the final master is produced, it's still produced out of that "real" 4K.


BTW: I really dislike calling a horizontal resolution of 2660px "2K". It's even closer to 3K than 2K; it should be called 2.5K.


Will devices need new hardware? Also, I thought companies were all on board with royalty-free VP9.


So how does it achieve this compression from a laypersons perspective?


How many weeks does it take to encode a 1-minute video on an average (non-gaming, I mean without a huge fancy GPU card or an i9/Threadripper CPU) PC?


Anyone know how this compares to AV1?


It will be first adopted by pirates for sure


H264 seems to still be the preferred codec in this space, even though H265 gives smaller file sizes.

That's largely due to the CPU overhead of H265, though I am not sure why more people do not use GPU encoding over CPU encoding; I have never been able to notice the difference visually.


From what I can tell from a cursory look at some popular tv show torrents, everything 1080p and below is still h.264, with everything above that running on h.265


Nvidia's NVENC doesn't support CRF, one of the more popular methods of rate control during encoding.


NVENC is too low quality for the scene.


Not for 4K rips. I see tons of x265 10 bit encodes, just look for UHD or 2160p copies.


With its patenting, Fraunhofer has probably done more harm than good to humanity. At least its software division has.


Another patent encumbered monstrosity? No, thanks. Enough of this junk. Some just never learn.



