
It's interesting that they are able to continue improving video compression. You'd think that it would have all been figured out by now.

Is this continued improvement related to the improvement of technology? Or just coincidental?

Like, why couldn't H.266 have been invented 30 years ago? Is it because the computers back in the day wouldn't have been fast enough to realistically use it?

Do we have algorithms today that can compress way better but would be too slow to encode/decode?



Video compression is a calculus of IO capacity, memory, and algorithmic complexity. Take the MPEG-1 codec, for instance: it was new about 30 years ago. While today most people think of MPEG-1 videos as low quality, the spec provides the ability to handle bit rates up to 100Mb/s and resolutions up to 4095x4095. That was way higher than the hardware of the time supported.

One of MPEG-1's design goals was to get VHS-quality video at a bitrate that could stream over T1/E1 lines or 1x CD-ROMs. The limit on bitrate led to increased algorithmic complexity. It was well into the Pentium/PowerPC era until desktop systems could play back VCD quality MPEG-1 video in software.

Later MPEG codecs increased their algorithmic complexity to squeeze better quality video into low bit rates. A lot of those features existed on paper 20-30 years ago but weren't practical on hardware of the time, even custom ASICs. Even within a spec features are bound to profiles so a file/stream can be handled by less capable decoders/hardware.

There's plenty of video codecs, or settings for them, that can choke modern hardware. It also depends on what you mean by "modern hardware". There's codecs/configurations a Threadripper with 64GB of RAM in a mains-powered, jet-engine-sounding desktop could handle in software that would kill a Snapdragon with 6GB of RAM in a phone. There's also codecs/configurations the Snapdragon in the phone could play using hardware acceleration that would choke a low-powered Celeron or Atom decoding in software.


Are there codecs that require high compute (Threadripper) for encode but can be easily decoded on a Snapdragon?


Yes — many codecs can be optimized for decoding at the expense of encoding. This is appropriate for any sort of broadcast (YouTube, television, etc).

Also, in many applications, it’s suitable to exchange time for memory / compute. You can spend an hour of compute time optimally encoding a 20-minute YouTube video, with no real downside.

Neither of these approaches are suitable for things like video conferencing, where there is a small number of receivers for each encoded stream and latency is critical. At 60fps, you have less than 17ms to encode each frame.

Interestingly, for a while, real-time encoders were going in a massively parallel direction, in which an ASIC chopped up a frame and encoded different regions in parallel. This was a useful optimization for a while, but now, common GPUs can handle encoding an entire 1080p frame (and sometimes even 4K) within that 17ms budget. Encoding the whole frame at once is way simpler from an engineering standpoint, and you can get better compression and / or fewer artifacts since the algorithm can take into account all the frame data rather than just chopped up bits.


Surely videoconferencing doesn’t actually use 60 FPS...


Why not? It's not full-motion video, it's literally talking heads. Talking heads are easy to push to 60fps on a relatively low bitrate.


Some web conferencing would want to do 60fps. There's also realtime streaming like Twitch, PS Now, and Google's Stadia.


Twitch isn't real time.


Yes it is. The delay on a Twitch stream doesn't mean they don't have to deal with encoding and transmitting frames at full speed. If Twitch wasn't real time, you'd only be able to watch live streams slowed down!


That's a different definition than most people mean when they say "real-time".


Not me. All realtime systems have some latency, but what makes them realtime is that they must maintain throughput, processing data as quickly as it comes in. You can subdivide to hard-realtime and soft-realtime depending on how strict your latency requirements are, but it is still realtime.


Really? I think not. Let's use speech synthesis as an example: I would call speech synthesis real time if it takes less than one second to produce one second of synthesized speech. I think you're probably thinking of the word "live". There's always going to be a small delay when re-encoding. Real time doesn't mean 0ms delay; that's impossible. Twitch has a small delay, but it's still re-encoded in real time (encoding 1 second takes ≤ 1 second for Twitch).


The delay (configurable) isn't due to the encoder. And the encoder has to process everything in real time, otherwise you start skipping frames or fall behind.


Pretty much all of them. Encode complexity for most codecs is way higher than decode complexity (on purpose).

This has been an issue with AV1: it has relatively high decode complexity and there's not a lot of hardware acceleration available. The encode complexity is enormous, though, and encoding is very slow even on very powerful hardware, less than 1fps, so ~30 hours to encode a one-hour video. Even Intel's highly optimized AV1 encoder can't break 10fps (three hours to encode an hour of video) while their h.265 encoder can hit 300fps on the same hardware.


A lot of video codecs are NP-hard to encode optimally, so encoders rely on heuristics. So you could certainly say that some approaches take a lot of compute power to encode, but are much more easily decodable.


The codecs aren't NP hard. Rather, the "perfect" encode is. That's where the heuristics are coming into play. The codec just specifies what the stream can look like, the encoders have to pick out how to write that language and the decoders how to read it.

Decoders are relatively simple bookkeepers/transformers. Encoders are complex systems with tons of heuristics.

This is also why hardware decoders tend to be in everything and are relatively cheap with equal quality to software counterparts. On the flip side, hardware encoders are almost always worse than their software counterparts when it comes to the quality of the output (while being significantly faster).


> The codecs aren't NP hard. Rather, the "perfect" encode is.

That's what I meant by my first sentence.

And I'll throw out there that the vast majority of 'hardware codecs' are in fact software codecs running on a pretty general-purpose DSP. You could absolutely reach the same quality as a high-quality software encoder given the right organizational impetus from the manufacturer; they are simply focused on reaching a specific real-time bitrate for a given resolution rather than overall quality. By the time they've hit that, there's a new SoC with its own DSPs and its own Jira cards that needs attention. If these cores were more open, I'm sure you'd also see encoder software targeting them that's less focused on real time.


I wonder why all of the MPEG1 encoders of the day enforced a maximum of 320x240?


While the spec allowed for outrageous settings, playback wouldn't have been possible. Most hardware decoders were meant for (or derived from) VCD playback. The VCD spec covered CIF video, which meant QVGA would fall within the supported macroblock rate for hardware decoders.

In MPEG-1's heyday there wouldn't have been much point in encoding presets producing content that common hardware decoders couldn't handle.

There were several other video codecs in the same era that didn't have hardware decode requirements. Cinepak was widely used and could be readily played on a 68030, 486, and even CD-ROM game consoles. As I recall Cinepak encoders had more knobs and dials since the output didn't need to hit a hardware decoder limitation.


PAL/NTSC resolutions.


Here is a hint:

> Because H.266/VVC was developed with ultra-high-resolution video content in mind, the new standard is particularly beneficial when streaming 4K or 8K videos on a flat screen TV.

Compressing video is very different from gzipping a file. It's more about human perception than algorithms, really. The question is "what data can we delete without people noticing?", and it makes sense that answer is different for an 8k video than a 480p video.


So compressing a 1080p video with H266 will not result in similar file size/quality improvements as a 4k video? How much are we looking at for 1080p, 10%?


Yup. From the example I remember (I read it through a link on HN but cannot find it in a quick search; I wish I could link it):

If you film a MacBook Pro top to bottom (a still shot, no movement) in H.264/MP4 at 1024p resolution, and you also take a single photo with the same camera, the results will be shocking.

The 5-10 second video will take less storage than the single image. But when you inspect the video carefully you will see the tiny details are missing: the edges of the metal body are not as sharp, the gloss of the metal is a bit different, the tiny speaker holes above the keyboard are clear in the photo and can be individually examined, while in the video they are fuzzy and pixelated, and so on.

So, the end result: a 5-second video with tens of frames per second is smaller in size than a single image taken from the same camera.


You're thinking of "h264 is magic"

https://sidbala.com/h-264-is-magic/

GREAT article


Yes, that is the article I was referring to.


Let's also be clear, the still image will be the full resolution of the sensor. The video taken on the same camera is usually a cropped section of the sensor. You're also comparing a spatial compression (still image) vs a temporal compression (video), and at what compression level is each one taken?


Additionally the images are not very compressed (edit: you mentioned that in your comment, sorry). While the RAW files can be a couple dozen megabytes and the losslessly compressed PNGs are still 5-15 MB, good cameras normally set the JPEG quality factor to a high amount, so even with a JPEG you're getting pretty close to a lossless image. Whereas in video you can often plainly notice the compression artifacts and softness. A fairer comparison would be the video file to a JPEG with equivalent visual quality.


See more details: https://sidbala.com/h-264-is-magic/

I know that's not a fair comparison, but imagine if clever compression had not been invented: you would download terabytes of data to view a small movie.


What question are you trying to answer? Nobody asked what compression is or why we use it. I have been encoding videos since VideoCDs were a thing, so I have a pretty good understanding of how compression works. The fact that I differentiated between spatial and temporal compression should have been a clue. All I was pointing out was that compressing a postage-stamp-sized video and comparing its filesize to a large megapixel image isn't a fair comparison. (Yes, I'm jaded by calling 1080p frame size a postage stamp. I work in 4K and 8K resolutions all day.)


>How much are we looking at for 1080p, 10%?

We don't know yet. There are no public technical details (that I know of) for H.266 yet, but if I recall, H.265 made the same 50%-reduction-in-bandwidth claims, and for years people stuck with H.264 because it gave higher quality, dropping fewer of the subtle parts of the video you really want to see. Only in the last couple of years has H.265 really started to become embraced and used by piracy groups. Frankly, I don't know what changed. I wouldn't be surprised if there was some sort of H.265 feature addition that improved the codec.


H.265 was always better from a technical perspective but that's not everything which factors into a video codec decision. H.264 was supported everywhere, including hardware support on most platforms. You could generate one file and have it work with great experience everywhere, whereas switching to anything else likely required adding additional tools to your workflow and trying to balance bandwidth savings against both client compatibility and performance — if you guess wrong on the latter case, users notice their fans coming on / battery draining in the best case and frames dropping in the worst cases.

Encoder maturity is also a big factor: when H.265 first came out, people were comparing the very mature tools like x264 to the first software encoders which might have had bugs which could affect quality and were definitely less polished. It especially takes time for people to polish those tools and develop good settings for various content types.


> H.264 was supported everywhere

this. our TV (a few years old, "smart") can play videos from a network drive, but doesn't support H.265. reencoding a season's worth of episodes to H.264 takes a while...


H.264 will become the "MP3" of video, I think. Universally supported, and the patents will run out much sooner than the newer formats'.


I think you’re right - for a lot of people it was the first to hit the “good enough” threshold: going from MPEG 1/2 to the various Windows Media / Real / QuickTime codecs you saw very noticeable improvements in playback quality with each new release, especially in things like high motion scenes or with sharp borders.

That didn’t stop, of course, but I generally don’t notice the improvements if I’m not looking for them. Someone with a 4K or better home theater will 100% benefit from newer codecs’ many improvements on all those extra pixels but if you’re the other 95% of people watching on a phone or tablet, lower-end TV with the underpowered SoC the manufacturer could get for $15, etc. you probably won’t notice much difference and convenience will win out for years longer.


Reencoding will stack H.264 artifacts on top of H.265 artifacts (and the psychovisual optimizations for one on top of the psychovisual optimizations for the other). Unless you reencode from the source, don't do that.


i appreciate your concern :) but honestly, is that a practical problem, or a theoretical one? the end result was fine to watch in our case. (and upload itself wasn't of great quality anyway)


It can be a practical problem depending on the source: if you're moving from a relatively higher-resolution / less-compressed video it won't be noticeable but if you're starting from video which has already been compressed fairly aggressively it can be fairly noticeable.

One area where this can be important to remember is when comparing codecs: a fair number of people will make the mistake where they'll take a relatively heavily compressed video, recompress it with something else, and get a size reduction which is a lot more dramatic than what you'd get if you compared both codecs starting from a source video which has most of the original information.


Another thing to consider, exactly the same thing happened when h.264 was first released.

Even though x264 quickly started seeing better results compared to DivX and XVid, you didn't see pirate encodes switch to x264 for years.


Yes and x264 is such an improvement that it was worth it. HW support became nearly universal. DivX and Xvid didn't have that.

So it was kind of a magical codec upgrade and now it seems more incremental to me.


The scene switched to x264 almost immediately. It did not take years.


It was definitely hardware support that changed the piracy groups' policies.


Having GPU or hardware support to speed up encoding can make a big difference in adoption.


H.265 is patent-laden. H.264 is much better for avoiding getting sued.


That's patently wrong[1]

Money quote:

"H.264 is a newer video codec. The standard first came out in 2003, but continues to evolve. An automatically generated patent expiration list is available at H.264 Patent List based on the MPEG-LA patent list. The last expiration is US 7826532 on 29 nov 2027 ( note that 7835443 is divisional, but the automated program missed that). US 7826532 was first filed in 05 sep 2003 and has an impressive 1546 day extension. It will be a while before H.264 is patent free."

(emphasis mine)

[1] https://www.osnews.com/story/24954/us-patent-expiration-for-...


Not an expert, but from what I understand it's more like they "extend" the codec with techniques that are more effective on higher-resolution content, or add new "profiles" (parameters) that are more effective for higher-resolution content (a bit like how you can use different parameters when you zip a file).

These new techniques can also be used for 1080p video (for example), but with lower gains. Also, the "old" algorithms/systems are generally still used, but they may be improved/extended.


>Like, why couldn't H.266 have been invented 30 years ago?

It is all a matter of trade offs and engineering.

For MPEG / H.26x codecs, the committees start the project by defining target encoding and decoding complexities. And if you only read Reddit or HN, most comments' world view is that video codecs are only for Internet video, completely disregarding other video delivery platforms, which all have their own trade-offs and limitations. There is also a cost in decoding silicon die size and power usage. If more video is being consumed on mobile and battery is a limitation, can you expect hardware decoding energy usage to stay within the previous codec's? Does it scale by adding more transistors, or is there an Amdahl's law lurking somewhere? It is easy to just say "add more transistors", but ultimately there is a cost to hardware vendors.

The vast majority of the Internet seems to think the people working on MPEG video codecs are patent trolls and idiots, and pays little to no respect to the engineering. When as a matter of fact, a video codec is thousands of small tools within the spec and a pretty much insane amount of trial and error. It may not be at the 3GPP / 5G level of complexity, but it is still a lot of work. Getting something to compress better while doing it efficiently is hard. And Moore's Law is slowing down; no one can keep just throwing transistors at the problem.


I don't know much about H.266, but some of the advances in H.265 depended on players having enough RAM to hold a bunch of previous decoded frames, so they could be referred to by later compressed data. Newer codecs tend to have a lot more options for the encoder to tune, so they need a combination of faster CPUs and smarter heuristics to explore the space of possible encodings quickly.


I wonder if instead of heuristics, machine learning could be used to figure out the best parameters.


In a somewhat-related topic, you might be interested in DLSS [0], where machine learning is being used in graphics rendering in games to draw the games at a lower resolution, then upscale the image to the monitor's resolution using a neural network to fill in the missing data. I imagine a similar thing could be done with video rendering, though you'd need some crazy computing power to train the neural network for each video, just like DLSS requires training for each game.

[0] https://en.wikipedia.org/wiki/Deep_learning_super_sampling


That seems likely at least.

You could actually use ML for all of the video decoding, but that research is still in its early stages. It has been done rather well with still images [1], so I'm sure it'll eventually be done with video too.

Those ML techniques are still a little slow and require large networks (the one in [1] decodes to PNG at 0.7 megapixels/s and its network is 726MB) so more optimizations will be needed before they can see any real-world use.

[1] https://hific.github.io/ HN thread: https://news.ycombinator.com/item?id=23652753


That's already done today. Most modern codecs support variable bitrate encoding so more of the data budget can be given to high complexity scenes. Source video can also be processed multiple times with varied parameters and that output then compared structurally to the source to find the parameters that best encode the scene. This is beyond the more typical multi pass encoding where a first pass over the source just provides some encoding hints for the second pass. It takes several times longer to encode (though is embarrassingly parallel) but the output ends up higher quality for a given data rate than a more naïve approach.


Any apps you can recommend for giving this a spin?


x264 with 2-pass encoding? "Machine learning" doesn't only mean deep neural networks.
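
If you want to try that yourself, here's a minimal sketch of driving a two-pass x264 encode through ffmpeg from Python (the filenames and the 2 Mbit/s target are just placeholders; pass 1 gathers statistics, pass 2 spends the bit budget according to them):

    import subprocess

    src, out, bitrate = "input.mp4", "output.mp4", "2M"  # hypothetical files/target

    # Pass 1: analyze the source and write a stats file; discard the video output.
    subprocess.run(["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-b:v", bitrate,
                    "-pass", "1", "-an", "-f", "null", "/dev/null"], check=True)

    # Pass 2: reuse the stats to allocate more bits to the complex scenes.
    subprocess.run(["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-b:v", bitrate,
                    "-pass", "2", "-c:a", "aac", out], check=True)

(On Windows the pass-1 output target would be NUL instead of /dev/null.)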


In the limit case, compression and AI are identical.

Once you get to an AI that has full comprehension of what humans perceive to be reality, you can just give them a rough outline of a story, add some information on casting, writers, and Spielberg's mood during production, and they'll fill in the (rather large) blanks.

That's a bit exaggerated, but I remember reading about one such algorithm a few days ago (by Netflix, maybe?). It was image compression that had internal representations such as "there is an oak tree on the left".

It would then run the "decompression", find the differences to the original, and add further hints where necessary.


Sure. Machine Learning is just "heuristics we don't understand."


Or more typically Monte Carlo heuristics.


Reinforcement learning uses Monte Carlo a lot but traditional machine learning or deep learning don't.


I was hoping H266 was going to be a neural network based approach, but it looks like that might end up being H267.

Right now neural networks allow for higher compression for tailored content, so you need to ship a decoder with the video, or have several categories of decoders. The future is untold and it might end up not being done this way.


I think the number of previous frames for typical settings went from about 4 to about 6 as we went from H.264 to H.265. And the actual max in H.264 was 16. So that doesn't seem like a huge factor.


Computers wouldn't have been fast enough. Moore's law is a hell of a drug.

In the mid '90s, PCs often weren't fast enough to decode DVDs, which were typically 720x480 24FPS MPEG2. DVD drives were often shipped with accelerator cards that decoded MPEG2 in hardware. I had one. My netbook is many orders of magnitude faster than my old Pentium Pro. But it's not fast enough to decode 1080p 30fps H.265 or VP9 in software. It must decode VP9/H.265 on the GPU or not at all. MPEG2 is trivial to decode by comparison. I would expect a typical desktop PC of the mid '90s to take seconds to decode a frame of H.265, if it even had enough RAM to be able to do it at all.

It's an engineering tradeoff between compression efficiency of the codec and the price of the hardware which is required to execute it. If a chip which is capable of decoding the old standard costs $8, and a chip which is capable of decoding the new standard costs $9, sure, the new standard will get lots of adoption. But if a chip which is capable of decoding the new standard costs $90, lots of vendors will balk.


Indeed. The brand-new fancy Blue & White Power Mac G3's from early 1999 were the first Macs that shipped with a DVD drive, and they could play video DVD's but they had an obvious (and strange) additional decoder mezzanine card on the already unusual Rage128 PCI video card.

By the end of that year the G4 Power Macs were just barely fast enough to play DVD's with software decoding and assistance from the PCI or later AGP video card. And after a while (perhaps ~ 2002?), even the Blue G3's could do it in software even if you got a different video card, as long as you also upgraded to a G4 CPU (they were all in ZIF sockets).

It was very taxing on computers at y2k!

Later autumn 2000 G3 iMacs could also play DVD's but I think they needed special help from a video co-processor.


From what I've heard (would love to hear more expertise on this), it's incredibly hard to invent a new video compression algorithm without breaking an existing set of patents, and there's also no easy way to even know whether you're breaking anything as you develop the algo. Thus the situation we're in is not that it's too hard to develop better codecs, but that you've very disincentivized to do so.


Which then begs the question - why are video compression standards developed in the US at all? MPEG is obviously US based but Xiph is also a US nonprofit. The software patents should be hugely crippling the ability for Americans to develop competitive video codecs when every other nation doesn't have such nonsense. Why hasn't Europe invested in and developed better codecs that combine the techniques impossible to mix in the states?

Is it just basically the same mechanism that leads to so much drug development happening in the US despite how backwards its medical system is? Those regressive institutions create profit incentives not available elsewhere (to develop drugs or video codecs for profit), so the US already has capitalists throwing money at what could be profitable, whereas everyone else would look at it as an investment cost for what is basically research infrastructure.


MPEG is not US-based.

https://en.wikipedia.org/wiki/Moving_Picture_Experts_Group

The article we are all commenting on is by a German research organization that has been a major contributor to video coding standards.

Perhaps you're confused by the patent issue? European companies are happy to file for US patents and collect the money.


This sounds like a somewhat obvious way to side-step the patent mechanism, so I would assume patents prevent this kind of a thing, when you develop patent-breaking technology abroad and then "just use" it wherever you want. You're probably not allowed to use the patented technology in any of the products you're building.


It's about the assumptions made during the standardization.

Compared to 30 years ago, we now have better knowledge and statistics about what low level primitives are useful in a codec.

E.g. JPEG operates on fixed 8×8 blocks independently, which makes it less efficient for very large images than a codec with variable block size. But variable block size adds overhead for very small images.

Another reason can be common hardware. As hardware evolves, different hardware-accelerated encoding/decoding techniques become feasible and get folded into new standards.


Something that I learned about 10 years ago when bandwidth was still expensive is that you can make a very large version of an image, set the jpeg compression to a ridiculously high value, then scale the image down in the browser. The artifacts aren't as noticeable when the image is scaled down, and the file size is actually smaller than what it would be if you encoded a smaller image with less compression.
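
A quick way to see this for yourself, sketched with Pillow (the filename and the quality values are just illustrative; actual savings depend heavily on the content):

    import os
    from PIL import Image

    img = Image.open("photo.png").convert("RGB")  # hypothetical source image
    w, h = img.size

    # Half-size image saved at a fairly high JPEG quality...
    img.resize((w // 2, h // 2), Image.LANCZOS).save("small_q80.jpg", quality=80)
    # ...versus the full-size image compressed much harder.
    img.save("large_q30.jpg", quality=30)

    # The large low-quality file is often the smaller one, and its artifacts
    # mostly vanish once the browser scales it down to the same display size.
    print(os.path.getsize("small_q80.jpg"), os.path.getsize("large_q30.jpg"))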


This “huge JPEG at low quality” technique has been widely known for years. But it is typically avoided by larger sites and CDNs, as it requires a lot more memory and processing on the client.

Depending on the client or the number of images on the site the huge JPEG could be a crippling performance issue, or even a “site doesn’t work at all” issue.


Interesting. I've never heard of this, but it makes some sense: the point of lossy image compression is to provide a better quality/size ratio than downscaling.


> But variable block size adds overhead for very small images.

What kind of overhead?

The extra code shouldn't make a big difference if nothing is running it.

And space-wise, it should only cost a few bits to say that the entire image is using the smallest block size.

Is the worry just about complicating hardware decoders?


I still remember when my PC would take 24 hours to compress a DVD to a high-quality H.264 MKV. Sure, you could squeeze it down with fast presets in HandBrake, but the point was transparency. Now I'm sure that for most normal PCs the time to compress at the same quality with H.265 is the same 24 hours, and in 4K even longer; I'm sure H.266 would easily take more than twice as long.

Early PCs had separate and very expensive MPEG decode boards just to decode DVDs; Creative sold a set, because the CPU simply couldn't handle MPEG-2. I know it's hard to believe, but there was a time when playing back an MP3 was a big ask. All these algorithms could have been made long ago, but they would have been impractical fantasy. Only now are we seeing real (if partly cheated, reduced-resolution) ray tracing in modern high-end gaming hardware, which is a good comparison: ray tracing has been with us for a long time, and only decades of hardware advancement have made it viable.

It amused me that they claimed a 4K UHD H.265 movie is now 10GB; that's a garbage bitrate. They always ask too much of these codecs.


> I know it's hard to believe, but there was a time when playing back an MP3 was a big ask

can confirm. audio playback would stutter on my 486dx if one dared to multitask.


Good compression is quite complex and can go wrong in an unimaginable variety of ways. Remember when Xerox copiers would copy numbers incorrectly due to compression? The numbers would look clear, they just wouldn't always be the same numbers that you started with.

https://www.theregister.com/2013/08/06/xerox_copier_flaw_mea...


The Xerox problem stemmed from simply replacing "recognized" numbers with entries from a learned dictionary. A good implementation would use the learned symbol atlas as a supplement, encoding the difference between the guess and the source image. That way, even predicting a 0 instead of an 8 wouldn't be catastrophic, with the encoder filling in the missing detail.


> Like, why couldn't H.266 have been invented 30 years ago? Is it because the computers back in the day wouldn't have been fast enough to realistically use it?

Here's something to consider:

In 1995, a typical video stream was 640 x 480 x 24fps. That's 7,372,800 pixels per second.

In 2020, we have some content that's 7680 x 4320 x 120fps. That's 3,981,312,000 pixels per second, or a 540 fold increase in 25 years.

The massive increase in image size actually makes it easier to use high compression ratios. I found this out the hard way recently, when I was trying to compress and email a PowerPoint presentation that a coworker had presented on video. In a nutshell, the PowerPoint doc, with its sharp edges and its low resolution, was difficult to compress.

Increased framerate plays a factor too: thanks to decades of research on motion interpolation, algorithms have become quite good at guessing what content can be eliminated from a moving stream.


Compression is AI. It’s never going to be “all” figured out.


Another way of saying it is that compression is understanding.


Lossy compression is, I feel compelled to add.


Actually both! Arithmetic coding works over any kind of predictor.


End credits are just text. So it should be possible to put them through OCR and save only the text, positions, and fonts. And the text is also possible to compress with a dictionary.


Credits also contain logos/symbols (near the end), and often have stylistic flairs as well. Video compression is based on making predictions and then adding information (per Shannon's definition) for the deltas from those predictions. The pattern of credits statically sliding at a consistent rate is exactly the sort of prediction codecs are optimized for; for instance, the same algorithms will save space by predicting repeated pixel patterns during a slow camera pan.

Still, I've often thought it would be nice if text were a more first-class citizen within video codecs. I think it's more a toolchain/workflow problem than a shortcoming in video compression technology as such. Whoever is mastering a Blu-Ray or prepping a Hollywood film for Netflix is usually not the same person cutting and assembling the original content. For innumerable reasons (access to raw sources, low return on time spent, chicken-egg playback compatibility), it just doesn't make sense to (for instance) extract the burned-in stylized subtitles and bake them into the codec as text+font data, as opposed to just merging them into the film as pixels and calling it a day.

Fun fact: nearly every Pixar Blu-Ray is split into multiple forking playback paths for different languages, such that if you watch it in French, any scenes with diegetic text (newspapers, signs on buildings) are re-rendered in French. Obviously that's hugely inefficient; yet at 50GB, there's storage to spare, so why not? The end result is a nice touch and a seamless experience.


Text with video is difficult to do correctly for a few different reasons. Just rendering text well is a complicated task that's often done poorly. Allowing arbitrary text styling leads to more complexity. However for the sake of accessibility (and/or regulations) you need some level of styling ability.

This is all besides complexity like video/audio content synced text or handling multiple simultaneous speakers. Even that is besides workflow/tooling issues that you mentioned.

The MPEG-4 spec kind of punted on text and supports fairly basic timed text subtitles. Text essentially has timestamp where it appears and a duration. There's minimal ability to style the text and there's limits on the availability of fonts though it does allow for Unicode so most languages are covered. It's possible to do tricks where you style words at time stamps to give a karaoke effect or identify speakers but that's all on the creation side and is very tricky.
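
For what it's worth, here's roughly how you'd produce such a track with ffmpeg, muxing an SRT file into an MP4 as MPEG-4 timed text (a sketch; the filenames are placeholders, and any styling beyond the basics is lost in the conversion):

    import subprocess

    # Copy the audio/video untouched and convert the subtitles to MPEG-4
    # timed text (mov_text), tagging the language of the subtitle track.
    subprocess.run(["ffmpeg", "-i", "movie.mp4", "-i", "subs.srt",
                    "-c:v", "copy", "-c:a", "copy", "-c:s", "mov_text",
                    "-metadata:s:s:0", "language=eng", "movie_subbed.mp4"],
                   check=True)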

The Matroska spec has a lot more robust support for text but it's more of just preserving the original subtitle/text encoding in the file and letting the player software figure out what to do with that particular format and then displaying it as an overlay on the video.

It's unfortunate text doesn't get more first class love from multimedia specs. There's a lot that could be done, titles and credits as you mention, but also better integration of descriptive or reference text or hyperlink-able anchors.


MPEG-4 (taken as the whole body of standards, not as two particular video codecs) actually has provisions for text content, vector video layers and even rudimentary 3D objects. On the other hand I'm almost sure that there are no practical implementations of any of that.


Oh, and that's only the beginning. The MPEG-4 standard also includes some pretty wacky kitchen-sink features like animated human faces and bodies (defined in MPEG-4 part 2 as "FBA objects"), and an XML format for representing musical notation (MPEG-4 part 23, SMR).


Don't forget Java bytecode tracks!


Scene releases often had optimized compression settings for credits (low keyframes, b&w, aggressive motion compensation, etc.)


The text, positions and fonts could very well take up more space than the compressed video. And then with fonts, you have licensing issues as well.


Recognizing text and using it to increase compression ratios is possible. I believe that's what this 1974 paper is about:

https://www.semanticscholar.org/paper/A-Means-for-Achieving-...


True, but end credits take very little space compared to the rest of the movie.


x264 is kinda absurdly good at compressing screencasts, even a nearly lossless 1440p screencast will only have about 1 Mbit/s on average. The only artifacts I can see are due to 4:2:0 chroma subsampling (i.e. color bleed on single-pixel borders and such), but that has nothing to do with the encoder, and would almost certainly not happen in 4:4:4, which is supported by essentially nothing as far as distribution goes.
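
As a point of reference, a sketch of what such a near-lossless 4:4:4 screencast encode might look like with x264 via ffmpeg (the CRF value and filenames are placeholders; as noted above, 4:4:4 output requires the High 4:4:4 profile and most hardware decoders and distribution pipelines won't accept it):

    import subprocess

    # Near-lossless screencast encode kept in 4:4:4 to avoid chroma-subsampling
    # bleed on single-pixel UI borders. Expect software-only decode in most players.
    subprocess.run(["ffmpeg", "-i", "screencast.mkv",
                    "-c:v", "libx264", "-preset", "veryslow", "-crf", "18",
                    "-pix_fmt", "yuv444p", "screencast_444.mkv"], check=True)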


Why not use deep learning to recognize actor face patterns in scenes and build entire movies from AI models?


I'm not super strong on theory, but if I'm not mistaken, doesn't Kolmogorov complexity (https://en.wikipedia.org/wiki/Kolmogorov_complexity) say we can't even know if it is all figured out?

The way I understand it is that one way to compress a document would be to store a computer program and, at the decompression stage, interpret the program so that running it outputs the original data.

So suppose you have a program of some size that produces the correct output, and you want to know if a smaller-sized program can also. You examine one of the possible smaller-sized programs, and you observe that it is running a long time. Is it going to halt, or is it going to produce the desired output? To answer that (generally), you have to solve the halting problem.

(This applies to lossless compression, but maybe the idea could be extended to lossy as well.)


I really ain't a theorist either, but:

If you are looking at Kolmogorov complexity you are right, we can't ever know. But Kolmogorov complexity is about single points in the space of possible outputs. It basically says "there might be possible outputs that do look random, but are actually produced by a very short encoding". One example would be the digits of pi.

But if you look at the overall statistics of possible output streams, and at their averages, there is a lower bound for compression on average. As soon as the bitlength in the compressed stream matches the entropy in the uncompressed stream in bits, you reached maximum compression. There will be some streams that don't conform to those statistics, but their averages will.

However, we are somewhat far away from matched entropy equilibrium for video compression. And even then, improvements can be made, not in compression ratio but in time, ops and energy needed for de/encoding.
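
To make the "matching the entropy" point concrete, a tiny sketch with a made-up symbol distribution:

    import math

    # Hypothetical symbol frequencies for some source.
    probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

    # Shannon entropy: the floor, in bits per symbol, that no lossless code
    # can beat on average for this source.
    H = -sum(p * math.log2(p) for p in probs.values())
    print(H)  # 1.75; an ideal code for this source averages exactly 1.75 bits/symbol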


> It's interesting that they are able to continue improving video compression. You'd think that it would have all been figured out by now.

Would you? AV1 was only officially released 2 years ago, h.265 7, h.264 14, …


Ten-year software video compression engineer here:

TL;DR: it's partly because we're using higher video resolutions. A non-negligible part of the improvement stems from adapting existing algorithms to the now-doubled-resolution.

Almost all video compression standards split the input frame into fixed-size square blocks, aka "macroblocks". To put it simply, the macroblock is the coarsest granularity level at which compression happens.

- H.264 and MPEG-2 Video use 16x16 macroblocks (ignoring MBAFF).

- H.265 uses configurable quad-tree-like macroblocks, with a frame-level configurable size up to 64x64.

- AV1 makes this block-size configurable up to 128x128.

Which means:

Compressing to H.264 an SD video (720x576, used by DVDs) results in 1620 macroblocks/frame.

Compressing to H.265 an HD video (1920x1080) results in at least 506 macroblocks/frame.

Compressing to AV1 a 4K video (3840x2160) results in at least 506 macroblocks/frame.

But compressing to H.264 a 4K video (3840x2160) will result in 32400 macroblocks/frame.

The problem is, there are constant bitcosts per-macroblock ((mostly) regardless of the input picture). So using H.264 to compress 4K video will be inefficient.

When you take an old compression standard to encode recent-resolution content, you're using the compression standard outside of the resolution domain for which it was optimized.
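
As a quick sanity check on those numbers, a rough count of top-level blocks per frame (ceiling division, so edge blocks are counted whole; real encoders then subdivide these blocks further):

    def blocks_per_frame(width, height, block_size):
        # Partial blocks at the right/bottom edges still cost a full block.
        cols = -(-width // block_size)
        rows = -(-height // block_size)
        return cols * rows

    print(blocks_per_frame(720, 576, 16))     # H.264 at SD: 1620
    print(blocks_per_frame(1920, 1080, 64))   # H.265 at HD: 510
    print(blocks_per_frame(3840, 2160, 128))  # AV1 at 4K: 510
    print(blocks_per_frame(3840, 2160, 16))   # H.264 at 4K: 32400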

> Is this continued improvement related to the improvement of technology? Or just coincidental?

Of course, there are also "real" improvements (in the sense of "qualitative improvements that would have benefited the compression of old video resolutions, if only we had invented them sooner").

For example:

- the context-adaptive arithmetic coding from H.264, which is a net improvement over classic variable-length huffman coding used by MPEG-2 (and H.264 baseline profile).

- the entropy coding used by AV1, which is a net improvement over H.264's CABAC.

- integer DCT (introduced by H.264), which allows bit-exact accuracy checking and far easier and smaller hardware implementations (compared to the floating-point DCT used by MPEG-2).

- loop filters: H.264 pioneered the idea of a normative post-processing step, whose output could be used to predict next frames. H.264 had 1 loop filter ("deblocking"). HEVC had 2 loop filters: "deblocking" and "SAO". AV1 has 4 loop filters.

All of these are real improvements, brought to us by time and by extremely clever and dedicated people. However, the compression gains of these improvements are nowhere near the "50% less bitrate" that is used to sell each new advanced-high-efficiency-versatile-nextgen video codec. Without increasing the frame resolution a lot, selling a new video compression standard would be a lot harder.

Besides, now that resolutions seem to have settled around 4K/8K (and "high definition" has become the lowest resolution we might have to deal with :D), things are going to get interesting ... provided that we don't start playing the same game with framerates!


I hope next target is VR optimized codec.


H.266 VVC includes tools specifically for VR use cases like doing a motion vector wrap around at the boundaries of 360 equirectangular video or better support for independently coded tiles (subpictures in VVC lingo) which are used in viewport-dependent streaming of 360 content.


"You'd think that it would have all been figured out by now."

Would you? Video compression is one of the few things that we will work on for the next 1000 years and still be nowhere near finished. The best video compression would be to know the state of the universe at the big bang, have a timestamp of the beginning and end of your clip and spatial coordinates defining your viewport. Then some futuristic quantum computer would just simulate the content of your clip...

So yeah, sure we are done with video compression :). This is of course an extreme example of constant time compression that may or may not be ever feasible (if we live in a computer simulation of an alien race, then it is already happening).

But the gist is the same. Video compression is mostly about inferring the world and computing movement, not about storing the content of the image.

For instance by taking a snapshot of the world, decomposing it into geometric shapes (pretty much the opposite of 3D rendering) and then computing the next frames by morphing these shapes + some diff data that snaps these approximations back in line with the actual data.

We are all but in the very infancy of video compression. What should surprise you is why it takes us so long to get anywhere.


The way I read the release was that it's not a lossless compression, it reads like it's downscaling 4k+ video to a lower format with 'no perceptible loss of quality.' Since this is also seemingly targeted at mobile, I'm guessing the lack of perceptible loss of quality is a direct function of screen size and pixel density on a smaller mobile devices.

For me, this is another pointless advance in video technology. 720p or 1080p is fantastic video resolution, especially on a mobile phone. Less than 1% of the population cares or wants higher resolution.

What new technologies are doing now is re-setting the patent expiration clock. As long as new video tech comes out every 5-10 years, HW manufacturers get to sell new chips, phone manufacturers get to sell new phones, TV manufacturers get to sell new TVs, rinse, repeat.


> 720p or 1080p is fantastic video resolution, especially on a mobile phone.

720p is far from fantastic. It's noticeably blurry, even on mobile.

1080p is minimally acceptable, and is now over 10 years old.

> Less than 1% of the population cares or wants higher resolution.

That's a very bold claim. Have any studies or polls to back that up?


> 720p is far from fantastic. It's noticeably blurry, even on mobile. 1080p is minimally acceptable, and is now over 10 years old.

Not OP. I would say these are far-fetched claims that need defending. Most blurriness of mobile video comes from the low bitrate it's encoded at. Basically nobody is watching Bluray quality 720p or 1080p materials on a phone - and that's the problem, not the resolution.

My guess is that a typical middle or even middle-upper class family is going to have a TV that is less than 70 inches, and is 10+ feet from most viewers. Even with 20-20 vision, the full quality provided by 1080p is not even visible at that distance! (You'd need to go all the way up to 78 inches at 10 feet, or sit 7 feet from your 55 inch set to even get the full benefit from a 1080p set.) See this very helpful chart: http://s3.carltonbale.com/resolution_chart.html

Most of the benefit in 4k video comes from recent advances in HDR presentation and better codecs, not from the resolution. Sure, if you're a real stickler for quality, you might be sitting 6 feet from your 80 inch OLED set, and 4k is definitely for you in that case, but it's really not that important to the average person. In my case, I can barely distinguish between 720p and 1080p on my set even with glasses on.

Now granted, it's great to have laptops and tablets at higher resolutions, because your face is smashed up against them and you're often trying to read fine text. But that's not the video case that's being talked about here.


I don't need to get the full benefit to care about the difference. And in my experience there's a lot of screens closer than 10 feet to couches.

On mobile, 720p starts to get shoddy once your screen hits 5 inches across.

So while 4k is situational, and encoding quality is more important than the extra resolution most of the time, 1080 vs. 720 is pretty clear-cut; 1080 should be considered the minimum for most content.


Note that I was talking about the full benefit of 1080p, not 4k. My points are that 4k is usually pointless, and therefore that 1080p is usually better than just "minimally acceptable", since most of the time we don't even get the full benefit of it.

> On mobile, 720p starts to get shoddy once your screen hits 5 inches across.

Even assuming you're right about this, I guess I really just have a hard time caring. Anything you watch on your phone is at best something you don't give a shit about, artistically speaking, and it's hard to imagine 1080p vs 720p making any kind of difference to the experience. (I suppose I might be biased since my screen is "only" 5.2 inches diagonally - I get the smallest one I can whenever I buy a new phone.)

And for what it's worth, using the same math as for my previous comment, you can't even get the full benefit of 720p on a 5 inch diag screen unless you're holding it less than a foot from your face. Granted, you can get the full benefit of 1080p at a little under 8 inches, but I'm suffering even imagining trying to watch a video this way. Even at this distance, I would dispute using "shoddy" to describe how 720p will look.

The math is actually pretty simple: for a 720p screen, there are sqrt(1280² + 720²) pixels on the 5 inch diagonal, so a distance of 5 inches / sqrt(1280² + 720²) per pixel. 20/20 visual acuity can resolve roughly 1 arc minute, or pi/10800 radians. By the arc length formula, the distance we calculated subtends that angle at (5 inches / sqrt(1280² + 720²)) * (10800/pi), or 11.7 inches.
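
The same calculation, sketched in code so you can plug in your own screen (this assumes the 1-arcminute figure; sharper-eyed viewers should scale the result up accordingly):

    import math

    def full_benefit_distance_inches(px_w, px_h, diagonal_inches, acuity_arcmin=1.0):
        # Physical pitch of one pixel along the diagonal...
        pixel_pitch = diagonal_inches / math.hypot(px_w, px_h)
        # ...and the distance at which that pixel subtends `acuity_arcmin`.
        return pixel_pitch / math.radians(acuity_arcmin / 60.0)

    print(full_benefit_distance_inches(1280, 720, 5.0))   # ~11.7 inches
    print(full_benefit_distance_inches(1920, 1080, 5.0))  # ~7.8 inches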

> And in my experience there's a lot of screens closer than 10 feet to couches.

Note that I addressed this point. If you have a pretty typical 50 inch TV, you've got to have it closer than 7 feet from your couch for 4k to make any difference at all.


> Note that I was talking about the full benefit of 1080p

Yes, so am I.

> therefore that 1080p is usually better than just "minimally acceptable", since most of the time we don't even get the full benefit of it.

If most of your users would be limited by 720p, and 1080p is standard and easy to do, then I'm comfortable calling 1080p the minimum.

> I'm suffering even imagining trying to watch a video this way.

The official "Retina" numbers have a phone 10 inches from your face. And that's about where I hold it when I have my glasses on.

> 20/20 visual acuity can resolve roughly 1 arc minute, or pi/10800 radians.

That's a good baseline number, but a lot of people can beat it by a significant fraction.


Average visual acuity for young people is about 0.7 arcminutes, best being around 0.5.

There seems to be a disconnect about how people are consuming media. For me, holding a 2280x1080 6.3" phone ~8" from my eyeballs is a natural viewing distance and I can see the full resolution without difficulty. And at least from my point of view a 65" TV is also a pretty typical size.


> That's a very bold claim. Have any studies or polls to back that up?

From 1:

> In Japan, only 14 percent of households will own a 4K TV in 2019 because most households already have relatively new TVs, IHS said.

Let's dissect the reasoning. Most households already have a relatively new TV, thus a low adoption rate. Implying that needing a new TV generally, not the desire to upgrade resolution, is the primary motivating factor in purchasing a TV. In fact, most 4K TVs are already as cheap as the rest of the market.

I truly believe that almost everyone does not care about 4K whatsoever. In fact, even if they do 'care' it's not because they know what they're talking about. Most of the enhancements that 4K TVs bring are an artifact of better display technology, rather than increased resolution. See 2.

Streaming 4K+ video is a waste of resources with no tangible benefit to anyone other than marketing purposes. Netflix streams '4K' because everyone has a '4K' TV now and they demand it.

[1]: https://www.twice.com/research/us-4k-tv-penetration-hit-35-2...

[2]: https://www.cnet.com/news/why-ultra-hd-4k-tvs-are-still-stup...


I agree with most of what you're saying, but this is actually wrong.

> Streaming 4K+ video is a waste of resources with no tangible benefit to anyone other than marketing purposes. Netflix streams '4K' because everyone has a '4K' TV now and they demand it.

It's not inherently wrong, just practically so. The difference is that Netflix (and often other streaming services) max out at a much higher bitrate for their 4k streaming, and are using a better codec (H.265) as well. By comparison, Netflix's bitrate for 1080p is severely limited and so if you compare the two, even watching at the 1080p level of detail, streaming in 4k will often be a vastly superior experience.

So it's not inherent (not a result of the resolution), but still, streaming 4k is not pointless at present.


Your first link includes an extra detail which is important:

> In addition, “with the Japanese consumer preference for smaller TV screens, it will be more difficult for 4K TV to expand its household penetration in the country"

With a smaller TV screen, yeah, you don't need 4K.

But in the USA, larger screens are desirable. And that's seen in the expected 34% 4K adoption rate in the USA your article describes.

I still use a 1080p TV, but it's also only 46" and is 10 years old. I'll probably be buying a 4K 70-75" OLED later this year.

My computer monitor is 1440p. I could have bought 4K when I upgraded, but I'm primarily a gamer and I wanted 144 hz, and 4K 144 hz monitors didn't exist yet.



