
Tricks of the trade: why have one computer compress the file when you can split it up into logical segments and send each segment to its own encoder?


Back when I still cared about saving disk space, I made a cluster of Nvidia Jetson Nanos running in a Docker Swarm configuration [1] to compress my Blu-ray rips, but honestly, even with six computers working at once, H264 on a single computer is still often faster.

On the Jetson Nanos I was lucky to get maybe 1fps in ffmpeg using VP9. Multiply that by six boards and that's about 6fps in total; ffmpeg running x264 in software mode was getting around 11fps on a single board, not even counting the onboard encoder chip, meaning I was getting better performance from one board using x264 than from all six using VP9.
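
For reference, these are roughly the kinds of invocations I was comparing (paths and quality settings here are illustrative, not my exact flags):

    # VP9 (libvpx-vp9), software encode: roughly 1fps on a Jetson Nano
    ffmpeg -i rip.mkv -c:v libvpx-vp9 -b:v 0 -crf 31 -c:a libopus out.webm

    # H.264 (libx264), software encode: roughly 11fps on the same board
    ffmpeg -i rip.mkv -c:v libx264 -preset medium -crf 20 -c:a aac out.mp4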

Now obviously this is a single anecdote on specific hardware, so I'm not saying that this applies to every single case, but it's a big reason why I personally have not used VP9 for anything substantial yet.

[1] https://gitlab.com/tombert/distributed-transcode


h.264 is from the 90s, so of course it's fast after ~30 years of use. hell, when I first got into encoding, we had dedicated expansion cards to do MPEG-1/MPEG-2 encoding because it was so difficult at the time. New codecs always take time in the beginning while the encoding software is tweaked/optimized. Eventually, it becomes part of the CPU hardware and then we all make comments like "remember when ____ was so slow?" One day, you'll regale the young whippersnappers on internet forums about how painfully slow AV1 encodes were, when they start complaining that newHotnessEncoder5000 is so slow.


Oh definitely, no argument here, I'm 100% ok with AV1 becoming the standard "video codec to rule them all", but I'm saying that in the short term, it's difficult to recommend AV1 or VP9 over h264 (at least for personal use). H264 encodes 10x faster, still gives reasonably decent compression, is supported out of the box by basically every consumer device [1] and browser, and its remaining patents will expire very soon, meaning it will be truly royalty-free. x264 in particular is extremely nice in my experience, squeezing a lot of quality out of a relatively small amount of space.

That said, AV1 is very obviously the future, and I'm perfectly happy with it taking over the market from h264. Given the bandwidth savings, I think it's only a matter of time before all the major video services make it the default, especially as encoders speed up to a usable level, which I'm sure they will soon enough.

[1] I know the most recent Raspberry Pi doesn't have a decoder chip for h264, but I think it's fast enough to do it in software.


Raspberry Pis have had a hardware decoder for h264 for as long as they've existed (I think?), but it was dropped in the most recent version. I don't understand why.

They've recently contributed non-trivial patches to Firefox to use the embedded Linux API for video hardware acceleration (V4L2, vs. VAAPI on desktop, which we also support), and are shipping the h264ify extension with their Firefox build so their users get that codec more often and the experience is good on older devices.

Maybe the 5 is so much faster that it's not needed as much, but h264 represents so much content that it feels a bit surprising anyway.

But I'm just a software person, hardware is complicated differently.


> h.264 is from the 90s, so of course its fast after ~30 years of use.

If only! Then the patents would have expired. But H.264 is newer than MPEG-4 Part 2.

But you're right: H.264 has had the advantage of time, to gain fast hardware support.


Ok. You are allowed to think that, but stop forcing AV1 down my throat just because you think CPUs will be both cheap and fast in the faraway future. As far as I am aware, I exist in the present moment until that future arrives.


>h.264 is from the 90s

I am pretty sure you are thinking of H.263 if you mean the 90s; H.264 barely got started in the 00s.


This is done particularly when you are implementing adaptive bitrates (the thing Netflix uses, where it automatically sends you a higher- or lower-quality picture depending on your Internet connection).

In the adaptive bitrate world, you split a video up into fragments, say 2-10 seconds long, and encode each one at multiple bitrates, so that every, say, 5 seconds the video player can decide to download a different quality for the next 5 seconds.
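
A minimal sketch of that kind of segmented packaging with ffmpeg's HLS muxer (the bitrate and segment length are made-up illustration values; a real ladder repeats this per rendition):

    # Encode one rendition and cut it into ~5-second HLS segments
    # (segments split on keyframes, so real pipelines also pin the
    # keyframe interval)
    ffmpeg -i input.mkv -c:v libx264 -b:v 3000k -c:a aac \
        -hls_time 5 -hls_playlist_type vod rendition_3000k.m3u8

    # Repeat at, e.g., 1500k and 800k so the player can switch
    # quality at each segment boundary.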

Ok, but why not split the file up for standard encoding? Well, you can't just concatenate two .mp4s together without re-encoding and have it make sense to most media players (as far as I am aware), and moreover, it's inefficient from a RAM perspective. 1 second of RAW uncompressed 4k (24 fps) video is about 600MB (3840 × 2160 pixels × 3 bytes × 24 frames ≈ 600MB). Source content for a single episode/movie at Netflix (I don't work there, just something I read once) can easily reach into the terabytes.


> Ok, but why not split the file up for standard encoding? Well, you can't just concatenate two .mp4s together without re-encoding and have it make sense to most media players (as far as I am aware)

You can't just literally `cat foo-[123].mp4 > foo.mp4` with old-school non-fragmented .mp4 files, but you just have to shuffle the container stuff around a bit. You don't need to re-encode.
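
For example, ffmpeg's concat demuxer rewrites the container without touching the encoded frames (file names here are placeholders, and this assumes all parts share the same codec parameters):

    # list.txt contains:
    #   file 'foo-1.mp4'
    #   file 'foo-2.mp4'
    #   file 'foo-3.mp4'
    ffmpeg -f concat -i list.txt -c copy foo.mp4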

One downside: if you decide ahead of time that you're going to divide the video into fixed 5-second fragments/segments/chunks to encode independently, you're going to end up with closed GOPs of that length that don't match scene transitions or the like. An IDR frame every 5 seconds. So no B/P frames that reference stuff 10 seconds ago, no periodic intra refresh, nothing fancy.
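
Concretely, that fixed cadence is what you'd be imposing with flags along these lines (illustrative, not a recommendation):

    # Force an IDR frame every 5 seconds and disable scene-cut
    # keyframes, so every chunk starts a closed GOP on schedule
    ffmpeg -i input.mkv -c:v libx264 \
        -force_key_frames "expr:gte(t,n_forced*5)" \
        -x264-params scenecut=0 out.mp4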


okay, i don't agree with anything in your reply. segmenting a video file for HLS/DASH delivery is not at all the same thing I'm suggesting. just for the sake of round numbers, i'm saying to split a 90-minute feature into nine 10-minute segments. fire up 10 separate instances to encode each 10-minute segment. you've just increased the encode time 10x. also, DASH/HLS does not require segmented files. you can have a single contiguous file like an fmp4 as a DASH/HLS source.
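
a rough sketch of that kind of split (duration and names are illustrative; with stream copy the cuts land on the nearest keyframes):

    # Cut the feature into ~10-minute pieces without re-encoding,
    # then hand each piece to its own encoder instance
    ffmpeg -i feature.mkv -c copy -f segment -segment_time 600 \
        -reset_timestamps 1 part_%03d.mkv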

>Ok, but why not split the file up for standard encoding?<snip>

at this point, you would be better served by just writing an elementary stream rather than a muxed mp4 file. since it's just a segment anyway, why waste the cycles on muxing? you then absolutely 100% can concat those streams (even if you did mux them into a container). if you think you can't, you clearly have not tried very hard.
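
a sketch of that workflow, assuming each worker emits a raw H.264 Annex B stream (file names are placeholders):

    # each worker writes a raw elementary stream instead of an mp4
    ffmpeg -i part_001.mkv -c:v libx264 -an -f h264 part_001.h264

    # raw Annex B streams concatenate byte-for-byte...
    cat part_*.h264 > full.h264

    # ...and get muxed once at the end, no re-encode
    # (you may need -framerate on the input, since raw streams
    # don't carry container timing)
    ffmpeg -framerate 24 -i full.h264 -c copy full.mp4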

>I don't work there, just something I read once

I don't work there either, but I do have 30+ years of experience with this subject. Sadly, you're not as well informed as you might think. People don't tend to encode to AV1 from RAW. Instead, they're dealing with a deliverable file, most typically ProRes in today's world, after the post process has been completed. Nowhere near terabytes for a feature. More like a couple hundred gigabytes for UHD HDR content. You seem to be unnecessarily exaggerating.

Edit: it's a 10x increase in encode speed, not time. That would be the opposite effect.


Why did the encode time increase by 10x in that instance? Can't you just seek in the video to the I-frame before the cut point and start your encode there?

I've never tried merging streams across computers, so I was naively just thinking that your output from each computer would be an MP4, but that makes sense.

I pulled that info from a Netflix talk; perhaps video cameras back when that talk occurred didn't compress the video for you? Besides, isn't IMAX all intra-encoded? It was my understanding that IMAX films are actually just a series of J2K images, so I would imagine the video cameras used there would also be intra-encoded.


s/increase/decrease/

i was thinking "increased the encode speed 9x" but typed "increased encode time". i also swapped the number of segments with the segment duration. 9 segments of 10 mins = 9x increase in performance.

Sounds like you are confusing Netflix's recommended formats for acquisition vs. delivery. Cameras capture RAW formats (rarely uncompressed, though), and the post houses use that as sources. The post house/color correction will then create the delivery formats, typically ProRes. RAW is not a friendly format for distribution in the slightest. The full workflow from camera original to what ends up being streamed to the end viewer changes formats multiple times along the way.


Gotcha, that makes much more sense


For certain containers/codecs you can concatenate files without re-encoding. I do it quite often with ffmpeg using -c copy, and it's basically at the speed of the disk.


The parent comment I was responding to was discussing splitting encodes across multiple computers and then re-combining, which is what I was referring to. Still, it sounds like it is possible and I was incorrect.


Because the most common situation where a person would be encoding video is live capture from a camera. If you are making a home security camera, recording videos, etc., you need an encoder that can keep up.


yep, that's what Bitmovin does


I'm pretty sure that's what everyone does after the first time they try a test encode and see the dismal speeds. It's a trick as old as time. The trick is to make that segment decision better than something like the YT algo that decides where to place an ad break.



