
What's the point of WACZ? It appears to wrap several WARC files into a single ZIP, enabling HTTP Range requests for specific WARC files so the whole archive can be served by a passive file server. But why is that needed?


It's huge for replaying big WARC files in a browser without having to download the whole thing. (E.g., try loading a 700 MB WARC from IPFS just to visit one page within it; as-is, it's too slow to work.)
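The mechanism underneath is the HTTP Range request: the client asks for just the bytes it needs, and any static server that honors Range replies with 206 Partial Content. A minimal Python sketch, assuming the server supports Range; the function names are hypothetical, for illustration only:

```python
import urllib.request


def range_header(start: int, length: int) -> str:
    """Build an HTTP Range header value covering `length` bytes from `start`.

    HTTP byte ranges are inclusive at both ends, hence the `- 1`.
    """
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    return f"bytes={start}-{start + length - 1}"


def fetch_range(url: str, start: int, length: int) -> bytes:
    """Fetch only `length` bytes of `url` from a server that honors Range."""
    req = urllib.request.Request(url, headers={"Range": range_header(start, length)})
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content: the server sent only the requested slice.
        # 200 would mean it ignored Range and is streaming the whole file.
        if resp.status != 206:
            raise RuntimeError(f"server ignored the Range header (status {resp.status})")
        return resp.read()
```

With this, opening one page out of a 700 MB archive costs a few small requests instead of one 700 MB download, provided you already know which byte offsets to ask for.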

It's used extensively by the Browsertrix/Webrecorder.io projects (whose team pioneered the WACZ format) and a few other projects.


Oh, I may have missed that part. So the WACZ (its indexes?) can contain offsets into the WARC file itself, pointing to each individual page?


WACZ is a replacement for WARC that has the index with offsets built in.
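To illustrate why built-in offsets matter: if every record is gzipped as its own member (the usual way .warc.gz files are written), an index of (offset, length) pairs lets a reader decompress one record without touching the rest. A minimal sketch, assuming a heavily simplified record format and a plain dict standing in for the real CDXJ-style index; all names here are hypothetical:

```python
import gzip
import io


def make_record(uri: str, body: bytes) -> bytes:
    # Hypothetical, heavily simplified WARC-like record. Real WARC records
    # require more headers (WARC-Record-ID, WARC-Date, Content-Type, ...).
    head = (
        "WARC/1.1\r\n"
        "WARC-Type: response\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("utf-8")
    return head + body + b"\r\n\r\n"


def build_warc_and_index(pages: dict[str, bytes]) -> tuple[bytes, dict[str, tuple[int, int]]]:
    """Gzip each record as its own member; remember (offset, length) per URI."""
    out = io.BytesIO()
    index: dict[str, tuple[int, int]] = {}
    for uri, body in pages.items():
        member = gzip.compress(make_record(uri, body))
        index[uri] = (out.tell(), len(member))
        out.write(member)
    return out.getvalue(), index


def read_record(warc: bytes, index: dict[str, tuple[int, int]], uri: str) -> bytes:
    """Decompress one record; only bytes [offset, offset + length) are read."""
    offset, length = index[uri]
    return gzip.decompress(warc[offset : offset + length])


if __name__ == "__main__":
    warc, idx = build_warc_and_index({
        "https://example.com/": b"<h1>home</h1>",
        "https://example.com/about": b"<h1>about</h1>",
    })
    # Against a remote file, this slice would be a single HTTP Range request.
    print(read_record(warc, idx, "https://example.com/about").decode())
```

Concatenated gzip members still form a valid gzip stream, so the same file remains readable front to back by tools that ignore the index.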


But it uses WARC files inside as the archive format. It seems weird to call it a replacement when the original is still present.


I just meant that from a user's perspective it's a format that supersedes WARC. But internally, yes, one is an encapsulation format for the other.
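The encapsulation is also what keeps the offsets usable: if a WARC is stored in the ZIP without ZIP compression, its bytes sit contiguously inside the .wacz, so an offset into the WARC becomes an offset into the ZIP by adding the member's data start. A sketch, assuming the WARC member is stored uncompressed (ZIP_STORED); the helper names and the single-file layout are hypothetical, since a real WACZ also carries index, pages and metadata files:

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIG = 0x04034B50  # "PK\x03\x04" read as a little-endian uint32


def pack_wacz_like(warc: bytes, warc_name: str = "archive/data.warc.gz") -> bytes:
    """Hypothetical minimal container: one WARC stored uncompressed in a ZIP.

    A real WACZ also holds index, pages and datapackage.json files.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr(warc_name, warc)
    return buf.getvalue()


def member_data_offset(zip_bytes: bytes, name: str) -> int:
    """Absolute offset where the stored bytes of `name` begin in the ZIP."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        info = zf.getinfo(name)
    if info.compress_type != zipfile.ZIP_STORED:
        raise ValueError(f"{name} is ZIP-compressed; inner offsets would not map")
    local = info.header_offset
    (sig,) = struct.unpack_from("<I", zip_bytes, local)
    if sig != LOCAL_HEADER_SIG:
        raise ValueError("no local file header at the recorded offset")
    # The local header is 30 fixed bytes; the file-name and extra-field
    # lengths are little-endian uint16 values at bytes 26 and 28.
    name_len, extra_len = struct.unpack_from("<HH", zip_bytes, local + 26)
    return local + 30 + name_len + extra_len


if __name__ == "__main__":
    warc = b"record-one" + b"record-two"  # stand-in for a .warc.gz payload
    wacz = pack_wacz_like(warc)
    start = member_data_offset(wacz, "archive/data.warc.gz")
    # Offset 10 inside the WARC maps to start + 10 inside the .wacz.
    print(wacz[start + 10 : start + 20])  # b'record-two'
```

So for a record the index places at `offset` with `length` bytes, a reader can request `bytes=(start + offset)-(start + offset + length - 1)` directly from the .wacz on a plain static server.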



