For backing up websites, I released https://github.com/ludios/grab-site. Compared to HTTrack, it makes it easy to add ignores to skip unwanted URLs after a crawl has already started. It also saves to WARC instead of trying to fit the site into an on-disk directory structure, which is not always possible or useful (e.g. a directory with >100K files).
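As a rough sketch of the mid-crawl ignore workflow: grab-site writes each crawl into its own directory and re-reads that directory's `ignores` file while the crawl runs, so appending a regex takes effect without restarting. The directory name below is hypothetical; the commented-out invocation assumes grab-site is installed.

```shell
# Start a crawl (flags per grab-site's README):
#   grab-site 'https://example.com/' --igsets=forums
#
# While it runs, append an ignore regex to the crawl directory's
# "ignores" file; grab-site picks it up without a restart.
DIR="example.com-crawl"                          # hypothetical crawl dir name
mkdir -p "$DIR"
printf '%s\n' 'calendar\.php' >> "$DIR/ignores"  # skip infinite calendar pages
cat "$DIR/ignores"
```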