WARC software
- Internet Archive Heritrix crawler (producer of the WARC libraries bundled with Wayback/Heritrix 1.14.4/3.0.0/3.1.0 RC)
- The WARC libraries maintained by IA represent the reference implementation of the WARC standard
- Old, unmaintained Heritrix main page with release history
- Heritrix on SourceForge
- (W)ARC code in H1 trunk
- (W)ARC code in H1.14.4 release
- (W)ARC code in H3 trunk
- University of Maryland WarcManager
- The Laboratory for Web Algorithmics (LAW) - Offers, among other things, "A wealth of tools to manage WARC/0.9 web archive files."
* [warcdb](https://github.com/florents-Tselai/warcdb) - A command-line utility (Python) for importing WARC files into a SQLite database. *(Stable)*
* [warcdedupe](https://gitlab.com/taricorp/warcdedupe) - WARC deduplication tool (and WARC library) written in Rust. *(In Development)*
* [warc-safe](https://github.com/natliblux/warc-safe) - Automatic detection of viruses and NSFW content in WARC files.
* [WarcPartitioner](https://github.com/helgeho/WarcPartitioner) - Partition (W)ARC Files by MIME Type and Year. *(Stable)*
* [warcrefs](https://github.com/arcalex/warcrefs) - Web archive deduplication tools. *(Stable)*
* [webarchive-indexing](https://github.com/ikreymer/webarchive-indexing) - Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.