version three?

Xdelta3 is the third and latest release of Xdelta, a set of tools and APIs for reading and writing compressed deltas. Deltas encode the differences between two versions of a document. This release features a completely new compression engine, several algorithmic improvements, a fully programmable interface modelled after zlib, a command-line utility, use of the RFC 3284 (VCDIFF) encoding, a Python extension, and now 64-bit support.

Xdelta3 is tiny. A minimal, fully functional VCDIFF decoder library weighs in at 16KB. The command-line utility, complete with encoder/decoder tools, external compression support, and the djw secondary compression routines, is just under 60KB, slightly larger than a gzip executable.

Xdelta3 has few dependencies because it is capable of stand-alone file compression (i.e., what zlib and gzip do). Its stand-alone VCDIFF compression is 10-20% worse than gzip; you may view this as the price of having a single encoding, tool, and API designed to do both data compression and differencing at once.
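One way to see how a delta format can double as a plain compressor: in VCDIFF (RFC 3284), a COPY address indexes into the source segment followed by the part of the target window already decoded, so a delta with an empty source can still copy earlier target bytes, much like LZ77. The sketch below models only that addressing rule. The `("ADD", data)` / `("COPY", addr, length)` tuples and the `decode` helper are hypothetical stand-ins, not the real VCDIFF byte encoding (which also defines a RUN instruction and compact address modes).

```python
from typing import List, Tuple, Union

# Hypothetical instruction tuples: ("ADD", data) or ("COPY", addr, length).
Instruction = Union[Tuple[str, bytes], Tuple[str, int, int]]


def decode(source: bytes, delta: List[Instruction]) -> bytes:
    """Decode with a VCDIFF-style address space: source bytes first, then target."""
    target = bytearray()
    for op in delta:
        if op[0] == "ADD":
            target += op[1]  # literal bytes
        elif op[0] == "COPY":
            _, addr, length = op
            for k in range(length):
                pos = addr + k
                if pos < len(source):
                    target.append(source[pos])
                else:
                    t = pos - len(source)
                    if t >= len(target):
                        raise ValueError("COPY reads past the decoded target")
                    # Copying byte by byte lets a COPY overlap the bytes it is producing.
                    target.append(target[t])
        else:
            raise ValueError(f"unknown instruction {op[0]!r}")
    return bytes(target)


# Empty source: the "delta" is simply a compressed encoding of the target.
print(decode(b"", [("ADD", b"ab"), ("COPY", 0, 8)]))
```

With no source at all, two literal bytes plus one overlapping COPY expand into a ten-byte periodic string, which is the same mechanism a general-purpose compressor relies on.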

The Xdelta3 command-line tool, xdelta3, supports several convenience routines. Delta compression works when the two inputs are similar, but often we would like to compute the difference between two compressed documents, and compressed streams tend to differ throughout even when their contents are nearly identical. xdelta3 has optional support for recognizing externally compressed inputs and processing them correctly. This support relies in part on the VCDIFF application header field, where xdelta3 stores its metadata: the original file names (if any) and codes indicating whether the inputs were externally compressed. Applications may provide their own application header.
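Recognizing an externally compressed input usually comes down to checking its leading magic bytes. Below is a minimal sketch of that idea, assuming gzip, bzip2, and xz inputs and using Python's standard-library decompressors. The `DECOMPRESSORS` table and the `unwrap` helper are hypothetical; this does not reflect how xdelta3 itself implements the feature.

```python
import bz2
import gzip
import lzma
from typing import Callable, List, Optional, Tuple

# Illustrative magic-number table (an assumption, not xdelta3's actual list).
DECOMPRESSORS: List[Tuple[bytes, str, Callable[[bytes], bytes]]] = [
    (b"\x1f\x8b", "gzip", gzip.decompress),
    (b"BZh", "bzip2", bz2.decompress),
    (b"\xfd7zXZ\x00", "xz", lzma.decompress),
]


def unwrap(data: bytes) -> Tuple[bytes, Optional[str]]:
    """Return (uncompressed bytes, compressor name), or (data, None) if not recognized."""
    for magic, name, decompress in DECOMPRESSORS:
        if data.startswith(magic):
            return decompress(data), name
    return data, None


data, kind = unwrap(gzip.compress(b"version 1 of the document\n"))
print(kind)  # gzip
```

A delta would then be computed between the two unwrapped inputs; in this sketch the returned name stands in for the "externally compressed" code that the application header records. Sniffing can misfire on an uncompressed file that happens to begin with one of these byte strings (for example, text starting with `BZh`), in which case the decompressor raises an error.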

what are version one and version two?

Many shortcomings in the Xdelta1.x release are fixed in its replacement, Xdelta3. Xdelta1 used both a simplistic compression algorithm and a simplistic encoding. For example, Xdelta1 compresses the entire document at once and thus uses memory proportional to the input size.

The Xdelta1 compression engine made no attempt to find matching strings shorter than, say, 16 or 32 bytes, and its encoding made no attempt to encode efficiently the COPY and ADD instructions that constitute a delta. For documents with highly similar data, however, these simplifications degrade compression by a relatively insignificant amount. (Xdelta1.x compresses the delta with zlib to improve matters, but this dependency stinks.)
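The minimum-match limitation can be illustrated with a toy encoder that indexes the source only at block-aligned offsets, so it can never discover a common string shorter than one block. This is a minimal sketch, assuming a fixed 16-byte block and exact dictionary lookups in place of the rolling checksum a real implementation would use; it is not Xdelta1's actual code, and the `("ADD", data)` / `("COPY", offset, length)` tuples are a hypothetical in-memory format rather than an encoding.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical in-memory instructions: ("ADD", data) or ("COPY", source_offset, length).
Op = Union[Tuple[str, bytes], Tuple[str, int, int]]


def encode(source: bytes, target: bytes, block: int = 16) -> List[Op]:
    """Toy block-matching delta encoder; finds no match shorter than `block` bytes."""
    # Index the source only at block-aligned offsets (first occurrence wins).
    index: Dict[bytes, int] = {}
    for off in range(0, len(source) - block + 1, block):
        index.setdefault(source[off:off + block], off)

    ops: List[Op] = []
    pending = bytearray()  # literal bytes not yet covered by any COPY
    i = 0
    while i < len(target):
        # A slice shorter than `block` near the end can never equal an index key.
        off = index.get(target[i:i + block])
        if off is None:
            pending.append(target[i])
            i += 1
            continue
        # Extend the match forward past the first block as far as it goes.
        length = block
        while (off + length < len(source) and i + length < len(target)
               and source[off + length] == target[i + length]):
            length += 1
        if pending:
            ops.append(("ADD", bytes(pending)))
            pending.clear()
        ops.append(("COPY", off, length))
        i += length
    if pending:
        ops.append(("ADD", bytes(pending)))
    return ops


def decode(source: bytes, ops: List[Op]) -> bytes:
    """Rebuild the target by replaying ADD and COPY instructions against the source."""
    out = bytearray()
    for op in ops:
        if op[0] == "ADD":
            out += op[1]
        else:
            _, off, length = op
            out += source[off:off + length]
    return bytes(out)


# "hello " is shared, but at 6 bytes it is shorter than one block, so it is sent literally.
print(encode(b"hello world", b"hello there"))
```

Because long shared runs still cover at least one aligned block, this scheme does well on highly similar documents and loses only on short matches, which is the trade-off described above. Real encoders also extend matches backward and use a cheap rolling checksum instead of hashing every slice.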

Despite leaving much to be desired, Xdelta1 showed that you can do well without great complexity; as it turns out, the particulars of the compression engine are relatively insignificant compared to the difficulty of programming an application that uses delta compression. Better solve that first.

What we want are systems that manage compressed storage and network communication. The second major release, Xdelta2, addresses these issues. Xdelta2 features a storage interface (part database, part file system) that allows indexing and labeling compressed documents, with a feature set similar to RCS. The Xdelta2 interface supports efficient algorithms for extracting deltas between any pair of versions in storage. The extraction technique also does not rely on a hierarchy or a centralized namespace, making it ideal for peer-to-peer communication and proxy architectures. I am grateful to Mihut Ionescu for implementing the Xproxy HTTP delta-compressing proxy system based on this interface and studying the benefits of delta compression in that context. Xdelta2 stressed the Xdelta1 compression engine beyond its limits, so Xdelta3 is designed as the ideal replacement. The Xdelta2 techniques have yet to be ported to the new implementation.