From d594fbe514e2a381541a510fe01041e546f56a67 Mon Sep 17 00:00:00 2001
From: Josh MacDonald <josh.macdonald@gmail.com>
Date: Fri, 26 Feb 2016 21:43:58 -0800
Subject: Remove xdelta1 from this repo

---
 xdelta1/xdelta-0.13.README | 311 ---------------------------------------------
 1 file changed, 311 deletions(-)
 delete mode 100644 xdelta1/xdelta-0.13.README

(limited to 'xdelta1/xdelta-0.13.README')

diff --git a/xdelta1/xdelta-0.13.README b/xdelta1/xdelta-0.13.README
deleted file mode 100644
index f928f66..0000000
--- a/xdelta1/xdelta-0.13.README
+++ /dev/null
@@ -1,311 +0,0 @@
-
-I have placed version 0.13 of XDELTA, new delta generator and
-prototype RCS-replacement program at:
-
-  ftp://www.xcf.berkeley.edu/pub/jmacd/xdelta-0.13.tar.gz
-
-======
-
-Hereafter, the algorithm and program are described.  A delta algorithm
-operates on two files, a FROM file and a TO file.  It computes enough
-informnation to reconstruct the TO file from the FROM file.  So the
-delta and it's inverse, patch, work as follows:
-
-P = delta (FROM, TO)
-TO = patch (FROM, P)
-
-The delta generator portion of this program is a delta algorithm which
-searches for substring matches between the files and then outputs
-instructions to reconstruct the new file from the old file.  It
-produces a set of copy/insert instructions that tell how to
-reconstruct the file as a sequence of copies from the FROM file and
-inserts from the delta itself.  In this regard, the program is much
-closer to a compression program than to a diff program.  However, the
-delta is not "compressed", in that the delta's entropy H(P) will be
-very similar to the entropy of the portions of the TO file not found
-within the FROM file.  The delta will compress just as well as the TO
-file will.  This is a fundamentally different method of computing
-deltas than in the traditional "diff" program.  The diff program and
-it's variants use a least-common-subsequence (LCS) algorithm to find a
-list of inserts and deletes that will modify the FROM file into the TO
-file.  LCS is more expensive to compute and is sometimes more useful,
-especially to the human reader.  Since LCS is a fairly expensive
-algorithm, diff programs usually divide the input files into
-newline-separated "atoms" before computing a delta.  This is a fine
-approximation for text files, but not binary files.
-
-I propose this delta generator as a replacement for "diff" in
-applications where the insert/delete delta is not required.  Since the
-copy/insert delta is easier to compute, it does not have to make the
-reduction in input size by breaking the file into lines.  The delta
-generator works well on binary files.
-
-The next question is whether it is actually desirable to compute
-deltas between binary files.  The answer is certainly yes, even though
-some binary file formats will not have a great degree of similarity
-between versions which were generated between minor modifications to
-their sources.  First, I have evidence that some file formats (notably
-FrameMaker documents) work very well.  Machine-dependant object files
-and executables don't work very well, but it is still worth doing.
-The reason it is still worthwhile is that compression takes longer
-than finding these deltas, so any space savings the delta generator
-produces it very likely to reduce the total archival time.  Even if
-the delta generator saves no space, the total time well end up less
-than twice the time of compression.  I will include some measurements
-at the end of this writing.
-
-With the delta generator as a building block, I have written a
-replacement for RCS.  It does not have all the user-level features
-someone would want in RCS, but these things only get in the way when
-the code is used by higher level version control programs such as
-PRCS.  For now, the prototype supports the following actions:
-
-register		create an empty version file
-checkin			create a new version
-checkout		extract an old version
-
-The program departs from the RCS model in that it does not branch it's
-versions.  The reason for doing this is to solve several problems with
-RCS (aside from it's use of diff which is solved by using my delta
-generator).
-
-1. Development very often ends up off the "main trunk" of the RCS
-file, which causes an O(N) extract, where N deltas must be applied to
-the head of the trunk before getting the desired file.  Once this
-occurs, things get worse and worse.  Of course, it is possible to
-remedy the situation by checking it in once on the trunk, but that
-defeats the purpose of branching, because you lose the information
-about branch of development you are on.  Further, the notion of
-branching individual files conflicts with the model presented by
-higher level version control programs such as PRCS, which would rather
-not have to go to extra trouble to insure that development does not
-end up off the trunk.
-
-2. Deltas are never used more than once in a particular version
-history, even if they contain the same data as another delta.
-
-Conceptually, to get rid of RCS-like branching and retain the "deltas
-are computed between a file and its parent", the xdelta version
-control library computes a reverse delta between the new file and the
-entire version file.  The most recently checked in version is always
-the "head", in RCS terms.  It is always the quickest and easiest to
-extract.  For a repository with N versions, version I requires the
-application of (N-I) deltas, regardless of how the set of versions are
-related.
-
-Each delta contains a set of instructions and all data which was
-originally part of a file version but missing in the FROM file.  This
-data is represented without compression or modification.  As a result,
-each delta may be used as an input for finding future deltas just as a
-regular file may be.
-
-Several equations describe the process, first some notation:
-
-	V0 is the version file with 1 version
-	V1 is the version file with 2 versions,
-	...
-	VN is the version file with N+1 versions
-
-	F0 is the 1st file checked in, upon checkin V0 is produced
-	F1 is the 2nd file checked in, upon checkin V1 is produced
-	...
-	FN is the Nth file checked in, upon checkin VN is produced
-
-	D0 is the 1st delta produced, upon checkin of F1
-	D1 is the 2nd delta produced, upon checkin of F2
-	...
-	DN is the Nth delta produced, upon checkin of FN+1
-
-Checkin goes as follows:
-
-	V0 <- F0
-
-	D0 <- delta (F1, F0)
-	V1 <- F1 + D0
-
-	D1 <- delta (F2 + D0, F1)
-	V2 <- F2 + D1 + D0
-
-	D2 <- delta (F3 + D1 + D0, F2)
-	V3 <- F3 + D2 + D1 + D0
-
-an so on.  To checkout:
-
-	F3 is immediate
-	F2 <- patch (F3 + D1 + D0, D2)
-	F1 <- patch (F2 + D0, D1)
-	F0 <- patch (F1, D0)
-
-Now I will describe why this is just as good as branching, even though
-the branches do not affect which files are used for computing deltas,
-and why it is just as good as the RCS model.
-
-Suppose the we have a sequence of checkins with RCS version numbers
-and their sequence numbers:
-
-F0: 1.1
-F1: 1.2
-F2: 1.3
-F3: 1.1.1.1
-F4: 1.1.1.2
-F5: 1.1.1.3
-F6: 1.4
-F7: 1.5
-
-Of interest are the 4th checkin, from 1.3 to 1.1.1.1, and the 7th
-checkin, from 1.1.1.3 to 1.4.  I will now use the '-' sign to denote
-the taking of a delta and '+' for patch.  Thus, delta(A,B) is
-equivalent to B-A, and patch(A,xdelta(A,B)) is equivalent to A+(B-A),
-or B, as it should be.  In the above sequence of checkins, these
-deltas are produced:
-
-D0=F0-F1
-D1=F1-F2-D0
-D2=F2-F3-D1-D0
-D3=F3-F4-D2-D1-D0
-D4=F4-F5-D3-D2-D1-D0
-D5=F5-F6-D4-D3-D2-D1-D0
-D6=F6-F7-D5-D4-D3-D2-D1-D0
-
-The checkin of F3 produces D2.  The concern is whether any deletions
-between 1.1 and 1.3 have to be inserted into D2.  They have not,
-because any deletions between 1.1 and 1.3 appear in D0 and D1.  Thus,
-using D0, D1, and F3 as an input for computing D2 achieves the desired
-results: the deletion is not reflected twice in separate deltas.
-
-The properties of this version format differ from the RCS format.  It
-ensures that recently checked-in versions are quick to check out, no
-matter where in the "tree" they occur.  One potential problem with
-this formulation is that according to the formulae above, the ammount
-of memory and computation required for checking in each new version
-grows with the number of previous versions.  To prevent this it only
-uses up to some limiting value (currently set at 10) of previous
-deltas in computing new deltas.  This means that deletions on a branch
-over 10 checkins old will be duplicated on a checkin.  Another
-property is that to delete a version all older versions must first be
-deleted or off-line processing may be done to reconstruct the delta
-log with the version removed (an expensive but not time-critical
-operation).  It also lends itself to keeping windowed histories, such
-as "only keep the last 20 revisions, anything earlier than that may be
-thrown away".
-
----
-
-The current prototype implementation is complete.  The program,
-'xdelta', has a command syntax similar to PRCS:
-
-xdelta COMMAND [OPTIONS] ARG1 ARG2 ...
-
-There are 5 commands:
-
- register  Create a versionfile
- checkin   Checkin a new version
- checkout  Checkout a version (latest or -r VERSION)
- info      Show information on a version (all or -r VERSION)
- delta     Produce a delta from ARG1 to ARG2 producing ARG3
- patch     Patch file ARG1 with ARG2 producing ARG3
-
-The -r option specifies a version (the latest version is always the
-default).  Versions are sequentially numbered.  The file itself is a
-GNU DBM database which stores the deltas and latest version.  The file
-format is subject to change as I look for storage methods that are
-portable to Windows (otherwise, the delta generator has already been
-tested on Windows).
-
-The DELTA and PATCH commands are unrelated to the rest, and simply
-serve to generate deltas (and could be used as replacements for "diff"
-and "patch").  All operations are verified with MD5 checksums, which
-are saved for each version in the version file and the FROM and TO
-files when generating and applying deltas.
-
----
-
-Some timing comparisons (remember all times include MD5 computations)
-on a 300MHz Alpha:
-
-Two kernels:
-
-   7875136 /genvmunix
-   6677320 /vmunix
-
-And the times to compress and delta them:
-
-					time(s)	size(bytes)
-
-xdelta delta /genvmunix /vmunix patch:	10.1	4338029
-gzip --best /genvmunix:			34.0	3413963
-gzip --best /vmunix:			63.7	2501766
-gzip --best patch:			51.9	1531163
-
-(I'm not sure why there's such a variation in gzip times--it was
-repeatable.)
-
-xdelta produces a patch which is 4338029 bytes, approximately 2.6
-megabytes smaller than its TO file.  The total space required for
-saving the two kernels compressed is 5915729 bytes, produced in 97.7
-seconds.  The total space for storing /genvmunix and the delta
-compressed is 4945126 bytes, produced in 96.0 seconds.  Using the
-delta saved approximately one megabyte and two seconds.
-
----
-
-All the files in PRCS 1.0.0 and 1.2.0 concatenated (files ordered
-lexigraphically):
-
-   1678837 /tmp/1.0.0
-   2021819 /tmp/1.2.0
-
-							time(s)	space(bytes)
-
-xdelta delta /tmp/1.0.0 /tmp/1.2.0 /tmp/patch:		1.5	695893
-gdiff -a --rcs /tmp/1.0.0 /tmp/1.2.0 /tmp/patch:	3.7	1524705
-
-Even though both files were completely text, gdiff missed a lot of
-similar regions because a line had been partially modified.  Xdelta
-performs better in both dimensions.
-
----
-
-All the .c and .h files in GNU gcc 2.7.2 and 2.7.2.3 concatenated (files
-ordered lexigraphically).  These files are highly similar:
-
-  15590954 /tmp/2.7.2
-  15599644 /tmp/2.7.2.3
-							time(s)	space(bytes)
-
-xdelta delta /tmp/2.7.2 /tmp/2.7.2.3 /tmp/patch:	7.3	11922
-gdiff -a --rcs /tmp/2.7.2 /tmp/2.7.2.3:			4.8	19477
-
-Here xdelta takes a about 50% longer and produces a slightly smaller
-diff.  I think the improved performance in the other two examples
-makes xdelta an all around winner.
-
----
-
-A note on FrameMaker files, which I happen to know a fair bit about.
-Even making radical modifications to a Frame+SGML document leaves much
-of its content unchanged, due to the fact that the EDD and various
-catalogs do not change.  When only minimal modifications are performed
-
-Take a 24 page document, chapter 15 of the SGML developers guide for
-version 5.5. I replaced 38 occurances of "document" with
-"abcdefghijklmnop" and added the source of this README from here up at
-the end, making it 33 pages.
-
-  250880 graphics.before.fm
-  272384 graphics.after.fm
-
-The document grew by about 22 kbytes.  The patch 32780 bytes.  The
-same diff generated by GNU diff is 101097 bytes.  Clearly, Frame
-documents are very good for binary version control.
-
----
-
-I also have evidence that the program works well on uncompressed image
-formats such as the GIMP's XCF file format.
-
-This program will be the basis of PRCS version 2, which will support
-client/server distributed version control.
-
-Please send any comments, questions, thoughts, or interest to me,
-jmacd@cs.berkeley.edu.
-- 
cgit v1.2.3