import 1.1.3

author: dotdotisdead <dotdotisdead@a3eca27d-f21b-0410-9b4a-6511e771f64e> 2006-09-30 04:47:47 +0000
committer: dotdotisdead <dotdotisdead@a3eca27d-f21b-0410-9b4a-6511e771f64e> 2006-09-30 04:47:47 +0000
commit: ad85653ca73c8126de516b9a4294e8f08577c00d (patch)
tree: 79fb4d644ccf6a4fe8dac146b801a21d63537b23 /xdelta1/xdelta-0.13.README
parent: 5a7c245871879325d7b05c06e0b2011203986ee8 (diff)
1 files changed, 311 insertions, 0 deletions
diff --git a/xdelta1/xdelta-0.13.README b/xdelta1/xdelta-0.13.README
new file mode 100755
index 0000000..f928f66
--- /dev/null
+++ b/xdelta1/xdelta-0.13.README
@@ -0,0 +1,311 @@
+I have placed version 0.13 of XDELTA, new delta generator and
+prototype RCS-replacement program at:
+  ftp://www.xcf.berkeley.edu/pub/jmacd/xdelta-0.13.tar.gz
+======
+Hereafter, the algorithm and program are described.  A delta algorithm
+operates on two files, a FROM file and a TO file.  It computes enough
+informnation to reconstruct the TO file from the FROM file.  So the
+delta and it's inverse, patch, work as follows:
+P = delta (FROM, TO)
+TO = patch (FROM, P)
+The delta generator portion of this program is a delta algorithm which
+searches for substring matches between the files and then outputs
+instructions to reconstruct the new file from the old file.  It
+produces a set of copy/insert instructions that tell how to
+reconstruct the file as a sequence of copies from the FROM file and
+inserts from the delta itself.  In this regard, the program is much
+closer to a compression program than to a diff program.  However, the
+delta is not "compressed", in that the delta's entropy H(P) will be
+very similar to the entropy of the portions of the TO file not found
+within the FROM file.  The delta will compress just as well as the TO
+file will.  This is a fundamentally different method of computing
+deltas than in the traditional "diff" program.  The diff program and
+it's variants use a least-common-subsequence (LCS) algorithm to find a
+list of inserts and deletes that will modify the FROM file into the TO
+file.  LCS is more expensive to compute and is sometimes more useful,
+especially to the human reader.  Since LCS is a fairly expensive
+algorithm, diff programs usually divide the input files into
+newline-separated "atoms" before computing a delta.  This is a fine
+approximation for text files, but not binary files.
+I propose this delta generator as a replacement for "diff" in
+applications where the insert/delete delta is not required.  Since the
+copy/insert delta is easier to compute, it does not have to make the
+reduction in input size by breaking the file into lines.  The delta
+generator works well on binary files.
+The next question is whether it is actually desirable to compute
+deltas between binary files.  The answer is certainly yes, even though
+some binary file formats will not have a great degree of similarity
+between versions which were generated between minor modifications to
+their sources.  First, I have evidence that some file formats (notably
+FrameMaker documents) work very well.  Machine-dependant object files
+and executables don't work very well, but it is still worth doing.
+The reason it is still worthwhile is that compression takes longer
+than finding these deltas, so any space savings the delta generator
+produces it very likely to reduce the total archival time.  Even if
+the delta generator saves no space, the total time well end up less
+than twice the time of compression.  I will include some measurements
+at the end of this writing.
+With the delta generator as a building block, I have written a
+replacement for RCS.  It does not have all the user-level features
+someone would want in RCS, but these things only get in the way when
+the code is used by higher level version control programs such as
+PRCS.  For now, the prototype supports the following actions:
+register                create an empty version file
+checkin                 create a new version
+checkout                extract an old version
+The program departs from the RCS model in that it does not branch it's
+versions.  The reason for doing this is to solve several problems with
+RCS (aside from it's use of diff which is solved by using my delta
+generator).
+1. Development very often ends up off the "main trunk" of the RCS
+file, which causes an O(N) extract, where N deltas must be applied to
+the head of the trunk before getting the desired file.  Once this
+occurs, things get worse and worse.  Of course, it is possible to
+remedy the situation by checking it in once on the trunk, but that
+defeats the purpose of branching, because you lose the information
+about branch of development you are on.  Further, the notion of
+branching individual files conflicts with the model presented by
+higher level version control programs such as PRCS, which would rather
+not have to go to extra trouble to insure that development does not
+end up off the trunk.
+2. Deltas are never used more than once in a particular version
+history, even if they contain the same data as another delta.
+Conceptually, to get rid of RCS-like branching and retain the "deltas
+are computed between a file and its parent", the xdelta version
+control library computes a reverse delta between the new file and the
+entire version file.  The most recently checked in version is always
+the "head", in RCS terms.  It is always the quickest and easiest to
+extract.  For a repository with N versions, version I requires the
+application of (N-I) deltas, regardless of how the set of versions are
+related.
+Each delta contains a set of instructions and all data which was
+originally part of a file version but missing in the FROM file.  This
+data is represented without compression or modification.  As a result,
+each delta may be used as an input for finding future deltas just as a
+regular file may be.
+Several equations describe the process, first some notation:
+        V0 is the version file with 1 version
+        V1 is the version file with 2 versions,
+        ...
+        VN is the version file with N+1 versions
+        F0 is the 1st file checked in, upon checkin V0 is produced
+        F1 is the 2nd file checked in, upon checkin V1 is produced
+        ...
+        FN is the Nth file checked in, upon checkin VN is produced
+        D0 is the 1st delta produced, upon checkin of F1
+        D1 is the 2nd delta produced, upon checkin of F2
+        ...
+        DN is the Nth delta produced, upon checkin of FN+1
+Checkin goes as follows:
+        V0 <- F0
+        D0 <- delta (F1, F0)
+        V1 <- F1 + D0
+        D1 <- delta (F2 + D0, F1)
+        V2 <- F2 + D1 + D0
+        D2 <- delta (F3 + D1 + D0, F2)
+        V3 <- F3 + D2 + D1 + D0
+an so on.  To checkout:
+        F3 is immediate
+        F2 <- patch (F3 + D1 + D0, D2)
+        F1 <- patch (F2 + D0, D1)
+        F0 <- patch (F1, D0)
+Now I will describe why this is just as good as branching, even though
+the branches do not affect which files are used for computing deltas,
+and why it is just as good as the RCS model.
+Suppose the we have a sequence of checkins with RCS version numbers
+and their sequence numbers:
+F0: 1.1
+F1: 1.2
+F2: 1.3
+F3: 1.1.1.1
+F4: 1.1.1.2
+F5: 1.1.1.3
+F6: 1.4
+F7: 1.5
+Of interest are the 4th checkin, from 1.3 to 1.1.1.1, and the 7th
+checkin, from 1.1.1.3 to 1.4.  I will now use the '-' sign to denote
+the taking of a delta and '+' for patch.  Thus, delta(A,B) is
+equivalent to B-A, and patch(A,xdelta(A,B)) is equivalent to A+(B-A),
+or B, as it should be.  In the above sequence of checkins, these
+deltas are produced:
+D0=F0-F1
+D1=F1-F2-D0
+D2=F2-F3-D1-D0
+D3=F3-F4-D2-D1-D0
+D4=F4-F5-D3-D2-D1-D0
+D5=F5-F6-D4-D3-D2-D1-D0
+D6=F6-F7-D5-D4-D3-D2-D1-D0
+The checkin of F3 produces D2.  The concern is whether any deletions
+between 1.1 and 1.3 have to be inserted into D2.  They have not,
+because any deletions between 1.1 and 1.3 appear in D0 and D1.  Thus,
+using D0, D1, and F3 as an input for computing D2 achieves the desired
+results: the deletion is not reflected twice in separate deltas.
+The properties of this version format differ from the RCS format.  It
+ensures that recently checked-in versions are quick to check out, no
+matter where in the "tree" they occur.  One potential problem with
+this formulation is that according to the formulae above, the ammount
+of memory and computation required for checking in each new version
+grows with the number of previous versions.  To prevent this it only
+uses up to some limiting value (currently set at 10) of previous
+deltas in computing new deltas.  This means that deletions on a branch
+over 10 checkins old will be duplicated on a checkin.  Another
+property is that to delete a version all older versions must first be
+deleted or off-line processing may be done to reconstruct the delta
+log with the version removed (an expensive but not time-critical
+operation).  It also lends itself to keeping windowed histories, such
+as "only keep the last 20 revisions, anything earlier than that may be
+thrown away".
+---
+The current prototype implementation is complete.  The program,
+'xdelta', has a command syntax similar to PRCS:
+xdelta COMMAND [OPTIONS] ARG1 ARG2 ...
+There are 5 commands:
+ register  Create a versionfile
+ checkin   Checkin a new version
+ checkout  Checkout a version (latest or -r VERSION)
+ info      Show information on a version (all or -r VERSION)
+ delta     Produce a delta from ARG1 to ARG2 producing ARG3
+ patch     Patch file ARG1 with ARG2 producing ARG3
+The -r option specifies a version (the latest version is always the
+default).  Versions are sequentially numbered.  The file itself is a
+GNU DBM database which stores the deltas and latest version.  The file
+format is subject to change as I look for storage methods that are
+portable to Windows (otherwise, the delta generator has already been
+tested on Windows).
+The DELTA and PATCH commands are unrelated to the rest, and simply
+serve to generate deltas (and could be used as replacements for "diff"
+and "patch").  All operations are verified with MD5 checksums, which
+are saved for each version in the version file and the FROM and TO
+files when generating and applying deltas.
+---
+Some timing comparisons (remember all times include MD5 computations)
+on a 300MHz Alpha:
+Two kernels:
+   7875136 /genvmunix
+   6677320 /vmunix
+And the times to compress and delta them:
+                                        time(s) size(bytes)
+xdelta delta /genvmunix /vmunix patch:  10.1    4338029
+gzip --best /genvmunix:                 34.0    3413963
+gzip --best /vmunix:                    63.7    2501766
+gzip --best patch:                      51.9    1531163
+(I'm not sure why there's such a variation in gzip times--it was
+repeatable.)
+xdelta produces a patch which is 4338029 bytes, approximately 2.6
+megabytes smaller than its TO file.  The total space required for
+saving the two kernels compressed is 5915729 bytes, produced in 97.7
+seconds.  The total space for storing /genvmunix and the delta
+compressed is 4945126 bytes, produced in 96.0 seconds.  Using the
+delta saved approximately one megabyte and two seconds.
+---
+All the files in PRCS 1.0.0 and 1.2.0 concatenated (files ordered
+lexigraphically):
+   1678837 /tmp/1.0.0
+   2021819 /tmp/1.2.0
+                                                        time(s) space(bytes)
+xdelta delta /tmp/1.0.0 /tmp/1.2.0 /tmp/patch:          1.5     695893
+gdiff -a --rcs /tmp/1.0.0 /tmp/1.2.0 /tmp/patch:        3.7     1524705
+Even though both files were completely text, gdiff missed a lot of
+similar regions because a line had been partially modified.  Xdelta
+performs better in both dimensions.
+---
+All the .c and .h files in GNU gcc 2.7.2 and 2.7.2.3 concatenated (files
+ordered lexigraphically).  These files are highly similar:
+  15590954 /tmp/2.7.2
+  15599644 /tmp/2.7.2.3
+                                                        time(s) space(bytes)
+xdelta delta /tmp/2.7.2 /tmp/2.7.2.3 /tmp/patch:        7.3     11922
+gdiff -a --rcs /tmp/2.7.2 /tmp/2.7.2.3:                 4.8     19477
+Here xdelta takes a about 50% longer and produces a slightly smaller
+diff.  I think the improved performance in the other two examples
+makes xdelta an all around winner.
+---
+A note on FrameMaker files, which I happen to know a fair bit about.
+Even making radical modifications to a Frame+SGML document leaves much
+of its content unchanged, due to the fact that the EDD and various
+catalogs do not change.  When only minimal modifications are performed
+Take a 24 page document, chapter 15 of the SGML developers guide for
+version 5.5. I replaced 38 occurances of "document" with
+"abcdefghijklmnop" and added the source of this README from here up at
+the end, making it 33 pages.
+  250880 graphics.before.fm
+  272384 graphics.after.fm
+The document grew by about 22 kbytes.  The patch 32780 bytes.  The
+same diff generated by GNU diff is 101097 bytes.  Clearly, Frame
+documents are very good for binary version control.
+---
+I also have evidence that the program works well on uncompressed image
+formats such as the GIMP's XCF file format.
+This program will be the basis of PRCS version 2, which will support
+client/server distributed version control.
+Please send any comments, questions, thoughts, or interest to me,
+jmacd@cs.berkeley.edu.
author	dotdotisdead <dotdotisdead@a3eca27d-f21b-0410-9b4a-6511e771f64e>	2006-09-30 04:47:47 +0000
committer	dotdotisdead <dotdotisdead@a3eca27d-f21b-0410-9b4a-6511e771f64e>	2006-09-30 04:47:47 +0000
commit	ad85653ca73c8126de516b9a4294e8f08577c00d (patch)
tree	79fb4d644ccf6a4fe8dac146b801a21d63537b23 /xdelta1/xdelta-0.13.README
parent	5a7c245871879325d7b05c06e0b2011203986ee8 (diff)

diff --git a/xdelta1/xdelta-0.13.README b/xdelta1/xdelta-0.13.README new file mode 100755 index 0000000..f928f66 --- /dev/null +++ b/xdelta1/xdelta-0.13.README
@@ -0,0 +1,311 @@
	1
	2	I have placed version 0.13 of XDELTA, new delta generator and
	3	prototype RCS-replacement program at:
	4
	5	ftp://www.xcf.berkeley.edu/pub/jmacd/xdelta-0.13.tar.gz
	6
	7	======
	8
	9	Hereafter, the algorithm and program are described. A delta algorithm
	10	operates on two files, a FROM file and a TO file. It computes enough
	11	informnation to reconstruct the TO file from the FROM file. So the
	12	delta and it's inverse, patch, work as follows:
	13
	14	P = delta (FROM, TO)
	15	TO = patch (FROM, P)
	16
	17	The delta generator portion of this program is a delta algorithm which
	18	searches for substring matches between the files and then outputs
	19	instructions to reconstruct the new file from the old file. It
	20	produces a set of copy/insert instructions that tell how to
	21	reconstruct the file as a sequence of copies from the FROM file and
	22	inserts from the delta itself. In this regard, the program is much
	23	closer to a compression program than to a diff program. However, the
	24	delta is not "compressed", in that the delta's entropy H(P) will be
	25	very similar to the entropy of the portions of the TO file not found
	26	within the FROM file. The delta will compress just as well as the TO
	27	file will. This is a fundamentally different method of computing
	28	deltas than in the traditional "diff" program. The diff program and
	29	it's variants use a least-common-subsequence (LCS) algorithm to find a
	30	list of inserts and deletes that will modify the FROM file into the TO
	31	file. LCS is more expensive to compute and is sometimes more useful,
	32	especially to the human reader. Since LCS is a fairly expensive
	33	algorithm, diff programs usually divide the input files into
	34	newline-separated "atoms" before computing a delta. This is a fine
	35	approximation for text files, but not binary files.
	36
	37	I propose this delta generator as a replacement for "diff" in
	38	applications where the insert/delete delta is not required. Since the
	39	copy/insert delta is easier to compute, it does not have to make the
	40	reduction in input size by breaking the file into lines. The delta
	41	generator works well on binary files.
	42
	43	The next question is whether it is actually desirable to compute
	44	deltas between binary files. The answer is certainly yes, even though
	45	some binary file formats will not have a great degree of similarity
	46	between versions which were generated between minor modifications to
	47	their sources. First, I have evidence that some file formats (notably
	48	FrameMaker documents) work very well. Machine-dependant object files
	49	and executables don't work very well, but it is still worth doing.
	50	The reason it is still worthwhile is that compression takes longer
	51	than finding these deltas, so any space savings the delta generator
	52	produces it very likely to reduce the total archival time. Even if
	53	the delta generator saves no space, the total time well end up less
	54	than twice the time of compression. I will include some measurements
	55	at the end of this writing.
	56
	57	With the delta generator as a building block, I have written a
	58	replacement for RCS. It does not have all the user-level features
	59	someone would want in RCS, but these things only get in the way when
	60	the code is used by higher level version control programs such as
	61	PRCS. For now, the prototype supports the following actions:
	62
	63	register create an empty version file
	64	checkin create a new version
	65	checkout extract an old version
	66
	67	The program departs from the RCS model in that it does not branch it's
	68	versions. The reason for doing this is to solve several problems with
	69	RCS (aside from it's use of diff which is solved by using my delta
	70	generator).
	71
	72	1. Development very often ends up off the "main trunk" of the RCS
	73	file, which causes an O(N) extract, where N deltas must be applied to
	74	the head of the trunk before getting the desired file. Once this
	75	occurs, things get worse and worse. Of course, it is possible to
	76	remedy the situation by checking it in once on the trunk, but that
	77	defeats the purpose of branching, because you lose the information
	78	about branch of development you are on. Further, the notion of
	79	branching individual files conflicts with the model presented by
	80	higher level version control programs such as PRCS, which would rather
	81	not have to go to extra trouble to insure that development does not
	82	end up off the trunk.
	83
	84	2. Deltas are never used more than once in a particular version
	85	history, even if they contain the same data as another delta.
	86
	87	Conceptually, to get rid of RCS-like branching and retain the "deltas
	88	are computed between a file and its parent", the xdelta version
	89	control library computes a reverse delta between the new file and the
	90	entire version file. The most recently checked in version is always
	91	the "head", in RCS terms. It is always the quickest and easiest to
	92	extract. For a repository with N versions, version I requires the
	93	application of (N-I) deltas, regardless of how the set of versions are
	94	related.
	95
	96	Each delta contains a set of instructions and all data which was
	97	originally part of a file version but missing in the FROM file. This
	98	data is represented without compression or modification. As a result,
	99	each delta may be used as an input for finding future deltas just as a
	100	regular file may be.
	101
	102	Several equations describe the process, first some notation:
	103
	104	V0 is the version file with 1 version
	105	V1 is the version file with 2 versions,
	106	...
	107	VN is the version file with N+1 versions
	108
	109	F0 is the 1st file checked in, upon checkin V0 is produced
	110	F1 is the 2nd file checked in, upon checkin V1 is produced
	111	...
	112	FN is the Nth file checked in, upon checkin VN is produced
	113
	114	D0 is the 1st delta produced, upon checkin of F1
	115	D1 is the 2nd delta produced, upon checkin of F2
	116	...
	117	DN is the Nth delta produced, upon checkin of FN+1
	118
	119	Checkin goes as follows:
	120
	121	V0 <- F0
	122
	123	D0 <- delta (F1, F0)
	124	V1 <- F1 + D0
	125
	126	D1 <- delta (F2 + D0, F1)
	127	V2 <- F2 + D1 + D0
	128
	129	D2 <- delta (F3 + D1 + D0, F2)
	130	V3 <- F3 + D2 + D1 + D0
	131
	132	an so on. To checkout:
	133
	134	F3 is immediate
	135	F2 <- patch (F3 + D1 + D0, D2)
	136	F1 <- patch (F2 + D0, D1)
	137	F0 <- patch (F1, D0)
	138
	139	Now I will describe why this is just as good as branching, even though
	140	the branches do not affect which files are used for computing deltas,
	141	and why it is just as good as the RCS model.
	142
	143	Suppose the we have a sequence of checkins with RCS version numbers
	144	and their sequence numbers:
	145
	146	F0: 1.1
	147	F1: 1.2
	148	F2: 1.3
	149	F3: 1.1.1.1
	150	F4: 1.1.1.2
	151	F5: 1.1.1.3
	152	F6: 1.4
	153	F7: 1.5
	154
	155	Of interest are the 4th checkin, from 1.3 to 1.1.1.1, and the 7th
	156	checkin, from 1.1.1.3 to 1.4. I will now use the '-' sign to denote
	157	the taking of a delta and '+' for patch. Thus, delta(A,B) is
	158	equivalent to B-A, and patch(A,xdelta(A,B)) is equivalent to A+(B-A),
	159	or B, as it should be. In the above sequence of checkins, these
	160	deltas are produced:
	161
	162	D0=F0-F1
	163	D1=F1-F2-D0
	164	D2=F2-F3-D1-D0
	165	D3=F3-F4-D2-D1-D0
	166	D4=F4-F5-D3-D2-D1-D0
	167	D5=F5-F6-D4-D3-D2-D1-D0
	168	D6=F6-F7-D5-D4-D3-D2-D1-D0
	169
	170	The checkin of F3 produces D2. The concern is whether any deletions
	171	between 1.1 and 1.3 have to be inserted into D2. They have not,
	172	because any deletions between 1.1 and 1.3 appear in D0 and D1. Thus,
	173	using D0, D1, and F3 as an input for computing D2 achieves the desired
	174	results: the deletion is not reflected twice in separate deltas.
	175
	176	The properties of this version format differ from the RCS format. It
	177	ensures that recently checked-in versions are quick to check out, no
	178	matter where in the "tree" they occur. One potential problem with
	179	this formulation is that according to the formulae above, the ammount
	180	of memory and computation required for checking in each new version
	181	grows with the number of previous versions. To prevent this it only
	182	uses up to some limiting value (currently set at 10) of previous
	183	deltas in computing new deltas. This means that deletions on a branch
	184	over 10 checkins old will be duplicated on a checkin. Another
	185	property is that to delete a version all older versions must first be
	186	deleted or off-line processing may be done to reconstruct the delta
	187	log with the version removed (an expensive but not time-critical
	188	operation). It also lends itself to keeping windowed histories, such
	189	as "only keep the last 20 revisions, anything earlier than that may be
	190	thrown away".
	191
	192	---
	193
	194	The current prototype implementation is complete. The program,
	195	'xdelta', has a command syntax similar to PRCS:
	196
	197	xdelta COMMAND [OPTIONS] ARG1 ARG2 ...
	198
	199	There are 5 commands:
	200
	201	register Create a versionfile
	202	checkin Checkin a new version
	203	checkout Checkout a version (latest or -r VERSION)
	204	info Show information on a version (all or -r VERSION)
	205	delta Produce a delta from ARG1 to ARG2 producing ARG3
	206	patch Patch file ARG1 with ARG2 producing ARG3
	207
	208	The -r option specifies a version (the latest version is always the
	209	default). Versions are sequentially numbered. The file itself is a
	210	GNU DBM database which stores the deltas and latest version. The file
	211	format is subject to change as I look for storage methods that are
	212	portable to Windows (otherwise, the delta generator has already been
	213	tested on Windows).
	214
	215	The DELTA and PATCH commands are unrelated to the rest, and simply
	216	serve to generate deltas (and could be used as replacements for "diff"
	217	and "patch"). All operations are verified with MD5 checksums, which
	218	are saved for each version in the version file and the FROM and TO
	219	files when generating and applying deltas.
	220
	221	---
	222
	223	Some timing comparisons (remember all times include MD5 computations)
	224	on a 300MHz Alpha:
	225
	226	Two kernels:
	227
	228	7875136 /genvmunix
	229	6677320 /vmunix
	230
	231	And the times to compress and delta them:
	232
	233	time(s) size(bytes)
	234
	235	xdelta delta /genvmunix /vmunix patch: 10.1 4338029
	236	gzip --best /genvmunix: 34.0 3413963
	237	gzip --best /vmunix: 63.7 2501766
	238	gzip --best patch: 51.9 1531163
	239
	240	(I'm not sure why there's such a variation in gzip times--it was
	241	repeatable.)
	242
	243	xdelta produces a patch which is 4338029 bytes, approximately 2.6
	244	megabytes smaller than its TO file. The total space required for
	245	saving the two kernels compressed is 5915729 bytes, produced in 97.7
	246	seconds. The total space for storing /genvmunix and the delta
	247	compressed is 4945126 bytes, produced in 96.0 seconds. Using the
	248	delta saved approximately one megabyte and two seconds.
	249
	250	---
	251
	252	All the files in PRCS 1.0.0 and 1.2.0 concatenated (files ordered
	253	lexigraphically):
	254
	255	1678837 /tmp/1.0.0
	256	2021819 /tmp/1.2.0
	257
	258	time(s) space(bytes)
	259
	260	xdelta delta /tmp/1.0.0 /tmp/1.2.0 /tmp/patch: 1.5 695893
	261	gdiff -a --rcs /tmp/1.0.0 /tmp/1.2.0 /tmp/patch: 3.7 1524705
	262
	263	Even though both files were completely text, gdiff missed a lot of
	264	similar regions because a line had been partially modified. Xdelta
	265	performs better in both dimensions.
	266
	267	---
	268
	269	All the .c and .h files in GNU gcc 2.7.2 and 2.7.2.3 concatenated (files
	270	ordered lexigraphically). These files are highly similar:
	271
	272	15590954 /tmp/2.7.2
	273	15599644 /tmp/2.7.2.3
	274	time(s) space(bytes)
	275
	276	xdelta delta /tmp/2.7.2 /tmp/2.7.2.3 /tmp/patch: 7.3 11922
	277	gdiff -a --rcs /tmp/2.7.2 /tmp/2.7.2.3: 4.8 19477
	278
	279	Here xdelta takes a about 50% longer and produces a slightly smaller
	280	diff. I think the improved performance in the other two examples
	281	makes xdelta an all around winner.
	282
	283	---
	284
	285	A note on FrameMaker files, which I happen to know a fair bit about.
	286	Even making radical modifications to a Frame+SGML document leaves much
	287	of its content unchanged, due to the fact that the EDD and various
	288	catalogs do not change. When only minimal modifications are performed
	289
	290	Take a 24 page document, chapter 15 of the SGML developers guide for
	291	version 5.5. I replaced 38 occurances of "document" with
	292	"abcdefghijklmnop" and added the source of this README from here up at
	293	the end, making it 33 pages.
	294
	295	250880 graphics.before.fm
	296	272384 graphics.after.fm
	297
	298	The document grew by about 22 kbytes. The patch 32780 bytes. The
	299	same diff generated by GNU diff is 101097 bytes. Clearly, Frame
	300	documents are very good for binary version control.
	301
	302	---
	303
	304	I also have evidence that the program works well on uncompressed image
	305	formats such as the GIMP's XCF file format.
	306
	307	This program will be the basis of PRCS version 2, which will support
	308	client/server distributed version control.
	309
	310	Please send any comments, questions, thoughts, or interest to me,
	311	jmacd@cs.berkeley.edu.