From: Davide Libenzi <davidel@xmailserver.org>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Junio C Hamano <junkio@cox.net>, Git Mailing List <git@vger.kernel.org>
Subject: Re: Use a *real* built-in diff generator
Date: Sat, 25 Mar 2006 01:03:38 -0800 (PST) [thread overview]
Message-ID: <Pine.LNX.4.64.0603250025550.1704@alien.or.mcafeemobile.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0603241938510.15714@g5.osdl.org>
On Fri, 24 Mar 2006, Linus Torvalds wrote:
> - the libxdiff algorithm is different, and I bet GNU diff has gotten a
> lot more testing. And the thing is, generating a diff is not an exact
> science - you can get two different diffs (and you will), and they can
> both be perfectly valid. So it's not possible to "validate" the
> libxdiff output by just comparing it against GNU diff.
Correct, the diff(A, B) is not unique. If you look inside the test
directory, there's an xregression binary that does:
1) Random generate A
2) Create B by random changing A
3) Create D=A-B
4) Verify that B+D==A and A-D==B (using the library patch function)
It does and repeat this operation continuosly, for both text (using text
diff/patch) and binary (using binary diff/patch). It ran several days
w/out finding errors, so I've a good confidence about it.
> - GNU diff does some nice eye-candy, like trying to figure out what the
> last function was, and adding that information to the "@@ .." line.
> libxdiff doesn't do that.
This, I don't think is a natural part of a generic text/binary diff/patch
library. If you feel it is important, you could post-process the diff, but
IMO is kinda bogus.
> - The libxdiff thing has some known deficiencies. In particular, it gets
> the "\No newline at end of file" case wrong. So this is currently for
> the experimental branch only. I hope Davide will help fix it.
This, need fix. At the moment, in my projects I enforce the final EOL if
missing (look inside the file-load function inside the test directory).
> Technical note: this is based on libxdiff-0.17, but I did some surgery to
> get rid of the extraneous fat - stuff that git doesn't need, and seriously
> cutting down on mmfile_t, which had much more capabilities than the diff
> algorithm either needed or used. In this version, "mmfile_t" is just a
> trivial <pointer,length> tuple.
>
> That said, I tried to keep the differences to simple removals, so that you
> can do a diff between this and the libxdiff origin, and you'll basically
> see just things getting deleted. Even the mmfile_t simplifications are
> left in a state where the diffs should be readable.
Here you have two options. Either you suck in the libxdiff code and change
it to drop/change the stuff you don't want (the whole libxdiff library
compiled with -O2 is 33KB though). Or you use the library as is, like
you'd use libz & co. Once you have your own load-mmfile, you can pretty
much feed libxdiff as is. Not my choice though, so pick the one you think
best for your project.
I see you use XDF_NEED_MINIMAL. You might want to do some experiments with
and without, to see how diff size changes, versus time.
> Apologies to Davide, whom I'd love to get feedback on this all from (I
> wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old
If you look inside the test directory, I use a similar function. The
reason of the mmfile born for a use I made of the library inside an
embedded device where there was no guarantee of contiguos memory, and dat
could have been generated in chunks. OTOH an mmfile with a single block is
a perfectly valid mmfile ;)
PS: Another solution you have is to libify GNU diff by creating a
diff_main() & co., usual libification wrapping. You'd need to change
the exit() that diff throws with a setjmp/longjmp, and make it call
you own mem alloc/free functions, in order to free up memory diff does
not clear on return. I did it once, not many changes. This solution
will give you all the GNU diff crud, like function names, etc...
- Davide
next prev parent reply other threads:[~2006-03-25 9:04 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-03-25 4:13 Use a *real* built-in diff generator Linus Torvalds
2006-03-25 6:54 ` Junio C Hamano
2006-03-25 7:39 ` Junio C Hamano
2006-03-25 17:28 ` Linus Torvalds
2006-03-25 9:03 ` Davide Libenzi [this message]
2006-03-25 9:35 ` Marco Costalba
2006-03-25 12:56 ` Alex Riesen
2006-03-25 16:09 ` Linus Torvalds
2006-03-25 13:44 ` Morten Welinder
2006-03-25 15:36 ` Linus Torvalds
2006-03-25 15:56 ` Linus Torvalds
2006-03-25 18:14 ` Davide Libenzi
2006-03-25 18:39 ` Linus Torvalds
2006-03-26 4:11 ` Davide Libenzi
2006-03-26 11:09 ` Ralf Baechle
2006-03-26 18:20 ` Petr Baudis
2006-03-25 18:48 ` Linus Torvalds
2006-03-26 5:33 ` Davide Libenzi
2006-03-25 19:52 ` Junio C Hamano
2006-03-25 20:17 ` Junio C Hamano
2006-03-25 20:48 ` Linus Torvalds
2006-03-25 21:06 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Pine.LNX.4.64.0603250025550.1704@alien.or.mcafeemobile.com \
--to=davidel@xmailserver.org \
--cc=git@vger.kernel.org \
--cc=junkio@cox.net \
--cc=torvalds@osdl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).