git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Davide Libenzi <davidel@xmailserver.org>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Junio C Hamano <junkio@cox.net>, Git Mailing List <git@vger.kernel.org>
Subject: Re: Use a *real* built-in diff generator
Date: Sat, 25 Mar 2006 01:03:38 -0800 (PST)	[thread overview]
Message-ID: <Pine.LNX.4.64.0603250025550.1704@alien.or.mcafeemobile.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0603241938510.15714@g5.osdl.org>

On Fri, 24 Mar 2006, Linus Torvalds wrote:

> - the libxdiff algorithm is different, and I bet GNU diff has gotten a
>   lot more testing. And the thing is, generating a diff is not an exact
>   science - you can get two different diffs (and you will), and they can
>   both be perfectly valid. So it's not possible to "validate" the
>   libxdiff output by just comparing it against GNU diff.

Correct, the diff(A, B) is not unique. If you look inside the test 
directory, there's an xregression binary that does:

1) Random generate A
2) Create B by random changing A
3) Create D=A-B
4) Verify that B+D==A and A-D==B (using the library patch function)

It does and repeat this operation continuosly, for both text (using text 
diff/patch) and binary (using binary diff/patch). It ran several days 
w/out finding errors, so I've a good confidence about it.



> - GNU diff does some nice eye-candy, like trying to figure out what the
>   last function was, and adding that information to the "@@ .." line.
>   libxdiff doesn't do that.

This, I don't think is a natural part of a generic text/binary diff/patch 
library. If you feel it is important, you could post-process the diff, but 
IMO is kinda bogus.



> - The libxdiff thing has some known deficiencies. In particular, it gets
>   the "\No newline at end of file" case wrong. So this is currently for
>   the experimental branch only. I hope Davide will help fix it.

This, need fix. At the moment, in my projects I enforce the final EOL if 
missing (look inside the file-load function inside the test directory).



> Technical note: this is based on libxdiff-0.17, but I did some surgery to
> get rid of the extraneous fat - stuff that git doesn't need, and seriously
> cutting down on mmfile_t, which had much more capabilities than the diff
> algorithm either needed or used. In this version, "mmfile_t" is just a
> trivial <pointer,length> tuple.
>
> That said, I tried to keep the differences to simple removals, so that you
> can do a diff between this and the libxdiff origin, and you'll basically
> see just things getting deleted. Even the mmfile_t simplifications are
> left in a state where the diffs should be readable.

Here you have two options. Either you suck in the libxdiff code and change 
it to drop/change the stuff you don't want (the whole libxdiff library 
compiled with -O2 is 33KB though). Or you use the library as is, like 
you'd use libz & co. Once you have your own load-mmfile, you can pretty 
much feed libxdiff as is. Not my choice though, so pick the one you think 
best for your project.
I see you use XDF_NEED_MINIMAL. You might want to do some experiments with 
and without, to see how diff size changes, versus time.



> Apologies to Davide, whom I'd love to get feedback on this all from (I
> wrote my own "fill_mmfile()" for the new simpler mmfile_t format: the old

If you look inside the test directory, I use a similar function. The 
reason of the mmfile born for a use I made of the library inside an 
embedded device where there was no guarantee of contiguos memory, and dat 
could have been generated in chunks. OTOH an mmfile with a single block is 
a perfectly valid mmfile ;)



PS: Another solution you have is to libify GNU diff by creating a
     diff_main() & co., usual libification wrapping. You'd need to change
     the exit() that diff throws with a setjmp/longjmp, and make it call
     you own mem alloc/free functions, in order to free up memory diff does
     not clear on return. I did it once, not many changes. This solution
     will give you all the GNU diff crud, like function names, etc...



- Davide

  parent reply	other threads:[~2006-03-25  9:04 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-03-25  4:13 Use a *real* built-in diff generator Linus Torvalds
2006-03-25  6:54 ` Junio C Hamano
2006-03-25  7:39   ` Junio C Hamano
2006-03-25 17:28   ` Linus Torvalds
2006-03-25  9:03 ` Davide Libenzi [this message]
2006-03-25  9:35 ` Marco Costalba
2006-03-25 12:56 ` Alex Riesen
2006-03-25 16:09   ` Linus Torvalds
2006-03-25 13:44 ` Morten Welinder
2006-03-25 15:36   ` Linus Torvalds
2006-03-25 15:56     ` Linus Torvalds
2006-03-25 18:14       ` Davide Libenzi
2006-03-25 18:39         ` Linus Torvalds
2006-03-26  4:11           ` Davide Libenzi
2006-03-26 11:09           ` Ralf Baechle
2006-03-26 18:20             ` Petr Baudis
2006-03-25 18:48         ` Linus Torvalds
2006-03-26  5:33           ` Davide Libenzi
2006-03-25 19:52       ` Junio C Hamano
2006-03-25 20:17       ` Junio C Hamano
2006-03-25 20:48         ` Linus Torvalds
2006-03-25 21:06         ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0603250025550.1704@alien.or.mcafeemobile.com \
    --to=davidel@xmailserver.org \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).