From: "Shawn O. Pearce" <spearce@spearce.org>
To: Steven Grimm <koreth@midwinter.com>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH] Ignore end-of-line style when computing similarity score for rename detection
Date: Thu, 28 Jun 2007 02:18:21 -0400
Message-ID: <20070628061821.GM32223@spearce.org>
In-Reply-To: <20070628060416.GA13162@midwinter.com>

Steven Grimm <koreth@midwinter.com> wrote:
> Junio rightly points out that it would be a mistake to discard \r
> characters from binary files when computing similarity scores. So now we
> only do it if the file contents test as non-binary.
>
> The file attributes aren't available at this level of the code, but they
> could be propagated down from the higher levels if we don't trust
> buffer_is_binary() to make an adequately accurate decision.

Ick.  If we can get the attributes into diff_filespec this becomes
pretty easy: you can do a crlf->lf conversion on both files when
both are considered to be text.  But it doesn't look like it would
be very easy to get the attributes into the diff_filespec.
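
To make the crlf->lf idea concrete, here is a minimal sketch in C.  It
is not git's code: looks_binary(), both_text() and crlf_to_lf() are
hypothetical helpers invented for illustration, and the NUL-byte test
over the first 8000 bytes is only an assumed stand-in for whatever
buffer_is_binary() or the attributes would decide.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a binary check (an assumption, not git's
 * buffer_is_binary()): call a buffer binary if a NUL byte shows up in
 * its first 8000 bytes.
 */
static int looks_binary(const char *buf, size_t len)
{
	size_t scan = len < 8000 ? len : 8000;

	return memchr(buf, '\0', scan) != NULL;
}

/* Normalization is only considered when both sides look like text. */
static int both_text(const char *a, size_t alen,
		     const char *b, size_t blen)
{
	return !looks_binary(a, alen) && !looks_binary(b, blen);
}

/*
 * Copy src to dst, dropping every '\r' that is immediately followed
 * by '\n'.  A lone '\r' is kept.  dst must have room for len bytes.
 * Returns the number of bytes written.
 */
static size_t crlf_to_lf(const char *src, size_t len, char *dst)
{
	size_t i, out = 0;

	for (i = 0; i < len; i++) {
		if (src[i] == '\r' && i + 1 < len && src[i + 1] == '\n')
			continue;
		dst[out++] = src[i];
	}
	return out;
}
```

A caller would check both_text() first, run crlf_to_lf() on each side
into a scratch buffer, and score the two normalized buffers instead of
the raw ones; binary content is left untouched.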

Actually, it would be even better if you could also run the in/out
filters.  I'm thinking of, say, an XML file that has had whitespace
formatting changes, but whose XSD and processors ignore unnecessary
whitespace.  It would be nice if rename detection were able to
canonicalize both files before detecting the rename, assuming both
files had a canonicalizing input filter defined that does that...

Of course diff.c defines a nice diff_is_binary() at file scope that
makes at least a "can we diff this" decision.  It might be good if
that could be reused for rename detection.

OK, that's far more than I actually know about diffcore.  This is
one for Junio, Linus, you, and those who are less tired than I feel
right now... ;-)

Personally I'd rather see us doing the right thing (use attributes
and fall back on guessing if no preference is stated either way)
over doing something half-a**ed (only guessing).
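
The "attributes first, guess only as a fallback" policy could be
sketched roughly as below.  Everything here is hypothetical: enum
text_pref, guess_is_text() and treat_as_text() are names made up for
this sketch, and the NUL-byte guess is an assumption standing in for
the real content heuristic.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tri-state for what the attributes said about a path. */
enum text_pref {
	PREF_UNSET,	/* no preference stated either way */
	PREF_TEXT,	/* attributes say: treat as text */
	PREF_BINARY	/* attributes say: treat as binary */
};

/*
 * Assumed content guess: text if no NUL byte appears in the first
 * 8000 bytes.  Only a stand-in for the real heuristic.
 */
static int guess_is_text(const char *buf, size_t len)
{
	size_t scan = len < 8000 ? len : 8000;

	return memchr(buf, '\0', scan) == NULL;
}

/* An explicit preference always wins; guessing is the fallback. */
static int treat_as_text(enum text_pref pref, const char *buf, size_t len)
{
	switch (pref) {
	case PREF_TEXT:
		return 1;
	case PREF_BINARY:
		return 0;
	default:
		return guess_is_text(buf, len);
	}
}
```

With this shape the content is only inspected when nothing was stated,
so a file explicitly marked as text is normalized even if it happens
to contain a stray NUL, and a file marked as binary is never touched.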
--
Shawn.