From: Stefan Beller <sbeller@google.com>
To: jamespharvey20@gmail.com
Cc: git <git@vger.kernel.org>
Subject: Re: Using --word-diff breaks --color-moved
Date: Wed, 31 Oct 2018 10:41:52 -0700
Message-ID: <CAGZ79kZ6LxRevLy2mZd1Ag=oO_NtDdmRSuadswR_n=RGpO=rGQ@mail.gmail.com>
In-Reply-To: <CA+X5Wn76N34oBhRZvXKOwP0L_pF=LYbT6ugTgtPYSvnHg=MZVw@mail.gmail.com>
On Tue, Oct 30, 2018 at 7:06 PM james harvey <jamespharvey20@gmail.com> wrote:
>
> If you use both "--word-diff" and "--color-moved", regardless of the
> order of arguments, "--word-diff" takes precedence and "--color-moved"
> isn't allowed to do anything.
The order of arguments doesn't matter here, as these just set internal
flags at parse time, which determine what later stages do.
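A minimal sketch of why the order is irrelevant, using Python's argparse
as a stand-in for Git's own option parser. The function name
parse_diff_flags is hypothetical, and both options are simplified to plain
booleans here (the real ones also accept an optional =<mode> value):

```python
import argparse

def parse_diff_flags(argv):
    """Record which options were seen; nothing here depends on their order."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--word-diff", action="store_true")
    parser.add_argument("--color-moved", action="store_true")
    return parser.parse_args(argv)

first = parse_diff_flags(["--word-diff", "--color-moved"])
second = parse_diff_flags(["--color-moved", "--word-diff"])

# Both orders end up with the same pair of flags set.
print(first == second)
```

Whatever happens later is decided by looking at the stored flags, not at
the command line.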
Git uses the xdiff library internally for producing diffs[1].
To produce a diff, we have to feed two "streams of symbols"
to the library which then figures out the diff.
Usually a symbol is a whole line. Once we have the diff
we need to make it look nice again (i.e. put file names,
context markers and lines around the diff), which happens
in diff.c.
But when --word-diff is given, each line is broken up
into words and those are used as the symbols for finding
the diff[2]. See the function fn_out_consume() [3]:
'ecbdata->diff_words' is set when '--word-diff' is given.
When it is not set, we fall through to the switch statement that
calls emit_{add, del, context}_line(), which in turn
emits the lines.
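To illustrate the "streams of symbols" idea, here is a rough sketch that
feeds the same two texts to a diff engine twice, once with lines as
symbols and once with words. It uses Python's difflib in place of xdiff,
so it only models the concept, not Git's actual algorithm or its word
splitting rules. The helper diff_symbols and the sample texts are made up
for this example:

```python
import difflib
import re

def diff_symbols(old, new):
    """Return the changed regions between two sequences of symbols."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(tag, old[i1:i2], new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

old_text = "the quick brown fox\njumps over the dog\n"
new_text = "the quick red fox\njumps over the dog\n"

# Line mode: every symbol is a whole line.
line_changes = diff_symbols(old_text.splitlines(), new_text.splitlines())

# Word mode: every symbol is a run of non-whitespace characters
# (a simplification of what --word-diff does by default).
word_changes = diff_symbols(re.findall(r"\S+", old_text),
                            re.findall(r"\S+", new_text))

print(line_changes)
print(word_changes)
```

In line mode the whole first line is reported as replaced; in word mode
only "brown" and "red" differ. The diff engine is the same in both cases,
only the symbols change.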
The --color-moved step is performed after all diffing
(and nicing up) is already done, and it works solely on
the add/del lines. The word diff pieces its output lines
together itself, and those lines are completely ignored by
move detection.
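A toy model of that ordering, again with difflib standing in for xdiff.
The names added_and_deleted and detect_moves are hypothetical, and the
matching rule (an added line that is also a deleted line counts as moved)
is a deliberate simplification of what Git does:

```python
import difflib

def added_and_deleted(old_lines, new_lines):
    """Collect the lines a line-based diff would emit as '-' and '+'."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    deleted, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.extend(old_lines[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(new_lines[j1:j2])
    return deleted, added

def detect_moves(old_lines, new_lines, word_diff=False):
    """Return added lines that also appear among the deleted lines."""
    if word_diff:
        # Assumption mirroring the explanation above: in word-diff mode no
        # add/del line records are produced, so there is nothing to inspect.
        return []
    deleted, added = added_and_deleted(old_lines, new_lines)
    deleted_set = set(deleted)
    return [line for line in added if line in deleted_set]

old = ["int a;", "int b;", "return compute(a, b);"]
new = ["return compute(a, b);", "int a;", "int b;"]

print(detect_moves(old, new))
print(detect_moves(old, new, word_diff=True))
```

The second call comes back empty not because nothing moved, but because
the move detection never gets to see any lines.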
[1] see the xdiff/ dir in your copy of git. We have some
substantial changes compared to unmaintained upstream
http://www.xmailserver.org/xdiff-lib.html
http://www.xmailserver.org/xdiff.html
[2] https://github.com/git/git/blob/master/diff.c#L1872
[3] https://github.com/git/git/blob/master/diff.c#L2259
> I think "--color-moved" should have precedence over "--word-diff".
I agree, taking precedence to mean "work well together". Now we'd need
to figure out what that means. In its current form, the move
detection can detect moved lines across diff hunks or file
boundaries.
Should that also be the case for word diffing?
I think word diffing is mostly used for free text, which has different
properties than code, which is what --color-moved was originally
intended for.
For example, in code we often have lines with few characters,
such as "<TAB> }", which occurs often in Git's code base.
We added a heuristic so that such short, frequently occurring lines
are not detected as a moved block on their own [4]. I would expect
we'd have to figure out a similar heuristic for word diffing, if we
go down that route.
But that is a detail; we'd first have to figure out how to make the
words work with the move detection.
[4] https://github.com/git/git/commit/f0b8fb6e591b50b72b921f2c4cf120ebd284f510
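A sketch of what such a heuristic can look like. The threshold of 20
alphanumeric characters is taken from the git-diff documentation for
--color-moved block detection and is treated as an assumption here; the
function names are made up for this example and this is not the code
from [4]:

```python
MIN_ALNUM_COUNT = 20  # assumed threshold, see the note above

def alnum_count(lines):
    """Count the alphanumeric characters in a candidate block of lines."""
    return sum(ch.isalnum() for line in lines for ch in line)

def is_reportable_block(lines, min_alnum=MIN_ALNUM_COUNT):
    """A block is only worth coloring as moved if it carries enough content."""
    return alnum_count(lines) >= min_alnum

# A lone closing brace, or a few of them, carries no alphanumeric content.
print(is_reportable_block(["\t}"]))
print(is_reportable_block(["\t}", "\t}", "}"]))

# A single line with real content passes on its own.
print(is_reportable_block(["return compute_checksum(buffer, length);"]))
```

For word diffing, a block of a few moved words would rarely reach such a
threshold, so the cutoff (or the unit being counted) would probably need
to be rethought.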
> I
> cannot think of a scenario where a user would supply both options, and
> actually want "--word-diff" to take precedence. If I'm not thinking
> of a scenario where this wouldn't be desired, perhaps whichever is
> first as an argument could take precedence.
Word diffing and move detection are completely orthogonal at the moment.
Instead of option order, I'd rather introduce a new option that tells us
how to resolve some corner case. Or in the short term we might just
want to raise an error?
> (The same behavior happens if 4+ lines are moved and
> "--color-moved{default=zebra}" is used, but below
> "--color-moved=plain" is used to be a smaller testcase.)
>
> [...]
This sounds like you are asking for two things:
(1) make color-moved work with words (somehow)
(2) allow the user to fine tune the heuristics for a block,
such that default=zebra would still work.