From: Junio C Hamano <gitster@pobox.com>
To: "Yee Cheng Chin via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Phillip Wood <phillip.wood123@gmail.com>,
Yee Cheng Chin <ychin.git@gmail.com>
Subject: Re: [PATCH v2] xdiff: re-diff shifted change groups when using histogram algorithm
Date: Fri, 13 Mar 2026 00:07:36 -0700 [thread overview]
Message-ID: <xmqqikb08ax3.fsf@gitster.g> (raw)
In-Reply-To: <pull.2120.v2.git.git.1772463265865.gitgitgadget@gmail.com> (Yee Cheng Chin via GitGitGadget's message of "Mon, 02 Mar 2026 14:54:25 +0000")
"Yee Cheng Chin via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Yee Cheng Chin <ychin.git@gmail.com>
>
> After a diff algorithm has been run, the compaction phase
> (xdl_change_compact()) shifts and merges change groups to produce a
> cleaner output. However, this shifting could create a new matched group
> where both sides now have matching lines. This results in a
> wrong-looking diff output which contains redundant lines that are the
> same on both files.
>
> Fix this by detecting this situation, and re-diff the texts on each side
> to find similar lines, using the fall-back Myer's diff. Only do this for
> histogram diff as it's the only algorithm where this is relevant. Below
> contains an example, and more details.
> ...
> This issue is rare in a normal repository. Below is a table of
> repositories (`git log --no-merges -p --histogram -1000`), showing how
> many times a re-diff was done and how many times it resulted in finding
> matching lines (therefore addressing this issue) with the fix. In
> general it is fewer than 1% of diff's that exhibit this offending
> behavior:
>
> | Repo (1k commits) | Re-diff | Found matching lines |
> |--------------------|---------|----------------------|
> | llvm-project | 45 | 11 |
> | vim | 110 | 9 |
> | git | 18 | 2 |
> | WebKit | 168 | 1 |
> | ripgrep | 22 | 1 |
> | cpython | 32 | 0 |
> | vscode | 13 | 0 |
>
> Signed-off-by: Yee Cheng Chin <ychin.git@gmail.com>
> ---
Thanks for the updated patch, and sorry for nobody responding to the
patch for over a week.
The detailed explanation of the issue and the inclusion of the
repository analysis results are very helpful; they clearly show that
while this is a rare edge case, it significantly improves the
quality of histogram diffs when it does occur.
- The removal of go_orig is correct since g and go are kept in sync
throughout the slide loops.
- Clearing the algorithm mask while preserving other flags ensures that
user-provided options like --ignore-all-space are correctly applied
during the re-diff.
- While ignore_regex and anchors are not passed to the sub-diff, they
aren't currently available to xdl_change_compact anyway. Given that
compaction happens before regex filtering in the main pipeline, this
is OK, I guess.
Let me mark the topic for 'next'.
next prev parent reply other threads:[~2026-03-13 7:07 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-06 20:51 [PATCH] xdiff: re-diff shifted change groups when using histogram algorithm Yee Cheng Chin via GitGitGadget
2026-01-21 20:51 ` Junio C Hamano
2026-01-24 10:54 ` Phillip Wood
2026-01-25 17:34 ` Junio C Hamano
2026-01-26 9:37 ` Phillip Wood
2026-01-26 17:32 ` Junio C Hamano
2026-01-29 16:53 ` Yee Cheng Chin
2026-01-29 20:58 ` Junio C Hamano
2026-01-30 1:58 ` Yee Cheng Chin
2026-01-30 5:43 ` Junio C Hamano
2026-01-30 16:06 ` Phillip Wood
2026-02-20 23:07 ` Junio C Hamano
2026-02-21 9:56 ` Yee Cheng Chin
2026-03-02 14:54 ` [PATCH v2] " Yee Cheng Chin via GitGitGadget
2026-03-13 7:07 ` Junio C Hamano [this message]
2026-03-13 10:23 ` Phillip Wood
2026-03-19 23:30 ` Yee Cheng Chin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xmqqikb08ax3.fsf@gitster.g \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=phillip.wood123@gmail.com \
--cc=ychin.git@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.