All of lore.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Yee Cheng Chin via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org,  Phillip Wood <phillip.wood123@gmail.com>,
	 Yee Cheng Chin <ychin.git@gmail.com>
Subject: Re: [PATCH v2] xdiff: re-diff shifted change groups when using histogram algorithm
Date: Fri, 13 Mar 2026 00:07:36 -0700	[thread overview]
Message-ID: <xmqqikb08ax3.fsf@gitster.g> (raw)
In-Reply-To: <pull.2120.v2.git.git.1772463265865.gitgitgadget@gmail.com> (Yee Cheng Chin via GitGitGadget's message of "Mon, 02 Mar 2026 14:54:25 +0000")

"Yee Cheng Chin via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Yee Cheng Chin <ychin.git@gmail.com>
>
> After a diff algorithm has been run, the compaction phase
> (xdl_change_compact()) shifts and merges change groups to produce a
> cleaner output. However, this shifting could create a new matched group
> where both sides now have matching lines. This results in a
> wrong-looking diff output which contains redundant lines that are the
> same on both files.
>
> Fix this by detecting this situation, and re-diff the texts on each side
> to find similar lines, using the fall-back Myer's diff. Only do this for
> histogram diff as it's the only algorithm where this is relevant. Below
> contains an example, and more details.
> ...
> This issue is rare in a normal repository. Below is a table of
> repositories (`git log --no-merges -p --histogram -1000`), showing how
> many times a re-diff was done and how many times it resulted in finding
> matching lines (therefore addressing this issue) with the fix. In
> general it is fewer than 1% of diff's that exhibit this offending
> behavior:
>
> | Repo (1k commits)  | Re-diff | Found matching lines |
> |--------------------|---------|----------------------|
> | llvm-project       |  45     | 11                   |
> | vim                | 110     |  9                   |
> | git                |  18     |  2                   |
> | WebKit             | 168     |  1                   |
> | ripgrep            |  22     |  1                   |
> | cpython            |  32     |  0                   |
> | vscode             |  13     |  0                   |
>
> Signed-off-by: Yee Cheng Chin <ychin.git@gmail.com>
> ---

Thanks for the updated patch, and sorry for nobody responding to the
patch for over a week.

The detailed explanation of the issue and the inclusion of the
repository analysis results are very helpful; they clearly show that
while this is a rare edge case, it significantly improves the
quality of histogram diffs when it does occur.

 - The removal of go_orig is correct since g and go are kept in sync 
   throughout the slide loops.

 - Clearing the algorithm mask while preserving other flags ensures that 
   user-provided options like --ignore-all-space are correctly applied 
   during the re-diff.

 - While ignore_regex and anchors are not passed to the sub-diff, they 
   aren't currently available to xdl_change_compact anyway. Given that 
   compaction happens before regex filtering in the main pipeline, this
   is OK, I guess.

Let me mark the topic for 'next'.


  reply	other threads:[~2026-03-13  7:07 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-06 20:51 [PATCH] xdiff: re-diff shifted change groups when using histogram algorithm Yee Cheng Chin via GitGitGadget
2026-01-21 20:51 ` Junio C Hamano
2026-01-24 10:54   ` Phillip Wood
2026-01-25 17:34     ` Junio C Hamano
2026-01-26  9:37       ` Phillip Wood
2026-01-26 17:32         ` Junio C Hamano
2026-01-29 16:53           ` Yee Cheng Chin
2026-01-29 20:58             ` Junio C Hamano
2026-01-30  1:58               ` Yee Cheng Chin
2026-01-30  5:43                 ` Junio C Hamano
2026-01-30 16:06                 ` Phillip Wood
2026-02-20 23:07                 ` Junio C Hamano
2026-02-21  9:56                   ` Yee Cheng Chin
2026-03-02 14:54 ` [PATCH v2] " Yee Cheng Chin via GitGitGadget
2026-03-13  7:07   ` Junio C Hamano [this message]
2026-03-13 10:23     ` Phillip Wood
2026-03-19 23:30       ` Yee Cheng Chin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqikb08ax3.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=phillip.wood123@gmail.com \
    --cc=ychin.git@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.