From: Phillip Wood <phillip.wood123@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Yee Cheng Chin via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org, Yee Cheng Chin <ychin.git@gmail.com>
Subject: Re: [PATCH] xdiff: re-diff shifted change groups when using histogram algorithm
Date: Mon, 26 Jan 2026 09:37:03 +0000 [thread overview]
Message-ID: <3aeb49dd-8618-42e0-b9f9-6a4fb8065793@gmail.com> (raw)
In-Reply-To: <xmqqy0llk33y.fsf@gitster.g>
On 25/01/2026 17:34, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
>
>> On 21/01/2026 20:51, Junio C Hamano wrote:
>>> "Yee Cheng Chin via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>>
>>>> @@ -915,6 +919,45 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>>>> }
>>>> }
>>>>
>>>> + /*
>>>> + * If this has a matching group from the other file, it could
>>>> + * either be the original match from the diff algorithm, or
>>>> + * arrived at by shifting and joining groups. When it's the
>>>> + * latter, it's possible for the two newly joined sides to have
>>>> + * matching lines. Re-diff the group to mark these matching
>>>> + * lines as unchanged and remove from the diff output.
>
> Also, after reading the first paragraph of the big comment again, it
> makes me wonder if it is saying the same thing as "When histogram is
> being used, we shouldn't bother shifting up and down to join groups,
> as the result will always worse than the fallback", but is it that
> bad?
Looking at the example in the commit message the result of shifting up
and down and then calling the fallback is better than either the
unshifted diff or shifting without the fallback, so I don't think just
disabling shifting improves things. It would also stop us coalescing
changed lines, for example
-A A
A -> -A
-B -B
The indent heuristic seems to assume that we've shifted down as far as
possible before trying it so that would probably get messed up as well.
To me the problem is that the histogram diff does not always generate
particularly good diffs (maybe I'm biased - whenever I've tried
switching the default to "histogram" I've always switched back
"patience" fairly quickly after being presented with a diff that I found
hard to comprehend)
Thanks
Phillip
>>>> + if (end_matching_other != -1 &&
>>>> + XDF_DIFF_ALG(flags) == XDF_HISTOGRAM_DIFF &&
>>>> + (g.start != g_orig.start ||
>>>> + g.end != g_orig.end ||
>>>> + go.start != go_orig.start ||
>>>> + go.end != go_orig.end)) {
>>>
>>> So the idea is to remember the original values in g and go (the
>>> location of the group in the file and the other file) and if
>>> shifting up and down changed any one of the four ends from the
>>> original locations, we always take the fall-back route (if we are
>>> doing histogram)?
>>
>> I'm a bit confused why we need to check both groups. I think they're
>> supposed to move together (if we move "g" by n context lines we also
>> move "go" by n context lines) so I can't see how we can have
>>
>> g.start == g_orig.start && g.end == g_orig.end
>>
>> when
>>
>> go.start != go.orig.start || go.end != go_orig.end
>
> Interesting.
>
>>> By the way, this appears after the if/else if/ cascade that has:
>>>
>>> if (g.end == earliest_end) {
>>> ... do nothing case (case #1)
>>> } else if (end_matching_other != -1) {
>>> ... do the slide-up thing (case #2)
>>> } else if (flags & XDF_INDENT_HEIRISTIC) {
>>> ... do the indent heuristic thing (case #3)
>>> }
>>>
>>> Am I reading the code correctly that, even though this new block
>>> appears as if it is a post-clean-up phase that is independent from
>>> which one of the three choices are taken in the previous if/elseif
>>> cascade, it only is relevant to the second case? I am wondering if
>>> it would make it easier to follow if the new code were made into a
>>> small helper function that is called from the (case #2) arm of the
>>> existing if/else if cascade.
>>
>> That's a good point
>>
>>>> + xpparam_t xpp;
>>>> + xdfenv_t xe;
>>>> +
>>>> + memset(&xpp, 0, sizeof(xpp));
>>>> + xpp.flags = flags & ~XDF_DIFF_ALGORITHM_MASK;
>>>> +
>>>> + memcpy(&xe.xdf1, xdf, sizeof(xdfile_t));
>>>> + memcpy(&xe.xdf2, xdfo, sizeof(xdfile_t));
>>
>> These would be safer as "xe.xdf1 = *xdf" so we don't have to worry about
>> getting the size correct (sizeof(*xdf) would also be safer but there is
>> no need for memcpy() here).
>
> Very good readability enhancement suggestion.
>
> Thanks.
next prev parent reply other threads:[~2026-01-26 9:37 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-12-06 20:51 [PATCH] xdiff: re-diff shifted change groups when using histogram algorithm Yee Cheng Chin via GitGitGadget
2026-01-21 20:51 ` Junio C Hamano
2026-01-24 10:54 ` Phillip Wood
2026-01-25 17:34 ` Junio C Hamano
2026-01-26 9:37 ` Phillip Wood [this message]
2026-01-26 17:32 ` Junio C Hamano
2026-01-29 16:53 ` Yee Cheng Chin
2026-01-29 20:58 ` Junio C Hamano
2026-01-30 1:58 ` Yee Cheng Chin
2026-01-30 5:43 ` Junio C Hamano
2026-01-30 16:06 ` Phillip Wood
2026-02-20 23:07 ` Junio C Hamano
2026-02-21 9:56 ` Yee Cheng Chin
2026-03-02 14:54 ` [PATCH v2] " Yee Cheng Chin via GitGitGadget
2026-03-13 7:07 ` Junio C Hamano
2026-03-13 10:23 ` Phillip Wood
2026-03-19 23:30 ` Yee Cheng Chin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3aeb49dd-8618-42e0-b9f9-6a4fb8065793@gmail.com \
--to=phillip.wood123@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=phillip.wood@dunelm.org.uk \
--cc=ychin.git@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox