From: Junio C Hamano <gitster@pobox.com>
To: "Ezekiel Newren via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, "Yee Cheng Chin" <ychin.git@gmail.com>,
"Phillip Wood" <phillip.wood123@gmail.com>,
"René Scharfe" <l.s.r@web.de>, "Jeff King" <peff@peff.net>,
"D. Ben Knoble" <ben.knoble@gmail.com>,
"Ezekiel Newren" <ezekielnewren@gmail.com>
Subject: Re: [PATCH v3 4/6] xdiff/xdl_cleanup_records: make limits more clear
Date: Fri, 27 Mar 2026 16:01:02 -0700
Message-ID: <xmqqcy0oj2s1.fsf@gitster.g>
In-Reply-To: <xmqqy0jdhtd0.fsf@gitster.g> (Junio C. Hamano's message of "Fri, 27 Mar 2026 14:09:47 -0700")
Junio C Hamano <gitster@pobox.com> writes:
> "Ezekiel Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>> From: Ezekiel Newren <ezekielnewren@gmail.com>
>>
>> Make the handling of per-file limits and the minimal-case clearer.
>> * Use explicit per-file limit variables (mlim1, mlim2) and initialize
>> them.
>> * The additional condition `!need_min` is redudant now, remove it.
>> Best viewed with --color-words.
>>
>> Signed-off-by: Ezekiel Newren <ezekielnewren@gmail.com>
>> ---
>> xdiff/xprepare.c | 19 ++++++++++++-------
>> 1 file changed, 12 insertions(+), 7 deletions(-)
>
> t4071 and t8015 do not like this step, even though they are happy
> with 1-3/6 applied.
>
>
>> diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
>> index 386668a92d..2cf1f8d1a8 100644
>> --- a/xdiff/xprepare.c
>> +++ b/xdiff/xprepare.c
>> @@ -268,7 +268,7 @@ static bool xdl_clean_mmatch(uint8_t const *action, ptrdiff_t i, ptrdiff_t s, pt
>> * might be potentially discarded if they appear in a run of discardable.
>> */
>> static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xdf2) {
>> - ptrdiff_t i, nm, mlim;
>> + ptrdiff_t i, nm, mlim1, mlim2;

Ah, the problem may manifest itself in this step of the series, but
the root cause might lie before this step. ptrdiff_t is signed, and
that is the type used for mlim/mlim1/mlim2 here; before this series
these counters were of type "long", which is also signed.

>> + if (need_min) {
>> + /* i.e. infinity */
>> + mlim1 = SIZE_MAX;
>> + mlim2 = SIZE_MAX;

But SIZE_MAX is the maximum value that a size_t (unsigned) can take.
No wonder that assigning it to a ptrdiff_t, and assuming that no other
sensible ptrdiff_t value can ever reach it, does not work. Instead,
this essentially assigns -1 to mlim1 and mlim2 when need_min is true.

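A minimal sketch of that conversion, for illustration only. The
helper names limit_from_size_max() and reaches_limit() are made up
here and are not part of xdiff. Converting an out-of-range value to a
signed type is implementation-defined in C; the -1 result assumes the
usual platforms where ptrdiff_t and size_t have the same width.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: mimics "mlim1 = SIZE_MAX" from the patch. */
static ptrdiff_t limit_from_size_max(void)
{
	/*
	 * SIZE_MAX does not fit in ptrdiff_t.  The result of this
	 * conversion is implementation-defined; on common platforms
	 * it wraps around to -1.
	 */
	ptrdiff_t mlim = (ptrdiff_t)SIZE_MAX;
	return mlim;
}

/* Hypothetical helper: the comparison the patch performs on nm. */
static int reaches_limit(ptrdiff_t nm, ptrdiff_t mlim)
{
	return nm >= mlim;
}
```

With these, reaches_limit(1, limit_from_size_max()) is true on such
platforms, while reaches_limit(1, PTRDIFF_MAX) is false.
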
>> + } else {
>> + mlim1 = XDL_MIN(xdl_bogosqrt(xdf1->nrec), XDL_MAX_EQLIMIT);
>> + mlim2 = XDL_MIN(xdl_bogosqrt(xdf2->nrec), XDL_MAX_EQLIMIT);

I do not think this side has much to do with the breakage, but given
the way XDL_MIN() is implemented, it must be noted that xdl_bogosqrt()
can be called twice on the same value with this rewrite ...

>> + }
>> +
>> /*
>> * Initialize temporary arrays with DISCARD, KEEP, or INVESTIGATE.
>> */
>> - if ((mlim = (long)xdl_bogosqrt((uint64_t)xdf1->nrec)) > XDL_MAX_EQLIMIT)
>> - mlim = XDL_MAX_EQLIMIT;
... as opposed to computing the value only once, in the original.
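
To illustrate the double evaluation, here is a small sketch. It
assumes XDL_MIN() is the plain ternary macro shown below, and it uses
a made-up counted_sqrt() as a stand-in for xdl_bogosqrt() (the real
function computes its result differently). Note that the argument is
evaluated a second time only when it is the smaller of the two.

```c
#include <stdint.h>

/* Assumed to have the same shape as the macro in xdiff/xmacros.h. */
#define XDL_MIN(a, b) ((a) < (b) ? (a) : (b))

static int calls; /* number of times counted_sqrt() has run */

/* Hypothetical stand-in for xdl_bogosqrt(); counts its invocations. */
static uint64_t counted_sqrt(uint64_t n)
{
	uint64_t r = 0;

	calls++;
	while ((r + 1) * (r + 1) <= n)
		r++;
	return r;
}

/* Passing the call straight to the macro: evaluated up to twice. */
static uint64_t limit_with_macro(uint64_t nrec, uint64_t cap)
{
	return XDL_MIN(counted_sqrt(nrec), cap);
}

/* Storing the result first: evaluated exactly once. */
static uint64_t limit_with_temp(uint64_t nrec, uint64_t cap)
{
	uint64_t v = counted_sqrt(nrec);

	return XDL_MIN(v, cap);
}
```

Both functions return the same value; only the number of calls to
counted_sqrt() differs.
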
>> for (i = xdf1->dstart; i <= xdf1->dend; i++) {
>> size_t mph1 = xdf1->recs[i].minimal_perfect_hash;
>> rcrec = cf->rcrecs[mph1];
>> nm = rcrec ? rcrec->len2 : 0;
>> - action1[i] = (nm == 0) ? DISCARD: (nm >= mlim && !need_min) ? INVESTIGATE: KEEP;

So the original said: if nm is not zero and need_min is true, do not
bother comparing nm with anything, and always use KEEP. If need_min
is false, use INVESTIGATE only when nm is large enough, otherwise
KEEP.

>> + action1[i] = (nm == 0) ? DISCARD: nm >= mlim1 ? INVESTIGATE: KEEP;

The updated code, when nm is not zero, does something different. If
need_min is true, mlim1 is set to -1, and presumably nm is a count or
length that is bounded below by 0, so it is larger than mlim1 (== -1),
and we always take INVESTIGATE and never KEEP.

So the rewritten code is broken when need_min is true?

I suspect the remainder of the patch is broken in exactly the same
way, so the remedy would be similar?

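One possible remedy, sketched under assumptions and not necessarily
what the series should end up doing, is to use the largest ptrdiff_t
value as the "infinity". The enum values and the function names
classify_original(), classify_rewritten() and pick_limit() are
illustrative only; the two expressions are copied from the diff.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Names are from the patch; the numeric values are illustrative. */
enum action { DISCARD, KEEP, INVESTIGATE };

/* The pre-patch expression, with need_min checked explicitly. */
static enum action classify_original(ptrdiff_t nm, ptrdiff_t mlim,
				     bool need_min)
{
	return (nm == 0) ? DISCARD :
	       (nm >= mlim && !need_min) ? INVESTIGATE : KEEP;
}

/* The post-patch expression, which relies on mlim alone. */
static enum action classify_rewritten(ptrdiff_t nm, ptrdiff_t mlim)
{
	return (nm == 0) ? DISCARD : nm >= mlim ? INVESTIGATE : KEEP;
}

/*
 * Hypothetical limit selection: PTRDIFF_MAX is representable in
 * ptrdiff_t, so no realistic count nm can reach it.
 */
static ptrdiff_t pick_limit(bool need_min, ptrdiff_t computed)
{
	return need_min ? PTRDIFF_MAX : computed;
}
```

With pick_limit(), classify_rewritten() agrees with
classify_original() for every nm below PTRDIFF_MAX; a count equal to
PTRDIFF_MAX itself would still yield INVESTIGATE, which should not be
reachable in practice.
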
>> }
>>
>> - if ((mlim = (long)xdl_bogosqrt((uint64_t)xdf2->nrec)) > XDL_MAX_EQLIMIT)
>> - mlim = XDL_MAX_EQLIMIT;
>> for (i = xdf2->dstart; i <= xdf2->dend; i++) {
>> size_t mph2 = xdf2->recs[i].minimal_perfect_hash;
>> rcrec = cf->rcrecs[mph2];
>> nm = rcrec ? rcrec->len1 : 0;
>> - action2[i] = (nm == 0) ? DISCARD: (nm >= mlim && !need_min) ? INVESTIGATE: KEEP;
>> + action2[i] = (nm == 0) ? DISCARD: nm >= mlim2 ? INVESTIGATE: KEEP;
>> }
>>
>> /*