From: Phillip Wood <phillip.wood123@gmail.com>
To: "René Scharfe" <l.s.r@web.de>,
"Alex via GitGitGadget" <gitgitgadget@gmail.com>,
git@vger.kernel.org
Cc: Alex <alexguo1023@gmail.com>, jinyaoguo <guo846@purdue.edu>
Subject: Re: [PATCH] Fix buffer underflow in xdl_build_script
Date: Sat, 24 May 2025 14:53:23 +0100 [thread overview]
Message-ID: <953dc362-f546-4f49-b580-b48a4a1370e8@gmail.com> (raw)
In-Reply-To: <2b9c5e91-67ab-4e46-93c4-15c8b79841be@gmail.com>
On 24/05/2025 14:38, Phillip Wood wrote:
> On 24/05/2025 10:08, René Scharfe wrote:
>> Am 24.05.25 um 07:57 schrieb René Scharfe:
>>> Am 23.05.25 um 22:51 schrieb Alex via GitGitGadget:
>>>> From: jinyaoguo <guo846@purdue.edu>
>>>>
>>>> The loop in xdl_build_script used `i1 >= 0 || i2 >= 0`, causing
>>>> `i1` (or `i2`) to reach 0 and then access `rchg1[i1-1]` (or
>>>> `rchg2[i2-1]`), which underflows the buffer.
>>>> This commit adds explicit `i1 > 0` and `i2 > 0` checks around
>>>> those array accesses to prevent invalid negative indexing.
>>>
>>> xdl_prepare_ctx() in xdiff/xprepare.c allocates an extra entry at both
>>> ends for rchg arrays, so an index of -1 should be within the bounds.
>
> and rchg[-1] == 0 so i1 and i2 can never drop below -1
>
>>> i1 and i2 are decreased in lockstep, though, so one of them can become
>>> smaller than -1 if nrec is different between the files. And that's how
>>> this code run can indeed run off into the weeds.
>>
>> Actually no, i1 can't seem to reach 0 without i2 also being 0 and vice
>> versa. Or can it? It makes sense that we reach the start of both
>> buffers at the same time if we walk backwards from the end, don't
>> misstep and have consistent rchg array contents, but I'm not sure.
> The code looks like
>
> for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0;
> i1--, i2--)
> if (rchg1[i1 - 1] || rchg2[i2 - 1]) {
> for (l1 = i1; rchg1[i1 - 1]; i1--);
> for (l2 = i2; rchg2[i2 - 1]; i2--);
>
> I think I've convinced myself that it is safe assuming there are an
> equal number of unchanged lines in rchg1 and rchg2 and rchg1[-1] ==
> rchg2[-1] == 0. Each iteration consumes any changed lines in the
> preimage and the postimage plus a single context line from each (apart
> from the final iteration when the context lines may have been
> exhausted). At the start of the last iteration there are three
> possibilities
>
> - i1 == -1 && i2 >= 0 => there are insertions as the start of the file.
Sorry that should be i1 == 0 && i2 >= 0
> As the context lines in the preimage have been exhausted all the
> remaining rchg2 elements represent added lines and are consumed by
> for (l2 = i2; rchg2[i2 - 1]; i2--) so at the end of the loop body
> i2 == 0 and the outer loop will exit.
>
> - i1 >= 0 && i2 == -1 => there are deletions at the start of the file.
This one should be i1 >= 0 && i2 == 0
> As the context lines in the postimage have been exhausted all the
> remaining rchg1 elements represent deleted lines and are consumed by
> for (l1 = i1; rchg1[i1 - 1]; i1--) so at the end of the loop body
> i1 == 0 and the outer loop will exit.
>
> - i1 >= 0 && i2 >= 0 => the first line is unchanged or there are
> insertions and deletions at the beginning of the file. At the end of
> the loop body i1 == 0 && i2 == 0 and the outer loop will exit.
>
> We could add
>
> if (i1 < -1 || i2 < -1)
> BUG("mismatched context line count");
>
> before "if (rchg1[i1 - 1] || rchg2[i2 - 1])" inside the loop if we're
> worried about bugs that break the assumption that there are equal
> numbers of context lines on each side. Any such bug would generate
> invalid diffs. I don't know how likely that is to happen in practice.
>
> Best Wishes
>
> Phillip
>
>>
>> Are you able to demonstrate any out-of-bounds access with e.g.,
>> Valgrind, AddressSanitizer or an assertion?
>>
>>> Curiously, AddressSanitizer doesn't report anything, but if I add the
>>> following line after the outer for, I can trigger it to report a
>>> heap-buffer-overflow with e.g., git show 8613c2bb6c:
>>>
>>> if (i1 < 0 || i2 < 0) fprintf(stderr, "Oops: %ld %ld\n", i1, i2);
>>
>> That's because I forgot to add braces. D'oh! I can't trigger any
>> out-of-bounds access or that Oops with them properly in place. So I
>> let myself get fooled by a daring coding style. :-|
>>
>>>
>>>>
>>>> Signed-off-by: Alex Guo <alexguo1023@gmail.com>
>>>> ---
>>>> Fix buffer underflow in xdl_build_script
>>>>
>>>> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-
>>>> git-1976%2Fmugitya03%2Fbuf-1-v1
>>>> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-
>>>> git-1976/mugitya03/buf-1-v1
>>>> Pull-Request: https://github.com/git/git/pull/1976
>>>>
>>>> xdiff/xdiffi.c | 7 ++++---
>>>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
>>>> index 5a96e36dfbe..2e983965328 100644
>>>> --- a/xdiff/xdiffi.c
>>>> +++ b/xdiff/xdiffi.c
>>>> @@ -951,9 +951,10 @@ int xdl_build_script(xdfenv_t *xe, xdchange_t
>>>> **xscr) {
>>>> * Trivial. Collects "groups" of changes and creates an edit
>>>> script.
>>
>> Trivial for Davide perhaps (libxdiff author), but not my mushy brain..
>>
>>>> */
>>>> for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >=
>>>> 0; i1--, i2--)
>>>
>>> Should the || be a && instead? From a birds-eye view I would assume we
>>> can stop scanning for changes when we exhaust (reach the top) of either
>>> side. We just have to make sure everything from the other side is
>>> accounted for in the last added change.
>>>
>>>> - if (rchg1[i1 - 1] || rchg2[i2 - 1]) {
>>>> - for (l1 = i1; rchg1[i1 - 1]; i1--);
>>>> - for (l2 = i2; rchg2[i2 - 1]; i2--);
>>>> + if ((i1 > 0 && rchg1[i1 - 1]) ||
>>>> + (i2 > 0 && rchg2[i2 - 1])) {
>>>> + for (l1 = i1; i1 > 0 && rchg1[i1 - 1]; i1--);
>>>> + for (l2 = i2; i2 > 0 && rchg2[i2 - 1]; i2--);
>>>
>>> Nit: The indentation of that line is off.
>>>
>>>> if (!(xch = xdl_add_change(cscr, i1, i2, l1 - i1, l2 -
>>>> i2))) {
>>>> xdl_free_script(cscr);
>>>>
>>>> base-commit: 8613c2bb6cd16ef530dc5dd74d3b818a1ccbf1c0
>
prev parent reply other threads:[~2025-05-24 13:53 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-23 20:51 [PATCH] Fix buffer underflow in xdl_build_script Alex via GitGitGadget
2025-05-24 5:57 ` René Scharfe
2025-05-24 9:08 ` René Scharfe
2025-05-24 13:38 ` Phillip Wood
2025-05-24 13:53 ` Phillip Wood [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=953dc362-f546-4f49-b580-b48a4a1370e8@gmail.com \
--to=phillip.wood123@gmail.com \
--cc=alexguo1023@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=guo846@purdue.edu \
--cc=l.s.r@web.de \
--cc=phillip.wood@dunelm.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).