git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Phillip Wood <phillip.wood123@gmail.com>
To: "René Scharfe" <l.s.r@web.de>,
	"Alex via GitGitGadget" <gitgitgadget@gmail.com>,
	git@vger.kernel.org
Cc: Alex <alexguo1023@gmail.com>, jinyaoguo <guo846@purdue.edu>
Subject: Re: [PATCH] Fix buffer underflow in xdl_build_script
Date: Sat, 24 May 2025 14:53:23 +0100	[thread overview]
Message-ID: <953dc362-f546-4f49-b580-b48a4a1370e8@gmail.com> (raw)
In-Reply-To: <2b9c5e91-67ab-4e46-93c4-15c8b79841be@gmail.com>

On 24/05/2025 14:38, Phillip Wood wrote:
> On 24/05/2025 10:08, René Scharfe wrote:
>> Am 24.05.25 um 07:57 schrieb René Scharfe:
>>> Am 23.05.25 um 22:51 schrieb Alex via GitGitGadget:
>>>> From: jinyaoguo <guo846@purdue.edu>
>>>>
>>>> The loop in xdl_build_script used `i1 >= 0 || i2 >= 0`, causing
>>>> `i1` (or `i2`) to reach 0 and then access `rchg1[i1-1]` (or
>>>> `rchg2[i2-1]`), which underflows the buffer.
>>>> This commit adds explicit `i1 > 0` and `i2 > 0` checks around
>>>> those array accesses to prevent invalid negative indexing.
>>>
>>> xdl_prepare_ctx() in xdiff/xprepare.c allocates an extra entry at both
>>> ends for rchg arrays, so an index of -1 should be within the bounds.
> 
> and rchg[-1] == 0 so i1 and i2 can never drop below -1
> 
>>> i1 and i2 are decreased in lockstep, though, so one of them can become
>>> smaller than -1 if nrec is different between the files.  And that's how
>>> this code run can indeed run off into the weeds.
>>
>> Actually no, i1 can't seem to reach 0 without i2 also being 0 and vice
>> versa.  Or can it?  It makes sense that we reach the start of both
>> buffers at the same time if we walk backwards from the end, don't
>> misstep and have consistent rchg array contents, but I'm not sure.
> The code looks like
> 
>      for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0; 
> i1--, i2--)
>          if (rchg1[i1 - 1] || rchg2[i2 - 1]) {
>              for (l1 = i1; rchg1[i1 - 1]; i1--);
>              for (l2 = i2; rchg2[i2 - 1]; i2--);
> 
> I think I've convinced myself that it is safe assuming there are an 
> equal number of unchanged lines in rchg1 and rchg2 and rchg1[-1] == 
> rchg2[-1] == 0. Each iteration consumes any changed lines in the 
> preimage and the postimage plus a single context line from each (apart 
> from the final iteration when the context lines may have been 
> exhausted). At the start of the last iteration there are three 
> possibilities
> 
>   - i1 == -1 && i2 >= 0 => there are insertions as the start of the file.

Sorry that should be i1 == 0 && i2 >= 0

>     As the context lines in the preimage have been exhausted all the
>     remaining rchg2 elements represent added lines and are consumed by
>     for (l2 = i2; rchg2[i2 - 1]; i2--) so at the end of the loop body
>     i2 == 0 and the outer loop will exit.
> 
>   - i1 >= 0 && i2 == -1 => there are deletions at the start of the file.

This one should be i1 >= 0 && i2 == 0

>     As the context lines in the postimage have been exhausted all the
>     remaining rchg1 elements represent deleted lines and are consumed by
>     for (l1 = i1; rchg1[i1 - 1]; i1--) so at the end of the loop body
>     i1 == 0 and the outer loop will exit.
> 
>   - i1 >= 0 && i2 >= 0 => the first line is unchanged or there are
>     insertions and deletions at the beginning of the file. At the end of
>     the loop body i1 == 0 && i2 == 0 and the outer loop will exit.
> 
> We could add
> 
>      if (i1 < -1 || i2 < -1)
>          BUG("mismatched context line count");
> 
> before "if (rchg1[i1 - 1] || rchg2[i2 - 1])" inside the loop if we're 
> worried about bugs that break the assumption that there are equal 
> numbers of context lines on each side. Any such bug would generate 
> invalid diffs. I don't know how likely that is to happen in practice.
> 
> Best Wishes
> 
> Phillip
> 
>>
>> Are you able to demonstrate any out-of-bounds access with e.g.,
>> Valgrind, AddressSanitizer or an assertion?
>>
>>> Curiously, AddressSanitizer doesn't report anything, but if I add the
>>> following line after the outer for, I can trigger it to report a
>>> heap-buffer-overflow with e.g., git show 8613c2bb6c:
>>>
>>>     if (i1 < 0 || i2 < 0) fprintf(stderr, "Oops: %ld %ld\n", i1, i2);
>>
>> That's because I forgot to add braces.  D'oh!  I can't trigger any
>> out-of-bounds access or that Oops with them properly in place.  So I
>> let myself get fooled by a daring coding style. :-|
>>
>>>
>>>>
>>>> Signed-off-by: Alex Guo <alexguo1023@gmail.com>
>>>> ---
>>>>      Fix buffer underflow in xdl_build_script
>>>>
>>>> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr- 
>>>> git-1976%2Fmugitya03%2Fbuf-1-v1
>>>> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr- 
>>>> git-1976/mugitya03/buf-1-v1
>>>> Pull-Request: https://github.com/git/git/pull/1976
>>>>
>>>>   xdiff/xdiffi.c | 7 ++++---
>>>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
>>>> index 5a96e36dfbe..2e983965328 100644
>>>> --- a/xdiff/xdiffi.c
>>>> +++ b/xdiff/xdiffi.c
>>>> @@ -951,9 +951,10 @@ int xdl_build_script(xdfenv_t *xe, xdchange_t 
>>>> **xscr) {
>>>>        * Trivial. Collects "groups" of changes and creates an edit 
>>>> script.
>>
>> Trivial for Davide perhaps (libxdiff author), but not my mushy brain..
>>
>>>>        */
>>>>       for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 
>>>> 0; i1--, i2--)
>>>
>>> Should the || be a && instead?  From a birds-eye view I would assume we
>>> can stop scanning for changes when we exhaust (reach the top) of either
>>> side.  We just have to make sure everything from the other side is
>>> accounted for in the last added change.
>>>
>>>> -        if (rchg1[i1 - 1] || rchg2[i2 - 1]) {
>>>> -            for (l1 = i1; rchg1[i1 - 1]; i1--);
>>>> -            for (l2 = i2; rchg2[i2 - 1]; i2--);
>>>> +        if ((i1 > 0 && rchg1[i1 - 1]) ||
>>>> +            (i2 > 0 && rchg2[i2 - 1])) {
>>>> +            for (l1 = i1; i1 > 0 && rchg1[i1 - 1]; i1--);
>>>> +            for (l2 = i2; i2 > 0 && rchg2[i2 - 1]; i2--);
>>>
>>> Nit: The indentation of that line is off.
>>>
>>>>               if (!(xch = xdl_add_change(cscr, i1, i2, l1 - i1, l2 - 
>>>> i2))) {
>>>>                   xdl_free_script(cscr);
>>>>
>>>> base-commit: 8613c2bb6cd16ef530dc5dd74d3b818a1ccbf1c0
> 


      reply	other threads:[~2025-05-24 13:53 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-23 20:51 [PATCH] Fix buffer underflow in xdl_build_script Alex via GitGitGadget
2025-05-24  5:57 ` René Scharfe
2025-05-24  9:08   ` René Scharfe
2025-05-24 13:38     ` Phillip Wood
2025-05-24 13:53       ` Phillip Wood [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=953dc362-f546-4f49-b580-b48a4a1370e8@gmail.com \
    --to=phillip.wood123@gmail.com \
    --cc=alexguo1023@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=guo846@purdue.edu \
    --cc=l.s.r@web.de \
    --cc=phillip.wood@dunelm.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).