* [PATCH] Fix buffer underflow in xdl_build_script @ 2025-05-23 20:51 Alex via GitGitGadget 2025-05-24 5:57 ` René Scharfe 0 siblings, 1 reply; 5+ messages in thread From: Alex via GitGitGadget @ 2025-05-23 20:51 UTC (permalink / raw) To: git; +Cc: Alex, jinyaoguo From: jinyaoguo <guo846@purdue.edu> The loop in xdl_build_script used `i1 >= 0 || i2 >= 0`, causing `i1` (or `i2`) to reach 0 and then access `rchg1[i1-1]` (or `rchg2[i2-1]`), which underflows the buffer. This commit adds explicit `i1 > 0` and `i2 > 0` checks around those array accesses to prevent invalid negative indexing. Signed-off-by: Alex Guo <alexguo1023@gmail.com> --- Fix buffer underflow in xdl_build_script Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1976%2Fmugitya03%2Fbuf-1-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1976/mugitya03/buf-1-v1 Pull-Request: https://github.com/git/git/pull/1976 xdiff/xdiffi.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c index 5a96e36dfbe..2e983965328 100644 --- a/xdiff/xdiffi.c +++ b/xdiff/xdiffi.c @@ -951,9 +951,10 @@ int xdl_build_script(xdfenv_t *xe, xdchange_t **xscr) { * Trivial. Collects "groups" of changes and creates an edit script. */ for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0; i1--, i2--) - if (rchg1[i1 - 1] || rchg2[i2 - 1]) { - for (l1 = i1; rchg1[i1 - 1]; i1--); - for (l2 = i2; rchg2[i2 - 1]; i2--); + if ((i1 > 0 && rchg1[i1 - 1]) || + (i2 > 0 && rchg2[i2 - 1])) { + for (l1 = i1; i1 > 0 && rchg1[i1 - 1]; i1--); + for (l2 = i2; i2 > 0 && rchg2[i2 - 1]; i2--); if (!(xch = xdl_add_change(cscr, i1, i2, l1 - i1, l2 - i2))) { xdl_free_script(cscr); base-commit: 8613c2bb6cd16ef530dc5dd74d3b818a1ccbf1c0 -- gitgitgadget ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] Fix buffer underflow in xdl_build_script 2025-05-23 20:51 [PATCH] Fix buffer underflow in xdl_build_script Alex via GitGitGadget @ 2025-05-24 5:57 ` René Scharfe 2025-05-24 9:08 ` René Scharfe 0 siblings, 1 reply; 5+ messages in thread From: René Scharfe @ 2025-05-24 5:57 UTC (permalink / raw) To: Alex via GitGitGadget, git; +Cc: Alex, jinyaoguo Am 23.05.25 um 22:51 schrieb Alex via GitGitGadget: > From: jinyaoguo <guo846@purdue.edu> > > The loop in xdl_build_script used `i1 >= 0 || i2 >= 0`, causing > `i1` (or `i2`) to reach 0 and then access `rchg1[i1-1]` (or > `rchg2[i2-1]`), which underflows the buffer. > This commit adds explicit `i1 > 0` and `i2 > 0` checks around > those array accesses to prevent invalid negative indexing. xdl_prepare_ctx() in xdiff/xprepare.c allocates an extra entry at both ends for rchg arrays, so an index of -1 should be within the bounds. i1 and i2 are decreased in lockstep, though, so one of them can become smaller than -1 if nrec is different between the files. And that's how this code run can indeed run off into the weeds. Curiously, AddressSanitizer doesn't report anything, but if I add the following line after the outer for, I can trigger it to report a heap-buffer-overflow with e.g., git show 8613c2bb6c: if (i1 < 0 || i2 < 0) fprintf(stderr, "Oops: %ld %ld\n", i1, i2); > > Signed-off-by: Alex Guo <alexguo1023@gmail.com> > --- > Fix buffer underflow in xdl_build_script > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1976%2Fmugitya03%2Fbuf-1-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1976/mugitya03/buf-1-v1 > Pull-Request: https://github.com/git/git/pull/1976 > > xdiff/xdiffi.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c > index 5a96e36dfbe..2e983965328 100644 > --- a/xdiff/xdiffi.c > +++ b/xdiff/xdiffi.c > @@ -951,9 +951,10 @@ int xdl_build_script(xdfenv_t *xe, xdchange_t **xscr) { > * Trivial. Collects "groups" of changes and creates an edit script. > */ > for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0; i1--, i2--) Should the || be a && instead? From a birds-eye view I would assume we can stop scanning for changes when we exhaust (reach the top) of either side. We just have to make sure everything from the other side is accounted for in the last added change. > - if (rchg1[i1 - 1] || rchg2[i2 - 1]) { > - for (l1 = i1; rchg1[i1 - 1]; i1--); > - for (l2 = i2; rchg2[i2 - 1]; i2--); > + if ((i1 > 0 && rchg1[i1 - 1]) || > + (i2 > 0 && rchg2[i2 - 1])) { > + for (l1 = i1; i1 > 0 && rchg1[i1 - 1]; i1--); > + for (l2 = i2; i2 > 0 && rchg2[i2 - 1]; i2--); Nit: The indentation of that line is off. > > if (!(xch = xdl_add_change(cscr, i1, i2, l1 - i1, l2 - i2))) { > xdl_free_script(cscr); > > base-commit: 8613c2bb6cd16ef530dc5dd74d3b818a1ccbf1c0 ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] Fix buffer underflow in xdl_build_script 2025-05-24 5:57 ` René Scharfe @ 2025-05-24 9:08 ` René Scharfe 2025-05-24 13:38 ` Phillip Wood 0 siblings, 1 reply; 5+ messages in thread From: René Scharfe @ 2025-05-24 9:08 UTC (permalink / raw) To: Alex via GitGitGadget, git; +Cc: Alex, jinyaoguo Am 24.05.25 um 07:57 schrieb René Scharfe: > Am 23.05.25 um 22:51 schrieb Alex via GitGitGadget: >> From: jinyaoguo <guo846@purdue.edu> >> >> The loop in xdl_build_script used `i1 >= 0 || i2 >= 0`, causing >> `i1` (or `i2`) to reach 0 and then access `rchg1[i1-1]` (or >> `rchg2[i2-1]`), which underflows the buffer. >> This commit adds explicit `i1 > 0` and `i2 > 0` checks around >> those array accesses to prevent invalid negative indexing. > > xdl_prepare_ctx() in xdiff/xprepare.c allocates an extra entry at both > ends for rchg arrays, so an index of -1 should be within the bounds. > > i1 and i2 are decreased in lockstep, though, so one of them can become > smaller than -1 if nrec is different between the files. And that's how > this code run can indeed run off into the weeds. Actually no, i1 can't seem to reach 0 without i2 also being 0 and vice versa. Or can it? It makes sense that we reach the start of both buffers at the same time if we walk backwards from the end, don't misstep and have consistent rchg array contents, but I'm not sure. Are you able to demonstrate any out-of-bounds access with e.g., Valgrind, AddressSanitizer or an assertion? > Curiously, AddressSanitizer doesn't report anything, but if I add the > following line after the outer for, I can trigger it to report a > heap-buffer-overflow with e.g., git show 8613c2bb6c: > > if (i1 < 0 || i2 < 0) fprintf(stderr, "Oops: %ld %ld\n", i1, i2); That's because I forgot to add braces. D'oh! I can't trigger any out-of-bounds access or that Oops with them properly in place. So I let myself get fooled by a daring coding style. :-| > >> >> Signed-off-by: Alex Guo <alexguo1023@gmail.com> >> --- >> Fix buffer underflow in xdl_build_script >> >> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1976%2Fmugitya03%2Fbuf-1-v1 >> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1976/mugitya03/buf-1-v1 >> Pull-Request: https://github.com/git/git/pull/1976 >> >> xdiff/xdiffi.c | 7 ++++--- >> 1 file changed, 4 insertions(+), 3 deletions(-) >> >> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c >> index 5a96e36dfbe..2e983965328 100644 >> --- a/xdiff/xdiffi.c >> +++ b/xdiff/xdiffi.c >> @@ -951,9 +951,10 @@ int xdl_build_script(xdfenv_t *xe, xdchange_t **xscr) { >> * Trivial. Collects "groups" of changes and creates an edit script. Trivial for Davide perhaps (libxdiff author), but not my mushy brain.. >> */ >> for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0; i1--, i2--) > > Should the || be a && instead? From a birds-eye view I would assume we > can stop scanning for changes when we exhaust (reach the top) of either > side. We just have to make sure everything from the other side is > accounted for in the last added change. > >> - if (rchg1[i1 - 1] || rchg2[i2 - 1]) { >> - for (l1 = i1; rchg1[i1 - 1]; i1--); >> - for (l2 = i2; rchg2[i2 - 1]; i2--); >> + if ((i1 > 0 && rchg1[i1 - 1]) || >> + (i2 > 0 && rchg2[i2 - 1])) { >> + for (l1 = i1; i1 > 0 && rchg1[i1 - 1]; i1--); >> + for (l2 = i2; i2 > 0 && rchg2[i2 - 1]; i2--); > > Nit: The indentation of that line is off. > >> >> if (!(xch = xdl_add_change(cscr, i1, i2, l1 - i1, l2 - i2))) { >> xdl_free_script(cscr); >> >> base-commit: 8613c2bb6cd16ef530dc5dd74d3b818a1ccbf1c0 > > ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] Fix buffer underflow in xdl_build_script 2025-05-24 9:08 ` René Scharfe @ 2025-05-24 13:38 ` Phillip Wood 2025-05-24 13:53 ` Phillip Wood 0 siblings, 1 reply; 5+ messages in thread From: Phillip Wood @ 2025-05-24 13:38 UTC (permalink / raw) To: René Scharfe, Alex via GitGitGadget, git; +Cc: Alex, jinyaoguo On 24/05/2025 10:08, René Scharfe wrote: > Am 24.05.25 um 07:57 schrieb René Scharfe: >> Am 23.05.25 um 22:51 schrieb Alex via GitGitGadget: >>> From: jinyaoguo <guo846@purdue.edu> >>> >>> The loop in xdl_build_script used `i1 >= 0 || i2 >= 0`, causing >>> `i1` (or `i2`) to reach 0 and then access `rchg1[i1-1]` (or >>> `rchg2[i2-1]`), which underflows the buffer. >>> This commit adds explicit `i1 > 0` and `i2 > 0` checks around >>> those array accesses to prevent invalid negative indexing. >> >> xdl_prepare_ctx() in xdiff/xprepare.c allocates an extra entry at both >> ends for rchg arrays, so an index of -1 should be within the bounds. and rchg[-1] == 0 so i1 and i2 can never drop below -1 >> i1 and i2 are decreased in lockstep, though, so one of them can become >> smaller than -1 if nrec is different between the files. And that's how >> this code run can indeed run off into the weeds. > > Actually no, i1 can't seem to reach 0 without i2 also being 0 and vice > versa. Or can it? It makes sense that we reach the start of both > buffers at the same time if we walk backwards from the end, don't > misstep and have consistent rchg array contents, but I'm not sure. The code looks like for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0; i1--, i2--) if (rchg1[i1 - 1] || rchg2[i2 - 1]) { for (l1 = i1; rchg1[i1 - 1]; i1--); for (l2 = i2; rchg2[i2 - 1]; i2--); I think I've convinced myself that it is safe assuming there are an equal number of unchanged lines in rchg1 and rchg2 and rchg1[-1] == rchg2[-1] == 0. Each iteration consumes any changed lines in the preimage and the postimage plus a single context line from each (apart from the final iteration when the context lines may have been exhausted). At the start of the last iteration there are three possibilities - i1 == -1 && i2 >= 0 => there are insertions as the start of the file. As the context lines in the preimage have been exhausted all the remaining rchg2 elements represent added lines and are consumed by for (l2 = i2; rchg2[i2 - 1]; i2--) so at the end of the loop body i2 == 0 and the outer loop will exit. - i1 >= 0 && i2 == -1 => there are deletions at the start of the file. As the context lines in the postimage have been exhausted all the remaining rchg1 elements represent deleted lines and are consumed by for (l1 = i1; rchg1[i1 - 1]; i1--) so at the end of the loop body i1 == 0 and the outer loop will exit. - i1 >= 0 && i2 >= 0 => the first line is unchanged or there are insertions and deletions at the beginning of the file. At the end of the loop body i1 == 0 && i2 == 0 and the outer loop will exit. We could add if (i1 < -1 || i2 < -1) BUG("mismatched context line count"); before "if (rchg1[i1 - 1] || rchg2[i2 - 1])" inside the loop if we're worried about bugs that break the assumption that there are equal numbers of context lines on each side. Any such bug would generate invalid diffs. I don't know how likely that is to happen in practice. Best Wishes Phillip > > Are you able to demonstrate any out-of-bounds access with e.g., > Valgrind, AddressSanitizer or an assertion? > >> Curiously, AddressSanitizer doesn't report anything, but if I add the >> following line after the outer for, I can trigger it to report a >> heap-buffer-overflow with e.g., git show 8613c2bb6c: >> >> if (i1 < 0 || i2 < 0) fprintf(stderr, "Oops: %ld %ld\n", i1, i2); > > That's because I forgot to add braces. D'oh! I can't trigger any > out-of-bounds access or that Oops with them properly in place. So I > let myself get fooled by a daring coding style. :-| > >> >>> >>> Signed-off-by: Alex Guo <alexguo1023@gmail.com> >>> --- >>> Fix buffer underflow in xdl_build_script >>> >>> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1976%2Fmugitya03%2Fbuf-1-v1 >>> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1976/mugitya03/buf-1-v1 >>> Pull-Request: https://github.com/git/git/pull/1976 >>> >>> xdiff/xdiffi.c | 7 ++++--- >>> 1 file changed, 4 insertions(+), 3 deletions(-) >>> >>> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c >>> index 5a96e36dfbe..2e983965328 100644 >>> --- a/xdiff/xdiffi.c >>> +++ b/xdiff/xdiffi.c >>> @@ -951,9 +951,10 @@ int xdl_build_script(xdfenv_t *xe, xdchange_t **xscr) { >>> * Trivial. Collects "groups" of changes and creates an edit script. > > Trivial for Davide perhaps (libxdiff author), but not my mushy brain.. > >>> */ >>> for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0; i1--, i2--) >> >> Should the || be a && instead? From a birds-eye view I would assume we >> can stop scanning for changes when we exhaust (reach the top) of either >> side. We just have to make sure everything from the other side is >> accounted for in the last added change. >> >>> - if (rchg1[i1 - 1] || rchg2[i2 - 1]) { >>> - for (l1 = i1; rchg1[i1 - 1]; i1--); >>> - for (l2 = i2; rchg2[i2 - 1]; i2--); >>> + if ((i1 > 0 && rchg1[i1 - 1]) || >>> + (i2 > 0 && rchg2[i2 - 1])) { >>> + for (l1 = i1; i1 > 0 && rchg1[i1 - 1]; i1--); >>> + for (l2 = i2; i2 > 0 && rchg2[i2 - 1]; i2--); >> >> Nit: The indentation of that line is off. >> >>> >>> if (!(xch = xdl_add_change(cscr, i1, i2, l1 - i1, l2 - i2))) { >>> xdl_free_script(cscr); >>> >>> base-commit: 8613c2bb6cd16ef530dc5dd74d3b818a1ccbf1c0 ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] Fix buffer underflow in xdl_build_script 2025-05-24 13:38 ` Phillip Wood @ 2025-05-24 13:53 ` Phillip Wood 0 siblings, 0 replies; 5+ messages in thread From: Phillip Wood @ 2025-05-24 13:53 UTC (permalink / raw) To: René Scharfe, Alex via GitGitGadget, git; +Cc: Alex, jinyaoguo On 24/05/2025 14:38, Phillip Wood wrote: > On 24/05/2025 10:08, René Scharfe wrote: >> Am 24.05.25 um 07:57 schrieb René Scharfe: >>> Am 23.05.25 um 22:51 schrieb Alex via GitGitGadget: >>>> From: jinyaoguo <guo846@purdue.edu> >>>> >>>> The loop in xdl_build_script used `i1 >= 0 || i2 >= 0`, causing >>>> `i1` (or `i2`) to reach 0 and then access `rchg1[i1-1]` (or >>>> `rchg2[i2-1]`), which underflows the buffer. >>>> This commit adds explicit `i1 > 0` and `i2 > 0` checks around >>>> those array accesses to prevent invalid negative indexing. >>> >>> xdl_prepare_ctx() in xdiff/xprepare.c allocates an extra entry at both >>> ends for rchg arrays, so an index of -1 should be within the bounds. > > and rchg[-1] == 0 so i1 and i2 can never drop below -1 > >>> i1 and i2 are decreased in lockstep, though, so one of them can become >>> smaller than -1 if nrec is different between the files. And that's how >>> this code run can indeed run off into the weeds. >> >> Actually no, i1 can't seem to reach 0 without i2 also being 0 and vice >> versa. Or can it? It makes sense that we reach the start of both >> buffers at the same time if we walk backwards from the end, don't >> misstep and have consistent rchg array contents, but I'm not sure. > The code looks like > > for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0; > i1--, i2--) > if (rchg1[i1 - 1] || rchg2[i2 - 1]) { > for (l1 = i1; rchg1[i1 - 1]; i1--); > for (l2 = i2; rchg2[i2 - 1]; i2--); > > I think I've convinced myself that it is safe assuming there are an > equal number of unchanged lines in rchg1 and rchg2 and rchg1[-1] == > rchg2[-1] == 0. Each iteration consumes any changed lines in the > preimage and the postimage plus a single context line from each (apart > from the final iteration when the context lines may have been > exhausted). At the start of the last iteration there are three > possibilities > > - i1 == -1 && i2 >= 0 => there are insertions as the start of the file. Sorry that should be i1 == 0 && i2 >= 0 > As the context lines in the preimage have been exhausted all the > remaining rchg2 elements represent added lines and are consumed by > for (l2 = i2; rchg2[i2 - 1]; i2--) so at the end of the loop body > i2 == 0 and the outer loop will exit. > > - i1 >= 0 && i2 == -1 => there are deletions at the start of the file. This one should be i1 >= 0 && i2 == 0 > As the context lines in the postimage have been exhausted all the > remaining rchg1 elements represent deleted lines and are consumed by > for (l1 = i1; rchg1[i1 - 1]; i1--) so at the end of the loop body > i1 == 0 and the outer loop will exit. > > - i1 >= 0 && i2 >= 0 => the first line is unchanged or there are > insertions and deletions at the beginning of the file. At the end of > the loop body i1 == 0 && i2 == 0 and the outer loop will exit. > > We could add > > if (i1 < -1 || i2 < -1) > BUG("mismatched context line count"); > > before "if (rchg1[i1 - 1] || rchg2[i2 - 1])" inside the loop if we're > worried about bugs that break the assumption that there are equal > numbers of context lines on each side. Any such bug would generate > invalid diffs. I don't know how likely that is to happen in practice. > > Best Wishes > > Phillip > >> >> Are you able to demonstrate any out-of-bounds access with e.g., >> Valgrind, AddressSanitizer or an assertion? >> >>> Curiously, AddressSanitizer doesn't report anything, but if I add the >>> following line after the outer for, I can trigger it to report a >>> heap-buffer-overflow with e.g., git show 8613c2bb6c: >>> >>> if (i1 < 0 || i2 < 0) fprintf(stderr, "Oops: %ld %ld\n", i1, i2); >> >> That's because I forgot to add braces. D'oh! I can't trigger any >> out-of-bounds access or that Oops with them properly in place. So I >> let myself get fooled by a daring coding style. :-| >> >>> >>>> >>>> Signed-off-by: Alex Guo <alexguo1023@gmail.com> >>>> --- >>>> Fix buffer underflow in xdl_build_script >>>> >>>> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr- >>>> git-1976%2Fmugitya03%2Fbuf-1-v1 >>>> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr- >>>> git-1976/mugitya03/buf-1-v1 >>>> Pull-Request: https://github.com/git/git/pull/1976 >>>> >>>> xdiff/xdiffi.c | 7 ++++--- >>>> 1 file changed, 4 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c >>>> index 5a96e36dfbe..2e983965328 100644 >>>> --- a/xdiff/xdiffi.c >>>> +++ b/xdiff/xdiffi.c >>>> @@ -951,9 +951,10 @@ int xdl_build_script(xdfenv_t *xe, xdchange_t >>>> **xscr) { >>>> * Trivial. Collects "groups" of changes and creates an edit >>>> script. >> >> Trivial for Davide perhaps (libxdiff author), but not my mushy brain.. >> >>>> */ >>>> for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= >>>> 0; i1--, i2--) >>> >>> Should the || be a && instead? From a birds-eye view I would assume we >>> can stop scanning for changes when we exhaust (reach the top) of either >>> side. We just have to make sure everything from the other side is >>> accounted for in the last added change. >>> >>>> - if (rchg1[i1 - 1] || rchg2[i2 - 1]) { >>>> - for (l1 = i1; rchg1[i1 - 1]; i1--); >>>> - for (l2 = i2; rchg2[i2 - 1]; i2--); >>>> + if ((i1 > 0 && rchg1[i1 - 1]) || >>>> + (i2 > 0 && rchg2[i2 - 1])) { >>>> + for (l1 = i1; i1 > 0 && rchg1[i1 - 1]; i1--); >>>> + for (l2 = i2; i2 > 0 && rchg2[i2 - 1]; i2--); >>> >>> Nit: The indentation of that line is off. >>> >>>> if (!(xch = xdl_add_change(cscr, i1, i2, l1 - i1, l2 - >>>> i2))) { >>>> xdl_free_script(cscr); >>>> >>>> base-commit: 8613c2bb6cd16ef530dc5dd74d3b818a1ccbf1c0 > ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-05-24 13:53 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-05-23 20:51 [PATCH] Fix buffer underflow in xdl_build_script Alex via GitGitGadget 2025-05-24 5:57 ` René Scharfe 2025-05-24 9:08 ` René Scharfe 2025-05-24 13:38 ` Phillip Wood 2025-05-24 13:53 ` Phillip Wood
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).