* Iffy output given git diff --unified=2147483647 [not found] <xXWgbH3mlNEvFcdGLqBHwcclZoeZNPoLg8Hr6YCipHXvS5eKaHeTppzFM-l_wyB46BB1R1T0j6g_jWRXIj7-GRJh1LPxi1ta3GkQ5t8F4-0=@proton.me> @ 2025-03-12 15:51 ` Jason Cho 2025-03-14 22:00 ` [PATCH] xdiff: avoid arithmetic overflow in xdl_get_hunk() René Scharfe 0 siblings, 1 reply; 7+ messages in thread From: Jason Cho @ 2025-03-12 15:51 UTC (permalink / raw) To: git@vger.kernel.org > $ git --versiongit version 2.47.0.windows.2 > > $ git diff --unified=2147483647 1.txt 2.txt > diff --git a/1.txt b/2.txt > index e53eaa1..1130481 100644 > --- a/1.txt > +++ b/2.txt > @@ -1,10 +1,10 @@ > 1 > -2 > +a > 3 > 4 > 5 > 6 > 7 > 8 > a > 0 > @@ -1,10 +1,10 @@ > 1 > a > 3 > 4 > 5 > 6 > 7 > 8 > -9 > +a > 0 > > $ git diff --unified=3 1.txt 2.txt > diff --git a/1.txt b/2.txt > index e53eaa1..1130481 100644 > --- a/1.txt > +++ b/2.txt > @@ -1,10 +1,10 @@ > 1 > -2 > +a > 3 > 4 > 5 > 6 > 7 > 8 > -9 > +a > 0 > > $ diff --version > diff (GNU diffutils) 3.10 > Copyright (C) 2023 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>. > This is free software: you are free to change and redistribute it. > There is NO WARRANTY, to the extent permitted by law. > > Written by Paul Eggert, Mike Haertel, David Hayes, > Richard Stallman, and Len Tower. > > $ diff --unified=2147483647 1.txt 2.txt > --- 1.txt 2025-03-12 16:04:06.947099900 +0100 > +++ 2.txt 2025-03-12 16:04:27.131732400 +0100 > @@ -1,10 +1,10 @@ > 1 > -2 > +a > 3 > 4 > 5 > 6 > 7 > 8 > -9 > +a > 0 Please see the above command line output. I run this on Windows with git for windows, but the problem should apply for other platforms. The version of my git is 2.47. I prepare two files, I run GNU diff --unified=2147483647 1.txt 2.txt, the output is correct. Then I run git diff with --unified=2147483647, the context of the second hunk is repeated, which is unexpected. I investigated it and found the repetition is due to an overflow issue in xdiff/xemit.c. > xdchange_t *xdl_get_hunk(xdchange_t **xscr, xdemitconf_t const *xecfg){ > xdchange_t *xch, *xchp, *lxch; > long max_common = 2 * xecfg->ctxlen + xecfg->interhunkctxlen; <- ---- > ... > } The documentation https://git-scm.com/docs/git-diff doesn't say the range of --unified. Even if its max value is INT_MAX, 2147483647 is in the range. Can you guys clarify the correct range of --unified? If my value 2147483647 is in range, git diff should output a diff without the strange repetition. Please fix it. ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH] xdiff: avoid arithmetic overflow in xdl_get_hunk() 2025-03-12 15:51 ` Iffy output given git diff --unified=2147483647 Jason Cho @ 2025-03-14 22:00 ` René Scharfe 2025-03-14 22:28 ` Junio C Hamano 2025-03-14 23:14 ` Jason Cho 0 siblings, 2 replies; 7+ messages in thread From: René Scharfe @ 2025-03-14 22:00 UTC (permalink / raw) To: Jason Cho, git@vger.kernel.org xdl_get_hunk() calculates the maximum number of common lines between two changes that would fit into the same hunk for the given context options. It involves doubling and addition and thus can overflow if the terms are huge. The type of ctxlen and interhunkctxlen in xdemitconf_t is long, while the type of the corresponding context and interhunkcontext in struct diff_options is int. On many platforms longs are bigger that ints, which prevents the overflow. On Windows they have the same range and the overflow manifests as hunks that are split erroneously and lines being repeated between them. Fix the overflow by checking and not going beyond LONG_MAX. This allows specifying a huge context line count and getting all lines of a changed files in a single hunk, as expected. Reported-by: Jason Cho <jason11choca@proton.me> Signed-off-by: René Scharfe <l.s.r@web.de> --- t/t4055-diff-context.sh | 10 ++++++++++ xdiff/xemit.c | 8 +++++++- 2 files changed, 17 insertions(+), 1 deletion(-) diff --git a/t/t4055-diff-context.sh b/t/t4055-diff-context.sh index f7ff234cf9..ec2804eea6 100755 --- a/t/t4055-diff-context.sh +++ b/t/t4055-diff-context.sh @@ -89,4 +89,14 @@ test_expect_success '-U0 is valid, so is diff.context=0' ' grep "^+MODIFIED" output ' +test_expect_success '-U2147483647 works' ' + echo APPENDED >>x && + test_line_count = 16 x && + git diff -U2147483647 >output && + test_line_count = 22 output && + grep "^-ADDED" output && + grep "^+MODIFIED" output && + grep "^+APPENDED" output +' + test_done diff --git a/xdiff/xemit.c b/xdiff/xemit.c index f8e3f25b03..1d40c9cb40 100644 --- a/xdiff/xemit.c +++ b/xdiff/xemit.c @@ -43,6 +43,10 @@ static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t * return 0; } +static long saturating_add(long a, long b) +{ + return signed_add_overflows(a, b) ? LONG_MAX : a + b; +} /* * Starting at the passed change atom, find the latest change atom to be included @@ -52,7 +56,9 @@ static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t * xdchange_t *xdl_get_hunk(xdchange_t **xscr, xdemitconf_t const *xecfg) { xdchange_t *xch, *xchp, *lxch; - long max_common = 2 * xecfg->ctxlen + xecfg->interhunkctxlen; + long max_common = saturating_add(saturating_add(xecfg->ctxlen, + xecfg->ctxlen), + xecfg->interhunkctxlen); long max_ignorable = xecfg->ctxlen; long ignored = 0; /* number of ignored blank lines */ -- 2.48.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] xdiff: avoid arithmetic overflow in xdl_get_hunk() 2025-03-14 22:00 ` [PATCH] xdiff: avoid arithmetic overflow in xdl_get_hunk() René Scharfe @ 2025-03-14 22:28 ` Junio C Hamano 2025-03-15 6:38 ` René Scharfe 2025-03-14 23:14 ` Jason Cho 1 sibling, 1 reply; 7+ messages in thread From: Junio C Hamano @ 2025-03-14 22:28 UTC (permalink / raw) To: René Scharfe; +Cc: Jason Cho, git@vger.kernel.org René Scharfe <l.s.r@web.de> writes: > xdl_get_hunk() calculates the maximum number of common lines between two > changes that would fit into the same hunk for the given context options. > It involves doubling and addition and thus can overflow if the terms are > huge. > > The type of ctxlen and interhunkctxlen in xdemitconf_t is long, while > the type of the corresponding context and interhunkcontext in struct > diff_options is int. On many platforms longs are bigger that ints, > which prevents the overflow. On Windows they have the same range and > the overflow manifests as hunks that are split erroneously and lines > being repeated between them. > > Fix the overflow by checking and not going beyond LONG_MAX. This allows > specifying a huge context line count and getting all lines of a changed > files in a single hunk, as expected. > > Reported-by: Jason Cho <jason11choca@proton.me> > Signed-off-by: René Scharfe <l.s.r@web.de> > --- > t/t4055-diff-context.sh | 10 ++++++++++ > xdiff/xemit.c | 8 +++++++- > 2 files changed, 17 insertions(+), 1 deletion(-) Oh, I love a patch like this that is well thought out to carefully check the bounds, instead of blindly say "ah, counting number of things in size_t solves everything" ;-) > diff --git a/xdiff/xemit.c b/xdiff/xemit.c > index f8e3f25b03..1d40c9cb40 100644 > --- a/xdiff/xemit.c > +++ b/xdiff/xemit.c > @@ -43,6 +43,10 @@ static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t * > return 0; > } > > +static long saturating_add(long a, long b) > +{ > + return signed_add_overflows(a, b) ? LONG_MAX : a + b; > +} > > /* > * Starting at the passed change atom, find the latest change atom to be included > @@ -52,7 +56,9 @@ static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t * > xdchange_t *xdl_get_hunk(xdchange_t **xscr, xdemitconf_t const *xecfg) > { > xdchange_t *xch, *xchp, *lxch; > - long max_common = 2 * xecfg->ctxlen + xecfg->interhunkctxlen; > + long max_common = saturating_add(saturating_add(xecfg->ctxlen, > + xecfg->ctxlen), > + xecfg->interhunkctxlen); Looking good. Thanks. Will queue. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] xdiff: avoid arithmetic overflow in xdl_get_hunk() 2025-03-14 22:28 ` Junio C Hamano @ 2025-03-15 6:38 ` René Scharfe 2025-03-16 19:53 ` Junio C Hamano 0 siblings, 1 reply; 7+ messages in thread From: René Scharfe @ 2025-03-15 6:38 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jason Cho, git@vger.kernel.org Am 14.03.25 um 23:28 schrieb Junio C Hamano: > René Scharfe <l.s.r@web.de> writes: > >> t/t4055-diff-context.sh | 10 ++++++++++ >> xdiff/xemit.c | 8 +++++++- >> 2 files changed, 17 insertions(+), 1 deletion(-) > > Oh, I love a patch like this that is well thought out to carefully > check the bounds, instead of blindly say "ah, counting number of > things in size_t solves everything" ;-) Converting xdiff from long to size_t is still a good idea, I think, but would be lot more effort and thus more risky. Comparisons to upstream would become a lot more noisy as well. René ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] xdiff: avoid arithmetic overflow in xdl_get_hunk() 2025-03-15 6:38 ` René Scharfe @ 2025-03-16 19:53 ` Junio C Hamano 2025-03-17 16:50 ` René Scharfe 0 siblings, 1 reply; 7+ messages in thread From: Junio C Hamano @ 2025-03-16 19:53 UTC (permalink / raw) To: René Scharfe; +Cc: Jason Cho, git@vger.kernel.org René Scharfe <l.s.r@web.de> writes: > Am 14.03.25 um 23:28 schrieb Junio C Hamano: >> René Scharfe <l.s.r@web.de> writes: >> >>> t/t4055-diff-context.sh | 10 ++++++++++ >>> xdiff/xemit.c | 8 +++++++- >>> 2 files changed, 17 insertions(+), 1 deletion(-) >> >> Oh, I love a patch like this that is well thought out to carefully >> check the bounds, instead of blindly say "ah, counting number of >> things in size_t solves everything" ;-) > > Converting xdiff from long to size_t is still a good idea, I think, ... Oh, no question about that, especially if the number of counted things are somehow proportionally related to the size of in-core memory regions in any way. What I do *not* like is the recent trend in the patches I see. They stop thinking there once they blindly replace int or unsigned or whatever with size_t and the compiler stops warning. Even though compiler warnings can be a useful tool when there is very little false positives, they are merely tools to improve the code, but I see more and more confused patches that seem to think squelching warnings is the goal in itself, without thinking if the resulting code is actually improved. And I didn't see that in this patch. The patch was actually written with real goal of improving the code in mind. > but > would be lot more effort and thus more risky. Perhaps, perhaps not. > Comparisons to upstream > would become a lot more noisy as well. I am not sure how much of that matters these days, though. Are they still active, or is the code perfect and pretty much done? I somehow had the impression it has been the latter for a long time... Thanks. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] xdiff: avoid arithmetic overflow in xdl_get_hunk() 2025-03-16 19:53 ` Junio C Hamano @ 2025-03-17 16:50 ` René Scharfe 0 siblings, 0 replies; 7+ messages in thread From: René Scharfe @ 2025-03-17 16:50 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jason Cho, git@vger.kernel.org Am 16.03.25 um 20:53 schrieb Junio C Hamano: > René Scharfe <l.s.r@web.de> writes: > >> Comparisons to upstream >> would become a lot more noisy as well. > > I am not sure how much of that matters these days, though. Are they > still active, or is the code perfect and pretty much done? I somehow > had the impression it has been the latter for a long time... http://www.xmailserver.org/xdiff-lib.html offers libxdiff-0.23.tar.gz, whose entries bear the timestamp 2008-11-12. Solid. René ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] xdiff: avoid arithmetic overflow in xdl_get_hunk() 2025-03-14 22:00 ` [PATCH] xdiff: avoid arithmetic overflow in xdl_get_hunk() René Scharfe 2025-03-14 22:28 ` Junio C Hamano @ 2025-03-14 23:14 ` Jason Cho 1 sibling, 0 replies; 7+ messages in thread From: Jason Cho @ 2025-03-14 23:14 UTC (permalink / raw) To: René Scharfe; +Cc: git@vger.kernel.org Dear René, What a thorough analysis. I didn't know this bug is only on Windows. Thank you for submitting the fix. I am excited to see it in a new release. For now I will cherrypick your patch to my fork of git. Best regards, Jason Cho On Friday, March 14th, 2025 at 11:00 PM, René Scharfe <l.s.r@web.de> wrote: > xdl_get_hunk() calculates the maximum number of common lines between two > changes that would fit into the same hunk for the given context options. > It involves doubling and addition and thus can overflow if the terms are > huge. > > The type of ctxlen and interhunkctxlen in xdemitconf_t is long, while > the type of the corresponding context and interhunkcontext in struct > diff_options is int. On many platforms longs are bigger that ints, > which prevents the overflow. On Windows they have the same range and > the overflow manifests as hunks that are split erroneously and lines > being repeated between them. > > Fix the overflow by checking and not going beyond LONG_MAX. This allows > specifying a huge context line count and getting all lines of a changed > files in a single hunk, as expected. > > Reported-by: Jason Cho jason11choca@proton.me > > Signed-off-by: René Scharfe l.s.r@web.de > > --- > t/t4055-diff-context.sh | 10 ++++++++++ > xdiff/xemit.c | 8 +++++++- > 2 files changed, 17 insertions(+), 1 deletion(-) > > diff --git a/t/t4055-diff-context.sh b/t/t4055-diff-context.sh > index f7ff234cf9..ec2804eea6 100755 > --- a/t/t4055-diff-context.sh > +++ b/t/t4055-diff-context.sh > @@ -89,4 +89,14 @@ test_expect_success '-U0 is valid, so is diff.context=0' ' > grep "^+MODIFIED" output > ' > > +test_expect_success '-U2147483647 works' ' > + echo APPENDED >>x && > > + test_line_count = 16 x && > + git diff -U2147483647 >output && > > + test_line_count = 22 output && > + grep "^-ADDED" output && > + grep "^+MODIFIED" output && > + grep "^+APPENDED" output > +' > + > test_done > diff --git a/xdiff/xemit.c b/xdiff/xemit.c > index f8e3f25b03..1d40c9cb40 100644 > --- a/xdiff/xemit.c > +++ b/xdiff/xemit.c > @@ -43,6 +43,10 @@ static int xdl_emit_record(xdfile_t *xdf, long ri, char const pre, xdemitcb_t * > return 0; > } > > +static long saturating_add(long a, long b) > +{ > + return signed_add_overflows(a, b) ? LONG_MAX : a + b; > +} > > / > * Starting at the passed change atom, find the latest change atom to be included > @@ -52,7 +56,9 @@ static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t * > xdchange_t *xdl_get_hunk(xdchange_t **xscr, xdemitconf_t const *xecfg) > { > xdchange_t *xch, *xchp, *lxch; > - long max_common = 2 * xecfg->ctxlen + xecfg->interhunkctxlen; > > + long max_common = saturating_add(saturating_add(xecfg->ctxlen, > > + xecfg->ctxlen), > > + xecfg->interhunkctxlen); > > long max_ignorable = xecfg->ctxlen; > > long ignored = 0; /* number of ignored blank lines */ > > -- > 2.48.1 ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-03-17 16:50 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <xXWgbH3mlNEvFcdGLqBHwcclZoeZNPoLg8Hr6YCipHXvS5eKaHeTppzFM-l_wyB46BB1R1T0j6g_jWRXIj7-GRJh1LPxi1ta3GkQ5t8F4-0=@proton.me>
2025-03-12 15:51 ` Iffy output given git diff --unified=2147483647 Jason Cho
2025-03-14 22:00 ` [PATCH] xdiff: avoid arithmetic overflow in xdl_get_hunk() René Scharfe
2025-03-14 22:28 ` Junio C Hamano
2025-03-15 6:38 ` René Scharfe
2025-03-16 19:53 ` Junio C Hamano
2025-03-17 16:50 ` René Scharfe
2025-03-14 23:14 ` Jason Cho
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox