git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] diffcore-rename: fix BUG when break detection and --follow used together
@ 2025-03-08  1:00 Elijah Newren via GitGitGadget
  2025-03-14 17:24 ` Jeff King
  2025-03-15  1:08 ` [PATCH v2] " Elijah Newren via GitGitGadget
  0 siblings, 2 replies; 6+ messages in thread
From: Elijah Newren via GitGitGadget @ 2025-03-08  1:00 UTC (permalink / raw)
  To: git; +Cc: Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Prior to commit 9db2ac56168e (diffcore-rename: accelerate rename_dst
setup, 2020-12-11), the function add_rename_dst() resulted in quadratic
runtime since each call inserted the new entry into the array in sorted
order.  The reason for the sorted order requirement was so that
locate_rename_dst(), used when break detection is turned on, could find
the appropriate entry in logarithmic time via bisection on string
comparisons.  (It's better to be quadratic in moving pointers than
quadratic in string comparisons, so this made some sense.)  However,
since break detection always sticks the broken pairs adjacent to each
other, that commit decided to simply append entries to rename_dst, and
record the mapping of (filename) -> (index within rename_dst) via a
strintmap.  Doing this relied on the fact that when adding the source of
a broken pair via register_rename_src(), that the next item we'd process
was the other half of the same broken pair and would be added to
rename_dst via add_rename_dst().  This assumption was fine under break
detection alone, but the combination of break detection and
single_follow violated that assumption because of this code:

		else if (options->single_follow &&
			 strcmp(options->single_follow, p->two->path))
			continue; /* not interested */

which would end up skipping calling add_rename_dst() below that point.
Since I knew I was assuming that the dst pair of a break would always be
added right after the src pair of a break, I added a new BUG() directive
as part of that commit later on at time of use that would check my
assumptions held.  That BUG() didn't trip for nearly 4 years...which
sadly meant I had long since forgotten the related details.  Anyway...

When the dst half of a broken pair is skipped like this, it means that
not only could my recorded index be invalid (just past the end of the
array), it could also point to some unrelated dst that just happened to
be the next one added to the array.  So, to fix this, we need to add a
little more safety around the checks for the recorded break_idx.

It turns out that making a testcase to trigger this is a bit challenging
too.  I added a simple testcase which tickles the necessary area, but
running it normally actually passes for me.  However, running it under
valgrind shows that it is depending upon uninitialized memory.  I
suspect that to get a reliable reproduction case, I might need to have
several more paths involved, but that might make the testcase more
difficult to understand.  So, I instead just embedded a warning within
the testname that the test triggered uninitialized memory use.

In short, when these two rare options are used together, fix the
accidental find of the wrong dst entry (which would often be
uninitialized memory just past the end of the array), by adding a little
more care around the recorded indices for break_idx.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
    diffcore-rename: fix BUG when break detection and --follow used together
    
    Bug dates back to Git v2.31.0, and was discovered and reported about
    four years later over at
    https://lore.kernel.org/git/20240920112228.3d1130f5.olaf@aepfle.de/.
    Sadly, took me about half a year to get back to this one...

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1876%2Fnewren%2Ffix-break-follow-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1876/newren/fix-break-follow-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1876

 diffcore-rename.c                   |  9 +++++----
 t/t4206-log-follow-harder-copies.sh | 18 ++++++++++++++++++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 10bb0321b10..cb4be5be63c 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -33,7 +33,7 @@ static struct diff_rename_dst *locate_rename_dst(struct diff_filepair *p)
 {
 	/* Lookup by p->ONE->path */
 	int idx = break_idx ? strintmap_get(break_idx, p->one->path) : -1;
-	return (idx == -1) ? NULL : &rename_dst[idx];
+	return (idx == -1 || idx == rename_dst_nr) ? NULL : &rename_dst[idx];
 }
 
 /*
@@ -1668,9 +1668,10 @@ void diffcore_rename_extended(struct diff_options *options,
 			if (DIFF_PAIR_BROKEN(p)) {
 				/* broken delete */
 				struct diff_rename_dst *dst = locate_rename_dst(p);
-				if (!dst)
-					BUG("tracking failed somehow; failed to find associated dst for broken pair");
-				if (dst->is_rename)
+				if (options->single_follow && dst &&
+				    strcmp(dst->p->two->path, p->two->path))
+					dst = NULL;
+				if (dst && dst->is_rename)
 					/* counterpart is now rename/copy */
 					pair_to_free = p;
 			}
diff --git a/t/t4206-log-follow-harder-copies.sh b/t/t4206-log-follow-harder-copies.sh
index bcab71c8e84..9e00f33a425 100755
--- a/t/t4206-log-follow-harder-copies.sh
+++ b/t/t4206-log-follow-harder-copies.sh
@@ -54,4 +54,22 @@ test_expect_success 'validate the output.' '
 	compare_diff_patch current expected
 '
 
+test_expect_success 'log --follow -B does not die or use uninitialized memory' '
+	git switch --orphan break_and_follow_are_icky_so_use_both &&
+	printf "%s\n" A B C D E F G H I J K L M N O P Q R S T U V W X Y Z >z &&
+	git add z &&
+	git commit -m "Initial" &&
+
+	test_seq 1 130 >z &&
+	echo lame >somefile &&
+	git add z somefile &&
+	git commit -m "Rewrite z, introduce lame somefile" &&
+
+	echo Content >somefile &&
+	git add somefile &&
+	git commit -m "Rewrite somefile" &&
+
+	git log -B --follow somefile
+'
+
 test_done

base-commit: f93ff170b93a1782659637824b25923245ac9dd1
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] diffcore-rename: fix BUG when break detection and --follow used together
  2025-03-08  1:00 [PATCH] diffcore-rename: fix BUG when break detection and --follow used together Elijah Newren via GitGitGadget
@ 2025-03-14 17:24 ` Jeff King
  2025-03-14 22:28   ` Elijah Newren
  2025-03-15  1:08 ` [PATCH v2] " Elijah Newren via GitGitGadget
  1 sibling, 1 reply; 6+ messages in thread
From: Jeff King @ 2025-03-14 17:24 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Sat, Mar 08, 2025 at 01:00:15AM +0000, Elijah Newren via GitGitGadget wrote:

> It turns out that making a testcase to trigger this is a bit challenging
> too.  I added a simple testcase which tickles the necessary area, but
> running it normally actually passes for me.  However, running it under
> valgrind shows that it is depending upon uninitialized memory.  I
> suspect that to get a reliable reproduction case, I might need to have
> several more paths involved, but that might make the testcase more
> difficult to understand.  So, I instead just embedded a warning within
> the testname that the test triggered uninitialized memory use.

I think it's OK for a test case to require extra memory checks to fail;
after all, these kinds of bugs are usually non-deterministic without
those checks anyway.

I did verify that it reproduces for me with "--valgrind". I was
surprised (and a little disappointed) that it doesn't seem to trigger
with ASan/UBSan. We do run those routinely in CI, but I doubt that
--valgrind gets used regularly for the whole test suite by anyone these
days, just because it's so much slower.

I'm puzzled, though, why the test case at the beginning of this
thread[1] yields the BUG() so readily, but your test case doesn't.

So maybe this is the best we can do, but it feels like we should be able
to at least trigger the existing BUG() reliably. I couldn't seem to
figure it out, though. :(

> In short, when these two rare options are used together, fix the
> accidental find of the wrong dst entry (which would often be
> uninitialized memory just past the end of the array), by adding a little
> more care around the recorded indices for break_idx.

Your description of the problem and the solution both seemed sensible to
me (though I'm not all that familiar with the ins and outs of the rename
code these days).

-Peff

[1] The simplest I came up with is:

      git clone --bare https://github.com/intel/linux-sgx.git tmp.git
      cd tmp.git
      git --no-pager log -B --follow 63d0e65cfa49bb46a8dbe8745bb15aaf226faa97 -- external/ippcp_internal/inc

    Curiously, that pathspec is actually a directory, but it only has a
    single file in it. Feeding the actual file in it _doesn't_ trigger
    the bug:

      git --no-pager log -B --follow 63d0e65cfa49bb46a8dbe8745bb15aaf226faa97 -- external/ippcp_internal/inc/ippcp20u3.patch

    That just shows the single commit (even though there is a single
    file before and after that commit, it is not similar enough to find
    a rename).

    The file that hits the break detection is unrelated. It looks like
    it's psw/ae/data/prebuilt/le_prod_css.bin.

    I tried variants with break files, subdirs, etc, but I couldn't seem
    to make anything work.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] diffcore-rename: fix BUG when break detection and --follow used together
  2025-03-14 17:24 ` Jeff King
@ 2025-03-14 22:28   ` Elijah Newren
  2025-03-17 17:51     ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Elijah Newren @ 2025-03-14 22:28 UTC (permalink / raw)
  To: Jeff King; +Cc: Elijah Newren via GitGitGadget, git

On Fri, Mar 14, 2025 at 10:24 AM Jeff King <peff@peff.net> wrote:
>
> On Sat, Mar 08, 2025 at 01:00:15AM +0000, Elijah Newren via GitGitGadget wrote:
>
> > It turns out that making a testcase to trigger this is a bit challenging
> > too.  I added a simple testcase which tickles the necessary area, but
> > running it normally actually passes for me.  However, running it under
> > valgrind shows that it is depending upon uninitialized memory.  I
> > suspect that to get a reliable reproduction case, I might need to have
> > several more paths involved, but that might make the testcase more
> > difficult to understand.  So, I instead just embedded a warning within
> > the testname that the test triggered uninitialized memory use.

Maybe I should have been a little more clear about "a bit
challenging".  I spent hours on it.  And I suspected that the only way
to trigger the reporter's particular manifestation of the issue was to
use an invalid command.  I didn't want to spend any more hours on it,
but...

> I think it's OK for a test case to require extra memory checks to fail;
> after all, these kinds of bugs are usually non-deterministic without
> those checks anyway.
>
> I did verify that it reproduces for me with "--valgrind". I was
> surprised (and a little disappointed) that it doesn't seem to trigger
> with ASan/UBSan. We do run those routinely in CI, but I doubt that
> --valgrind gets used regularly for the whole test suite by anyone these
> days, just because it's so much slower.
>
> I'm puzzled, though, why the test case at the beginning of this
> thread[1] yields the BUG() so readily, but your test case doesn't.
>
> So maybe this is the best we can do, but it feels like we should be able
> to at least trigger the existing BUG() reliably. I couldn't seem to
> figure it out, though. :(

So, after _another_ 7 hours or so on it today...  The BUG() the
reporter triggered only happens when there is no uninitialized memory
use, and is only triggerable when you use invalid flags.  For the
reporter, they passed a directory name for their pathspec along with
--follow, despite the fact that --follow only works when given a
single pathspec that names an individual file.  You can also trigger
their particular manifestation of the issue here when using a glob
pathspec together with --follow.  In either event, the --follow
becomes useless: when the follow logic checks whether filenames are
equal to the given pathspec to see if it might be a relevant rename,
no filename is exactly equal to the pathspec, so it never finds any
relevant files to follow or to include in the rename detection.  The
upshot is the command basically behaves the same as if you hadn't
given --follow, other than the fact that the presence of --follow
makes diffcore_rename throw away rename_dst pairs in a way that
happens to trigger this particular BUG().

The testcase I found is:
    seq 1 127 >numbers &&
    git add numbers &&
    git commit -m "numbers" &&

    printf "%s\n" A B C D E F G H I J K L M N O Q R S T U V W X Y Z >pool &&
    seq 1 10 >numbers &&
    git add pool numbers &&
    git commit -m "pool" &&

    git log -1 -B --raw --follow -- "p*"

The BUG will still be triggered if
  * you change the content of pool (you can even make it as similar as
you want to numbers)
  * you change the content of numbers in the second commit while
keeping it sufficiently different from the first commit
The BUG will _not_ be triggered if:
  * you change the log's pathspec to match numbers
  * you change the log's pathspec to not match pool
  * you change the log's pathspec to be exactly "pool"
  * you remove any of the three files (two versions of number and one
of pool) from the testcase
  * you reduce 127 (even if only to 126)
  * you make numbers from the second commit too similar to numbers
from the first commit

In all of these cases (and this is also true for the reporter's
original case), in locate_rename_dst(), idx will be computed as 0, but
rename_dst is NULL, so &rename_dst[idx] is NULL as well.

However, I think the fact that rename_dst == NULL implies
&rename_dst[0] == NULL should raise alarm bells about the risks of
using memory improperly, even if it doesn't directly use uninitialized
memory in this case.  Which brings us to the bigger more encompassing
issue, which is what I reported in the commit message:

> > In short, when these two rare options are used together, fix the
> > accidental find of the wrong dst entry (which would often be
> > uninitialized memory just past the end of the array), by adding a little
> > more care around the recorded indices for break_idx.

It's just the special case when rename_dst is empty and, in fact,
NULL, that you trigger the BUG() call.

> Your description of the problem and the solution both seemed sensible to
> me (though I'm not all that familiar with the ins and outs of the rename
> code these days).

Thanks.

>
> -Peff
>
> [1] The simplest I came up with is:
>
>       git clone --bare https://github.com/intel/linux-sgx.git tmp.git
>       cd tmp.git
>       git --no-pager log -B --follow 63d0e65cfa49bb46a8dbe8745bb15aaf226faa97 -- external/ippcp_internal/inc

Yeah, that was the same I was using last week.

>     Curiously, that pathspec is actually a directory, but it only has a
>     single file in it. Feeding the actual file in it _doesn't_ trigger
>     the bug:
>
>       git --no-pager log -B --follow 63d0e65cfa49bb46a8dbe8745bb15aaf226faa97 -- external/ippcp_internal/inc/ippcp20u3.patch

Yeah, I was doing the same thing last week when fixing this bug.
Sorry, I guess I should have mentioned this to avoid you attempting
the same.

>     That just shows the single commit (even though there is a single
>     file before and after that commit, it is not similar enough to find
>     a rename).
>
>     The file that hits the break detection is unrelated. It looks like
>     it's psw/ae/data/prebuilt/le_prod_css.bin.
>
>     I tried variants with break files, subdirs, etc, but I couldn't seem
>     to make anything work.

Maybe what I wrote above helps.  Is this enough information to satisfy
your curiosity?

I suspect adding this second test to the commit makes sense.  Which
parts of my explanation in this email would you like to see added to
the commit message as well, or is it fine as-is?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2] diffcore-rename: fix BUG when break detection and --follow used together
  2025-03-08  1:00 [PATCH] diffcore-rename: fix BUG when break detection and --follow used together Elijah Newren via GitGitGadget
  2025-03-14 17:24 ` Jeff King
@ 2025-03-15  1:08 ` Elijah Newren via GitGitGadget
  2025-03-17 17:52   ` Jeff King
  1 sibling, 1 reply; 6+ messages in thread
From: Elijah Newren via GitGitGadget @ 2025-03-15  1:08 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Elijah Newren, Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Prior to commit 9db2ac56168e (diffcore-rename: accelerate rename_dst
setup, 2020-12-11), the function add_rename_dst() resulted in quadratic
runtime since each call inserted the new entry into the array in sorted
order.  The reason for the sorted order requirement was so that
locate_rename_dst(), used when break detection is turned on, could find
the appropriate entry in logarithmic time via bisection on string
comparisons.  (It's better to be quadratic in moving pointers than
quadratic in string comparisons, so this made some sense.)  However,
since break detection always sticks the broken pairs adjacent to each
other, that commit decided to simply append entries to rename_dst, and
record the mapping of (filename) -> (index within rename_dst) via a
strintmap.  Doing this relied on the fact that when adding the source of
a broken pair via register_rename_src(), that the next item we'd process
was the other half of the same broken pair and would be added to
rename_dst via add_rename_dst().  This assumption was fine under break
detection alone, but the combination of break detection and
single_follow violated that assumption because of this code:

		else if (options->single_follow &&
			 strcmp(options->single_follow, p->two->path))
			continue; /* not interested */

which would end up skipping calling add_rename_dst() below that point.
Since I knew I was assuming that the dst pair of a break would always be
added right after the src pair of a break, I added a new BUG() directive
as part of that commit later on at time of use that would check my
assumptions held.  That BUG() didn't trip for nearly 4 years...which
sadly meant I had long since forgotten the related details.  Anyway...

When the dst half of a broken pair is skipped like this, it means that
not only could my recorded index be invalid (just past the end of the
array), it could also point to some unrelated dst that just happened to
be the next one added to the array.  So, to fix this, we need to add a
little more safety around the checks for the recorded break_idx.

It turns out that making a testcase to trigger this is quite the
challenge.  I actually added two testscases:
  * One testcase which uses --follow incorrectly (it uses its single
    pathspec to specifying something other than a single filename), and
    which triggers the same bug reported-by Olaf.  This triggers a
    special case within locate_rename_dst() where idx evaluates to 0
    and rename_dst is NULL, meaning that our return value of
    &rename_dst[idx] happens to evaluate to NULL as well.  This
    addressing of an index into a NULL array hints at deeper problems,
    which are raised in the next testcase...
  * A second testcase which when run under valgrind shows that the code
    actually depends upon unintialized memory, in particular the entry
    just after the end of the rename_dst array.

In short, when the two rare options -B and --follow are used together,
fix the accidental find of the wrong dst entry (which would often be
uninitialized memory just past the end of the array, but also could
have just been a dst for an unrelated path if no dst was recorded for
the expected path).  Do so by adding a little more care around checking
the recorded indices in break_idx.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
    diffcore-rename: fix BUG when break detection and --follow used together
    
    Bug dates back to Git v2.31.0, and was discovered and reported about
    four years later over at
    https://lore.kernel.org/git/20240920112228.3d1130f5.olaf@aepfle.de/.
    Sadly, took me about half a year to get back to this one...
    
    Changes since v1:
    
     * Added a testcase, and extended the commit message slightly

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1876%2Fnewren%2Ffix-break-follow-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1876/newren/fix-break-follow-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1876

Range-diff vs v1:

 1:  e14c1193905 ! 1:  a68e7fe19f7 diffcore-rename: fix BUG when break detection and --follow used together
     @@ Commit message
          be the next one added to the array.  So, to fix this, we need to add a
          little more safety around the checks for the recorded break_idx.
      
     -    It turns out that making a testcase to trigger this is a bit challenging
     -    too.  I added a simple testcase which tickles the necessary area, but
     -    running it normally actually passes for me.  However, running it under
     -    valgrind shows that it is depending upon uninitialized memory.  I
     -    suspect that to get a reliable reproduction case, I might need to have
     -    several more paths involved, but that might make the testcase more
     -    difficult to understand.  So, I instead just embedded a warning within
     -    the testname that the test triggered uninitialized memory use.
     +    It turns out that making a testcase to trigger this is quite the
     +    challenge.  I actually added two testscases:
     +      * One testcase which uses --follow incorrectly (it uses its single
     +        pathspec to specifying something other than a single filename), and
     +        which triggers the same bug reported-by Olaf.  This triggers a
     +        special case within locate_rename_dst() where idx evaluates to 0
     +        and rename_dst is NULL, meaning that our return value of
     +        &rename_dst[idx] happens to evaluate to NULL as well.  This
     +        addressing of an index into a NULL array hints at deeper problems,
     +        which are raised in the next testcase...
     +      * A second testcase which when run under valgrind shows that the code
     +        actually depends upon unintialized memory, in particular the entry
     +        just after the end of the rename_dst array.
      
     -    In short, when these two rare options are used together, fix the
     -    accidental find of the wrong dst entry (which would often be
     -    uninitialized memory just past the end of the array), by adding a little
     -    more care around the recorded indices for break_idx.
     +    In short, when the two rare options -B and --follow are used together,
     +    fix the accidental find of the wrong dst entry (which would often be
     +    uninitialized memory just past the end of the array, but also could
     +    have just been a dst for an unrelated path if no dst was recorded for
     +    the expected path).  Do so by adding a little more care around checking
     +    the recorded indices in break_idx.
      
          Reported-by: Olaf Hering <olaf@aepfle.de>
          Signed-off-by: Elijah Newren <newren@gmail.com>
     @@ t/t4206-log-follow-harder-copies.sh: test_expect_success 'validate the output.'
       	compare_diff_patch current expected
       '
       
     -+test_expect_success 'log --follow -B does not die or use uninitialized memory' '
     ++test_expect_success 'log --follow -B does not BUG' '
      +	git switch --orphan break_and_follow_are_icky_so_use_both &&
     ++
     ++	test_seq 1 127 >numbers &&
     ++	git add numbers &&
     ++	git commit -m "numbers" &&
     ++
     ++	printf "%s\n" A B C D E F G H I J K L M N O Q R S T U V W X Y Z >pool &&
     ++	echo changed >numbers &&
     ++	git add pool numbers &&
     ++	git commit -m "pool" &&
     ++
     ++	git log -1 -B --raw --follow -- "p*"
     ++'
     ++
     ++test_expect_success 'log --follow -B does not die or use uninitialized memory' '
      +	printf "%s\n" A B C D E F G H I J K L M N O P Q R S T U V W X Y Z >z &&
      +	git add z &&
      +	git commit -m "Initial" &&


 diffcore-rename.c                   |  9 ++++----
 t/t4206-log-follow-harder-copies.sh | 32 +++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 10bb0321b10..cb4be5be63c 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -33,7 +33,7 @@ static struct diff_rename_dst *locate_rename_dst(struct diff_filepair *p)
 {
 	/* Lookup by p->ONE->path */
 	int idx = break_idx ? strintmap_get(break_idx, p->one->path) : -1;
-	return (idx == -1) ? NULL : &rename_dst[idx];
+	return (idx == -1 || idx == rename_dst_nr) ? NULL : &rename_dst[idx];
 }
 
 /*
@@ -1668,9 +1668,10 @@ void diffcore_rename_extended(struct diff_options *options,
 			if (DIFF_PAIR_BROKEN(p)) {
 				/* broken delete */
 				struct diff_rename_dst *dst = locate_rename_dst(p);
-				if (!dst)
-					BUG("tracking failed somehow; failed to find associated dst for broken pair");
-				if (dst->is_rename)
+				if (options->single_follow && dst &&
+				    strcmp(dst->p->two->path, p->two->path))
+					dst = NULL;
+				if (dst && dst->is_rename)
 					/* counterpart is now rename/copy */
 					pair_to_free = p;
 			}
diff --git a/t/t4206-log-follow-harder-copies.sh b/t/t4206-log-follow-harder-copies.sh
index bcab71c8e84..190c4843211 100755
--- a/t/t4206-log-follow-harder-copies.sh
+++ b/t/t4206-log-follow-harder-copies.sh
@@ -54,4 +54,36 @@ test_expect_success 'validate the output.' '
 	compare_diff_patch current expected
 '
 
+test_expect_success 'log --follow -B does not BUG' '
+	git switch --orphan break_and_follow_are_icky_so_use_both &&
+
+	test_seq 1 127 >numbers &&
+	git add numbers &&
+	git commit -m "numbers" &&
+
+	printf "%s\n" A B C D E F G H I J K L M N O Q R S T U V W X Y Z >pool &&
+	echo changed >numbers &&
+	git add pool numbers &&
+	git commit -m "pool" &&
+
+	git log -1 -B --raw --follow -- "p*"
+'
+
+test_expect_success 'log --follow -B does not die or use uninitialized memory' '
+	printf "%s\n" A B C D E F G H I J K L M N O P Q R S T U V W X Y Z >z &&
+	git add z &&
+	git commit -m "Initial" &&
+
+	test_seq 1 130 >z &&
+	echo lame >somefile &&
+	git add z somefile &&
+	git commit -m "Rewrite z, introduce lame somefile" &&
+
+	echo Content >somefile &&
+	git add somefile &&
+	git commit -m "Rewrite somefile" &&
+
+	git log -B --follow somefile
+'
+
 test_done

base-commit: f93ff170b93a1782659637824b25923245ac9dd1
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] diffcore-rename: fix BUG when break detection and --follow used together
  2025-03-14 22:28   ` Elijah Newren
@ 2025-03-17 17:51     ` Jeff King
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2025-03-17 17:51 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Elijah Newren via GitGitGadget, git

On Fri, Mar 14, 2025 at 03:28:09PM -0700, Elijah Newren wrote:

> > So maybe this is the best we can do, but it feels like we should be able
> > to at least trigger the existing BUG() reliably. I couldn't seem to
> > figure it out, though. :(
> 
> So, after _another_ 7 hours or so on it today...  The BUG() the
> reporter triggered only happens when there is no uninitialized memory
> use, and is only triggerable when you use invalid flags.  For the
> reporter, they passed a directory name for their pathspec along with
> --follow, despite the fact that --follow only works when given a
> single pathspec that names an individual file.  You can also trigger
> their particular manifestation of the issue here when using a glob
> pathspec together with --follow.  In either event, the --follow
> becomes useless: when the follow logic checks whether filenames are
> equal to the given pathspec to see if it might be a relevant rename,
> no filename is exactly equal to the pathspec, so it never finds any
> relevant files to follow or to include in the rename detection.  The
> upshot is the command basically behaves the same as if you hadn't
> given --follow, other than the fact that the presence of --follow
> makes diffcore_rename throw away rename_dst pairs in a way that
> happens to trigger this particular BUG().

OK, that all makes sense (and what I was trying to reduce the case, too,
but somehow it eluded me).

Sorry, I didn't mean for you to spend all day on it. I was just
wondering if there was low-hanging fruit we could convince to trigger
with ASan (which would much better protect against regression). It does
look like we at least got some benefit, though. ;)

> The testcase I found is:
>     seq 1 127 >numbers &&
>     git add numbers &&
>     git commit -m "numbers" &&
> 
>     printf "%s\n" A B C D E F G H I J K L M N O Q R S T U V W X Y Z >pool &&
>     seq 1 10 >numbers &&
>     git add pool numbers &&
>     git commit -m "pool" &&
> 
>     git log -1 -B --raw --follow -- "p*"

Yeah, that's similar to what I was trying, but I didn't try the globbing
pathspec. I stuck everything in a subdirectory and use subdir/. But I
think I got caught up on...

> The BUG will _not_ be triggered if:
>   * you change the log's pathspec to match numbers

I put both files into the subdir/, so they both matched.

> In all of these cases (and this is also true for the reporter's
> original case), in locate_rename_dst(), idx will be computed as 0, but
> rename_dst is NULL, so &rename_dst[idx] is NULL as well.

Makes sense.

> However, I think the fact that rename_dst == NULL implies
> &rename_dst[0] == NULL should raise alarm bells about the risks of
> using memory improperly, even if it doesn't directly use uninitialized
> memory in this case.  Which brings us to the bigger more encompassing
> issue, which is what I reported in the commit message:
> 
> > > In short, when these two rare options are used together, fix the
> > > accidental find of the wrong dst entry (which would often be
> > > uninitialized memory just past the end of the array), by adding a little
> > > more care around the recorded indices for break_idx.
> 
> It's just the special case when rename_dst is empty and, in fact,
> NULL, that you trigger the BUG() call.

Oh, absolutely. The uninitialized memory is the bigger problem, and the
fact that it BUG()-ed at all was mostly lucky.

> Maybe what I wrote above helps.  Is this enough information to satisfy
> your curiosity?

Definitely.

> I suspect adding this second test to the commit makes sense.  Which
> parts of my explanation in this email would you like to see added to
> the commit message as well, or is it fine as-is?

I read over your v2, and I think the explanation there is good. In the
long run we might eventually disallow --follow on the glob like this
(since as you note, it's invalid and doesn't actually work). In which
case I guess we'd end up deleting that test case. But in the meantime, I
think there's some benefit to having it.

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] diffcore-rename: fix BUG when break detection and --follow used together
  2025-03-15  1:08 ` [PATCH v2] " Elijah Newren via GitGitGadget
@ 2025-03-17 17:52   ` Jeff King
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2025-03-17 17:52 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget; +Cc: git, Elijah Newren

On Sat, Mar 15, 2025 at 01:08:13AM +0000, Elijah Newren via GitGitGadget wrote:

>     Changes since v1:
>     
>      * Added a testcase, and extended the commit message slightly

I already left a bigger response in the thread on v1, but I wanted to
add here that this version looks great to me. Thanks!

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-03-17 17:52 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-08  1:00 [PATCH] diffcore-rename: fix BUG when break detection and --follow used together Elijah Newren via GitGitGadget
2025-03-14 17:24 ` Jeff King
2025-03-14 22:28   ` Elijah Newren
2025-03-17 17:51     ` Jeff King
2025-03-15  1:08 ` [PATCH v2] " Elijah Newren via GitGitGadget
2025-03-17 17:52   ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).