public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Han Young <hanyang.tony@bytedance.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH v2 1/1] diffcore-break: prevent dangling pointer
Date: Thu, 12 Feb 2026 10:58:14 -0800	[thread overview]
Message-ID: <xmqqseb5okl5.fsf@gitster.g> (raw)
In-Reply-To: <20260212072002.2347-2-hanyang.tony@bytedance.com> (Han Young's message of "Thu, 12 Feb 2026 15:20:02 +0800")

Han Young <hanyang.tony@bytedance.com> writes:

> After we have freed the file pair, we should set the queue reference to null.
> This prevents us from encountering a dangling pointer later on.

I sense that "This prevents ... later on" needs further be
clarified, since it is totally unclear what "later on" refers to.
We are done with the old filepair, and have no reason to revisit the
q->queue[] item ourselves, but somebody later attempts to use it.
Who is it and why does it do so?  That is a natural question readers
of the above description would ask, isn't it?

    Side note: I am guessiong that this is similar to the problem
    fixed with recent 56d388e6 (diff: avoid segfault with freed
    entries, 2025-12-29), where pointers in diff_queue->queue[] that
    point at file pairs that have been freed were mistakenly used by
    diff_queued_diff_prefetch() to populate them.  Is a prefetch
    happening in "git reset --mixed" that calls read_from_tree()
    which calls update_index_from_diff() via do_diff_cache()?

> The test uses git reset to trigger prefetching after break-rewrites have freed
> the file pair.
>
> Signed-off-by: Han Young <hanyang.tony@bytedance.com>
> ---
>  diffcore-break.c              |  1 +
>  t/t4067-diff-partial-clone.sh | 30 ++++++++++++++++++++++++++++++
>  2 files changed, 31 insertions(+)
>
> diff --git a/diffcore-break.c b/diffcore-break.c
> index c4c2173f30..9b11fe2fa0 100644
> --- a/diffcore-break.c
> +++ b/diffcore-break.c
> @@ -222,6 +222,7 @@ void diffcore_break(struct repository *r, int break_score)
>  				free(p); /* not diff_free_filepair(), we are
>  					  * reusing one and two here.
>  					  */
> +				q->queue[i] = NULL;
>  				continue;
>  			}
>  		}
> diff --git a/t/t4067-diff-partial-clone.sh b/t/t4067-diff-partial-clone.sh
> index 72f25de449..a980cd30a0 100755
> --- a/t/t4067-diff-partial-clone.sh
> +++ b/t/t4067-diff-partial-clone.sh
> @@ -132,6 +132,36 @@ test_expect_success 'diff with rename detection batches blobs' '
>  	test_line_count = 1 done_lines
>  '
>  
> +test_expect_success 'diff succeeds even if prefetch triggered by break-rewrites' '
> +	test_when_finished "rm -rf server client trace" &&
> +
> +	test_create_repo server &&
> +	echo xyz >server/foo &&
> +	mkdir server/bar &&
> +	test_seq -f "line %d" 1 100 >server/bar/baz &&
> +	git -C server add -A &&
> +	git -C server commit -m x &&
> +
> +
> +	echo xyzz >server/foo &&

The blank line above does not have to be doubled, I think.  So the
first commit yas "xyz" in "foo", and 100 lines 1..100 in "bar/baz".

> +	rm server/bar/baz &&

We are overwriting it, so I am not sure why this "rm" is needed.  Is
it necessary to avoid reusing the same i-num for the file to avoid
racily clean condition, or something?  I find it unlikely because
the length of the new contents ...

> +	test_seq -f "line %d" 90 190 >server/bar/baz &&

... is different from the original.

> +	git -C server add -A &&
> +	git -C server commit -m x &&

In any case, the second commit has "xyzz" in "foo", and different
100 lines 90..190 in "bar/baz".

> +	test_config -C server uploadpack.allowfilter 1 &&
> +	test_config -C server uploadpack.allowanysha1inwant 1 &&
> +	git clone --filter=blob:limit=0 "file://$(pwd)/server" client &&

And we grab both commits and their trees, but not any blob.

> +	# Fetch bar/baz without fetching foo.
> +	git -C client checkout HEAD~1 bar &&

And we move "bar/baz" back to the 1..100 (the tip of the history
immediately after cloning had 90..190).

> +	# Ensure baz has diff
> +	git -C client reset --hard HEAD &&

I am not sure what the comment wants to say.  Before this hard
reset, we did have modification relative to HEAD in bar/baz; with a
hard reset, we are ensuring that everything including bar/baz
exactly match HEAD, aren't we?

> +	# reset's break-rewrites detection will trigger prefetch

"reset's break-rewrites detection" -> "break-rewrites detction in reset"
or something to avoid the "'"; otherwise you'd get

    error: bug in the test script: not 2 or 3 parameters to test-expect-success

You rewrote this line as a part of the last-minute change before you
ran the test for the last time, or something?

Anyway, starting from the "everything clean wrt HEAD" state, we try
to move to the HEAD~1.  We already have both 1..100 and 90..190
versions of bar/baz (and HEAD: and HEAD~1: trees), and we have the
contents for foo in HEAD, which is "xyzz", but we have never seen
HEAD~1:foo so far.  "reset --mixed HEAD~1" requires us to obtain
it...

> +	git -C client reset HEAD~1

... and cause us to run the prefetch to obtain "foo", but it runs
do_diff_cache() and makes it notice bar/baz has changed too much?

Your "do not leave q->queue[] dangling, as other people may still
look at them" fix certainly is a good hygiene, but I have to wonder
why we are doing break detection in this case in the first place.
For the internal "Let's figure out which path have changed, so that
we re-read only those changed paths" invocation of diff machinery,
we should not be doing so.  A break detection is to see if the
change in the contents of a single path is a total rewrite, and
regardless of the answer, the fact that the path was modified does
not change, update_index_from_diff() would work on the path anyway.
I also suspect that, if we are doing rename detection in this call
to do_diff_cache(), it is a totally wasted effort.  We may want to
take a deeper look at it, possibly outside the theme of this more
focused fix.

> +'
> +
>  test_expect_success 'diff succeeds even if entries are removed from queue' '
>  	test_when_finished "rm -rf server client trace" &&

By the way, I find it highly curious that with the following patch
to revert the fix with a bit of extra output sprinkled to your
tests, the problem does not reproduce reliably, which may indicate
that your test may be flaky (i.e., timing dependent).  Am I doing
something bogus in the patch?

 * revert of the fix in diffcore-break.c is a deliberate attempt to
   reproduce the problem, and "echo/cd client/ls" are for inspection
   so you should not take it in your final version.

 * removal of double blank lines is a style fix I'd want you to take.

 * "reset's" fix is a shell script syntax fix I'd want you to take.

Thanks.


diff --git c/diffcore-break.c w/diffcore-break.c
index 9b11fe2fa0..c4c2173f30 100644
--- c/diffcore-break.c
+++ w/diffcore-break.c
@@ -222,7 +222,6 @@ void diffcore_break(struct repository *r, int break_score)
 				free(p); /* not diff_free_filepair(), we are
 					  * reusing one and two here.
 					  */
-				q->queue[i] = NULL;
 				continue;
 			}
 		}
diff --git c/t/t4067-diff-partial-clone.sh w/t/t4067-diff-partial-clone.sh
index a980cd30a0..b5a35ded99 100755
--- c/t/t4067-diff-partial-clone.sh
+++ w/t/t4067-diff-partial-clone.sh
@@ -142,7 +142,6 @@ test_expect_success 'diff succeeds even if prefetch triggered by break-rewrites'
 	git -C server add -A &&
 	git -C server commit -m x &&
 
-
 	echo xyzz >server/foo &&
 	rm server/bar/baz &&
 	test_seq -f "line %d" 90 190 >server/bar/baz &&
@@ -153,12 +152,22 @@ test_expect_success 'diff succeeds even if prefetch triggered by break-rewrites'
 	test_config -C server uploadpack.allowanysha1inwant 1 &&
 	git clone --filter=blob:limit=0 "file://$(pwd)/server" client &&
 
+echo after clone &&
+(cd client && ls -l foo bar/baz) &&
+
 	# Fetch bar/baz without fetching foo.
 	git -C client checkout HEAD~1 bar &&
+
+echo after checkout HEAD~1 bar &&
+(cd client && ls -l foo bar/baz) &&
+
 	# Ensure baz has diff
 	git -C client reset --hard HEAD &&
 
-	# reset's break-rewrites detection will trigger prefetch
+echo after reset --hard HEAD &&
+(cd client && ls -l foo bar/baz) &&
+
+	# break-rewrites detection in reset will trigger prefetch
 	git -C client reset HEAD~1
 '
 

  reply	other threads:[~2026-02-12 18:58 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-11  4:11 [PATCH 0/1] diffcore-break: prevent dangling pointer Han Young
2026-02-11  4:11 ` [PATCH 1/1] " Han Young
2026-02-11 17:54   ` Junio C Hamano
2026-02-12  7:20 ` [PATCH v2 0/1] " Han Young
2026-02-12  7:20   ` [PATCH v2 1/1] " Han Young
2026-02-12 18:58     ` Junio C Hamano [this message]
2026-02-13  7:14       ` [External] " Han Young
2026-02-13 17:16         ` Junio C Hamano
2026-02-24  6:13   ` [PATCH v3 0/1] diffcore-break: avoid segfault with freed entries Han Young
2026-02-24  6:13     ` [PATCH v3 1/1] " Han Young
2026-02-24 15:22     ` [PATCH v3 0/1] " Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqseb5okl5.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=hanyang.tony@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox