git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] diff: avoid segfault with freed entries
@ 2025-12-29 21:44 Derrick Stolee via GitGitGadget
  2025-12-30  4:38 ` Junio C Hamano
  2025-12-30 16:11 ` Kristoffer Haugsbakk
  0 siblings, 2 replies; 3+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2025-12-29 21:44 UTC (permalink / raw)
  To: git; +Cc: gitster, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <stolee@gmail.com>

When computing a diff in a partial clone, there is a chance that we
could trigger a prefetch of missing objects at the same time as we are
freeing entries from the global diff queue. This is difficult to
reproduce, as we need to have some objects be freed from the queue
before triggering the prefetch of missing objects. There is a new test
in t4067 that does trigger the segmentation fault that results in this
case.

The fix is to set the queue pointer to NULL after it is freed, and then
to be careful about NULL values in the prefetch.

The more elaborate explanation is that within diffcore_std(), we may
skip the initial prefetch due to the output format (--name-only in the
test) and go straight to diffcore_skip_stat_unmatch(). In that method,
the index entries that have been invalidated by path changes show up as
entries but may be deleted because they are not actually content diffs
and only newer timestamps than expected. As those entries are deleted,
later entries are checked with diff_filespec_check_stat_unmatch(), which
uses diff_queued_diff_prefetch() as the missing_object_cb in its diff
options. That can trigger downloading missing objects if the appropriate
scenario occurs to trigger a call to diff_popoulate_filespec(). It's
finally within that callback to diff_queued_diff_prefetch() that the
segfault occurs.

The test was hard to find because it required some real differences,
some not-different files that had a newer modified time, and the order
of those files alphabetically was important to trigger the deletion
before the prefetch was triggered.

I briefly considered a "lock" member for the diff queue, but it was a
much larger diff and introduced many more possible error scenarios.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
---
    diff: avoid segfault with freed entries
    
    I found this segfault in the wild in a pipeline that was using the
    microsoft/git fork, but the error can be reproduced without that fork.
    It requires partial clone, though.
    
    Thanks, -Stolee

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2027%2Fderrickstolee%2Fdiff-segfault-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2027/derrickstolee/diff-segfault-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/2027

 diff.c                        |  5 +++++
 t/t4067-diff-partial-clone.sh | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/diff.c b/diff.c
index a1961526c0..72661d635c 100644
--- a/diff.c
+++ b/diff.c
@@ -7046,6 +7046,7 @@ static void diffcore_skip_stat_unmatch(struct diff_options *diffopt)
 			if (!diffopt->flags.no_index)
 				diffopt->skip_stat_unmatch++;
 			diff_free_filepair(p);
+			q->queue[i] = NULL;
 		}
 	}
 	free(q->queue);
@@ -7089,6 +7090,10 @@ void diff_queued_diff_prefetch(void *repository)
 
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
+
+		if (!p)
+			continue;
+
 		diff_add_if_missing(repo, &to_fetch, p->one);
 		diff_add_if_missing(repo, &to_fetch, p->two);
 	}
diff --git a/t/t4067-diff-partial-clone.sh b/t/t4067-diff-partial-clone.sh
index 581250dd2d..72f25de449 100755
--- a/t/t4067-diff-partial-clone.sh
+++ b/t/t4067-diff-partial-clone.sh
@@ -132,6 +132,41 @@ test_expect_success 'diff with rename detection batches blobs' '
 	test_line_count = 1 done_lines
 '
 
+test_expect_success 'diff succeeds even if entries are removed from queue' '
+	test_when_finished "rm -rf server client trace" &&
+
+	test_create_repo server &&
+	for l in a c e g i p
+	do
+		echo $l >server/$l &&
+		git -C server add $l || return 1
+	done &&
+	git -C server commit -m x &&
+
+	for l in a e i
+	do
+		git -C server rm $l || return 1
+	done &&
+
+	for l in b d f i
+		do
+		echo $l$l >server/$l &&
+		git -C server add $l || return 1
+	done &&
+	git -C server commit -a -m x &&
+
+	test_config -C server uploadpack.allowfilter 1 &&
+	test_config -C server uploadpack.allowanysha1inwant 1 &&
+	git clone --filter=blob:limit=0 "file://$(pwd)/server" client &&
+
+	for file in $(ls client)
+	do
+		cat client/$file >$file &&
+		mv $file client/$file || return 1
+	done &&
+	git -C client diff --name-only --relative HEAD^
+'
+
 test_expect_success 'diff does not fetch anything if inexact rename detection is not needed' '
 	test_when_finished "rm -rf server client trace" &&
 

base-commit: 9a2fb147f2c61d0cab52c883e7e26f5b7948e3ed
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] diff: avoid segfault with freed entries
  2025-12-29 21:44 [PATCH] diff: avoid segfault with freed entries Derrick Stolee via GitGitGadget
@ 2025-12-30  4:38 ` Junio C Hamano
  2025-12-30 16:11 ` Kristoffer Haugsbakk
  1 sibling, 0 replies; 3+ messages in thread
From: Junio C Hamano @ 2025-12-30  4:38 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> The more elaborate explanation is that within diffcore_std(), we may
> skip the initial prefetch due to the output format (--name-only in the
> test) and go straight to diffcore_skip_stat_unmatch().

That's very interesting.  We have code to fetch on-demand when it
turns out that the initial prefetch shouldn't have been skipped and
we need contents, so in that sense, the condition to skip the
initial prefetch does not have to be precise, but we may want to see
if we can have a single helper function that exactly tells us if we
need to look at the contents or not.  I think we had a few changes
that made the definition of "diff status is based on contents" in
the past few releases, not for the purpose of this prefetch skipping
but to set the exit status.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] diff: avoid segfault with freed entries
  2025-12-29 21:44 [PATCH] diff: avoid segfault with freed entries Derrick Stolee via GitGitGadget
  2025-12-30  4:38 ` Junio C Hamano
@ 2025-12-30 16:11 ` Kristoffer Haugsbakk
  1 sibling, 0 replies; 3+ messages in thread
From: Kristoffer Haugsbakk @ 2025-12-30 16:11 UTC (permalink / raw)
  To: Josh Soref, git; +Cc: Junio C Hamano, Derrick Stolee

On Mon, Dec 29, 2025, at 22:44, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <stolee@gmail.com>
>
>[snip]
> The more elaborate explanation is that within diffcore_std(), we may
> skip the initial prefetch due to the output format (--name-only in the
> test) and go straight to diffcore_skip_stat_unmatch(). In that method,
> the index entries that have been invalidated by path changes show up as
> entries but may be deleted because they are not actually content diffs
> and only newer timestamps than expected. As those entries are deleted,
> later entries are checked with diff_filespec_check_stat_unmatch(), which
> uses diff_queued_diff_prefetch() as the missing_object_cb in its diff
> options. That can trigger downloading missing objects if the appropriate
> scenario occurs to trigger a call to diff_popoulate_filespec(). It's

s/diff_popoulate_filespec/diff_populate_filespec/

> finally within that callback to diff_queued_diff_prefetch() that the
> segfault occurs.
>
>[snip]

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2025-12-30 16:11 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-29 21:44 [PATCH] diff: avoid segfault with freed entries Derrick Stolee via GitGitGadget
2025-12-30  4:38 ` Junio C Hamano
2025-12-30 16:11 ` Kristoffer Haugsbakk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).