public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] promisor-remote: prevent lazy-fetch recursion in child fetch
@ 2026-03-04 16:57 Paul Tarjan via GitGitGadget
  2026-03-04 17:41 ` Junio C Hamano
  2026-03-04 18:27 ` [PATCH v2] " Paul Tarjan via GitGitGadget
  0 siblings, 2 replies; 11+ messages in thread
From: Paul Tarjan via GitGitGadget @ 2026-03-04 16:57 UTC (permalink / raw)
  To: git; +Cc: Paul Tarjan, Paul Tarjan

From: Paul Tarjan <github@paulisageek.com>

fetch_objects() spawns a child `git fetch` to lazily fill in missing
objects. That child's index-pack, when it receives a thin pack
containing a REF_DELTA against a still-missing base, explicitly
calls promisor_remote_get_direct() — which is fetch_objects() again.
If the base is truly unavailable (e.g. because many refs in the
local store point at objects that have been garbage-collected on the
server), each recursive lazy-fetch can trigger another, leading to
unbounded recursion with runaway disk and process consumption.

The GIT_NO_LAZY_FETCH guard (introduced by e6d5479e7a (git: add
--no-lazy-fetch option, 2021-08-31)) already exists at the top of
fetch_objects(); the missing piece is propagating it into the child
fetch's environment. Add that propagation so the child's
index-pack, if it encounters a REF_DELTA against a missing base,
hits the guard and fails fast instead of recursing.

Depth-1 lazy fetch (the whole point of fetch_objects()) is
unaffected: only the child and its descendants see the variable.
With negotiationAlgorithm=noop the client advertises no "have"
lines, so a well-behaved server sends requested objects
un-deltified or deltified only against objects in the same pack;
the child's index-pack should never need a depth-2 fetch. If it
does, the server response was broken or the local store is already
corrupt, and further fetching would not help.

This is the same bug shape that 3a1ea94a49 (commit-graph.c: no lazy
fetch in lookup_commit_in_graph(), 2022-07-01) addressed at a
different entry point.

Add a test that verifies the child fetch environment contains
GIT_NO_LAZY_FETCH=1 via a reference-transaction hook, and that
only one fetch subprocess is spawned.

Cc: Jonathan Tan <jonathantanmy@google.com>
Cc: Han Xin <hanxin.hx@bytedance.com>
Cc: Jeff Hostetler <jeffhostetler@github.com>
Cc: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Paul Tarjan <github@paulisageek.com>
---
    promisor-remote: prevent recursive lazy-fetch during index-pack
    
    fetch_objects() in promisor-remote.c spawns a child git fetch to lazily
    fill missing objects. That child's index-pack --fix-thin, when it hits a
    REF_DELTA against a still-missing base, calls
    promisor_remote_get_direct() — which is fetch_objects() again. Unbounded
    recursion.
    
    We hit this in production: 276 GB of promisor packs written in 90
    minutes against a 100 GB monorepo with ~61K stale prefetch refs pointing
    at GC'd commits. Every thin pack picked a bad delta base, and the
    recursion fanned out until the mount filled.
    
    The fix is one line: propagate GIT_NO_LAZY_FETCH=1 into the child
    fetch's environment. The guard already exists at the top of
    fetch_objects() (added by e6d5479e7a, 2021); nothing was setting it in
    the child. This is the same bug shape that Han Xin's 3a1ea94a49 (2022)
    closed at lookup_commit_in_graph().
    
    Depth-1 lazy fetch (the whole point of fetch_objects()) is unaffected —
    only the child and its descendants see the variable. With
    negotiationAlgorithm=noop the client advertises no "have" lines, so a
    well-behaved server sends objects un-deltified or deltified only against
    objects in the same pack. A depth-2 fetch would not help; if the server
    sends broken thin packs, recursing just makes it worse.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-2224%2Fptarjan%2Fclaude%2Ffix-lazy-fetch-recursion-KP9Hl-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-2224/ptarjan/claude/fix-lazy-fetch-recursion-KP9Hl-v1
Pull-Request: https://github.com/git/git/pull/2224

 promisor-remote.c                           |  7 +++
 t/meson.build                               |  1 +
 t/t0412-promisor-no-lazy-fetch-recursion.sh | 49 +++++++++++++++++++++
 3 files changed, 57 insertions(+)
 create mode 100755 t/t0412-promisor-no-lazy-fetch-recursion.sh

diff --git a/promisor-remote.c b/promisor-remote.c
index 96fa215b06..35c7aab93d 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -42,6 +42,13 @@ static int fetch_objects(struct repository *repo,
 	child.in = -1;
 	if (repo != the_repository)
 		prepare_other_repo_env(&child.env, repo->gitdir);
+	/*
+	 * Prevent the child's index-pack from recursing back into
+	 * fetch_objects() when resolving REF_DELTA bases it does not
+	 * have.  With noop negotiation the server should never need
+	 * to send such deltas, so a depth-2 fetch would not help.
+	 */
+	strvec_pushf(&child.env, "%s=1", NO_LAZY_FETCH_ENVIRONMENT);
 	strvec_pushl(&child.args, "-c", "fetch.negotiationAlgorithm=noop",
 		     "fetch", remote_name, "--no-tags",
 		     "--no-write-fetch-head", "--recurse-submodules=no",
diff --git a/t/meson.build b/t/meson.build
index e5174ee575..0499533dff 100644
--- a/t/meson.build
+++ b/t/meson.build
@@ -141,6 +141,7 @@ integration_tests = [
   't0303-credential-external.sh',
   't0410-partial-clone.sh',
   't0411-clone-from-partial.sh',
+  't0412-promisor-no-lazy-fetch-recursion.sh',
   't0450-txt-doc-vs-help.sh',
   't0500-progress-display.sh',
   't0600-reffiles-backend.sh',
diff --git a/t/t0412-promisor-no-lazy-fetch-recursion.sh b/t/t0412-promisor-no-lazy-fetch-recursion.sh
new file mode 100755
index 0000000000..ec203543d4
--- /dev/null
+++ b/t/t0412-promisor-no-lazy-fetch-recursion.sh
@@ -0,0 +1,49 @@
+#!/bin/sh
+
+test_description='promisor-remote: no recursive lazy-fetch
+
+Verify that fetch_objects() sets GIT_NO_LAZY_FETCH=1 in the child
+fetch environment, so that index-pack cannot recurse back into
+fetch_objects() when resolving REF_DELTA bases.
+'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+	test_create_repo server &&
+	test_commit -C server foo &&
+	git -C server repack -a -d --write-bitmap-index &&
+
+	git clone "file://$(pwd)/server" client &&
+	HASH=$(git -C client rev-parse foo) &&
+	rm -rf client/.git/objects/* &&
+
+	git -C client config core.repositoryformatversion 1 &&
+	git -C client config extensions.partialclone "origin"
+'
+
+test_expect_success 'lazy-fetch spawns only one fetch subprocess' '
+	GIT_TRACE="$(pwd)/trace" git -C client cat-file -p "$HASH" &&
+
+	grep "git fetch" trace >fetches &&
+	test_line_count = 1 fetches
+'
+
+test_expect_success 'child of lazy-fetch has GIT_NO_LAZY_FETCH=1' '
+	rm -rf client/.git/objects/* &&
+
+	# Install a reference-transaction hook to record the env var
+	# as seen by processes inside the child fetch.
+	test_hook -C client reference-transaction <<-\EOF &&
+	echo "$GIT_NO_LAZY_FETCH" >>../env-in-child
+	EOF
+
+	rm -f env-in-child &&
+	git -C client cat-file -p "$HASH" &&
+
+	# The hook runs inside the child fetch, which should have
+	# GIT_NO_LAZY_FETCH=1 in its environment.
+	grep "^1$" env-in-child
+'
+
+test_done

base-commit: 7b2bccb0d58d4f24705bf985de1f4612e4cf06e5
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2026-03-13 12:43 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-04 16:57 [PATCH] promisor-remote: prevent lazy-fetch recursion in child fetch Paul Tarjan via GitGitGadget
2026-03-04 17:41 ` Junio C Hamano
2026-03-04 18:20   ` Paul Tarjan
2026-03-04 18:27 ` [PATCH v2] " Paul Tarjan via GitGitGadget
2026-03-11 10:52   ` Patrick Steinhardt
2026-03-11 14:18     ` Paul Tarjan
2026-03-12  7:27       ` Patrick Steinhardt
2026-03-13  1:43         ` Jeff King
2026-03-13 12:43           ` [PATCH v3] " Paul Tarjan
2026-03-13 12:43         ` Paul Tarjan
2026-03-11 14:19   ` Paul Tarjan via GitGitGadget

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox