public inbox for git@vger.kernel.org
 help / color / mirror / Atom feed
From: "Paul Tarjan via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Christian Couder <christian.couder@gmail.com>,
	Han Xin <hanxin.hx@bytedance.com>,
	Paul Tarjan <paul@paultarjan.com>, Patrick Steinhardt <ps@pks.im>,
	Paul Tarjan <github@paulisageek.com>,
	Paul Tarjan <github@paulisageek.com>
Subject: [PATCH v3] promisor-remote: prevent lazy-fetch recursion in child fetch
Date: Wed, 11 Mar 2026 14:19:38 +0000	[thread overview]
Message-ID: <pull.2224.v3.git.git.1773238778894.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.2224.v2.git.git.1772648846009.gitgitgadget@gmail.com>

From: Paul Tarjan <github@paulisageek.com>

fetch_objects() spawns a child `git fetch` to lazily fill in missing
objects. That child's index-pack, when it receives a thin pack
containing a REF_DELTA against a still-missing base, calls
promisor_remote_get_direct() -- which is fetch_objects() again.

With negotiationAlgorithm=noop the client advertises no "have"
lines, so a well-behaved server sends requested objects
un-deltified or deltified only against objects in the same pack.
A server that nevertheless sends REF_DELTA against a base the
client does not have is misbehaving; however the client should
not recurse unboundedly in response.

Propagate GIT_NO_LAZY_FETCH=1 into the child fetch's environment
so that if the child's index-pack encounters such a REF_DELTA, it
hits the existing guard at the top of fetch_objects() and fails
fast instead of recursing.  Depth-1 lazy fetch (the whole point
of fetch_objects()) is unaffected: only the child and its
descendants see the variable.

Add a test that injects a thin pack containing a REF_DELTA against
a missing base via HTTP, triggering the recursion path through
index-pack's promisor_remote_get_direct() call.  With the fix, the
child's fetch_objects() sees GIT_NO_LAZY_FETCH=1 and blocks the
depth-2 fetch with a "lazy fetching disabled" warning.

Signed-off-by: Paul Tarjan <github@paulisageek.com>
---
    promisor-remote: prevent recursive lazy-fetch during index-pack
    
    Propagate GIT_NO_LAZY_FETCH=1 into the child fetch spawned by
    fetch_objects() so that index-pack cannot recurse back into lazy-fetch
    when resolving REF_DELTA bases.
    
    We hit this in production: 276 GB of promisor packs written in 90
    minutes against a ~10 GB monorepo with ~61K stale prefetch refs pointing
    at GC'd commits.
    
    Changes since v2:
    
     * Replaced env-var-proxy test with behavioral test that injects a thin
       pack containing a REF_DELTA against a missing base via HTTP,
       triggering the actual recursion path through index-pack's
       promisor_remote_get_direct() call
     * Moved test from t0411-clone-from-partial.sh to t5616-partial-clone.sh
       (requires HTTP infrastructure)
     * Dropped commit-graph comparison from commit message per review
       feedback
    
    Changes since v1:
    
     * Dropped CC: trailers from commit message (moved here for
       GitGitGadget)
     * Moved test into t0411-clone-from-partial.sh instead of a new file
     * Removed duplicate commit-message summary from PR description

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-2224%2Fptarjan%2Fclaude%2Ffix-lazy-fetch-recursion-KP9Hl-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-2224/ptarjan/claude/fix-lazy-fetch-recursion-KP9Hl-v3
Pull-Request: https://github.com/git/git/pull/2224

Range-diff vs v2:

 1:  907ca7a0ac ! 1:  d58ccb858f promisor-remote: prevent lazy-fetch recursion in child fetch
     @@ Commit message
      
          fetch_objects() spawns a child `git fetch` to lazily fill in missing
          objects. That child's index-pack, when it receives a thin pack
     -    containing a REF_DELTA against a still-missing base, explicitly
     -    calls promisor_remote_get_direct() — which is fetch_objects() again.
     -    If the base is truly unavailable (e.g. because many refs in the
     -    local store point at objects that have been garbage-collected on the
     -    server), each recursive lazy-fetch can trigger another, leading to
     -    unbounded recursion with runaway disk and process consumption.
     +    containing a REF_DELTA against a still-missing base, calls
     +    promisor_remote_get_direct() -- which is fetch_objects() again.
      
     -    The GIT_NO_LAZY_FETCH guard (introduced by e6d5479e7a (git: add
     -    --no-lazy-fetch option, 2021-08-31)) already exists at the top of
     -    fetch_objects(); the missing piece is propagating it into the child
     -    fetch's environment. Add that propagation so the child's
     -    index-pack, if it encounters a REF_DELTA against a missing base,
     -    hits the guard and fails fast instead of recursing.
     -
     -    Depth-1 lazy fetch (the whole point of fetch_objects()) is
     -    unaffected: only the child and its descendants see the variable.
          With negotiationAlgorithm=noop the client advertises no "have"
          lines, so a well-behaved server sends requested objects
     -    un-deltified or deltified only against objects in the same pack;
     -    the child's index-pack should never need a depth-2 fetch. If it
     -    does, the server response was broken or the local store is already
     -    corrupt, and further fetching would not help.
     +    un-deltified or deltified only against objects in the same pack.
     +    A server that nevertheless sends REF_DELTA against a base the
     +    client does not have is misbehaving; however the client should
     +    not recurse unboundedly in response.
      
     -    This is the same bug shape that 3a1ea94a49 (commit-graph.c: no lazy
     -    fetch in lookup_commit_in_graph(), 2022-07-01) addressed at a
     -    different entry point.
     +    Propagate GIT_NO_LAZY_FETCH=1 into the child fetch's environment
     +    so that if the child's index-pack encounters such a REF_DELTA, it
     +    hits the existing guard at the top of fetch_objects() and fails
     +    fast instead of recursing.  Depth-1 lazy fetch (the whole point
     +    of fetch_objects()) is unaffected: only the child and its
     +    descendants see the variable.
      
     -    Add a test that verifies the child fetch environment contains
     -    GIT_NO_LAZY_FETCH=1 via a reference-transaction hook.
     +    Add a test that injects a thin pack containing a REF_DELTA against
     +    a missing base via HTTP, triggering the recursion path through
     +    index-pack's promisor_remote_get_direct() call.  With the fix, the
     +    child's fetch_objects() sees GIT_NO_LAZY_FETCH=1 and blocks the
     +    depth-2 fetch with a "lazy fetching disabled" warning.
      
          Signed-off-by: Paul Tarjan <github@paulisageek.com>
      
     @@ promisor-remote.c: static int fetch_objects(struct repository *repo,
       		     "fetch", remote_name, "--no-tags",
       		     "--no-write-fetch-head", "--recurse-submodules=no",
      
     - ## t/t0411-clone-from-partial.sh ##
     -@@ t/t0411-clone-from-partial.sh: test_expect_success 'promisor lazy-fetching can be re-enabled' '
     - 	test_path_is_file script-executed
     + ## t/t5616-partial-clone.sh ##
     +@@ t/t5616-partial-clone.sh: test_expect_success PERL_TEST_HELPERS 'tolerate server sending REF_DELTA against
     + 	! test -e "$HTTPD_ROOT_PATH/one-time-script"
       '
       
     -+test_expect_success 'lazy-fetch child has GIT_NO_LAZY_FETCH=1' '
     -+	test_create_repo nolazy-server &&
     -+	test_commit -C nolazy-server foo &&
     -+	git -C nolazy-server repack -a -d --write-bitmap-index &&
     ++test_expect_success PERL_TEST_HELPERS 'lazy-fetch of REF_DELTA with missing base does not recurse' '
     ++	SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" &&
     ++	rm -rf "$SERVER" repo &&
     ++	test_create_repo "$SERVER" &&
     ++	test_config -C "$SERVER" uploadpack.allowfilter 1 &&
     ++	test_config -C "$SERVER" uploadpack.allowanysha1inwant 1 &&
     ++
     ++	# Create a commit with 2 blobs to be used as delta base and content.
     ++	for i in $(test_seq 10)
     ++	do
     ++		echo "this is a line" >>"$SERVER/foo.txt" &&
     ++		echo "this is another line" >>"$SERVER/bar.txt" || return 1
     ++	done &&
     ++	git -C "$SERVER" add foo.txt bar.txt &&
     ++	git -C "$SERVER" commit -m initial &&
     ++	BLOB_FOO=$(git -C "$SERVER" rev-parse HEAD:foo.txt) &&
     ++	BLOB_BAR=$(git -C "$SERVER" rev-parse HEAD:bar.txt) &&
      +
     -+	git clone "file://$(pwd)/nolazy-server" nolazy-client &&
     -+	HASH=$(git -C nolazy-client rev-parse foo) &&
     -+	rm -rf nolazy-client/.git/objects/* &&
     ++	# Partial clone with blob:none. The client has commits and
     ++	# trees but no blobs.
     ++	test_config -C "$SERVER" protocol.version 2 &&
     ++	git -c protocol.version=2 clone --no-checkout \
     ++		--filter=blob:none $HTTPD_URL/one_time_script/server repo &&
      +
     -+	git -C nolazy-client config core.repositoryformatversion 1 &&
     -+	git -C nolazy-client config extensions.partialclone "origin" &&
     ++	# Sanity check: client does not have either blob locally.
     ++	git -C repo rev-list --objects --ignore-missing \
     ++		-- $BLOB_FOO >objlist &&
     ++	test_line_count = 0 objlist &&
      +
     -+	# Install a reference-transaction hook to record the env var
     -+	# as seen by processes inside the child fetch.
     -+	test_hook -C nolazy-client reference-transaction <<-\EOF &&
     -+	echo "$GIT_NO_LAZY_FETCH" >>../env-in-child
     ++	# Craft a thin pack where BLOB_FOO is a REF_DELTA against
     ++	# BLOB_BAR. Since the client has neither blob (blob:none
     ++	# filter), the delta base will be missing. This simulates a
     ++	# misbehaving server that sends REF_DELTA against an object
     ++	# the client does not have.
     ++	test-tool -C "$SERVER" pack-deltas --num-objects=1 >thin.pack <<-EOF &&
     ++	REF_DELTA $BLOB_FOO $BLOB_BAR
      +	EOF
      +
     -+	rm -f env-in-child &&
     -+	git -C nolazy-client cat-file -p "$HASH" &&
     ++	replace_packfile thin.pack &&
      +
     -+	# The hook runs inside the child fetch, which should have
     -+	# GIT_NO_LAZY_FETCH=1 in its environment.
     -+	grep "^1$" env-in-child
     ++	# Trigger a lazy fetch for BLOB_FOO. The child fetch spawned
     ++	# by fetch_objects() receives our crafted thin pack. Its
     ++	# index-pack encounters the missing delta base (BLOB_BAR) and
     ++	# tries to lazy-fetch it via promisor_remote_get_direct().
     ++	#
     ++	# With the fix: fetch_objects() propagates GIT_NO_LAZY_FETCH=1
     ++	# to the child, so the depth-2 fetch is blocked and we see the
     ++	# "lazy fetching disabled" warning. The object cannot be
     ++	# resolved, so cat-file fails.
     ++	#
     ++	# Without the fix: the depth-2 fetch would proceed, potentially
     ++	# recursing unboundedly with a persistently misbehaving server.
     ++	test_must_fail git -C repo -c protocol.version=2 \
     ++		cat-file -p $BLOB_FOO 2>err &&
     ++	test_grep "lazy fetching disabled" err &&
     ++
     ++	# Ensure that the one-time-script was used.
     ++	! test -e "$HTTPD_ROOT_PATH/one-time-script"
      +'
      +
     - test_done
     + # DO NOT add non-httpd-specific tests here, because the last part of this
     + # test script is only executed when httpd is available and enabled.
     + 


 promisor-remote.c        |  7 +++++
 t/t5616-partial-clone.sh | 60 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 67 insertions(+)

diff --git a/promisor-remote.c b/promisor-remote.c
index 96fa215b06..35c7aab93d 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -42,6 +42,13 @@ static int fetch_objects(struct repository *repo,
 	child.in = -1;
 	if (repo != the_repository)
 		prepare_other_repo_env(&child.env, repo->gitdir);
+	/*
+	 * Prevent the child's index-pack from recursing back into
+	 * fetch_objects() when resolving REF_DELTA bases it does not
+	 * have.  With noop negotiation the server should never need
+	 * to send such deltas, so a depth-2 fetch would not help.
+	 */
+	strvec_pushf(&child.env, "%s=1", NO_LAZY_FETCH_ENVIRONMENT);
 	strvec_pushl(&child.args, "-c", "fetch.negotiationAlgorithm=noop",
 		     "fetch", remote_name, "--no-tags",
 		     "--no-write-fetch-head", "--recurse-submodules=no",
diff --git a/t/t5616-partial-clone.sh b/t/t5616-partial-clone.sh
index 1e354e057f..27f131c8d9 100755
--- a/t/t5616-partial-clone.sh
+++ b/t/t5616-partial-clone.sh
@@ -907,6 +907,66 @@ test_expect_success PERL_TEST_HELPERS 'tolerate server sending REF_DELTA against
 	! test -e "$HTTPD_ROOT_PATH/one-time-script"
 '
 
+test_expect_success PERL_TEST_HELPERS 'lazy-fetch of REF_DELTA with missing base does not recurse' '
+	SERVER="$HTTPD_DOCUMENT_ROOT_PATH/server" &&
+	rm -rf "$SERVER" repo &&
+	test_create_repo "$SERVER" &&
+	test_config -C "$SERVER" uploadpack.allowfilter 1 &&
+	test_config -C "$SERVER" uploadpack.allowanysha1inwant 1 &&
+
+	# Create a commit with 2 blobs to be used as delta base and content.
+	for i in $(test_seq 10)
+	do
+		echo "this is a line" >>"$SERVER/foo.txt" &&
+		echo "this is another line" >>"$SERVER/bar.txt" || return 1
+	done &&
+	git -C "$SERVER" add foo.txt bar.txt &&
+	git -C "$SERVER" commit -m initial &&
+	BLOB_FOO=$(git -C "$SERVER" rev-parse HEAD:foo.txt) &&
+	BLOB_BAR=$(git -C "$SERVER" rev-parse HEAD:bar.txt) &&
+
+	# Partial clone with blob:none. The client has commits and
+	# trees but no blobs.
+	test_config -C "$SERVER" protocol.version 2 &&
+	git -c protocol.version=2 clone --no-checkout \
+		--filter=blob:none $HTTPD_URL/one_time_script/server repo &&
+
+	# Sanity check: client does not have either blob locally.
+	git -C repo rev-list --objects --ignore-missing \
+		-- $BLOB_FOO >objlist &&
+	test_line_count = 0 objlist &&
+
+	# Craft a thin pack where BLOB_FOO is a REF_DELTA against
+	# BLOB_BAR. Since the client has neither blob (blob:none
+	# filter), the delta base will be missing. This simulates a
+	# misbehaving server that sends REF_DELTA against an object
+	# the client does not have.
+	test-tool -C "$SERVER" pack-deltas --num-objects=1 >thin.pack <<-EOF &&
+	REF_DELTA $BLOB_FOO $BLOB_BAR
+	EOF
+
+	replace_packfile thin.pack &&
+
+	# Trigger a lazy fetch for BLOB_FOO. The child fetch spawned
+	# by fetch_objects() receives our crafted thin pack. Its
+	# index-pack encounters the missing delta base (BLOB_BAR) and
+	# tries to lazy-fetch it via promisor_remote_get_direct().
+	#
+	# With the fix: fetch_objects() propagates GIT_NO_LAZY_FETCH=1
+	# to the child, so the depth-2 fetch is blocked and we see the
+	# "lazy fetching disabled" warning. The object cannot be
+	# resolved, so cat-file fails.
+	#
+	# Without the fix: the depth-2 fetch would proceed, potentially
+	# recursing unboundedly with a persistently misbehaving server.
+	test_must_fail git -C repo -c protocol.version=2 \
+		cat-file -p $BLOB_FOO 2>err &&
+	test_grep "lazy fetching disabled" err &&
+
+	# Ensure that the one-time-script was used.
+	! test -e "$HTTPD_ROOT_PATH/one-time-script"
+'
+
 # DO NOT add non-httpd-specific tests here, because the last part of this
 # test script is only executed when httpd is available and enabled.
 

base-commit: 7b2bccb0d58d4f24705bf985de1f4612e4cf06e5
-- 
gitgitgadget

      parent reply	other threads:[~2026-03-11 14:19 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-04 16:57 [PATCH] promisor-remote: prevent lazy-fetch recursion in child fetch Paul Tarjan via GitGitGadget
2026-03-04 17:41 ` Junio C Hamano
2026-03-04 18:20   ` Paul Tarjan
2026-03-04 18:27 ` [PATCH v2] " Paul Tarjan via GitGitGadget
2026-03-11 10:52   ` Patrick Steinhardt
2026-03-11 14:18     ` Paul Tarjan
2026-03-12  7:27       ` Patrick Steinhardt
2026-03-13  1:43         ` Jeff King
2026-03-13 12:43           ` [PATCH v3] " Paul Tarjan
2026-03-13 12:43         ` Paul Tarjan
2026-03-11 14:19   ` Paul Tarjan via GitGitGadget [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.2224.v3.git.git.1773238778894.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=github@paulisageek.com \
    --cc=hanxin.hx@bytedance.com \
    --cc=paul@paultarjan.com \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox