From: "Paul Tarjan via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Christian Couder <christian.couder@gmail.com>,
Han Xin <hanxin.hx@bytedance.com>,
Paul Tarjan <github@paulisageek.com>,
Paul Tarjan <github@paulisageek.com>
Subject: [PATCH v2] promisor-remote: prevent lazy-fetch recursion in child fetch
Date: Wed, 04 Mar 2026 18:27:25 +0000 [thread overview]
Message-ID: <pull.2224.v2.git.git.1772648846009.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.2224.git.git.1772643468305.gitgitgadget@gmail.com>
From: Paul Tarjan <github@paulisageek.com>
fetch_objects() spawns a child `git fetch` to lazily fill in missing
objects. That child's index-pack, when it receives a thin pack
containing a REF_DELTA against a still-missing base, explicitly
calls promisor_remote_get_direct() — which is fetch_objects() again.
If the base is truly unavailable (e.g. because many refs in the
local store point at objects that have been garbage-collected on the
server), each recursive lazy-fetch can trigger another, leading to
unbounded recursion with runaway disk and process consumption.
The GIT_NO_LAZY_FETCH guard (introduced by e6d5479e7a (git: add
--no-lazy-fetch option, 2021-08-31)) already exists at the top of
fetch_objects(); the missing piece is propagating it into the child
fetch's environment. Add that propagation so the child's
index-pack, if it encounters a REF_DELTA against a missing base,
hits the guard and fails fast instead of recursing.
Depth-1 lazy fetch (the whole point of fetch_objects()) is
unaffected: only the child and its descendants see the variable.
With negotiationAlgorithm=noop the client advertises no "have"
lines, so a well-behaved server sends requested objects
un-deltified or deltified only against objects in the same pack;
the child's index-pack should never need a depth-2 fetch. If it
does, the server response was broken or the local store is already
corrupt, and further fetching would not help.
This is the same bug shape that 3a1ea94a49 (commit-graph.c: no lazy
fetch in lookup_commit_in_graph(), 2022-07-01) addressed at a
different entry point.
Add a test that verifies the child fetch environment contains
GIT_NO_LAZY_FETCH=1 via a reference-transaction hook.
Signed-off-by: Paul Tarjan <github@paulisageek.com>
---
promisor-remote: prevent recursive lazy-fetch during index-pack
Propagate GIT_NO_LAZY_FETCH=1 into the child fetch spawned by
fetch_objects() so that index-pack cannot recurse back into lazy-fetch
when resolving REF_DELTA bases.
We hit this in production: 276 GB of promisor packs written in 90
minutes against a 100 GB monorepo with ~61K stale prefetch refs pointing
at GC'd commits.
Changes since v1:
* Dropped CC: trailers from commit message (moved here for
GitGitGadget)
* Moved test into t0411-clone-from-partial.sh instead of a new file
* Removed duplicate commit-message summary from PR description
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-2224%2Fptarjan%2Fclaude%2Ffix-lazy-fetch-recursion-KP9Hl-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-2224/ptarjan/claude/fix-lazy-fetch-recursion-KP9Hl-v2
Pull-Request: https://github.com/git/git/pull/2224
Range-diff vs v1:
1: 7b723441f4 ! 1: 907ca7a0ac promisor-remote: prevent lazy-fetch recursion in child fetch
@@ Commit message
different entry point.
Add a test that verifies the child fetch environment contains
- GIT_NO_LAZY_FETCH=1 via a reference-transaction hook, and that
- only one fetch subprocess is spawned.
+ GIT_NO_LAZY_FETCH=1 via a reference-transaction hook.
- Cc: Jonathan Tan <jonathantanmy@google.com>
- Cc: Han Xin <hanxin.hx@bytedance.com>
- Cc: Jeff Hostetler <jeffhostetler@github.com>
- Cc: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Paul Tarjan <github@paulisageek.com>
## promisor-remote.c ##
@@ promisor-remote.c: static int fetch_objects(struct repository *repo,
"fetch", remote_name, "--no-tags",
"--no-write-fetch-head", "--recurse-submodules=no",
- ## t/meson.build ##
-@@ t/meson.build: integration_tests = [
- 't0303-credential-external.sh',
- 't0410-partial-clone.sh',
- 't0411-clone-from-partial.sh',
-+ 't0412-promisor-no-lazy-fetch-recursion.sh',
- 't0450-txt-doc-vs-help.sh',
- 't0500-progress-display.sh',
- 't0600-reffiles-backend.sh',
-
- ## t/t0412-promisor-no-lazy-fetch-recursion.sh (new) ##
-@@
-+#!/bin/sh
-+
-+test_description='promisor-remote: no recursive lazy-fetch
-+
-+Verify that fetch_objects() sets GIT_NO_LAZY_FETCH=1 in the child
-+fetch environment, so that index-pack cannot recurse back into
-+fetch_objects() when resolving REF_DELTA bases.
-+'
-+
-+. ./test-lib.sh
-+
-+test_expect_success 'setup' '
-+ test_create_repo server &&
-+ test_commit -C server foo &&
-+ git -C server repack -a -d --write-bitmap-index &&
+ ## t/t0411-clone-from-partial.sh ##
+@@ t/t0411-clone-from-partial.sh: test_expect_success 'promisor lazy-fetching can be re-enabled' '
+ test_path_is_file script-executed
+ '
+
++test_expect_success 'lazy-fetch child has GIT_NO_LAZY_FETCH=1' '
++ test_create_repo nolazy-server &&
++ test_commit -C nolazy-server foo &&
++ git -C nolazy-server repack -a -d --write-bitmap-index &&
+
-+ git clone "file://$(pwd)/server" client &&
-+ HASH=$(git -C client rev-parse foo) &&
-+ rm -rf client/.git/objects/* &&
-+
-+ git -C client config core.repositoryformatversion 1 &&
-+ git -C client config extensions.partialclone "origin"
-+'
-+
-+test_expect_success 'lazy-fetch spawns only one fetch subprocess' '
-+ GIT_TRACE="$(pwd)/trace" git -C client cat-file -p "$HASH" &&
-+
-+ grep "git fetch" trace >fetches &&
-+ test_line_count = 1 fetches
-+'
++ git clone "file://$(pwd)/nolazy-server" nolazy-client &&
++ HASH=$(git -C nolazy-client rev-parse foo) &&
++ rm -rf nolazy-client/.git/objects/* &&
+
-+test_expect_success 'child of lazy-fetch has GIT_NO_LAZY_FETCH=1' '
-+ rm -rf client/.git/objects/* &&
++ git -C nolazy-client config core.repositoryformatversion 1 &&
++ git -C nolazy-client config extensions.partialclone "origin" &&
+
+ # Install a reference-transaction hook to record the env var
+ # as seen by processes inside the child fetch.
-+ test_hook -C client reference-transaction <<-\EOF &&
++ test_hook -C nolazy-client reference-transaction <<-\EOF &&
+ echo "$GIT_NO_LAZY_FETCH" >>../env-in-child
+ EOF
+
+ rm -f env-in-child &&
-+ git -C client cat-file -p "$HASH" &&
++ git -C nolazy-client cat-file -p "$HASH" &&
+
+ # The hook runs inside the child fetch, which should have
+ # GIT_NO_LAZY_FETCH=1 in its environment.
+ grep "^1$" env-in-child
+'
+
-+test_done
+ test_done
promisor-remote.c | 7 +++++++
t/t0411-clone-from-partial.sh | 26 ++++++++++++++++++++++++++
2 files changed, 33 insertions(+)
diff --git a/promisor-remote.c b/promisor-remote.c
index 96fa215b06..35c7aab93d 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -42,6 +42,13 @@ static int fetch_objects(struct repository *repo,
child.in = -1;
if (repo != the_repository)
prepare_other_repo_env(&child.env, repo->gitdir);
+ /*
+ * Prevent the child's index-pack from recursing back into
+ * fetch_objects() when resolving REF_DELTA bases it does not
+ * have. With noop negotiation the server should never need
+ * to send such deltas, so a depth-2 fetch would not help.
+ */
+ strvec_pushf(&child.env, "%s=1", NO_LAZY_FETCH_ENVIRONMENT);
strvec_pushl(&child.args, "-c", "fetch.negotiationAlgorithm=noop",
"fetch", remote_name, "--no-tags",
"--no-write-fetch-head", "--recurse-submodules=no",
diff --git a/t/t0411-clone-from-partial.sh b/t/t0411-clone-from-partial.sh
index 9e6bca5625..10a829fb80 100755
--- a/t/t0411-clone-from-partial.sh
+++ b/t/t0411-clone-from-partial.sh
@@ -78,4 +78,30 @@ test_expect_success 'promisor lazy-fetching can be re-enabled' '
test_path_is_file script-executed
'
+test_expect_success 'lazy-fetch child has GIT_NO_LAZY_FETCH=1' '
+ test_create_repo nolazy-server &&
+ test_commit -C nolazy-server foo &&
+ git -C nolazy-server repack -a -d --write-bitmap-index &&
+
+ git clone "file://$(pwd)/nolazy-server" nolazy-client &&
+ HASH=$(git -C nolazy-client rev-parse foo) &&
+ rm -rf nolazy-client/.git/objects/* &&
+
+ git -C nolazy-client config core.repositoryformatversion 1 &&
+ git -C nolazy-client config extensions.partialclone "origin" &&
+
+ # Install a reference-transaction hook to record the env var
+ # as seen by processes inside the child fetch.
+ test_hook -C nolazy-client reference-transaction <<-\EOF &&
+ echo "$GIT_NO_LAZY_FETCH" >>../env-in-child
+ EOF
+
+ rm -f env-in-child &&
+ git -C nolazy-client cat-file -p "$HASH" &&
+
+ # The hook runs inside the child fetch, which should have
+ # GIT_NO_LAZY_FETCH=1 in its environment.
+ grep "^1$" env-in-child
+'
+
test_done
base-commit: 7b2bccb0d58d4f24705bf985de1f4612e4cf06e5
--
gitgitgadget
next prev parent reply other threads:[~2026-03-04 18:27 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-04 16:57 [PATCH] promisor-remote: prevent lazy-fetch recursion in child fetch Paul Tarjan via GitGitGadget
2026-03-04 17:41 ` Junio C Hamano
2026-03-04 18:20 ` Paul Tarjan
2026-03-04 18:27 ` Paul Tarjan via GitGitGadget [this message]
2026-03-11 10:52 ` [PATCH v2] " Patrick Steinhardt
2026-03-11 14:18 ` Paul Tarjan
2026-03-12 7:27 ` Patrick Steinhardt
2026-03-13 1:43 ` Jeff King
2026-03-13 12:43 ` [PATCH v3] " Paul Tarjan
2026-03-13 12:43 ` Paul Tarjan
2026-03-11 14:19 ` Paul Tarjan via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=pull.2224.v2.git.git.1772648846009.gitgitgadget@gmail.com \
--to=gitgitgadget@gmail.com \
--cc=christian.couder@gmail.com \
--cc=git@vger.kernel.org \
--cc=github@paulisageek.com \
--cc=hanxin.hx@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox