Git development
 help / color / mirror / Atom feed
* Re: [PATCH v2] promisor-remote: prevent lazy-fetch recursion in child fetch
  2026-03-12  7:27 [PATCH v2] " Patrick Steinhardt
@ 2026-03-13  1:43 Jeff King
  2026-03-13 12:43 ` [PATCH v3] " Paul Tarjan
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff King @ 2026-03-13  1:43 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: Paul Tarjan, git, gitgitgadget, christian.couder, hanxin.hx

On Thu, Mar 12, 2026 at 08:27:05AM +0100, Patrick Steinhardt wrote:

> > > Note that I'm not arguing that we shouldn't have protection on the
> > > client, too. But I'd first like to understand whether there is a bug
> > > lurking somewhere that causes us to send invalid packfiles.
> > 
> > Agreed, there may well be a server-side bug here. Regardless, the
> > client should fail fast rather than consume unbounded resources.
> 
> Probably, yes. What I'm trying to figure out is whether there are edge
> cases here where it's _valid_ for the server to send a thin pack with a
> REF_DELTA. Because if so, unconditionally disabling the lazy fetches
> would break such edge cases.
> 
> I don't think there are such cases, but I wouldn't consider myself an
> expert with partial clones.
> 
> Cc'd Peff, as he's implemented a couple fixes in this area a couple
> years ago.

Hmm, I'm not sure I have much wisdom. Here's the most plausible scenario
I could come up with.

A backfill fetch like this is going to have a noop negotiation
algorithm. So the server does not have any idea what the client has, and
therefore shouldn't be sending any thin deltas against it.

But it _can_ send deltas against objects which are part of the backfill
itself. Normally we'd send those as OFS_DELTA, because they're both in
the same output pack. But there are cases where we might not:

  - if the client did not tell us it understands ofs-deltas; this would
    not be true for any version of Git in the last 15+ years, but maybe
    there is a bug in sending or parsing the capability? Or an alternate
    Git implementation on the client side which forgets to send it?

  - the verbatim pack-reuse code will sometimes rewrite ofs-delta into
    ref-delta. I don't remember all of the cases where this might
    happen. Certainly if the client hasn't claimed to support
    ofs-deltas, but I think maybe some other cases? I'd have to dig into
    it.

Now there's a catch: the pack is not really thin, and so index-pack
should not need to do an extra backfill request in order to get the base
object. But depending how index-pack is written, it might try to do so
anyway. If X is a delta against base Y, but Y is itself a delta, we
might not have resolved it yet. And so when we try to resolve X, we
think "aha, let us see if we have Y", and then eagerly attempt a
backfill fetch (probably triggered from odb_has_object() or similar).
When in fact the right thing to do is to queue X, resolve everything we
can, and see if we ended up with Y (actually index-pack works from the
bases up in the final resolution phase, but the effect is the same).

If that's what is happening, then I _think_ Paul's patch will do the
right thing. We'd say "no, we don't have that object" without doing the
backfill, and then eventually find it as part of the final resolution.

It would be nice to confirm that's what's going on, though (and it isn't
really a thin pack). If the problem can be reproduced, I don't suppose
we have a GIT_TRACE_PACKET output from a failing instance? That would
confirm that we're correctly using the noop negotiation.

-Peff

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: [PATCH v2] promisor-remote: prevent lazy-fetch recursion in child fetch
@ 2026-03-12  7:27 Patrick Steinhardt
  2026-03-13 12:43 ` [PATCH v3] " Paul Tarjan
  0 siblings, 1 reply; 5+ messages in thread
From: Patrick Steinhardt @ 2026-03-12  7:27 UTC (permalink / raw)
  To: Paul Tarjan; +Cc: git, gitgitgadget, christian.couder, hanxin.hx, Jeff King

On Wed, Mar 11, 2026 at 08:18:46AM -0600, Paul Tarjan wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Is this a theoretical concern or a practical one? I would expect that
> > backfill fetches never cause the server side to send a pack with
> > REF_DELTA objects to nonexistent objects. And if they did they are
> > broken.
> 
> Practical. We hit this at Anthropic: 276 GB of promisor packs written
> by `git maintenance --task=prefetch` in 90 minutes against a ~10 GB
> monorepo with ~61K stale prefetch refs pointing at GC'd commits.

I must be misunderstanding something here, but how is it that a commit
can be garbage collected if a ref points to it? That shouldn't ever
happen, as reachable commits should not be pruned.

Or do you mean to say that the commits don't exist on the server side
anymore?

> > Hm. Can we craft a test that shows us the resulting failure in practice?
> > Testing for the environment variable feels like a bad proxy to me, as
> > I'd rather want to learn how Git would fail now.
> 
> Good point. Reworked the test in v3. It now injects a thin pack
> containing a REF_DELTA against a missing base via HTTP (using the
> replace_packfile pattern from t5616). This triggers the actual
> recursion path: index-pack encounters the missing base, calls
> promisor_remote_get_direct(), which hits the GIT_NO_LAZY_FETCH=1
> guard and fails with "lazy fetching disabled". Without the fix,
> the depth-2 fetch would proceed and potentially recurse.

Great, thanks.

> > Okay, so this seems to be an issue that can be hit in the wild. But I
> > have to wonder whether this really is a bug on the client-side, or
> > whether this is a bug that actually sits on your server. So ultimately:
> > why does the server send REF_DELTA objects in the first place? Is it
> > using git-upload-pack(1), or is it using a different implementation of
> > Git to serve data?
> 
> The server is GitHub. I did a blob:none partial clone and after some
> further git operations ended up in this state. I don't have
> server-side data on why it sent REF_DELTAs against missing bases.

That's certainly curious. Do you maybe have multiple remotes attached to
the repository, or are you dropping/modifying the object filter at some
point?

All subsequent fetches need to use the same object filter as you've used
during the initial clone, otherwise you may run into a situation as you
have described. But in theory, Git knows to continue using the filter.

> > Note that I'm not arguing that we shouldn't have protection on the
> > client, too. But I'd first like to understand whether there is a bug
> > lurking somewhere that causes us to send invalid packfiles.
> 
> Agreed, there may well be a server-side bug here. Regardless, the
> client should fail fast rather than consume unbounded resources.

Probably, yes. What I'm trying to figure out is whether there are edge
cases here where it's _valid_ for the server to send a thin pack with a
REF_DELTA. Because if so, unconditionally disabling the lazy fetches
would break such edge cases.

I don't think there are such cases, but I wouldn't consider myself an
expert with partial clones.

Cc'd Peff, as he's implemented a couple fixes in this area a couple
years ago.

Patrick

^ permalink raw reply	[flat|nested] 5+ messages in thread
* [PATCH v2] promisor-remote: prevent lazy-fetch recursion in child fetch
@ 2026-03-04 18:27 Paul Tarjan via GitGitGadget
  2026-03-11 14:19 ` [PATCH v3] " Paul Tarjan via GitGitGadget
  0 siblings, 1 reply; 5+ messages in thread
From: Paul Tarjan via GitGitGadget @ 2026-03-04 18:27 UTC (permalink / raw)
  To: git; +Cc: Christian Couder, Han Xin, Paul Tarjan, Paul Tarjan

From: Paul Tarjan <github@paulisageek.com>

fetch_objects() spawns a child `git fetch` to lazily fill in missing
objects. That child's index-pack, when it receives a thin pack
containing a REF_DELTA against a still-missing base, explicitly
calls promisor_remote_get_direct() — which is fetch_objects() again.
If the base is truly unavailable (e.g. because many refs in the
local store point at objects that have been garbage-collected on the
server), each recursive lazy-fetch can trigger another, leading to
unbounded recursion with runaway disk and process consumption.

The GIT_NO_LAZY_FETCH guard (introduced by e6d5479e7a (git: add
--no-lazy-fetch option, 2021-08-31)) already exists at the top of
fetch_objects(); the missing piece is propagating it into the child
fetch's environment. Add that propagation so the child's
index-pack, if it encounters a REF_DELTA against a missing base,
hits the guard and fails fast instead of recursing.

Depth-1 lazy fetch (the whole point of fetch_objects()) is
unaffected: only the child and its descendants see the variable.
With negotiationAlgorithm=noop the client advertises no "have"
lines, so a well-behaved server sends requested objects
un-deltified or deltified only against objects in the same pack;
the child's index-pack should never need a depth-2 fetch. If it
does, the server response was broken or the local store is already
corrupt, and further fetching would not help.

This is the same bug shape that 3a1ea94a49 (commit-graph.c: no lazy
fetch in lookup_commit_in_graph(), 2022-07-01) addressed at a
different entry point.

Add a test that verifies the child fetch environment contains
GIT_NO_LAZY_FETCH=1 via a reference-transaction hook.

Signed-off-by: Paul Tarjan <github@paulisageek.com>
---
    promisor-remote: prevent recursive lazy-fetch during index-pack
    
    Propagate GIT_NO_LAZY_FETCH=1 into the child fetch spawned by
    fetch_objects() so that index-pack cannot recurse back into lazy-fetch
    when resolving REF_DELTA bases.
    
    We hit this in production: 276 GB of promisor packs written in 90
    minutes against a 100 GB monorepo with ~61K stale prefetch refs pointing
    at GC'd commits.
    
    Changes since v1:
    
     * Dropped CC: trailers from commit message (moved here for
       GitGitGadget)
     * Moved test into t0411-clone-from-partial.sh instead of a new file
     * Removed duplicate commit-message summary from PR description

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-2224%2Fptarjan%2Fclaude%2Ffix-lazy-fetch-recursion-KP9Hl-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-2224/ptarjan/claude/fix-lazy-fetch-recursion-KP9Hl-v2
Pull-Request: https://github.com/git/git/pull/2224

Range-diff vs v1:

 1:  7b723441f4 ! 1:  907ca7a0ac promisor-remote: prevent lazy-fetch recursion in child fetch
     @@ Commit message
          different entry point.
      
          Add a test that verifies the child fetch environment contains
     -    GIT_NO_LAZY_FETCH=1 via a reference-transaction hook, and that
     -    only one fetch subprocess is spawned.
     +    GIT_NO_LAZY_FETCH=1 via a reference-transaction hook.
      
     -    Cc: Jonathan Tan <jonathantanmy@google.com>
     -    Cc: Han Xin <hanxin.hx@bytedance.com>
     -    Cc: Jeff Hostetler <jeffhostetler@github.com>
     -    Cc: Christian Couder <christian.couder@gmail.com>
          Signed-off-by: Paul Tarjan <github@paulisageek.com>
      
       ## promisor-remote.c ##
     @@ promisor-remote.c: static int fetch_objects(struct repository *repo,
       		     "fetch", remote_name, "--no-tags",
       		     "--no-write-fetch-head", "--recurse-submodules=no",
      
     - ## t/meson.build ##
     -@@ t/meson.build: integration_tests = [
     -   't0303-credential-external.sh',
     -   't0410-partial-clone.sh',
     -   't0411-clone-from-partial.sh',
     -+  't0412-promisor-no-lazy-fetch-recursion.sh',
     -   't0450-txt-doc-vs-help.sh',
     -   't0500-progress-display.sh',
     -   't0600-reffiles-backend.sh',
     -
     - ## t/t0412-promisor-no-lazy-fetch-recursion.sh (new) ##
     -@@
     -+#!/bin/sh
     -+
     -+test_description='promisor-remote: no recursive lazy-fetch
     -+
     -+Verify that fetch_objects() sets GIT_NO_LAZY_FETCH=1 in the child
     -+fetch environment, so that index-pack cannot recurse back into
     -+fetch_objects() when resolving REF_DELTA bases.
     -+'
     -+
     -+. ./test-lib.sh
     -+
     -+test_expect_success 'setup' '
     -+	test_create_repo server &&
     -+	test_commit -C server foo &&
     -+	git -C server repack -a -d --write-bitmap-index &&
     + ## t/t0411-clone-from-partial.sh ##
     +@@ t/t0411-clone-from-partial.sh: test_expect_success 'promisor lazy-fetching can be re-enabled' '
     + 	test_path_is_file script-executed
     + '
     + 
     ++test_expect_success 'lazy-fetch child has GIT_NO_LAZY_FETCH=1' '
     ++	test_create_repo nolazy-server &&
     ++	test_commit -C nolazy-server foo &&
     ++	git -C nolazy-server repack -a -d --write-bitmap-index &&
      +
     -+	git clone "file://$(pwd)/server" client &&
     -+	HASH=$(git -C client rev-parse foo) &&
     -+	rm -rf client/.git/objects/* &&
     -+
     -+	git -C client config core.repositoryformatversion 1 &&
     -+	git -C client config extensions.partialclone "origin"
     -+'
     -+
     -+test_expect_success 'lazy-fetch spawns only one fetch subprocess' '
     -+	GIT_TRACE="$(pwd)/trace" git -C client cat-file -p "$HASH" &&
     -+
     -+	grep "git fetch" trace >fetches &&
     -+	test_line_count = 1 fetches
     -+'
     ++	git clone "file://$(pwd)/nolazy-server" nolazy-client &&
     ++	HASH=$(git -C nolazy-client rev-parse foo) &&
     ++	rm -rf nolazy-client/.git/objects/* &&
      +
     -+test_expect_success 'child of lazy-fetch has GIT_NO_LAZY_FETCH=1' '
     -+	rm -rf client/.git/objects/* &&
     ++	git -C nolazy-client config core.repositoryformatversion 1 &&
     ++	git -C nolazy-client config extensions.partialclone "origin" &&
      +
      +	# Install a reference-transaction hook to record the env var
      +	# as seen by processes inside the child fetch.
     -+	test_hook -C client reference-transaction <<-\EOF &&
     ++	test_hook -C nolazy-client reference-transaction <<-\EOF &&
      +	echo "$GIT_NO_LAZY_FETCH" >>../env-in-child
      +	EOF
      +
      +	rm -f env-in-child &&
     -+	git -C client cat-file -p "$HASH" &&
     ++	git -C nolazy-client cat-file -p "$HASH" &&
      +
      +	# The hook runs inside the child fetch, which should have
      +	# GIT_NO_LAZY_FETCH=1 in its environment.
      +	grep "^1$" env-in-child
      +'
      +
     -+test_done
     + test_done


 promisor-remote.c             |  7 +++++++
 t/t0411-clone-from-partial.sh | 26 ++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/promisor-remote.c b/promisor-remote.c
index 96fa215b06..35c7aab93d 100644
--- a/promisor-remote.c
+++ b/promisor-remote.c
@@ -42,6 +42,13 @@ static int fetch_objects(struct repository *repo,
 	child.in = -1;
 	if (repo != the_repository)
 		prepare_other_repo_env(&child.env, repo->gitdir);
+	/*
+	 * Prevent the child's index-pack from recursing back into
+	 * fetch_objects() when resolving REF_DELTA bases it does not
+	 * have.  With noop negotiation the server should never need
+	 * to send such deltas, so a depth-2 fetch would not help.
+	 */
+	strvec_pushf(&child.env, "%s=1", NO_LAZY_FETCH_ENVIRONMENT);
 	strvec_pushl(&child.args, "-c", "fetch.negotiationAlgorithm=noop",
 		     "fetch", remote_name, "--no-tags",
 		     "--no-write-fetch-head", "--recurse-submodules=no",
diff --git a/t/t0411-clone-from-partial.sh b/t/t0411-clone-from-partial.sh
index 9e6bca5625..10a829fb80 100755
--- a/t/t0411-clone-from-partial.sh
+++ b/t/t0411-clone-from-partial.sh
@@ -78,4 +78,30 @@ test_expect_success 'promisor lazy-fetching can be re-enabled' '
 	test_path_is_file script-executed
 '
 
+test_expect_success 'lazy-fetch child has GIT_NO_LAZY_FETCH=1' '
+	test_create_repo nolazy-server &&
+	test_commit -C nolazy-server foo &&
+	git -C nolazy-server repack -a -d --write-bitmap-index &&
+
+	git clone "file://$(pwd)/nolazy-server" nolazy-client &&
+	HASH=$(git -C nolazy-client rev-parse foo) &&
+	rm -rf nolazy-client/.git/objects/* &&
+
+	git -C nolazy-client config core.repositoryformatversion 1 &&
+	git -C nolazy-client config extensions.partialclone "origin" &&
+
+	# Install a reference-transaction hook to record the env var
+	# as seen by processes inside the child fetch.
+	test_hook -C nolazy-client reference-transaction <<-\EOF &&
+	echo "$GIT_NO_LAZY_FETCH" >>../env-in-child
+	EOF
+
+	rm -f env-in-child &&
+	git -C nolazy-client cat-file -p "$HASH" &&
+
+	# The hook runs inside the child fetch, which should have
+	# GIT_NO_LAZY_FETCH=1 in its environment.
+	grep "^1$" env-in-child
+'
+
 test_done

base-commit: 7b2bccb0d58d4f24705bf985de1f4612e4cf06e5
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2026-04-17  2:21 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <xmqqse6p0cvm.fsf@gitster.g>
2026-04-17  2:21 ` [PATCH v3] promisor-remote: prevent lazy-fetch recursion in child fetch Paul Tarjan
2026-03-13  1:43 [PATCH v2] " Jeff King
2026-03-13 12:43 ` [PATCH v3] " Paul Tarjan
  -- strict thread matches above, loose matches on Subject: below --
2026-03-12  7:27 [PATCH v2] " Patrick Steinhardt
2026-03-13 12:43 ` [PATCH v3] " Paul Tarjan
2026-04-15 18:05   ` Junio C Hamano
2026-03-04 18:27 [PATCH v2] " Paul Tarjan via GitGitGadget
2026-03-11 14:19 ` [PATCH v3] " Paul Tarjan via GitGitGadget

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox