From: Patrick Steinhardt <ps@pks.im>
To: Paul Tarjan via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org,
Christian Couder <christian.couder@gmail.com>,
Han Xin <hanxin.hx@bytedance.com>,
Paul Tarjan <github@paulisageek.com>
Subject: Re: [PATCH v2] promisor-remote: prevent lazy-fetch recursion in child fetch
Date: Wed, 11 Mar 2026 11:52:52 +0100 [thread overview]
Message-ID: <abFJhFhHLhS4qdrM@pks.im> (raw)
In-Reply-To: <pull.2224.v2.git.git.1772648846009.gitgitgadget@gmail.com>
On Wed, Mar 04, 2026 at 06:27:25PM +0000, Paul Tarjan via GitGitGadget wrote:
> From: Paul Tarjan <github@paulisageek.com>
>
> fetch_objects() spawns a child `git fetch` to lazily fill in missing
> objects. That child's index-pack, when it receives a thin pack
> containing a REF_DELTA against a still-missing base, explicitly
> calls promisor_remote_get_direct() — which is fetch_objects() again.
> If the base is truly unavailable (e.g. because many refs in the
> local store point at objects that have been garbage-collected on the
> server), each recursive lazy-fetch can trigger another, leading to
> unbounded recursion with runaway disk and process consumption.
Is this a theoretical concern or a practical one? I would expect that
backfill fetches never cause the server side to send a pack with
REF_DELTA objects to nonexistent objects. And if they did they are
broken.
> The GIT_NO_LAZY_FETCH guard (introduced by e6d5479e7a (git: add
> --no-lazy-fetch option, 2021-08-31)) already exists at the top of
> fetch_objects(); the missing piece is propagating it into the child
> fetch's environment. Add that propagation so the child's
> index-pack, if it encounters a REF_DELTA against a missing base,
> hits the guard and fails fast instead of recursing.
>
> Depth-1 lazy fetch (the whole point of fetch_objects()) is
> unaffected: only the child and its descendants see the variable.
> With negotiationAlgorithm=noop the client advertises no "have"
> lines, so a well-behaved server sends requested objects
> un-deltified or deltified only against objects in the same pack;
> the child's index-pack should never need a depth-2 fetch. If it
> does, the server response was broken or the local store is already
> corrupt, and further fetching would not help.
Exactly, this here matches my understanding. The backfill fetches don't
perform negotiation, so we shouldn't ever see a thin pack in the first
place. What I don't yet understand is your comment about the depth-2
fetch -- when would we ever do that?
> This is the same bug shape that 3a1ea94a49 (commit-graph.c: no lazy
> fetch in lookup_commit_in_graph(), 2022-07-01) addressed at a
> different entry point.
I dunno, I think it's quite different overall. In the mentioned commit
we protect against a stale commit-graph, which is something that is
quite plausible to happen on the client side. But here we protect us
against a remote side that sends a packfile that violates specs, as far
as I understand.
> Add a test that verifies the child fetch environment contains
> GIT_NO_LAZY_FETCH=1 via a reference-transaction hook.
Hm. Can we craft a test that shows us the resulting failure in practice?
Testing for the environment variable feels like a bad proxy to me, as
I'd rather want to learn how Git would fail now.
> Signed-off-by: Paul Tarjan <github@paulisageek.com>
> ---
> promisor-remote: prevent recursive lazy-fetch during index-pack
>
> Propagate GIT_NO_LAZY_FETCH=1 into the child fetch spawned by
> fetch_objects() so that index-pack cannot recurse back into lazy-fetch
> when resolving REF_DELTA bases.
>
> We hit this in production: 276 GB of promisor packs written in 90
> minutes against a 100 GB monorepo with ~61K stale prefetch refs pointing
> at GC'd commits.
Okay, so this seems to be an issue that can be hit in the wild. But I
have to wonder whether this really is a bug on the client-side, or
whether this is a bug that actually sits on your server. So ultimately:
why does the server send REF_DELTA objects in the first place? Is it
using git-upload-pack(1), or is it using a different implementation of
Git to serve data?
Note that I'm not arguing that we shouldn't have protection on the
client, too. But I'd first like to understand whether there is a bug
lurking somewhere that causes us to send invalid packfiles.
Patrick
next prev parent reply other threads:[~2026-03-11 10:52 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-04 16:57 [PATCH] promisor-remote: prevent lazy-fetch recursion in child fetch Paul Tarjan via GitGitGadget
2026-03-04 17:41 ` Junio C Hamano
2026-03-04 18:20 ` Paul Tarjan
2026-03-04 18:27 ` [PATCH v2] " Paul Tarjan via GitGitGadget
2026-03-11 10:52 ` Patrick Steinhardt [this message]
2026-03-11 14:18 ` Paul Tarjan
2026-03-12 7:27 ` Patrick Steinhardt
2026-03-13 1:43 ` Jeff King
2026-03-13 12:43 ` [PATCH v3] " Paul Tarjan
2026-03-13 12:43 ` Paul Tarjan
2026-04-15 18:05 ` Junio C Hamano
2026-03-11 14:19 ` Paul Tarjan via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=abFJhFhHLhS4qdrM@pks.im \
--to=ps@pks.im \
--cc=christian.couder@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=github@paulisageek.com \
--cc=hanxin.hx@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.