From: Paul Tarjan <paul@paultarjan.com>
To: git@vger.kernel.org
Cc: ps@pks.im, gitgitgadget@gmail.com, christian.couder@gmail.com,
hanxin.hx@bytedance.com
Subject: Re: [PATCH v2] promisor-remote: prevent lazy-fetch recursion in child fetch
Date: Wed, 11 Mar 2026 08:18:46 -0600 [thread overview]
Message-ID: <20260311141846.12315-1-github@paulisageek.com> (raw)
In-Reply-To: <abFJhFhHLhS4qdrM@pks.im>
Patrick Steinhardt <ps@pks.im> writes:
> Is this a theoretical concern or a practical one? I would expect that
> backfill fetches never cause the server side to send a pack with
> REF_DELTA objects to nonexistent objects. And if they did they are
> broken.
Practical. We hit this at Anthropic: 276 GB of promisor packs written
by `git maintenance --task=prefetch` in 90 minutes against a ~10 GB
monorepo with ~61K stale prefetch refs pointing at GC'd commits.
> Exactly, this here matches my understanding. The backfill fetches don't
> perform negotiation, so we shouldn't ever see a thin pack in the first
> place. What I don't yet understand is your comment about the depth-2
> fetch -- when would we ever do that?
The code path already exists and is tested: t5616 line 832 ("tolerate
server sending REF_DELTA against missing promisor objects") creates
exactly this scenario. index-pack's fix_unresolved_deltas() calls
promisor_remote_get_direct() when it encounters a REF_DELTA against a
missing base (builtin/index-pack.c:1508). That's the depth-2 fetch.
With noop negotiation a well-behaved server shouldn't send REF_DELTA
against objects the client doesn't have. But partial clones with
blob:none mean the client is missing most blobs, and if the server
sends a thin pack deltified against one of those filtered-out blobs,
index-pack will try to fetch the base.
> I dunno, I think it's quite different overall. In the mentioned commit
> we protect against a stale commit-graph, which is something that is
> quite plausible to happen on the client side. But here we protect us
> against a remote side that sends a packfile that violates specs, as far
> as I understand.
Fair point. The commit-graph case is purely client-side corruption,
while this requires a misbehaving server. The bug shape is the same
(unbounded recursion through fetch_objects()) but the trigger is
different. I'll drop the comparison in the next version.
> Hm. Can we craft a test that shows us the resulting failure in practice?
> Testing for the environment variable feels like a bad proxy to me, as
> I'd rather want to learn how Git would fail now.
Good point. Reworked the test in v3. It now injects a thin pack
containing a REF_DELTA against a missing base via HTTP (using the
replace_packfile pattern from t5616). This triggers the actual
recursion path: index-pack encounters the missing base, calls
promisor_remote_get_direct(), which hits the GIT_NO_LAZY_FETCH=1
guard and fails with "lazy fetching disabled". Without the fix,
the depth-2 fetch would proceed and potentially recurse.
> Okay, so this seems to be an issue that can be hit in the wild. But I
> have to wonder whether this really is a bug on the client-side, or
> whether this is a bug that actually sits on your server. So ultimately:
> why does the server send REF_DELTA objects in the first place? Is it
> using git-upload-pack(1), or is it using a different implementation of
> Git to serve data?
The server is GitHub. I did a blob:none partial clone and after some
further git operations ended up in this state. I don't have
server-side data on why it sent REF_DELTAs against missing bases.
> Note that I'm not arguing that we shouldn't have protection on the
> client, too. But I'd first like to understand whether there is a bug
> lurking somewhere that causes us to send invalid packfiles.
Agreed, there may well be a server-side bug here. Regardless, the
client should fail fast rather than consume unbounded resources.
next prev parent reply other threads:[~2026-03-11 14:18 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-04 16:57 [PATCH] promisor-remote: prevent lazy-fetch recursion in child fetch Paul Tarjan via GitGitGadget
2026-03-04 17:41 ` Junio C Hamano
2026-03-04 18:20 ` Paul Tarjan
2026-03-04 18:27 ` [PATCH v2] " Paul Tarjan via GitGitGadget
2026-03-11 10:52 ` Patrick Steinhardt
2026-03-11 14:18 ` Paul Tarjan [this message]
2026-03-12 7:27 ` Patrick Steinhardt
2026-03-13 1:43 ` Jeff King
2026-03-13 12:43 ` [PATCH v3] " Paul Tarjan
2026-03-13 12:43 ` Paul Tarjan
2026-04-15 18:05 ` Junio C Hamano
2026-03-11 14:19 ` Paul Tarjan via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260311141846.12315-1-github@paulisageek.com \
--to=paul@paultarjan.com \
--cc=christian.couder@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=hanxin.hx@bytedance.com \
--cc=ps@pks.im \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.