From: Patrick Steinhardt <ps@pks.im>
To: Paul Tarjan <paul@paultarjan.com>
Cc: git@vger.kernel.org, gitgitgadget@gmail.com,
christian.couder@gmail.com, hanxin.hx@bytedance.com,
Jeff King <peff@peff.net>
Subject: Re: [PATCH v2] promisor-remote: prevent lazy-fetch recursion in child fetch
Date: Thu, 12 Mar 2026 08:27:05 +0100 [thread overview]
Message-ID: <abJqySqfdFoY8cEu@pks.im> (raw)
In-Reply-To: <20260311141846.12315-1-github@paulisageek.com>
On Wed, Mar 11, 2026 at 08:18:46AM -0600, Paul Tarjan wrote:
> Patrick Steinhardt <ps@pks.im> writes:
>
> > Is this a theoretical concern or a practical one? I would expect that
> > backfill fetches never cause the server side to send a pack with
> > REF_DELTA objects to nonexistent objects. And if they did they are
> > broken.
>
> Practical. We hit this at Anthropic: 276 GB of promisor packs written
> by `git maintenance --task=prefetch` in 90 minutes against a ~10 GB
> monorepo with ~61K stale prefetch refs pointing at GC'd commits.
I must be misunderstanding something here, but how is it that a commit
can be garbage collected if a ref points to it? That shouldn't ever
happen, as reachable commits should not be pruned.
Or do you mean to say that the commits don't exist on the server side
anymore?
> > Hm. Can we craft a test that shows us the resulting failure in practice?
> > Testing for the environment variable feels like a bad proxy to me, as
> > I'd rather want to learn how Git would fail now.
>
> Good point. Reworked the test in v3. It now injects a thin pack
> containing a REF_DELTA against a missing base via HTTP (using the
> replace_packfile pattern from t5616). This triggers the actual
> recursion path: index-pack encounters the missing base, calls
> promisor_remote_get_direct(), which hits the GIT_NO_LAZY_FETCH=1
> guard and fails with "lazy fetching disabled". Without the fix,
> the depth-2 fetch would proceed and potentially recurse.
Great, thanks.
> > Okay, so this seems to be an issue that can be hit in the wild. But I
> > have to wonder whether this really is a bug on the client-side, or
> > whether this is a bug that actually sits on your server. So ultimately:
> > why does the server send REF_DELTA objects in the first place? Is it
> > using git-upload-pack(1), or is it using a different implementation of
> > Git to serve data?
>
> The server is GitHub. I did a blob:none partial clone and after some
> further git operations ended up in this state. I don't have
> server-side data on why it sent REF_DELTAs against missing bases.
That's certainly curious. Do you maybe have multiple remotes attached to
the repository, or are you dropping/modifying the object filter at some
point?
All subsequent fetches need to use the same object filter as you've used
during the initial clone, otherwise you may run into a situation as you
have described. But in theory, Git knows to continue using the filter.
> > Note that I'm not arguing that we shouldn't have protection on the
> > client, too. But I'd first like to understand whether there is a bug
> > lurking somewhere that causes us to send invalid packfiles.
>
> Agreed, there may well be a server-side bug here. Regardless, the
> client should fail fast rather than consume unbounded resources.
Probably, yes. What I'm trying to figure out is whether there are edge
cases here where it's _valid_ for the server to send a thin pack with a
REF_DELTA. Because if so, unconditionally disabling the lazy fetches
would break such edge cases.
I don't think there are such cases, but I wouldn't consider myself an
expert with partial clones.
Cc'd Peff, as he's implemented a couple fixes in this area a couple
years ago.
Patrick
next prev parent reply other threads:[~2026-03-12 7:27 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-04 16:57 [PATCH] promisor-remote: prevent lazy-fetch recursion in child fetch Paul Tarjan via GitGitGadget
2026-03-04 17:41 ` Junio C Hamano
2026-03-04 18:20 ` Paul Tarjan
2026-03-04 18:27 ` [PATCH v2] " Paul Tarjan via GitGitGadget
2026-03-11 10:52 ` Patrick Steinhardt
2026-03-11 14:18 ` Paul Tarjan
2026-03-12 7:27 ` Patrick Steinhardt [this message]
2026-03-13 1:43 ` Jeff King
2026-03-13 12:43 ` [PATCH v3] " Paul Tarjan
2026-03-13 12:43 ` Paul Tarjan
2026-03-11 14:19 ` Paul Tarjan via GitGitGadget
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=abJqySqfdFoY8cEu@pks.im \
--to=ps@pks.im \
--cc=christian.couder@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=hanxin.hx@bytedance.com \
--cc=paul@paultarjan.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox