Re: [PATCH] object: fix performance regression when peeling tags

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Patrick Steinhardt <ps@pks.im>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] object: fix performance regression when peeling tags
Date: Fri, 7 Nov 2025 07:19:21 +0100	[thread overview]
Message-ID: <aQ2Paa7pDTtbTRpC@pks.im> (raw)
In-Reply-To: <xmqqy0ojjkmv.fsf@gitster.g>

On Thu, Nov 06, 2025 at 06:33:44AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Bisecting the issue lands us at 6ec4c0b45b (refs: don't store peeled
> > object IDs for invalid tags, 2025-10-23). The gist of the commit is that
> > we may end up storing peeled objects in both reftables and packed-refs
> > for corrupted tags, where the claimed tagged object type is different
> > than the actual tagged object type. This will then cause us to create
> > the `struct object *` with a wrong type, as well, and obviously nothing
> > good comes out of that.
> 
> So does the flow of the logic, which led to the original "validation
> while recording peeled tags", go like this?
> 
>  - It is handy to be able to get peeled object cheaply, let's cache
>    it, because the same tag peels to the same object every time.
> 
>  - Usually when we see an object we make sure that is what we
>    expect.  Not having to do this validation costs us less, so why
>    not validate when caching peeled object?  It would amortise the
>    cost of validating the peeled object at runtime every time we use
>    it into a one-time cost when we cache.
> 
> But of course we do not really get rid of the type checking at
> runtime, so we certainly should notice, no?

Yeah, that's basically it. It's only the ref backends that cause us to
cache the peeled values, so the fix was to adapt them so that they know
to never cache data for corrupted tags. Otherwise we can end up in all
kinds of different situations where we may end up with an invalid
in-memory representation of the tagged object.

So the performance hit we have is on the writing side, and that should
amortize the cost somewhat. For the "files" backend that's mostly fine,
as we only pay that cost when packing refs. But for the "reftable"
backend it's less so because we basically pay the cost every time we
update a ref.

During runtime we (in most cases) don't verify the tagged object type.
We could, but that would make the added overhead that we now see as a
performance regression visible on every single read of a cached peeled
tag value. And I don't really think we should go there, doubly so
because the issue can only manifest in a corrupt repository.

One could of course argue that it's corrupt anyway, so why even bother
on the writing side? But such corrupted tags can be served to us by a
remote, and clients do not perform fsck by default. So it is possible
for such broken tags to end up in user repositories, and I'm not sure
about all the consequences it can have. So I'd rather put clients into a
position where they can at least detect the corruption at runtime, and
not caching the values is part of that mitigation.

Patrick

     prev parent reply	other threads:[~2025-11-07  6:19 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-06  8:52 [PATCH] object: fix performance regression when peeling tags Patrick Steinhardt
2025-11-06 14:33 ` Junio C Hamano
2025-11-07  6:19   ` Patrick Steinhardt [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aQ2Paa7pDTtbTRpC@pks.im \
    --to=ps@pks.im \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).