git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] object: fix performance regression when peeling tags
Date: Thu, 06 Nov 2025 06:33:44 -0800	[thread overview]
Message-ID: <xmqqy0ojjkmv.fsf@gitster.g> (raw)
In-Reply-To: <20251106-b4-pks-peel-object-performance-regression-v1-1-a386147750b0@pks.im> (Patrick Steinhardt's message of "Thu, 06 Nov 2025 09:52:54 +0100")

Patrick Steinhardt <ps@pks.im> writes:

> Bisecting the issue lands us at 6ec4c0b45b (refs: don't store peeled
> object IDs for invalid tags, 2025-10-23). The gist of the commit is that
> we may end up storing peeled objects in both reftables and packed-refs
> for corrupted tags, where the claimed tagged object type is different
> than the actual tagged object type. This will then cause us to create
> the `struct object *` with a wrong type, as well, and obviously nothing
> good comes out of that.

So does the flow of the logic, which led to the original "validation
while recording peeled tags", go like this?

 - It is handy to be able to get peeled object cheaply, let's cache
   it, because the same tag peels to the same object every time.

 - Usually when we see an object we make sure that is what we
   expect.  Not having to do this validation costs us less, so why
   not validate when caching peeled object?  It would amortise the
   cost of validating the peeled object at runtime every time we use
   it into a one-time cost when we cache.

But of course we do not really get rid of the type checking at
runtime, so we certainly should notice, no?

> Taking a step back though reveals an oddity in the new verification
> logic: we not only verify the _tagged_ object's type, but we also verify
> the type of the tag itself. But this isn't really needed, as we wouldn't
> hit the bug in such a case anyway, as we only hit the issue with corrupt
> tags claiming an invalid type for the tagged object.
>
> The consequence of this is that we now started to look up the target
> object of every single reference we're about to write, regardless of
> whether it even is a tag or not. And that is of course quite costly.

;-).

> Fix the issue by only verifying the type of the tagged objects. This
> means that we of course still have a performance hit for actual tags.
> But this only happens for writes anyway, and I'd claim it's preferable
> to not store corrupted data in the refdb than to be fast here. Rename
> the flag accordingly to clarify that we only verify the tagged object's
> type.

OK.

Will queue.

  reply	other threads:[~2025-11-06 14:33 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-06  8:52 [PATCH] object: fix performance regression when peeling tags Patrick Steinhardt
2025-11-06 14:33 ` Junio C Hamano [this message]
2025-11-07  6:19   ` Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqy0ojjkmv.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).