From: Patrick Steinhardt <ps@pks.im>
To: Kristofer Karlsson <krka@spotify.com>
Cc: Jeff King <peff@peff.net>, Michael Montalbo <mmontalbo@gmail.com>,
git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: weird quadratic reftable behavior, was: Re: [PATCH 3/3] t5551: pack refs after creating many tags
Date: Wed, 1 Jul 2026 12:07:40 +0200 [thread overview]
Message-ID: <akTm7BDohsy85sN8@pks.im> (raw)
In-Reply-To: <CAL71e4PfXA-ixKR6r7fu_7_QmdzK+rTRs29mOsUYKaq+_a5q5w@mail.gmail.com>
On Wed, Jul 01, 2026 at 11:04:45AM +0200, Kristofer Karlsson wrote:
> On Wed, 1 Jul 2026 at 10:00, Jeff King <peff@peff.net> wrote:
>
> > Yeah, it is (mostly) the same problem. About half the time is spent in
> > refs_verify_refnames_available().
> >
> > The other half is in reftable_be_transaction_prepare(). Looks like it
> > makes individual calls to prepare_single_update(), which reads each ref.
> > And those reads are expensive because of all of the tombstones. It might
> > be possible to do an iterator merge or similar between the sorted list
> > of transaction refs and the reftable contents.
>
> Hi, sorry for jumping in -- I found this interesting and started
> poking at the code. I think both halves may share the same root
> cause.
>
> The merged iterator's suppress_deletions flag filters out tombstones
> internally, which means higher-level code with prefix or refname
> bounds never gets a chance to stop iteration early. By letting
> tombstones pass through and filtering them one layer up in the
> reftable backend, the existing bounds checks can kick in before
> we scan through all the tombstones.
Ah, interesting. This kind of hits int o the direction that I proposed
of stopping iteration once we hit the first reference that has a
lexicographically-larget refname. But I thought about having to solve it
generically in the iterator somehow, and I wasn't really looking forward
to that.
Your solutions works around that issue indeed. It of course doesn't
solve reference iteration, which would still be expensive. But it does
solve the issue where we just want to look up a single reference, which
is what Peff noticed to be slow.
> So instead of doing full scans inside merged_iter_next_void()
> we can just delegate to merged_iter_next_entry() and instead
> add a loop to reftable_be_reflog_exists() that skips
> tombstones (but is amortized O(1)).
>
> Now multiple call sites would need to add something like this
> to compensate for returning tombstones:
>
> if (reftable_log_record_is_deletion(&iter->log))
> continue;
>
> but it may be worth it if it reduces cost when there are many refs.
I think it certainly might be.
> The key spot is reftable_ref_iterator_advance(), where the deletion
> skip goes right after the existing prefix check -- so a tombstone
> past the prefix stops iteration immediately instead of being
> silently consumed. The same idea applies to reftable_backend_read_ref()
> and the log iteration paths.
Yup.
> I have a local branch with this attempted fix. Rerunning the
> benchmark:
>
> Before:
> nr=1000 0.306s
> nr=2000 0.945s
> nr=4000 3.816s
> nr=8000 14.93s
>
> After:
> nr=1000 0.020s
> nr=2000 0.044s
> nr=4000 0.071s
> nr=8000 0.145s
> nr=16000 0.258s
> nr=32000 0.591s
>
> I can send a proper patch if needed/wanted, but I might have missed
> something silly here.
Nice gains. I certainly think it would make sense to polish this a bit
and then cast it into a patch.
Patrick
next prev parent reply other threads:[~2026-07-01 10:07 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-20 15:33 [RFH] Why do osx CI jobs so unreliable? Michael Montalbo
2026-06-21 21:34 ` Jeff King
2026-06-22 4:42 ` Patrick Steinhardt
2026-06-22 9:47 ` Patrick Steinhardt
2026-06-22 9:55 ` Patrick Steinhardt
2026-06-22 10:29 ` Patrick Steinhardt
2026-06-26 3:27 ` Michael Montalbo
2026-06-26 5:16 ` Jeff King
2026-06-26 10:50 ` Patrick Steinhardt
2026-06-26 13:45 ` Junio C Hamano
2026-06-26 23:26 ` Michael Montalbo
2026-06-28 7:57 ` [PATCH 0/3] fixing expensive http test timeouts Jeff King
2026-06-28 8:00 ` [PATCH 1/3] t/lib-httpd: bump apache timeout Jeff King
2026-07-02 3:24 ` Michael Montalbo
2026-06-28 8:03 ` [PATCH 2/3] t5551: put many-tags case into its own repo Jeff King
2026-06-28 21:44 ` Junio C Hamano
2026-06-29 0:34 ` Jeff King
2026-06-29 14:42 ` Junio C Hamano
2026-06-29 20:36 ` Jeff King
2026-06-28 8:07 ` [PATCH 3/3] t5551: pack refs after creating many tags Jeff King
2026-06-28 21:25 ` Junio C Hamano
2026-06-29 5:57 ` Patrick Steinhardt
2026-06-29 20:35 ` Jeff King
2026-06-30 9:05 ` Patrick Steinhardt
2026-06-30 23:47 ` Jeff King
2026-06-30 23:58 ` weird quadratic reftable behavior, was: " Jeff King
2026-07-01 6:17 ` Patrick Steinhardt
2026-07-01 8:00 ` Jeff King
2026-07-01 9:04 ` Kristofer Karlsson
2026-07-01 10:07 ` Patrick Steinhardt [this message]
2026-07-03 12:09 ` Kristofer Karlsson
2026-06-29 7:33 ` [PATCH 0/3] fixing expensive http test timeouts Patrick Steinhardt
2026-06-29 14:39 ` Junio C Hamano
2026-06-29 16:09 ` Patrick Steinhardt
2026-06-29 16:19 ` Junio C Hamano
2026-06-30 9:05 ` Patrick Steinhardt
2026-06-30 19:21 ` Junio C Hamano
2026-07-01 6:56 ` Patrick Steinhardt
2026-06-26 23:43 ` [RFH] Why do osx CI jobs so unreliable? Jeff King
2026-06-22 5:05 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=akTm7BDohsy85sN8@pks.im \
--to=ps@pks.im \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=krka@spotify.com \
--cc=mmontalbo@gmail.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox