git.vger.kernel.org archive mirror
From: Patrick Steinhardt <ps@pks.im>
To: Karthik Nayak <karthik.188@gmail.com>
Cc: git@vger.kernel.org,
	"brian m. carlson" <sandals@crustytoothpaste.net>,
	Jeff King <peff@peff.net>, Junio C Hamano <gitster@pobox.com>,
	Christian Couder <chriscool@tuxfamily.org>
Subject: Re: [PATCH 06/14] refs: stop re-verifying common prefixes for availability
Date: Wed, 19 Feb 2025 12:52:15 +0100
Message-ID: <Z7XF77yKUgENdbA-@pks.im>
In-Reply-To: <CAOLa=ZQC+UXQGjOqot=pTopkd8mOjduixQ=rBnsis9g_3_HOqw@mail.gmail.com>

On Tue, Feb 18, 2025 at 08:12:05AM -0800, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > One of the checks done by `refs_verify_refnames_available()` is whether
> > any of the prefixes of a reference already exists. For example, given a
> > reference "refs/heads/main", we'd check whether "refs/heads" or "refs"
> > already exist, and if so we'd abort the transaction.
> >
> > When updating multiple references at once, this check is performed for
> > each of the references individually. Consequently, because references
> > tend to have common prefixes like "refs/heads/" or "refs/tags/", we
> > evaluate the availability of these prefixes repeatedly. Naturally this
> > is a waste of compute, as the availability of those prefixes should in
> > general not change in the middle of a transaction. And if it did,
> > backends would notice at a later point in time.
> >
> > Optimize this pattern by storing prefixes in a `strset` so that we can
> > trivially track those prefixes that we have already checked. This leads
> > to a significant speedup when creating many references that all share a
> > common prefix:
> >
> >     Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
> >       Time (mean ± σ):      63.1 ms ±   1.8 ms    [User: 41.0 ms, System: 21.6 ms]
> >       Range (min … max):    60.6 ms …  69.5 ms    38 runs
> >
> >     Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
> >       Time (mean ± σ):      40.0 ms ±   1.3 ms    [User: 29.3 ms, System: 10.3 ms]
> >       Range (min … max):    38.1 ms …  47.3 ms    61 runs
> >
> >     Summary
> >       update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
> >         1.58 ± 0.07 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
> >
> > Note that the same speedup cannot be observed for the "files" backend
> > because it still performs availability checks per reference.
> >
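A minimal sketch of the prefix deduplication described in the quoted commit message, written as standalone C rather than against Git's internals. `struct prefix_set` is a hypothetical stand-in for Git's `strset` (linear lookup, for illustration only), and `count_prefix_checks()` is a hypothetical helper that only counts the prefix lookups a backend would perform instead of performing them.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for Git's `strset`: a growable array of owned
 * strings with linear lookup. Fine for illustration, not for speed.
 */
struct prefix_set {
	char **items;
	size_t nr, alloc;
};

/* Adds the first `len` bytes of `s`; returns 1 if new, 0 if already present. */
static int prefix_set_add(struct prefix_set *set, const char *s, size_t len)
{
	char *copy;

	for (size_t i = 0; i < set->nr; i++)
		if (strlen(set->items[i]) == len && !strncmp(set->items[i], s, len))
			return 0;

	if (set->nr == set->alloc) {
		size_t alloc = set->alloc ? 2 * set->alloc : 8;
		char **items = realloc(set->items, alloc * sizeof(*items));
		if (!items)
			abort();
		set->items = items;
		set->alloc = alloc;
	}

	copy = malloc(len + 1);
	if (!copy)
		abort();
	memcpy(copy, s, len);
	copy[len] = '\0';
	set->items[set->nr++] = copy;
	return 1;
}

static void prefix_set_clear(struct prefix_set *set)
{
	for (size_t i = 0; i < set->nr; i++)
		free(set->items[i]);
	free(set->items);
	set->items = NULL;
	set->nr = set->alloc = 0;
}

/*
 * Counts the prefix lookups needed to verify `refnames`. The prefixes of
 * "refs/heads/main" are "refs" and "refs/heads". With `dedup` set, a
 * prefix already checked for an earlier refname is skipped.
 */
static size_t count_prefix_checks(const char **refnames, size_t nr, int dedup)
{
	struct prefix_set seen = { 0 };
	size_t checks = 0;

	assert(refnames || !nr);
	for (size_t i = 0; i < nr; i++) {
		const char *refname = refnames[i];
		const char *slash;

		for (slash = strchr(refname, '/'); slash; slash = strchr(slash + 1, '/')) {
			size_t len = (size_t)(slash - refname);

			if (dedup && !prefix_set_add(&seen, refname, len))
				continue; /* this prefix was already verified */
			/* A real backend would look up refname[0..len) here. */
			checks++;
		}
	}

	prefix_set_clear(&seen);
	return checks;
}
```

For "refs/heads/a", "refs/heads/b" and "refs/tags/v1", the naive loop performs six prefix lookups while the deduplicated one performs three ("refs", "refs/heads" and "refs/tags"). The saving grows with the number of references that share a prefix.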
> 
> In the previous commit, you started using the new function in the
> reftable backend, can we not make a similar change to the files backend?

It's quite a bit more intricate in the "files" backend because the
creation of the lockfiles and calls to `refs_verify_refname_available()`
are intertwined with one another:

  - `lock_raw_ref()` verifies availability when it hits either EEXIST
    or EISDIR, in order to generate error messages. This is probably
    nothing we have to care about too much, as these are irrelevant in
    the happy path.

  - `lock_raw_ref()` also verifies availability in the case where it
    _could_ create the lockfile, to check whether the ref conflicts
    with any packed refs. This one could potentially be batched. It's a
    curious check in the first place, as we do not hold the packed-refs
    lock at this point in time, so this check might even be racy.

  - `lock_ref_oid_basic()` also checks availability against packed refs,
    so this is another case where we might batch the checks. But the
    function is only used when copying/renaming references or when
    expiring reflogs, so it won't be called for many refs.

  - We call it in `refs_rename_ref_available()`, which is executed when
    copying/renaming references. Uninteresting for the same reason as
    the previous entry.

  - We call it in `files_transaction_finish_initial()`. This one should
    be rather trivial to batch. Again, though, the packed refs are not
    locked, so the checks are racy.
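A minimal sketch of what a batched check against packed refs could look like, under the simplifying assumption that the packed refs are available as a `strcmp()`-sorted array of names. The real "packed" backend uses its own file format and iterators, and the real check also accounts for refs created or deleted within the same transaction, which this sketch ignores. `refname_conflicts()` and `first_conflict()` are hypothetical names. The idea is that the locking loop only collects refnames, and a single call afterwards verifies them all, in both directions: an existing ref must not be a leading directory of a new one, and no existing ref may live below a new one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Compares NUL-terminated `ref` against the first `len` bytes of `key`,
 * in the same byte order as strcmp(). A `ref` that merely starts with
 * `key` sorts after it.
 */
static int cmp_prefix(const char *ref, const char *key, size_t len)
{
	int cmp = strncmp(ref, key, len);

	if (cmp)
		return cmp;
	return ref[len] != '\0';
}

/* Index of the first entry in sorted `refs` that is >= key[0..len). */
static size_t lower_bound(const char **refs, size_t nr, const char *key, size_t len)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (cmp_prefix(refs[mid], key, len) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Returns 1 if creating `refname` would clash with a ref in sorted `existing`. */
static int refname_conflicts(const char **existing, size_t nr, const char *refname)
{
	size_t len = strlen(refname), idx;
	const char *slash;
	char *dir;
	int conflict;

	/* (1) No existing ref may equal a leading directory of `refname`. */
	for (slash = strchr(refname, '/'); slash; slash = strchr(slash + 1, '/')) {
		size_t plen = (size_t)(slash - refname);

		idx = lower_bound(existing, nr, refname, plen);
		if (idx < nr && !cmp_prefix(existing[idx], refname, plen))
			return 1;
	}

	/* (2) No existing ref may live below "refname/": seek there directly. */
	dir = malloc(len + 2);
	if (!dir)
		abort();
	memcpy(dir, refname, len);
	dir[len] = '/';
	dir[len + 1] = '\0';
	idx = lower_bound(existing, nr, dir, len + 1);
	conflict = idx < nr && !strncmp(existing[idx], dir, len + 1);
	free(dir);
	return conflict;
}

/*
 * Batched variant: verifies all refnames of a transaction in one call and
 * returns the index of the first conflicting one, or -1 if all are free.
 */
static long first_conflict(const char **existing, size_t nr,
			   const char **refnames, size_t refnames_nr)
{
	assert(existing || !nr);
	for (size_t i = 0; i < refnames_nr; i++)
		if (refname_conflicts(existing, nr, refnames[i]))
			return (long)i;
	return -1;
}
```

For example, with "refs/heads/main" and "refs/heads/topic/a" packed, creating "refs/heads/main/sub" or "refs/heads/topic" is rejected, while "refs/heads/new" is accepted. Seeking to "refname/" rather than scanning from "refname" matters because bytes such as '-' sort before '/', so a ref like "refs/heads/foo-bar" sits between "refs/heads/foo" and "refs/heads/foo/".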

So... it's a bit more complicated here compared to the reftable backend,
and I didn't feel like opening a can of worms with the potentially racy
checks against the packed backend.

Anyway, I think we still can and probably should use the new mechanism
in two cases:

  - During normal transactions to batch the availability checks with the
    packed backend. I will have to ignore the issue of a potential race,
    but other than that the change is straightforward and the result is
    a slight speedup:

      Benchmark 1: update-ref: create many refs (preexisting = 100000, new = 10000, revision = HEAD~)
        Time (mean ± σ):     393.4 ms ±   4.0 ms    [User: 64.1 ms, System: 327.5 ms]
        Range (min … max):   387.8 ms … 398.7 ms    10 runs

      Benchmark 2: update-ref: create many refs (preexisting = 100000, new = 10000, revision = HEAD)
        Time (mean ± σ):     373.3 ms ±   3.4 ms    [User: 48.8 ms, System: 322.7 ms]
        Range (min … max):   368.7 ms … 378.6 ms    10 runs

      Summary
        update-ref: create many refs (preexisting = 100000, new = 10000, revision = HEAD) ran
          1.05 ± 0.01 times faster than update-ref: create many refs (preexisting = 100000, new = 10000, revision = HEAD~)

  - During the initial transaction. Here the change is even more trivial
    and we can also fix the race as we eventually lock the packed-refs
    file anyway. This leads to a noticeable speedup when migrating from
    the reftable backend to the files backend:

      Benchmark 1: migrate reftable:files (refcount = 1000000, revision = HEAD~)
        Time (mean ± σ):     980.6 ms ±  10.9 ms    [User: 801.8 ms, System: 172.4 ms]
        Range (min … max):   964.7 ms … 995.3 ms    10 runs

      Benchmark 2: migrate reftable:files (refcount = 1000000, revision = HEAD)
        Time (mean ± σ):     739.7 ms ±   6.6 ms    [User: 551.9 ms, System: 181.9 ms]
        Range (min … max):   727.9 ms … 747.2 ms    10 runs

      Summary
        migrate reftable:files (refcount = 1000000, revision = HEAD) ran
          1.33 ± 0.02 times faster than migrate reftable:files (refcount = 1000000, revision = HEAD~)
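The "times faster" figures in the Summary sections can be sanity-checked from the mean ± σ values. The output format is hyperfine's; assuming it derives the ± of the ratio by first-order error propagation (relative standard deviations of both means added in quadrature), a small helper reproduces the reported numbers. `relative_speed()` is a hypothetical name, and the Newton iteration is only there so the sketch does not need libm.

```c
#include <assert.h>

/* Ratio between two mean run times and its propagated standard deviation. */
struct speedup {
	double ratio;
	double stddev;
};

/* Square root via Newton's method, so the sketch does not need libm (-lm). */
static double newton_sqrt(double x)
{
	double r = x > 1.0 ? x : 1.0;

	if (x <= 0.0)
		return 0.0;
	for (int i = 0; i < 64; i++)
		r = 0.5 * (r + x / r);
	return r;
}

/*
 * How many times faster the "fast" run is than the "slow" one. The
 * relative standard deviations of both means are added in quadrature
 * (first-order error propagation for a quotient).
 */
static struct speedup relative_speed(double mean_slow, double sd_slow,
				     double mean_fast, double sd_fast)
{
	struct speedup s;
	double rel_slow, rel_fast;

	assert(mean_slow > 0.0 && mean_fast > 0.0);
	rel_slow = sd_slow / mean_slow;
	rel_fast = sd_fast / mean_fast;
	s.ratio = mean_slow / mean_fast;
	s.stddev = s.ratio * newton_sqrt(rel_slow * rel_slow + rel_fast * rel_fast);
	return s;
}
```

For instance, `relative_speed(980.6, 10.9, 739.7, 6.6)` yields a ratio of about 1.3257 and a σ of about 0.0189, i.e. 1.33 ± 0.02 after rounding. The inputs from the other two benchmark pairs give 1.58 ± 0.07 and 1.05 ± 0.01, matching their summaries.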

I'll include these changes in the next version, thanks for questioning
why I skipped over the "files" backend.

Patrick


Thread overview: 169+ messages
2025-02-17 15:50 [PATCH 00/14] refs: batch refname availability checks Patrick Steinhardt
2025-02-17 15:50 ` [PATCH 01/14] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
2025-02-17 15:50 ` [PATCH 02/14] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
2025-02-17 15:50 ` [PATCH 03/14] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
2025-02-18 16:04   ` Karthik Nayak
2025-02-17 15:50 ` [PATCH 04/14] refs: introduce function to batch refname availability checks Patrick Steinhardt
2025-02-17 15:50 ` [PATCH 05/14] refs/reftable: start using `refs_verify_refnames_available()` Patrick Steinhardt
2025-02-17 15:50 ` [PATCH 06/14] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
2025-02-18 16:12   ` Karthik Nayak
2025-02-19 11:52     ` Patrick Steinhardt [this message]
2025-02-17 15:50 ` [PATCH 07/14] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
2025-02-18 16:52   ` shejialuo
2025-02-19 11:52     ` Patrick Steinhardt
2025-02-19 12:41       ` shejialuo
2025-02-19 12:59         ` Patrick Steinhardt
2025-02-19 13:06           ` shejialuo
2025-02-19 13:17             ` Patrick Steinhardt
2025-02-19 13:20               ` Patrick Steinhardt
2025-02-19 13:23                 ` shejialuo
2025-02-18 17:13   ` Karthik Nayak
2025-02-19 11:52     ` Patrick Steinhardt
2025-02-17 15:50 ` [PATCH 08/14] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
2025-02-17 15:50 ` [PATCH 09/14] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
2025-02-19 20:10   ` Karthik Nayak
2025-02-17 15:50 ` [PATCH 10/14] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
2025-02-19 20:13   ` Karthik Nayak
2025-02-17 15:50 ` [PATCH 11/14] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
2025-02-17 15:50 ` [PATCH 12/14] refs/iterator: implement seeking for `packed-ref` iterators Patrick Steinhardt
2025-02-17 15:50 ` [PATCH 13/14] refs/iterator: implement seeking for "files" iterators Patrick Steinhardt
2025-02-17 15:50 ` [PATCH 14/14] refs: reuse iterators when determining refname availability Patrick Steinhardt
2025-02-18 17:10 ` [PATCH 00/14] refs: batch refname availability checks brian m. carlson
2025-02-19 13:23 ` [PATCH v2 00/16] " Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
2025-02-19 17:02     ` Justin Tobler
2025-02-19 13:23   ` [PATCH v2 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
2025-02-21  8:00     ` Jeff King
2025-02-21  8:36       ` Patrick Steinhardt
2025-02-21  9:06         ` Jeff King
2025-02-19 13:23   ` [PATCH v2 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
2025-02-19 18:21     ` Justin Tobler
2025-02-20  8:05       ` Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 04/16] refs: introduce function to batch refname availability checks Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 05/16] refs/reftable: " Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 06/16] refs/files: batch refname availability checks for normal transactions Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 07/16] refs/files: batch refname availability checks for initial transactions Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 08/16] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 09/16] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 10/16] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
2025-02-24 13:08     ` shejialuo
2025-02-25  7:39       ` Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 11/16] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
2025-02-24 13:37     ` shejialuo
2025-02-25  7:39       ` Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 12/16] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
2025-02-24 14:00     ` shejialuo
2025-02-25  7:39       ` Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 13/16] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
2025-02-24 14:49     ` shejialuo
2025-02-25  7:39       ` Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 14/16] refs/iterator: implement seeking for `packed-ref` iterators Patrick Steinhardt
2025-02-24 15:09     ` shejialuo
2025-02-25  7:39       ` Patrick Steinhardt
2025-02-25 12:07         ` shejialuo
2025-02-19 13:23   ` [PATCH v2 15/16] refs/iterator: implement seeking for "files" iterators Patrick Steinhardt
2025-02-19 13:23   ` [PATCH v2 16/16] refs: reuse iterators when determining refname availability Patrick Steinhardt
2025-02-24 15:14     ` shejialuo
2025-02-24 15:18   ` [PATCH v2 00/16] refs: batch refname availability checks shejialuo
2025-02-25  7:39     ` Patrick Steinhardt
2025-02-25  8:55 ` [PATCH v3 " Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
2025-02-26 22:26     ` Junio C Hamano
2025-02-27 11:57       ` Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 04/16] refs: introduce function to batch refname availability checks Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 05/16] refs/reftable: " Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 06/16] refs/files: batch refname availability checks for normal transactions Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 07/16] refs/files: batch refname availability checks for initial transactions Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 08/16] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 09/16] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 10/16] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 11/16] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 12/16] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
2025-02-25  8:55   ` [PATCH v3 13/16] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
2025-02-25  8:56   ` [PATCH v3 14/16] refs/iterator: implement seeking for packed-ref iterators Patrick Steinhardt
2025-02-25  8:56   ` [PATCH v3 15/16] refs/iterator: implement seeking for files iterators Patrick Steinhardt
2025-02-25  8:56   ` [PATCH v3 16/16] refs: reuse iterators when determining refname availability Patrick Steinhardt
2025-02-28  9:26 ` [PATCH v4 00/16] refs: batch refname availability checks Patrick Steinhardt
2025-02-28  9:26   ` [PATCH v4 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
2025-02-28  9:26   ` [PATCH v4 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
2025-03-06 13:21     ` Karthik Nayak
2025-02-28  9:26   ` [PATCH v4 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
2025-02-28  9:26   ` [PATCH v4 04/16] refs: introduce function to batch refname availability checks Patrick Steinhardt
2025-03-06 13:47     ` Karthik Nayak
2025-02-28  9:26   ` [PATCH v4 05/16] refs/reftable: " Patrick Steinhardt
2025-03-06 14:00     ` Karthik Nayak
2025-03-06 14:12       ` Karthik Nayak
2025-03-06 15:13         ` Patrick Steinhardt
2025-02-28  9:26   ` [PATCH v4 06/16] refs/files: batch refname availability checks for normal transactions Patrick Steinhardt
2025-02-28  9:26   ` [PATCH v4 07/16] refs/files: batch refname availability checks for initial transactions Patrick Steinhardt
2025-03-06 14:10     ` Karthik Nayak
2025-02-28  9:26   ` [PATCH v4 08/16] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
2025-02-28  9:26   ` [PATCH v4 09/16] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
2025-02-28  9:26   ` [PATCH v4 10/16] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
2025-02-28  9:26   ` [PATCH v4 11/16] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
2025-02-28  9:26   ` [PATCH v4 12/16] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
2025-03-06 14:16     ` Karthik Nayak
2025-02-28  9:26   ` [PATCH v4 13/16] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
2025-02-28  9:26   ` [PATCH v4 14/16] refs/iterator: implement seeking for packed-ref iterators Patrick Steinhardt
2025-02-28  9:26   ` [PATCH v4 15/16] refs/iterator: implement seeking for files iterators Patrick Steinhardt
2025-02-28  9:26   ` [PATCH v4 16/16] refs: reuse iterators when determining refname availability Patrick Steinhardt
2025-03-06 14:20   ` [PATCH v4 00/16] refs: batch refname availability checks Karthik Nayak
2025-03-06 15:08 ` [PATCH v5 " Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
2025-03-12 12:12     ` shejialuo
2025-03-06 15:08   ` [PATCH v5 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 04/16] refs: introduce function to batch refname availability checks Patrick Steinhardt
2025-03-12 12:36     ` shejialuo
2025-03-12 12:44       ` shejialuo
2025-03-12 15:36       ` Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 05/16] refs/reftable: " Patrick Steinhardt
2025-03-12 12:54     ` shejialuo
2025-03-12 15:36       ` Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 06/16] refs/files: batch refname availability checks for normal transactions Patrick Steinhardt
2025-03-12 12:58     ` shejialuo
2025-03-12 15:36       ` Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 07/16] refs/files: batch refname availability checks for initial transactions Patrick Steinhardt
2025-03-12 13:06     ` shejialuo
2025-03-12 15:36       ` Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 08/16] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
2025-03-12 13:22     ` shejialuo
2025-03-12 15:36       ` Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 09/16] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
2025-03-12 13:45     ` shejialuo
2025-03-12 15:36       ` Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 10/16] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 11/16] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 12/16] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 13/16] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 14/16] refs/iterator: implement seeking for packed-ref iterators Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 15/16] refs/iterator: implement seeking for files iterators Patrick Steinhardt
2025-03-06 15:08   ` [PATCH v5 16/16] refs: reuse iterators when determining refname availability Patrick Steinhardt
2025-03-06 15:32   ` [PATCH v5 00/16] refs: batch refname availability checks Karthik Nayak
2025-03-12 14:03   ` shejialuo
2025-03-12 15:56 ` [PATCH v6 " Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 01/16] object-name: introduce `repo_get_oid_with_flags()` Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 02/16] object-name: allow skipping ambiguity checks in `get_oid()` family Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 03/16] builtin/update-ref: skip ambiguity checks when parsing object IDs Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 04/16] refs: introduce function to batch refname availability checks Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 05/16] refs/reftable: " Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 06/16] refs/files: batch refname availability checks for normal transactions Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 07/16] refs/files: batch refname availability checks for initial transactions Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 08/16] refs: stop re-verifying common prefixes for availability Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 09/16] refs/iterator: separate lifecycle from iteration Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 10/16] refs/iterator: provide infrastructure to re-seek iterators Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 11/16] refs/iterator: implement seeking for merged iterators Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 12/16] refs/iterator: implement seeking for reftable iterators Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 13/16] refs/iterator: implement seeking for ref-cache iterators Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 14/16] refs/iterator: implement seeking for packed-ref iterators Patrick Steinhardt
2025-04-03 19:56     ` Elijah Newren
2025-04-03 22:18       ` brian m. carlson
2025-04-04  7:18         ` shejialuo
2025-04-04 10:00       ` Patrick Steinhardt
2025-04-04 10:05         ` Patrick Steinhardt
2025-04-04 10:59           ` Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 15/16] refs/iterator: implement seeking for files iterators Patrick Steinhardt
2025-03-12 15:56   ` [PATCH v6 16/16] refs: reuse iterators when determining refname availability Patrick Steinhardt
2025-03-13  2:57   ` [PATCH v6 00/16] refs: batch refname availability checks shejialuo