public inbox for git@vger.kernel.org
From: Patrick Steinhardt <ps@pks.im>
To: "brian m. carlson" <sandals@crustytoothpaste.net>,
	git@vger.kernel.org, Karthik Nayak <karthik.188@gmail.com>
Subject: Re: Poor performance using reftable with many refs
Date: Thu, 13 Feb 2025 08:13:38 +0100
Message-ID: <Z62booOOXODOl_sZ@pks.im>
In-Reply-To: <Z62NFXja4CkrxSil@pks.im>

On Thu, Feb 13, 2025 at 07:11:33AM +0100, Patrick Steinhardt wrote:
> On Thu, Feb 13, 2025 at 12:01:59AM +0000, brian m. carlson wrote:
> > It takes about 30 times as long to perform using the reftable backend,
> > which is concerning.  While this is a synthetic measurement, I had
> > intended to use it to determine the performance characteristics of
> > the reference update portion when pushing a large repository for the
> > first time.
> 
> Interesting, that's an edge case I didn't yet see. I know about some
> cases where reftables are ~10% slower, but 30x slower is in a different
> ballpark.

Well, I just couldn't resist and had to investigate immediately. I can
indeed reproduce the issue with "linux.git" rather easily:

    Benchmark 1: update-ref (refformat = files)
      Time (mean ± σ):     223.0 ms ±   2.4 ms    [User: 76.1 ms, System: 145.6 ms]
      Range (min … max):   220.2 ms … 226.6 ms    5 runs

    Benchmark 2: update-ref (refformat = reftable)
      Time (mean ± σ):     17.472 s ±  0.153 s    [User: 17.402 s, System: 0.049 s]
      Range (min … max):   17.390 s … 17.745 s    5 runs

    Summary
      update-ref (refformat = files) ran
       78.35 ± 1.09 times faster than update-ref (refformat = reftable)

Oops, that indeed doesn't look great.

Turns out that you're hitting quite a funny edge case: the issue comes
from you first deleting all preexisting refs in the target repository
before recreating them. With "packed-refs", this leads to a repository
that has neither a "packed-refs" file nor any loose ref, except for HEAD
of course. But with "reftable", the deletion leaves data behind:

    total 368
    -rw-r--r-- 1 pks users 332102 Feb 13 08:00 0x000000000001-0x000000000001-d8285c7c.ref
    -rw-r--r-- 1 pks users  32941 Feb 13 08:00 0x000000000002-0x000000000003-f1a8ebf9.ref
    -rw-r--r-- 1 pks users     86 Feb 13 08:00 tables.list

We end up with two tables: the first one has been created when cloning
the repository and contains all references. The second one has been
created when deleting all references, so it only contains ref deletions.
Because deletions don't have to carry an object ID, the resulting table
is also much smaller. This has the effect that auto-compaction does not
kick in, because the table sizes still form a geometric sequence: the
first table is about ten times as large as the second. Consequently,
all the checks that we perform when recreating the refs are way more
expensive now because we have to search for conflicts.
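
To illustrate why no compaction is triggered, here is a minimal sketch of
such a geometric check in Python. It is not the actual code from the
reftable library: the function name and the factor of 2 are assumptions
made for illustration, and the sizes are the two table sizes from the
listing above.

```python
def geometric_sequence_intact(sizes, factor=2):
    """Return True if every table is at least `factor` times as large as
    the next newer one. `sizes` is ordered from oldest to newest table.

    Hypothetical helper; the factor of 2 is an assumption.
    """
    return all(older >= factor * newer
               for older, newer in zip(sizes, sizes[1:]))


# Table sizes in bytes, taken from the directory listing.
sizes = [332102, 32941]
needs_compaction = not geometric_sequence_intact(sizes)
```

With these sizes the older table is about ten times as large as the
table holding the deletions, so a check of this shape reports the
sequence as intact and leaves both tables in place.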

A "fix" would be to pack references after you have deleted refs. This
leads to a significant speedup and makes the reftable backend outperform
the files backend:

    Benchmark 1: update-ref (refformat = files)
      Time (mean ± σ):     223.1 ms ±   0.6 ms    [User: 71.2 ms, System: 150.8 ms]
      Range (min … max):   222.5 ms … 224.2 ms    5 runs

    Benchmark 2: update-ref (refformat = reftable)
      Time (mean ± σ):     129.1 ms ±   2.1 ms    [User: 84.4 ms, System: 44.1 ms]
      Range (min … max):   127.2 ms … 132.7 ms    5 runs

    Summary
      update-ref (refformat = reftable) ran
        1.73 ± 0.03 times faster than update-ref (refformat = files)

I don't really think there's a general fix for this though, as the
issue comes from the design of how tombstone references work.
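
A toy model of the tombstone design, as a sketch only: tables are plain
dicts ordered from oldest to newest, a deletion is recorded as a `None`
value, and all names are made up for illustration. The on-disk format
and the real lookup code are more involved.

```python
TOMBSTONE = None  # marker for "this ref was deleted" (illustrative)


def lookup(stack, refname):
    """Resolve `refname` against a stack of tables, newest table first.

    A tombstone in a newer table shadows the value in an older one.
    """
    for table in reversed(stack):
        if refname in table:
            return table[refname]  # may be TOMBSTONE
    return None


def record_count(stack):
    """Number of records a reader may have to look at across all tables."""
    return sum(len(table) for table in stack)


def compact(stack):
    """Merge all tables into one and drop refs that ended up deleted."""
    merged = {}
    for table in stack:  # oldest to newest, so newer entries win
        merged.update(table)
    return [{name: oid for name, oid in merged.items()
             if oid is not TOMBSTONE}]


# One table from the clone, one table that only carries deletions.
stack = [
    {"refs/heads/main": "a" * 40, "refs/tags/v1": "b" * 40},
    {"refs/heads/main": TOMBSTONE, "refs/tags/v1": TOMBSTONE},
]
```

In this model no ref resolves anymore, yet the stack still holds a
record for every ref plus a tombstone for each of them until it is
compacted, which mirrors the two tables in the listing above.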

That being said, I found an optimization in how we parse ref updates in
git-update-ref(1): when we see an exact object ID, we can skip the call
to `repo_get_oid()`. This function is quite expensive because it doesn't
only parse object IDs, but revisions in general. This didn't have much
of an impact on "packed-refs", because there are no references in the
first place. But it did have a significant impact on the "reftable"
backend, where we do have deleted references.
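
The shape of that fast path, sketched in Python rather than C: all names
below are hypothetical, `slow_resolve` merely stands in for the general
revision parser, and the real change in git-update-ref(1) may look
different.

```python
import string

HEX_DIGITS = frozenset(string.hexdigits)
SHA1_HEXSZ = 40  # length of a full hex SHA-1 object ID


def is_full_hex_oid(value, hexsz=SHA1_HEXSZ):
    """Return True if `value` is exactly `hexsz` hexadecimal digits."""
    return len(value) == hexsz and all(c in HEX_DIGITS for c in value)


def resolve_oid(value, slow_resolve, hexsz=SHA1_HEXSZ):
    """Resolve `value` to an object ID.

    A full hex object ID is returned as-is, without calling the
    expensive general-purpose resolver. Anything else (branch names,
    abbreviated IDs, "HEAD~2", ...) still goes through `slow_resolve`.
    """
    if is_full_hex_oid(value, hexsz):
        return value.lower()
    return slow_resolve(value)
```

The point is only that the expensive call is skipped whenever the input
already is a complete object ID, which is the common case when feeding
many updates to `git update-ref --stdin`.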

So optimizing this edge case leads to a significant speedup for the
"reftable" backend, but also to a small speedup for the "files" backend:

    Benchmark 1: update-ref (refformat = files, revision = master)
      Time (mean ± σ):     224.7 ms ±   2.9 ms    [User: 79.4 ms, System: 143.5 ms]
      Range (min … max):   220.2 ms … 228.0 ms    5 runs

    Benchmark 2: update-ref (refformat = reftable, revision = master)
      Time (mean ± σ):     16.304 s ±  0.429 s    [User: 16.216 s, System: 0.051 s]
      Range (min … max):   15.865 s … 16.862 s    5 runs

    Benchmark 3: update-ref (refformat = files, revision = pks-reftable-optimization)
      Time (mean ± σ):     181.3 ms ±   2.4 ms    [User: 69.5 ms, System: 110.7 ms]
      Range (min … max):   178.5 ms … 185.0 ms    5 runs

    Benchmark 4: update-ref (refformat = reftable, revision = pks-reftable-optimization)
      Time (mean ± σ):      5.939 s ±  0.060 s    [User: 5.895 s, System: 0.028 s]
      Range (min … max):    5.875 s …  6.026 s    5 runs

    Summary
      update-ref (refformat = files, revision = pks-reftable-optimization) ran
        1.24 ± 0.02 times faster than update-ref (refformat = files, revision = master)
       32.76 ± 0.55 times faster than update-ref (refformat = reftable, revision = pks-reftable-optimization)
       89.93 ± 2.65 times faster than update-ref (refformat = reftable, revision = master)

I will continue digging a bit to see whether there is more to find in
this context and will send a patch to the mailing list later today or
tomorrow.

Patrick
