Git development
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Patrick Steinhardt <ps@pks.im>
Cc: Michael Montalbo <mmontalbo@gmail.com>,
	git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 3/3] t5551: pack refs after creating many tags
Date: Tue, 30 Jun 2026 19:47:02 -0400	[thread overview]
Message-ID: <20260630234702.GA3759976@coredump.intra.peff.net> (raw)
In-Reply-To: <akOG0oMu2KTqqyW7@pks.im>

On Tue, Jun 30, 2026 at 11:05:22AM +0200, Patrick Steinhardt wrote:

> > We make 2000 fstat, which strace claims takes 85% of the time. I suspect
> > this is over-emphasized because strace inherently makes syscalls slow,
> > but running with perf also highlights it as a non-trivial cost.
> 
> Yeah, this rings a bell. If I remember correctly, this is because we
> call `refs_resolve_ref_unsafe()` to verify whether the target already
> exists. And as that function is generic, it wasn't easy to optimize it
> by reusing the already-loaded reftable stack.

I think it's similar to that. I dug down and got some actual performance
numbers below for eliminating the fstat() calls, with the conclusion
that it's probably not worth spending too much time digging into it. But
details below for completeness.

I was wrong to say fstat() before, it's a regular stat(). The
interesting backtrace is:

  #0  __GI___stat64 (file=0x555555ab0bf0 "/home/peff/tmp/.git/reftable/tables.list", buf=0x7fffffffd630)
      at ../sysdeps/unix/sysv/linux/stat64.c:29
  #1  0x0000555555832ad6 in stack_uptodate (st=0x555555ab0ae0) at reftable/stack.c:575
  #2  0x0000555555832c5c in reftable_stack_reload (st=0x555555ab0ae0) at reftable/stack.c:625
  #3  0x000055555581f690 in backend_for (out=0x7fffffffd798, store=0x555555ab0900,
      refname=0x555555aae178 "refs/tags/foo-1", rewritten_ref=0x7fffffffd780, reload=1) at refs/reftable-backend.c:283
  #4  0x0000555555824957 in reftable_be_reflog_exists (ref_store=0x555555ab0900,
      refname=0x555555aae178 "refs/tags/foo-1") at refs/reftable-backend.c:2266
  #5  0x000055555581393a in refs_reflog_exists (refs=0x555555ab0900, refname=0x555555aae178 "refs/tags/foo-1")
      at refs.c:2995
  #6  0x000055555581f730 in should_write_log (refs=0x555555ab0900, refname=0x555555aae178 "refs/tags/foo-1")
      at refs/reftable-backend.c:301
  #7  0x00005555558224df in write_transaction_table (writer=0x5555569abe30, cb_data=0x5555569ab820)
      at refs/reftable-backend.c:1511
  #8  0x000055555583354d in reftable_addition_add (add=0x5555569ab050,
      write_table=0x5555558220f0 <write_transaction_table>, arg=0x5555569ab820) at reftable/stack.c:902
  #9  0x0000555555822b2b in reftable_be_transaction_finish (ref_store=0x555555ab0900, transaction=0x555555ab0c80,
      err=0x7fffffffdbc0) at refs/reftable-backend.c:1633
  #10 0x0000555555813161 in ref_transaction_commit (transaction=0x555555ab0c80, err=0x7fffffffdbc0) at refs.c:2769
  #11 0x00005555556956c9 in update_refs_stdin (flags=0) at builtin/update-ref.c:789

So we are asking about reflogs for each ref under the "only reflog if a
log already exists" rule. Which means we can easily disable it by
setting core.logallrefupdates to "always", giving us a way to measure
the impact. So we can try:

  git init --ref-format=reftable
  blob=$(echo foo | git hash-object -w --stdin)
  seq -f "create refs/tags/foo-%g $blob" 50000 >input
  cp -a .git/reftable reftable.orig
  hyperfine \
  	-p 'rm -rf .git/reftable; cp -a reftable.orig .git/reftable' \
  	-L config true,always \
  	'git -c core.logallrefupdates={config} update-ref --stdin <input'

which yields:

  Benchmark 1: git -c core.logallrefupdates=true update-ref --stdin <input
    Time (mean ± σ):     128.8 ms ±   1.5 ms    [User: 97.1 ms, System: 31.7 ms]
    Range (min … max):   126.4 ms … 131.3 ms    23 runs
  
  Benchmark 2: git -c core.logallrefupdates=always update-ref --stdin <input
    Time (mean ± σ):     195.2 ms ±   1.7 ms    [User: 182.1 ms, System: 13.0 ms]
    Range (min … max):   191.9 ms … 197.6 ms    15 runs

So we saved all of those stat() calls, but writing the actual reflog
entries adds much more cost!

Sadly there is no "never" option for core.logallrefupdates. I hacked one
in and the result took ~96ms to run. So that tells us the cost of the
stat() checks: around 25% of the runtime.

Which is a big-ish percentage, but a small absolute number. It's 0.64us
per ref. Perhaps not worrying about too much.

The other interesting thing I noticed is that we seem to write() entries
individually with no buffering. Probably we could get an easy speedup
for big cases like this by using stdio in the fd_writer abstraction.
We're again pinching pennies to some degree, but I expect we may be able
to drop the 120ms case down to 60ms or so (just guessing based on the
extra reflog writes adding ~70ms).

There was one other oddity I didn't quite resolve. You may notice the
gross reftable.orig stuff in hyperfine. I originally wrote this as:

    git for-each-ref --format="delete %(refname)" | git update-ref --stdin

but for some reason that causes the subsequent update-ref to loop
infinitely on merged_iter_next_entry(). It does so reliably, but I can't
reproduce it outside of hyperfine. Super weird, and I'm sure I'm missing
something obvious.

-Peff

  reply	other threads:[~2026-06-30 23:47 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-20 15:33 [RFH] Why do osx CI jobs so unreliable? Michael Montalbo
2026-06-21 21:34 ` Jeff King
2026-06-22  4:42   ` Patrick Steinhardt
2026-06-22  9:47     ` Patrick Steinhardt
2026-06-22  9:55       ` Patrick Steinhardt
2026-06-22 10:29         ` Patrick Steinhardt
2026-06-26  3:27           ` Michael Montalbo
2026-06-26  5:16             ` Jeff King
2026-06-26 10:50               ` Patrick Steinhardt
2026-06-26 13:45                 ` Junio C Hamano
2026-06-26 23:26                 ` Michael Montalbo
2026-06-28  7:57                   ` [PATCH 0/3] fixing expensive http test timeouts Jeff King
2026-06-28  8:00                     ` [PATCH 1/3] t/lib-httpd: bump apache timeout Jeff King
2026-07-02  3:24                       ` Michael Montalbo
2026-06-28  8:03                     ` [PATCH 2/3] t5551: put many-tags case into its own repo Jeff King
2026-06-28 21:44                       ` Junio C Hamano
2026-06-29  0:34                         ` Jeff King
2026-06-29 14:42                           ` Junio C Hamano
2026-06-29 20:36                             ` Jeff King
2026-06-28  8:07                     ` [PATCH 3/3] t5551: pack refs after creating many tags Jeff King
2026-06-28 21:25                       ` Junio C Hamano
2026-06-29  5:57                       ` Patrick Steinhardt
2026-06-29 20:35                         ` Jeff King
2026-06-30  9:05                           ` Patrick Steinhardt
2026-06-30 23:47                             ` Jeff King [this message]
2026-06-30 23:58                               ` weird quadratic reftable behavior, was: " Jeff King
2026-07-01  6:17                                 ` Patrick Steinhardt
2026-07-01  8:00                                   ` Jeff King
2026-07-01  9:04                                     ` Kristofer Karlsson
2026-07-01 10:07                                       ` Patrick Steinhardt
2026-07-03 12:09                                         ` Kristofer Karlsson
2026-06-29  7:33                     ` [PATCH 0/3] fixing expensive http test timeouts Patrick Steinhardt
2026-06-29 14:39                       ` Junio C Hamano
2026-06-29 16:09                         ` Patrick Steinhardt
2026-06-29 16:19                           ` Junio C Hamano
2026-06-30  9:05                             ` Patrick Steinhardt
2026-06-30 19:21                               ` Junio C Hamano
2026-07-01  6:56                                 ` Patrick Steinhardt
2026-06-26 23:43                 ` [RFH] Why do osx CI jobs so unreliable? Jeff King
2026-06-22  5:05   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260630234702.GA3759976@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mmontalbo@gmail.com \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox