Git development
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Patrick Steinhardt <ps@pks.im>
Cc: Michael Montalbo <mmontalbo@gmail.com>,
	git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH 3/3] t5551: pack refs after creating many tags
Date: Mon, 29 Jun 2026 16:35:27 -0400	[thread overview]
Message-ID: <20260629203527.GA1895313@coredump.intra.peff.net> (raw)
In-Reply-To: <akIJQbOUbdBbkTef@pks.im>

On Mon, Jun 29, 2026 at 07:57:21AM +0200, Patrick Steinhardt wrote:

> > It would be nice if we had a way to generate all of these refs without
> > writing so many individual files. But even if we taught the ref code to
> > write large cases directly to the packed-refs file, we'd still need to
> > take individual locks. The real solution is a backend like reftable,
> > which shaves ~30% off of the test runtime.
> 
> We kind of already have this with the `REF_TRANSACTION_FLAG_INITIAL`
> flag, but right now it is only used when performing a clone or when
> migrating references. Also, it requires an empty repository that has no
> references yet.
> 
> It raises the question whether we could also extend git-fast-import(1)
> to use it, as it would typically be run on an almost-empty repository.
> It's the "almost" that kills it though, as we already do have at least
> the HEAD reference. So it could be feasible, but it's not as trivial as
> just setting the flag and then we're magically faster.
> 
> And besides, in this particular test here we run git-fast-import(1)
> multiple times in the same repository, so it wouldn't help us.
> 
> We could of course extend all of this so that Git is able to write into
> the packed-refs directly, even with preexisting refs. But I agree with
> your sentiment: it doesn't feel worth it as the reftable backend fixes
> scenarios like this anyway.

Yup. In the past I've pondered exposing this via update-ref, but I think
it's too weird and/or dangerous to do so. Especially because you are
still stuck creating all of the .lock files, so the performance is not
even that much better (though it does save you doing so _twice_ when you
then pack the refs).

So the performance option you really want is "YOLO, just write some
packed-refs without locking". But that is not something I think we want
to expose to users. ;)

We could do it ad-hoc within this test like so:

diff --git a/t/t5551-http-fetch-smart.sh b/t/t5551-http-fetch-smart.sh
index dcff0bc7d4..276c7ac002 100755
--- a/t/t5551-http-fetch-smart.sh
+++ b/t/t5551-http-fetch-smart.sh
@@ -389,11 +389,15 @@ create_tags () {
 		echo "from :$1"
 	done | git fast-import --export-marks=marks &&
 
+	# should be mostly a noop, but makes sure we have the right header
+	git pack-refs &&
+
 	# now assign tags to all the dangling commits we created above
+	# It is OK to write directly to the packed-refs file because we know
+	# that our entries are sorted by refname, and that they all
+	# come after what we wrote earlier.
 	tag=$(perl -e "print \"bla\" x 30") &&
-	sed -e "s|^:\([^ ]*\) \(.*\)$|create refs/tags/$tag-\1 \2|" <marks >input &&
-	git update-ref --stdin <input &&
-	rm input
+	sed -e "s|^:\([^ ]*\) \(.*\)$|\2 refs/tags/$tag-\1|" <marks >>packed-refs
 }
 
 test_expect_success 'create 2,000 tags in the repo' '

That gives us the same ~30% speedup that using reftables does, but it
still is quite gross and fragile. And it is not even strictly correct,
because we don't zero-pad the numbered tags (so our file is subtly out
of order).  Plus it would need to be conditional on the ref backend
being used. Yuck.


There's one other thing you might find interesting. While poking at the
timings here the other day, I noticed that reftable is very eager to
stat the tables.list file. Try this:

  git init --ref-format=reftable
  blob=$(echo foo | git hash-object -w --stdin)
  seq -f "create refs/tags/foo-%g $blob" 2000 |
  strace -c git update-ref --stdin

We make 2000 fstat, which strace claims takes 85% of the time. I suspect
this is over-emphasized because strace inherently makes syscalls slow,
but running with perf also highlights it as a non-trivial cost.

It has been a long time since I've thought about reftable internals, but
it feels like we ought to be able to take the lock and then trust that
the stack has not been manipulated.

It may not be worth digging into too much, though. I can make 50,000
refs in 150ms on my system, which is probably good enough (especially
compared to the files backend).

-Peff

  reply	other threads:[~2026-06-29 20:35 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-20 15:33 [RFH] Why do osx CI jobs so unreliable? Michael Montalbo
2026-06-21 21:34 ` Jeff King
2026-06-22  4:42   ` Patrick Steinhardt
2026-06-22  9:47     ` Patrick Steinhardt
2026-06-22  9:55       ` Patrick Steinhardt
2026-06-22 10:29         ` Patrick Steinhardt
2026-06-26  3:27           ` Michael Montalbo
2026-06-26  5:16             ` Jeff King
2026-06-26 10:50               ` Patrick Steinhardt
2026-06-26 13:45                 ` Junio C Hamano
2026-06-26 23:26                 ` Michael Montalbo
2026-06-28  7:57                   ` [PATCH 0/3] fixing expensive http test timeouts Jeff King
2026-06-28  8:00                     ` [PATCH 1/3] t/lib-httpd: bump apache timeout Jeff King
2026-06-28  8:03                     ` [PATCH 2/3] t5551: put many-tags case into its own repo Jeff King
2026-06-28 21:44                       ` Junio C Hamano
2026-06-29  0:34                         ` Jeff King
2026-06-29 14:42                           ` Junio C Hamano
2026-06-29 20:36                             ` Jeff King
2026-06-28  8:07                     ` [PATCH 3/3] t5551: pack refs after creating many tags Jeff King
2026-06-28 21:25                       ` Junio C Hamano
2026-06-29  5:57                       ` Patrick Steinhardt
2026-06-29 20:35                         ` Jeff King [this message]
2026-06-29  7:33                     ` [PATCH 0/3] fixing expensive http test timeouts Patrick Steinhardt
2026-06-29 14:39                       ` Junio C Hamano
2026-06-29 16:09                         ` Patrick Steinhardt
2026-06-29 16:19                           ` Junio C Hamano
2026-06-26 23:43                 ` [RFH] Why do osx CI jobs so unreliable? Jeff King
2026-06-22  5:05   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260629203527.GA1895313@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mmontalbo@gmail.com \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox