From: "brian m. carlson" <sandals@crustytoothpaste.net>
To: git@vger.kernel.org
Cc: Patrick Steinhardt <ps@pks.im>, Karthik Nayak <karthik.188@gmail.com>
Subject: Poor performance using reftable with many refs
Date: Thu, 13 Feb 2025 00:01:59 +0000
Message-ID: <Z602dzQggtDdcgCX@tapette.crustytoothpaste.net>

I've been doing some testing of reftable at $DAYJOB and I found an
interesting performance problem when creating many refs.

I've attached a script that takes 50,000 recent commits, writes a file of
`create` commands suitable for `git update-ref --stdin`, deletes all of
the existing refs, and then feeds that file to `git update-ref --stdin`
to create the 50,000 refs. Ref creation is timed with GNU time
(`/usr/bin/time`). (This is partially extracted from a larger script, so
please excuse some untidiness.)
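
Each line of that file is a `create <refname> <new-oid>` command. As a
rough illustration of the input the script builds, here is an awk
equivalent of its perl one-liner. The function name `refs_from_oids` is
hypothetical and not part of the attached script; it reads one object ID
per line and rotates the refname across the same four namespaces.

```shell
# Hypothetical helper: read one object ID per line on stdin and print one
# `git update-ref --stdin` `create` command per line. The refname namespace
# rotates with the 1-based line number, as in the script's perl one-liner.
refs_from_oids () {
	awk '{
		choice = NR % 4
		if (choice == 0)
			ref = "refs/heads/ref-" NR
		else if (choice == 1)
			ref = "refs/remotes/bk2204/ref-" NR
		else if (choice == 2)
			ref = "refs/remotes/origin/ref-" NR
		else
			ref = "refs/tags/tag-" NR
		print "create " ref " " $1
	}'
}
```

In the script's terms, `git rev-list "$TAG" | head -n 50000 |
refs_from_oids | sort >all-refs` should produce an equivalent `all-refs`
file.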

With the files backend, the output is as follows:

  1.75user 3.73system 0:05.50elapsed 99%CPU (0avgtext+0avgdata 166344maxresident)k
  0inputs+442880outputs (0major+27962minor)pagefaults 0swaps

With the reftable backend, this is the output:

  56.91user 0.52system 0:57.44elapsed 99%CPU (0avgtext+0avgdata 160416maxresident)k
  0inputs+6784outputs (0major+26151minor)pagefaults 0swaps

Both measurements were taken with Git built from `next`, so they should
include all relevant patches that I'm aware of. I've tested on two X1
Carbons, one running Ubuntu 24.04 and one running Debian unstable; both
are reasonably beefy machines with modern Linux distributions.

With the reftable backend, the operation takes about ten times as long in
wall-clock time (57.44s versus 5.50s) and over 30 times as much user CPU
time (56.91s versus 1.75s), which is concerning. While this is a
synthetic measurement, I had intended to use it to determine the
performance characteristics of the reference-update portion of pushing a
large repository for the first time.
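
To quantify the gap from the raw numbers, the following sketch compares
two GNU time reports in the default format shown above, such as the
`re-creation` files the script writes under `testcase/tracedir/`. The
function name `compare_time_reports` is hypothetical, and it assumes the
first line of each report holds the `user`/`system`/`elapsed` fields
(GNU time prepends an extra status line when the command exits non-zero).

```shell
# Hypothetical helper: compare two GNU time(1) reports in the default
# output format (as written by `/usr/bin/time -o FILE`) and print the
# ratio of the second report to the first for user CPU and elapsed time.
# Assumes line 1 of each file is "<u>user <s>system <[h:]m:ss.ss>elapsed ...".
compare_time_reports () {
	awk '
	FNR == 1 {
		user = $1
		sub(/user$/, "", user)
		elapsed = $3
		sub(/elapsed$/, "", elapsed)
		# Convert [h:]m:ss.ss into seconds.
		n = split(elapsed, parts, ":")
		secs = 0
		for (i = 1; i <= n; i++)
			secs = secs * 60 + parts[i]
		report++
		u[report] = user
		e[report] = secs
	}
	END {
		printf "user CPU: %.1fx  wall clock: %.1fx\n", u[2] / u[1], e[2] / e[1]
	}' "$1" "$2"
}
```

Running `compare_time_reports testcase/tracedir/files/re-creation
testcase/tracedir/reftable/re-creation` on the two reports quoted above
prints `user CPU: 32.5x  wall clock: 10.4x`.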

I admit I haven't investigated any further what's going wrong here, but
the slowdown is very noticeable, so it may be easy to profile.

One note: the script will run faster, and is more convenient for
reproducing the problem, if you change REPO_SRC to point at a local
clone of the Linux repository.
----
#!/bin/sh -e
# This script will reproduce a performance problem with many (50000) refs using
# the current version of reftable in next. The directory `testcase` under the
# current directory will be removed and replaced.
#
# Once the script is finished, you can do `cat testcase/tracedir/*/re-creation`
# to see the performance characteristics of the files backend (first) and the
# reftable backend (second).

# Your friendly neighbourhood Linux repository. This may be any valid remote,
# including an HTTPS or SSH URL.
REPO_SRC="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git"
TAG="v6.13"

export GIT_CONFIG_GLOBAL=/dev/null

# Run COMMAND under GNU time, writing its resource usage to $TRACEDIR/OUTPUT.
# Usage: timed_op OUTPUT MESSAGE COMMAND [ARGS...]
timed_op () {
	local output="$1"
	local message="$2"
	shift 2
	printf '%s...' "$message" >&2
	/usr/bin/time -o "$TRACEDIR/$output" "$@"
	printf 'done.\n' >&2
}

# Delete every existing ref in a single `update-ref --stdin` transaction,
# recording the timing in $TRACEDIR/OUTPUT.
delete_refs () {
	local output="$1"
	(
		echo "start"
		git for-each-ref --format="%(refname)" | sed -e 's/^/delete /'
		echo "prepare"
		echo "commit"
	) | timed_op "$output" "Deleting all references" git update-ref --stdin
}

# By default, create synthetic refs; --real-refs re-creates the clone's own refs.
fake_refs=true
while [ $# -gt 0 ]
do
	case "$1" in
	--real-refs)
		fake_refs=false
		shift
		;;
	*)
		break
		;;
	esac
done

rm -fr testcase
mkdir testcase
cd testcase
git clone --bare "$REPO_SRC" repo
mkdir tracedir

for backend in files reftable
do
	git clone --bare repo "$backend"
	(
		set -e
		cd "$backend"
		TRACEDIR="$(realpath "../tracedir/$backend")"
		mkdir -p "$TRACEDIR"

		if [ "$backend" = reftable ]
		then
			timed_op "migration" "Migrating to reftable" git refs migrate --ref-format=reftable
		fi

		if $fake_refs
		then
			# Spread 50,000 commits across four ref namespaces.
			git rev-list "$TAG" | head -n 50000 | perl -pe '
				$count++;
				$choice = $count % 4;
				if ($choice == 0) {
					s!^(.*)$!create refs/heads/ref-$count $1!;
				} elsif ($choice == 1) {
					s!^(.*)$!create refs/remotes/bk2204/ref-$count $1!;
				} elsif ($choice == 2) {
					s!^(.*)$!create refs/remotes/origin/ref-$count $1!;
				} elsif ($choice == 3) {
					s!^(.*)$!create refs/tags/tag-$count $1!;
				}
			' | sort >all-refs
		else
			git for-each-ref --format="%(refname) %(objectname)" | sed -e 's/^/create /' >all-refs
		fi

		delete_refs "deletion"
		timed_op "re-creation" "Re-creating refs" git update-ref --stdin <all-refs
	)
done
----
--
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA