public inbox for git@vger.kernel.org
* Poor performance using reftable with many refs
@ 2025-02-13  0:01 brian m. carlson
  2025-02-13  6:11 ` Patrick Steinhardt
From: brian m. carlson @ 2025-02-13  0:01 UTC
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak

I've been doing some testing of reftable at $DAYJOB and I found an
interesting performance problem when creating many refs.

I've attached a script which takes 50,000 recent commits, creates a file
suitable for `git update-ref --stdin`, deletes all of the existing refs,
and then uses that file to create the 50,000 refs.  The ref creation is
timed using Linux's `/usr/bin/time`.  (This is partially extracted from
a larger script, so please accept my apologies for some untidiness.)
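
For illustration, here is a minimal sketch of how such an input file can
be generated.  The helper name `make_ref_commands` is hypothetical, and
awk stands in for the perl one-liner used in the attached script; it
rotates through the same four ref namespaces and emits one `create`
command per object ID read from stdin.

```shell
# Hypothetical helper (not part of the attached script): read one object
# ID per line on stdin and emit `git update-ref --stdin` commands.
# The line number picks the namespace, mirroring the script's
# "$count % 4" rotation.
make_ref_commands () {
  awk '{
    n = NR % 4
    if (n == 0)      printf "create refs/heads/ref-%d %s\n", NR, $1
    else if (n == 1) printf "create refs/remotes/bk2204/ref-%d %s\n", NR, $1
    else if (n == 2) printf "create refs/remotes/origin/ref-%d %s\n", NR, $1
    else             printf "create refs/tags/tag-%d %s\n", NR, $1
  }' | sort
}
```

In a repository, it could be used along these lines:

  git rev-list v6.13 | head -n 50000 | make_ref_commands >all-refs
  git update-ref --stdin <all-refs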

With the files backend, the output is as below:

  1.75user 3.73system 0:05.50elapsed 99%CPU (0avgtext+0avgdata 166344maxresident)k
  0inputs+442880outputs (0major+27962minor)pagefaults 0swaps

With the reftable backend, this is the output:

  56.91user 0.52system 0:57.44elapsed 99%CPU (0avgtext+0avgdata 160416maxresident)k
  0inputs+6784outputs (0major+26151minor)pagefaults 0swaps

Both measurements are on next, so they should have all relevant patches
that I'm aware of.  I've tested on two X1 Carbons, one with Ubuntu 24.04
and one with Debian unstable, so they're both reasonably beefy machines
with modern Linux OSes.

The reftable backend takes about 10 times as long in wall-clock time
(57.44s versus 5.50s) and over 30 times the user CPU time (56.91s versus
1.75s), which is concerning.  While this is a synthetic measurement, I had
intended to use it to determine the performance characteristics of
the reference update portion when pushing a large repository for the
first time.
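
The size of the slowdown depends on which figure from the two
`/usr/bin/time` reports is compared.  A small sketch of the arithmetic,
with the numbers copied from the outputs above (the `ratio` helper is
only an illustration):

```shell
# Figures copied from the two /usr/bin/time reports above.
files_user=1.75
files_elapsed=5.50
reftable_user=56.91
reftable_elapsed=57.44

# Hypothetical helper: print a / b rounded to one decimal place.
ratio () {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

echo "elapsed ratio:  $(ratio "$reftable_elapsed" "$files_elapsed")"
echo "user CPU ratio: $(ratio "$reftable_user" "$files_user")"
```

This gives roughly 10.4 for elapsed time and 32.5 for user CPU time.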

I admit I haven't done any other particular investigation as to what's
going wrong here, but the behaviour is very noticeable so it may be easy
to profile.

One note: the script runs much faster, and is easier to use as a
reproducer, if you change REPO_SRC to point at a local clone of the
Linux repo.
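
One possible way to do that without editing the script each time is to
let the environment override the default.  This is a sketch of a
modification, not something the attached script does as written:

```shell
# Use REPO_SRC from the environment if set; otherwise fall back to the
# upstream URL that the attached script hardcodes.
REPO_SRC="${REPO_SRC:-https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git}"
```

With that line in place of the plain assignment, a local clone can be
selected by setting REPO_SRC to its path when invoking the script.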

----
#!/bin/sh -e
# This script will reproduce a performance problem with many (50000) refs using
# the current version of reftable in next.  The directory `testcase` under the
# current directory will be removed and replaced.
#
# Once the script is finished, you can do `cat testcase/tracedir/*/re-creation`
# to see the performance characteristics of the files backend (first) and the
# reftable backend (second).

# Your friendly neighbourhood Linux repository.  This may be any valid remote,
# including an HTTPS or SSH URL.
REPO_SRC="https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git"
TAG="v6.13"

export GIT_CONFIG_GLOBAL=/dev/null

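# Run a command under /usr/bin/time, writing the timing report to
# $TRACEDIR/<output> and printing a progress message to stderr.
timed_op () {
  local output="$1"
  local message="$2"
  shift 2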
  printf '%s...' "$message" >&2
  /usr/bin/time -o "$TRACEDIR/$output" "$@"
  printf 'done.\n' >&2
}

delete_refs () {
  local output="$1"
  (
    echo "start"
    git for-each-ref --format="%(refname)" | sed -e 's/^/delete /'
    echo "prepare"
    echo "commit"
  ) | timed_op "$output" "Deleting all references" git update-ref --stdin
}

fake_refs=true
while [ $# -gt 0 ]
do
  case "$1" in
    --real-refs)
      fake_refs=false
      shift
      ;;
    *)
      break
      ;;
  esac
done

rm -fr testcase
mkdir testcase
cd testcase
git clone --bare "$REPO_SRC" repo

mkdir tracedir

for backend in files reftable
do
  git clone --bare repo $backend
  (
    set -e
    cd $backend
    TRACEDIR="$(realpath "../tracedir/$backend")"
    mkdir -p "$TRACEDIR"

    if [ "$backend" = reftable ]
    then
      timed_op "migration" "Migrating to reftable" git refs migrate --ref-format=reftable
    fi

    if $fake_refs
    then
      git rev-list "$TAG" | head -n 50000 | perl -pe '
        $count++;
        $choice = $count % 4;
        if ($choice == 0) {
          s!^(.*)$!create refs/heads/ref-$count $1!;
        } elsif ($choice == 1) {
          s!^(.*)$!create refs/remotes/bk2204/ref-$count $1!;
        } elsif ($choice == 2) {
          s!^(.*)$!create refs/remotes/origin/ref-$count $1!;
        } elsif ($choice == 3) {
          s!^(.*)$!create refs/tags/tag-$count $1!;
        }
      ' | sort >all-refs
    else
      git for-each-ref --format="%(refname) %(objectname)" | sed -e 's/^/create /' >all-refs
    fi
    delete_refs "deletion"
    timed_op "re-creation" "Re-creating refs" git update-ref --stdin <all-refs
  )
done
----
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA

Thread overview: 12+ messages
2025-02-13  0:01 Poor performance using reftable with many refs brian m. carlson
2025-02-13  6:11 ` Patrick Steinhardt
2025-02-13  7:13   ` Patrick Steinhardt
2025-02-13  8:22     ` Jeff King
2025-02-13 11:20       ` Patrick Steinhardt
2025-02-13 14:31         ` Patrick Steinhardt
2025-02-13 19:53           ` Jeff King
2025-02-13 19:42         ` Jeff King
2025-02-13 20:12           ` Junio C Hamano
2025-02-13 22:17       ` brian m. carlson
2025-02-13  9:27     ` Christian Couder
2025-02-13 13:21       ` Patrick Steinhardt
