From: git@jeffhostetler.com
To: git@vger.kernel.org
Cc: gitster@pobox.com, peff@peff.net,
 Jeff Hostetler <jeffhost@microsoft.com>
Subject: [PATCH v8 2/3] p0006-read-tree-checkout: perf test to time read-tree
Date: Mon, 10 Apr 2017 21:14:02 +0000
Message-ID: <20170410211403.25126-3-git@jeffhostetler.com>
In-Reply-To: <20170410211403.25126-1-git@jeffhostetler.com>
From: Jeff Hostetler <jeffhost@microsoft.com>
Add t/perf/repos/many-files.sh to generate large but
artificial repositories.
Add t/perf/p0006-read-tree-checkout.sh to measure the
performance of various read-tree, checkout, and update-index
operations.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
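Note for readers sizing a test repository: the ballast count quoted in
many-files.sh follows from the recursion in fill_index, which writes
"files" entries into every directory of a tree with "width" children per
level and "depth" levels below the root. The sketch below is only an
illustration and is not part of the patch; count_ballast is a
hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (not in the patch): predict how many ballast
# entries many-files.sh creates for a given (depth, width, files).
count_ballast () {
	cb_depth=$1
	cb_width=$2
	cb_files=$3
	cb_dirs=0	# total number of directories, root included
	cb_level=1	# number of directories at the current level
	cb_i=0
	while test "$cb_i" -le "$cb_depth"
	do
		cb_dirs=$((cb_dirs + cb_level))
		cb_level=$((cb_level * cb_width))
		cb_i=$((cb_i + 1))
	done
	echo $((cb_dirs * cb_files))
}

count_ballast 5 10 9	# prints 999999
count_ballast 4 10 9	# prints 99999
```

Both values agree with the comments next to the defaults in
many-files.sh. A generated repository can then presumably be handed to
the perf harness through GIT_PERF_REPO, since the test calls
test_perf_default_repo; see t/perf/README for the details.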
t/perf/p0006-read-tree-checkout.sh | 90 ++++++++++++++++++++++++++++
t/perf/repos/.gitignore | 1 +
t/perf/repos/many-files.sh | 118 +++++++++++++++++++++++++++++++++++++
3 files changed, 209 insertions(+)
create mode 100755 t/perf/p0006-read-tree-checkout.sh
create mode 100644 t/perf/repos/.gitignore
create mode 100755 t/perf/repos/many-files.sh
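
Both scripts feed "git update-index --index-info", whose simplest record
form is "mode SP sha1 TAB path"; a mode of 0 removes the path from the
index. The separator before the path has to be a real TAB character.
The sketch below shows one way to build such records with printf so the
TAB is explicit. It is an illustration only; index_info_lines is a
hypothetical helper name and does not appear in the patch.

```shell
#!/bin/sh
# Object name of the empty blob, as used by many-files.sh.
EMPTY_BLOB=e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Hypothetical helper: read paths on stdin and emit one --index-info
# record per path, using the mode given as the first argument.
index_info_lines () {
	iil_mode=$1
	while IFS= read -r iil_path
	do
		# "\t" in the printf format yields the required TAB.
		printf '%s %s\t%s\n' "$iil_mode" "$EMPTY_BLOB" "$iil_path"
	done
}

# Records that would add, then remove, a single path.
printf 'zzzz/r_000000\n' | index_info_lines 100644
printf 'zzzz/r_000000\n' | index_info_lines 000000
```

Inside a repository, the output of either command could be piped to
"git update-index --index-info".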
diff --git a/t/perf/p0006-read-tree-checkout.sh b/t/perf/p0006-read-tree-checkout.sh
new file mode 100755
index 0000000..69425ae
--- /dev/null
+++ b/t/perf/p0006-read-tree-checkout.sh
@@ -0,0 +1,90 @@
+#!/bin/sh
+##
+## This test measures the performance of various read-tree
+## and checkout operations. It is primarily interested in
+## the algorithmic costs of index operations and recursive
+## tree traversal -- and NOT disk I/O on thousands of files.
+## Therefore, it uses sparse-checkout to avoid populating
+## the ballast files.
+##
+## It expects the test repo to have certain characteristics.
+## Branches:
+## () master := an arbitrary commit.
+## () ballast := an arbitrary commit with a large number
+## of changes relative to "master".
+## () ballast-alias := a branch pointing to the same commit
+## as "ballast".
+## () ballast-1 := a commit with a 1 file difference from
+## "ballast".
+##
+## Ballast files in the repository should not appear in
+## the sparse-checkout (if present).
+##
+## See "t/perf/repos/many-files.sh" to generate some
+## synthetic data.
+##
+
+test_description="Tests performance of read-tree"
+
+. ./perf-lib.sh
+
+test_perf_default_repo
+
+test_expect_success 'setup' '
+ echo "sparse/*" >>.git/info/sparse-checkout &&
+ git config --local core.sparsecheckout 1 &&
+ git checkout ballast
+'
+
+nr_files=$(git ls-files | wc -l)
+
+test_perf "read-tree master ballast ($nr_files)" '
+ git read-tree -m master ballast -n
+'
+
+## Alternate between a commit with and without the ballast.
+test_perf "switch between master ballast ($nr_files)" '
+ git checkout master &&
+ git checkout ballast
+'
+
+## Alternate between 2 commits with the ballast and 1 change.
+test_perf "switch between ballast ballast-1 ($nr_files)" '
+ git checkout ballast-1 &&
+ git checkout ballast
+'
+
+## Alternate between 2 aliases for the same commit.
+test_perf "switch between aliases ballast ballast-alias ($nr_files)" '
+ git checkout ballast-alias &&
+ git checkout ballast
+'
+
+export nr_random=200000
+
+## Add random items in sorted order. These will effectively
+## get appended because "zzzz/" sorts after anything already
+## present in index. Then delete them in reverse order, so
+## that the last is removed each time.
+test_perf "add forward sorted items ($nr_files) ($nr_random)" '
+ awk "BEGIN { for (f = 0; f <= $nr_random; f++) printf \"r_%06d\n\", f }" |
+ sed "s|^|100644 $EMPTY_BLOB	zzzz/|" |
+ git update-index --index-info &&
+ awk "BEGIN { for (f = $nr_random; f >= 0; f--) printf \"r_%06d\n\", f }" |
+ sed "s|^|000000 $EMPTY_BLOB	zzzz/|" |
+ git update-index --index-info
+'
+
+## Add random items in reverse order. These will be inserted
+## NEAR the end of the index, but not at the end. Then delete
+## them in the opposite order, so that we remove from the middle.
+test_perf "add reverse sorted items ($nr_files) ($nr_random)" '
+ awk "BEGIN { for (f = $nr_random; f >= 0; f--) printf \"r_%06d\n\", f }" |
+ sed "s|^|100644 $EMPTY_BLOB	zzzz/|" |
+ git update-index --index-info &&
+ awk "BEGIN { for (f = 0; f <= $nr_random; f++) printf \"r_%06d\n\", f }" |
+ sed "s|^|000000 $EMPTY_BLOB	zzzz/|" |
+ git update-index --index-info
+'
+
+test_done
diff --git a/t/perf/repos/.gitignore b/t/perf/repos/.gitignore
new file mode 100644
index 0000000..72e3dc3
--- /dev/null
+++ b/t/perf/repos/.gitignore
@@ -0,0 +1 @@
+gen-*/
diff --git a/t/perf/repos/many-files.sh b/t/perf/repos/many-files.sh
new file mode 100755
index 0000000..a4c44b3
--- /dev/null
+++ b/t/perf/repos/many-files.sh
@@ -0,0 +1,118 @@
+#!/bin/sh
+## Generate test data repository "many-files.git" using the given parameters.
+## Usage: [-r repo] [-d depth] [-w width] [-f files]
+##
+## -r repo: path to the new repo to be generated
+## -d depth: the depth of sub-directories
+## -w width: the number of sub-directories at each level
+## -f files: the number of files created in each directory
+##
+## Note that all files will have the same SHA-1 and each
+## directory at a level will have the same SHA-1, so we
+## will potentially have a large index, but not a large
+## ODB.
+##
+## Ballast files will be created under "ballast/". Sparse-checkout
+## will be enabled so that the ballast files are not populated in
+## the working tree by default.
+
+EMPTY_BLOB=e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
+
+set -e
+
+## (5, 10, 9) will create 999,999 ballast files.
+## (4, 10, 9) will create 99,999 ballast files.
+depth=5
+width=10
+files=9
+
+while test "$#" -ne 0
+do
+ case "$1" in
+ -r)
+ shift;
+ test "$#" -ne 0 || { echo 'error: -r requires an argument' >&2; exit 1; }
+ repo=$1;
+ shift ;;
+ -d)
+ shift;
+ test "$#" -ne 0 || { echo 'error: -d requires an argument' >&2; exit 1; }
+ depth=$1;
+ shift ;;
+ -w)
+ shift;
+ test "$#" -ne 0 || { echo 'error: -w requires an argument' >&2; exit 1; }
+ width=$1;
+ shift ;;
+ -f)
+ shift;
+ test "$#" -ne 0 || { echo 'error: -f requires an argument' >&2; exit 1; }
+ files=$1;
+ shift ;;
+ *)
+ echo "error: unknown option '$1'" >&2; exit 1 ;;
+ esac
+done
+
+## Inflate the index with thousands of empty files.
+## usage: dir depth width files
+fill_index() {
+ awk -v arg_dir="$1" -v arg_depth="$2" -v arg_width="$3" -v arg_files="$4" '
+ function make_paths(dir, depth, width, files, f, w) {
+ for (f = 1; f <= files; f++) {
+ print dir "/file" f
+ }
+ if (depth > 0) {
+ for (w = 1; w <= width; w++) {
+ make_paths(dir "/dir" w, depth - 1, width, files)
+ }
+ }
+ }
+ END { make_paths(arg_dir, arg_depth, arg_width, arg_files) }
+ ' </dev/null |
+ sed "s/^/100644 $EMPTY_BLOB	/" |
+ git update-index --index-info
+ return 0
+}
+
+[ -z "$repo" ] && repo=gen-many-files-$depth.$width.$files.git
+
+mkdir "$repo"
+cd "$repo"
+git init .
+
+## Create initial commit just to define branch.
+mkdir sparse
+touch sparse/file.txt
+echo "$depth $width $files" >sparse/repo.params
+git add sparse
+git commit -q -m params
+
+## Turn on sparse-checkout so that we don't have to populate
+## the ballast when switching branches. Use reset --hard to
+## quickly checkout the new HEAD with minimum actual files.
+echo 'sparse/*' >>.git/info/sparse-checkout
+git config --local core.sparsecheckout 1
+git reset --hard
+
+## Inflate the index with thousands of empty files and commit.
+git checkout -b ballast
+fill_index "ballast" $depth $width $files
+git commit -q -m "ballast"
+
+## Create an alias for that commit.
+git branch "ballast-alias"
+
+nr_files=$(git ls-files | wc -l)
+
+## Modify 1 file and commit.
+git checkout -b "ballast-1"
+echo x >sparse/file.txt
+git add sparse
+git commit -q -m "ballast plus 1"
+
+## Checkout master to put repo in canonical state.
+git checkout master
+
+echo "Repository \"$repo\" ($depth, $width, $files) created. Ballast $nr_files."
+exit 0
--
2.9.3