From: Jeff King <peff@peff.net>
To: Martin Fick <mfick@codeaurora.org>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: [PATCH] p5302: create the repo in each index-pack test
Date: Mon, 22 Apr 2019 17:19:52 -0400
Message-ID: <20190422211952.GA4728@sigill.intra.peff.net>
In-Reply-To: <20190422205653.GA30286@sigill.intra.peff.net>
On Mon, Apr 22, 2019 at 04:56:53PM -0400, Jeff King wrote:
> > I am running some index packs to test the theory, I can tell you already that
> > the 56 thread versions was much slower, it took 397m25.622s. I am running a
> > few other tests also, but it will take a while to get an answer. Since things
> > take hours to test, I made a repo with a single branch (and the tags for that
> > branch) from this bigger repo using a git init/git fetch. The single branch
> > repo takes about 12s to clone, but it takes around 14s with 3 threads to run
> > index-pack, any ideas why it is slower than a clone?
>
> Are you running it in the same repo, or in another newly-created repo?
> Or alternatively, in a new repo but repeatedly running index-pack? After
> the first run, that repo will have all of the objects. And so for each
> object it sees, index-pack will say "woah, we already had that one;
> let's double check that they're byte for byte identical" which carries
> extra overhead (and probably makes the lock contention way worse, too,
> because accessing existing objects just has one big coarse lock).
>
> So definitely do something like:
>
>   for threads in 1 2 3 4 5 12 56; do
>     rm -rf repo.git
>     git init --bare repo.git
>     GIT_FORCE_THREADS=$threads \
>     git -C repo.git index-pack -v --stdin </path/to/pack
>   done
>
> to test.
This is roughly what p5302 is doing, though it does not go as high as
56 (though it seems like there is probably not much point in doing so).
However, I did notice this slight bug in it. After this fix, here are my
numbers from indexing git.git:
Test                                           HEAD
----------------------------------------------------------------
5302.2: index-pack 0 threads                   22.72(22.55+0.16)
5302.3: index-pack 1 thread                    23.26(23.02+0.24)
5302.4: index-pack 2 threads                   13.19(24.06+0.23)
5302.5: index-pack 4 threads                   7.96(24.65+0.25)
5302.6: index-pack 8 threads                   7.94(45.06+0.38)
5302.7: index-pack default number of threads   9.37(23.82+0.18)
So it looks like "4" is slightly better than the default of "3" for me.
I'm running it on linux.git now, but it will take quite a while to come
up with a result.
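As a rough way to read the wall-clock column above, the sketch below divides the 1-thread time by each multi-threaded time. The `speedups` function name is only an illustration, and the numbers are copied by hand from the table, so treat it as arithmetic on one machine's results rather than a benchmark tool.

```shell
#!/bin/sh
# speedups is a hypothetical helper: it reads "<label> <seconds>" pairs on
# stdin and prints the baseline time (first argument) divided by each time.
speedups () {
	LC_ALL=C awk -v base="$1" '{ printf "%s threads: %.2fx\n", $1, base / $2 }'
}

# Wall-clock seconds from the table; 23.26 is the 1-thread run.
speedups 23.26 <<\EOF
2 13.19
4 7.96
8 7.94
default 9.37
EOF
# prints:
#   2 threads: 1.76x
#   4 threads: 2.92x
#   8 threads: 2.93x
#   default threads: 2.48x
```

On these numbers, going from 4 to 8 threads barely changes the wall-clock time, while the user CPU time nearly doubles (24.65s vs 45.06s).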
-- >8 --
Subject: [PATCH] p5302: create the repo in each index-pack test
The p5302 script runs "index-pack --stdin" in each timing test. It does
two things to try to get good timings:
  1. we do the repo creation in a separate (non-timed) setup test, so
     that our timing is purely the index-pack run

  2. we use a separate repo for each test; this is important because the
     presence of existing objects in the repo influences the result
     (because we'll end up doing collision checks against them)
But this forgets one thing: we generally run each timed test multiple
times to reduce the impact of noise. Which means that repeats of each
test after the first will be subject to the collision slowdown from
point 2, and we'll generally just end up taking the first time anyway.
Instead, let's create the repo in the test (effectively undoing point
1). That does add a constant amount of extra work to each iteration, but
it's quite small compared to the actual effects we're interested in
measuring.
Signed-off-by: Jeff King <peff@peff.net>
---
The very first 0-thread one will run faster because it has less to "rm
-rf", but I think we can ignore that.
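For reproducing a single timing by hand outside the perf framework, the same rm/init/index-pack sequence could be wrapped in a small function. This is only a sketch: `fresh_index_pack` is a hypothetical name, not something in p5302 or in git, and it assumes it is run from a scratch directory where removing repo.git is safe.

```shell
#!/bin/sh
# Hypothetical helper (not part of p5302): index a pack into a bare repo
# that is recreated on every call, so a repeated run never finds existing
# objects and never pays for collision checks against them.
#
# Usage: fresh_index_pack <threads> <packfile>
#   e.g. time fresh_index_pack 4 /path/to/pack
fresh_index_pack () {
	fip_threads=$1
	fip_pack=$2
	rm -rf repo.git &&
	git init --bare repo.git >/dev/null &&
	GIT_DIR=repo.git git index-pack --threads="$fip_threads" --stdin <"$fip_pack"
}
```

Because the repo setup now sits inside the timed command, each measurement includes the small constant cost of "rm -rf" and "git init", the same trade-off the patch makes.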
t/perf/p5302-pack-index.sh | 31 ++++++++++++++++++-------------
1 file changed, 18 insertions(+), 13 deletions(-)
diff --git a/t/perf/p5302-pack-index.sh b/t/perf/p5302-pack-index.sh
index 99bdb16c85..a9b3e112d9 100755
--- a/t/perf/p5302-pack-index.sh
+++ b/t/perf/p5302-pack-index.sh
@@ -13,35 +13,40 @@ test_expect_success 'repack' '
 	export PACK
 '
 
-test_expect_success 'create target repositories' '
-	for repo in t1 t2 t3 t4 t5 t6
-	do
-		git init --bare $repo
-	done
-'
-
 test_perf 'index-pack 0 threads' '
-	GIT_DIR=t1 git index-pack --threads=1 --stdin < $PACK
+	rm -rf repo.git &&
+	git init --bare repo.git &&
+	GIT_DIR=repo.git git index-pack --threads=1 --stdin < $PACK
 '
 
 test_perf 'index-pack 1 thread ' '
-	GIT_DIR=t2 GIT_FORCE_THREADS=1 git index-pack --threads=1 --stdin < $PACK
+	rm -rf repo.git &&
+	git init --bare repo.git &&
+	GIT_DIR=repo.git GIT_FORCE_THREADS=1 git index-pack --threads=1 --stdin < $PACK
 '
 
 test_perf 'index-pack 2 threads' '
-	GIT_DIR=t3 git index-pack --threads=2 --stdin < $PACK
+	rm -rf repo.git &&
+	git init --bare repo.git &&
+	GIT_DIR=repo.git git index-pack --threads=2 --stdin < $PACK
 '
 
 test_perf 'index-pack 4 threads' '
-	GIT_DIR=t4 git index-pack --threads=4 --stdin < $PACK
+	rm -rf repo.git &&
+	git init --bare repo.git &&
+	GIT_DIR=repo.git git index-pack --threads=4 --stdin < $PACK
 '
 
 test_perf 'index-pack 8 threads' '
-	GIT_DIR=t5 git index-pack --threads=8 --stdin < $PACK
+	rm -rf repo.git &&
+	git init --bare repo.git &&
+	GIT_DIR=repo.git git index-pack --threads=8 --stdin < $PACK
 '
 
 test_perf 'index-pack default number of threads' '
-	GIT_DIR=t6 git index-pack --stdin < $PACK
+	rm -rf repo.git &&
+	git init --bare repo.git &&
+	GIT_DIR=repo.git git index-pack --stdin < $PACK
 '
 
 test_done
--
2.21.0.1182.g3590c06d32