From: Thomas Gummerer <t.gummerer@gmail.com>
To: Jeff Hostetler <git@jeffhostetler.com>
Cc: git@vger.kernel.org, gitster@pobox.com, peff@peff.net,
Jeff Hostetler <jeffhost@microsoft.com>
Subject: Re: [PATCH v3 2/2] p0005-status: time status on very large repo
Date: Fri, 7 Apr 2017 00:26:39 +0100
Message-ID: <20170406232634.GB32223@hank>
In-Reply-To: <590f6863-801b-58e9-3700-962168f8315e@jeffhostetler.com>

On 04/06, Jeff Hostetler wrote:
>
>
> On 4/6/2017 6:14 PM, Thomas Gummerer wrote:
> >On 04/06, git@jeffhostetler.com wrote:
> >>From: Jeff Hostetler <jeffhost@microsoft.com>
> >>
> >>Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
> >>---
> >> t/perf/p0005-status.sh | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >> 1 file changed, 61 insertions(+)
> >> create mode 100755 t/perf/p0005-status.sh
> >>
> >>diff --git a/t/perf/p0005-status.sh b/t/perf/p0005-status.sh
> >>new file mode 100755
> >>index 0000000..704cebc
> >>--- /dev/null
> >>+++ b/t/perf/p0005-status.sh
> >>@@ -0,0 +1,61 @@
> >>+#!/bin/sh
> >>+
> >>+test_description="Tests performance of read-tree"
> >>+
> >>+. ./perf-lib.sh
> >>+
> >>+test_perf_default_repo
> >>+test_checkout_worktree
> >>+
> >>+## usage: dir depth width files
> >>+make_paths () {
> >>+	for f in $(seq $4)
> >>+	do
> >>+		echo $1/file$f
> >>+	done;
> >>+	if test $2 -gt 0;
> >>+	then
> >>+		for w in $(seq $3)
> >>+		do
> >>+			make_paths $1/dir$w $(($2 - 1)) $3 $4
> >>+		done
> >>+	fi
> >>+	return 0
> >>+}
> >>+
> >>+fill_index () {
> >>+	make_paths $1 $2 $3 $4 |
> >>+	sed "s/^/100644 $EMPTY_BLOB	/" |
> >>+	git update-index --index-info
> >>+	return 0
> >>+}
> >>+
> >>+br_work1=xxx_work1_xxx
> >>+dir_new=xxx_dir_xxx
> >>+
> >>+## (5, 10, 9) will create 999,999 files.
> >>+## (4, 10, 9) will create 99,999 files.
> >>+depth=5
> >>+width=10
> >>+files=9
> >>+
> >>+## Inflate the index with thousands of empty files and commit it.
> >>+## Use reset to actually populate the worktree.
> >>+test_expect_success 'inflate the index' '
> >>+	git reset --hard &&
> >>+	git branch $br_work1 &&
> >>+	git checkout $br_work1 &&
> >>+	fill_index $dir_new $depth $width $files &&
> >>+	git commit -m $br_work1 &&
> >>+	git reset --hard
> >>+'
> >>+
> >>+## The number of files in the branch.
> >>+nr_work1=$(git ls-files | wc -l)
> >
> >The above seems to be repeated from (or at least very similar to) what
> >you have in your other series [1]. Especially in this perf test, wouldn't
> >it be better to just use test_perf_large_repo, and let whoever runs the
> >test decide what constitutes a large repository for them?
> >
> >The other advantage of that would be that it is more of a real-world
> >scenario, instead of a synthetic distribution of the files, which
> >would give us some better results I think.
> >
> >Is there anything I'm missing that would make using
> >test_perf_large_repo not a good option here?
>
> Yes, it is copied from the other series. I'll make the same change
> here that Rene just suggested there, using awk to create the list.
>
> I did this because I need very large repos. From what I can tell
> the common usage is to set test_perf_large_repo to linux.git, but
> that only has 58K files. And it defaults to git.git which only
> has 3K files.

Yeah true. Back when I worked on "index v5" for my GSoC project, I
used to use the webkit repository, which at the time had
300-something K files. Nowadays the better test might be the chromium
repository, but I'm not sure (cloning that takes a while on my
connection :) ).

> Internally, I test against the Windows source tree with 3.1M files,
> but I can't share that :-)

Heh. I'd love to see the performance numbers for that though!

> So I created this test to generate artificial, but large and
> reproducible repos for evaluation.
>
> I could change the default depth to 4 (giving a 100K tree), but
> I'm really interested in 1M+ repos. For small-ish values of n
> the difference between O(n) and O(n log n) operations can hide
> in system and I/O noise; not so for very large n....

Makes sense to me. Thanks for the explanation!
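
As a sanity check on the sizes in the "(depth, width, files)" comment in
the quoted patch: make_paths emits $files entries in every directory it
visits, and it visits width^d directories at each level d from 0 to
depth. A rough sketch of that arithmetic follows; count_paths is a
hypothetical helper name of mine, not something in the patch or in
perf-lib.sh.

```shell
#!/bin/sh

# count_paths DEPTH WIDTH FILES
# Hypothetical helper: predicts how many paths make_paths would emit
# without generating them. Every visited directory contributes FILES
# entries, and level d of the tree holds WIDTH^d directories.
count_paths () {
	depth=$1 width=$2 files=$3
	dirs=0		# directories visited so far
	level=1		# directories at the current level (WIDTH^d)
	d=0
	while test "$d" -le "$depth"
	do
		dirs=$((dirs + level))
		level=$((level * width))
		d=$((d + 1))
	done
	echo $((dirs * files))
}

count_paths 5 10 9	# prints 999999
count_paths 4 10 9	# prints 99999
```

So each extra level of depth multiplies the tree by roughly the width,
which matches the 100K vs. 1M figures above.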
> >
> >[1]: http://public-inbox.org/git/20170406163442.36463-3-git@jeffhostetler.com/
> >
> >>+test_perf "read-tree status work1 ($nr_work1)" '
> >>+	git read-tree HEAD &&
> >>+	git status
> >>+'
> >>+
> >>+test_done
> >>--
> >>2.9.3
> >>