From: Brian Ericson <bericson@ptc.com>
To: <git@vger.kernel.org>
Subject: Re: Cygwin sparse checkout degrades performance
Date: Wed, 24 Dec 2014 13:40:58 -0600
Message-ID: <549B16CA.3040107@ptc.com>
In-Reply-To: <549B0652.3020605@ptc.com>
Huh. The graphs (somehow) ended up incoherently reformatted... Sorry
about that!

Here's the raw data after a second run:
Linux:

 files  seconds
100000     0.49
 90000     0.27
 80000     0.27
 70000     0.28
 60000     0.23
 50000     0.21
 40000     0.21
 30000     0.19
 20000     0.19
 10000     0.16
     1     0.14

Cygwin:

 files  seconds
100000     4.72
 90000     4.28
 80000     4.41
 70000     4.43
 60000     4.67
 50000     5.04
 40000     6.24
 30000     7.28
 20000     7.88
 10000     8.96
     1     9.43
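
For anyone who wants to check the "double where I started" remark against these second-run numbers, here is a quick sketch that divides the 1-file time by the 100000-file time. The helper name "ratio" is made up for illustration; the four inputs are copied from the figures above.

```shell
# Ratio of the 1-file "git status" time to the full-checkout time.
ratio() {
  # $1 = seconds with 1 file in the working tree
  # $2 = seconds with all 100000 files present
  awk -v sparse="$1" -v full="$2" 'BEGIN { printf "%.2f\n", sparse / full }'
}

echo "Linux:  $(ratio 0.14 0.49)"
echo "Cygwin: $(ratio 9.43 4.72)"
```

On these numbers the Linux ratio comes out around 0.29 (faster when sparse) and the Cygwin ratio around 2.00 (slower when sparse).
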
On 12/24/2014 12:30 PM, Brian Ericson wrote:
> Counter-intuitively, using sparse checkout in Cygwin degrades "status"
> times as status appears to "stat" non-existent files and directories.
>
> To demonstrate, I created a repo with 100k random files in a
> dir/dir/dir/file structure (on a Linux box; doing this in Cygwin
> requires piping the output of "openssl rand" through "dos2unix", because
> the output contains "\r") and cloned it in a Cygwin shell:
>
> git init test
> cd test
> git commit --allow-empty -m 'Empty first commit'
> for i in {1..10}; do
>   for j in {1..10000}; do
>     file=$( openssl rand -hex 32 | sed 's,^\(.\)\(.\)\(.\),\1/\2/\3/,' )
>     mkdir -p "$( dirname "$file" )"
>     echo "$file" > "$file"
>   done &
> done; wait
> git add .
> git commit -m '100000 files'
> git gc --prune=now --aggressive
>
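To make the path construction in the loop above concrete, here is a minimal sketch of just the sed step. A fixed hex string stands in for the "openssl rand -hex 32" output, and "to_path" is a name invented for this example.

```shell
# Split the first three characters of a hex string into directory levels,
# giving a dir/dir/dir/file path.
to_path() {
  printf '%s\n' "$1" | sed 's,^\(.\)\(.\)\(.\),\1/\2/\3/,'
}

file=$(to_path 0123456789abcdef)   # stand-in for random hex output
echo "$file"                       # path of the file to create
dirname "$file"                    # directory that mkdir -p would create
```

With 16 possible hex digits at each of three levels, the 100k files spread over at most 4096 leaf directories.
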
> I then timed and plotted "git status" as sparse checkout reduced the
> number of files in the working tree step by step, using the following
> command:
>
> (
>   ( git status >& /dev/null; time -p git status > /dev/null ) |&
>     sed -n '/real/{s/real/100000/p}'
>   git config core.sparseCheckout true
>   for i in $( seq 90000 -10000 10000 ) 1; do
>     git ls-files | head -n $i | sed 's,^,/,' > .git/info/sparse-checkout
>     git read-tree -u -m HEAD
>     git status >& /dev/null
>     ( time -p git status > /dev/null ) |& sed -n "/real/{s/real/$i/p}"
>   done
>   echo '*' > .git/info/sparse-checkout
>   git read-tree -u -m HEAD
>   rm .git/info/sparse-checkout
>   git config --unset core.sparseCheckout
> ) | gnuplot -p -e "set terminal dumb; set xrange[] reverse; \
>     set style data dots; set nokey; plot '-' using 1:2"
>
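The two sed steps in that command are easier to follow in isolation, so here is a small sketch of each. The helper names "label_real" and "to_patterns" are invented, the user/sys lines and the file paths are made-up sample input, and label_real uses a slightly simplified, anchored form of the sed expression.

```shell
# Keep only the "real" line of "time -p" output and swap the word "real"
# for the file count, producing one "count seconds" row for gnuplot.
label_real() {
  sed -n "s/^real/$1/p"
}

printf 'real 0.49\nuser 0.10\nsys 0.05\n' | label_real 100000

# Prefix each path with "/" so the sparse-checkout pattern is anchored
# at the top of the repository.
to_patterns() {
  sed 's,^,/,'
}

# Stand-in for "git ls-files | head -n $i"
printf '0/1/2/aaa\n0/1/3/bbb\n4/5/6/ccc\n' | head -n 2 | to_patterns
```

Note that this writes one pattern per file, so the sparse-checkout file grows to as many as 90000 lines in the first iteration.
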
> The vertical axis is time in seconds; the horizontal axis is the number
> of files in the working tree after the sparse checkout.
>
> Linux results (v2.1.0):
> [gnuplot "dumb" terminal plot: "git status" time in seconds (vertical
> axis, 0.1 to 0.45) against files in the working tree (horizontal axis,
> 100000 down to 0)]
>
> Cygwin results (v2.1.1):
> [gnuplot "dumb" terminal plot: "git status" time in seconds (vertical
> axis, 4 to 10) against files in the working tree (horizontal axis,
> 100000 down to 0)]
>
> Linux times do what I expect/want (they get better as the number of
> working tree files decreases), but Cygwin does the opposite: the worst
> times are in a working tree with only 1 (sparse) file, and that's double
> where I started with no sparse checkout! I'd hoped sparse checkout
> would improve the too-slow status times when all files are present...
>
> Looking at strace with a working tree consisting of a single (sparse)
> file suggests Cygwin is attempting to access the non-existent files and
> directories whereas Linux does not appear to do so. In fact, if I do
> nothing more than "mkdir -p $( git ls-files | cut -c1-5 | sort -u )"
> when looking at a single (sparse) file, I can drop status times below
> 3s, a 3-fold improvement and something at least better than where I
> started!
>
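Here is a sketch of what that mkdir workaround does, run in a scratch directory rather than a real working tree. The paths are made-up stand-ins for "git ls-files" output, and "dir_prefixes" is a hypothetical helper name.

```shell
# The first five characters of each path are its three-level directory
# ("x/y/z"); sort -u collapses files that share a directory.
dir_prefixes() {
  cut -c1-5 | sort -u
}

tmp=$(mktemp -d)   # scratch directory, so nothing real is touched
dirs=$(printf '0/1/2/aaa\n0/1/2/bbb\n4/5/6/ccc\n' | dir_prefixes)

# $dirs is left unquoted on purpose: one mkdir argument per directory.
( cd "$tmp" && mkdir -p $dirs )
ls -d "$tmp"/*/*/*

# rm -rf "$tmp"   # clean up when done
```

This only creates empty directories; no files are checked out, so the sparse checkout itself is unchanged.
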
> Is there a way I can get improved status times using sparse checkout
> with Cygwin?