From: Brian Ericson <bericson@ptc.com>
To: <git@vger.kernel.org>
Subject: Cygwin sparse checkout degrades performance
Date: Wed, 24 Dec 2014 12:30:42 -0600
Message-ID: <549B0652.3020605@ptc.com>
Counter-intuitively, using sparse checkout in Cygwin degrades "status"
times as status appears to "stat" non-existent files and directories.
To demonstrate, I created a repo with 100k random files in a
dir/dir/dir/file structure (on a linux box -- to do this in Cygwin
requires piping the result of "openssl rand" to "dos2unix" as the output
contains "\r") and cloned in a Cygwin shell:
git init test
cd test
git commit --allow-empty -m 'Empty first commit'
for i in {1..10}; do
  for j in {1..10000}; do
    file=$( openssl rand -hex 32 | sed 's,^\(.\)\(.\)\(.\),\1/\2/\3/,' )
    mkdir -p "$( dirname "$file" )"; echo "$file" > "$file"
  done &
done; wait
git add .
git commit -m '100000 files'
git gc --prune=now --aggressive
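To make the directory layout concrete, here is a minimal sketch of what the sed expression in the loop does to one name. The fixed hex string is an assumption standing in for the "openssl rand -hex 32" output; the sed expression itself is the one used above.

```shell
# Sample input is an assumption; the original uses `openssl rand -hex 32`.
hex=0123456789abcdef
# Split off the first three characters as three nested directory names.
path=$( echo "$hex" | sed 's,^\(.\)\(.\)\(.\),\1/\2/\3/,' )
echo "$path"        # 0/1/2/3456789abcdef
dir=$( dirname "$path" )
echo "$dir"         # 0/1/2
```

Each of the three directory levels is one hex digit, so the 100k files are spread over at most 16 * 16 * 16 = 4096 leaf directories.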
I then timed and plotted "git status" as sparse checkout reduced the
number of files in the working tree step by step, using the following
command:
(
  ( git status >& /dev/null; time -p git status > /dev/null ) |&
    sed -n '/real/{s/real/100000/p}'
  git config core.sparseCheckout true
  for i in $( seq 90000 -10000 10000 ) 1; do
    git ls-files | head -n $i | sed 's,^,/,' > .git/info/sparse-checkout
    git read-tree -u -m HEAD
    git status >& /dev/null
    ( time -p git status > /dev/null ) |& sed -n "/real/{s/real/$i/p}"
  done
  echo '*' > .git/info/sparse-checkout
  git read-tree -u -m HEAD
  rm .git/info/sparse-checkout
  git config --unset core.sparseCheckout
) | gnuplot -p -e "set terminal dumb; set xrange[] reverse; set style data dots; set nokey; plot '-' using 1:2"
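The two sed steps in that command are easy to misread, so here is a small sketch of each in isolation. The timing values and file paths are made-up stand-ins (assumptions) for the real "time -p" and "git ls-files" output; only the sed, head and printf calls are exercised.

```shell
# Stand-in for the three lines that `time -p` writes to stderr (values made up).
sample_time='real 0.45
user 0.10
sys 0.20'
# Keep only the "real" line and replace the label with the file count,
# giving one "x y" data point for gnuplot.
point=$( printf '%s\n' "$sample_time" | sed -n '/real/{s/real/100000/p}' )
echo "$point"

# Stand-in for `git ls-files` (paths made up). Take the first 2 paths and
# prefix each with "/" so the pattern is anchored at the repository root.
patterns=$( printf '%s\n' 0/1/2/aaa 0/1/2/bbb f/e/d/ccc | head -n 2 | sed 's,^,/,' )
echo "$patterns"
```

Each timing run therefore contributes a single "files seconds" line to the stream read by gnuplot's "plot '-' using 1:2".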
The vertical axis is time in seconds; the horizontal axis is the number
of files in the working tree after the sparse checkout.
Linux results (v2.1.0):
[gnuplot "dumb" terminal plot: time in seconds (y axis, 0.1 to 0.45)
against files in the working tree (x axis, 100000 down to 0)]
Cygwin results (v2.1.1):
[gnuplot "dumb" terminal plot: time in seconds (y axis, 4 to 10)
against files in the working tree (x axis, 100000 down to 0)]
Linux times do what I expect/want (they get better as the number of
working tree files decreases), but Cygwin does the opposite: the worst
times are in a working tree with only 1 (sparse) file, and they are double
where I started with no sparse checkout! I'd hoped sparse checkout
would improve the too-slow status times when all files are present...
Looking at strace with a working tree consisting of a single (sparse)
file suggests Cygwin is attempting to access the non-existent files and
directories whereas Linux does not appear to do so. In fact, if I do
nothing more than "mkdir -p $( git ls-files | cut -c1-5 | sort -u )"
when looking at a single (sparse) file, I can drop status times below
3s, a 3-fold improvement and something at least better than where I started!
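For reference, a sketch of what that mkdir pipeline computes, run against made-up paths in a scratch directory instead of a real repository (the paths and the scratch location are assumptions for illustration):

```shell
# Stand-in for `git ls-files` output (paths made up for illustration).
files='0/1/2/aaa
0/1/2/bbb
f/e/d/ccc'
# The first five characters of each path are its x/y/z directory; dedupe them.
dirs=$( printf '%s\n' "$files" | cut -c1-5 | sort -u )
echo "$dirs"

# Create the empty directories in a scratch location rather than a real repo.
scratch=$( mktemp -d )
( cd "$scratch" && mkdir -p $dirs )   # unquoted on purpose: one argument per directory
ls "$scratch"
```

The "cut -c1-5" step relies on every path starting with three single-character directory names, which holds for the test repository described above but not for repositories in general.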
Is there a way I can get improved status times using sparse checkout
with Cygwin?