From: Brian Ericson <bericson@ptc.com>
To: <git@vger.kernel.org>
Subject: Cygwin sparse checkout degrades performance
Date: Wed, 24 Dec 2014 12:30:42 -0600
Message-ID: <549B0652.3020605@ptc.com>

Counter-intuitively, using sparse checkout in Cygwin degrades "status" 
times as status appears to "stat" non-existent files and directories.

To demonstrate, I created a repo with 100k random files in a 
dir/dir/dir/file structure (on a Linux box; doing this in Cygwin 
requires piping the output of "openssl rand" through "dos2unix", as the 
output contains "\r") and cloned it in a Cygwin shell:

git init test
cd test
git commit --allow-empty -m 'Empty first commit'
# 10 background jobs, each creating 10000 files named x/y/z/<rest of hex>
for i in {1..10}; do
  for j in {1..10000}; do
    file=$( openssl rand -hex 32 | sed 's,^\(.\)\(.\)\(.\),\1/\2/\3/,' )
    mkdir -p "$( dirname "$file" )"
    echo "$file" > "$file"
  done &
done; wait
git add .
git commit -m '100000 files'
git gc --prune=now --aggressive
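
For reference, the sed expression in the loop turns the first three hex 
digits of each name into three directory levels. A minimal sketch, using 
a fixed string in place of the "openssl rand" output (the function name 
"shard_path" is made up for illustration):

```shell
#!/bin/sh
# shard_path is a hypothetical helper name; the sed expression is the one
# used in the file-creation loop.
shard_path() {
  printf '%s\n' "$1" | sed 's,^\(.\)\(.\)\(.\),\1/\2/\3/,'
}

shard_path 0123456789abcdef
# prints 0/1/2/3456789abcdef
```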

I then timed and plotted "git status" as sparse checkout reduced the 
number of files in the working tree step by step, using the following 
command:

(
  # Baseline: warm the cache, then time status with all 100000 files
  ( git status >& /dev/null; time -p git status > /dev/null ) |&
    sed -n '/real/{s/real/100000/p}'
  git config core.sparseCheckout true
  for i in $( seq 90000 -10000 10000 ) 1; do
    # Keep only the first $i files in the working tree
    git ls-files | head -n $i | sed 's,^,/,' > .git/info/sparse-checkout
    git read-tree -u -m HEAD
    git status >& /dev/null
    ( time -p git status > /dev/null ) |& sed -n "/real/{s/real/$i/p}"
  done
  # Restore the full working tree and turn sparse checkout off again
  echo '*' > .git/info/sparse-checkout
  git read-tree -u -m HEAD
  rm .git/info/sparse-checkout
  git config --unset core.sparseCheckout
) | gnuplot -p -e "set terminal dumb; set xrange[] reverse; set style data dots; set nokey; plot '-' using 1:2"
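
Each timing is fed to gnuplot as a "files seconds" row: sed rewrites the 
"real" line printed by "time -p" so that it starts with the file count. A 
small sketch of that step on canned input (the helper name "to_plot_row" 
and the sample numbers are made up for illustration, and the sed 
expression is a simplified equivalent of the one in the command):

```shell
#!/bin/sh
# to_plot_row is a hypothetical helper: it keeps only the "real" line of
# "time -p" output and replaces the word "real" with the given file count.
to_plot_row() {
  sed -n "s/^real/$1/p"
}

# Sample input in "time -p" format; the numbers are illustrative only.
printf 'real 0.45\nuser 0.20\nsys 0.25\n' | to_plot_row 90000
# prints 90000 0.45
```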

The vertical axis is time in seconds; the horizontal axis is the number 
of files in the working tree after the sparse checkout.

Linux results (v2.1.0):
[ASCII plot garbled in the archive. Recoverable trend: "git status" 
takes roughly 0.3s with all 100000 files and falls steadily to roughly 
0.15s with 1 file.]

Cygwin results (v2.1.1):
[ASCII plot garbled in the archive. Recoverable trend: "git status" 
takes roughly 4-5s with 80000 to 100000 files, then climbs steadily to 
between 9s and 10s with 1 file.]

Linux times do what I expect/want (they get better as the number of 
working tree files decreases), but Cygwin does the opposite: the worst 
times are in a working tree with only 1 (sparse) file, roughly double 
where I started with no sparse checkout!  I'd hoped sparse checkout 
would improve the too-slow status times when all files are present...

Looking at strace with a working tree consisting of a single (sparse) 
file suggests Cygwin is attempting to access the non-existent files and 
directories whereas Linux does not appear to do so.  In fact, if I do 
nothing more than "mkdir -p $( git ls-files | cut -c1-5 | sort -u )" 
when looking at a single (sparse) file, I can drop status times below 
3s, a 3-fold improvement and something at least better than where I started!
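
The "cut -c1-5" above relies on this test repo's one-character directory 
names. A sketch of the same workaround for arbitrary layouts follows; the 
helper name "parent_dirs" is made up, paths are assumed to contain no 
whitespace, and the effect has only been measured on the test repo 
described here:

```shell
#!/bin/sh
# parent_dirs is a hypothetical helper: it reads file paths on stdin and
# prints each distinct parent directory once. Paths without a "/" (files
# at the top level) produce no output.
parent_dirs() {
  sed -n 's,/[^/]*$,,p' | sort -u
}

# Intended use inside the sparse working tree:
#   git ls-files | parent_dirs | xargs mkdir -p
printf '%s\n' a/b/c/file1 a/b/c/file2 d/e/f/file3 | parent_dirs
# prints a/b/c and d/e/f, one per line
```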

Is there a way I can get improved status times using sparse checkout 
with Cygwin?

