* Cygwin sparse checkout degrades performance
From: Brian Ericson @ 2014-12-24 18:30 UTC (permalink / raw)
  To: git

Counter-intuitively, using sparse checkout in Cygwin degrades "status" 
times as status appears to "stat" non-existent files and directories.

To demonstrate, I created a repo with 100k random files in a 
dir/dir/dir/file structure (on a Linux box; doing this in Cygwin 
requires piping the output of "openssl rand" through "dos2unix" because 
it contains "\r") and cloned it in a Cygwin shell:

git init test
cd test
git commit --allow-empty -m 'Empty first commit'
for i in {1..10}; do
  for j in {1..10000}; do
    file=$( openssl rand -hex 32 | sed 's,^\(.\)\(.\)\(.\),\1/\2/\3/,' )
    mkdir -p $( dirname $file )
    echo $file > $file
  done &
done; wait
git add .
git commit -m '100000 files'
git gc --prune=now --aggressive
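
As a minimal sketch of the path construction above, the helper below turns one line of hex into a dir/dir/dir/file path and strips any "\r" first. The name hex_to_path is hypothetical, and "tr -d '\r'" is swapped in here for the "dos2unix" step mentioned above, on the assumption that a stray carriage return is the only difference:

```shell
# Hypothetical helper: read one hex string on stdin and print it as a
# dir/dir/dir/file path. Carriage returns are dropped with tr, an
# assumption standing in for the dos2unix step described in the message.
hex_to_path() {
  tr -d '\r' | sed 's,^\(.\)\(.\)\(.\),\1/\2/\3/,'
}
```

With that in place, the loop body could read file=$( openssl rand -hex 32 | hex_to_path ) on either platform.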

I then timed and plotted "git status" as sparse checkout reduced the 
number of files in the working tree step by step, using the following 
command:

(
  ( git status >& /dev/null; time -p git status > /dev/null ) |& sed -n '/real/{s/real/100000/p}'
  git config core.sparseCheckout true
  for i in $( seq 90000 -10000 10000 ) 1; do
    git ls-files | head -n $i | sed 's,^,/,' > .git/info/sparse-checkout
    git read-tree -u -m HEAD
    git status >& /dev/null
    ( time -p git status > /dev/null ) |& sed -n "/real/{s/real/$i/p}"
  done
  echo '*' > .git/info/sparse-checkout
  git read-tree -u -m HEAD
  rm .git/info/sparse-checkout
  git config --unset core.sparseCheckout
) | gnuplot -p -e "set terminal dumb; set xrange[] reverse; set style data dots; set nokey; plot '-' using 1:2"

The vertical axis is time in seconds, the horizontal axis the number of 
files in the working tree after the sparse checkout.

Linux results (v2.1.0):
[Plot garbled in transit. Vertical axis: "git status" time in seconds, 
0.1 to 0.45. Horizontal axis: files in the working tree, 100000 down to 0.]

Cygwin results (v2.1.1):
[Plot garbled in transit. Vertical axis: "git status" time in seconds, 
4 to 10. Horizontal axis: files in the working tree, 100000 down to 0.]

Linux times do what I expect/want (they get better as the number of 
working tree files decreases), but Cygwin does the opposite: the worst 
time is in a working tree with only 1 (sparse) file, and it's double 
where I started with no sparse checkout!  I'd hoped sparse checkout 
would improve the too-slow status times when all files are present...

Looking at strace with a working tree consisting of a single (sparse) 
file suggests Cygwin is attempting to access the non-existent files and 
directories whereas Linux does not appear to do so.  In fact, if I do 
nothing more than "mkdir -p $( git ls-files | cut -c1-5 | sort -u )" 
when looking at a single (sparse) file, I can drop status times below 
3s, a 3-fold improvement and something at least better than where I started!
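
The "cut -c1-5" above relies on every path in this test repo starting with three one-character directories. A rough sketch of the same workaround for arbitrary layouts follows; make_dir_skeleton is a hypothetical name, and its effect on status times has not been measured here:

```shell
# Hypothetical helper: read file paths on stdin (one per line) and create
# each distinct parent directory. Paths without a "/" are skipped, since
# they have no parent directory to create.
make_dir_skeleton() {
  sed -n 's,/[^/]*$,,p' | sort -u | while IFS= read -r dir; do
    mkdir -p -- "$dir"
  done
}
```

It would be run from the top of the working tree as "git ls-files | make_dir_skeleton". Paths containing newlines are not handled.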

Is there a way I can get improved status times using sparse checkout 
with Cygwin?


* Re: Cygwin sparse checkout degrades performance
From: Brian Ericson @ 2014-12-24 19:40 UTC (permalink / raw)
  To: git

Huh.  The graphs (somehow) ended up incoherently reformatted...  Sorry 
about that!

Here's the raw data after a second run:

Linux:
100000 0.49
 90000 0.27
 80000 0.27
 70000 0.28
 60000 0.23
 50000 0.21
 40000 0.21
 30000 0.19
 20000 0.19
 10000 0.16
     1 0.14

Cygwin:
100000 4.72
 90000 4.28
 80000 4.41
 70000 4.43
 60000 4.67
 50000 5.04
 40000 6.24
 30000 7.28
 20000 7.88
 10000 8.96
     1 9.43
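
To make the "double where I started" comparison easy to check against these numbers, here is a small sketch that prints each time as a multiple of the first row (the full working tree). The function name relative_times and the file name in the usage comment are hypothetical:

```shell
# Hypothetical helper: read "files seconds" pairs on stdin and print each
# time as a multiple of the first row, which is treated as the baseline.
relative_times() {
  awk 'NR == 1 { base = $2 } { printf "%d %.2f\n", $1, $2 / base }'
}

# Example (cygwin.txt is assumed to hold the Cygwin rows above):
#   relative_times < cygwin.txt
```

Fed the Cygwin rows, the last line comes out as "1 2.00", since 9.43 / 4.72 is just under 2.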



