From: Dmitry Potapov <dpotapov@gmail.com>
To: git@vger.kernel.org
Subject: Too many 'stat' calls by git-status on Windows
Date: Tue, 7 Jul 2009 04:05:01 +0400
Message-ID: <20090707000500.GA5594@dpotapov.dyndns.org>
I have been using the Cygwin version of Git on a Windows computer and
noticed that git-status is sluggish, so I ran Process Monitor to see
what is going on.
Below you can see the results of testing on Windows and Linux on the
same repository using the same version of Git. The logs are fairly easy
to compare once you note the following correspondence between syscalls:
    Windows           Linux
    QueryOpen         lstat or fstat
    CreateFile        open
    CloseFile         close
    QueryDirectory    getdents
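
To make the two logs directly comparable, this correspondence can be applied mechanically. Here is a minimal sketch in Python, assuming the per-operation counts have already been extracted from each log. The table and helper names are illustrative only and are not part of Git or Process Monitor. The counts are the git-status figures reported further down in this message.

```python
from collections import Counter

# Correspondence from the table above. QueryOpen covers both lstat and
# fstat, so both Linux calls are folded into a single "stat" bucket.
PROCMON_TO_BUCKET = {
    "QueryOpen": "stat",
    "CreateFile": "open",
    "CloseFile": "close",
    "QueryDirectory": "getdents",
}
LINUX_TO_BUCKET = {
    "lstat": "stat",
    "fstat": "stat",
    "open": "open",
    "close": "close",
    "getdents": "getdents",
}

def normalize_counts(counts, mapping):
    """Sum raw per-operation counts into shared buckets; ignore the rest."""
    totals = Counter()
    for op, n in counts.items():
        bucket = mapping.get(op)
        if bucket is not None:
            totals[bucket] += n
    return dict(totals)

# git-status figures copied from the summaries in this message.
windows_status = {"QueryOpen": 12984, "CreateFile": 3086,
                  "CloseFile": 2103, "QueryDirectory": 1984}
linux_status = {"lstat": 3279, "fstat": 1058, "open": 2048,
                "close": 1062, "getdents": 1976}

print(normalize_counts(windows_status, PROCMON_TO_BUCKET))
print(normalize_counts(linux_status, LINUX_TO_BUCKET))
```
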
I have also tested git-diff to verify that the number of system calls
matches pretty well. (In fact, I got a practically identical list of
stat syscalls for files inside the working directory on Windows and
Linux when running git-diff.) But something strange is going on with
git-status.
The beginning of the log is identical on Windows and Linux, but then I
see more 'stat's in the Windows log that did not happen on Linux. In
total, I see about a threefold increase in 'stat' calls, with every file
being stat'ed twice and directories (which are numerous) being stat'ed
three or more times (some of them as many as 39 times...). It seems that
every directory is stat'ed as many times as the number of subdirectories
it has, plus 3.
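
The "subdirectories plus 3" pattern is only an observation, but it could be checked against a log along the following lines. This is a rough sketch in Python, assuming the stat'ed paths and the list of directories have already been extracted. The helper names and the sample paths are hypothetical and do not come from the actual logs.

```python
from collections import Counter
import ntpath  # Windows-style path handling; importable on any platform

def subdir_counts(directories):
    """Map each directory to the number of its immediate subdirectories."""
    dirs = set(directories)
    counts = {d: 0 for d in dirs}
    for d in dirs:
        parent = ntpath.dirname(d)
        if parent != d and parent in counts:
            counts[parent] += 1
    return counts

def pattern_mismatches(stat_paths, directories, extra=3):
    """Return {dir: (observed, expected)} where observed != subdirs + extra."""
    observed = Counter(stat_paths)
    mismatches = {}
    for d, n_subdirs in subdir_counts(directories).items():
        expected = n_subdirs + extra
        if observed[d] != expected:
            mismatches[d] = (observed[d], expected)
    return mismatches

# Hypothetical sample: a root with two subdirectories and no deeper nesting.
dirs = [r"C:\repo", r"C:\repo\a", r"C:\repo\b"]
log = [r"C:\repo"] * 5 + [r"C:\repo\a"] * 3 + [r"C:\repo\b"] * 3
print(pattern_mismatches(log, dirs))  # empty dict: the sample fits the pattern
```

Under this pattern, a directory that is stat'ed 39 times would have 36 subdirectories.
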
It appears that the second 'stat' for files on Windows is caused by the
lack of d_type in dirent. When I recompiled the Linux version with
NO_D_TYPE_IN_DIRENT = YesPlease, I got the same result for files.
(I am still not sure what causes the extra stat calls for directories;
maybe it is Cygwin-specific...)
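
The effect of a missing d_type can be illustrated outside of Git. The following is a minimal sketch in Python, not Git's code, and the function names are illustrative. `os.scandir()` exposes the entry type returned by the directory listing, so classifying entries usually needs no extra system call. The `os.listdir()` variant has to `lstat` every name, which is roughly the position a build with NO_D_TYPE_IN_DIRENT is in.

```python
import os
import stat

def classify_with_lstat(path):
    """Classify entries the way a platform without d_type must:
    one lstat() per directory entry."""
    result = {}
    lstat_calls = 0
    for name in os.listdir(path):
        st = os.lstat(os.path.join(path, name))
        lstat_calls += 1
        result[name] = "dir" if stat.S_ISDIR(st.st_mode) else "other"
    return result, lstat_calls

def classify_with_dirent(path):
    """Classify entries from the directory listing itself. Python only
    falls back to a stat call when the OS reports an unknown type."""
    result = {}
    with os.scandir(path) as entries:
        for entry in entries:
            is_dir = entry.is_dir(follow_symlinks=False)
            result[entry.name] = "dir" if is_dir else "other"
    return result

if __name__ == "__main__":
    by_lstat, calls = classify_with_lstat(".")
    print(f"{calls} lstat calls; same answer: {by_lstat == classify_with_dirent('.')}")
```

Both functions return the same classification. The difference is only in how many per-entry system calls are needed to get there.
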
The question is whether it is possible to avoid this redundant 'stat'
for files on systems that do not have d_type in dirent, or would that
require too much modification? Is it possible to use the cache where
d_type is not available, provided that the entry is marked as uptodate?
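
The idea behind the cache question can be sketched as follows. This is a hypothetical illustration in Python and not how Git's index is actually implemented. `CacheEntry`, `entry_type`, and the `cache` dictionary are invented names standing in for an index whose entries record a file mode and an up-to-date flag.

```python
import os
import stat
from collections import namedtuple

# Hypothetical stand-in for an index entry: the recorded file mode and a
# flag saying the entry was already verified against the working tree.
CacheEntry = namedtuple("CacheEntry", ["mode", "uptodate"])

def entry_type(path, cache, lstat=os.lstat):
    """Return (type, used_lstat). Consult the cache first and only call
    lstat() when there is no up-to-date entry for the path."""
    entry = cache.get(path)
    if entry is not None and entry.uptodate:
        mode, used_lstat = entry.mode, False
    else:
        mode, used_lstat = lstat(path).st_mode, True
    if stat.S_ISDIR(mode):
        return "dir", used_lstat
    if stat.S_ISLNK(mode):
        return "symlink", used_lstat
    return "file", used_lstat

# Usage with a made-up cache: the up-to-date entry needs no lstat at all.
cache = {"Makefile": CacheEntry(stat.S_IFREG | 0o644, True)}
print(entry_type("Makefile", cache))
```

A path that is missing from the cache, or whose entry is not marked up to date, still falls back to `lstat`, so the result is never taken from stale data.
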
=== Git on Windows (Cygwin) ===
$ wc -l git-diff.csv git-status.csv
5186 git-diff.csv
21694 git-status.csv
$ csvtool col 5 git-diff.csv | sort | uniq -c | sort -nr | head -10
4656 QueryOpen
100 CreateFile
94 CloseFile
80 QuerySecurityFile
61 ReadFile
30 QueryInformationVolume
28 QueryAllInformationFile
26 RegOpenKey
24 RegCloseKey
20 QueryStandardInformationFile
$ csvtool col 5 git-status.csv | sort | uniq -c | sort -nr | head -10
12984 QueryOpen
3086 CreateFile
2103 CloseFile
1984 QueryDirectory
988 QueryFileInternalInformationFile
132 QuerySecurityFile
100 ReadFile
77 WriteFile
55 QueryInformationVolume
53 QueryAllInformationFile
Successful open:
$ csvtool col 5,7,8 git-diff.csv | grep CreateFile,SUCCESS, | wc -l
94
$ csvtool col 5,7,8 git-status.csv | grep CreateFile,SUCCESS, | wc -l
2103
Successful open for directories:
$ csvtool col 5,7,8 git-diff.csv | grep CreateFile,SUCCESS,.*Options:.*Directory | wc -l
37
$ csvtool col 5,7,8 git-status.csv | grep CreateFile,SUCCESS,.*Options:.*Directory | wc -l
1024
Unsuccessful attempts to open:
$ csvtool col 5,7,8 git-diff.csv | grep CreateFile | grep -v ,SUCCESS, | wc -l
6
$ csvtool col 5,7,8 git-status.csv | grep CreateFile | grep -v ,SUCCESS, | wc -l
983
Attempts to open .gitignore:
$ csvtool col 5,6 git-diff.csv | grep 'CreateFile,.*\\\.gitignore' | wc -l
0
$ csvtool col 5,6 git-status.csv | grep 'CreateFile,.*\\\.gitignore' | wc -l
986
=== Git on Linux ===
$ wc -l linux-git-*
4674 linux-git-diff.log
9807 linux-git-status.log
$ sed -e 's/(.*//' < linux-git-diff.log | sort | uniq -c | sort -rn | head -10
4237 lstat
88 mmap
56 open
50 close
50 access
48 fstat
45 mprotect
43 read
15 stat
13 munmap
The number of lstat+fstat calls is 4285 for git-diff.
$ sed -e 's/(.*//' < linux-git-status.log | sort | uniq -c | sort -rn | head -10
3279 lstat
2048 open
1976 getdents
1062 close
1058 fstat
97 mmap
67 read
48 access
45 mprotect
40 write
The number of lstat+fstat calls is 4337 for git-status.
Successful open:
$ grep -c '^open(.*= [^-]' linux-*
linux-git-diff.log:50
linux-git-status.log:1064
Successful open for directories:
$ grep -c '^open(.*O_DIRECTORY.*= [^-]' linux-*
linux-git-diff.log:1
linux-git-status.log:989
Unsuccessful attempts to open:
$ grep -c '^open(.*= -1' linux-*
linux-git-diff.log:6
linux-git-status.log:984
Attempts to open .gitignore:
$ grep -c '^open(.*.\.gitignore"' linux-*
linux-git-diff.log:0
linux-git-status.log:987
=== Linux with NO_D_TYPE_IN_DIRENT = YesPlease ===
$ wc -l linux-git-*no-dtype.log
4674 linux-git-diff-no-dtype.log
14040 linux-git-status-no-dtype.log
$ sed -e 's/(.*//' < linux-git-diff-no-dtype.log | sort | uniq -c | sort -rn | head -10
4237 lstat
88 mmap
56 open
50 close
50 access
48 fstat
45 mprotect
43 read
15 stat
13 munmap
The number of lstat+fstat calls is 4285 for git-diff.
$ sed -e 's/(.*//' < linux-git-status-no-dtype.log | sort | uniq -c | sort -rn | head -10
7512 lstat
2048 open
1976 getdents
1062 close
1058 fstat
97 mmap
67 read
48 access
45 mprotect
40 write
The number of lstat+fstat calls is 8570 for git-status.
Successful open:
$ grep -c '^open(.*= [^-]' linux-*-no-dtype.log
linux-git-diff-no-dtype.log:50
linux-git-status-no-dtype.log:1064
Successful open for directories:
$ grep -c '^open(.*O_DIRECTORY.*= [^-]' linux-*-no-dtype.log
linux-git-diff-no-dtype.log:1
linux-git-status-no-dtype.log:989
Unsuccessful attempts to open:
$ grep -c '^open(.*= -1' linux-*-no-dtype.log
linux-git-diff-no-dtype.log:6
linux-git-status-no-dtype.log:984
Attempts to open .gitignore:
$ grep -c '^open(.*.\.gitignore"' linux-*-no-dtype.log
linux-git-diff-no-dtype.log:0
linux-git-status-no-dtype.log:987
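
The totals reported above can be cross-checked with a little arithmetic. This is a short Python sketch that uses only figures copied from the summaries in this message; the variable names are just labels.

```python
# Stat-like calls for git-status, copied from the summaries above.
linux_lstat, linux_fstat = 3279, 1058
nodtype_lstat, nodtype_fstat = 7512, 1058
windows_query_open = 12984

linux_total = linux_lstat + linux_fstat          # 4337
nodtype_total = nodtype_lstat + nodtype_fstat    # 8570
extra_lstat = nodtype_lstat - linux_lstat        # 4233

# Successful plus failed opens should add up to the total open count.
windows_opens_ok = 2103 + 983 == 3086
linux_opens_ok = 1064 + 984 == 2048

print(f"no-d_type / Linux: {nodtype_total / linux_total:.2f}")       # 1.98
print(f"Windows / Linux:   {windows_query_open / linux_total:.2f}")  # 2.99
print(f"extra lstat calls without d_type: {extra_lstat}")
print(f"open counts consistent: {windows_opens_ok and linux_opens_ok}")
```

The 4233 extra lstat calls are close to the 4237 lstat calls that git-diff makes, which fits the observation that every file is stat'ed a second time when d_type is unavailable.
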
=======
Dmitry