From: Duy Nguyen <pclouds@gmail.com>
To: Ramkumar Ramachandra <artagnon@gmail.com>
Cc: Robert Zeh <robert.allan.zeh@gmail.com>,
Junio C Hamano <gitster@pobox.com>,
Git List <git@vger.kernel.org>,
finnag@pvv.org
Subject: Re: inotify to minimize stat() calls
Date: Sun, 10 Feb 2013 18:17:32 +0700 [thread overview]
Message-ID: <20130210111732.GA24377@lanh> (raw)
In-Reply-To: <CACsJy8DeM5--WVXg3b65RxLBS7Jho-7KmcGwWk7B5uAx77yOEw@mail.gmail.com>
On Sun, Feb 10, 2013 at 12:24:58PM +0700, Duy Nguyen wrote:
> On Sun, Feb 10, 2013 at 12:10 AM, Ramkumar Ramachandra
> <artagnon@gmail.com> wrote:
> > Finn notes in the commit message that it offers no speedup, because
> > .gitignore files in every directory still have to be read. I think
> > this is silly: we really should be caching .gitignore, and touching it
> > only when lstat() reports that the file has changed.
> >
> > ...
> >
> > Really, the elephant in the room right now seems to be .gitignore.
> > Until that is fixed, there is really no use of writing this inotify
> > daemon, no? Can someone enlighten me on how exactly .gitignore files
> > are processed?
>
> .gitignore is a different issue. I think it's mainly used with
> read_directory/fill_directory to collect ignored files (or not-ignored
> files). And it's not always used (well, status and add does, but diff
> should not). I think wee need to measure how much mass lstat
> elimination gains us (especially on big repos) and how much
> .gitignore/.gitattributes caching does.
OK let's count. I start with a "standard" repository, linux-2.6. This
is the number from strace -T on "git status" (*). The first column is
accumulated time, the second the number of syscalls.
top syscalls sorted top syscalls sorted
by acc. time by number
----------------------------------------------
0.401906 40950 lstat 0.401906 40950 lstat
0.190484 5343 getdents 0.150055 5374 open
0.150055 5374 open 0.190484 5343 getdents
0.074843 2806 close 0.074843 2806 close
0.003216 157 read 0.003216 157 read
The following patch pretends every entry is uptodate without
lstat. With the patch, we can see refresh code is the cause of mass
lstat, as lstat disappears:
0.185347 5343 getdents 0.144173 5374 open
0.144173 5374 open 0.185347 5343 getdents
0.071844 2806 close 0.071844 2806 close
0.004918 135 brk 0.003378 157 read
0.003378 157 read 0.004918 135 brk
-- 8< --
diff --git a/read-cache.c b/read-cache.c
index 827ae55..94d8ed8 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1018,6 +1018,10 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
if (ce_uptodate(ce))
return ce;
+#if 1
+ ce_mark_uptodate(ce);
+ return ce;
+#endif
/*
* CE_VALID or CE_SKIP_WORKTREE means the user promised us
* that the change to the work tree does not matter and told
-- 8< --
The following patch eliminates untracked search code. As we can see,
open+getdents also disappears with this patch:
0.462909 40950 lstat 0.462909 40950 lstat
0.003417 129 brk 0.003417 129 brk
0.000762 53 read 0.000762 53 read
0.000720 36 open 0.000720 36 open
0.000544 12 munmap 0.000454 33 close
So from syscalls point of view, we know what code issues most of
them. Let's see how much time we gain be these patches, which is an
approximate of the gain by inotify support. This time I measure on
gentoo-x86.git [1] because this one has really big worktree (100k
files)
unmodified read-cache.c dir.c both
real 0m0.550s 0m0.479s 0m0.287s 0m0.213s
user 0m0.305s 0m0.315s 0m0.201s 0m0.182s
sys 0m0.240s 0m0.157s 0m0.084s 0m0.030s
and the syscall picture on gentoo-x86.git:
1.106615 101942 lstat 1.106615 101942 lstat
0.667235 47083 getdents 0.641604 47114 open
0.641604 47114 open 0.667235 47083 getdents
0.286711 23573 close 0.286711 23573 close
0.005842 350 brk 0.005842 350 brk
We can see that shortcuting untracked code gives bigger gain than
index refresh code. So I have to agree that .gitignore may be the big
elephant in this particular case.
Bear in mind though this is Linux, where lstat is fast. On systems
with slow lstat, these timings could look very different due to the
large number of lstat calls compared to open+getdents. I really like
to see similar numbers on Windows.
read_directory/fill_directory code is mostly used by "git add" (not
with -u) and "git status", while refresh code is executed in add,
checkout, commit/status, diff, merge. So while smaller gain, reducing
lstat calls could benefit in more cases.
A relatively slow "git add" is acceptable. "git status" should be
fast. Although in my workflow, I do "git diff [--stat] [--cached]"
much more often than "git status" so relatively slow "git status" does
not hurt me much. But people may do it differently.
On speeding up read_directory with inotify support. I haven't thought
it through, but I think we could save (or get it via socket) a list of
untracked files in .git, regardless ignore status, with the help from
inotify. When this list is verified valid, read_directory could be
modified to traverse the tree using this list (plus the index) instead
of opendir+readdir. Not sure how the change might look though.
[1] http://git-exp.overlays.gentoo.org/gitweb/?p=exp/gentoo-x86.git;a=summary
(*) the script to produce those numbers is
-- 8< --
#!/bin/sh
export LANG=C
strace -T "$@" 2>&1 >/dev/null |
sed 's/\(^[^(]*\)(.*<\([0-9.]*\)>$/\1 \2/' |
awk '{
sec[$1]+=$2;
count[$1]++;
}
END {
for (i in sec)
printf("%f %d %s\n", sec[i], count[i], i);
}' >/tmp/s
sort -nr /tmp/s | head -n5
sort -nrk2 /tmp/s | head -n5
-- 8< --
next prev parent reply other threads:[~2013-02-10 11:17 UTC|newest]
Thread overview: 88+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-02-08 21:10 inotify to minimize stat() calls Ramkumar Ramachandra
2013-02-08 22:15 ` Junio C Hamano
2013-02-08 22:45 ` Junio C Hamano
2013-02-09 2:10 ` Duy Nguyen
2013-02-09 2:37 ` Junio C Hamano
2013-02-09 2:56 ` Junio C Hamano
2013-02-09 3:36 ` Robert Zeh
2013-02-09 12:05 ` Ramkumar Ramachandra
2013-02-09 12:11 ` Ramkumar Ramachandra
2013-02-09 12:53 ` Ramkumar Ramachandra
2013-02-09 12:59 ` Duy Nguyen
2013-02-09 17:10 ` Ramkumar Ramachandra
2013-02-09 18:56 ` Ramkumar Ramachandra
2013-02-10 5:24 ` Duy Nguyen
2013-02-10 11:17 ` Duy Nguyen [this message]
2013-02-10 11:22 ` Duy Nguyen
2013-02-10 20:16 ` Junio C Hamano
2013-02-11 2:56 ` Duy Nguyen
2013-02-11 11:12 ` Duy Nguyen
2013-03-07 22:16 ` Torsten Bögershausen
2013-03-08 0:04 ` Junio C Hamano
2013-03-08 7:01 ` Torsten Bögershausen
2013-03-08 8:15 ` Junio C Hamano
2013-03-08 9:24 ` Torsten Bögershausen
2013-03-08 10:53 ` Duy Nguyen
2013-03-10 8:23 ` Ramkumar Ramachandra
2013-03-13 12:59 ` [PATCH] status: hint the user about -uno if read_directory takes too long Nguyễn Thái Ngọc Duy
2013-03-13 15:21 ` Torsten Bögershausen
2013-03-13 16:16 ` Junio C Hamano
2013-03-14 10:22 ` Duy Nguyen
2013-03-14 15:05 ` Junio C Hamano
2013-03-15 12:30 ` Duy Nguyen
2013-03-15 15:52 ` Torsten Bögershausen
2013-03-15 15:57 ` Ramkumar Ramachandra
2013-03-15 16:53 ` Junio C Hamano
2013-03-15 17:41 ` Torsten Bögershausen
2013-03-15 20:06 ` Junio C Hamano
2013-03-15 21:14 ` Torsten Bögershausen
2013-03-15 21:59 ` Junio C Hamano
2013-03-16 7:21 ` Torsten Bögershausen
2013-03-17 4:47 ` Junio C Hamano
2013-03-16 1:51 ` Duy Nguyen
2013-02-10 13:26 ` inotify to minimize stat() calls demerphq
2013-02-10 15:35 ` Duy Nguyen
2013-02-14 14:36 ` Magnus Bäck
2013-02-10 16:45 ` Ramkumar Ramachandra
2013-02-11 3:03 ` Duy Nguyen
2013-02-10 16:58 ` Erik Faye-Lund
2013-02-11 3:53 ` Duy Nguyen
2013-02-12 20:48 ` Karsten Blees
2013-02-13 10:06 ` Duy Nguyen
2013-02-13 12:15 ` Duy Nguyen
2013-02-13 18:18 ` Jeff King
2013-02-13 19:47 ` Jeff King
2013-02-13 20:25 ` Karsten Blees
2013-02-13 22:55 ` Jeff King
2013-02-14 0:48 ` Karsten Blees
2013-02-27 14:45 ` [PATCH] name-hash.c: fix endless loop with core.ignorecase=true Karsten Blees
2013-02-27 16:53 ` Junio C Hamano
2013-02-27 21:52 ` Karsten Blees
2013-02-27 23:57 ` [PATCH v2] " Karsten Blees
2013-02-28 0:27 ` Junio C Hamano
2013-02-19 9:49 ` inotify to minimize stat() calls Ramkumar Ramachandra
2013-02-19 14:25 ` Karsten Blees
2013-02-19 13:16 ` Drew Northup
2013-02-19 13:47 ` Duy Nguyen
2013-02-09 19:35 ` Junio C Hamano
2013-02-10 19:03 ` Robert Zeh
2013-02-10 19:26 ` Martin Fick
2013-02-10 20:18 ` Robert Zeh
2013-02-11 3:21 ` Duy Nguyen
2013-02-11 14:13 ` Robert Zeh
2013-02-19 9:57 ` Ramkumar Ramachandra
2013-04-24 17:20 ` [PATCH] " Robert Zeh
2013-04-24 21:32 ` Duy Nguyen
2013-04-25 19:44 ` Robert Zeh
2013-04-25 21:20 ` Duy Nguyen
2013-04-26 15:35 ` Robert Zeh
2013-04-25 8:18 ` Thomas Rast
2013-04-25 19:37 ` Robert Zeh
2013-04-25 19:59 ` Thomas Rast
2013-04-27 13:51 ` Thomas Rast
2013-04-27 23:56 ` Duy Nguyen
[not found] ` <CAKXa9=r2A7UeBV2s2H3wVGdPkS1zZ9huNJhtvTC-p0S5Ed12xA@mail.gmail.com>
2013-04-30 0:27 ` Duy Nguyen
2013-02-09 11:32 ` Ramkumar Ramachandra
2013-02-14 15:16 ` Ævar Arnfjörð Bjarmason
2013-02-14 16:31 ` Junio C Hamano
2013-02-19 9:40 ` Ramkumar Ramachandra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130210111732.GA24377@lanh \
--to=pclouds@gmail.com \
--cc=artagnon@gmail.com \
--cc=finnag@pvv.org \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=robert.allan.zeh@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).