From: Karsten Blees <karsten.blees@gmail.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: kusmabite@gmail.com, Ramkumar Ramachandra <artagnon@gmail.com>,
Robert Zeh <robert.allan.zeh@gmail.com>,
Junio C Hamano <gitster@pobox.com>,
Git List <git@vger.kernel.org>,
finnag@pvv.org
Subject: Re: inotify to minimize stat() calls
Date: Tue, 12 Feb 2013 21:48:18 +0100 [thread overview]
Message-ID: <511AAA92.4030508@gmail.com> (raw)
In-Reply-To: <CACsJy8AWyJ=dW5f44huWyPPe4X62xyi+R9CNM5Tg6u6TYf+thQ@mail.gmail.com>
Am 11.02.2013 04:53, schrieb Duy Nguyen:
> On Sun, Feb 10, 2013 at 11:58 PM, Erik Faye-Lund <kusmabite@gmail.com> wrote:
>> Karsten Blees has done something similar-ish on Windows, and he posted
>> the results here:
>>
>> https://groups.google.com/forum/#!topic/msysgit/fL_jykUmUNE/discussion
>>
The new hashtable implementation in fscache [1] supports O(1) removal and has no mingw dependencies - might come in handy for anyone trying to implement an inotify daemon.
[1] https://github.com/kblees/git/commit/f7eb85c2
>> I also seem to remember he doing a ReadDirectoryChangesW version, but
>> I don't remember what happened with that.
>
> Thanks. I came across that but did not remember. For one thing, we
> know the inotify alternative for Windows: ReadDirectoryChangesW.
>
I dropped ReadDirectoryChangesW because maintaining a 'live' file system cache became more and more complicated. For example, according to MSDN docs, ReadDirectoryChangesW *may* report short DOS 8.3 names (i.e. "PROGRA~1" instead of "Program Files"), so a correct and fast cache implementation would have to be indexed by long *and* short names...
Another problem was that the 'live' cache had quite negative performance impact on mutating git commands (checkout, reset...). An inotify daemon running as a background process (not in-process as fscache) will probably affect everyone that modifies the working copy, e.g. running 'make' or the test-suite. This should be considered in the design.
> I copy "git status"'s (impressive) numbers from fscache-v0 for those
> who are interested in:
>
> preload | -u | normal | cached | gain
> --------+-----+--------+--------+------
> false | all | 25.144 | 3.055 | 8.2
> false | no | 22.822 | 1.748 | 12.8
> true | all | 9.234 | 2.179 | 4.2
> true | no | 6.833 | 0.955 | 7.2
>
Note that I wasn't able to reproduce such bad 'normal' values in later tests, I guess disk fragmentation and/or virus scanner must have tricked me on that day...gain factors of 2.5 - 5 are more appropriate.
However, the difference between git status -uall and -uno was always about 1.3 s in all fscache versions, even though opendir/readdir/closedir was served entirely from the cache. I added a bit of performance tracing to find the cause, and I think most of the time spent in wt_status_collect_untracked can be eliminated:
1.) 0.939 s is spent in dir.c/excluded (i.e. checking .gitignore). This check is done for *every* file in the working copy, including files in the index. Checking the index first could eliminate most of that, i.e.:
(Note: patches are for discussion only, I'm aware that they may have unintended side effects...)
@@ -1097,6 +1097,8 @@ static enum path_treatment treat_path(struct dir_struct *dir,
return path_ignored;
strbuf_setlen(path, baselen);
strbuf_addstr(path, de->d_name);
+ if (cache_name_exists(path->buf, path->len, ignore_case))
+ return path_ignored;
if (simplify_away(path->buf, path->len, simplify))
return path_ignored;
---
2.) 0.135 s is spent in name-hash.c/hash_index_entry_directories, reindexing the same directories over and over again. In the end, the hashtable contains 939k directory entries, even though the WebKit test repo only has 7k directories. Checking if a directory entry already exists could reduce that, i.e.:
@@ -53,14 +55,23 @@ static void hash_index_entry_directories(struct index_state *istate, struct cach
unsigned int hash;
void **pos;
double t = ticks();
+ struct cache_entry *ce2;
+ int len = ce_namelen(ce);
- const char *ptr = ce->name;
- while (*ptr) {
- while (*ptr && *ptr != '/')
- ++ptr;
- if (*ptr == '/') {
- ++ptr;
- hash = hash_name(ce->name, ptr - ce->name);
+ while (len > 0) {
+ while (len > 0 && ce->name[len - 1] != '/')
+ len--;
+ if (len > 0) {
+ hash = hash_name(ce->name, len);
+ ce2 = lookup_hash(hash, &istate->name_hash);
+ while (ce2) {
+ if (same_name(ce2, ce->name, len, ignore_case)) {
+ add_since(t, &hash_dirs);
+ return;
+ }
+ ce2 = ce2->dir_next;
+ }
+ len--;
pos = insert_hash(hash, ce, &istate->name_hash);
if (pos) {
ce->dir_next = *pos;
---
Tests were done with the WebKit repo (~200k files, ~7k directories, 15 .gitignore files, ~100 entries in root .gitignore). Instrumented code can be found here: https://github.com/kblees/git/tree/kb/git-status-performance-tracing
Here's the performance traces of 'git status -s -uall'
Before patches:
trace: at builtin/commit.c:1221, time: 0.523429 s: cmd_status/read_cache_preload
trace: at builtin/commit.c:1223, time: 0.00403477 s: cmd_status/refresh_index
trace: at builtin/commit.c:1231, time: 0.00318494 s: cmd_status/hold_locked_index
trace: at wt-status.c:539, time: 0.00527396 s: wt_status_collect_changes_worktree
trace: at wt-status.c:544, time: 0.00545771 s: wt_status_collect_changes
trace: at wt-status.c:546, time: 1.286 s: wt_status_collect_untracked
trace: at builtin/commit.c:1233, time: 1.29852 s: cmd_status/wt_status_collect
trace: at dir.c:1540, time: 0.00170986 s: read_directory_recursive/strbuf_add
trace: at dir.c:1541, time: 0.00623972 s: read_directory_recursive/opendir
trace: at dir.c:1542, time: 0.00517881 s: read_directory_recursive/readdir
trace: at dir.c:1543, time: 0.992936 s: read_directory_recursive/treat_path
trace: at dir.c:1544, time: 0.277942 s: read_directory_recursive/dir_add_name
trace: at dir.c:1545, time: 0.0014594 s: read_directory_recursive/close
trace: at dir.c:1546, time: 0.939349 s: treat_one_path/excluded
trace: at dir.c:1547, time: 0.0050811 s: treat_one_path/dir_add_ignored
trace: at dir.c:1548, time: 0.00515875 s: treat_one_path/get_dtype
trace: at dir.c:1549, time: 0.00329322 s: treat_one_path/treat_directory
trace: at dir.c:1550, time: 0.222969 s: excluded/prep_exclude
trace: at dir.c:1551, time: 0.00443398 s: excluded/excluded_from_list[EXC_CMDL]
trace: at dir.c:1552, time: 0.699602 s: excluded/excluded_from_list[EXC_DIRS]
trace: at dir.c:1553, time: 0.00475736 s: excluded/excluded_from_list[EXC_FILE]
trace: at read-cache.c:460, time: 0.00967987 s: index_name_pos
trace: at name-hash.c:213, time: 0.190481 s: lazy_init_name_hash
trace: at name-hash.c:216, time: 0.135248 s: hash_index_entry_directories (938865 entries)
trace: at name-hash.c:217, time: 0.0806647 s: index_name_exists
trace: at compat/mingw.c:2137, time: 1.97424 s: command: c:\git\msysgit\git\git-status.exe -s -uall
After patches:
trace: at builtin/commit.c:1221, time: 0.517511 s: cmd_status/read_cache_preload
trace: at builtin/commit.c:1223, time: 0.00405227 s: cmd_status/refresh_index
trace: at builtin/commit.c:1231, time: 0.00322796 s: cmd_status/hold_locked_index
trace: at wt-status.c:539, time: 0.00530057 s: wt_status_collect_changes_worktree
trace: at wt-status.c:544, time: 0.00546062 s: wt_status_collect_changes
trace: at wt-status.c:546, time: 0.322799 s: wt_status_collect_untracked
trace: at builtin/commit.c:1233, time: 0.33536 s: cmd_status/wt_status_collect
trace: at dir.c:1542, time: 0.00120529 s: read_directory_recursive/strbuf_add
trace: at dir.c:1543, time: 0.00476647 s: read_directory_recursive/opendir
trace: at dir.c:1544, time: 0.00502022 s: read_directory_recursive/readdir
trace: at dir.c:1545, time: 0.310515 s: read_directory_recursive/treat_path
trace: at dir.c:1546, time: 0 s: read_directory_recursive/dir_add_name
trace: at dir.c:1547, time: 0.000831234 s: read_directory_recursive/close
trace: at dir.c:1548, time: 0.0668582 s: treat_one_path/excluded
trace: at dir.c:1549, time: 0.000173174 s: treat_one_path/dir_add_ignored
trace: at dir.c:1550, time: 0.000174267 s: treat_one_path/get_dtype
trace: at dir.c:1551, time: 0.00315468 s: treat_one_path/treat_directory
trace: at dir.c:1552, time: 0.039733 s: excluded/prep_exclude
trace: at dir.c:1553, time: 0.000185205 s: excluded/excluded_from_list[EXC_CMDL]
trace: at dir.c:1554, time: 0.0264496 s: excluded/excluded_from_list[EXC_DIRS]
trace: at dir.c:1555, time: 0.000170622 s: excluded/excluded_from_list[EXC_FILE]
trace: at read-cache.c:460, time: 0.00260636 s: index_name_pos
trace: at name-hash.c:224, time: 0.126637 s: lazy_init_name_hash
trace: at name-hash.c:227, time: 0.0500866 s: hash_index_entry_directories (7152 entries)
trace: at name-hash.c:228, time: 0.0790143 s: index_name_exists
trace: at compat/mingw.c:2137, time: 1.00595 s: command: c:\git\msysgit\git\git-status.exe -s -uall
next prev parent reply other threads:[~2013-02-12 20:48 UTC|newest]
Thread overview: 88+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-02-08 21:10 inotify to minimize stat() calls Ramkumar Ramachandra
2013-02-08 22:15 ` Junio C Hamano
2013-02-08 22:45 ` Junio C Hamano
2013-02-09 2:10 ` Duy Nguyen
2013-02-09 2:37 ` Junio C Hamano
2013-02-09 2:56 ` Junio C Hamano
2013-02-09 3:36 ` Robert Zeh
2013-02-09 12:05 ` Ramkumar Ramachandra
2013-02-09 12:11 ` Ramkumar Ramachandra
2013-02-09 12:53 ` Ramkumar Ramachandra
2013-02-09 12:59 ` Duy Nguyen
2013-02-09 17:10 ` Ramkumar Ramachandra
2013-02-09 18:56 ` Ramkumar Ramachandra
2013-02-10 5:24 ` Duy Nguyen
2013-02-10 11:17 ` Duy Nguyen
2013-02-10 11:22 ` Duy Nguyen
2013-02-10 20:16 ` Junio C Hamano
2013-02-11 2:56 ` Duy Nguyen
2013-02-11 11:12 ` Duy Nguyen
2013-03-07 22:16 ` Torsten Bögershausen
2013-03-08 0:04 ` Junio C Hamano
2013-03-08 7:01 ` Torsten Bögershausen
2013-03-08 8:15 ` Junio C Hamano
2013-03-08 9:24 ` Torsten Bögershausen
2013-03-08 10:53 ` Duy Nguyen
2013-03-10 8:23 ` Ramkumar Ramachandra
2013-03-13 12:59 ` [PATCH] status: hint the user about -uno if read_directory takes too long Nguyễn Thái Ngọc Duy
2013-03-13 15:21 ` Torsten Bögershausen
2013-03-13 16:16 ` Junio C Hamano
2013-03-14 10:22 ` Duy Nguyen
2013-03-14 15:05 ` Junio C Hamano
2013-03-15 12:30 ` Duy Nguyen
2013-03-15 15:52 ` Torsten Bögershausen
2013-03-15 15:57 ` Ramkumar Ramachandra
2013-03-15 16:53 ` Junio C Hamano
2013-03-15 17:41 ` Torsten Bögershausen
2013-03-15 20:06 ` Junio C Hamano
2013-03-15 21:14 ` Torsten Bögershausen
2013-03-15 21:59 ` Junio C Hamano
2013-03-16 7:21 ` Torsten Bögershausen
2013-03-17 4:47 ` Junio C Hamano
2013-03-16 1:51 ` Duy Nguyen
2013-02-10 13:26 ` inotify to minimize stat() calls demerphq
2013-02-10 15:35 ` Duy Nguyen
2013-02-14 14:36 ` Magnus Bäck
2013-02-10 16:45 ` Ramkumar Ramachandra
2013-02-11 3:03 ` Duy Nguyen
2013-02-10 16:58 ` Erik Faye-Lund
2013-02-11 3:53 ` Duy Nguyen
2013-02-12 20:48 ` Karsten Blees [this message]
2013-02-13 10:06 ` Duy Nguyen
2013-02-13 12:15 ` Duy Nguyen
2013-02-13 18:18 ` Jeff King
2013-02-13 19:47 ` Jeff King
2013-02-13 20:25 ` Karsten Blees
2013-02-13 22:55 ` Jeff King
2013-02-14 0:48 ` Karsten Blees
2013-02-27 14:45 ` [PATCH] name-hash.c: fix endless loop with core.ignorecase=true Karsten Blees
2013-02-27 16:53 ` Junio C Hamano
2013-02-27 21:52 ` Karsten Blees
2013-02-27 23:57 ` [PATCH v2] " Karsten Blees
2013-02-28 0:27 ` Junio C Hamano
2013-02-19 9:49 ` inotify to minimize stat() calls Ramkumar Ramachandra
2013-02-19 14:25 ` Karsten Blees
2013-02-19 13:16 ` Drew Northup
2013-02-19 13:47 ` Duy Nguyen
2013-02-09 19:35 ` Junio C Hamano
2013-02-10 19:03 ` Robert Zeh
2013-02-10 19:26 ` Martin Fick
2013-02-10 20:18 ` Robert Zeh
2013-02-11 3:21 ` Duy Nguyen
2013-02-11 14:13 ` Robert Zeh
2013-02-19 9:57 ` Ramkumar Ramachandra
2013-04-24 17:20 ` [PATCH] " Robert Zeh
2013-04-24 21:32 ` Duy Nguyen
2013-04-25 19:44 ` Robert Zeh
2013-04-25 21:20 ` Duy Nguyen
2013-04-26 15:35 ` Robert Zeh
2013-04-25 8:18 ` Thomas Rast
2013-04-25 19:37 ` Robert Zeh
2013-04-25 19:59 ` Thomas Rast
2013-04-27 13:51 ` Thomas Rast
2013-04-27 23:56 ` Duy Nguyen
[not found] ` <CAKXa9=r2A7UeBV2s2H3wVGdPkS1zZ9huNJhtvTC-p0S5Ed12xA@mail.gmail.com>
2013-04-30 0:27 ` Duy Nguyen
2013-02-09 11:32 ` Ramkumar Ramachandra
2013-02-14 15:16 ` Ævar Arnfjörð Bjarmason
2013-02-14 16:31 ` Junio C Hamano
2013-02-19 9:40 ` Ramkumar Ramachandra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=511AAA92.4030508@gmail.com \
--to=karsten.blees@gmail.com \
--cc=artagnon@gmail.com \
--cc=blees@dcon.de \
--cc=finnag@pvv.org \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=kusmabite@gmail.com \
--cc=pclouds@gmail.com \
--cc=robert.allan.zeh@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.