All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Torsten Bögershausen" <tboegi@web.de>
To: Duy Nguyen <pclouds@gmail.com>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Ramkumar Ramachandra" <artagnon@gmail.com>,
	"Robert Zeh" <robert.allan.zeh@gmail.com>,
	"Git List" <git@vger.kernel.org>,
	finnag@pvv.org, "Torsten Bögershausen" <tboegi@web.de>
Subject: Re: inotify to minimize stat() calls
Date: Thu, 07 Mar 2013 23:16:19 +0100	[thread overview]
Message-ID: <513911B3.7010903@web.de> (raw)
In-Reply-To: <CACsJy8DnvAjQPL4aP_LRC7aqx6OC4M5dMtj-OUot76qET2z08Q@mail.gmail.com>

On 11.02.13 03:56, Duy Nguyen wrote:
> On Mon, Feb 11, 2013 at 3:16 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> The other "lstat()" experiment was a very interesting one, but this
>> is not yet an interesting experiment to see where in the "ignore"
>> codepath we are spending times.
>>
>> We know that we can tell wt_status_collect_untracked() not to bother
>> with the untracked or ignored files with !s->show_untracked_files
>> already, but I think the more interesting question is if we can show
>> the untracked files with less overhead.
>>
>> If we want to show untrackedd files, it is a given that we need to
>> read directories to see what paths there are on the filesystem. Is
>> the opendir/readdir cost dominating in the process? Are we spending
>> a lot of time sifting the result of opendir/readdir via the ignore
>> mechanism? Is reading the "ignore" files costing us much to prime
>> the ignore mechanism?
>>
>> If readdir cost is dominant, then that makes "cache gitignore" a
>> nonsense proposition, I think.  If you really want to "cache"
>> something, you need to have somebody (i.e. a daemon) who constantly
>> keeps an eye on the filesystem changes and can respond with the up
>> to date result directly to fill_directory().  I somehow doubt that
>> it is a direction we would want to go in, though.
> 
> Yeah, it did not cut out syscall cost, I also cut a lot of user-space
> processing (plus .gitignore content access). From the timings I posted
> earlier,
> 
>>         unmodified  dir.c
>> real    0m0.550s    0m0.287s
>> user    0m0.305s    0m0.201s
>> sys     0m0.240s    0m0.084s
> 
> sys time is reduced from 0.24s to 0.08s, so readdir+opendir definitely
> has something to do with it (and perhaps reading .gitignore). But it
> also reduces user time from 0.305 to 0.201s. I don't think avoiding
> readdir+openddir will bring us this gain. It's probably the cost of
> matching .gitignore. I'll try to replace opendir+readdir with a
> no-syscall version. At this point "untracked caching" sounds more
> feasible (and less complex) than ".gitignore cachine".
> 
Thanks for Duy for the measurements, and patches.
I took the freedom to convert the patched dir.c into a 
"runtime configurable" git status option.
I'm not sure if the following copy-and-paste work applies,
(it is based on Git 1.8.1.3), but the time spend for 
"git status --changed-only" is basically half the time of
"git status", similar to what Duy has measured.
I did a test both on a Linux box and Mac OS.

And the speedup is so impressive, that I am tempted to submit a patch simlar
to the following, what do we think about it?
/Torsten




-- >8 --

[PATCH] git status: add option changed-only
git status may be run faster if
- we only check if files are changed which are already known to git.
- we don't check if there are untracked files.

"git status --changed-only" (or the short form "git status -c")

will only check for changed files which are already known to git,
and which are in the index.

The call to read_directory_recursive() is skipped and untracked files
in the working tree are not reported.

Inspired-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
 builtin/commit.c | 2 ++
 dir.c            | 5 +++--
 dir.h            | 3 ++-
 wt-status.c      | 3 +++
 wt-status.h      | 1 +
 5 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index d6dd3df..6a5ba11 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1158,6 +1158,8 @@ int cmd_status(int argc, const char **argv, const char *prefix)
 	unsigned char sha1[20];
 	static struct option builtin_status_options[] = {
 		OPT__VERBOSE(&verbose, N_("be verbose")),
+		OPT_BOOLEAN('c', "changed-only", &s.check_changed_only,
+			    N_("Ignore untracked files. Check if files known to git are modified")),
 		OPT_SET_INT('s', "short", &status_format,
 			    N_("show status concisely"), STATUS_FORMAT_SHORT),
 		OPT_BOOLEAN('b', "branch", &s.show_branch,
diff --git a/dir.c b/dir.c
index a473ca2..555b652 100644
--- a/dir.c
+++ b/dir.c
@@ -1274,8 +1274,9 @@ int read_directory(struct dir_struct *dir, const char *path, int len, const char
 		return dir->nr;
 
 	simplify = create_simplify(pathspec);
-	if (!len || treat_leading_path(dir, path, len, simplify))
-		read_directory_recursive(dir, path, len, 0, simplify);
+	if ((!(dir->flags & DIR_CHECK_CHANGED_ONLY)) &&
+			(!len || treat_leading_path(dir, path, len, simplify))) o
+			read_directory_recursive(dir, path, len, 0, simplify);
 	free_simplify(simplify);
 	qsort(dir->entries, dir->nr, sizeof(struct dir_entry *), cmp_name);
 	qsort(dir->ignored, dir->ignored_nr, sizeof(struct dir_entry *), cmp_name);
diff --git a/dir.h b/dir.h
index f5c89e3..1a915a7 100644
--- a/dir.h
+++ b/dir.h
@@ -41,7 +41,8 @@ struct dir_struct {
 		DIR_SHOW_OTHER_DIRECTORIES = 1<<1,
 		DIR_HIDE_EMPTY_DIRECTORIES = 1<<2,
 		DIR_NO_GITLINKS = 1<<3,
-		DIR_COLLECT_IGNORED = 1<<4
+		DIR_COLLECT_IGNORED = 1<<4,
+		DIR_CHECK_CHANGED_ONLY = 1<<5
 	} flags;
 	struct dir_entry **entries;
 	struct dir_entry **ignored;
diff --git a/wt-status.c b/wt-status.c
index d7cfe8f..b315785 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -503,6 +503,9 @@ static void wt_status_collect_untracked(struct wt_status *s)
 	if (s->show_untracked_files != SHOW_ALL_UNTRACKED_FILES)
 		dir.flags |=
 			DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+	if (s->check_changed_only)
+		dir.flags |= DIR_CHECK_CHANGED_ONLY;
+
 	setup_standard_excludes(&dir);
 
 	fill_directory(&dir, s->pathspec);
diff --git a/wt-status.h b/wt-status.h
index 236b41f..7eb0115 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -47,6 +47,7 @@ struct wt_status {
 	const char **pathspec;
 	int verbose;
 	int amend;
+	int check_changed_only;
 	enum commit_whence whence;
 	int nowarn;
 	int use_color;
-- 
1.8.2.rc2

  parent reply	other threads:[~2013-03-07 22:17 UTC|newest]

Thread overview: 88+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-02-08 21:10 inotify to minimize stat() calls Ramkumar Ramachandra
2013-02-08 22:15 ` Junio C Hamano
2013-02-08 22:45   ` Junio C Hamano
2013-02-09  2:10     ` Duy Nguyen
2013-02-09  2:37       ` Junio C Hamano
2013-02-09  2:56     ` Junio C Hamano
2013-02-09  3:36       ` Robert Zeh
2013-02-09 12:05         ` Ramkumar Ramachandra
2013-02-09 12:11           ` Ramkumar Ramachandra
2013-02-09 12:53           ` Ramkumar Ramachandra
2013-02-09 12:59             ` Duy Nguyen
2013-02-09 17:10               ` Ramkumar Ramachandra
2013-02-09 18:56                 ` Ramkumar Ramachandra
2013-02-10  5:24                 ` Duy Nguyen
2013-02-10 11:17                   ` Duy Nguyen
2013-02-10 11:22                     ` Duy Nguyen
2013-02-10 20:16                       ` Junio C Hamano
2013-02-11  2:56                         ` Duy Nguyen
2013-02-11 11:12                           ` Duy Nguyen
2013-03-07 22:16                           ` Torsten Bögershausen [this message]
2013-03-08  0:04                             ` Junio C Hamano
2013-03-08  7:01                               ` Torsten Bögershausen
2013-03-08  8:15                                 ` Junio C Hamano
2013-03-08  9:24                                   ` Torsten Bögershausen
2013-03-08 10:53                                   ` Duy Nguyen
2013-03-10  8:23                                     ` Ramkumar Ramachandra
2013-03-13 12:59                                     ` [PATCH] status: hint the user about -uno if read_directory takes too long Nguyễn Thái Ngọc Duy
2013-03-13 15:21                                       ` Torsten Bögershausen
2013-03-13 16:16                                       ` Junio C Hamano
2013-03-14 10:22                                         ` Duy Nguyen
2013-03-14 15:05                                           ` Junio C Hamano
2013-03-15 12:30                                             ` Duy Nguyen
2013-03-15 15:52                                               ` Torsten Bögershausen
2013-03-15 15:57                                                 ` Ramkumar Ramachandra
2013-03-15 16:53                                                 ` Junio C Hamano
2013-03-15 17:41                                                   ` Torsten Bögershausen
2013-03-15 20:06                                                     ` Junio C Hamano
2013-03-15 21:14                                                       ` Torsten Bögershausen
2013-03-15 21:59                                                         ` Junio C Hamano
2013-03-16  7:21                                                           ` Torsten Bögershausen
2013-03-17  4:47                                                             ` Junio C Hamano
2013-03-16  1:51                                           ` Duy Nguyen
2013-02-10 13:26                     ` inotify to minimize stat() calls demerphq
2013-02-10 15:35                       ` Duy Nguyen
2013-02-14 14:36                       ` Magnus Bäck
2013-02-10 16:45                     ` Ramkumar Ramachandra
2013-02-11  3:03                       ` Duy Nguyen
2013-02-10 16:58                     ` Erik Faye-Lund
2013-02-11  3:53                       ` Duy Nguyen
2013-02-12 20:48                         ` Karsten Blees
2013-02-13 10:06                           ` Duy Nguyen
2013-02-13 12:15                           ` Duy Nguyen
2013-02-13 18:18                             ` Jeff King
2013-02-13 19:47                               ` Jeff King
2013-02-13 20:25                               ` Karsten Blees
2013-02-13 22:55                                 ` Jeff King
2013-02-14  0:48                                   ` Karsten Blees
2013-02-27 14:45                                     ` [PATCH] name-hash.c: fix endless loop with core.ignorecase=true Karsten Blees
2013-02-27 16:53                                       ` Junio C Hamano
2013-02-27 21:52                                         ` Karsten Blees
2013-02-27 23:57                                           ` [PATCH v2] " Karsten Blees
2013-02-28  0:27                                             ` Junio C Hamano
2013-02-19  9:49                           ` inotify to minimize stat() calls Ramkumar Ramachandra
2013-02-19 14:25                             ` Karsten Blees
2013-02-19 13:16                   ` Drew Northup
2013-02-19 13:47                     ` Duy Nguyen
2013-02-09 19:35           ` Junio C Hamano
2013-02-10 19:03             ` Robert Zeh
2013-02-10 19:26               ` Martin Fick
2013-02-10 20:18                 ` Robert Zeh
2013-02-11  3:21               ` Duy Nguyen
2013-02-11 14:13                 ` Robert Zeh
2013-02-19  9:57                   ` Ramkumar Ramachandra
2013-04-24 17:20               ` [PATCH] " Robert Zeh
2013-04-24 21:32                 ` Duy Nguyen
2013-04-25 19:44                   ` Robert Zeh
2013-04-25 21:20                     ` Duy Nguyen
2013-04-26 15:35                       ` Robert Zeh
2013-04-25  8:18                 ` Thomas Rast
2013-04-25 19:37                   ` Robert Zeh
2013-04-25 19:59                     ` Thomas Rast
2013-04-27 13:51                       ` Thomas Rast
2013-04-27 23:56                 ` Duy Nguyen
     [not found]                   ` <CAKXa9=r2A7UeBV2s2H3wVGdPkS1zZ9huNJhtvTC-p0S5Ed12xA@mail.gmail.com>
2013-04-30  0:27                     ` Duy Nguyen
2013-02-09 11:32       ` Ramkumar Ramachandra
2013-02-14 15:16 ` Ævar Arnfjörð Bjarmason
2013-02-14 16:31   ` Junio C Hamano
2013-02-19  9:40   ` Ramkumar Ramachandra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=513911B3.7010903@web.de \
    --to=tboegi@web.de \
    --cc=artagnon@gmail.com \
    --cc=finnag@pvv.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    --cc=robert.allan.zeh@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.