From: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
To: git@vger.kernel.org
Cc: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: [PATCH v2 18/22] update-index: manually enable or disable untracked cache
Date: Sat,  8 Nov 2014 16:39:51 +0700
Message-ID: <1415439595-469-19-git-send-email-pclouds@gmail.com>
In-Reply-To: <1415439595-469-1-git-send-email-pclouds@gmail.com>

Overall time saving on "git status" is about 40% in the best case
scenario, removing wt_status_collect_untracked() as the most time
consuming function. Reading and refreshing the index are now the most
expensive steps (their cost should drop when index-helper and/or
watchman support is added). More numbers and analysis below.

webkit.git
==========

169k files. 6k dirs. Lots of test data (i.e. not touched most of the
time)

Base status
-----------

Index version 4 in split index mode and cache-tree populated. No
untracked cache. It shows how time is consumed by "git status". The
same settings are used for other repos below.

18:28:10.199679 builtin/commit.c:1394   performance: 0.000000451 s: cmd_status:setup
18:28:10.474847 read-cache.c:1407       performance: 0.274873831 s: read_index
18:28:10.475295 read-cache.c:1407       performance: 0.000000656 s: read_index
18:28:10.728443 preload-index.c:131     performance: 0.253147487 s: read_index_preload
18:28:10.741422 read-cache.c:1254       performance: 0.012868340 s: refresh_index
18:28:10.752300 wt-status.c:623         performance: 0.010421357 s: wt_status_collect_changes_worktree
18:28:10.762069 wt-status.c:629         performance: 0.009644748 s: wt_status_collect_changes_index
18:28:11.601019 wt-status.c:632         performance: 0.838859547 s: wt_status_collect_untracked
18:28:11.605939 builtin/commit.c:1421   performance: 0.004835004 s: cmd_status:update_index
18:28:11.606580 trace.c:415             performance: 1.407878388 s: git command: 'git' 'status'

Populating status
-----------------

This is the first run after enabling the untracked cache, while the
cache is still empty. We see a slight increase in
wt_status_collect_untracked() and cmd_status:update_index (because the
new cache has to be written to $GIT_DIR/index).

18:28:18.915213 builtin/commit.c:1394   performance: 0.000000326 s: cmd_status:setup
18:28:19.197364 read-cache.c:1407       performance: 0.281901416 s: read_index
18:28:19.197754 read-cache.c:1407       performance: 0.000000546 s: read_index
18:28:19.451355 preload-index.c:131     performance: 0.253599607 s: read_index_preload
18:28:19.464400 read-cache.c:1254       performance: 0.012935336 s: refresh_index
18:28:19.475115 wt-status.c:623         performance: 0.010236920 s: wt_status_collect_changes_worktree
18:28:19.486022 wt-status.c:629         performance: 0.010801685 s: wt_status_collect_changes_index
18:28:20.362660 wt-status.c:632         performance: 0.876551366 s: wt_status_collect_untracked
18:28:20.396199 builtin/commit.c:1421   performance: 0.033447969 s: cmd_status:update_index
18:28:20.396939 trace.c:415             performance: 1.482695902 s: git command: 'git' 'status'

Populated status
----------------

After the cache is populated, wt_status_collect_untracked() drops 82%
from 0.838s to 0.144s. Overall time drops 45%. Top offenders are now
read_index() and read_index_preload().

18:28:20.408605 builtin/commit.c:1394   performance: 0.000000457 s: cmd_status:setup
18:28:20.692864 read-cache.c:1407       performance: 0.283980458 s: read_index
18:28:20.693273 read-cache.c:1407       performance: 0.000000661 s: read_index
18:28:20.958814 preload-index.c:131     performance: 0.265540254 s: read_index_preload
18:28:20.972375 read-cache.c:1254       performance: 0.013437429 s: refresh_index
18:28:20.983959 wt-status.c:623         performance: 0.011146646 s: wt_status_collect_changes_worktree
18:28:20.993948 wt-status.c:629         performance: 0.009879094 s: wt_status_collect_changes_index
18:28:21.138125 wt-status.c:632         performance: 0.144084737 s: wt_status_collect_untracked
18:28:21.173678 builtin/commit.c:1421   performance: 0.035463949 s: cmd_status:update_index
18:28:21.174251 trace.c:415             performance: 0.766707355 s: git command: 'git' 'status'

gentoo-x86.git
==============

This repository is a strange one with a balanced, wide and shallow
worktree (about 100k files and 23k dirs) and no .gitignore in the
worktree. wt_status_collect_untracked() time drops 88%, total time
drops 56%.

Base status
-----------

18:20:40.828642 builtin/commit.c:1394   performance: 0.000000496 s: cmd_status:setup
18:20:41.027233 read-cache.c:1407       performance: 0.198130532 s: read_index
18:20:41.027670 read-cache.c:1407       performance: 0.000000581 s: read_index
18:20:41.171716 preload-index.c:131     performance: 0.144045594 s: read_index_preload
18:20:41.179171 read-cache.c:1254       performance: 0.007320424 s: refresh_index
18:20:41.185785 wt-status.c:623         performance: 0.006144638 s: wt_status_collect_changes_worktree
18:20:41.192701 wt-status.c:629         performance: 0.006780184 s: wt_status_collect_changes_index
18:20:41.991723 wt-status.c:632         performance: 0.798927029 s: wt_status_collect_untracked
18:20:41.994664 builtin/commit.c:1421   performance: 0.002852772 s: cmd_status:update_index
18:20:41.995458 trace.c:415             performance: 1.168427502 s: git command: 'git' 'status'

Populating status
-----------------

18:20:48.968848 builtin/commit.c:1394   performance: 0.000000380 s: cmd_status:setup
18:20:49.172918 read-cache.c:1407       performance: 0.203734214 s: read_index
18:20:49.173341 read-cache.c:1407       performance: 0.000000562 s: read_index
18:20:49.320013 preload-index.c:131     performance: 0.146671391 s: read_index_preload
18:20:49.328039 read-cache.c:1254       performance: 0.007921957 s: refresh_index
18:20:49.334680 wt-status.c:623         performance: 0.006172020 s: wt_status_collect_changes_worktree
18:20:49.342526 wt-status.c:629         performance: 0.007731746 s: wt_status_collect_changes_index
18:20:50.257510 wt-status.c:632         performance: 0.914864222 s: wt_status_collect_untracked
18:20:50.338371 builtin/commit.c:1421   performance: 0.080776477 s: cmd_status:update_index
18:20:50.338900 trace.c:415             performance: 1.371462446 s: git command: 'git' 'status'

Populated status
----------------

18:20:50.351160 builtin/commit.c:1394   performance: 0.000000571 s: cmd_status:setup
18:20:50.577358 read-cache.c:1407       performance: 0.225917338 s: read_index
18:20:50.577794 read-cache.c:1407       performance: 0.000000617 s: read_index
18:20:50.734140 preload-index.c:131     performance: 0.156345564 s: read_index_preload
18:20:50.745717 read-cache.c:1254       performance: 0.011463075 s: refresh_index
18:20:50.755176 wt-status.c:623         performance: 0.008877929 s: wt_status_collect_changes_worktree
18:20:50.763768 wt-status.c:629         performance: 0.008471633 s: wt_status_collect_changes_index
18:20:50.854885 wt-status.c:632         performance: 0.090988721 s: wt_status_collect_untracked
18:20:50.857765 builtin/commit.c:1421   performance: 0.002789097 s: cmd_status:update_index
18:20:50.858411 trace.c:415             performance: 0.508647673 s: git command: 'git' 'status'

linux-2.6
=========

Reference repo. Not too big. wt_status_collect_untracked() drops 84%.
Total time drops 42%.

Base status
-----------

18:34:09.870122 builtin/commit.c:1394   performance: 0.000000385 s: cmd_status:setup
18:34:09.943218 read-cache.c:1407       performance: 0.072871177 s: read_index
18:34:09.943614 read-cache.c:1407       performance: 0.000000491 s: read_index
18:34:10.004364 preload-index.c:131     performance: 0.060748102 s: read_index_preload
18:34:10.008190 read-cache.c:1254       performance: 0.003714285 s: refresh_index
18:34:10.012087 wt-status.c:623         performance: 0.002775446 s: wt_status_collect_changes_worktree
18:34:10.016054 wt-status.c:629         performance: 0.003862140 s: wt_status_collect_changes_index
18:34:10.214747 wt-status.c:632         performance: 0.198604837 s: wt_status_collect_untracked
18:34:10.216102 builtin/commit.c:1421   performance: 0.001244166 s: cmd_status:update_index
18:34:10.216817 trace.c:415             performance: 0.347670735 s: git command: 'git' 'status'

Populating status
-----------------

18:34:16.595102 builtin/commit.c:1394   performance: 0.000000456 s: cmd_status:setup
18:34:16.666600 read-cache.c:1407       performance: 0.070992413 s: read_index
18:34:16.667012 read-cache.c:1407       performance: 0.000000606 s: read_index
18:34:16.729375 preload-index.c:131     performance: 0.062362492 s: read_index_preload
18:34:16.732565 read-cache.c:1254       performance: 0.003075517 s: refresh_index
18:34:16.736148 wt-status.c:623         performance: 0.002422201 s: wt_status_collect_changes_worktree
18:34:16.739990 wt-status.c:629         performance: 0.003746618 s: wt_status_collect_changes_index
18:34:16.948505 wt-status.c:632         performance: 0.208426710 s: wt_status_collect_untracked
18:34:16.961744 builtin/commit.c:1421   performance: 0.013151887 s: cmd_status:update_index
18:34:16.962233 trace.c:415             performance: 0.368537535 s: git command: 'git' 'status'

Populated status
----------------

18:34:16.970026 builtin/commit.c:1394   performance: 0.000000631 s: cmd_status:setup
18:34:17.046235 read-cache.c:1407       performance: 0.075904673 s: read_index
18:34:17.046644 read-cache.c:1407       performance: 0.000000681 s: read_index
18:34:17.113564 preload-index.c:131     performance: 0.066920253 s: read_index_preload
18:34:17.117281 read-cache.c:1254       performance: 0.003604055 s: refresh_index
18:34:17.121115 wt-status.c:623         performance: 0.002508345 s: wt_status_collect_changes_worktree
18:34:17.125089 wt-status.c:629         performance: 0.003871636 s: wt_status_collect_changes_index
18:34:17.156089 wt-status.c:632         performance: 0.030895703 s: wt_status_collect_untracked
18:34:17.169861 builtin/commit.c:1421   performance: 0.013686404 s: cmd_status:update_index
18:34:17.170391 trace.c:415             performance: 0.201474531 s: git command: 'git' 'status'

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 Documentation/git-update-index.txt |  8 ++++++++
 builtin/update-index.c             | 16 ++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt
index dfc09d9..f9a35cd 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -172,6 +172,14 @@ may not support it yet.
 	the shared index file. This mode is designed for very large
 	indexes that take a signficant amount of time to read or write.
 
+--untracked-cache::
+--no-untracked-cache::
+	Enable or disable the untracked cache extension. This could
+	speed up commands that involve determining untracked files,
+	such as `git status`. The underlying operating system and file
+	system must change the `st_mtime` field of a directory if files
+	are added or deleted in that directory.
+
 \--::
 	Do not interpret any more arguments as options.
 
diff --git a/builtin/update-index.c b/builtin/update-index.c
index e8c7fd4..3d2dedd 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -740,6 +740,7 @@ static int reupdate_callback(struct parse_opt_ctx_t *ctx,
 int cmd_update_index(int argc, const char **argv, const char *prefix)
 {
 	int newfd, entries, has_errors = 0, line_termination = '\n';
+	int untracked_cache = -1;
 	int read_from_stdin = 0;
 	int prefix_length = prefix ? strlen(prefix) : 0;
 	int preferred_index_format = 0;
@@ -831,6 +832,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 			N_("write index in this format")),
 		OPT_BOOL(0, "split-index", &split_index,
 			N_("enable or disable split index")),
+		OPT_BOOL(0, "untracked-cache", &untracked_cache,
+			N_("enable/disable untracked cache")),
 		OPT_END()
 	};
 
@@ -937,6 +940,19 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		the_index.split_index = NULL;
 		the_index.cache_changed |= SOMETHING_CHANGED;
 	}
+	if (untracked_cache > 0 && !the_index.untracked) {
+		struct untracked_cache *uc;
+
+		uc = xcalloc(1, sizeof(*uc));
+		uc->exclude_per_dir = ".gitignore";
+		/* should be the same flags used by git-status */
+		uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+		the_index.untracked = uc;
+		the_index.cache_changed |= UNTRACKED_CHANGED;
+	} else if (!untracked_cache && the_index.untracked) {
+		the_index.untracked = NULL;
+		the_index.cache_changed |= UNTRACKED_CHANGED;
+	}
 
 	if (active_cache_changed) {
 		if (newfd < 0) {
-- 
2.1.0.rc0.78.gc0d8480
