All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
To: git@vger.kernel.org
Cc: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: [PATCH 19/24] update-index: manually enable or disable untracked cache
Date: Tue, 20 Jan 2015 20:03:28 +0700	[thread overview]
Message-ID: <1421759013-8494-20-git-send-email-pclouds@gmail.com> (raw)
In-Reply-To: <1421759013-8494-1-git-send-email-pclouds@gmail.com>

Overall time saving on "git status" is about 40% in the best case
scenario, removing ..collect_untracked() as the most time consuming
function. read and refresh index operations are now at the top (which
should drop when index-helper and/or watchman support is added). More
numbers and analysis below.

webkit.git
==========

169k files. 6k dirs. Lots of test data (i.e. not touched most of the
time)

Base status
-----------

Index version 4 in split index mode and cache-tree populated. No
untracked cache. It shows how time is consumed by "git status". The
same settings are used for other repos below.

18:28:10.199679 builtin/commit.c:1394   performance: 0.000000451 s: cmd_status:setup
18:28:10.474847 read-cache.c:1407       performance: 0.274873831 s: read_index
18:28:10.475295 read-cache.c:1407       performance: 0.000000656 s: read_index
18:28:10.728443 preload-index.c:131     performance: 0.253147487 s: read_index_preload
18:28:10.741422 read-cache.c:1254       performance: 0.012868340 s: refresh_index
18:28:10.752300 wt-status.c:623         performance: 0.010421357 s: wt_status_collect_changes_worktree
18:28:10.762069 wt-status.c:629         performance: 0.009644748 s: wt_status_collect_changes_index
18:28:11.601019 wt-status.c:632         performance: 0.838859547 s: wt_status_collect_untracked
18:28:11.605939 builtin/commit.c:1421   performance: 0.004835004 s: cmd_status:update_index
18:28:11.606580 trace.c:415             performance: 1.407878388 s: git command: 'git' 'status'

Populating status
-----------------

This is after enabling untracked cache and the cache is still empty.
We see a slight increase in .._collect_untracked() and update_index
(because new cache has to be written to $GIT_DIR/index).

18:28:18.915213 builtin/commit.c:1394   performance: 0.000000326 s: cmd_status:setup
18:28:19.197364 read-cache.c:1407       performance: 0.281901416 s: read_index
18:28:19.197754 read-cache.c:1407       performance: 0.000000546 s: read_index
18:28:19.451355 preload-index.c:131     performance: 0.253599607 s: read_index_preload
18:28:19.464400 read-cache.c:1254       performance: 0.012935336 s: refresh_index
18:28:19.475115 wt-status.c:623         performance: 0.010236920 s: wt_status_collect_changes_worktree
18:28:19.486022 wt-status.c:629         performance: 0.010801685 s: wt_status_collect_changes_index
18:28:20.362660 wt-status.c:632         performance: 0.876551366 s: wt_status_collect_untracked
18:28:20.396199 builtin/commit.c:1421   performance: 0.033447969 s: cmd_status:update_index
18:28:20.396939 trace.c:415             performance: 1.482695902 s: git command: 'git' 'status'

Populated status
----------------

After the cache is populated, wt_status_collect_untracked() drops 82%
from 0.838s to 0.144s. Overall time drops 45%. Top offenders are now
read_index() and read_index_preload().

18:28:20.408605 builtin/commit.c:1394   performance: 0.000000457 s: cmd_status:setup
18:28:20.692864 read-cache.c:1407       performance: 0.283980458 s: read_index
18:28:20.693273 read-cache.c:1407       performance: 0.000000661 s: read_index
18:28:20.958814 preload-index.c:131     performance: 0.265540254 s: read_index_preload
18:28:20.972375 read-cache.c:1254       performance: 0.013437429 s: refresh_index
18:28:20.983959 wt-status.c:623         performance: 0.011146646 s: wt_status_collect_changes_worktree
18:28:20.993948 wt-status.c:629         performance: 0.009879094 s: wt_status_collect_changes_index
18:28:21.138125 wt-status.c:632         performance: 0.144084737 s: wt_status_collect_untracked
18:28:21.173678 builtin/commit.c:1421   performance: 0.035463949 s: cmd_status:update_index
18:28:21.174251 trace.c:415             performance: 0.766707355 s: git command: 'git' 'status'

gentoo-x86.git
==============

This repository is a strange one with a balanced, wide and shallow
worktree (about 100k files and 23k dirs) and no .gitignore in
worktree. .._collect_untracked() time drops 88%, total time drops 56%.

Base status
-----------
18:20:40.828642 builtin/commit.c:1394   performance: 0.000000496 s: cmd_status:setup
18:20:41.027233 read-cache.c:1407       performance: 0.198130532 s: read_index
18:20:41.027670 read-cache.c:1407       performance: 0.000000581 s: read_index
18:20:41.171716 preload-index.c:131     performance: 0.144045594 s: read_index_preload
18:20:41.179171 read-cache.c:1254       performance: 0.007320424 s: refresh_index
18:20:41.185785 wt-status.c:623         performance: 0.006144638 s: wt_status_collect_changes_worktree
18:20:41.192701 wt-status.c:629         performance: 0.006780184 s: wt_status_collect_changes_index
18:20:41.991723 wt-status.c:632         performance: 0.798927029 s: wt_status_collect_untracked
18:20:41.994664 builtin/commit.c:1421   performance: 0.002852772 s: cmd_status:update_index
18:20:41.995458 trace.c:415             performance: 1.168427502 s: git command: 'git' 'status'
Populating status
-----------------
18:20:48.968848 builtin/commit.c:1394   performance: 0.000000380 s: cmd_status:setup
18:20:49.172918 read-cache.c:1407       performance: 0.203734214 s: read_index
18:20:49.173341 read-cache.c:1407       performance: 0.000000562 s: read_index
18:20:49.320013 preload-index.c:131     performance: 0.146671391 s: read_index_preload
18:20:49.328039 read-cache.c:1254       performance: 0.007921957 s: refresh_index
18:20:49.334680 wt-status.c:623         performance: 0.006172020 s: wt_status_collect_changes_worktree
18:20:49.342526 wt-status.c:629         performance: 0.007731746 s: wt_status_collect_changes_index
18:20:50.257510 wt-status.c:632         performance: 0.914864222 s: wt_status_collect_untracked
18:20:50.338371 builtin/commit.c:1421   performance: 0.080776477 s: cmd_status:update_index
18:20:50.338900 trace.c:415             performance: 1.371462446 s: git command: 'git' 'status'
Populated status
----------------
18:20:50.351160 builtin/commit.c:1394   performance: 0.000000571 s: cmd_status:setup
18:20:50.577358 read-cache.c:1407       performance: 0.225917338 s: read_index
18:20:50.577794 read-cache.c:1407       performance: 0.000000617 s: read_index
18:20:50.734140 preload-index.c:131     performance: 0.156345564 s: read_index_preload
18:20:50.745717 read-cache.c:1254       performance: 0.011463075 s: refresh_index
18:20:50.755176 wt-status.c:623         performance: 0.008877929 s: wt_status_collect_changes_worktree
18:20:50.763768 wt-status.c:629         performance: 0.008471633 s: wt_status_collect_changes_index
18:20:50.854885 wt-status.c:632         performance: 0.090988721 s: wt_status_collect_untracked
18:20:50.857765 builtin/commit.c:1421   performance: 0.002789097 s: cmd_status:update_index
18:20:50.858411 trace.c:415             performance: 0.508647673 s: git command: 'git' 'status'

linux-2.6
=========

Reference repo. Not too big. .._collect_status() drops 84%. Total time
drops 42%.

Base status
-----------
18:34:09.870122 builtin/commit.c:1394   performance: 0.000000385 s: cmd_status:setup
18:34:09.943218 read-cache.c:1407       performance: 0.072871177 s: read_index
18:34:09.943614 read-cache.c:1407       performance: 0.000000491 s: read_index
18:34:10.004364 preload-index.c:131     performance: 0.060748102 s: read_index_preload
18:34:10.008190 read-cache.c:1254       performance: 0.003714285 s: refresh_index
18:34:10.012087 wt-status.c:623         performance: 0.002775446 s: wt_status_collect_changes_worktree
18:34:10.016054 wt-status.c:629         performance: 0.003862140 s: wt_status_collect_changes_index
18:34:10.214747 wt-status.c:632         performance: 0.198604837 s: wt_status_collect_untracked
18:34:10.216102 builtin/commit.c:1421   performance: 0.001244166 s: cmd_status:update_index
18:34:10.216817 trace.c:415             performance: 0.347670735 s: git command: 'git' 'status'
Populating status
-----------------
18:34:16.595102 builtin/commit.c:1394   performance: 0.000000456 s: cmd_status:setup
18:34:16.666600 read-cache.c:1407       performance: 0.070992413 s: read_index
18:34:16.667012 read-cache.c:1407       performance: 0.000000606 s: read_index
18:34:16.729375 preload-index.c:131     performance: 0.062362492 s: read_index_preload
18:34:16.732565 read-cache.c:1254       performance: 0.003075517 s: refresh_index
18:34:16.736148 wt-status.c:623         performance: 0.002422201 s: wt_status_collect_changes_worktree
18:34:16.739990 wt-status.c:629         performance: 0.003746618 s: wt_status_collect_changes_index
18:34:16.948505 wt-status.c:632         performance: 0.208426710 s: wt_status_collect_untracked
18:34:16.961744 builtin/commit.c:1421   performance: 0.013151887 s: cmd_status:update_index
18:34:16.962233 trace.c:415             performance: 0.368537535 s: git command: 'git' 'status'
Populated status
----------------
18:34:16.970026 builtin/commit.c:1394   performance: 0.000000631 s: cmd_status:setup
18:34:17.046235 read-cache.c:1407       performance: 0.075904673 s: read_index
18:34:17.046644 read-cache.c:1407       performance: 0.000000681 s: read_index
18:34:17.113564 preload-index.c:131     performance: 0.066920253 s: read_index_preload
18:34:17.117281 read-cache.c:1254       performance: 0.003604055 s: refresh_index
18:34:17.121115 wt-status.c:623         performance: 0.002508345 s: wt_status_collect_changes_worktree
18:34:17.125089 wt-status.c:629         performance: 0.003871636 s: wt_status_collect_changes_index
18:34:17.156089 wt-status.c:632         performance: 0.030895703 s: wt_status_collect_untracked
18:34:17.169861 builtin/commit.c:1421   performance: 0.013686404 s: cmd_status:update_index
18:34:17.170391 trace.c:415             performance: 0.201474531 s: git command: 'git' 'status'

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/git-update-index.txt |  8 ++++++++
 builtin/update-index.c             | 16 ++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt
index dfc09d9..f9a35cd 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -172,6 +172,14 @@ may not support it yet.
 	the shared index file. This mode is designed for very large
 	indexes that take a signficant amount of time to read or write.
 
+--untracked-cache::
+--no-untracked-cache::
+	Enable or disable untracked cache extension. This could speed
+	up for commands that involve determining untracked files such
+	as `git status`. The underlying operating system and file
+	system must change `st_mtime` field of a directory if files
+	are added or deleted in that directory.
+
 \--::
 	Do not interpret any more arguments as options.
 
diff --git a/builtin/update-index.c b/builtin/update-index.c
index e8c7fd4..3d2dedd 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -740,6 +740,7 @@ static int reupdate_callback(struct parse_opt_ctx_t *ctx,
 int cmd_update_index(int argc, const char **argv, const char *prefix)
 {
 	int newfd, entries, has_errors = 0, line_termination = '\n';
+	int untracked_cache = -1;
 	int read_from_stdin = 0;
 	int prefix_length = prefix ? strlen(prefix) : 0;
 	int preferred_index_format = 0;
@@ -831,6 +832,8 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 			N_("write index in this format")),
 		OPT_BOOL(0, "split-index", &split_index,
 			N_("enable or disable split index")),
+		OPT_BOOL(0, "untracked-cache", &untracked_cache,
+			N_("enable/disable untracked cache")),
 		OPT_END()
 	};
 
@@ -937,6 +940,19 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 		the_index.split_index = NULL;
 		the_index.cache_changed |= SOMETHING_CHANGED;
 	}
+	if (untracked_cache > 0 && !the_index.untracked) {
+		struct untracked_cache *uc;
+
+		uc = xcalloc(1, sizeof(*uc));
+		uc->exclude_per_dir = ".gitignore";
+		/* should be the same flags used by git-status */
+		uc->dir_flags = DIR_SHOW_OTHER_DIRECTORIES | DIR_HIDE_EMPTY_DIRECTORIES;
+		the_index.untracked = uc;
+		the_index.cache_changed |= UNTRACKED_CHANGED;
+	} else if (!untracked_cache && the_index.untracked) {
+		the_index.untracked = NULL;
+		the_index.cache_changed |= UNTRACKED_CHANGED;
+	}
 
 	if (active_cache_changed) {
 		if (newfd < 0) {
-- 
2.2.0.84.ge9c7a8a

  parent reply	other threads:[~2015-01-20 13:05 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-20 13:03 [PATCH 00/24] nd/untracked-cache update Nguyễn Thái Ngọc Duy
2015-01-20 13:03 ` [PATCH 01/24] dir.c: optionally compute sha-1 of a .gitignore file Nguyễn Thái Ngọc Duy
2015-01-20 13:03 ` [PATCH 02/24] untracked cache: record .gitignore information and dir hierarchy Nguyễn Thái Ngọc Duy
2015-01-20 13:03 ` [PATCH 03/24] untracked cache: initial untracked cache validation Nguyễn Thái Ngọc Duy
2015-01-20 13:03 ` [PATCH 04/24] untracked cache: invalidate dirs recursively if .gitignore changes Nguyễn Thái Ngọc Duy
2015-01-20 13:03 ` [PATCH 05/24] untracked cache: make a wrapper around {open,read,close}dir() Nguyễn Thái Ngọc Duy
2015-01-20 13:03 ` [PATCH 06/24] untracked cache: record/validate dir mtime and reuse cached output Nguyễn Thái Ngọc Duy
2015-01-20 13:03 ` [PATCH 07/24] untracked cache: mark what dirs should be recursed/saved Nguyễn Thái Ngọc Duy
2015-01-20 13:03 ` [PATCH 08/24] untracked cache: don't open non-existent .gitignore Nguyễn Thái Ngọc Duy
2015-01-20 13:03 ` [PATCH 09/24] ewah: add convenient wrapper ewah_serialize_strbuf() Nguyễn Thái Ngọc Duy
2015-01-20 13:03 ` [PATCH 10/24] untracked cache: save to an index extension Nguyễn Thái Ngọc Duy
2015-01-20 13:03 ` [PATCH 11/24] untracked cache: load from UNTR " Nguyễn Thái Ngọc Duy
2015-01-20 13:03 ` [PATCH 12/24] untracked cache: invalidate at index addition or removal Nguyễn Thái Ngọc Duy
2015-01-20 13:03 ` [PATCH 13/24] read-cache.c: split racy stat test to a separate function Nguyễn Thái Ngọc Duy
2015-01-20 13:03 ` [PATCH 14/24] untracked cache: avoid racy timestamps Nguyễn Thái Ngọc Duy
2015-01-20 13:03 ` [PATCH 15/24] untracked cache: print stats with $GIT_TRACE_UNTRACKED_STATS Nguyễn Thái Ngọc Duy
2015-01-20 13:03 ` [PATCH 16/24] untracked cache: mark index dirty if untracked cache is updated Nguyễn Thái Ngọc Duy
2015-01-20 13:03 ` [PATCH 17/24] untracked-cache: temporarily disable with $GIT_DISABLE_UNTRACKED_CACHE Nguyễn Thái Ngọc Duy
2015-01-20 13:03 ` [PATCH 18/24] status: enable untracked cache Nguyễn Thái Ngọc Duy
2015-01-20 13:03 ` Nguyễn Thái Ngọc Duy [this message]
2015-01-20 13:03 ` [PATCH 20/24] update-index: test the system before enabling " Nguyễn Thái Ngọc Duy
2015-01-21  8:32   ` Junio C Hamano
2015-01-21  9:46     ` Duy Nguyen
2015-01-21 18:51       ` Junio C Hamano
2015-01-22 10:26         ` Duy Nguyen
2015-01-22 18:49           ` Junio C Hamano
2015-01-24  2:51             ` Duy Nguyen
2015-01-20 13:03 ` [PATCH 21/24] t7063: tests for " Nguyễn Thái Ngọc Duy
2015-01-20 13:03 ` [PATCH 22/24] mingw32: add uname() Nguyễn Thái Ngọc Duy
2015-01-20 13:03 ` [PATCH 23/24] untracked cache: guard and disable on system changes Nguyễn Thái Ngọc Duy
2015-01-20 13:03 ` [PATCH 24/24] git-status.txt: advertisement for untracked cache Nguyễn Thái Ngọc Duy
2015-01-22 20:16   ` brian m. carlson
2015-01-20 19:28 ` [PATCH 00/24] nd/untracked-cache update Torsten Bögershausen
2015-01-21  9:49   ` Duy Nguyen
  -- strict thread matches above, loose matches on Subject: below --
2015-02-08  8:55 [PATCH 00/24] nd/untracked-cache updates Nguyễn Thái Ngọc Duy
2015-02-08  8:55 ` [PATCH 19/24] update-index: manually enable or disable untracked cache Nguyễn Thái Ngọc Duy
2015-03-08 10:12 [PATCH 00/24] nd/untracked-cache updates Nguyễn Thái Ngọc Duy
2015-03-08 10:12 ` [PATCH 19/24] update-index: manually enable or disable untracked cache Nguyễn Thái Ngọc Duy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1421759013-8494-20-git-send-email-pclouds@gmail.com \
    --to=pclouds@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.