From: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Thomas Rast <trast@inf.ethz.ch>, Jeff King <peff@peff.net>,
	Piotr Krukowiecki <piotr.krukowiecki@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: git status: small difference between stating whole repository and small subdirectory
Date: Wed, 22 Feb 2012 17:34:19 +0700
Message-ID: <20120222103418.GA27199@tre>
In-Reply-To: <7v8vjwgfoq.fsf@alter.siamese.dyndns.org>

On Tue, Feb 21, 2012 at 11:16:37AM -0800, Junio C Hamano wrote:
> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
> 
> > I'm aware that Jeff's tackling at lower level, which retains
> > cache-tree for many more cases.
> >
> > But this patch seems simple and safe
> > to me, and in my experience this case happens quite often (or maybe I
> > tend to keep my index clean). Junio, any chance this patch may get in?
> 
> I do not think we are talking about a duplicated effort here.
> 
> By definition, the change to hook into unpack_trees() and making sure we
> invalidate all the necessary subtrees in the cache cannot give you a cache
> tree that is more populated than what you started with.  And the train of
> thought in Peff's message is to improve this invalidation---we currently
> invalidate everything ;-)
> 
> Somebody has to populate the cache tree fully when we _know_ the index
> matches a certain tree, and adding a call to prime_cache_tree() in
> strategic places is a way to do so.  The most obvious is write-tree, but
> there are a few other existing codepaths that do so.
> 
> Because prime_cache_tree() by itself is a fairly expensive operation that
> reads all the trees recursively, its benefits need to be evaluated. It
> should happen only in an operation that is already heavy-weight, is
> likely to have read all the trees and have many of them in-core cache, and
> also relatively rarely happens compared to "git add" so that the cost can
> be amortised over time, such as "reset --(hard|mixed)".

It's a tradeoff. As you said, prime_cache_tree() is expensive, and
cache_tree_update() is supposed to be cheap. But cache_tree_update() on
an empty cache-tree is even more expensive than prime_cache_tree(). So it
boils down to how much cache-tree we have left after unpack_trees() and
whether it's worth repopulating the cache-tree.

> Switching branches is likely to fall into that category, but that is just
> my gut feeling.  I would feel better at night if somebody did a benchmark
> ;-)

I timed prime_cache_tree() and cache_tree_update() while switching branches
on linux-2.6, between v2.6.32 and a quite recent version. prime_cache_tree()
took ~55ms, while cache_tree_update() took 150ms or 90ms (depending on the
final tree). Whether 55ms is expensive on such a reasonably large repository
depends on your view. It took several seconds for me to complete the
checkout, though.

When checking out with "-q", prime_cache_tree() took 145ms and 80ms
respectively, as expensive as cache_tree_update().
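
For anyone who wants to repeat this kind of measurement, here is a rough
sketch of a wall-clock timing wrapper. It is not the instrumentation used
for the numbers above. time_call_ms() and busy_work() are made-up names,
and busy_work() only stands in for the real call being measured.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Milliseconds elapsed between two CLOCK_MONOTONIC samples. */
static double elapsed_ms(const struct timespec *start,
			 const struct timespec *end)
{
	return (end->tv_sec - start->tv_sec) * 1000.0 +
	       (end->tv_nsec - start->tv_nsec) / 1e6;
}

/*
 * Hypothetical helper: run fn(data) once and return how long it
 * took, in milliseconds of wall-clock time.
 */
static double time_call_ms(void (*fn)(void *), void *data)
{
	struct timespec start, end;

	assert(fn != NULL);
	clock_gettime(CLOCK_MONOTONIC, &start);
	fn(data);
	clock_gettime(CLOCK_MONOTONIC, &end);
	return elapsed_ms(&start, &end);
}

/* Stand-in workload (an assumption); replace with the call to measure. */
static void busy_work(void *data)
{
	volatile long *counter = data;
	long i;

	for (i = 0; i < 1000000; i++)
		(*counter)++;
}
```

A single run is noisy, so in practice one would repeat the call a few
times and compare, as with the two final trees above.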

If cache-tree is only used at commit time, I think we could delay
prime_cache_tree() until absolutely needed. We could add an optional index
extension recording the fact that the index matches a certain tree.
On the first cache_tree_invalidate_path(), if the cache-tree is still
empty, we prime the cache-tree, then invalidate the requested path.
It would then add no cost to a quick branch switch.
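
To illustrate the delayed priming idea, here is a toy model; it is only a
sketch and does not touch git's real data structures. Every name in it
(struct toy_index, toy_switch_branch(), toy_prime(), toy_invalidate_path())
is hypothetical. Clearing the "matches tree" flag after the first
invalidation is an assumption about how such an extension might behave.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an index; none of these names exist in git. */
struct toy_index {
	int matches_tree;      /* pretend "index matches tree X" extension */
	char tree_name[41];    /* which tree the index is known to match */
	int cache_tree_primed; /* has the pretend cache-tree been populated? */
	int prime_calls;       /* how many times we paid for priming */
	int invalidated_paths; /* paths invalidated since the last switch */
};

/* Branch switch: record the match, but do not prime yet. */
static void toy_switch_branch(struct toy_index *idx, const char *tree)
{
	idx->matches_tree = 1;
	idx->cache_tree_primed = 0;
	idx->invalidated_paths = 0;
	strncpy(idx->tree_name, tree, sizeof(idx->tree_name) - 1);
	idx->tree_name[sizeof(idx->tree_name) - 1] = '\0';
}

/* The expensive step, deferred until a path is actually touched. */
static void toy_prime(struct toy_index *idx)
{
	idx->cache_tree_primed = 1;
	idx->prime_calls++;
}

/* The first invalidation after a switch primes, then invalidates. */
static void toy_invalidate_path(struct toy_index *idx, const char *path)
{
	assert(path != NULL);
	if (idx->matches_tree && !idx->cache_tree_primed)
		toy_prime(idx);
	/* Assumption: once a path changes, the recorded match is stale. */
	idx->matches_tree = 0;
	idx->invalidated_paths++;
}
```

In this model, switching back and forth without touching any path never
calls toy_prime(), which is the "no cost to a quick branch switch" part.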

But if diff-cached also takes advantage of cache-tree, it's a different story.

Anyway, I think this patch does better than my last one:

-- 8< --
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 6b9061f..e7eaeec 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -387,6 +387,7 @@ static int merge_working_tree(struct checkout_opts *opts,
 	int ret;
 	struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file));
 	int newfd = hold_locked_index(lock_file, 1);
+	int head_index_mismatch = 1;
 
 	if (read_cache_preload(NULL) < 0)
 		return error(_("corrupt index file"));
@@ -396,6 +397,7 @@ static int merge_working_tree(struct checkout_opts *opts,
 		ret = reset_tree(new->commit->tree, opts, 1);
 		if (ret)
 			return ret;
+		head_index_mismatch = 0;
 	} else {
 		struct tree_desc trees[2];
 		struct tree *tree;
@@ -490,7 +492,27 @@ static int merge_working_tree(struct checkout_opts *opts,
 			ret = reset_tree(new->commit->tree, opts, 0);
 			if (ret)
 				return ret;
-		}
+		} else
+			head_index_mismatch = topts.head_index_mismatch;
+	}
+
+	/*
+	 * Currently cache-tree is always destroyed after
+	 * unpack_trees(). It's probably a good idea to repopulate
+	 * cache-tree. If the user makes a few modifications and
+	 * commits, tree generation would be cheap. If they switch
+	 * away again, not so cheap.
+	 *
+	 * When unpack_trees() learns to retain as much cache-tree as
+	 * possible, this code probably will not help tree generation
+	 * much, unless the two heads differ so much that little
+	 * cache-tree can be kept.
+	 */
+	if (!head_index_mismatch &&
+	    !cache_tree_fully_valid(active_cache_tree)) {
+		if (!new->commit->tree->object.parsed)
+			parse_tree(new->commit->tree);
+		prime_cache_tree(&active_cache_tree, new->commit->tree);
 	}
 
 	if (write_cache(newfd, active_cache, active_nr) ||
diff --git a/unpack-trees.c b/unpack-trees.c
index 7c9ecf6..f2c518f 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1022,6 +1022,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	o->result.timestamp.nsec = o->src_index->timestamp.nsec;
 	o->merge_size = len;
 	mark_all_ce_unused(o->src_index);
+	if (o->fn != twoway_merge)
+		o->head_index_mismatch = 1;
 
 	/*
 	 * Sparse checkout loop #1: set NEW_SKIP_WORKTREE on existing entries
@@ -1736,6 +1738,8 @@ int twoway_merge(struct cache_entry **src, struct unpack_trees_options *o)
 		    (oldtree && newtree &&
 		     !same(oldtree, newtree) && /* 18 and 19 */
 		     same(current, newtree))) {
+			if (!newtree || !same(current, newtree))
+				o->head_index_mismatch = 1;
 			return keep_entry(current, o);
 		}
 		else if (oldtree && !newtree && same(current, oldtree)) {
diff --git a/unpack-trees.h b/unpack-trees.h
index 5e432f5..b75b64e 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -48,7 +48,8 @@ struct unpack_trees_options {
 		     gently,
 		     exiting_early,
 		     show_all_errors,
-		     dry_run;
+		     dry_run,
+		     head_index_mismatch;
 	const char *prefix;
 	int cache_bottom;
 	struct dir_struct *dir;
-- 8< --
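
To make the guard added to merge_working_tree() above a bit more concrete,
here is a toy model of the condition. It is a simplified sketch and not
git's code. struct toy_cache_tree, toy_fully_valid() and toy_should_prime()
are made-up names, and the real cache_tree_fully_valid() may check more
than this does. The only assumption carried over is that a negative entry
count marks an invalidated node.

```c
#include <assert.h>
#include <stddef.h>

/* Toy cache-tree node; a simplified stand-in, not git's struct. */
struct toy_cache_tree {
	int entry_count;              /* negative means "invalidated" */
	int subtree_nr;               /* number of child subtrees */
	struct toy_cache_tree **down; /* child subtrees */
};

/* Return 1 only if this node and every subtree below it are valid. */
static int toy_fully_valid(const struct toy_cache_tree *it)
{
	int i;

	if (!it || it->entry_count < 0)
		return 0;
	assert(it->subtree_nr == 0 || it->down != NULL);
	for (i = 0; i < it->subtree_nr; i++)
		if (!toy_fully_valid(it->down[i]))
			return 0;
	return 1;
}

/*
 * Same shape as the condition in the patch: prime only when the index
 * is known to match the new tree and the cache-tree has holes in it.
 */
static int toy_should_prime(int head_index_mismatch,
			    const struct toy_cache_tree *it)
{
	return !head_index_mismatch && !toy_fully_valid(it);
}
```

With one invalid subtree anywhere below the root, toy_should_prime()
returns 1 as long as no mismatch was recorded; with a mismatch it always
returns 0, because priming from the tree would then produce a cache-tree
that does not describe the index.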

