From: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Thomas Rast <trast@inf.ethz.ch>, Jeff King <peff@peff.net>,
Piotr Krukowiecki <piotr.krukowiecki@gmail.com>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: git status: small difference between stating whole repository and small subdirectory
Date: Wed, 22 Feb 2012 17:34:19 +0700 [thread overview]
Message-ID: <20120222103418.GA27199@tre> (raw)
In-Reply-To: <7v8vjwgfoq.fsf@alter.siamese.dyndns.org>
On Tue, Feb 21, 2012 at 11:16:37AM -0800, Junio C Hamano wrote:
> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>
> > I'm aware that Jeff's tackling at lower level, which retains
> > cache-tree for many more cases.
> >
> > But this patch seems simple and safe
> > to me, and in my experience this case happens quite often (or maybe I
> > tend to keep my index clean). Junio, any chance this patch may get in?
>
> I do not think we are talking about a duplicated effort here.
>
> By definition, the change to hook into unpack_trees() and making sure we
> invalidate all the necessary subtrees in the cache cannot give you a cache
> tree that is more populated than what you started with. And the train of
> thought in Peff's message is to improve this invalidation---we currently
> invalidate everything ;-)
>
> Somebody has to populate the cache tree fully when we _know_ the index
> matches a certain tree, and adding a call to prime_cache_tree() in
> strategic places is a way to do so. The most obvious is write-tree, but
> there are a few other existing codepaths that do so.
>
> Because prime_cache_tree() by itself is a fairly expensive operation that
> reads all the trees recursively, its benefits need to be evaluated. It
> should to happen only in an operation that is already heavy-weight, is
> likely to have read all the trees and have many of them in-core cache, and
> also relatively rarely happens compared to "git add" so that the cost can
> be amortised over time, such as "reset --(hard|mixed)".
It's tradeoff. As you said prime_cache_tree() is expensive.
cache_tree_update is supposed to be cheap. But cache_tree_update() when
empty is even more expensive than prime_cache_tree(). So it boils down
how much cache-tree we have after unpack_trees() and whether it's worth
repopulate cache-tree again.
> Switching branches is likely to fall into that category, but that is just
> my gut feeling. I would feel better at night if somebody did a benchmark
> ;-)
I timed prime_cache_tree() and cache_tree_update() while switching branch
on linux-2.6, between v2.6.32 and a quite recent version. prime_cache_tree()
took ~55ms while cache_tree_update() 150ms or 90ms (depending on final tree).
It depends on your view, whether 55ms is expensive on such a reasonably large
repository. I took several seconds for me to complete the checkout though.
Checking out with "-q" prime_cache_tree() took 145ms and 80ms respectively,
as expensive as cache_tree_update()
If cache-tree is only used at commit time, I think we could delay
prime_cache_tree() until absolutely needed. We could add an optional index
extension recording the fact that index matches certain tree.
On the first cache_tree_invalidate_path(), if cache-tree is still
empty, we prime cache-tree, then invalidate the requested path.
It would then add no cost to a quick branch switch.
But if diff-cached also takes advantage of cache-tree, it's a different story.
Anyway, I think this patch does better than my last one
-- 8< --
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 6b9061f..e7eaeec 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -387,6 +387,7 @@ static int merge_working_tree(struct checkout_opts *opts,
int ret;
struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file));
int newfd = hold_locked_index(lock_file, 1);
+ int head_index_mismatch = 1;
if (read_cache_preload(NULL) < 0)
return error(_("corrupt index file"));
@@ -396,6 +397,7 @@ static int merge_working_tree(struct checkout_opts *opts,
ret = reset_tree(new->commit->tree, opts, 1);
if (ret)
return ret;
+ head_index_mismatch = 0;
} else {
struct tree_desc trees[2];
struct tree *tree;
@@ -490,7 +492,27 @@ static int merge_working_tree(struct checkout_opts *opts,
ret = reset_tree(new->commit->tree, opts, 0);
if (ret)
return ret;
- }
+ } else
+ head_index_mismatch = topts.head_index_mismatch;
+ }
+
+ /*
+ * Currently cache-tree is always destroyed after
+ * unpack_trees(). It's probably a good idea to repopulate
+ * cache-tree. If the user makes a few modifications and
+ * commits, tree generation woulda be cheap. If they switch
+ * away again, not so cheap.
+ *
+ * When unpack_trees() learns to retains as much cache-tree as
+ * possible, this code probably does not help much on tree
+ * generation, unless the tree difference between to heads are
+ * too far, little cache-tree can be kept.
+ */
+ if (!head_index_mismatch &&
+ !cache_tree_fully_valid(active_cache_tree)) {
+ if (!new->commit->tree->object.parsed)
+ parse_tree(new->commit->tree);
+ prime_cache_tree(&active_cache_tree, new->commit->tree);
}
if (write_cache(newfd, active_cache, active_nr) ||
diff --git a/unpack-trees.c b/unpack-trees.c
index 7c9ecf6..f2c518f 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1022,6 +1022,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
o->result.timestamp.nsec = o->src_index->timestamp.nsec;
o->merge_size = len;
mark_all_ce_unused(o->src_index);
+ if (o->fn != twoway_merge)
+ o->head_index_mismatch = 1;
/*
* Sparse checkout loop #1: set NEW_SKIP_WORKTREE on existing entries
@@ -1736,6 +1738,8 @@ int twoway_merge(struct cache_entry **src, struct unpack_trees_options *o)
(oldtree && newtree &&
!same(oldtree, newtree) && /* 18 and 19 */
same(current, newtree))) {
+ if (!newtree || (newtree && !same(current, newtree)))
+ o->head_index_mismatch = 1;
return keep_entry(current, o);
}
else if (oldtree && !newtree && same(current, oldtree)) {
diff --git a/unpack-trees.h b/unpack-trees.h
index 5e432f5..b75b64e 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -48,7 +48,8 @@ struct unpack_trees_options {
gently,
exiting_early,
show_all_errors,
- dry_run;
+ dry_run,
+ head_index_mismatch;
const char *prefix;
int cache_bottom;
struct dir_struct *dir;
-- 8< --
next prev parent reply other threads:[~2012-02-22 10:32 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-02-10 9:42 git status: small difference between stating whole repository and small subdirectory Piotr Krukowiecki
2012-02-10 12:33 ` Nguyen Thai Ngoc Duy
2012-02-10 13:46 ` Piotr Krukowiecki
2012-02-10 14:37 ` Nguyen Thai Ngoc Duy
2012-02-13 16:54 ` Piotr Krukowiecki
2012-02-10 16:18 ` Piotr Krukowiecki
2012-02-14 11:34 ` Thomas Rast
2012-02-15 8:57 ` Piotr Krukowiecki
2012-02-15 11:01 ` Nguyen Thai Ngoc Duy
2012-02-15 15:14 ` Piotr Krukowiecki
2012-02-16 13:22 ` Piotr Krukowiecki
2012-02-15 19:03 ` Jeff King
2012-02-16 13:37 ` Piotr Krukowiecki
2012-02-16 14:05 ` Thomas Rast
2012-02-16 20:15 ` Junio C Hamano
2012-02-17 16:55 ` Piotr Krukowiecki
2012-02-16 19:20 ` Jeff King
2012-02-17 17:19 ` Piotr Krukowiecki
2012-02-17 20:37 ` Jeff King
2012-02-17 22:25 ` Junio C Hamano
2012-02-17 22:29 ` Jeff King
2012-02-20 8:25 ` Piotr Krukowiecki
2012-02-20 14:06 ` Jeff King
2012-02-20 14:09 ` Thomas Rast
2012-02-20 14:36 ` Nguyen Thai Ngoc Duy
2012-02-20 14:39 ` Jeff King
2012-02-20 15:11 ` Jeff King
2012-02-20 18:45 ` Thomas Rast
2012-02-20 20:35 ` Jeff King
2012-02-20 22:04 ` Junio C Hamano
2012-02-20 22:41 ` Jeff King
2012-02-20 23:31 ` Junio C Hamano
2012-02-21 7:21 ` Piotr Krukowiecki
2012-02-20 20:08 ` Junio C Hamano
2012-02-20 20:17 ` Jeff King
2012-02-21 14:45 ` Nguyen Thai Ngoc Duy
2012-02-21 19:16 ` Junio C Hamano
2012-02-22 2:12 ` Nguyen Thai Ngoc Duy
2012-02-22 2:55 ` Junio C Hamano
2012-02-22 12:54 ` Nguyen Thai Ngoc Duy
2012-02-22 13:17 ` Thomas Rast
2012-02-22 10:34 ` Nguyen Thai Ngoc Duy [this message]
2012-02-22 3:32 ` Junio C Hamano
2012-04-10 15:16 ` Piotr Krukowiecki
2012-04-10 16:23 ` Junio C Hamano
2012-04-10 18:00 ` Jeff King
2012-02-20 19:57 ` Junio C Hamano
2012-02-20 19:59 ` Thomas Rast
2012-02-20 14:16 ` Nguyen Thai Ngoc Duy
2012-02-20 14:22 ` Jeff King
2012-02-20 19:56 ` Junio C Hamano
2012-02-20 20:09 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120222103418.GA27199@tre \
--to=pclouds@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
--cc=piotr.krukowiecki@gmail.com \
--cc=trast@inf.ethz.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).