From: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: Thomas Rast <trast@inf.ethz.ch>, Jeff King <peff@peff.net>,
Piotr Krukowiecki <piotr.krukowiecki@gmail.com>,
Git Mailing List <git@vger.kernel.org>
Subject: Re: git status: small difference between stating whole repository and small subdirectory
Date: Wed, 22 Feb 2012 17:34:19 +0700 [thread overview]
Message-ID: <20120222103418.GA27199@tre> (raw)
In-Reply-To: <7v8vjwgfoq.fsf@alter.siamese.dyndns.org>
On Tue, Feb 21, 2012 at 11:16:37AM -0800, Junio C Hamano wrote:
> Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
>
> > I'm aware that Jeff's tackling at lower level, which retains
> > cache-tree for many more cases.
> >
> > But this patch seems simple and safe
> > to me, and in my experience this case happens quite often (or maybe I
> > tend to keep my index clean). Junio, any chance this patch may get in?
>
> I do not think we are talking about a duplicated effort here.
>
> By definition, the change to hook into unpack_trees() and making sure we
> invalidate all the necessary subtrees in the cache cannot give you a cache
> tree that is more populated than what you started with. And the train of
> thought in Peff's message is to improve this invalidation---we currently
> invalidate everything ;-)
>
> Somebody has to populate the cache tree fully when we _know_ the index
> matches a certain tree, and adding a call to prime_cache_tree() in
> strategic places is a way to do so. The most obvious is write-tree, but
> there are a few other existing codepaths that do so.
>
> Because prime_cache_tree() by itself is a fairly expensive operation that
> reads all the trees recursively, its benefits need to be evaluated. It
> should to happen only in an operation that is already heavy-weight, is
> likely to have read all the trees and have many of them in-core cache, and
> also relatively rarely happens compared to "git add" so that the cost can
> be amortised over time, such as "reset --(hard|mixed)".
It's tradeoff. As you said prime_cache_tree() is expensive.
cache_tree_update is supposed to be cheap. But cache_tree_update() when
empty is even more expensive than prime_cache_tree(). So it boils down
how much cache-tree we have after unpack_trees() and whether it's worth
repopulate cache-tree again.
> Switching branches is likely to fall into that category, but that is just
> my gut feeling. I would feel better at night if somebody did a benchmark
> ;-)
I timed prime_cache_tree() and cache_tree_update() while switching branch
on linux-2.6, between v2.6.32 and a quite recent version. prime_cache_tree()
took ~55ms while cache_tree_update() 150ms or 90ms (depending on final tree).
It depends on your view, whether 55ms is expensive on such a reasonably large
repository. I took several seconds for me to complete the checkout though.
Checking out with "-q" prime_cache_tree() took 145ms and 80ms respectively,
as expensive as cache_tree_update()
If cache-tree is only used at commit time, I think we could delay
prime_cache_tree() until absolutely needed. We could add an optional index
extension recording the fact that index matches certain tree.
On the first cache_tree_invalidate_path(), if cache-tree is still
empty, we prime cache-tree, then invalidate the requested path.
It would then add no cost to a quick branch switch.
But if diff-cached also takes advantage of cache-tree, it's a different story.
Anyway, I think this patch does better than my last one
-- 8< --
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 6b9061f..e7eaeec 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -387,6 +387,7 @@ static int merge_working_tree(struct checkout_opts *opts,
int ret;
struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file));
int newfd = hold_locked_index(lock_file, 1);
+ int head_index_mismatch = 1;
if (read_cache_preload(NULL) < 0)
return error(_("corrupt index file"));
@@ -396,6 +397,7 @@ static int merge_working_tree(struct checkout_opts *opts,
ret = reset_tree(new->commit->tree, opts, 1);
if (ret)
return ret;
+ head_index_mismatch = 0;
} else {
struct tree_desc trees[2];
struct tree *tree;
@@ -490,7 +492,27 @@ static int merge_working_tree(struct checkout_opts *opts,
ret = reset_tree(new->commit->tree, opts, 0);
if (ret)
return ret;
- }
+ } else
+ head_index_mismatch = topts.head_index_mismatch;
+ }
+
+ /*
+ * Currently cache-tree is always destroyed after
+ * unpack_trees(). It's probably a good idea to repopulate
+ * cache-tree. If the user makes a few modifications and
+ * commits, tree generation woulda be cheap. If they switch
+ * away again, not so cheap.
+ *
+ * When unpack_trees() learns to retains as much cache-tree as
+ * possible, this code probably does not help much on tree
+ * generation, unless the tree difference between to heads are
+ * too far, little cache-tree can be kept.
+ */
+ if (!head_index_mismatch &&
+ !cache_tree_fully_valid(active_cache_tree)) {
+ if (!new->commit->tree->object.parsed)
+ parse_tree(new->commit->tree);
+ prime_cache_tree(&active_cache_tree, new->commit->tree);
}
if (write_cache(newfd, active_cache, active_nr) ||
diff --git a/unpack-trees.c b/unpack-trees.c
index 7c9ecf6..f2c518f 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1022,6 +1022,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
o->result.timestamp.nsec = o->src_index->timestamp.nsec;
o->merge_size = len;
mark_all_ce_unused(o->src_index);
+ if (o->fn != twoway_merge)
+ o->head_index_mismatch = 1;
/*
* Sparse checkout loop #1: set NEW_SKIP_WORKTREE on existing entries
@@ -1736,6 +1738,8 @@ int twoway_merge(struct cache_entry **src, struct unpack_trees_options *o)
(oldtree && newtree &&
!same(oldtree, newtree) && /* 18 and 19 */
same(current, newtree))) {
+ if (!newtree || (newtree && !same(current, newtree)))
+ o->head_index_mismatch = 1;
return keep_entry(current, o);
}
else if (oldtree && !newtree && same(current, oldtree)) {
diff --git a/unpack-trees.h b/unpack-trees.h
index 5e432f5..b75b64e 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -48,7 +48,8 @@ struct unpack_trees_options {
gently,
exiting_early,
show_all_errors,
- dry_run;
+ dry_run,
+ head_index_mismatch;
const char *prefix;
int cache_bottom;
struct dir_struct *dir;
-- 8< --
next prev parent reply other threads:[~2012-02-22 10:32 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-02-10 9:42 git status: small difference between stating whole repository and small subdirectory Piotr Krukowiecki
2012-02-10 12:33 ` Nguyen Thai Ngoc Duy
2012-02-10 13:46 ` Piotr Krukowiecki
2012-02-10 14:37 ` Nguyen Thai Ngoc Duy
2012-02-13 16:54 ` Piotr Krukowiecki
2012-02-10 16:18 ` Piotr Krukowiecki
2012-02-14 11:34 ` Thomas Rast
2012-02-15 8:57 ` Piotr Krukowiecki
2012-02-15 11:01 ` Nguyen Thai Ngoc Duy
2012-02-15 15:14 ` Piotr Krukowiecki
2012-02-16 13:22 ` Piotr Krukowiecki
2012-02-15 19:03 ` Jeff King
2012-02-16 13:37 ` Piotr Krukowiecki
2012-02-16 14:05 ` Thomas Rast
2012-02-16 20:15 ` Junio C Hamano
2012-02-17 16:55 ` Piotr Krukowiecki
2012-02-16 19:20 ` Jeff King
2012-02-17 17:19 ` Piotr Krukowiecki
2012-02-17 20:37 ` Jeff King
2012-02-17 22:25 ` Junio C Hamano
2012-02-17 22:29 ` Jeff King
2012-02-20 8:25 ` Piotr Krukowiecki
2012-02-20 14:06 ` Jeff King
2012-02-20 14:09 ` Thomas Rast
2012-02-20 14:36 ` Nguyen Thai Ngoc Duy
2012-02-20 14:39 ` Jeff King
2012-02-20 15:11 ` Jeff King
2012-02-20 18:45 ` Thomas Rast
2012-02-20 20:35 ` Jeff King
2012-02-20 22:04 ` Junio C Hamano
2012-02-20 22:41 ` Jeff King
2012-02-20 23:31 ` Junio C Hamano
2012-02-21 7:21 ` Piotr Krukowiecki
2012-02-20 20:08 ` Junio C Hamano
2012-02-20 20:17 ` Jeff King
2012-02-21 14:45 ` Nguyen Thai Ngoc Duy
2012-02-21 19:16 ` Junio C Hamano
2012-02-22 2:12 ` Nguyen Thai Ngoc Duy
2012-02-22 2:55 ` Junio C Hamano
2012-02-22 12:54 ` Nguyen Thai Ngoc Duy
2012-02-22 13:17 ` Thomas Rast
2012-02-22 10:34 ` Nguyen Thai Ngoc Duy [this message]
2012-02-22 3:32 ` Junio C Hamano
2012-04-10 15:16 ` Piotr Krukowiecki
2012-04-10 16:23 ` Junio C Hamano
2012-04-10 18:00 ` Jeff King
2012-02-20 19:57 ` Junio C Hamano
2012-02-20 19:59 ` Thomas Rast
2012-02-20 14:16 ` Nguyen Thai Ngoc Duy
2012-02-20 14:22 ` Jeff King
2012-02-20 19:56 ` Junio C Hamano
2012-02-20 20:09 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120222103418.GA27199@tre \
--to=pclouds@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
--cc=piotr.krukowiecki@gmail.com \
--cc=trast@inf.ethz.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.