git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] checkout: eliminate unnecessary merge for trivial checkout
@ 2016-09-09 19:25 Ben Peart
  2016-09-09 21:55 ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Ben Peart @ 2016-09-09 19:25 UTC (permalink / raw)
  To: git; +Cc: gitster, pclouds, peartben, Ben Peart

Teach git to avoid unnecessary merge during trivial checkout.  When
running 'git checkout -b foo' git follows a common code path through
the expensive merge_working_tree even when it is unnecessary.  As a
result, 95% of the time is spent in merge_working_tree doing the 2-way
merge between the new and old commit trees that is unneeded.

The time breakdown is as follows:

    merge_working_tree <-- 95%
        unpack_trees <-- 80%
            traverse_trees <-- 50%
            cache_tree_update <-- 17%
            mark_new_skip_worktree <-- 10%

With a large repo, this cost is pronounced.  Using "git checkout -b r"
to create and switch to a new branch costs 166 seconds (all times worst
case with a cold file system cache).

git.c:406               trace: built-in: git 'checkout' '-b' 'r'
read-cache.c:1667       performance: 17.442926555 s: read_index_from
name-hash.c:128         performance: 2.912145231 s: lazy_init_name_hash
read-cache.c:2208       performance: 4.387713335 s: write_locked_index
trace.c:420             performance: 166.458921289 s: git command:
                                        'c:\Users\benpeart\bin\git.exe' 'checkout' '-b' 'r'
Switched to a new branch 'r'

By adding a test to skip the unnecessary call to merge_working_tree in
this case reduces the cost to 16 seconds.

git.c:406               trace: built-in: git 'checkout' '-b' 's'
read-cache.c:1667       performance: 16.100742476 s: read_index_from
trace.c:420             performance: 16.461547867 s: git command: 'c:\Users\benpeart\bin\git.exe' 'checkout' '-b' 's'
Switched to a new branch 's'

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 builtin/checkout.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 95 insertions(+), 4 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 8672d07..4396cb3 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -38,6 +38,10 @@ struct checkout_opts {
 	int ignore_skipworktree;
 	int ignore_other_worktrees;
 	int show_progress;
+	/*
+	 * If new checkout options are added, needs_working_tree_merge
+	 * should be updated accordingly.
+	 */
 
 	const char *new_branch;
 	const char *new_branch_force;
@@ -802,6 +806,87 @@ static void orphaned_commit_warning(struct commit *old, struct commit *new)
 	free(refs.objects);
 }
 
+static int needs_working_tree_merge(const struct checkout_opts *opts,
+	const struct branch_info *old,
+	const struct branch_info *new)
+{
+	/*
+	 * We must do the merge if we are actually moving to a new
+	 * commit tree.
+	 */
+	if (!old->commit || !new->commit ||
+		oidcmp(&old->commit->tree->object.oid, &new->commit->tree->object.oid))
+		return 1;
+
+	/*
+	 * opts->patch_mode cannot be used with switching branches so is
+	 * not tested here
+	 */
+
+	/*
+	 * opts->quiet only impacts output so doesn't require a merge
+	 */
+
+	/*
+	 * Honor the explicit request for a three-way merge or to throw away
+	 * local changes
+	 */
+	if (opts->merge || opts->force)
+		return 1;
+
+	/*
+	 * Checking out the requested commit may require updating the working
+	 * directory and index, let the merge handle it.
+	 */
+	if (opts->force_detach)
+		return 1;
+
+	/*
+	 * opts->writeout_stage cannot be used with switching branches so is
+	 * not tested here
+	 */
+
+	 /*
+	  * Honor the explicit ignore requests
+	  */
+	if (!opts->overwrite_ignore || opts->ignore_skipworktree
+		|| opts->ignore_other_worktrees)
+		return 1;
+
+	 /*
+	  * opts->show_progress only impacts output so doesn't require a merge
+	  */
+
+	 /*
+	 * If we're not creating a new branch, by definition we're changing
+	 * the existing one so need to do the merge
+	 */
+	if (!opts->new_branch)
+		return 1;
+
+	/*
+	 * new_branch_force is defined to "create/reset and checkout a branch"
+	 * so needs to go through the merge to do the reset
+	 */
+	if (opts->new_branch_force)
+		return 1;
+
+	/*
+	 * A new orphaned branch requrires the index and the working tree to be
+	 * adjusted to <start_point>
+	 */
+	if (opts->new_orphan_branch)
+		return 1;
+
+	/*
+	 * Remaining variables are not checkout options but used to track state
+	 * that doesn't trigger the need for a merge.
+	 */
+
+	return 0;
+}
+
+
 static int switch_branches(const struct checkout_opts *opts,
 			   struct branch_info *new)
 {
@@ -827,10 +912,16 @@ static int switch_branches(const struct checkout_opts *opts,
 		parse_commit_or_die(new->commit);
 	}
 
-	ret = merge_working_tree(opts, &old, new, &writeout_error);
-	if (ret) {
-		free(path_to_free);
-		return ret;
+	/*
+	 * Optimize the performance of "git checkout foo" by skipping the call
+	 * to merge_working_tree where possible.
+	 */
+	if (needs_working_tree_merge(opts, &old, new)) {
+		ret = merge_working_tree(opts, &old, new, &writeout_error);
+		if (ret) {
+			free(path_to_free);
+			return ret;
+		}
 	}
 
 	if (!opts->quiet && !old.path && old.commit && new->commit != old.commit)
-- 
2.10.0.windows.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] checkout: eliminate unnecessary merge for trivial checkout
  2016-09-09 19:25 [PATCH v2] checkout: eliminate unnecessary merge for trivial checkout Ben Peart
@ 2016-09-09 21:55 ` Junio C Hamano
  2016-09-12 18:12   ` Ben Peart
  0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2016-09-09 21:55 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, pclouds, Ben Peart

Ben Peart <peartben@gmail.com> writes:

> @@ -802,6 +806,87 @@ static void orphaned_commit_warning(struct commit *old, struct commit *new)
>  	free(refs.objects);
>  }
>  
> +static int needs_working_tree_merge(const struct checkout_opts *opts,
> +	const struct branch_info *old,
> +	const struct branch_info *new)
> +{
> +	/*
> +	 * We must do the merge if we are actually moving to a new
> +	 * commit tree.
> +	 */
> +	if (!old->commit || !new->commit ||
> +		oidcmp(&old->commit->tree->object.oid, &new->commit->tree->object.oid))
> +		return 1;

A huge helper function helps it somewhat, compared with the earlier
unreadable mess ;-).

Are we certain that at this point the commit objects are both parsed
and their tree->object.oid are both valid?

> +	/*
> +	 * Honor the explicit request for a three-way merge or to throw away
> +	 * local changes
> +	 */
> +	if (opts->merge || opts->force)
> +		return 1;

Hmph, "git checkout -m HEAD" wouldn't have to do anything wrt the
index status, no?

For that matter, neither "git checkout -f HEAD".  Unless we rely on
unpack_trees() to write over the working tree files.

    ... me goes and looks, and finds that merge_working_tree()
    indeed does have a logic to do quite different thing when
    "--force" is given.

This makes me wonder if the "merge_working_tree() is expensive, so
selectively skip calling it" approach is working at a wrong level.
Wouldn't the merge_working_tree() function itself a better place to
do this kind of "we may not have to do the full two-way merge"
optimization?  It already looks at opts and does things differently
(e.g. when running with "--force", it does not even call unpack).
If you can optimize even more by looking at other fields in opts to
avoid unpack, that would fit better with the structure of the code
that we already have.

> +	/*
> +	 * Checking out the requested commit may require updating the working
> +	 * directory and index, let the merge handle it.
> +	 */
> +	if (opts->force_detach)
> +		return 1;

This does not make much sense to me.  After "git branch -f foo
HEAD", there is no difference in what is done to the index and the
working directory between "git checkout --detach HEAD" and "git
checkout foo", is there?

> +	/*
> +	 * opts->writeout_stage cannot be used with switching branches so is
> +	 * not tested here
> +	 */
> +
> +	 /*
> +	  * Honor the explicit ignore requests
> +	  */
> +	if (!opts->overwrite_ignore || opts->ignore_skipworktree
> +		|| opts->ignore_other_worktrees)
> +		return 1;

Style.  I think you earlier had

	if (a || b ||
            c)

and here you are doing

	if (a || b
            || c)

Please pick one and stick to it (I'd pick the former).

> +	 /*
> +	 * If we're not creating a new branch, by definition we're changing
> +	 * the existing one so need to do the merge
> +	 */
> +	if (!opts->new_branch)
> +		return 1;

Sorry, but I fail to follow that line of thought.  Starting from a
state where your HEAD points at commit A,

 - switching to a detached HEAD pointing at a commit A,
 - switching to an existing branch that already points at the same
   commit A, and
 - force updating an existing branch that was pointing at something
   else to point at the same commit A,

would have the same effect as creating a new branch at commit A and
switching to it, no?  The same comment applies to the remainder of
this function.

More importantly, merge_working_tree() checks things other than what
this function is checking.  For example, it prevents you from
branch-switching (whether it is to switch to an existing branch that
has the same commit as the current HEAD, to switch to detached HEAD
state at the same commit as the current HEAD, or to switch to a new
branch that points at the same commit as the current HEAD) if your
index is unmerged (i.e. you are in the middle of a mergy operation).

So my gut feeling is that this:

> +	/*
> +	 * Optimize the performance of "git checkout foo" by skipping the call
> +	 * to merge_working_tree where possible.
> +	 */
> +	if (needs_working_tree_merge(opts, &old, new)) {
> +		ret = merge_working_tree(opts, &old, new, &writeout_error);

works at the wrong level.  The comment up to 'Optimize the
performance of "git checkout foo"' may correctly state what we want
to achieve, but I think we should do so not with "by skipping the
call to", but with "by optimizing merge_working_tree()".

Thanks.



^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH v2] checkout: eliminate unnecessary merge for trivial checkout
  2016-09-09 21:55 ` Junio C Hamano
@ 2016-09-12 18:12   ` Ben Peart
  2016-09-12 20:31     ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Ben Peart @ 2016-09-12 18:12 UTC (permalink / raw)
  To: 'Junio C Hamano'; +Cc: git, pclouds, 'Ben Peart'



> -----Original Message-----
> From: Junio C Hamano [mailto:gitster@pobox.com]
> Sent: Friday, September 9, 2016 5:55 PM
> To: Ben Peart <peartben@gmail.com>
> Cc: git@vger.kernel.org; pclouds@gmail.com; Ben Peart
> <benpeart@microsoft.com>
> Subject: Re: [PATCH v2] checkout: eliminate unnecessary merge for trivial
> checkout
> 
> Ben Peart <peartben@gmail.com> writes:
> 
> > @@ -802,6 +806,87 @@ static void orphaned_commit_warning(struct
> commit *old, struct commit *new)
> >  	free(refs.objects);
> >  }
> >
> > +static int needs_working_tree_merge(const struct checkout_opts *opts,
> > +	const struct branch_info *old,
> > +	const struct branch_info *new)
> > +{
> > +	/*
> > +	 * We must do the merge if we are actually moving to a new
> > +	 * commit tree.
> > +	 */
> > +	if (!old->commit || !new->commit ||
> > +		oidcmp(&old->commit->tree->object.oid, &new->commit-
> >tree->object.oid))
> > +		return 1;
> 
> A huge helper function helps it somewhat, compared with the earlier
> unreadable mess ;-).
> 
> Are we certain that at this point the commit objects are both parsed and
> their tree->object.oid are both valid?
> 
> > +	/*
> > +	 * Honor the explicit request for a three-way merge or to throw away
> > +	 * local changes
> > +	 */
> > +	if (opts->merge || opts->force)
> > +		return 1;
> 
> Hmph, "git checkout -m HEAD" wouldn't have to do anything wrt the index
> status, no?
> 
> For that matter, neither "git checkout -f HEAD".  Unless we rely on
> unpack_trees() to write over the working tree files.
> 
>     ... me goes and looks, and finds that merge_working_tree()
>     indeed does have a logic to do quite different thing when
>     "--force" is given.
> 
> This makes me wonder if the "merge_working_tree() is expensive, so
> selectively skip calling it" approach is working at a wrong level.
> Wouldn't the merge_working_tree() function itself a better place to do
this
> kind of "we may not have to do the full two-way merge"
> optimization?  It already looks at opts and does things differently (e.g.
when
> running with "--force", it does not even call unpack).
> If you can optimize even more by looking at other fields in opts to avoid
> unpack, that would fit better with the structure of the code that we
already
> have.
> 

I completely agree that optimizing within merge_working_tree would provide 
more opportunities for optimization.  I can certainly move the test into
that 
function as a first step.  I've looked into it a little but came to the
conclusion
that it will be non-trivial to determine how to ensure the minimal work is 
done for any arbitrary set of options passed in without breaking something.


While I'd love to see that work done, I just don't have the time to pursue
further 
optimizations that may be available at this point in time.  There are other
things 
(like speeding up status on large repos) I need to work on first.

> > +	/*
> > +	 * Checking out the requested commit may require updating the
> working
> > +	 * directory and index, let the merge handle it.
> > +	 */
> > +	if (opts->force_detach)
> > +		return 1;
> 
> This does not make much sense to me.  After "git branch -f foo HEAD",
there
> is no difference in what is done to the index and the working directory
> between "git checkout --detach HEAD" and "git checkout foo", is there?
> 

I'm attempting to optimize for a single, common path where checkout is 
just creating a new branch (ie "git checkout -b foo") to minimize the 
possibility that I broke some other path I didn't fully understand.  

It is quite possible that there are cases where the index and/or working
directory do not need to be updated or where a merge won't actually 
change anything that this test is not optimized for.  Perhaps I should 
emphasize the "*may* require updating the working directory" in my 
comment.  Because it *could* happen, I let the code fall back to the
old behavior.

> > +	/*
> > +	 * opts->writeout_stage cannot be used with switching branches so is
> > +	 * not tested here
> > +	 */
> > +
> > +	 /*
> > +	  * Honor the explicit ignore requests
> > +	  */
> > +	if (!opts->overwrite_ignore || opts->ignore_skipworktree
> > +		|| opts->ignore_other_worktrees)
> > +		return 1;
> 
> Style.  I think you earlier had
> 
> 	if (a || b ||
>             c)
> 
> and here you are doing
> 
> 	if (a || b
>             || c)
> 
> Please pick one and stick to it (I'd pick the former).

Done

> 
> > +	 /*
> > +	 * If we're not creating a new branch, by definition we're changing
> > +	 * the existing one so need to do the merge
> > +	 */
> > +	if (!opts->new_branch)
> > +		return 1;
> 
> Sorry, but I fail to follow that line of thought.  Starting from a state
where
> your HEAD points at commit A,
> 
>  - switching to a detached HEAD pointing at a commit A,
>  - switching to an existing branch that already points at the same
>    commit A, and
>  - force updating an existing branch that was pointing at something
>    else to point at the same commit A,
> 
> would have the same effect as creating a new branch at commit A and
> switching to it, no?  The same comment applies to the remainder of this
> function.
> 
> More importantly, merge_working_tree() checks things other than what this
> function is checking.  For example, it prevents you from branch-switching
> (whether it is to switch to an existing branch that has the same commit as
the
> current HEAD, to switch to detached HEAD state at the same commit as the
> current HEAD, or to switch to a new branch that points at the same commit
> as the current HEAD) if your index is unmerged (i.e. you are in the middle
of
> a mergy operation).
> 
> So my gut feeling is that this:
> 
> > +	/*
> > +	 * Optimize the performance of "git checkout foo" by skipping the
call
> > +	 * to merge_working_tree where possible.
> > +	 */
> > +	if (needs_working_tree_merge(opts, &old, new)) {
> > +		ret = merge_working_tree(opts, &old, new,
> &writeout_error);
> 
> works at the wrong level.  The comment up to 'Optimize the performance of
> "git checkout foo"' may correctly state what we want to achieve, but I
think
> we should do so not with "by skipping the call to", but with "by
optimizing
> merge_working_tree()".
> 

I agree that optimizing merge_working_tree could result in even greater 
savings and could definitely optimize for more paths/options than this 
patch. While I'd love to see that done, I'm also happy to get a 10x 
improvement in the common case of creating a new branch.

I'll reroll the patch moving the current optimization into 
merge_working_tree and fixing up the style issues you pointed out.

> Thanks.
> 



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] checkout: eliminate unnecessary merge for trivial checkout
  2016-09-12 18:12   ` Ben Peart
@ 2016-09-12 20:31     ` Junio C Hamano
  2016-09-13 12:33       ` Ben Peart
  0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2016-09-12 20:31 UTC (permalink / raw)
  To: Ben Peart; +Cc: git, pclouds, 'Ben Peart'

"Ben Peart" <peartben@gmail.com> writes:

> I completely agree that optimizing within merge_working_tree would provide 
> more opportunities for optimization.  I can certainly move the test into
> that function as a first step.

Note that "optimizing more" was not the primary point of my
response.

Quite honestly, I'd rather see us speed up _ONLY_ obviously correct
and commonly used cases, while leaving most cases that _MAY_ turn
out to be optimizable (if we did careful analysis) unoptimized, and
instead have them handled by generic but known to be correct
codepath, if it means we do NOT to have to spend mental bandwidth to
analyze not-common case--that is a much better tradeoff.

The suggestion to move the check one level down in the callchain was
primarily to avoid the proposed optimization from being overly eager
and ending up skipping necessary parts of what merge_working_tree()
does (e.g. like I suspected in the review that the proposed patch
skips the check for "you have unmerged entries" situation).

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH v2] checkout: eliminate unnecessary merge for trivial checkout
  2016-09-12 20:31     ` Junio C Hamano
@ 2016-09-13 12:33       ` Ben Peart
  0 siblings, 0 replies; 5+ messages in thread
From: Ben Peart @ 2016-09-13 12:33 UTC (permalink / raw)
  To: 'Junio C Hamano'; +Cc: git, pclouds, 'Ben Peart'

> -----Original Message-----
> From: Junio C Hamano [mailto:gitster@pobox.com]
> Sent: Monday, September 12, 2016 4:32 PM
> To: Ben Peart <peartben@gmail.com>
> Cc: git@vger.kernel.org; pclouds@gmail.com; 'Ben Peart'
> <benpeart@microsoft.com>
> Subject: Re: [PATCH v2] checkout: eliminate unnecessary merge for trivial
> checkout
> 
> "Ben Peart" <peartben@gmail.com> writes:
> 
> > I completely agree that optimizing within merge_working_tree would
> > provide more opportunities for optimization.  I can certainly move the
> > test into that function as a first step.
> 
> Note that "optimizing more" was not the primary point of my response.
> 
> Quite honestly, I'd rather see us speed up _ONLY_ obviously correct and
> commonly used cases, while leaving most cases that _MAY_ turn out to be
> optimizable (if we did careful analysis) unoptimized, and instead have
them
> handled by generic but known to be correct codepath, if it means we do NOT
> to have to spend mental bandwidth to analyze not-common case--that is a
> much better tradeoff.
> 
> The suggestion to move the check one level down in the callchain was
> primarily to avoid the proposed optimization from being overly eager and
> ending up skipping necessary parts of what merge_working_tree() does (e.g.
> like I suspected in the review that the proposed patch skips the check for
> "you have unmerged entries" situation).

The check for unmerged entries makes complete sense when you are about 
to attempt to merge different commit trees and generate an updated index 
and working directory.  This optimization however is trying to skip 
those expensive steps for the specific case of creating a new branch and 
switching to it.  In this narrow (but common) case, all that needs to 
happen is that a new ref is created and HEAD switched to it.  Since 
we're not doing a merge, I don't believe the check is necessary.



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-09-13 12:33 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-09-09 19:25 [PATCH v2] checkout: eliminate unnecessary merge for trivial checkout Ben Peart
2016-09-09 21:55 ` Junio C Hamano
2016-09-12 18:12   ` Ben Peart
2016-09-12 20:31     ` Junio C Hamano
2016-09-13 12:33       ` Ben Peart

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).