git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* RE: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout
@ 2016-09-28 17:02 Ben Peart
  2016-09-28 17:52 ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Ben Peart @ 2016-09-28 17:02 UTC (permalink / raw)
  To: git
  Cc: Ben Peart, pclouds, Jeff Hostetler, philipoakley,
	'Junio C Hamano'

Resending

> -----Original Message-----
> From: git-owner@vger.kernel.org [mailto:git-owner@vger.kernel.org] On
> Behalf Of Philip Oakley
> Sent: Saturday, September 24, 2016 3:31 PM
> To: Junio C Hamano <gitster@pobox.com>
> Cc: Ben Peart <Ben.Peart@microsoft.com>; pclouds@gmail.com;
> git@vger.kernel.org
> Subject: Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial
> checkout
> 
> Hi Junio,
> 
> From: "Junio C Hamano" <gitster@pobox.com>
> > "Philip Oakley" <philipoakley@iee.org> writes:
> >
> >>> > >"git checkout -b foo" (without -f -m or <start_point>) is defined
> >>> > >in the manual as being a shortcut for/equivalent to:
> >>> > >
> >>> > >        (1a) "git branch foo"
> >>> > >        (1b) "git checkout foo"
> >>> > >
> >>> > >However, it has been our experience in our observed use cases and
> >>> > >all the existing git tests, that it can be treated as equivalent
to:
> >>> > >
> >>> > >        (2a) "git branch foo"
> >>> > >        (2b) "git symbolic-ref HEAD refs/heads/foo"
> >>> > >...
> >>> > >
> >>> > I am still not sure if I like the change of what "checkout -b" is
> >>> > this late in the game, though.
> >>>
> >>> ...
> >>> That said, you're much more on the frontline of receiving negative
> >>> feedback about doing that than I am. :)  How would you like to
> >>> proceed?
> >>
> >> I didn't see an initial confirmation as to what the issue really was.
> >> You indicated the symptom ('a long checkout time'), but then we
> >> missed out on hard facts and example repos, so that the issue was
> >> replicable.
> >
> > I took it as a given, trivial and obvious optimization opportunity,
> > that it is wasteful having to traverse two trees to consolidate and
> > reflect their differences into the working tree when we know upfront
> > that these two trees are identical, no matter what the overhead for
> > doing so is.
> 
> I agree, and I believe Ben agrees.
> 

Correct.  In my original patch request I put more specific information on 
the impact this optimization has in our specific case (reducing the cost 
from 166 seconds to 16 seconds).

> >
> >> At the moment there is the simple workaround of an alias that
> >> executes that two step command dance to achieve what you needed, and
> >> Junio has outlined the issues he needed to be covered from his
> >> maintainer perspective (e.g. the detection of sparse checkouts).
> >> Confirming the root causes would help in setting a baseline.
> >>
> >> I hope that is of help - I'd seen that the discussion had gone quiet.
> >
> > Some of the problems I have are:
> >
> > (1) "git checkout -b NEW", "git checkout", "git checkout HEAD^0"
> >     and "git checkout HEAD" (no other parameters to any of them)
> >     ought to give identical index and working tree.  It is too
> >     confusing to leave subtly different results that will lead to
> >     hard to diagnose bugs for only one of them.
> >
> > (2) The proposed log message talks only about "performance
> >     optimization",
> 
> >                                while the purpose of the change is more
> > about
> >     changing the definition
> 
> Here I think is the misunderstanding. His purpose is NOT to change the
> definition (IIUC). As I read the message you reference below (and Ben's
other
> messages), I understood that he was trying to achieve what you said (i.e.
> optimise the trivial and obvious opportunity) of selecting for the common
> case (underlying conditions) where the two command sequences are
> identical. If the selected case / conditions is not identical then it is
defined
> wrongly...
> 
> I suspect that it was Ben's 'soft' explanation that allowed the discussion
to
> diverge.
> 

I'm unaccustomed to doing reviews like this via email so have been 
struggling with how to most effectively communicate about the proposed
change.  I appreciate any help and understanding as I go through this
for the first time.

My intention was not to change the users expected results which
I believe are to "create a new branch and switch to it."  We reinforce
that expectation with the output of the command which completes 
with the text "Switched to a new branch 'foo'"

> 
> >                                                 of what "git checkout -b
> > NEW" is from
> >     "git branch NEW && git checkout NEW" to "git branch NEW && git
> >     symbolic-ref HEAD refs/heads/NEW".  The explanation in a Ben's
> >     later message <007401d21278$445eba80$cd1c2f80$@gmail.com> does
> >     a much better job contrasting the two.
> >
> > (3) I identified only one difference as an example sufficient to
> >     point out why the patch provided is not a pure optimization but
> >     behaviour change.  Fixing that example alone to avoid change in
> >     the behaviour is trivial (see if the "info/sparse-checkout"
> >     file is present and refrain from skipping the proper checkout),
> 
> This is probably the point Ben needs to take on board to narrow the
> conditions down. There may be others.
> 

The fact that "git checkout -b NEW" updates the index and as a
result reflects any changes in the sparse-checkout and the issue 
Junio pointed out earlier about not calling show_local_changes 
at the end of merge_working_tree are the only difference in behavior
I am aware of.  Both of these are easily rectified.

That said, given we are skipping huge amounts of work by no longer 
merging the commit trees, generating a new index, and merging the 
local modifications in the working tree, it is possible that there are
other behavior changes I'm just not aware of.

> >     but a much larger problem is that I do not know (and Ben does
> >     not, I suspect) know what other behaviour changes the patch is
> >     introducing, and worse, the checks are sufficiently dense too
> >     detailed and intimate to the implementation of unpack_trees()
> >     that it is impossible for anybody to make sure the exceptions
> >     defined in this patch and updates to other parts of the system
> >     will be kept in sync.
> 
> I did not believe he was proposing such a change to behaviour, hence his
> difficulty in responding (or at least that is my perception). I.e. he was
> digging a hole in the wrong place.
> 
> It is possible that he had accidentally introduced a behavious change, and
> having failed to explictly say "This patch (should) produces no behavious
> change", which then continued to re-inforce the misunderstanding.
> 
> >
> > So my inclination at this point, unless we see somebody invents a
> > clever way to solve (3), is that any change that violates (1),
> > i.e. as long as the patch does "Are we doing '-b NEW'?  Then we do
> > something subtly different", is not acceptable, and solving (3) in a
> > maintainable way smells like quite a hard thing to do.  But it would
> > be ideal if (3) is solved cleanly, as we will then not have to worry
> > about changing behaviour at all and can apply the optimization for
> > all of the four cases equally.  As a side effect, that approach
> > would solve problem (2) above.
> >
> > If we were to punt on keeping the sanity (1) and introduce a subtly
> > different "create a new branch and point the HEAD at it", an easier
> > way out may be be one of
> >
> > 1. a totally new command, e.g. "git branch-switch NEW" that takes
> >    only a single argument and no other "checkout" options, or
> >
> > 2. a new option to "git checkout" that takes _ONLY_ a single
> >    argument and incompatible with any other option or command line
> >    argument, or
> >
> > 3. an alias that does "git branch" followed by "git symbolic-ref".
> >
> > Neither of the first two sounds palatable, though.
> 
> It will need Ben to come back and clarify, if he did, or did not, want any
> behaviour change (beyond speed of action;-)
> 

There is a subtlety here in what is meant by "any behavior change."
I did not want to change the users expectations of what this command
is used for.  The only noticeable behavior change should only be that it 
sped up by an order of magnitude.  

To get that speed up, there is a change in behavior from git's 
perspective as it is no longer doing a bunch of work it used to do 
which is what is saving the time.

I was aware that skipping the commit merge/new index/merge working 
tree meant that "git checkout NEW" would no longer update these to 
reflect any potential changes to the sparse-checkout file.  

To determine if this would change the results the user was *expecting*,  
I searched the web and found that all the instructions I could locate
that taught people how to update the index/working tree after
making changes to the sparse-checkout file instructed them to use
"git read-tree -mu HEAD."  I didn't find any that told people to use
"git checkout -b NEW"  

Finally, when I made the optimization to skip these steps I then
verified that the test suite still passed all tests.  I realize that 
there is not 100% coverage of tests but I thought it was a good
indication that none of them were impacted by this optimization.

I've tried to think of a way to solve (3) in a more maintainable way 
but have not been able to come up with anything.  Ultimately,
to ensure are only applying the optimization in this specific case,
we have to test to make sure other options don't require the extra
steps.  I'm open to suggestions!

I'm going to be out for the next 2 weeks so will be unable to respond 
to activity on the thread but a co-worker who has been involved will
be responsive to feedback and rolling any new versions of the patch.

Thanks,

Ben




^ permalink raw reply	[flat|nested] 13+ messages in thread
* [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout
@ 2016-09-13 14:26 Ben Peart
  2016-09-13 22:34 ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Ben Peart @ 2016-09-13 14:26 UTC (permalink / raw)
  To: git; +Cc: gitster, pclouds, peartben, Ben Peart

Teach git to avoid unnecessary merge during trivial checkout.  When
running 'git checkout -b foo' git follows a common code path through
the expensive merge_working_tree even when it is unnecessary.  As a
result, 95% of the time is spent in merge_working_tree doing the 2-way
merge between the new and old commit trees that is unneeded.

The time breakdown is as follows:

    merge_working_tree <-- 95%
        unpack_trees <-- 80%
            traverse_trees <-- 50%
            cache_tree_update <-- 17%
            mark_new_skip_worktree <-- 10%

With a large repo, this cost is pronounced.  Using "git checkout -b r"
to create and switch to a new branch costs 166 seconds (all times worst
case with a cold file system cache).

git.c:406               trace: built-in: git 'checkout' '-b' 'r'
read-cache.c:1667       performance: 17.442926555 s: read_index_from
name-hash.c:128         performance: 2.912145231 s: lazy_init_name_hash
read-cache.c:2208       performance: 4.387713335 s: write_locked_index
trace.c:420             performance: 166.458921289 s: git command:
                                        'c:\Users\benpeart\bin\git.exe' 'checkout' '-b' 'r'
Switched to a new branch 'r'

By adding a test to skip the unnecessary call to merge_working_tree in
this case reduces the cost to 16 seconds.

git.c:406               trace: built-in: git 'checkout' '-b' 's'
read-cache.c:1667       performance: 16.100742476 s: read_index_from
trace.c:420             performance: 16.461547867 s: git command: 'c:\Users\benpeart\bin\git.exe' 'checkout' '-b' 's'
Switched to a new branch 's'

Signed-off-by: Ben Peart <benpeart@microsoft.com>
---
 builtin/checkout.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 92 insertions(+)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 8672d07..8b2f428 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -38,6 +38,10 @@ struct checkout_opts {
 	int ignore_skipworktree;
 	int ignore_other_worktrees;
 	int show_progress;
+	/*
+	 * If new checkout options are added, needs_working_tree_merge
+	 * should be updated accordingly.
+	 */
 
 	const char *new_branch;
 	const char *new_branch_force;
@@ -460,11 +464,99 @@ static void setup_branch_path(struct branch_info *branch)
 	branch->path = strbuf_detach(&buf, NULL);
 }
 
+static int needs_working_tree_merge(const struct checkout_opts *opts,
+	const struct branch_info *old,
+	const struct branch_info *new)
+{
+	/*
+	 * We must do the merge if we are actually moving to a new
+	 * commit tree.
+	 */
+	if (!old->commit || !new->commit ||
+		oidcmp(&old->commit->tree->object.oid, &new->commit->tree->object.oid))
+		return 1;
+
+	/*
+	 * opts->patch_mode cannot be used with switching branches so is
+	 * not tested here
+	 */
+
+	/*
+	 * opts->quiet only impacts output so doesn't require a merge
+	 */
+
+	/*
+	 * Honor the explicit request for a three-way merge or to throw away
+	 * local changes
+	 */
+	if (opts->merge || opts->force)
+		return 1;
+
+	/*
+	 * Checking out the requested commit may require updating the working
+	 * directory and index, let the merge handle it.
+	 */
+	if (opts->force_detach)
+		return 1;
+
+	/*
+	 * opts->writeout_stage cannot be used with switching branches so is
+	 * not tested here
+	 */
+
+	/*
+	 * Honor the explicit ignore requests
+	 */
+	if (!opts->overwrite_ignore || opts->ignore_skipworktree ||
+		opts->ignore_other_worktrees)
+		return 1;
+
+	/*
+	 * opts->show_progress only impacts output so doesn't require a merge
+	 */
+
+	/*
+	 * If we're not creating a new branch, by definition we're changing
+	 * the existing one so need to do the merge
+	 */
+	if (!opts->new_branch)
+		return 1;
+
+	/*
+	 * new_branch_force is defined to "create/reset and checkout a branch"
+	 * so needs to go through the merge to do the reset
+	 */
+	if (opts->new_branch_force)
+		return 1;
+
+	/*
+	 * A new orphaned branch requrires the index and the working tree to be
+	 * adjusted to <start_point>
+	 */
+	if (opts->new_orphan_branch)
+		return 1;
+
+	/*
+	 * Remaining variables are not checkout options but used to track state
+	 * that doesn't trigger the need for a merge.
+	 */
+
+	return 0;
+}
+
 static int merge_working_tree(const struct checkout_opts *opts,
 			      struct branch_info *old,
 			      struct branch_info *new,
 			      int *writeout_error)
 {
+	/*
+	 * Optimize the performance of "git checkout -b foo" by avoiding
+	 * the expensive merge, index and working directory updates if they
+	 * are not needed.
+	 */
+	if (!needs_working_tree_merge(opts, old, new))
+		return 0;
+
 	int ret;
 	struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file));
 
-- 
2.10.0.windows.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-09-28 17:52 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-09-28 17:02 [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout Ben Peart
2016-09-28 17:52 ` Junio C Hamano
  -- strict thread matches above, loose matches on Subject: below --
2016-09-13 14:26 Ben Peart
2016-09-13 22:34 ` Junio C Hamano
2016-09-14  6:30   ` Oleg Taranenko
2016-09-14 15:48     ` Junio C Hamano
     [not found]   ` <BL2PR03MB3232D3128A72D4EC9ADC2C6F4F10@BL2PR03MB323.namprd03.prod.outlook.com>
     [not found]     ` <BL2PR03MB323E1B2F810C63CB01AA234F4F30@BL2PR03MB323.namprd03.prod.outlook.com>
2016-09-19 13:18       ` Ben Peart
2016-09-19 16:30         ` Junio C Hamano
2016-09-19 17:03           ` Junio C Hamano
2016-09-21 18:32             ` Ben Peart
2016-09-24 14:28               ` Philip Oakley
2016-09-24 18:26                 ` Junio C Hamano
2016-09-24 19:31                   ` Philip Oakley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).