From: "Ben Peart" <peartben@gmail.com>
To: "'Junio C Hamano'" <gitster@pobox.com>
Cc: <git@vger.kernel.org>, <pclouds@gmail.com>,
"'Ben Peart'" <benpeart@microsoft.com>
Subject: RE: [PATCH v2] checkout: eliminate unnecessary merge for trivial checkout
Date: Mon, 12 Sep 2016 14:12:07 -0400 [thread overview]
Message-ID: <13ef001d20d21$2d2ea840$878bf8c0$@gmail.com> (raw)
In-Reply-To: <xmqq1t0sagcm.fsf@gitster.mtv.corp.google.com>
> -----Original Message-----
> From: Junio C Hamano [mailto:gitster@pobox.com]
> Sent: Friday, September 9, 2016 5:55 PM
> To: Ben Peart <peartben@gmail.com>
> Cc: git@vger.kernel.org; pclouds@gmail.com; Ben Peart
> <benpeart@microsoft.com>
> Subject: Re: [PATCH v2] checkout: eliminate unnecessary merge for trivial
> checkout
>
> Ben Peart <peartben@gmail.com> writes:
>
> > @@ -802,6 +806,87 @@ static void orphaned_commit_warning(struct
> commit *old, struct commit *new)
> > free(refs.objects);
> > }
> >
> > +static int needs_working_tree_merge(const struct checkout_opts *opts,
> > + const struct branch_info *old,
> > + const struct branch_info *new)
> > +{
> > + /*
> > + * We must do the merge if we are actually moving to a new
> > + * commit tree.
> > + */
> > + if (!old->commit || !new->commit ||
> > + oidcmp(&old->commit->tree->object.oid, &new->commit-
> >tree->object.oid))
> > + return 1;
>
> A huge helper function helps it somewhat, compared with the earlier
> unreadable mess ;-).
>
> Are we certain that at this point the commit objects are both parsed and
> their tree->object.oid are both valid?
>
> > + /*
> > + * Honor the explicit request for a three-way merge or to throw away
> > + * local changes
> > + */
> > + if (opts->merge || opts->force)
> > + return 1;
>
> Hmph, "git checkout -m HEAD" wouldn't have to do anything wrt the index
> status, no?
>
> For that matter, neither "git checkout -f HEAD". Unless we rely on
> unpack_trees() to write over the working tree files.
>
> ... me goes and looks, and finds that merge_working_tree()
> indeed does have a logic to do quite different thing when
> "--force" is given.
>
> This makes me wonder if the "merge_working_tree() is expensive, so
> selectively skip calling it" approach is working at a wrong level.
> Wouldn't the merge_working_tree() function itself a better place to do
this
> kind of "we may not have to do the full two-way merge"
> optimization? It already looks at opts and does things differently (e.g.
when
> running with "--force", it does not even call unpack).
> If you can optimize even more by looking at other fields in opts to avoid
> unpack, that would fit better with the structure of the code that we
already
> have.
>
I completely agree that optimizing within merge_working_tree would provide
more opportunities for optimization. I can certainly move the test into
that
function as a first step. I've looked into it a little but came to the
conclusion
that it will be non-trivial to determine how to ensure the minimal work is
done for any arbitrary set of options passed in without breaking something.
While I'd love to see that work done, I just don't have the time to pursue
further
optimizations that may be available at this point in time. There are other
things
(like speeding up status on large repos) I need to work on first.
> > + /*
> > + * Checking out the requested commit may require updating the
> working
> > + * directory and index, let the merge handle it.
> > + */
> > + if (opts->force_detach)
> > + return 1;
>
> This does not make much sense to me. After "git branch -f foo HEAD",
there
> is no difference in what is done to the index and the working directory
> between "git checkout --detach HEAD" and "git checkout foo", is there?
>
I'm attempting to optimize for a single, common path where checkout is
just creating a new branch (ie "git checkout -b foo") to minimize the
possibility that I broke some other path I didn't fully understand.
It is quite possible that there are cases where the index and/or working
directory do not need to be updated or where a merge won't actually
change anything that this test is not optimized for. Perhaps I should
emphasize the "*may* require updating the working directory" in my
comment. Because it *could* happen, I let the code fall back to the
old behavior.
> > + /*
> > + * opts->writeout_stage cannot be used with switching branches so is
> > + * not tested here
> > + */
> > +
> > + /*
> > + * Honor the explicit ignore requests
> > + */
> > + if (!opts->overwrite_ignore || opts->ignore_skipworktree
> > + || opts->ignore_other_worktrees)
> > + return 1;
>
> Style. I think you earlier had
>
> if (a || b ||
> c)
>
> and here you are doing
>
> if (a || b
> || c)
>
> Please pick one and stick to it (I'd pick the former).
Done
>
> > + /*
> > + * If we're not creating a new branch, by definition we're changing
> > + * the existing one so need to do the merge
> > + */
> > + if (!opts->new_branch)
> > + return 1;
>
> Sorry, but I fail to follow that line of thought. Starting from a state
where
> your HEAD points at commit A,
>
> - switching to a detached HEAD pointing at a commit A,
> - switching to an existing branch that already points at the same
> commit A, and
> - force updating an existing branch that was pointing at something
> else to point at the same commit A,
>
> would have the same effect as creating a new branch at commit A and
> switching to it, no? The same comment applies to the remainder of this
> function.
>
> More importantly, merge_working_tree() checks things other than what this
> function is checking. For example, it prevents you from branch-switching
> (whether it is to switch to an existing branch that has the same commit as
the
> current HEAD, to switch to detached HEAD state at the same commit as the
> current HEAD, or to switch to a new branch that points at the same commit
> as the current HEAD) if your index is unmerged (i.e. you are in the middle
of
> a mergy operation).
>
> So my gut feeling is that this:
>
> > + /*
> > + * Optimize the performance of "git checkout foo" by skipping the
call
> > + * to merge_working_tree where possible.
> > + */
> > + if (needs_working_tree_merge(opts, &old, new)) {
> > + ret = merge_working_tree(opts, &old, new,
> &writeout_error);
>
> works at the wrong level. The comment up to 'Optimize the performance of
> "git checkout foo"' may correctly state what we want to achieve, but I
think
> we should do so not with "by skipping the call to", but with "by
optimizing
> merge_working_tree()".
>
I agree that optimizing merge_working_tree could result in even greater
savings and could definitely optimize for more paths/options than this
patch. While I'd love to see that done, I'm also happy to get a 10x
improvement in the common case of creating a new branch.
I'll reroll the patch moving the current optimization into
merge_working_tree and fixing up the style issues you pointed out.
> Thanks.
>
next prev parent reply other threads:[~2016-09-12 18:12 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-09-09 19:25 [PATCH v2] checkout: eliminate unnecessary merge for trivial checkout Ben Peart
2016-09-09 21:55 ` Junio C Hamano
2016-09-12 18:12 ` Ben Peart [this message]
2016-09-12 20:31 ` Junio C Hamano
2016-09-13 12:33 ` Ben Peart
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='13ef001d20d21$2d2ea840$878bf8c0$@gmail.com' \
--to=peartben@gmail.com \
--cc=benpeart@microsoft.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=pclouds@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.