* [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout @ 2016-09-13 14:26 Ben Peart 2016-09-13 22:34 ` Junio C Hamano 0 siblings, 1 reply; 13+ messages in thread From: Ben Peart @ 2016-09-13 14:26 UTC (permalink / raw) To: git; +Cc: gitster, pclouds, peartben, Ben Peart Teach git to avoid unnecessary merge during trivial checkout. When running 'git checkout -b foo' git follows a common code path through the expensive merge_working_tree even when it is unnecessary. As a result, 95% of the time is spent in merge_working_tree doing the 2-way merge between the new and old commit trees that is unneeded. The time breakdown is as follows: merge_working_tree <-- 95% unpack_trees <-- 80% traverse_trees <-- 50% cache_tree_update <-- 17% mark_new_skip_worktree <-- 10% With a large repo, this cost is pronounced. Using "git checkout -b r" to create and switch to a new branch costs 166 seconds (all times worst case with a cold file system cache). git.c:406 trace: built-in: git 'checkout' '-b' 'r' read-cache.c:1667 performance: 17.442926555 s: read_index_from name-hash.c:128 performance: 2.912145231 s: lazy_init_name_hash read-cache.c:2208 performance: 4.387713335 s: write_locked_index trace.c:420 performance: 166.458921289 s: git command: 'c:\Users\benpeart\bin\git.exe' 'checkout' '-b' 'r' Switched to a new branch 'r' By adding a test to skip the unnecessary call to merge_working_tree in this case reduces the cost to 16 seconds. git.c:406 trace: built-in: git 'checkout' '-b' 's' read-cache.c:1667 performance: 16.100742476 s: read_index_from trace.c:420 performance: 16.461547867 s: git command: 'c:\Users\benpeart\bin\git.exe' 'checkout' '-b' 's' Switched to a new branch 's' Signed-off-by: Ben Peart <benpeart@microsoft.com> --- builtin/checkout.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 92 insertions(+) diff --git a/builtin/checkout.c b/builtin/checkout.c index 8672d07..8b2f428 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -38,6 +38,10 @@ struct checkout_opts { int ignore_skipworktree; int ignore_other_worktrees; int show_progress; + /* + * If new checkout options are added, needs_working_tree_merge + * should be updated accordingly. + */ const char *new_branch; const char *new_branch_force; @@ -460,11 +464,99 @@ static void setup_branch_path(struct branch_info *branch) branch->path = strbuf_detach(&buf, NULL); } +static int needs_working_tree_merge(const struct checkout_opts *opts, + const struct branch_info *old, + const struct branch_info *new) +{ + /* + * We must do the merge if we are actually moving to a new + * commit tree. + */ + if (!old->commit || !new->commit || + oidcmp(&old->commit->tree->object.oid, &new->commit->tree->object.oid)) + return 1; + + /* + * opts->patch_mode cannot be used with switching branches so is + * not tested here + */ + + /* + * opts->quiet only impacts output so doesn't require a merge + */ + + /* + * Honor the explicit request for a three-way merge or to throw away + * local changes + */ + if (opts->merge || opts->force) + return 1; + + /* + * Checking out the requested commit may require updating the working + * directory and index, let the merge handle it. + */ + if (opts->force_detach) + return 1; + + /* + * opts->writeout_stage cannot be used with switching branches so is + * not tested here + */ + + /* + * Honor the explicit ignore requests + */ + if (!opts->overwrite_ignore || opts->ignore_skipworktree || + opts->ignore_other_worktrees) + return 1; + + /* + * opts->show_progress only impacts output so doesn't require a merge + */ + + /* + * If we're not creating a new branch, by definition we're changing + * the existing one so need to do the merge + */ + if (!opts->new_branch) + return 1; + + /* + * new_branch_force is defined to "create/reset and checkout a branch" + * so needs to go through the merge to do the reset + */ + if (opts->new_branch_force) + return 1; + + /* + * A new orphaned branch requrires the index and the working tree to be + * adjusted to <start_point> + */ + if (opts->new_orphan_branch) + return 1; + + /* + * Remaining variables are not checkout options but used to track state + * that doesn't trigger the need for a merge. + */ + + return 0; +} + static int merge_working_tree(const struct checkout_opts *opts, struct branch_info *old, struct branch_info *new, int *writeout_error) { + /* + * Optimize the performance of "git checkout -b foo" by avoiding + * the expensive merge, index and working directory updates if they + * are not needed. + */ + if (!needs_working_tree_merge(opts, old, new)) + return 0; + int ret; struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file)); -- 2.10.0.windows.1 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout 2016-09-13 14:26 [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout Ben Peart @ 2016-09-13 22:34 ` Junio C Hamano 2016-09-14 6:30 ` Oleg Taranenko [not found] ` <BL2PR03MB3232D3128A72D4EC9ADC2C6F4F10@BL2PR03MB323.namprd03.prod.outlook.com> 0 siblings, 2 replies; 13+ messages in thread From: Junio C Hamano @ 2016-09-13 22:34 UTC (permalink / raw) To: Ben Peart; +Cc: git, pclouds, Ben Peart Ben Peart <peartben@gmail.com> writes: > +static int needs_working_tree_merge(const struct checkout_opts *opts, > + const struct branch_info *old, > + const struct branch_info *new) > +{ > +... > +} I do not think I need to repeat the same remarks on the conditions in this helper, which hasn't changed since v2. Many "comments" in the code do not explain why skipping is justified, or what they claim to check looks to me just plain wrong. For example, there is /* * If we're not creating a new branch, by definition we're changing * the existing one so need to do the merge */ if (!opts->new_branch) return 1; but "git checkout" (no other argument) hits this condition. It disables the most trivial optimization opportunity, because we are not "creating". "By definition, we're changing"? Really? Not quite. If you disable this bogus check, "git checkout" (no other argument) would be allowed to skip the merge_working_tree(), and that in turn reveals another case that the helper is not checking when unpack_trees() MUST be called. Note: namely, when sparse checkout is in effect, switching from HEAD to HEAD can nuke existing working tree files outside the sparse pattern -- YUCK! See penultimate test in t1011 for an example. This yuckiness is not your fault, but needs_working_tree_merge() logic you added needs to refrain from skipping unpack_trees() call when sparse thing is in effect. I'd expect "git checkout -b foo" instead of "git checkout" (no other argument) would fail to honor the sparse thing and reveal this bug, because the above bogus "!opts->new_branch" check will not protect you for that case. In other words, these random series of "if (...) return 1" are bugs hiding other real bugs and we need to reason about which ones are bugs that are hiding what other bugs that are not covered by this function. As Peff said earlier for v1, this is still an unreadable mess. We need to figure out a way to make sure we are skipping on the right condition and not accidentally hiding a bug of failing to check the right condition. I offhand do not have a good suggestion on this; sorry. > static int merge_working_tree(const struct checkout_opts *opts, > struct branch_info *old, > struct branch_info *new, > int *writeout_error) > { > + /* > + * Optimize the performance of "git checkout -b foo" by avoiding > + * the expensive merge, index and working directory updates if they > + * are not needed. > + */ > + if (!needs_working_tree_merge(opts, old, new)) > + return 0; > + > int ret; > struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file)); With the change you made at the beginning of this function, it no longer compiles with -Wdecl-after-stmt, but that is the smallest of the problems. It is a small step in the right direction to move the call to the helper from the caller to this function, but it is a bit too small. Notice that the lines after the above context look like this: hold_locked_index(lock_file, 1); if (read_cache_preload(NULL) < 0) return error(_("index file corrupt")); resolve_undo_clear(); if (opts->force) { ret = reset_tree(new->commit->tree, opts, 1, writeout_error); if (ret) return ret; } else { struct tree_desc trees[2]; ... I would have expected that the check goes inside the "else" thing that actually does a two-tree merge, and the helper loses the check with opts->force, at least. That would still be a change smaller than desired, but at least a meaningful improvement compared to the previous one. As I have already pointed out, in the "else" clause there is a check "is the index free of conflicted entries? if so error out", and that must be honored in !opt->force case, no matter what your needs_working_tree_merge() says. I also was hoping that you would notice, when you were told about the unmerged check, by reading the remainder of the merge_working_tree(), that we need to call show_local_changes() when we are not doing force and when we are not quiet---returning early like the above patch will never be able to call that one downstream in the function. Regardless of what the actual checks end up to be, the right place to do this "optimization" would look more like: builtin/checkout.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/builtin/checkout.c b/builtin/checkout.c index 2b50a49..a6b9e17 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -508,14 +508,19 @@ static int merge_working_tree(const struct checkout_opts *opts, topts.dir->flags |= DIR_SHOW_IGNORED; setup_standard_excludes(topts.dir); } + + if ( we know we can skip the unpack ) { + ret = 0; + } else { tree = parse_tree_indirect(old->commit ? old->commit->object.oid.hash : EMPTY_TREE_SHA1_BIN); init_tree_desc(&trees[0], tree->buffer, tree->size); tree = parse_tree_indirect(new->commit->object.oid.hash); init_tree_desc(&trees[1], tree->buffer, tree->size); - ret = unpack_trees(2, trees, &topts); + } + if (ret == -1) { /* * Unpack couldn't do a trivial merge; either I'd think. Note that the determination of "we can skip" would involve knowing the object names of the two trees involved, so for performance reasons, some of the parse-tree calls may have to come before the call to "do we know we can skip?", but that does not fundamentally change the basic code structure. Thanks. ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout 2016-09-13 22:34 ` Junio C Hamano @ 2016-09-14 6:30 ` Oleg Taranenko 2016-09-14 15:48 ` Junio C Hamano [not found] ` <BL2PR03MB3232D3128A72D4EC9ADC2C6F4F10@BL2PR03MB323.namprd03.prod.outlook.com> 1 sibling, 1 reply; 13+ messages in thread From: Oleg Taranenko @ 2016-09-14 6:30 UTC (permalink / raw) To: Junio C Hamano; +Cc: Ben Peart, git, pclouds, Ben Peart Sorry for bothering, why not introduce a brand new option like git checkout -b foo --skip-worktree-merge for such rare optimization use case? On Wed, Sep 14, 2016 at 12:34 AM, Junio C Hamano <gitster@pobox.com> wrote: > Ben Peart <peartben@gmail.com> writes: > >> +static int needs_working_tree_merge(const struct checkout_opts *opts, >> + const struct branch_info *old, >> + const struct branch_info *new) >> +{ >> +... >> +} > > I do not think I need to repeat the same remarks on the conditions > in this helper, which hasn't changed since v2. Many "comments" in > the code do not explain why skipping is justified, or what they > claim to check looks to me just plain wrong. > > For example, there is > > /* > * If we're not creating a new branch, by definition we're changing > * the existing one so need to do the merge > */ > if (!opts->new_branch) > return 1; > > but "git checkout" (no other argument) hits this condition. It > disables the most trivial optimization opportunity, because we are > not "creating". > > "By definition, we're changing"? Really? Not quite. > > If you disable this bogus check, "git checkout" (no other argument) > would be allowed to skip the merge_working_tree(), and that in turn > reveals another case that the helper is not checking when > unpack_trees() MUST be called. > > Note: namely, when sparse checkout is in effect, switching from > HEAD to HEAD can nuke existing working tree files outside the > sparse pattern -- YUCK! See penultimate test in t1011 for > an example. > > This yuckiness is not your fault, but needs_working_tree_merge() > logic you added needs to refrain from skipping unpack_trees() call > when sparse thing is in effect. I'd expect "git checkout -b foo" > instead of "git checkout" (no other argument) would fail to honor > the sparse thing and reveal this bug, because the above bogus > "!opts->new_branch" check will not protect you for that case. > > In other words, these random series of "if (...) return 1" are bugs > hiding other real bugs and we need to reason about which ones are > bugs that are hiding what other bugs that are not covered by this > function. As Peff said earlier for v1, this is still an unreadable > mess. We need to figure out a way to make sure we are skipping on > the right condition and not accidentally hiding a bug of failing to > check the right condition. I offhand do not have a good suggestion > on this; sorry. > >> static int merge_working_tree(const struct checkout_opts *opts, >> struct branch_info *old, >> struct branch_info *new, >> int *writeout_error) >> { >> + /* >> + * Optimize the performance of "git checkout -b foo" by avoiding >> + * the expensive merge, index and working directory updates if they >> + * are not needed. >> + */ >> + if (!needs_working_tree_merge(opts, old, new)) >> + return 0; >> + >> int ret; >> struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file)); > > With the change you made at the beginning of this function, it no > longer compiles with -Wdecl-after-stmt, but that is the smallest of > the problems. > > It is a small step in the right direction to move the call to the > helper from the caller to this function, but it is a bit too small. > > Notice that the lines after the above context look like this: > > hold_locked_index(lock_file, 1); > if (read_cache_preload(NULL) < 0) > return error(_("index file corrupt")); > > resolve_undo_clear(); > if (opts->force) { > ret = reset_tree(new->commit->tree, opts, 1, writeout_error); > if (ret) > return ret; > } else { > struct tree_desc trees[2]; > ... > > I would have expected that the check goes inside the "else" thing > that actually does a two-tree merge, and the helper loses the check > with opts->force, at least. That would still be a change smaller > than desired, but at least a meaningful improvement compared to the > previous one. As I have already pointed out, in the "else" clause > there is a check "is the index free of conflicted entries? if so > error out", and that must be honored in !opt->force case, no matter > what your needs_working_tree_merge() says. I also was hoping that > you would notice, when you were told about the unmerged check, by > reading the remainder of the merge_working_tree(), that we need to > call show_local_changes() when we are not doing force and when we > are not quiet---returning early like the above patch will never be > able to call that one downstream in the function. > > Regardless of what the actual checks end up to be, the right place > to do this "optimization" would look more like: > > builtin/checkout.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/builtin/checkout.c b/builtin/checkout.c > index 2b50a49..a6b9e17 100644 > --- a/builtin/checkout.c > +++ b/builtin/checkout.c > @@ -508,14 +508,19 @@ static int merge_working_tree(const struct checkout_opts *opts, > topts.dir->flags |= DIR_SHOW_IGNORED; > setup_standard_excludes(topts.dir); > } > + > + if ( we know we can skip the unpack ) { > + ret = 0; > + } else { > tree = parse_tree_indirect(old->commit ? > old->commit->object.oid.hash : > EMPTY_TREE_SHA1_BIN); > init_tree_desc(&trees[0], tree->buffer, tree->size); > tree = parse_tree_indirect(new->commit->object.oid.hash); > init_tree_desc(&trees[1], tree->buffer, tree->size); > - > ret = unpack_trees(2, trees, &topts); > + } > + > if (ret == -1) { > /* > * Unpack couldn't do a trivial merge; either > > I'd think. Note that the determination of "we can skip" would > involve knowing the object names of the two trees involved, so for > performance reasons, some of the parse-tree calls may have to come > before the call to "do we know we can skip?", but that does not > fundamentally change the basic code structure. > > Thanks. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout 2016-09-14 6:30 ` Oleg Taranenko @ 2016-09-14 15:48 ` Junio C Hamano 0 siblings, 0 replies; 13+ messages in thread From: Junio C Hamano @ 2016-09-14 15:48 UTC (permalink / raw) To: Oleg Taranenko; +Cc: Ben Peart, git, pclouds, Ben Peart Oleg Taranenko <olegtaranenko@gmail.com> writes: > Sorry for bothering, why not introduce a brand new option like git > checkout -b foo --skip-worktree-merge for such rare optimization use > case? I am not sure what problem such a new option solves. How would you describe and explain what "--skip-worktree-merge" option to the end user? ^ permalink raw reply [flat|nested] 13+ messages in thread
[parent not found: <BL2PR03MB3232D3128A72D4EC9ADC2C6F4F10@BL2PR03MB323.namprd03.prod.outlook.com>]
[parent not found: <BL2PR03MB323E1B2F810C63CB01AA234F4F30@BL2PR03MB323.namprd03.prod.outlook.com>]
* RE: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout [not found] ` <BL2PR03MB323E1B2F810C63CB01AA234F4F30@BL2PR03MB323.namprd03.prod.outlook.com> @ 2016-09-19 13:18 ` Ben Peart 2016-09-19 16:30 ` Junio C Hamano 0 siblings, 1 reply; 13+ messages in thread From: Ben Peart @ 2016-09-19 13:18 UTC (permalink / raw) To: 'Junio C Hamano'; +Cc: pclouds, peartben, 'Ben Peart', git Let me see if I can better explain what Im trying to accomplish with this patch. "git checkout -b foo" (without -f -m or <start_point>) is defined in the manual as being a shortcut for/equivalent to: (1a) "git branch foo" (1b) "git checkout foo" However, it has been our experience in our observed use cases and all the existing git tests, that it can be treated as equivalent to: (2a) "git branch foo" (2b) "git symbolic-ref HEAD refs/heads/foo" That is, the common perception (use case) is to just create a new branch "foo" (pointing at the current commit) and point HEAD at it WITHOUT making any changes to the index or worktree. However, the (1b) command has "git reset" connotations in that it should examine and manipulate the trees, index, and worktree in the expectation that there MIGHT be work to do. Since this additional work in (1b) takes minutes on large repos and (2b) takes less than a second, my intent was to identify the conditions that this additional work will have no affect and thereby avoid it. Alternatively, was the "-b" option just created as a shortcut only to avoid calling the separate "git branch foo" command and we should not think about the common perception and usage? More comments inline... > -----Original Message----- > From: Junio C Hamano [mailto:gitster@pobox.com] > Sent: Tuesday, September 13, 2016 6:35 PM > To: Ben Peart <mailto:peartben@gmail.com> > Cc: mailto:git@vger.kernel.org; mailto:pclouds@gmail.com; Ben Peart > <mailto:Ben.Peart@microsoft.com> > Subject: Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial > checkout > > Ben Peart <mailto:peartben@gmail.com> writes: > > > +static int needs_working_tree_merge(const struct checkout_opts *opts, > > + const struct branch_info *old, > > + const struct branch_info *new) > > +{ > > +... > > +} > > I do not think I need to repeat the same remarks on the conditions in this > helper, which hasn't changed since v2. Many "comments" in the code do not > explain why skipping is justified, or what they claim to check looks to me just > plain wrong. > > For example, there is > > /* > * If we're not creating a new branch, by definition we're changing > * the existing one so need to do the merge > */ > if (!opts->new_branch) > return 1; > > but "git checkout" (no other argument) hits this condition. It disables the > most trivial optimization opportunity, because we are not "creating". > Disabling the optimization for "git checkout" with no argument was intentional. This command does not create a new branch but instead, performs a "soft reset" which will update the index and working directory to reflect changes to the sparse-checkout (for example). If this was not disabled, many tests fail as they expect this behavior. Because "git checkout" does not actually change the refs, if we skipped the merge/index/working directory update, this command becomes a no-op. > "By definition, we're changing"? Really? Not quite. > What I was attempting to communicate is that if we aren't creating a new branch any changes or updates will happen in the existing branch. Since that could only be updating the index and working directory, we don't want to skip those steps or we've defeated any purpose in running the command. > If you disable this bogus check, "git checkout" (no other argument) would be > allowed to skip the merge_working_tree(), and that in turn reveals another > case that the helper is not checking when > unpack_trees() MUST be called. > > Note: namely, when sparse checkout is in effect, switching from > HEAD to HEAD can nuke existing working tree files outside the > sparse pattern -- YUCK! See penultimate test in t1011 for > an example. > > This yuckiness is not your fault, but needs_working_tree_merge() logic you > added needs to refrain from skipping unpack_trees() call when sparse thing > is in effect. I'd expect "git checkout -b foo" > instead of "git checkout" (no other argument) would fail to honor the sparse > thing and reveal this bug, because the above bogus "!opts->new_branch" > check will not protect you for that case. > It is correct that this optimization will skip updating the tree to honor any changes to the sparse-checkout in the case of creating a new branch. Unfortunately, I don't know of any way to detect the changes other than actually doing all the work to update the skip work tree bit in the index. If this behavior is required, then this optimization will need to check if sparse-checkout is enabled and skip the optimization just in case there have been changes. > In other words, these random series of "if (...) return 1" are bugs hiding > other real bugs and we need to reason about which ones are bugs that are > hiding what other bugs that are not covered by this function. As Peff said > earlier for v1, this is still an unreadable mess. We need to figure out a way to > make sure we are skipping on the right condition and not accidentally hiding > a bug of failing to check the right condition. I offhand do not have a good > suggestion on this; sorry. > Beyond this code review process and testing, I don't know how else we make sure we're caught all the conditions where we are OK skipping some of the steps. Any change has inherent risk - a change in behavior even more so. > > static int merge_working_tree(const struct checkout_opts *opts, > > struct branch_info *old, > > struct branch_info *new, > > int *writeout_error) > > { > > + /* > > + * Optimize the performance of "git checkout -b foo" by avoiding > > + * the expensive merge, index and working directory updates if they > > + * are not needed. > > + */ > > + if (!needs_working_tree_merge(opts, old, new)) > > + return 0; > > + > > int ret; > > struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file)); > > With the change you made at the beginning of this function, it no longer > compiles with -Wdecl-after-stmt, but that is the smallest of the problems. I apologize, I didn't realize this was a requirement. It built and passed all existing tests on Windows but I will reorder the declarations to prevent causing issues with other platforms/compilers. > > It is a small step in the right direction to move the call to the helper from the > caller to this function, but it is a bit too small. > > Notice that the lines after the above context look like this: > > hold_locked_index(lock_file, 1); > if (read_cache_preload(NULL) < 0) > return error(_("index file corrupt")); > > resolve_undo_clear(); > if (opts->force) { > ret = reset_tree(new->commit->tree, opts, 1, > writeout_error); > if (ret) > return ret; > } else { > struct tree_desc trees[2]; > ... > > I would have expected that the check goes inside the "else" thing that > actually does a two-tree merge, and the helper loses the check with opts- > >force, at least. That would still be a change smaller than desired, but at > least a meaningful improvement compared to the previous one. I'll restructure it that way. > As I have > already pointed out, in the "else" clause there is a check "is the index free of > conflicted entries? if so error out", and that must be honored in !opt->force > case, no matter what your needs_working_tree_merge() says. Given we're not merging trees, updating the index, or work tree, why do we need to error out in this case? We aren't attempting this optimization if they pass "-m." If there are conflicted entries that haven't been fixed, they will still exist. We're essentially just creating a new reference for the existing commit/index/work tree. > I also was > hoping that you would notice, when you were told about the unmerged > check, by reading the remainder of the merge_working_tree(), that we need > to call show_local_changes() when we are not doing force and when we are > not quiet---returning early like the above patch will never be able to call that > one downstream in the function. It is a good point that my optimization skipped the call to show_local_changes. Thanks for catching that, I've fixed it for my next iteration. > > Regardless of what the actual checks end up to be, the right place to do this > "optimization" would look more like: > > builtin/checkout.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/builtin/checkout.c b/builtin/checkout.c index 2b50a49..a6b9e17 > 100644 > --- a/builtin/checkout.c > +++ b/builtin/checkout.c > @@ -508,14 +508,19 @@ static int merge_working_tree(const struct > checkout_opts *opts, > topts.dir->flags |= DIR_SHOW_IGNORED; > setup_standard_excludes(topts.dir); > } > + > + if ( we know we can skip the unpack ) { > + ret = 0; > + } else { > tree = parse_tree_indirect(old->commit ? > old->commit- > >object.oid.hash : > EMPTY_TREE_SHA1_BIN); > init_tree_desc(&trees[0], tree->buffer, tree->size); > tree = parse_tree_indirect(new->commit- > >object.oid.hash); > init_tree_desc(&trees[1], tree->buffer, tree->size); > - > ret = unpack_trees(2, trees, &topts); > + } > + > if (ret == -1) { > /* > * Unpack couldn't do a trivial merge; either > I'll restructure it to be like you suggest above however, given we will not be merging the tress, we won't have any index changes to write out. I will also skip the calls to cache_tree_update and write_locked_index. > I'd think. Note that the determination of "we can skip" would involve > knowing the object names of the two trees involved, so for performance > reasons, some of the parse-tree calls may have to come before the call to > "do we know we can skip?", but that does not fundamentally change the > basic code structure. > > Thanks. I don't understand why we'd need to know the object names of the two trees given we have the IDs. What did you have in mind that would need those? ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout 2016-09-19 13:18 ` Ben Peart @ 2016-09-19 16:30 ` Junio C Hamano 2016-09-19 17:03 ` Junio C Hamano 0 siblings, 1 reply; 13+ messages in thread From: Junio C Hamano @ 2016-09-19 16:30 UTC (permalink / raw) To: Ben Peart; +Cc: pclouds, 'Ben Peart', git "Ben Peart" <peartben@gmail.com> writes: > Let me see if I can better explain what I’m trying to accomplish with this > patch. > > "git checkout -b foo" (without -f -m or <start_point>) is defined in the > manual as being a shortcut for/equivalent to: > > (1a) "git branch foo" > (1b) "git checkout foo" > > However, it has been our experience in our observed use cases and all the > existing git tests, that it can be treated as equivalent to: > > (2a) "git branch foo" > (2b) "git symbolic-ref HEAD refs/heads/foo" > > That is, the common perception (use case) is to just create a new branch > "foo" (pointing at the current commit) and point HEAD at it WITHOUT making > any changes to the index or worktree. > > However, the (1b) command has "git reset" connotations in that it should > examine and manipulate the trees, index, and worktree in the expectation > that there MIGHT be work to do. > > Since this additional work in (1b) takes minutes on large repos and (2b) > takes less than a second, my intent was to identify the conditions that this > additional work will have no affect and thereby avoid it. > > Alternatively, was the "-b" option just created as a shortcut only to avoid > calling the separate "git branch foo" command and we should not think about > the common perception and usage? If you are trying to change the definition of "checkout -b" from 1 to 2 above, that is a completely different issue. I thought this was an attempt to optimize for the performance without changing the behaviour. So if you did not apologize like this... > It is correct that this optimization will skip updating the tree to honor > any changes to the sparse-checkout in the case of creating a new branch. > Unfortunately, I don't know of any way to detect the changes other than > actually doing all the work to update the skip work tree bit in the index. ... but insisted that skipping the yucky sparse-checkout adjustment in this case was an intended behaviour change, I would have understood (not necessarily agreed, though) what you were trying to do. > Beyond this code review process and testing, I don't know how else we make > sure we're caught all the conditions where we are OK skipping some of the > steps. Any change has inherent risk - a change in behavior even more so. At least we made one-step progress today. I now know that you are trying to change the behaviour, but I didn't know that last week, when I was primarily reacting that your claim that this was performance thing and assuming you meant no change in behaviour, but there was clearly behaviour change, and it was apparent that the denseness of the code made it almost impossible to see if there are unintended changes. I am still not sure if I like the change of what "checkout -b" is this late in the game, though. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout 2016-09-19 16:30 ` Junio C Hamano @ 2016-09-19 17:03 ` Junio C Hamano 2016-09-21 18:32 ` Ben Peart 0 siblings, 1 reply; 13+ messages in thread From: Junio C Hamano @ 2016-09-19 17:03 UTC (permalink / raw) To: Ben Peart; +Cc: pclouds, 'Ben Peart', git Junio C Hamano <gitster@pobox.com> writes: >> "git checkout -b foo" (without -f -m or <start_point>) is defined in the >> manual as being a shortcut for/equivalent to: >> >> (1a) "git branch foo" >> (1b) "git checkout foo" >> >> However, it has been our experience in our observed use cases and all the >> existing git tests, that it can be treated as equivalent to: >> >> (2a) "git branch foo" >> (2b) "git symbolic-ref HEAD refs/heads/foo" >> ... > > I am still not sure if I like the change of what "checkout -b" is > this late in the game, though. Having said all that. I do see the merit of having a shorthand way to invoke your 2 above. It is just that I am not convinced that it is the best way to achieve that goal to redefine what "git checkout -b <new-name>" (no other parameters) does. ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout 2016-09-19 17:03 ` Junio C Hamano @ 2016-09-21 18:32 ` Ben Peart 2016-09-24 14:28 ` Philip Oakley 0 siblings, 1 reply; 13+ messages in thread From: Ben Peart @ 2016-09-21 18:32 UTC (permalink / raw) To: Junio C Hamano, Ben Peart; +Cc: pclouds@gmail.com, git@vger.kernel.org I understand the reluctance to change the existing behavior of the "git checkout -b <new-name>" command. I see this as a tradeoff between taking advantage of the muscle memory for the existing command and coming up with a new shortcut command and training people to use it instead. The fact that all the use cases we've observed and all the git test cases actually produce the same results but significantly faster with that change in behavior made me hope we could redefine the command to take advantage of the muscle memory. That said, you're much more on the frontline of receiving negative feedback about doing that than I am. :) How would you like to proceed? Ben -----Original Message----- From: Junio C Hamano [mailto:gitster@pobox.com] Sent: Monday, September 19, 2016 1:04 PM To: Ben Peart <peartben@gmail.com> Cc: pclouds@gmail.com; Ben Peart <Ben.Peart@microsoft.com>; git@vger.kernel.org Subject: Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout Junio C Hamano <gitster@pobox.com> writes: >> "git checkout -b foo" (without -f -m or <start_point>) is defined in >> the manual as being a shortcut for/equivalent to: >> >> (1a) "git branch foo" >> (1b) "git checkout foo" >> >> However, it has been our experience in our observed use cases and all >> the existing git tests, that it can be treated as equivalent to: >> >> (2a) "git branch foo" >> (2b) "git symbolic-ref HEAD refs/heads/foo" >> ... > > I am still not sure if I like the change of what "checkout -b" is this > late in the game, though. Having said all that. I do see the merit of having a shorthand way to invoke your 2 above. It is just that I am not convinced that it is the best way to achieve that goal to redefine what "git checkout -b <new-name>" (no other parameters) does. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout 2016-09-21 18:32 ` Ben Peart @ 2016-09-24 14:28 ` Philip Oakley 2016-09-24 18:26 ` Junio C Hamano 0 siblings, 1 reply; 13+ messages in thread From: Philip Oakley @ 2016-09-24 14:28 UTC (permalink / raw) To: Ben Peart, Junio C Hamano; +Cc: pclouds, git Ben, Using a 'bottom / in-line' posting flow is much preferred, which may require some manual editing[1], hopefully I have it about right... Philip -- [1] this is massaged and mangled Outlook Express, sometimes one has to work with the tools at hand... From: "Ben Peart" <Ben.Peart@microsoft.com> > From: Junio C Hamano [mailto:gitster@pobox.com] > > Junio C Hamano <gitster@pobox.comwrites: > > > > >"git checkout -b foo" (without -f -m or <start_point>) is defined in > > >the manual as being a shortcut for/equivalent to: > > > > > > (1a) "git branch foo" > > > (1b) "git checkout foo" > > > > > >However, it has been our experience in our observed use cases and all > > >the existing git tests, that it can be treated as equivalent to: > > > > > > (2a) "git branch foo" > > > (2b) "git symbolic-ref HEAD refs/heads/foo" > > >... > > > > > I am still not sure if I like the change of what "checkout -b" is this > > late in the game, though. > > > > Having said all that. > > > > I do see the merit of having a shorthand way to invoke your 2 above. > > It is just that I am not convinced that it is the best way to achieve > > that goal to redefine what "git checkout -b <new-name>" (no other > > parameters) does. > > > --- > > I understand the reluctance to change the existing behavior of the "git > checkout -b <new-name>" command. > > I see this as a tradeoff between taking advantage of the muscle memory for > the existing command and coming up with a new shortcut command and > training people to use it instead. > > The fact that all the use cases we've observed and all the git test cases > actually produce the same results but significantly faster with that > change in behavior made me hope we could redefine the command to take > advantage of the muscle memory. > > That said, you're much more on the frontline of receiving negative > feedback about doing that than I am. :) How would you like to proceed? The discussion can often feel harsh [2], especially if there is accidental 'talking past each other', which is usually because of differing perspectives on the issues. I didn't see an initial confirmation as to what the issue really was. You indicated the symptom ('a long checkout time'), but then we missed out on hard facts and example repos, so that the issue was replicable. Is there an example public repo that you can show the issue on? (or anonymise a private one - there is a script for that [3]) Can you give local timings (and indication of the hardware and software versions used for the test, and if appropriate, network setup)? I know at my work that sometime our home drives are multiply mapped to H:, a C:/homedrive directory and a $netshare/me network directory via the Microsofy roaming profiles, and if there is hard synchronization (or whatever term is appropriate) there can be sudden slowdowns as local C: writes drop from 'instant' to 'forever'... Is there anything special about the repos that have the delays? Is it a local process issue that causes the repos to develop those symptoms (see above about not being sure why you have these issues), in which case it could be local self inflicted issues, or it could be that you have a regulatory issue for that domain that requires such symptoms, which would shift the problem from a 'don't do that' response to a 'hmm, how to cover this'. At the moment there is the simple workaround of an alias that executes that two step command dance to achieve what you needed, and Junio has outlined the issues he needed to be covered from his maintainer perspective (e.g. the detection of sparse checkouts). Confirming the root causes would help in setting a baseline. I hope that is of help - I'd seen that the discussion had gone quiet. -- Philip [2] Been there, feel your pain. It's not in any way malicious, just a reflection that email can be a poor medium for such discussions. [3] https://public-inbox.org/git/20140827170127.GA6138@peff.net/ suggest that the `git fast-export --anonymize --all` maybe the approach. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout 2016-09-24 14:28 ` Philip Oakley @ 2016-09-24 18:26 ` Junio C Hamano 2016-09-24 19:31 ` Philip Oakley 0 siblings, 1 reply; 13+ messages in thread From: Junio C Hamano @ 2016-09-24 18:26 UTC (permalink / raw) To: Philip Oakley; +Cc: Ben Peart, pclouds, git "Philip Oakley" <philipoakley@iee.org> writes: >> > >"git checkout -b foo" (without -f -m or <start_point>) is defined in >> > >the manual as being a shortcut for/equivalent to: >> > > >> > > (1a) "git branch foo" >> > > (1b) "git checkout foo" >> > > >> > >However, it has been our experience in our observed use cases and all >> > >the existing git tests, that it can be treated as equivalent to: >> > > >> > > (2a) "git branch foo" >> > > (2b) "git symbolic-ref HEAD refs/heads/foo" >> > >... >> > > >> > I am still not sure if I like the change of what "checkout -b" is this >> > late in the game, though. >> >> ... >> That said, you're much more on the frontline of receiving negative >> feedback about doing that than I am. :) How would you like to >> proceed? > > I didn't see an initial confirmation as to what the issue really > was. You indicated the symptom ('a long checkout time'), but then we > missed out on hard facts and example repos, so that the issue was > replicable. I took it as a given, trivial and obvious optimization opportunity, that it is wasteful having to traverse two trees to consolidate and reflect their differences into the working tree when we know upfront that these two trees are identical, no matter what the overhead for doing so is. > At the moment there is the simple workaround of an alias that executes > that two step command dance to achieve what you needed, and Junio has > outlined the issues he needed to be covered from his maintainer > perspective (e.g. the detection of sparse checkouts). Confirming the > root causes would help in setting a baseline. > > I hope that is of help - I'd seen that the discussion had gone quiet. Some of the problems I have are: (1) "git checkout -b NEW", "git checkout", "git checkout HEAD^0" and "git checkout HEAD" (no other parameters to any of them) ought to give identical index and working tree. It is too confusing to leave subtly different results that will lead to hard to diagnose bugs for only one of them. (2) The proposed log message talks only about "performance optimization", while the purpose of the change is more about changing the definition of what "git checkout -b NEW" is from "git branch NEW && git checkout NEW" to "git branch NEW && git symbolic-ref HEAD refs/heads/NEW". The explanation in a Ben's later message <007401d21278$445eba80$cd1c2f80$@gmail.com> does a much better job contrasting the two. (3) I identified only one difference as an example sufficient to point out why the patch provided is not a pure optimization but behaviour change. Fixing that example alone to avoid change in the behaviour is trivial (see if the "info/sparse-checkout" file is present and refrain from skipping the proper checkout), but a much larger problem is that I do not know (and Ben does not, I suspect) know what other behaviour changes the patch is introducing, and worse, the checks are sufficiently dense too detailed and intimate to the implementation of unpack_trees() that it is impossible for anybody to make sure the exceptions defined in this patch and updates to other parts of the system will be kept in sync. So my inclination at this point, unless we see somebody invents a clever way to solve (3), is that any change that violates (1), i.e. as long as the patch does "Are we doing '-b NEW'? Then we do something subtly different", is not acceptable, and solving (3) in a maintainable way smells like quite a hard thing to do. But it would be ideal if (3) is solved cleanly, as we will then not have to worry about changing behaviour at all and can apply the optimization for all of the four cases equally. As a side effect, that approach would solve problem (2) above. If we were to punt on keeping the sanity (1) and introduce a subtly different "create a new branch and point the HEAD at it", an easier way out may be be one of 1. a totally new command, e.g. "git branch-switch NEW" that takes only a single argument and no other "checkout" options, or 2. a new option to "git checkout" that takes _ONLY_ a single argument and incompatible with any other option or command line argument, or 3. an alias that does "git branch" followed by "git symbolic-ref". Neither of the first two sounds palatable, though. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout 2016-09-24 18:26 ` Junio C Hamano @ 2016-09-24 19:31 ` Philip Oakley 0 siblings, 0 replies; 13+ messages in thread From: Philip Oakley @ 2016-09-24 19:31 UTC (permalink / raw) To: Junio C Hamano; +Cc: Ben Peart, pclouds, git Hi Junio, From: "Junio C Hamano" <gitster@pobox.com> > "Philip Oakley" <philipoakley@iee.org> writes: > >>> > >"git checkout -b foo" (without -f -m or <start_point>) is defined in >>> > >the manual as being a shortcut for/equivalent to: >>> > > >>> > > (1a) "git branch foo" >>> > > (1b) "git checkout foo" >>> > > >>> > >However, it has been our experience in our observed use cases and all >>> > >the existing git tests, that it can be treated as equivalent to: >>> > > >>> > > (2a) "git branch foo" >>> > > (2b) "git symbolic-ref HEAD refs/heads/foo" >>> > >... >>> > > >>> > I am still not sure if I like the change of what "checkout -b" is this >>> > late in the game, though. >>> >>> ... >>> That said, you're much more on the frontline of receiving negative >>> feedback about doing that than I am. :) How would you like to >>> proceed? >> >> I didn't see an initial confirmation as to what the issue really >> was. You indicated the symptom ('a long checkout time'), but then we >> missed out on hard facts and example repos, so that the issue was >> replicable. > > I took it as a given, trivial and obvious optimization opportunity, > that it is wasteful having to traverse two trees to consolidate and > reflect their differences into the working tree when we know upfront > that these two trees are identical, no matter what the overhead for > doing so is. I agree, and I believe Ben agrees. > >> At the moment there is the simple workaround of an alias that executes >> that two step command dance to achieve what you needed, and Junio has >> outlined the issues he needed to be covered from his maintainer >> perspective (e.g. the detection of sparse checkouts). Confirming the >> root causes would help in setting a baseline. >> >> I hope that is of help - I'd seen that the discussion had gone quiet. > > Some of the problems I have are: > > (1) "git checkout -b NEW", "git checkout", "git checkout HEAD^0" > and "git checkout HEAD" (no other parameters to any of them) > ought to give identical index and working tree. It is too > confusing to leave subtly different results that will lead to > hard to diagnose bugs for only one of them. > > (2) The proposed log message talks only about "performance > optimization", > while the purpose of the change is more > about > changing the definition Here I think is the misunderstanding. His purpose is NOT to change the definition (IIUC). As I read the message you reference below (and Ben's other messages), I understood that he was trying to achieve what you said (i.e. optimise the trivial and obvious opportunity) of selecting for the common case (underlying conditions) where the two command sequences are identical. If the selected case / conditions is not identical then it is defined wrongly... I suspect that it was Ben's 'soft' explanation that allowed the discussion to diverge. > of what "git checkout -b > NEW" is from > "git branch NEW && git checkout NEW" to "git branch NEW && git > symbolic-ref HEAD refs/heads/NEW". The explanation in a Ben's > later message <007401d21278$445eba80$cd1c2f80$@gmail.com> does > a much better job contrasting the two. > > (3) I identified only one difference as an example sufficient to > point out why the patch provided is not a pure optimization but > behaviour change. Fixing that example alone to avoid change in > the behaviour is trivial (see if the "info/sparse-checkout" > file is present and refrain from skipping the proper checkout), This is probably the point Ben needs to take on board to narrow the conditions down. There may be others. > but a much larger problem is that I do not know (and Ben does > not, I suspect) know what other behaviour changes the patch is > introducing, and worse, the checks are sufficiently dense too > detailed and intimate to the implementation of unpack_trees() > that it is impossible for anybody to make sure the exceptions > defined in this patch and updates to other parts of the system > will be kept in sync. I did not believe he was proposing such a change to behaviour, hence his difficulty in responding (or at least that is my perception). I.e. he was digging a hole in the wrong place. It is possible that he had accidentally introduced a behavious change, and having failed to explictly say "This patch (should) produces no behavious change", which then continued to re-inforce the misunderstanding. > > So my inclination at this point, unless we see somebody invents a > clever way to solve (3), is that any change that violates (1), > i.e. as long as the patch does "Are we doing '-b NEW'? Then we do > something subtly different", is not acceptable, and solving (3) in a > maintainable way smells like quite a hard thing to do. But it would > be ideal if (3) is solved cleanly, as we will then not have to worry > about changing behaviour at all and can apply the optimization for > all of the four cases equally. As a side effect, that approach > would solve problem (2) above. > > If we were to punt on keeping the sanity (1) and introduce a subtly > different "create a new branch and point the HEAD at it", an easier > way out may be be one of > > 1. a totally new command, e.g. "git branch-switch NEW" that takes > only a single argument and no other "checkout" options, or > > 2. a new option to "git checkout" that takes _ONLY_ a single > argument and incompatible with any other option or command line > argument, or > > 3. an alias that does "git branch" followed by "git symbolic-ref". > > Neither of the first two sounds palatable, though. It will need Ben to come back and clarify, if he did, or did not, want any behaviour change (beyond speed of action;-) Thanks Philip ^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout @ 2016-09-28 17:02 Ben Peart 2016-09-28 17:52 ` Junio C Hamano 0 siblings, 1 reply; 13+ messages in thread From: Ben Peart @ 2016-09-28 17:02 UTC (permalink / raw) To: git Cc: Ben Peart, pclouds, Jeff Hostetler, philipoakley, 'Junio C Hamano' Resending > -----Original Message----- > From: git-owner@vger.kernel.org [mailto:git-owner@vger.kernel.org] On > Behalf Of Philip Oakley > Sent: Saturday, September 24, 2016 3:31 PM > To: Junio C Hamano <gitster@pobox.com> > Cc: Ben Peart <Ben.Peart@microsoft.com>; pclouds@gmail.com; > git@vger.kernel.org > Subject: Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial > checkout > > Hi Junio, > > From: "Junio C Hamano" <gitster@pobox.com> > > "Philip Oakley" <philipoakley@iee.org> writes: > > > >>> > >"git checkout -b foo" (without -f -m or <start_point>) is defined > >>> > >in the manual as being a shortcut for/equivalent to: > >>> > > > >>> > > (1a) "git branch foo" > >>> > > (1b) "git checkout foo" > >>> > > > >>> > >However, it has been our experience in our observed use cases and > >>> > >all the existing git tests, that it can be treated as equivalent to: > >>> > > > >>> > > (2a) "git branch foo" > >>> > > (2b) "git symbolic-ref HEAD refs/heads/foo" > >>> > >... > >>> > > > >>> > I am still not sure if I like the change of what "checkout -b" is > >>> > this late in the game, though. > >>> > >>> ... > >>> That said, you're much more on the frontline of receiving negative > >>> feedback about doing that than I am. :) How would you like to > >>> proceed? > >> > >> I didn't see an initial confirmation as to what the issue really was. > >> You indicated the symptom ('a long checkout time'), but then we > >> missed out on hard facts and example repos, so that the issue was > >> replicable. > > > > I took it as a given, trivial and obvious optimization opportunity, > > that it is wasteful having to traverse two trees to consolidate and > > reflect their differences into the working tree when we know upfront > > that these two trees are identical, no matter what the overhead for > > doing so is. > > I agree, and I believe Ben agrees. > Correct. In my original patch request I put more specific information on the impact this optimization has in our specific case (reducing the cost from 166 seconds to 16 seconds). > > > >> At the moment there is the simple workaround of an alias that > >> executes that two step command dance to achieve what you needed, and > >> Junio has outlined the issues he needed to be covered from his > >> maintainer perspective (e.g. the detection of sparse checkouts). > >> Confirming the root causes would help in setting a baseline. > >> > >> I hope that is of help - I'd seen that the discussion had gone quiet. > > > > Some of the problems I have are: > > > > (1) "git checkout -b NEW", "git checkout", "git checkout HEAD^0" > > and "git checkout HEAD" (no other parameters to any of them) > > ought to give identical index and working tree. It is too > > confusing to leave subtly different results that will lead to > > hard to diagnose bugs for only one of them. > > > > (2) The proposed log message talks only about "performance > > optimization", > > > while the purpose of the change is more > > about > > changing the definition > > Here I think is the misunderstanding. His purpose is NOT to change the > definition (IIUC). As I read the message you reference below (and Ben's other > messages), I understood that he was trying to achieve what you said (i.e. > optimise the trivial and obvious opportunity) of selecting for the common > case (underlying conditions) where the two command sequences are > identical. If the selected case / conditions is not identical then it is defined > wrongly... > > I suspect that it was Ben's 'soft' explanation that allowed the discussion to > diverge. > I'm unaccustomed to doing reviews like this via email so have been struggling with how to most effectively communicate about the proposed change. I appreciate any help and understanding as I go through this for the first time. My intention was not to change the users expected results which I believe are to "create a new branch and switch to it." We reinforce that expectation with the output of the command which completes with the text "Switched to a new branch 'foo'" > > > of what "git checkout -b > > NEW" is from > > "git branch NEW && git checkout NEW" to "git branch NEW && git > > symbolic-ref HEAD refs/heads/NEW". The explanation in a Ben's > > later message <007401d21278$445eba80$cd1c2f80$@gmail.com> does > > a much better job contrasting the two. > > > > (3) I identified only one difference as an example sufficient to > > point out why the patch provided is not a pure optimization but > > behaviour change. Fixing that example alone to avoid change in > > the behaviour is trivial (see if the "info/sparse-checkout" > > file is present and refrain from skipping the proper checkout), > > This is probably the point Ben needs to take on board to narrow the > conditions down. There may be others. > The fact that "git checkout -b NEW" updates the index and as a result reflects any changes in the sparse-checkout and the issue Junio pointed out earlier about not calling show_local_changes at the end of merge_working_tree are the only difference in behavior I am aware of. Both of these are easily rectified. That said, given we are skipping huge amounts of work by no longer merging the commit trees, generating a new index, and merging the local modifications in the working tree, it is possible that there are other behavior changes I'm just not aware of. > > but a much larger problem is that I do not know (and Ben does > > not, I suspect) know what other behaviour changes the patch is > > introducing, and worse, the checks are sufficiently dense too > > detailed and intimate to the implementation of unpack_trees() > > that it is impossible for anybody to make sure the exceptions > > defined in this patch and updates to other parts of the system > > will be kept in sync. > > I did not believe he was proposing such a change to behaviour, hence his > difficulty in responding (or at least that is my perception). I.e. he was > digging a hole in the wrong place. > > It is possible that he had accidentally introduced a behavious change, and > having failed to explictly say "This patch (should) produces no behavious > change", which then continued to re-inforce the misunderstanding. > > > > > So my inclination at this point, unless we see somebody invents a > > clever way to solve (3), is that any change that violates (1), > > i.e. as long as the patch does "Are we doing '-b NEW'? Then we do > > something subtly different", is not acceptable, and solving (3) in a > > maintainable way smells like quite a hard thing to do. But it would > > be ideal if (3) is solved cleanly, as we will then not have to worry > > about changing behaviour at all and can apply the optimization for > > all of the four cases equally. As a side effect, that approach > > would solve problem (2) above. > > > > If we were to punt on keeping the sanity (1) and introduce a subtly > > different "create a new branch and point the HEAD at it", an easier > > way out may be be one of > > > > 1. a totally new command, e.g. "git branch-switch NEW" that takes > > only a single argument and no other "checkout" options, or > > > > 2. a new option to "git checkout" that takes _ONLY_ a single > > argument and incompatible with any other option or command line > > argument, or > > > > 3. an alias that does "git branch" followed by "git symbolic-ref". > > > > Neither of the first two sounds palatable, though. > > It will need Ben to come back and clarify, if he did, or did not, want any > behaviour change (beyond speed of action;-) > There is a subtlety here in what is meant by "any behavior change." I did not want to change the users expectations of what this command is used for. The only noticeable behavior change should only be that it sped up by an order of magnitude. To get that speed up, there is a change in behavior from git's perspective as it is no longer doing a bunch of work it used to do which is what is saving the time. I was aware that skipping the commit merge/new index/merge working tree meant that "git checkout NEW" would no longer update these to reflect any potential changes to the sparse-checkout file. To determine if this would change the results the user was *expecting*, I searched the web and found that all the instructions I could locate that taught people how to update the index/working tree after making changes to the sparse-checkout file instructed them to use "git read-tree -mu HEAD." I didn't find any that told people to use "git checkout -b NEW" Finally, when I made the optimization to skip these steps I then verified that the test suite still passed all tests. I realize that there is not 100% coverage of tests but I thought it was a good indication that none of them were impacted by this optimization. I've tried to think of a way to solve (3) in a more maintainable way but have not been able to come up with anything. Ultimately, to ensure are only applying the optimization in this specific case, we have to test to make sure other options don't require the extra steps. I'm open to suggestions! I'm going to be out for the next 2 weeks so will be unable to respond to activity on the thread but a co-worker who has been involved will be responsive to feedback and rolling any new versions of the patch. Thanks, Ben ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout 2016-09-28 17:02 Ben Peart @ 2016-09-28 17:52 ` Junio C Hamano 0 siblings, 0 replies; 13+ messages in thread From: Junio C Hamano @ 2016-09-28 17:52 UTC (permalink / raw) To: Ben Peart; +Cc: git, Ben Peart, pclouds, Jeff Hostetler, philipoakley "Ben Peart" <peartben@gmail.com> writes: > The fact that "git checkout -b NEW" updates the index and as a > result reflects any changes in the sparse-checkout and the issue > Junio pointed out earlier about not calling show_local_changes > at the end of merge_working_tree are the only difference in behavior > I am aware of. Both of these are easily rectified. > > That said, given we are skipping huge amounts of work by no longer > merging the commit trees, generating a new index, and merging the > local modifications in the working tree, it is possible that there are > other behavior changes I'm just not aware of. That is OK. It is not ok to leave such bugs at the end of the development before the topic is merged to 'master' to be delivered to the end users, but you do not have to fight alone to produce a perfect piece of code with your first attempt. That's what the reviews and testing period are for. If you are shooting for the same behaviour, then that is much better than "make 'checkout -b NEW' be equivalent to a sequence of update-ref && symbolic-ref, which is different from others", which was the second explanation you gave earlier. I am much happier with that goal. But if that is the case, I really do not see any point of singling out "-b NEW" case. The following property MUST be kept: (1) "git checkout -b NEW", "git checkout", "git checkout HEAD^0" and "git checkout HEAD" (no other parameters to any of them) ought to give identical index and working tree. It is too confusing to leave subtly different results that will lead to hard to diagnose bugs for only one of them. That would make the "do we skip unpack_trees() call?" decision a lot simpler to make, I would suspect. We only need to see "are the two trees we would fed unpack_trees() the same as HEAD's tree?" and do not have to look at new_branch and other irrelevant things at all. What happens in the ref namespace is immaterial, as making or skipping an unpack_trees() call would not affect anything other than the resulting index and the working tree. If we want to keep that sparse-checkout wart, we would also need to see if the control file sparse-checkout keeps in $GIT_DIR/ exists, but the result will be much simpler set of rules, and would hopefully help remove the "the optimization kicks in following logic that is an unreviewable-mess" issue. ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2016-09-28 17:52 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-09-13 14:26 [PATCH v3] checkout: eliminate unnecessary merge for trivial checkout Ben Peart 2016-09-13 22:34 ` Junio C Hamano 2016-09-14 6:30 ` Oleg Taranenko 2016-09-14 15:48 ` Junio C Hamano [not found] ` <BL2PR03MB3232D3128A72D4EC9ADC2C6F4F10@BL2PR03MB323.namprd03.prod.outlook.com> [not found] ` <BL2PR03MB323E1B2F810C63CB01AA234F4F30@BL2PR03MB323.namprd03.prod.outlook.com> 2016-09-19 13:18 ` Ben Peart 2016-09-19 16:30 ` Junio C Hamano 2016-09-19 17:03 ` Junio C Hamano 2016-09-21 18:32 ` Ben Peart 2016-09-24 14:28 ` Philip Oakley 2016-09-24 18:26 ` Junio C Hamano 2016-09-24 19:31 ` Philip Oakley -- strict thread matches above, loose matches on Subject: below -- 2016-09-28 17:02 Ben Peart 2016-09-28 17:52 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).