* [PATCH] subtree: fix split processing with multiple subtrees present @ 2023-09-18 20:05 Zach FettersMoore via GitGitGadget 2023-09-18 23:31 ` Junio C Hamano ` (2 more replies) 0 siblings, 3 replies; 31+ messages in thread From: Zach FettersMoore via GitGitGadget @ 2023-09-18 20:05 UTC (permalink / raw) To: git; +Cc: Zach FettersMoore, Zach FettersMoore From: Zach FettersMoore <zach.fetters@apollographql.com> When there are multiple subtrees present in a repository and they are all using 'git subtree split', the 'split' command can take a significant (and constantly growing) amount of time to run even when using the '--rejoin' flag. This is due to the fact that when processing commits to determine the last known split to start from when looking for changes, if there has been a split/merge done from another subtree there will be 2 split commits, one mainline and one subtree, for the second subtree that are part of the processing. The non-mainline subtree split commit will cause the processing to always need to search the entire history of the given subtree as part of its processing even though those commits are totally irrelevant to the current subtree split being run. In the diagram below, 'M' represents the mainline repo branch, 'A' represents one subtree, and 'B' represents another. M3 and B1 represent a split commit for subtree B that was created from commit M4. M2 and A1 represent a split commit made from subtree A that was also created based on changes back to and including M4. M1 represents new changes to the repo, in this scenario if you try to run a 'git subtree split --rejoin' for subtree B, commits M1, M2, and A1, will be included in the processing of changes for the new split commit since the last split/rejoin for subtree B was at M3. The issue is that by having A1 included in this processing the command ends up needing to processing every commit down tree A even though none of that is needed or relevant to the current command and result. M1 | \ \ M2 | | | A1 | M3 | | | | B1 M4 | | So this commit makes a change to the processing of commits for the split command in order to ignore non-mainline commits from other subtrees such as A1 in the diagram by adding a new function 'should_ignore_subtree_commit' which is called during 'process_split_commit'. This allows the split/rejoin processing to still function as expected but removes all of the unnecessary processing that takes place currently which greatly inflates the processing time. Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com> --- subtree: fix split processing with multiple subtrees present When there are multiple subtrees in a repo and git subtree split --rejoin is being used for the subtrees, the processing of commits for a new split can take a significant (and constantly growing) amount of time because the split commits from other subtrees cause the processing to have to scan the entire history of the other subtree(s). This patch filters out the other subtree split commits that are unnecessary for the split commit processing. Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1587%2FBobaFetters%2Fzf%2Fmulti-subtree-processing-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1587/BobaFetters/zf/multi-subtree-processing-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/1587 contrib/subtree/git-subtree.sh | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh index e0c5d3b0de6..e9250dfb019 100755 --- a/contrib/subtree/git-subtree.sh +++ b/contrib/subtree/git-subtree.sh @@ -778,12 +778,29 @@ ensure_valid_ref_format () { die "fatal: '$1' does not look like a ref" } +# Usage: check if a commit from another subtree should be ignored from processing for splits +should_ignore_subtree_commit () { + if [ "$(git log -1 --grep="git-subtree-dir:" $1)" ] + then + if [[ -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -z "$(git log -1 --grep="git-subtree-dir: $dir$" $1)" ]] + then + return 0 + fi + fi + return 1 +} + # Usage: process_split_commit REV PARENTS process_split_commit () { assert test $# = 2 local rev="$1" local parents="$2" + if should_ignore_subtree_commit $rev + then + return + fi + if test $indent -eq 0 then revcount=$(($revcount + 1)) base-commit: bda494f4043963b9ec9a1ecd4b19b7d1cd9a0518 -- gitgitgadget ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH] subtree: fix split processing with multiple subtrees present 2023-09-18 20:05 [PATCH] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget @ 2023-09-18 23:31 ` Junio C Hamano 2023-09-19 1:04 ` Junio C Hamano 2023-09-22 16:25 ` [PATCH v2 0/2] " Zach FettersMoore via GitGitGadget 2 siblings, 0 replies; 31+ messages in thread From: Junio C Hamano @ 2023-09-18 23:31 UTC (permalink / raw) To: Zach FettersMoore via GitGitGadget; +Cc: git, Zach FettersMoore "Zach FettersMoore via GitGitGadget" <gitgitgadget@gmail.com> writes: > contrib/subtree/git-subtree.sh | 17 +++++++++++++++++ > 1 file changed, 17 insertions(+) > > diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh > index e0c5d3b0de6..e9250dfb019 100755 > --- a/contrib/subtree/git-subtree.sh > +++ b/contrib/subtree/git-subtree.sh > @@ -778,12 +778,29 @@ ensure_valid_ref_format () { > die "fatal: '$1' does not look like a ref" > } > > +# Usage: check if a commit from another subtree should be ignored from processing for splits > +should_ignore_subtree_commit () { > + if [ "$(git log -1 --grep="git-subtree-dir:" $1)" ] > + then > + if [[ -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -z "$(git log -1 --grep="git-subtree-dir: $dir$" $1)" ]] > + then > + return 0 > + fi > + fi > + return 1 > +} > > # Usage: process_split_commit REV PARENTS > process_split_commit () { > assert test $# = 2 > local rev="$1" > local parents="$2" > > + if should_ignore_subtree_commit $rev > + then > + return > + fi > + Please do not violate Documentation/CodingGuidelines for our shell scripted Porcelain, even if it is a script in contrib/ and also please avoid bash-isms. Also doesn't "subtree" have its own test? If this change is a fix for some problem(s), can we have a test or two that demonstrate how the current code without the patch is broken? Thanks. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] subtree: fix split processing with multiple subtrees present 2023-09-18 20:05 [PATCH] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget 2023-09-18 23:31 ` Junio C Hamano @ 2023-09-19 1:04 ` Junio C Hamano 2023-10-26 19:59 ` Zach FettersMoore 2023-09-22 16:25 ` [PATCH v2 0/2] " Zach FettersMoore via GitGitGadget 2 siblings, 1 reply; 31+ messages in thread From: Junio C Hamano @ 2023-09-19 1:04 UTC (permalink / raw) To: Zach FettersMoore via GitGitGadget; +Cc: git, Zach FettersMoore "Zach FettersMoore via GitGitGadget" <gitgitgadget@gmail.com> writes: > In the diagram below, 'M' represents the mainline repo branch, 'A' > represents one subtree, and 'B' represents another. M3 and B1 represent > a split commit for subtree B that was created from commit M4. M2 and A1 > represent a split commit made from subtree A that was also created > based on changes back to and including M4. M1 represents new changes to > the repo, in this scenario if you try to run a 'git subtree split > --rejoin' for subtree B, commits M1, M2, and A1, will be included in > the processing of changes for the new split commit since the last > split/rejoin for subtree B was at M3. The issue is that by having A1 > included in this processing the command ends up needing to processing > every commit down tree A even though none of that is needed or relevant > to the current command and result. > > M1 > | \ \ > M2 | | > | A1 | > M3 | | > | | B1 > M4 | | The above paragraph explains which different things you drew in the diagram are representing, but it is not clear how they relate to each other. Do they for example depict parent-child commit relationship? What are the wide gaps between these three tracks and what are the short angled lines leaning to the left near the tip? Is the time/topology flowing from bottom to top? > diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh > index e0c5d3b0de6..e9250dfb019 100755 > --- a/contrib/subtree/git-subtree.sh > +++ b/contrib/subtree/git-subtree.sh > @@ -778,12 +778,29 @@ ensure_valid_ref_format () { > die "fatal: '$1' does not look like a ref" > } > > +# Usage: check if a commit from another subtree should be ignored from processing for splits Way overlong line. Please split them accordingly. I won't comment on what CodingGuidelines tells us already, in this review, but have a few comments here: > +should_ignore_subtree_commit () { > + if [ "$(git log -1 --grep="git-subtree-dir:" $1)" ] > + then > + if [[ -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -z "$(git log -1 --grep="git-subtree-dir: $dir$" $1)" ]] Here $dir is a free variable that comes from outside. The caller does not supply it as a parameter to this function (and the caller does not receive it as its parameter from its caller). Yet the file as a whole seems to liberally make assignments to it ("git grep dir=" on the file counts 7 assignments). Are we sure we are looking for the right $dir in this particular grep? Side note: I am not familiar with this part of the code at all, so do not take it as "here is a bug", but more as "this smells error prone." Also can $dir have regular expressions special characters? "The existing code and new code alike, git-subtree is not prepared to handle directory names with RE special characters well at all, so do not use them if you do not want your history broken" is an acceptable answer. The caller of this function process_split_commit is cmd_split and process_split_commit (hence this function) is called repeatedly inside a loop. This function makes a traversal over the entire history for each and every iteration in "good" cases where there is no 'mainline' or 'subtree-dir' commits for the given $dir. I wonder if it is more efficient to enumerate all commits that hits these grep criteria in the cmd_split before it starts to call process_split_commit repeatedly. If it knows which commit can be ignored beforehand, it can skip and not call process_split_commit, no? > + then > + return 0 > + fi > + fi > + return 1 > +} > + > # Usage: process_split_commit REV PARENTS > process_split_commit () { > assert test $# = 2 > local rev="$1" > local parents="$2" These seem to assume that $1 and $2 can have $IFS in them, so shouldn't ... > + if should_ignore_subtree_commit $rev ... this call too enclose $rev inside a pair of double-quotes for consistency? We know the loop in the cmd_split that calls this function is reading from "rev-list --parents" and $rev is a 40-hex commit object name (and $parents can have more than one 40-hex commit object names separated with SP), so it is safe to leave $rev unquoted, but it pays to be consistent to help make the code more readable. > + then > + return > + fi > + > if test $indent -eq 0 > then > revcount=$(($revcount + 1)) > > base-commit: bda494f4043963b9ec9a1ecd4b19b7d1cd9a0518 ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH] subtree: fix split processing with multiple subtrees present 2023-09-19 1:04 ` Junio C Hamano @ 2023-10-26 19:59 ` Zach FettersMoore 0 siblings, 0 replies; 31+ messages in thread From: Zach FettersMoore @ 2023-10-26 19:59 UTC (permalink / raw) To: Junio C Hamano; +Cc: Zach FettersMoore via GitGitGadget, git > Please do not violate Documentation/CodingGuidelines for our shell > scripted Porcelain, even if it is a script in contrib/ and also >please avoid bash-isms. I believe I have resolved the CodingGuidelines issues. > Also doesn't "subtree" have its own test? If this change is a fix > for some problem(s), can we have a test or two that demonstrate how > the current code without the patch is broken? I was able to add a test that validates against some of the metrics that are tracked when running a split for processing commits. Validated that before my fix the test fails, and after my fix the test passes. >> In the diagram below, 'M' represents the mainline repo branch, 'A' >> represents one subtree, and 'B' represents another. M3 and B1 represent >> a split commit for subtree B that was created from commit M4. M2 and A1 >> represent a split commit made from subtree A that was also created >> based on changes back to and including M4. M1 represents new changes to >> the repo, in this scenario if you try to run a 'git subtree split >> --rejoin' for subtree B, commits M1, M2, and A1, will be included in >> the processing of changes for the new split commit since the last >> split/rejoin for subtree B was at M3. The issue is that by having A1 >> included in this processing the command ends up needing to processing >> every commit down tree A even though none of that is needed or relevant >> to the current command and result. >> >> M1 >> | \ \ >> M2 | | >> | A1 | >> M3 | | >> | | B1 >> M4 | | > The above paragraph explains which different things you drew in the > diagram are representing, but it is not clear how they relate to > each other. Do they for example depict parent-child commit > relationship? What are the wide gaps between these three tracks and > what are the short angled lines leaning to the left near the tip? > Is the time/topology flowing from bottom to top? I am realizing I made a few mistakes with trying to illustrate the diagram which I will attempt to make more clear below. As for the 3 columns in the diagram, 'M' represents the mainline branch of the repo being developed in, while column 'A' represents the history of a subtree 'A' included in the repo, and column 'B' also represents the history of a subtree 'B' in the repo. The diagram attempts to illustrate when a 'git subtree split --rejoin' is used, that there is a commit made in the subtrees history, and that is then merged into the mainline repo branch. M1 | | M2 --- | | A1 | | M3 ---------- | | | B1 M4 | | Hopefully that helps better illustrate the state of the repo before the new 'git subtree split --rejoin' attempt and why it results in the described issue. >> +should_ignore_subtree_commit () { >> + if [ "$(git log -1 --grep="git-subtree-dir:" $1)" ] >> + then >> + if [[ -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -z "$(git log -1 --grep="git-subtree-dir: $dir$" $1)" ]] > > Here $dir is a free variable that comes from outside. The caller > does not supply it as a parameter to this function (and the caller > does not receive it as its parameter from its caller). Yet the file > as a whole seems to liberally make assignments to it ("git grep dir=" > on the file counts 7 assignments). Are we sure we are looking for > the right $dir in this particular grep? > > Side note: I am not familiar with this part of the code at > all, so do not take it as "here is a bug", but more as "this > smells error prone." From my testing and what I see for '$dir' usage in the 'cmd_split' function which leads to this code it is the correct '$dir', although I see your point about it being reassigned in different places which makes it error prone. I switched this to use the command line argument '$arg_prefix' since the subtree prefix passed into the command is what we actually want in this case so we can filter out commits from other subtrees. > Also can $dir have regular expressions special characters? "The > existing code and new code alike, git-subtree is not prepared to > handle directory names with RE special characters well at all, so > do not use them if you do not want your history broken" is an > acceptable answer. As far as I can tell from looking at the code (which I only recently started using) the '$dir' which is based on the subtree prefix is not setup to handle this. > The caller of this function process_split_commit is cmd_split and > process_split_commit (hence this function) is called repeatedly > inside a loop. This function makes a traversal over the entire > history for each and every iteration in "good" cases where there is > no 'mainline' or 'subtree-dir' commits for the given $dir. > > I wonder if it is more efficient to enumerate all commits that hits > these grep criteria in the cmd_split before it starts to call > process_split_commit repeatedly. If it knows which commit can be > ignored beforehand, it can skip and not call process_split_commit, > no? Moved this functionality into the 'cmd_split' function as suggested. >> + then >> + return 0 >> + fi >> + fi >> + return 1 >> +} >> + >> # Usage: process_split_commit REV PARENTS >> process_split_commit () { >> assert test $# = 2 >> local rev="$1" >> local parents="$2" > > These seem to assume that $1 and $2 can have $IFS in them, so > shouldn't ... > >> + if should_ignore_subtree_commit $rev > > ... this call too enclose $rev inside a pair of double-quotes for > consistency? We know the loop in the cmd_split that calls this > function is reading from "rev-list --parents" and $rev is a 40-hex > commit object name (and $parents can have more than one 40-hex > commit object names separated with SP), so it is safe to leave $rev > unquoted, but it pays to be consistent to help make the code more > readable. Updated this for consistency On Mon, Sep 18, 2023 at 9:04 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Zach FettersMoore via GitGitGadget" <gitgitgadget@gmail.com> > writes: > > > In the diagram below, 'M' represents the mainline repo branch, 'A' > > represents one subtree, and 'B' represents another. M3 and B1 represent > > a split commit for subtree B that was created from commit M4. M2 and A1 > > represent a split commit made from subtree A that was also created > > based on changes back to and including M4. M1 represents new changes to > > the repo, in this scenario if you try to run a 'git subtree split > > --rejoin' for subtree B, commits M1, M2, and A1, will be included in > > the processing of changes for the new split commit since the last > > split/rejoin for subtree B was at M3. The issue is that by having A1 > > included in this processing the command ends up needing to processing > > every commit down tree A even though none of that is needed or relevant > > to the current command and result. > > > > M1 > > | \ \ > > M2 | | > > | A1 | > > M3 | | > > | | B1 > > M4 | | > > The above paragraph explains which different things you drew in the > diagram are representing, but it is not clear how they relate to > each other. Do they for example depict parent-child commit > relationship? What are the wide gaps between these three tracks and > what are the short angled lines leaning to the left near the tip? > Is the time/topology flowing from bottom to top? > > > diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh > > index e0c5d3b0de6..e9250dfb019 100755 > > --- a/contrib/subtree/git-subtree.sh > > +++ b/contrib/subtree/git-subtree.sh > > @@ -778,12 +778,29 @@ ensure_valid_ref_format () { > > die "fatal: '$1' does not look like a ref" > > } > > > > +# Usage: check if a commit from another subtree should be ignored from processing for splits > > Way overlong line. Please split them accordingly. I won't comment > on what CodingGuidelines tells us already, in this review, but have > a few comments here: > > > +should_ignore_subtree_commit () { > > + if [ "$(git log -1 --grep="git-subtree-dir:" $1)" ] > > + then > > + if [[ -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -z "$(git log -1 --grep="git-subtree-dir: $dir$" $1)" ]] > > Here $dir is a free variable that comes from outside. The caller > does not supply it as a parameter to this function (and the caller > does not receive it as its parameter from its caller). Yet the file > as a whole seems to liberally make assignments to it ("git grep dir=" > on the file counts 7 assignments). Are we sure we are looking for > the right $dir in this particular grep? > > Side note: I am not familiar with this part of the code at > all, so do not take it as "here is a bug", but more as "this > smells error prone." > > Also can $dir have regular expressions special characters? "The > existing code and new code alike, git-subtree is not prepared to > handle directory names with RE special characters well at all, so > do not use them if you do not want your history broken" is an > acceptable answer. > > The caller of this function process_split_commit is cmd_split and > process_split_commit (hence this function) is called repeatedly > inside a loop. This function makes a traversal over the entire > history for each and every iteration in "good" cases where there is > no 'mainline' or 'subtree-dir' commits for the given $dir. > > I wonder if it is more efficient to enumerate all commits that hits > these grep criteria in the cmd_split before it starts to call > process_split_commit repeatedly. If it knows which commit can be > ignored beforehand, it can skip and not call process_split_commit, > no? > > > + then > > + return 0 > > + fi > > + fi > > + return 1 > > +} > > + > > # Usage: process_split_commit REV PARENTS > > process_split_commit () { > > assert test $# = 2 > > local rev="$1" > > local parents="$2" > > These seem to assume that $1 and $2 can have $IFS in them, so > shouldn't ... > > > + if should_ignore_subtree_commit $rev > > ... this call too enclose $rev inside a pair of double-quotes for > consistency? We know the loop in the cmd_split that calls this > function is reading from "rev-list --parents" and $rev is a 40-hex > commit object name (and $parents can have more than one 40-hex > commit object names separated with SP), so it is safe to leave $rev > unquoted, but it pays to be consistent to help make the code more > readable. > > > + then > > + return > > + fi > > + > > if test $indent -eq 0 > > then > > revcount=$(($revcount + 1)) > > > > base-commit: bda494f4043963b9ec9a1ecd4b19b7d1cd9a0518 ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v2 0/2] subtree: fix split processing with multiple subtrees present 2023-09-18 20:05 [PATCH] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget 2023-09-18 23:31 ` Junio C Hamano 2023-09-19 1:04 ` Junio C Hamano @ 2023-09-22 16:25 ` Zach FettersMoore via GitGitGadget 2023-09-22 16:25 ` [PATCH v2 1/2] " Zach FettersMoore via GitGitGadget ` (2 more replies) 2 siblings, 3 replies; 31+ messages in thread From: Zach FettersMoore via GitGitGadget @ 2023-09-22 16:25 UTC (permalink / raw) To: git; +Cc: Zach FettersMoore When there are multiple subtrees in a repo and git subtree split --rejoin is being used for the subtrees, the processing of commits for a new split can take a significant (and constantly growing) amount of time because the split commits from other subtrees cause the processing to have to scan the entire history of the other subtree(s). This patch filters out the other subtree split commits that are unnecessary for the split commit processing. Zach FettersMoore (2): subtree: fix split processing with multiple subtrees present subtree: changing location of commit ignore processing contrib/subtree/git-subtree.sh | 29 ++++++++++++++++++++++++++++- 1 file changed, 28 insertions(+), 1 deletion(-) base-commit: bda494f4043963b9ec9a1ecd4b19b7d1cd9a0518 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1587%2FBobaFetters%2Fzf%2Fmulti-subtree-processing-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1587/BobaFetters/zf/multi-subtree-processing-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/1587 Range-diff vs v1: 1: 43175154a82 = 1: 43175154a82 subtree: fix split processing with multiple subtrees present -: ----------- > 2: d6811daf7cf subtree: changing location of commit ignore processing -- gitgitgadget ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v2 1/2] subtree: fix split processing with multiple subtrees present 2023-09-22 16:25 ` [PATCH v2 0/2] " Zach FettersMoore via GitGitGadget @ 2023-09-22 16:25 ` Zach FettersMoore via GitGitGadget 2023-09-22 16:25 ` [PATCH v2 2/2] subtree: changing location of commit ignore processing Zach FettersMoore via GitGitGadget 2023-09-29 20:32 ` [PATCH v3 0/3] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget 2 siblings, 0 replies; 31+ messages in thread From: Zach FettersMoore via GitGitGadget @ 2023-09-22 16:25 UTC (permalink / raw) To: git; +Cc: Zach FettersMoore, Zach FettersMoore From: Zach FettersMoore <zach.fetters@apollographql.com> When there are multiple subtrees present in a repository and they are all using 'git subtree split', the 'split' command can take a significant (and constantly growing) amount of time to run even when using the '--rejoin' flag. This is due to the fact that when processing commits to determine the last known split to start from when looking for changes, if there has been a split/merge done from another subtree there will be 2 split commits, one mainline and one subtree, for the second subtree that are part of the processing. The non-mainline subtree split commit will cause the processing to always need to search the entire history of the given subtree as part of its processing even though those commits are totally irrelevant to the current subtree split being run. In the diagram below, 'M' represents the mainline repo branch, 'A' represents one subtree, and 'B' represents another. M3 and B1 represent a split commit for subtree B that was created from commit M4. M2 and A1 represent a split commit made from subtree A that was also created based on changes back to and including M4. M1 represents new changes to the repo, in this scenario if you try to run a 'git subtree split --rejoin' for subtree B, commits M1, M2, and A1, will be included in the processing of changes for the new split commit since the last split/rejoin for subtree B was at M3. The issue is that by having A1 included in this processing the command ends up needing to processing every commit down tree A even though none of that is needed or relevant to the current command and result. M1 | \ \ M2 | | | A1 | M3 | | | | B1 M4 | | So this commit makes a change to the processing of commits for the split command in order to ignore non-mainline commits from other subtrees such as A1 in the diagram by adding a new function 'should_ignore_subtree_commit' which is called during 'process_split_commit'. This allows the split/rejoin processing to still function as expected but removes all of the unnecessary processing that takes place currently which greatly inflates the processing time. Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com> --- contrib/subtree/git-subtree.sh | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh index e0c5d3b0de6..e9250dfb019 100755 --- a/contrib/subtree/git-subtree.sh +++ b/contrib/subtree/git-subtree.sh @@ -778,12 +778,29 @@ ensure_valid_ref_format () { die "fatal: '$1' does not look like a ref" } +# Usage: check if a commit from another subtree should be ignored from processing for splits +should_ignore_subtree_commit () { + if [ "$(git log -1 --grep="git-subtree-dir:" $1)" ] + then + if [[ -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -z "$(git log -1 --grep="git-subtree-dir: $dir$" $1)" ]] + then + return 0 + fi + fi + return 1 +} + # Usage: process_split_commit REV PARENTS process_split_commit () { assert test $# = 2 local rev="$1" local parents="$2" + if should_ignore_subtree_commit $rev + then + return + fi + if test $indent -eq 0 then revcount=$(($revcount + 1)) -- gitgitgadget ^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v2 2/2] subtree: changing location of commit ignore processing 2023-09-22 16:25 ` [PATCH v2 0/2] " Zach FettersMoore via GitGitGadget 2023-09-22 16:25 ` [PATCH v2 1/2] " Zach FettersMoore via GitGitGadget @ 2023-09-22 16:25 ` Zach FettersMoore via GitGitGadget 2023-09-29 20:32 ` [PATCH v3 0/3] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget 2 siblings, 0 replies; 31+ messages in thread From: Zach FettersMoore via GitGitGadget @ 2023-09-22 16:25 UTC (permalink / raw) To: git; +Cc: Zach FettersMoore, Zach FettersMoore From: Zach FettersMoore <zach.fetters@apollographql.com> Based on feedback from original commit: -Updated the location of check whether a commit should be ignored during split processing -Updated code to better fit coding guidelines Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com> --- contrib/subtree/git-subtree.sh | 30 ++++++++++++++++++++---------- 1 file changed, 20 insertions(+), 10 deletions(-) diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh index e9250dfb019..e69991a9d80 100755 --- a/contrib/subtree/git-subtree.sh +++ b/contrib/subtree/git-subtree.sh @@ -778,11 +778,13 @@ ensure_valid_ref_format () { die "fatal: '$1' does not look like a ref" } -# Usage: check if a commit from another subtree should be ignored from processing for splits -should_ignore_subtree_commit () { - if [ "$(git log -1 --grep="git-subtree-dir:" $1)" ] +# Usage: check if a commit from another subtree should be +# ignored from processing for splits +should_ignore_subtree_split_commit () { + if test -n "$(git log -1 --grep="git-subtree-dir:" $1)" then - if [[ -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -z "$(git log -1 --grep="git-subtree-dir: $dir$" $1)" ]] + if test -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && + test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $1)" then return 0 fi @@ -796,11 +798,6 @@ process_split_commit () { local rev="$1" local parents="$2" - if should_ignore_subtree_commit $rev - then - return - fi - if test $indent -eq 0 then revcount=$(($revcount + 1)) @@ -980,7 +977,20 @@ cmd_split () { eval "$grl" | while read rev parents do - process_split_commit "$rev" "$parents" + if should_ignore_subtree_split_commit "$rev" + then + continue + fi + parsedParents='' + for parent in $parents + do + should_ignore_subtree_split_commit "$parent" + if test $? -eq 1 + then + parsedParents+="$parent " + fi + done + process_split_commit "$rev" "$parsedParents" done || exit $? latest_new=$(cache_get latest_new) || exit $? -- gitgitgadget ^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v3 0/3] subtree: fix split processing with multiple subtrees present 2023-09-22 16:25 ` [PATCH v2 0/2] " Zach FettersMoore via GitGitGadget 2023-09-22 16:25 ` [PATCH v2 1/2] " Zach FettersMoore via GitGitGadget 2023-09-22 16:25 ` [PATCH v2 2/2] subtree: changing location of commit ignore processing Zach FettersMoore via GitGitGadget @ 2023-09-29 20:32 ` Zach FettersMoore via GitGitGadget 2023-09-29 20:32 ` [PATCH v3 1/3] " Zach FettersMoore via GitGitGadget ` (3 more replies) 2 siblings, 4 replies; 31+ messages in thread From: Zach FettersMoore via GitGitGadget @ 2023-09-29 20:32 UTC (permalink / raw) To: git; +Cc: Zach FettersMoore When there are multiple subtrees in a repo and git subtree split --rejoin is being used for the subtrees, the processing of commits for a new split can take a significant (and constantly growing) amount of time because the split commits from other subtrees cause the processing to have to scan the entire history of the other subtree(s). This patch filters out the other subtree split commits that are unnecessary for the split commit processing. Zach FettersMoore (3): subtree: fix split processing with multiple subtrees present subtree: changing location of commit ignore processing subtree: adding test to validate fix contrib/subtree/git-subtree.sh | 29 ++++++++++++++++++++- contrib/subtree/t/t7900-subtree.sh | 41 ++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+), 1 deletion(-) base-commit: bda494f4043963b9ec9a1ecd4b19b7d1cd9a0518 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1587%2FBobaFetters%2Fzf%2Fmulti-subtree-processing-v3 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1587/BobaFetters/zf/multi-subtree-processing-v3 Pull-Request: https://github.com/gitgitgadget/git/pull/1587 Range-diff vs v2: 1: 43175154a82 = 1: 43175154a82 subtree: fix split processing with multiple subtrees present 2: d6811daf7cf = 2: d6811daf7cf subtree: changing location of commit ignore processing -: ----------- > 3: eff8bfcc042 subtree: adding test to validate fix -- gitgitgadget ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v3 1/3] subtree: fix split processing with multiple subtrees present 2023-09-29 20:32 ` [PATCH v3 0/3] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget @ 2023-09-29 20:32 ` Zach FettersMoore via GitGitGadget 2023-09-29 20:32 ` [PATCH v3 2/3] subtree: changing location of commit ignore processing Zach FettersMoore via GitGitGadget ` (2 subsequent siblings) 3 siblings, 0 replies; 31+ messages in thread From: Zach FettersMoore via GitGitGadget @ 2023-09-29 20:32 UTC (permalink / raw) To: git; +Cc: Zach FettersMoore, Zach FettersMoore From: Zach FettersMoore <zach.fetters@apollographql.com> When there are multiple subtrees present in a repository and they are all using 'git subtree split', the 'split' command can take a significant (and constantly growing) amount of time to run even when using the '--rejoin' flag. This is due to the fact that when processing commits to determine the last known split to start from when looking for changes, if there has been a split/merge done from another subtree there will be 2 split commits, one mainline and one subtree, for the second subtree that are part of the processing. The non-mainline subtree split commit will cause the processing to always need to search the entire history of the given subtree as part of its processing even though those commits are totally irrelevant to the current subtree split being run. In the diagram below, 'M' represents the mainline repo branch, 'A' represents one subtree, and 'B' represents another. M3 and B1 represent a split commit for subtree B that was created from commit M4. M2 and A1 represent a split commit made from subtree A that was also created based on changes back to and including M4. M1 represents new changes to the repo, in this scenario if you try to run a 'git subtree split --rejoin' for subtree B, commits M1, M2, and A1, will be included in the processing of changes for the new split commit since the last split/rejoin for subtree B was at M3. The issue is that by having A1 included in this processing the command ends up needing to processing every commit down tree A even though none of that is needed or relevant to the current command and result. M1 | \ \ M2 | | | A1 | M3 | | | | B1 M4 | | So this commit makes a change to the processing of commits for the split command in order to ignore non-mainline commits from other subtrees such as A1 in the diagram by adding a new function 'should_ignore_subtree_commit' which is called during 'process_split_commit'. This allows the split/rejoin processing to still function as expected but removes all of the unnecessary processing that takes place currently which greatly inflates the processing time. Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com> --- contrib/subtree/git-subtree.sh | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh index e0c5d3b0de6..e9250dfb019 100755 --- a/contrib/subtree/git-subtree.sh +++ b/contrib/subtree/git-subtree.sh @@ -778,12 +778,29 @@ ensure_valid_ref_format () { die "fatal: '$1' does not look like a ref" } +# Usage: check if a commit from another subtree should be ignored from processing for splits +should_ignore_subtree_commit () { + if [ "$(git log -1 --grep="git-subtree-dir:" $1)" ] + then + if [[ -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -z "$(git log -1 --grep="git-subtree-dir: $dir$" $1)" ]] + then + return 0 + fi + fi + return 1 +} + # Usage: process_split_commit REV PARENTS process_split_commit () { assert test $# = 2 local rev="$1" local parents="$2" + if should_ignore_subtree_commit $rev + then + return + fi + if test $indent -eq 0 then revcount=$(($revcount + 1)) -- gitgitgadget ^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v3 2/3] subtree: changing location of commit ignore processing 2023-09-29 20:32 ` [PATCH v3 0/3] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget 2023-09-29 20:32 ` [PATCH v3 1/3] " Zach FettersMoore via GitGitGadget @ 2023-09-29 20:32 ` Zach FettersMoore via GitGitGadget 2023-09-29 20:33 ` [PATCH v3 3/3] subtree: adding test to validate fix Zach FettersMoore via GitGitGadget 2023-10-26 19:17 ` [PATCH v4] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget 3 siblings, 0 replies; 31+ messages in thread From: Zach FettersMoore via GitGitGadget @ 2023-09-29 20:32 UTC (permalink / raw) To: git; +Cc: Zach FettersMoore, Zach FettersMoore From: Zach FettersMoore <zach.fetters@apollographql.com> Based on feedback from original commit: -Updated the location of check whether a commit should be ignored during split processing -Updated code to better fit coding guidelines Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com> --- contrib/subtree/git-subtree.sh | 30 ++++++++++++++++++++---------- 1 file changed, 20 insertions(+), 10 deletions(-) diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh index e9250dfb019..e69991a9d80 100755 --- a/contrib/subtree/git-subtree.sh +++ b/contrib/subtree/git-subtree.sh @@ -778,11 +778,13 @@ ensure_valid_ref_format () { die "fatal: '$1' does not look like a ref" } -# Usage: check if a commit from another subtree should be ignored from processing for splits -should_ignore_subtree_commit () { - if [ "$(git log -1 --grep="git-subtree-dir:" $1)" ] +# Usage: check if a commit from another subtree should be +# ignored from processing for splits +should_ignore_subtree_split_commit () { + if test -n "$(git log -1 --grep="git-subtree-dir:" $1)" then - if [[ -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -z "$(git log -1 --grep="git-subtree-dir: $dir$" $1)" ]] + if test -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && + test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $1)" then return 0 fi @@ -796,11 +798,6 @@ process_split_commit () { local rev="$1" local parents="$2" - if should_ignore_subtree_commit $rev - then - return - fi - if test $indent -eq 0 then revcount=$(($revcount + 1)) @@ -980,7 +977,20 @@ cmd_split () { eval "$grl" | while read rev parents do - process_split_commit "$rev" "$parents" + if should_ignore_subtree_split_commit "$rev" + then + continue + fi + parsedParents='' + for parent in $parents + do + should_ignore_subtree_split_commit "$parent" + if test $? -eq 1 + then + parsedParents+="$parent " + fi + done + process_split_commit "$rev" "$parsedParents" done || exit $? latest_new=$(cache_get latest_new) || exit $? -- gitgitgadget ^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v3 3/3] subtree: adding test to validate fix 2023-09-29 20:32 ` [PATCH v3 0/3] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget 2023-09-29 20:32 ` [PATCH v3 1/3] " Zach FettersMoore via GitGitGadget 2023-09-29 20:32 ` [PATCH v3 2/3] subtree: changing location of commit ignore processing Zach FettersMoore via GitGitGadget @ 2023-09-29 20:33 ` Zach FettersMoore via GitGitGadget 2023-10-17 20:02 ` Zach FettersMoore 2023-10-26 19:17 ` [PATCH v4] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget 3 siblings, 1 reply; 31+ messages in thread From: Zach FettersMoore via GitGitGadget @ 2023-09-29 20:33 UTC (permalink / raw) To: git; +Cc: Zach FettersMoore, Zach FettersMoore From: Zach FettersMoore <zach.fetters@apollographql.com> Adding a test to validate that the proposed fix solves the issue. The test accomplishes this by checking the output of the split command to ensure the output from the progress of 'process_split_commit' function that represents the 'extracount' of commits processed does not increment. This was tested against the original functionality to show the test failed, and then with this fix to show the test passes. This illustrated that when using multiple subtrees, A and B, when doing a split on subtree B, the processing does not traverse the entire history of subtree A which is unnecessary and would cause the 'extracount' of processed commits to climb based on the number of commits in the history of subtree A. Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com> --- contrib/subtree/t/t7900-subtree.sh | 41 ++++++++++++++++++++++++++++++ 1 file changed, 41 insertions(+) diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh index 49a21dd7c9c..57c12e9f924 100755 --- a/contrib/subtree/t/t7900-subtree.sh +++ b/contrib/subtree/t/t7900-subtree.sh @@ -385,6 +385,47 @@ test_expect_success 'split sub dir/ with --rejoin' ' ) ' +test_expect_success 'split with multiple subtrees' ' + subtree_test_create_repo "$test_count" && + subtree_test_create_repo "$test_count/subA" && + subtree_test_create_repo "$test_count/subB" && + test_create_commit "$test_count" main1 && + test_create_commit "$test_count/subA" subA1 && + test_create_commit "$test_count/subA" subA2 && + test_create_commit "$test_count/subA" subA3 && + test_create_commit "$test_count/subB" subB1 && + ( + cd "$test_count" && + git fetch ./subA HEAD && + git subtree add --prefix=subADir FETCH_HEAD + ) && + ( + cd "$test_count" && + git fetch ./subB HEAD && + git subtree add --prefix=subBDir FETCH_HEAD + ) && + test_create_commit "$test_count" subADir/main-subA1 && + test_create_commit "$test_count" subBDir/main-subB1 && + ( + cd "$test_count" && + git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 1" + ) && + ( + cd "$test_count" && + git subtree split --prefix=subBDir --squash --rejoin -m "Sub B Split 1" + ) && + test_create_commit "$test_count" subADir/main-subA2 && + test_create_commit "$test_count" subBDir/main-subB2 && + ( + cd "$test_count" && + git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 2" + ) && + ( + cd "$test_count" && + test "$(git subtree split --prefix=subBDir --squash --rejoin -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = "" + ) +' + test_expect_success 'split sub dir/ with --rejoin from scratch' ' subtree_test_create_repo "$test_count" && test_create_commit "$test_count" main1 && -- gitgitgadget ^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v3 3/3] subtree: adding test to validate fix 2023-09-29 20:33 ` [PATCH v3 3/3] subtree: adding test to validate fix Zach FettersMoore via GitGitGadget @ 2023-10-17 20:02 ` Zach FettersMoore 0 siblings, 0 replies; 31+ messages in thread From: Zach FettersMoore @ 2023-10-17 20:02 UTC (permalink / raw) To: gitgitgadget, git; +Cc: zach.fetters From: "Zach FettersMoore via GitGitGadget" <gitgitgadget@gmail.com> > From: Zach FettersMoore <zach.fetters@apollographql.com> > > Adding a test to validate that the proposed fix > solves the issue. > > The test accomplishes this by checking the output > of the split command to ensure the output from > the progress of 'process_split_commit' function > that represents the 'extracount' of commits > processed does not increment. > > This was tested against the original functionality > to show the test failed, and then with this fix > to show the test passes. > > This illustrated that when using multiple subtrees, > A and B, when doing a split on subtree B, the > processing does not traverse the entire history > of subtree A which is unnecessary and would cause > the 'extracount' of processed commits to climb > based on the number of commits in the history of > subtree A. > > Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com> > --- > contrib/subtree/t/t7900-subtree.sh | 41 ++++++++++++++++++++++++++++++ > 1 file changed, 41 insertions(+) > > diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh > index 49a21dd7c9c..57c12e9f924 100755 > --- a/contrib/subtree/t/t7900-subtree.sh > +++ b/contrib/subtree/t/t7900-subtree.sh > @@ -385,6 +385,47 @@ test_expect_success 'split sub dir/ with --rejoin' ' > ) > ' > > +test_expect_success 'split with multiple subtrees' ' > + subtree_test_create_repo "$test_count" && > + subtree_test_create_repo "$test_count/subA" && > + subtree_test_create_repo "$test_count/subB" && > + test_create_commit "$test_count" main1 && > + test_create_commit "$test_count/subA" subA1 && > + test_create_commit "$test_count/subA" subA2 && > + test_create_commit "$test_count/subA" subA3 && > + test_create_commit "$test_count/subB" subB1 && > + ( > + cd "$test_count" && > + git fetch ./subA HEAD && > + git subtree add --prefix=subADir FETCH_HEAD > + ) && > + ( > + cd "$test_count" && > + git fetch ./subB HEAD && > + git subtree add --prefix=subBDir FETCH_HEAD > + ) && > + test_create_commit "$test_count" subADir/main-subA1 && > + test_create_commit "$test_count" subBDir/main-subB1 && > + ( > + cd "$test_count" && > + git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 1" > + ) && > + ( > + cd "$test_count" && > + git subtree split --prefix=subBDir --squash --rejoin -m "Sub B Split 1" > + ) && > + test_create_commit "$test_count" subADir/main-subA2 && > + test_create_commit "$test_count" subBDir/main-subB2 && > + ( > + cd "$test_count" && > + git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 2" > + ) && > + ( > + cd "$test_count" && > + test "$(git subtree split --prefix=subBDir --squash --rejoin -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = "" > + ) > +' > + > test_expect_success 'split sub dir/ with --rejoin from scratch' ' > subtree_test_create_repo "$test_count" && > test_create_commit "$test_count" main1 && > -- Checking to see if there is something else I need to do with this to get it across the finish line, wasn't sure if something else was needed since it's been dormant since my last push a few weeks ago. Thanks ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v4] subtree: fix split processing with multiple subtrees present 2023-09-29 20:32 ` [PATCH v3 0/3] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget ` (2 preceding siblings ...) 2023-09-29 20:33 ` [PATCH v3 3/3] subtree: adding test to validate fix Zach FettersMoore via GitGitGadget @ 2023-10-26 19:17 ` Zach FettersMoore via GitGitGadget 2023-11-18 11:28 ` Christian Couder 2023-11-28 21:17 ` [PATCH v5] " Zach FettersMoore via GitGitGadget 3 siblings, 2 replies; 31+ messages in thread From: Zach FettersMoore via GitGitGadget @ 2023-10-26 19:17 UTC (permalink / raw) To: git; +Cc: Zach FettersMoore, Zach FettersMoore From: Zach FettersMoore <zach.fetters@apollographql.com> When there are multiple subtrees present in a repository and they are all using 'git subtree split', the 'split' command can take a significant (and constantly growing) amount of time to run even when using the '--rejoin' flag. This is due to the fact that when processing commits to determine the last known split to start from when looking for changes, if there has been a split/merge done from another subtree there will be 2 split commits, one mainline and one subtree, for the second subtree that are part of the processing. The non-mainline subtree split commit will cause the processing to always need to search the entire history of the given subtree as part of its processing even though those commits are totally irrelevant to the current subtree split being run. In the diagram below, 'M' represents the mainline repo branch, 'A' represents one subtree, and 'B' represents another. M3 and B1 represent a split commit for subtree B that was created from commit M4. M2 and A1 represent a split commit made from subtree A that was also created based on changes back to and including M4. M1 represents new changes to the repo, in this scenario if you try to run a 'git subtree split --rejoin' for subtree B, commits M1, M2, and A1, will be included in the processing of changes for the new split commit since the last split/rejoin for subtree B was at M3. The issue is that by having A1 included in this processing the command ends up needing to processing every commit down tree A even though none of that is needed or relevant to the current command and result. M1 | \ \ M2 | | | A1 | M3 | | | | B1 M4 | | So this commit makes a change to the processing of commits for the split command in order to ignore non-mainline commits from other subtrees such as A1 in the diagram by adding a new function 'should_ignore_subtree_commit' which is called during 'process_split_commit'. This allows the split/rejoin processing to still function as expected but removes all of the unnecessary processing that takes place currently which greatly inflates the processing time. Added a test to validate that the proposed fix solves the issue. The test accomplishes this by checking the output of the split command to ensure the output from the progress of 'process_split_commit' function that represents the 'extracount' of commits processed does not increment. This was tested against the original functionality to show the test failed, and then with this fix to show the test passes. This illustrated that when using multiple subtrees, A and B, when doing a split on subtree B, the processing does not traverse the entire history of subtree A which is unnecessary and would cause the 'extracount' of processed commits to climb based on the number of commits in the history of subtree A. Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com> --- subtree: fix split processing with multiple subtrees present When there are multiple subtrees in a repo and git subtree split --rejoin is being used for the subtrees, the processing of commits for a new split can take a significant (and constantly growing) amount of time because the split commits from other subtrees cause the processing to have to scan the entire history of the other subtree(s). This patch filters out the other subtree split commits that are unnecessary for the split commit processing. Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1587%2FBobaFetters%2Fzf%2Fmulti-subtree-processing-v4 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1587/BobaFetters/zf/multi-subtree-processing-v4 Pull-Request: https://github.com/gitgitgadget/git/pull/1587 Range-diff vs v3: 1: 43175154a82 < -: ----------- subtree: fix split processing with multiple subtrees present 2: d6811daf7cf < -: ----------- subtree: changing location of commit ignore processing 3: eff8bfcc042 ! 1: 353152910eb subtree: adding test to validate fix @@ Metadata Author: Zach FettersMoore <zach.fetters@apollographql.com> ## Commit message ## - subtree: adding test to validate fix + subtree: fix split processing with multiple subtrees present - Adding a test to validate that the proposed fix + When there are multiple subtrees present in a repository and they are + all using 'git subtree split', the 'split' command can take a + significant (and constantly growing) amount of time to run even when + using the '--rejoin' flag. This is due to the fact that when processing + commits to determine the last known split to start from when looking + for changes, if there has been a split/merge done from another subtree + there will be 2 split commits, one mainline and one subtree, for the + second subtree that are part of the processing. The non-mainline + subtree split commit will cause the processing to always need to search + the entire history of the given subtree as part of its processing even + though those commits are totally irrelevant to the current subtree + split being run. + + In the diagram below, 'M' represents the mainline repo branch, 'A' + represents one subtree, and 'B' represents another. M3 and B1 represent + a split commit for subtree B that was created from commit M4. M2 and A1 + represent a split commit made from subtree A that was also created + based on changes back to and including M4. M1 represents new changes to + the repo, in this scenario if you try to run a 'git subtree split + --rejoin' for subtree B, commits M1, M2, and A1, will be included in + the processing of changes for the new split commit since the last + split/rejoin for subtree B was at M3. The issue is that by having A1 + included in this processing the command ends up needing to processing + every commit down tree A even though none of that is needed or relevant + to the current command and result. + + M1 + | \ \ + M2 | | + | A1 | + M3 | | + | | B1 + M4 | | + + So this commit makes a change to the processing of commits for the split + command in order to ignore non-mainline commits from other subtrees such + as A1 in the diagram by adding a new function + 'should_ignore_subtree_commit' which is called during + 'process_split_commit'. This allows the split/rejoin processing to still + function as expected but removes all of the unnecessary processing that + takes place currently which greatly inflates the processing time. + + Added a test to validate that the proposed fix solves the issue. The test accomplishes this by checking the output @@ Commit message Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com> + ## contrib/subtree/git-subtree.sh ## +@@ contrib/subtree/git-subtree.sh: ensure_valid_ref_format () { + die "fatal: '$1' does not look like a ref" + } + ++# Usage: check if a commit from another subtree should be ++# ignored from processing for splits ++should_ignore_subtree_split_commit () { ++ if test -n "$(git log -1 --grep="git-subtree-dir:" $1)" ++ then ++ if test -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && ++ test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $1)" ++ then ++ return 0 ++ fi ++ fi ++ return 1 ++} ++ + # Usage: process_split_commit REV PARENTS + process_split_commit () { + assert test $# = 2 +@@ contrib/subtree/git-subtree.sh: cmd_split () { + eval "$grl" | + while read rev parents + do +- process_split_commit "$rev" "$parents" ++ if should_ignore_subtree_split_commit "$rev" ++ then ++ continue ++ fi ++ parsedParents='' ++ for parent in $parents ++ do ++ should_ignore_subtree_split_commit "$parent" ++ if test $? -eq 1 ++ then ++ parsedParents+="$parent " ++ fi ++ done ++ process_split_commit "$rev" "$parsedParents" + done || exit $? + + latest_new=$(cache_get latest_new) || exit $? + ## contrib/subtree/t/t7900-subtree.sh ## @@ contrib/subtree/t/t7900-subtree.sh: test_expect_success 'split sub dir/ with --rejoin' ' ) @@ contrib/subtree/t/t7900-subtree.sh: test_expect_success 'split sub dir/ with --r + ) && + ( + cd "$test_count" && -+ test "$(git subtree split --prefix=subBDir --squash --rejoin -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = "" ++ test "$(git subtree split --prefix=subBDir --squash --rejoin \ ++ -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = "" + ) +' + contrib/subtree/git-subtree.sh | 29 ++++++++++++++++++++- contrib/subtree/t/t7900-subtree.sh | 42 ++++++++++++++++++++++++++++++ 2 files changed, 70 insertions(+), 1 deletion(-) diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh index e0c5d3b0de6..e69991a9d80 100755 --- a/contrib/subtree/git-subtree.sh +++ b/contrib/subtree/git-subtree.sh @@ -778,6 +778,20 @@ ensure_valid_ref_format () { die "fatal: '$1' does not look like a ref" } +# Usage: check if a commit from another subtree should be +# ignored from processing for splits +should_ignore_subtree_split_commit () { + if test -n "$(git log -1 --grep="git-subtree-dir:" $1)" + then + if test -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && + test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $1)" + then + return 0 + fi + fi + return 1 +} + # Usage: process_split_commit REV PARENTS process_split_commit () { assert test $# = 2 @@ -963,7 +977,20 @@ cmd_split () { eval "$grl" | while read rev parents do - process_split_commit "$rev" "$parents" + if should_ignore_subtree_split_commit "$rev" + then + continue + fi + parsedParents='' + for parent in $parents + do + should_ignore_subtree_split_commit "$parent" + if test $? -eq 1 + then + parsedParents+="$parent " + fi + done + process_split_commit "$rev" "$parsedParents" done || exit $? latest_new=$(cache_get latest_new) || exit $? diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh index 49a21dd7c9c..87d59afd761 100755 --- a/contrib/subtree/t/t7900-subtree.sh +++ b/contrib/subtree/t/t7900-subtree.sh @@ -385,6 +385,48 @@ test_expect_success 'split sub dir/ with --rejoin' ' ) ' +test_expect_success 'split with multiple subtrees' ' + subtree_test_create_repo "$test_count" && + subtree_test_create_repo "$test_count/subA" && + subtree_test_create_repo "$test_count/subB" && + test_create_commit "$test_count" main1 && + test_create_commit "$test_count/subA" subA1 && + test_create_commit "$test_count/subA" subA2 && + test_create_commit "$test_count/subA" subA3 && + test_create_commit "$test_count/subB" subB1 && + ( + cd "$test_count" && + git fetch ./subA HEAD && + git subtree add --prefix=subADir FETCH_HEAD + ) && + ( + cd "$test_count" && + git fetch ./subB HEAD && + git subtree add --prefix=subBDir FETCH_HEAD + ) && + test_create_commit "$test_count" subADir/main-subA1 && + test_create_commit "$test_count" subBDir/main-subB1 && + ( + cd "$test_count" && + git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 1" + ) && + ( + cd "$test_count" && + git subtree split --prefix=subBDir --squash --rejoin -m "Sub B Split 1" + ) && + test_create_commit "$test_count" subADir/main-subA2 && + test_create_commit "$test_count" subBDir/main-subB2 && + ( + cd "$test_count" && + git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 2" + ) && + ( + cd "$test_count" && + test "$(git subtree split --prefix=subBDir --squash --rejoin \ + -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = "" + ) +' + test_expect_success 'split sub dir/ with --rejoin from scratch' ' subtree_test_create_repo "$test_count" && test_create_commit "$test_count" main1 && base-commit: bda494f4043963b9ec9a1ecd4b19b7d1cd9a0518 -- gitgitgadget ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH v4] subtree: fix split processing with multiple subtrees present 2023-10-26 19:17 ` [PATCH v4] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget @ 2023-11-18 11:28 ` Christian Couder 2023-11-28 21:04 ` Zach FettersMoore 2023-11-28 21:17 ` [PATCH v5] " Zach FettersMoore via GitGitGadget 1 sibling, 1 reply; 31+ messages in thread From: Christian Couder @ 2023-11-18 11:28 UTC (permalink / raw) To: Zach FettersMoore via GitGitGadget; +Cc: git, Zach FettersMoore On Thu, Oct 26, 2023 at 9:18 PM Zach FettersMoore via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Zach FettersMoore <zach.fetters@apollographql.com> > > When there are multiple subtrees present in a repository and they are > all using 'git subtree split', the 'split' command can take a > significant (and constantly growing) amount of time to run even when > using the '--rejoin' flag. This is due to the fact that when processing > commits to determine the last known split to start from when looking > for changes, if there has been a split/merge done from another subtree > there will be 2 split commits, one mainline and one subtree, for the > second subtree that are part of the processing. The non-mainline > subtree split commit will cause the processing to always need to search > the entire history of the given subtree as part of its processing even > though those commits are totally irrelevant to the current subtree > split being run. Thanks for your continued work on this! I am not familiar with git subtree so I might miss obvious things. On the other hand, my comments might help increase a bit the number of people who could review this patch. > In the diagram below, 'M' represents the mainline repo branch, 'A' > represents one subtree, and 'B' represents another. M3 and B1 represent > a split commit for subtree B that was created from commit M4. M2 and A1 > represent a split commit made from subtree A that was also created > based on changes back to and including M4. M1 represents new changes to > the repo, in this scenario if you try to run a 'git subtree split > --rejoin' for subtree B, commits M1, M2, and A1, will be included in > the processing of changes for the new split commit since the last > split/rejoin for subtree B was at M3. The issue is that by having A1 > included in this processing the command ends up needing to processing > every commit down tree A even though none of that is needed or relevant > to the current command and result. > > M1 > | \ \ > M2 | | > | A1 | > M3 | | > | | B1 > M4 | | About the above, Junio already commented the following: -> The above paragraph explains which different things you drew in the -> diagram are representing, but it is not clear how they relate to -> each other. Do they for example depict parent-child commit -> relationship? What are the wide gaps between these three tracks and -> what are the short angled lines leaning to the left near the tip? -> Is the time/topology flowing from bottom to top? and it doesn't look like you have addressed that comment. When you say "M3 and B1 represent a split commit for subtree B that was created from commit M4." I am not sure what it means exactly. Could you give example commands that could have created the M3 and B1 commits? > So this commit makes a change to the processing of commits for the split > command in order to ignore non-mainline commits from other subtrees such > as A1 in the diagram by adding a new function > 'should_ignore_subtree_commit' which is called during > 'process_split_commit'. This allows the split/rejoin processing to still > function as expected but removes all of the unnecessary processing that > takes place currently which greatly inflates the processing time. Could you tell a bit more what kind of processing time reduction is or would be possible on what kind of repo? Have you benchmark-ed or just timed this somehow on one of your repos or better on an open source repo (so that we could reproduce if we wanted)? > Added a test to validate that the proposed fix > solves the issue. > > The test accomplishes this by checking the output > of the split command to ensure the output from > the progress of 'process_split_commit' function > that represents the 'extracount' of commits > processed does not increment. Does not increment compared to what? > This was tested against the original functionality > to show the test failed, and then with this fix > to show the test passes. > > This illustrated that when using multiple subtrees, > A and B, when doing a split on subtree B, the > processing does not traverse the entire history > of subtree A which is unnecessary and would cause > the 'extracount' of processed commits to climb > based on the number of commits in the history of > subtree A. Does this mean that the test checks that the extracount is the same when subtree A exists as when it doesn't exist? [...] > contrib/subtree/git-subtree.sh | 29 ++++++++++++++++++++- > contrib/subtree/t/t7900-subtree.sh | 42 ++++++++++++++++++++++++++++++ > 2 files changed, 70 insertions(+), 1 deletion(-) > > diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh > index e0c5d3b0de6..e69991a9d80 100755 > --- a/contrib/subtree/git-subtree.sh > +++ b/contrib/subtree/git-subtree.sh > @@ -778,6 +778,20 @@ ensure_valid_ref_format () { > die "fatal: '$1' does not look like a ref" > } > > +# Usage: check if a commit from another subtree should be > +# ignored from processing for splits > +should_ignore_subtree_split_commit () { Maybe adding: assert test $# = 1 local rev="$1" here, and using $rev instead of $1 in this function could make things a bit clearer and similar to what is done in other functions. > + if test -n "$(git log -1 --grep="git-subtree-dir:" $1)" > + then > + if test -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && > + test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $1)" > + then > + return 0 > + fi > + fi > + return 1 > +} The above doesn't seem to be properly indented. We use tabs not spaces. > # Usage: process_split_commit REV PARENTS > process_split_commit () { > assert test $# = 2 > @@ -963,7 +977,20 @@ cmd_split () { > eval "$grl" | > while read rev parents > do > - process_split_commit "$rev" "$parents" > + if should_ignore_subtree_split_commit "$rev" > + then > + continue > + fi > + parsedParents='' It seems to me that we name variables "parsed_parents" (or sometimes "parsedparents") rather than "parsedParents". > + for parent in $parents > + do > + should_ignore_subtree_split_commit "$parent" > + if test $? -eq 1 I think the 2 lines above could be replaced by: + if ! should_ignore_subtree_split_commit "$parent" > + then > + parsedParents+="$parent " It doesn't seem to me that we use "+=" much in our shell scripts. https://www.shellcheck.net/ emits the following: (warning): In POSIX sh, += is undefined. so I guess we don't use it because it's not available in some usual shells. (I haven't checked the script with https://www.shellcheck.net/ before and after your patch, but it might help avoid bash-isms and such issues.) > + fi > + done > + process_split_commit "$rev" "$parsedParents" > done || exit $? It looks like we use "exit $?" a lot in git-subtree.sh while we use just "exit" most often elsewhere. Not sure why. > latest_new=$(cache_get latest_new) || exit $? > diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh > index 49a21dd7c9c..87d59afd761 100755 > --- a/contrib/subtree/t/t7900-subtree.sh > +++ b/contrib/subtree/t/t7900-subtree.sh > @@ -385,6 +385,48 @@ test_expect_success 'split sub dir/ with --rejoin' ' > ) > ' > > +test_expect_success 'split with multiple subtrees' ' > + subtree_test_create_repo "$test_count" && > + subtree_test_create_repo "$test_count/subA" && > + subtree_test_create_repo "$test_count/subB" && > + test_create_commit "$test_count" main1 && > + test_create_commit "$test_count/subA" subA1 && > + test_create_commit "$test_count/subA" subA2 && > + test_create_commit "$test_count/subA" subA3 && > + test_create_commit "$test_count/subB" subB1 && > + ( > + cd "$test_count" && > + git fetch ./subA HEAD && > + git subtree add --prefix=subADir FETCH_HEAD > + ) && > + ( > + cd "$test_count" && > + git fetch ./subB HEAD && > + git subtree add --prefix=subBDir FETCH_HEAD > + ) && > + test_create_commit "$test_count" subADir/main-subA1 && > + test_create_commit "$test_count" subBDir/main-subB1 && > + ( > + cd "$test_count" && > + git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 1" > + ) && Not sure why there are so many sub-shells used, and why the -C option is not used instead to tell Git to work in a subdirectory. I guess you copied what most existing (old) tests in this test script do. For example perhaps the 4 line above could be replaced by just: + git -C "$test_count" subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 1" && > + ( > + cd "$test_count" && > + git subtree split --prefix=subBDir --squash --rejoin -m "Sub B Split 1" > + ) && > + test_create_commit "$test_count" subADir/main-subA2 && > + test_create_commit "$test_count" subBDir/main-subB2 && > + ( > + cd "$test_count" && > + git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 2" > + ) && > + ( > + cd "$test_count" && > + test "$(git subtree split --prefix=subBDir --squash --rejoin \ > + -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = "" > + ) > +' It's not clear to me what the test is doing. Maybe you could split it into 2 tests. Perhaps one setting up a repo with multiple subtrees and one checking that a new split ignores other subtree split commits. Perhaps adding a few comments would help too. Best, Christian. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v4] subtree: fix split processing with multiple subtrees present 2023-11-18 11:28 ` Christian Couder @ 2023-11-28 21:04 ` Zach FettersMoore 0 siblings, 0 replies; 31+ messages in thread From: Zach FettersMoore @ 2023-11-28 21:04 UTC (permalink / raw) To: Christian Couder; +Cc: Zach FettersMoore via GitGitGadget, git >> In the diagram below, 'M' represents the mainline repo branch, 'A' >> represents one subtree, and 'B' represents another. M3 and B1 represent >> a split commit for subtree B that was created from commit M4. M2 and A1 >> represent a split commit made from subtree A that was also created >> based on changes back to and including M4. M1 represents new changes to >> the repo, in this scenario if you try to run a 'git subtree split >> --rejoin' for subtree B, commits M1, M2, and A1, will be included in >> the processing of changes for the new split commit since the last >> split/rejoin for subtree B was at M3. The issue is that by having A1 >> included in this processing the command ends up needing to processing >> every commit down tree A even though none of that is needed or relevant >> to the current command and result. >> >> M1 >> | \ \ >> M2 | | >> | A1 | >> M3 | | >> | | B1 >> M4 | | > About the above, Junio already commented the following: > > -> The above paragraph explains which different things you drew in the > -> diagram are representing, but it is not clear how they relate to > -> each other. Do they for example depict parent-child commit > -> relationship? What are the wide gaps between these three tracks and > -> what are the short angled lines leaning to the left near the tip? > -> Is the time/topology flowing from bottom to top? > > and it doesn't look like you have addressed that comment. > > When you say "M3 and B1 represent a split commit for subtree B that > was created from commit M4." I am not sure what it means exactly. > Could you give example commands that could have created the M3 and B1 > commits? I removed the diagram from the commit message since it seems a little unclear, and in its place I added an example of an open source repo (which I am currently using the fix in) and the commands to replicate the issue. Hopefully that better illustrates how I came across the issue and what it is. >> So this commit makes a change to the processing of commits for the split >> command in order to ignore non-mainline commits from other subtrees such >> as A1 in the diagram by adding a new function >> 'should_ignore_subtree_commit' which is called during >> 'process_split_commit'. This allows the split/rejoin processing to still >> function as expected but removes all of the unnecessary processing that >> takes place currently which greatly inflates the processing time. > Could you tell a bit more what kind of processing time reduction is or > would be possible on what kind of repo? Have you benchmark-ed or just > timed this somehow on one of your repos or better on an open source > repo (so that we could reproduce if we wanted)? I added some extra info for this to the commit message as well, but to answer your question yes I discovered and benchmarked this issue in a repo I help maintain. I was seeing splits take upwards of 12 minutes before the fix, and after they were taking only seconds. Also provided infor on the repo and how to reproduce in the updated commit message. >> Added a test to validate that the proposed fix >> solves the issue. >> >> The test accomplishes this by checking the output >> of the split command to ensure the output from >> the progress of 'process_split_commit' function >> that represents the 'extracount' of commits >> processed does not increment. > Does not increment compared to what? I reworded this to say the 'extracount' remains at 0 since there should be no extra processed commits from the second subtree in the test. >> This was tested against the original functionality >> to show the test failed, and then with this fix >> to show the test passes. >> >> This illustrated that when using multiple subtrees, >> A and B, when doing a split on subtree B, the >> processing does not traverse the entire history >> of subtree A which is unnecessary and would cause >> the 'extracount' of processed commits to climb >> based on the number of commits in the history of >> subtree A. > Does this mean that the test checks that the extracount is the same > when subtree A exists as when it doesn't exist? This means the test is checking that the 'extracount' remains at 0, because anything above 0 would mean commits from subtree A were being processed, which is where the issue stems from. >> contrib/subtree/git-subtree.sh | 29 ++++++++++++++++++++- >> contrib/subtree/t/t7900-subtree.sh | 42 ++++++++++++++++++++++++++++++ >> 2 files changed, 70 insertions(+), 1 deletion(-) >> >> diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree= >.sh >> index e0c5d3b0de6..e69991a9d80 100755 >> --- a/contrib/subtree/git-subtree.sh >> +++ b/contrib/subtree/git-subtree.sh >> @@ -778,6 +778,20 @@ ensure_valid_ref_format () { >> die "fatal: '$1' does not look like a ref" >> } >> >> +# Usage: check if a commit from another subtree should be >> +# ignored from processing for splits >> +should_ignore_subtree_split_commit () { > Maybe adding: > > assert test $# =3D 1 > local rev=3D"$1" > > here, and using $rev instead of $1 in this function could make things > a bit clearer and similar to what is done in other functions. Updated. >> + if test -n "$(git log -1 --grep=3D"git-subtree-dir:" $1)" >> + then >> + if test -z "$(git log -1 --grep=3D"git-subtree-mainline:" $1)" && >> + test -z "$(git log -1 --grep=3D"git-subtree-dir: = >> $arg_prefix$" $1)" >> + then >> + return 0 >> + fi >> + fi >> + return 1 >> +} > The above doesn't seem to be properly indented. We use tabs not spaces. Fixed. >> # Usage: process_split_commit REV PARENTS >> process_split_commit () { >> assert test $# =3D 2 >> @@ -963,7 +977,20 @@ cmd_split () { >> eval "$grl" | >> while read rev parents >> do >> - process_split_commit "$rev" "$parents" >> + if should_ignore_subtree_split_commit "$rev" >> + then >> + continue >> + fi >> + parsedParents=3D'' > It seems to me that we name variables "parsed_parents" (or sometimes > "parsedparents") rather than "parsedParents". Fixed. >> + for parent in $parents >> + do >> + should_ignore_subtree_split_commit "$parent" >> + if test $? -eq 1 > I think the 2 lines above could be replaced by: > > + if ! should_ignore_subtree_split_commit "$parent" Updated. >> + then >> + parsedParents+=3D"$parent " > It doesn't seem to me that we use "+=3D" much in our shell scripts. > https://www.shellcheck.net/ emits the following: > > (warning): In POSIX sh, +=3D is undefined. > > so I guess we don't use it because it's not available in some usual shells. > > (I haven't checked the script with https://www.shellcheck.net/ before > and after your patch, but it might help avoid bash-isms and such > issues.) Updated this to remove the '+=' usage. >> + fi >> + done >> + process_split_commit "$rev" "$parsedParents" >> done || exit $? > It looks like we use "exit $?" a lot in git-subtree.sh while we use > just "exit" most often elsewhere. Not sure why. Yea I am unsure of the reasoning of that, I was just trying to follow the what the existing script was already doing. >> latest_new=3D$(cache_get latest_new) || exit $? >> diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900= >> -subtree.sh >> index 49a21dd7c9c..87d59afd761 100755 >> --- a/contrib/subtree/t/t7900-subtree.sh >> +++ b/contrib/subtree/t/t7900-subtree.sh >> @@ -385,6 +385,48 @@ test_expect_success 'split sub dir/ with --rejoin' ' >> ) >> ' >> >> +test_expect_success 'split with multiple subtrees' ' >> + subtree_test_create_repo "$test_count" && >> + subtree_test_create_repo "$test_count/subA" && >> + subtree_test_create_repo "$test_count/subB" && >> + test_create_commit "$test_count" main1 && >> + test_create_commit "$test_count/subA" subA1 && >> + test_create_commit "$test_count/subA" subA2 && >> + test_create_commit "$test_count/subA" subA3 && >> + test_create_commit "$test_count/subB" subB1 && >> + ( >> + cd "$test_count" && >> + git fetch ./subA HEAD && >> + git subtree add --prefix=3DsubADir FETCH_HEAD >> + ) && >> + ( >> + cd "$test_count" && >> + git fetch ./subB HEAD && >> + git subtree add --prefix=3DsubBDir FETCH_HEAD >> + ) && >> + test_create_commit "$test_count" subADir/main-subA1 && >> + test_create_commit "$test_count" subBDir/main-subB1 && >> + ( >> + cd "$test_count" && >> + git subtree split --prefix=3DsubADir --squash --rejoin -m= >> "Sub A Split 1" >> + ) && > Not sure why there are so many sub-shells used, and why the -C option > is not used instead to tell Git to work in a subdirectory. I guess you > copied what most existing (old) tests in this test script do. > > For example perhaps the 4 line above could be replaced by just: > > + git -C "$test_count" subtree split --prefix=3DsubADir > --squash --rejoin -m "Sub A Split 1" && Yea I was following what was being done in other existing tests, although this seems like a better way to do this so I updated the test to remove the extra sub-shells. >> + ( >> + cd "$test_count" && >> + git subtree split --prefix=3DsubBDir --squash --rejoin -m= >> "Sub B Split 1" >> + ) && >> + test_create_commit "$test_count" subADir/main-subA2 && >> + test_create_commit "$test_count" subBDir/main-subB2 && >> + ( >> + cd "$test_count" && >> + git subtree split --prefix=3DsubADir --squash --rejoin -m= >> "Sub A Split 2" >> + ) && >> + ( >> + cd "$test_count" && >> + test "$(git subtree split --prefix=3DsubBDir --squash --r= >> ejoin \ >> + -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" =3D "" >> + ) >> +' > It's not clear to me what the test is doing. Maybe you could split it > into 2 tests. Perhaps one setting up a repo with multiple subtrees and > one checking that a new split ignores other subtree split commits. > Perhaps adding a few comments would help too. Added some comments before the test to describe the steps the test is taking in order to verify the desired behavior. On Sat, Nov 18, 2023 at 6:28 AM Christian Couder <christian.couder@gmail.com> wrote: > > On Thu, Oct 26, 2023 at 9:18 PM Zach FettersMoore via GitGitGadget > <gitgitgadget@gmail.com> wrote: > > > > From: Zach FettersMoore <zach.fetters@apollographql.com> > > > > When there are multiple subtrees present in a repository and they are > > all using 'git subtree split', the 'split' command can take a > > significant (and constantly growing) amount of time to run even when > > using the '--rejoin' flag. This is due to the fact that when processing > > commits to determine the last known split to start from when looking > > for changes, if there has been a split/merge done from another subtree > > there will be 2 split commits, one mainline and one subtree, for the > > second subtree that are part of the processing. The non-mainline > > subtree split commit will cause the processing to always need to search > > the entire history of the given subtree as part of its processing even > > though those commits are totally irrelevant to the current subtree > > split being run. > > Thanks for your continued work on this! > > I am not familiar with git subtree so I might miss obvious things. On > the other hand, my comments might help increase a bit the number of > people who could review this patch. > > > In the diagram below, 'M' represents the mainline repo branch, 'A' > > represents one subtree, and 'B' represents another. M3 and B1 represent > > a split commit for subtree B that was created from commit M4. M2 and A1 > > represent a split commit made from subtree A that was also created > > based on changes back to and including M4. M1 represents new changes to > > the repo, in this scenario if you try to run a 'git subtree split > > --rejoin' for subtree B, commits M1, M2, and A1, will be included in > > the processing of changes for the new split commit since the last > > split/rejoin for subtree B was at M3. The issue is that by having A1 > > included in this processing the command ends up needing to processing > > every commit down tree A even though none of that is needed or relevant > > to the current command and result. > > > > M1 > > | \ \ > > M2 | | > > | A1 | > > M3 | | > > | | B1 > > M4 | | > > About the above, Junio already commented the following: > > -> The above paragraph explains which different things you drew in the > -> diagram are representing, but it is not clear how they relate to > -> each other. Do they for example depict parent-child commit > -> relationship? What are the wide gaps between these three tracks and > -> what are the short angled lines leaning to the left near the tip? > -> Is the time/topology flowing from bottom to top? > > and it doesn't look like you have addressed that comment. > > When you say "M3 and B1 represent a split commit for subtree B that > was created from commit M4." I am not sure what it means exactly. > Could you give example commands that could have created the M3 and B1 > commits? > > > So this commit makes a change to the processing of commits for the split > > command in order to ignore non-mainline commits from other subtrees such > > as A1 in the diagram by adding a new function > > 'should_ignore_subtree_commit' which is called during > > 'process_split_commit'. This allows the split/rejoin processing to still > > function as expected but removes all of the unnecessary processing that > > takes place currently which greatly inflates the processing time. > > Could you tell a bit more what kind of processing time reduction is or > would be possible on what kind of repo? Have you benchmark-ed or just > timed this somehow on one of your repos or better on an open source > repo (so that we could reproduce if we wanted)? > > > Added a test to validate that the proposed fix > > solves the issue. > > > > The test accomplishes this by checking the output > > of the split command to ensure the output from > > the progress of 'process_split_commit' function > > that represents the 'extracount' of commits > > processed does not increment. > > Does not increment compared to what? > > > This was tested against the original functionality > > to show the test failed, and then with this fix > > to show the test passes. > > > > This illustrated that when using multiple subtrees, > > A and B, when doing a split on subtree B, the > > processing does not traverse the entire history > > of subtree A which is unnecessary and would cause > > the 'extracount' of processed commits to climb > > based on the number of commits in the history of > > subtree A. > > Does this mean that the test checks that the extracount is the same > when subtree A exists as when it doesn't exist? > > [...] > > > contrib/subtree/git-subtree.sh | 29 ++++++++++++++++++++- > > contrib/subtree/t/t7900-subtree.sh | 42 ++++++++++++++++++++++++++++++ > > 2 files changed, 70 insertions(+), 1 deletion(-) > > > > diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh > > index e0c5d3b0de6..e69991a9d80 100755 > > --- a/contrib/subtree/git-subtree.sh > > +++ b/contrib/subtree/git-subtree.sh > > @@ -778,6 +778,20 @@ ensure_valid_ref_format () { > > die "fatal: '$1' does not look like a ref" > > } > > > > +# Usage: check if a commit from another subtree should be > > +# ignored from processing for splits > > +should_ignore_subtree_split_commit () { > > Maybe adding: > > assert test $# = 1 > local rev="$1" > > here, and using $rev instead of $1 in this function could make things > a bit clearer and similar to what is done in other functions. > > > + if test -n "$(git log -1 --grep="git-subtree-dir:" $1)" > > + then > > + if test -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && > > + test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $1)" > > + then > > + return 0 > > + fi > > + fi > > + return 1 > > +} > > The above doesn't seem to be properly indented. We use tabs not spaces. > > > # Usage: process_split_commit REV PARENTS > > process_split_commit () { > > assert test $# = 2 > > @@ -963,7 +977,20 @@ cmd_split () { > > eval "$grl" | > > while read rev parents > > do > > - process_split_commit "$rev" "$parents" > > + if should_ignore_subtree_split_commit "$rev" > > + then > > + continue > > + fi > > + parsedParents='' > > It seems to me that we name variables "parsed_parents" (or sometimes > "parsedparents") rather than "parsedParents". > > > + for parent in $parents > > + do > > + should_ignore_subtree_split_commit "$parent" > > + if test $? -eq 1 > > I think the 2 lines above could be replaced by: > > + if ! should_ignore_subtree_split_commit "$parent" > > > + then > > + parsedParents+="$parent " > > It doesn't seem to me that we use "+=" much in our shell scripts. > https://www.shellcheck.net/ emits the following: > > (warning): In POSIX sh, += is undefined. > > so I guess we don't use it because it's not available in some usual shells. > > (I haven't checked the script with https://www.shellcheck.net/ before > and after your patch, but it might help avoid bash-isms and such > issues.) > > > + fi > > + done > > + process_split_commit "$rev" "$parsedParents" > > done || exit $? > > It looks like we use "exit $?" a lot in git-subtree.sh while we use > just "exit" most often elsewhere. Not sure why. > > > latest_new=$(cache_get latest_new) || exit $? > > diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh > > index 49a21dd7c9c..87d59afd761 100755 > > --- a/contrib/subtree/t/t7900-subtree.sh > > +++ b/contrib/subtree/t/t7900-subtree.sh > > @@ -385,6 +385,48 @@ test_expect_success 'split sub dir/ with --rejoin' ' > > ) > > ' > > > > +test_expect_success 'split with multiple subtrees' ' > > + subtree_test_create_repo "$test_count" && > > + subtree_test_create_repo "$test_count/subA" && > > + subtree_test_create_repo "$test_count/subB" && > > + test_create_commit "$test_count" main1 && > > + test_create_commit "$test_count/subA" subA1 && > > + test_create_commit "$test_count/subA" subA2 && > > + test_create_commit "$test_count/subA" subA3 && > > + test_create_commit "$test_count/subB" subB1 && > > + ( > > + cd "$test_count" && > > + git fetch ./subA HEAD && > > + git subtree add --prefix=subADir FETCH_HEAD > > + ) && > > + ( > > + cd "$test_count" && > > + git fetch ./subB HEAD && > > + git subtree add --prefix=subBDir FETCH_HEAD > > + ) && > > + test_create_commit "$test_count" subADir/main-subA1 && > > + test_create_commit "$test_count" subBDir/main-subB1 && > > + ( > > + cd "$test_count" && > > + git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 1" > > + ) && > > Not sure why there are so many sub-shells used, and why the -C option > is not used instead to tell Git to work in a subdirectory. I guess you > copied what most existing (old) tests in this test script do. > > For example perhaps the 4 line above could be replaced by just: > > + git -C "$test_count" subtree split --prefix=subADir > --squash --rejoin -m "Sub A Split 1" && > > > + ( > > + cd "$test_count" && > > + git subtree split --prefix=subBDir --squash --rejoin -m "Sub B Split 1" > > + ) && > > + test_create_commit "$test_count" subADir/main-subA2 && > > + test_create_commit "$test_count" subBDir/main-subB2 && > > + ( > > + cd "$test_count" && > > + git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 2" > > + ) && > > + ( > > + cd "$test_count" && > > + test "$(git subtree split --prefix=subBDir --squash --rejoin \ > > + -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = "" > > + ) > > +' > > It's not clear to me what the test is doing. Maybe you could split it > into 2 tests. Perhaps one setting up a repo with multiple subtrees and > one checking that a new split ignores other subtree split commits. > Perhaps adding a few comments would help too. > > Best, > Christian. ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v5] subtree: fix split processing with multiple subtrees present 2023-10-26 19:17 ` [PATCH v4] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget 2023-11-18 11:28 ` Christian Couder @ 2023-11-28 21:17 ` Zach FettersMoore via GitGitGadget 2023-11-30 20:33 ` Christian Couder 2023-12-01 14:54 ` [PATCH v6] " Zach FettersMoore via GitGitGadget 1 sibling, 2 replies; 31+ messages in thread From: Zach FettersMoore via GitGitGadget @ 2023-11-28 21:17 UTC (permalink / raw) To: git; +Cc: Christian Couder, Zach FettersMoore, Zach FettersMoore From: Zach FettersMoore <zach.fetters@apollographql.com> When there are multiple subtrees present in a repository and they are all using 'git subtree split', the 'split' command can take a significant (and constantly growing) amount of time to run even when using the '--rejoin' flag. This is due to the fact that when processing commits to determine the last known split to start from when looking for changes, if there has been a split/merge done from another subtree there will be 2 split commits, one mainline and one subtree, for the second subtree that are part of the processing. The non-mainline subtree split commit will cause the processing to always need to search the entire history of the given subtree as part of its processing even though those commits are totally irrelevant to the current subtree split being run. To see this in practice you can use the open source GitHub repo 'apollo-ios-dev' and do the following in order: -Make a changes to a file in 'apollo-ios'A and 'apollo-ios-codegen' directories -Create a commit containing these changes -Do a split on apollo-ios-codegen - git subtree split --prefix=apollo-ios-codegen --squash --rejoin -Do a split on apollo-ios - git subtree split --prefix=apollo-ios --squash --rejoin -Make changes to a file in apollo-ios-codegen -Create a commit containing the change(s) -Do a split on apollo-ios-codegen - git subtree split --prefix=apollo-ios-codegen --squash --rejoin You will see that the final split is looking for the last split on apollo-ios-codegen to use as it's starting point to process commits. Since there is a split commit from apollo-ios in between the 2 splits run on apollo-ios-codegen, the processing ends up traversing the entire history of apollo-ios which increases the time it takes to do a split based on how long of a history apollo-ios has, while none of these commits are relevant to the split being done on apollo-ios-codegen. So this commit makes a change to the processing of commits for the split command in order to ignore non-mainline commits from other subtrees such as apollo-ios in the above breakdown by adding a new function 'should_ignore_subtree_commit' which is called during 'process_split_commit'. This allows the split/rejoin processing to still function as expected but removes all of the unnecessary processing that takes place currently which greatly inflates the processing time. In the above example, previously the final split would take ~10-12 minutes, while after this fix it takes seconds. Added a test to validate that the proposed fix solves the issue. The test accomplishes this by checking the output of the split command to ensure the output from the progress of 'process_split_commit' function that represents the 'extracount' of commits processed remains at 0, meaning none of the commits from the second subtree were processed. This was tested against the original functionality to show the test failed, and then with this fix to show the test passes. This illustrated that when using multiple subtrees, A and B, when doing a split on subtree B, the processing does not traverse the entire history of subtree A which is unnecessary and would cause the 'extracount' of processed commits to climb based on the number of commits in the history of subtree A. Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com> --- subtree: fix split processing with multiple subtrees present When there are multiple subtrees in a repo and git subtree split --rejoin is being used for the subtrees, the processing of commits for a new split can take a significant (and constantly growing) amount of time because the split commits from other subtrees cause the processing to have to scan the entire history of the other subtree(s). This patch filters out the other subtree split commits that are unnecessary for the split commit processing. Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1587%2FBobaFetters%2Fzf%2Fmulti-subtree-processing-v5 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1587/BobaFetters/zf/multi-subtree-processing-v5 Pull-Request: https://github.com/gitgitgadget/git/pull/1587 Range-diff vs v4: 1: 353152910eb ! 1: e7445a95f30 subtree: fix split processing with multiple subtrees present @@ Commit message though those commits are totally irrelevant to the current subtree split being run. - In the diagram below, 'M' represents the mainline repo branch, 'A' - represents one subtree, and 'B' represents another. M3 and B1 represent - a split commit for subtree B that was created from commit M4. M2 and A1 - represent a split commit made from subtree A that was also created - based on changes back to and including M4. M1 represents new changes to - the repo, in this scenario if you try to run a 'git subtree split - --rejoin' for subtree B, commits M1, M2, and A1, will be included in - the processing of changes for the new split commit since the last - split/rejoin for subtree B was at M3. The issue is that by having A1 - included in this processing the command ends up needing to processing - every commit down tree A even though none of that is needed or relevant - to the current command and result. + To see this in practice you can use the open source GitHub repo + 'apollo-ios-dev' and do the following in order: - M1 - | \ \ - M2 | | - | A1 | - M3 | | - | | B1 - M4 | | + -Make a changes to a file in 'apollo-ios'A and 'apollo-ios-codegen' + directories + -Create a commit containing these changes + -Do a split on apollo-ios-codegen + - git subtree split --prefix=apollo-ios-codegen --squash --rejoin + -Do a split on apollo-ios + - git subtree split --prefix=apollo-ios --squash --rejoin + -Make changes to a file in apollo-ios-codegen + -Create a commit containing the change(s) + -Do a split on apollo-ios-codegen + - git subtree split --prefix=apollo-ios-codegen --squash --rejoin - So this commit makes a change to the processing of commits for the split - command in order to ignore non-mainline commits from other subtrees such - as A1 in the diagram by adding a new function - 'should_ignore_subtree_commit' which is called during - 'process_split_commit'. This allows the split/rejoin processing to still - function as expected but removes all of the unnecessary processing that - takes place currently which greatly inflates the processing time. + You will see that the final split is looking for the last split + on apollo-ios-codegen to use as it's starting point to process + commits. Since there is a split commit from apollo-ios in between the + 2 splits run on apollo-ios-codegen, the processing ends up traversing + the entire history of apollo-ios which increases the time it takes to + do a split based on how long of a history apollo-ios has, while none + of these commits are relevant to the split being done on + apollo-ios-codegen. + + So this commit makes a change to the processing of commits for the + split command in order to ignore non-mainline commits from other + subtrees such as apollo-ios in the above breakdown by adding a new + function 'should_ignore_subtree_commit' which is called during + 'process_split_commit'. This allows the split/rejoin processing to + still function as expected but removes all of the unnecessary + processing that takes place currently which greatly inflates the + processing time. In the above example, previously the final split + would take ~10-12 minutes, while after this fix it takes seconds. Added a test to validate that the proposed fix solves the issue. @@ Commit message of the split command to ensure the output from the progress of 'process_split_commit' function that represents the 'extracount' of commits - processed does not increment. + processed remains at 0, meaning none of the commits + from the second subtree were processed. This was tested against the original functionality to show the test failed, and then with this fix @@ contrib/subtree/git-subtree.sh: ensure_valid_ref_format () { +# Usage: check if a commit from another subtree should be +# ignored from processing for splits +should_ignore_subtree_split_commit () { -+ if test -n "$(git log -1 --grep="git-subtree-dir:" $1)" -+ then -+ if test -z "$(git log -1 --grep="git-subtree-mainline:" $1)" && -+ test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $1)" -+ then -+ return 0 -+ fi -+ fi -+ return 1 ++ assert test $# = 1 ++ local rev="$1" ++ if test -n "$(git log -1 --grep="git-subtree-dir:" $rev)" ++ then ++ if test -z "$(git log -1 --grep="git-subtree-mainline:" $rev)" && ++ test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $rev)" ++ then ++ return 0 ++ fi ++ fi ++ return 1 +} + # Usage: process_split_commit REV PARENTS @@ contrib/subtree/git-subtree.sh: cmd_split () { + then + continue + fi -+ parsedParents='' ++ parsedparents='' + for parent in $parents + do -+ should_ignore_subtree_split_commit "$parent" -+ if test $? -eq 1 ++ if ! should_ignore_subtree_split_commit "$parent" + then -+ parsedParents+="$parent " ++ parsedparents="$parsedparents$parent " + fi + done -+ process_split_commit "$rev" "$parsedParents" ++ process_split_commit "$rev" "$parsedparents" done || exit $? latest_new=$(cache_get latest_new) || exit $? @@ contrib/subtree/t/t7900-subtree.sh: test_expect_success 'split sub dir/ with --r ) ' ++# Tests that commits from other subtrees are not processed as ++# part of a split. ++# ++# This test performs the following: ++# - Creates Repo with subtrees 'subA' and 'subB' ++# - Creates commits in the repo including changes to subtrees ++# - Runs the following 'split' and commit' commands in order: ++# - Perform 'split' on subtree A ++# - Perform 'split' on subtree B ++# - Create new commits with changes to subtree A and B ++# - Perform split on subtree A ++# - Check that the commits in subtree B are not processed ++# as part of the subtree A split +test_expect_success 'split with multiple subtrees' ' + subtree_test_create_repo "$test_count" && + subtree_test_create_repo "$test_count/subA" && @@ contrib/subtree/t/t7900-subtree.sh: test_expect_success 'split sub dir/ with --r + test_create_commit "$test_count/subA" subA2 && + test_create_commit "$test_count/subA" subA3 && + test_create_commit "$test_count/subB" subB1 && -+ ( -+ cd "$test_count" && -+ git fetch ./subA HEAD && -+ git subtree add --prefix=subADir FETCH_HEAD -+ ) && -+ ( -+ cd "$test_count" && -+ git fetch ./subB HEAD && -+ git subtree add --prefix=subBDir FETCH_HEAD -+ ) && ++ git -C "$test_count" fetch ./subA HEAD && ++ git -C "$test_count" subtree add --prefix=subADir FETCH_HEAD && ++ git -C "$test_count" fetch ./subB HEAD && ++ git -C "$test_count" subtree add --prefix=subBDir FETCH_HEAD && + test_create_commit "$test_count" subADir/main-subA1 && + test_create_commit "$test_count" subBDir/main-subB1 && -+ ( -+ cd "$test_count" && -+ git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 1" -+ ) && -+ ( -+ cd "$test_count" && -+ git subtree split --prefix=subBDir --squash --rejoin -m "Sub B Split 1" -+ ) && ++ git -C "$test_count" subtree split --prefix=subADir \ ++ --squash --rejoin -m "Sub A Split 1" && ++ git -C "$test_count" subtree split --prefix=subBDir \ ++ --squash --rejoin -m "Sub B Split 1" && + test_create_commit "$test_count" subADir/main-subA2 && + test_create_commit "$test_count" subBDir/main-subB2 && -+ ( -+ cd "$test_count" && -+ git subtree split --prefix=subADir --squash --rejoin -m "Sub A Split 2" -+ ) && -+ ( -+ cd "$test_count" && -+ test "$(git subtree split --prefix=subBDir --squash --rejoin \ -+ -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = "" -+ ) ++ git -C "$test_count" subtree split --prefix=subADir \ ++ --squash --rejoin -m "Sub A Split 2" && ++ test "$(git -C "$test_count" subtree split --prefix=subBDir \ ++ --squash --rejoin -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = "" +' + test_expect_success 'split sub dir/ with --rejoin from scratch' ' contrib/subtree/git-subtree.sh | 30 +++++++++++++++++++++- contrib/subtree/t/t7900-subtree.sh | 40 ++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+), 1 deletion(-) diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh index e0c5d3b0de6..a0bf958ea66 100755 --- a/contrib/subtree/git-subtree.sh +++ b/contrib/subtree/git-subtree.sh @@ -778,6 +778,22 @@ ensure_valid_ref_format () { die "fatal: '$1' does not look like a ref" } +# Usage: check if a commit from another subtree should be +# ignored from processing for splits +should_ignore_subtree_split_commit () { + assert test $# = 1 + local rev="$1" + if test -n "$(git log -1 --grep="git-subtree-dir:" $rev)" + then + if test -z "$(git log -1 --grep="git-subtree-mainline:" $rev)" && + test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $rev)" + then + return 0 + fi + fi + return 1 +} + # Usage: process_split_commit REV PARENTS process_split_commit () { assert test $# = 2 @@ -963,7 +979,19 @@ cmd_split () { eval "$grl" | while read rev parents do - process_split_commit "$rev" "$parents" + if should_ignore_subtree_split_commit "$rev" + then + continue + fi + parsedparents='' + for parent in $parents + do + if ! should_ignore_subtree_split_commit "$parent" + then + parsedparents="$parsedparents$parent " + fi + done + process_split_commit "$rev" "$parsedparents" done || exit $? latest_new=$(cache_get latest_new) || exit $? diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh index 49a21dd7c9c..ca4df5be832 100755 --- a/contrib/subtree/t/t7900-subtree.sh +++ b/contrib/subtree/t/t7900-subtree.sh @@ -385,6 +385,46 @@ test_expect_success 'split sub dir/ with --rejoin' ' ) ' +# Tests that commits from other subtrees are not processed as +# part of a split. +# +# This test performs the following: +# - Creates Repo with subtrees 'subA' and 'subB' +# - Creates commits in the repo including changes to subtrees +# - Runs the following 'split' and commit' commands in order: +# - Perform 'split' on subtree A +# - Perform 'split' on subtree B +# - Create new commits with changes to subtree A and B +# - Perform split on subtree A +# - Check that the commits in subtree B are not processed +# as part of the subtree A split +test_expect_success 'split with multiple subtrees' ' + subtree_test_create_repo "$test_count" && + subtree_test_create_repo "$test_count/subA" && + subtree_test_create_repo "$test_count/subB" && + test_create_commit "$test_count" main1 && + test_create_commit "$test_count/subA" subA1 && + test_create_commit "$test_count/subA" subA2 && + test_create_commit "$test_count/subA" subA3 && + test_create_commit "$test_count/subB" subB1 && + git -C "$test_count" fetch ./subA HEAD && + git -C "$test_count" subtree add --prefix=subADir FETCH_HEAD && + git -C "$test_count" fetch ./subB HEAD && + git -C "$test_count" subtree add --prefix=subBDir FETCH_HEAD && + test_create_commit "$test_count" subADir/main-subA1 && + test_create_commit "$test_count" subBDir/main-subB1 && + git -C "$test_count" subtree split --prefix=subADir \ + --squash --rejoin -m "Sub A Split 1" && + git -C "$test_count" subtree split --prefix=subBDir \ + --squash --rejoin -m "Sub B Split 1" && + test_create_commit "$test_count" subADir/main-subA2 && + test_create_commit "$test_count" subBDir/main-subB2 && + git -C "$test_count" subtree split --prefix=subADir \ + --squash --rejoin -m "Sub A Split 2" && + test "$(git -C "$test_count" subtree split --prefix=subBDir \ + --squash --rejoin -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = "" +' + test_expect_success 'split sub dir/ with --rejoin from scratch' ' subtree_test_create_repo "$test_count" && test_create_commit "$test_count" main1 && base-commit: bda494f4043963b9ec9a1ecd4b19b7d1cd9a0518 -- gitgitgadget ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH v5] subtree: fix split processing with multiple subtrees present 2023-11-28 21:17 ` [PATCH v5] " Zach FettersMoore via GitGitGadget @ 2023-11-30 20:33 ` Christian Couder 2023-11-30 21:01 ` Zach FettersMoore 2023-12-01 14:54 ` [PATCH v6] " Zach FettersMoore via GitGitGadget 1 sibling, 1 reply; 31+ messages in thread From: Christian Couder @ 2023-11-30 20:33 UTC (permalink / raw) To: Zach FettersMoore via GitGitGadget; +Cc: git, Zach FettersMoore On Tue, Nov 28, 2023 at 10:17 PM Zach FettersMoore via GitGitGadget <gitgitgadget@gmail.com> wrote: > To see this in practice you can use the open source GitHub repo > 'apollo-ios-dev' and do the following in order: > > -Make a changes to a file in 'apollo-ios'A and 'apollo-ios-codegen' It looks like there is a spurious A after 'apollo-ios' in the line above. > directories > -Create a commit containing these changes > -Do a split on apollo-ios-codegen > - git subtree split --prefix=apollo-ios-codegen --squash --rejoin I might be doing something stupid or wrong, but I get the following: $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin fatal: could not rev-parse split hash cc70a7d49e84696f0df210710445784c504ed748 from commit 360f068ea0d57f250621ab7dbe205313f52a0e98 hint: hash might be a tag, try fetching it from the subtree repository: hint: git fetch <subtree-repository> cc70a7d49e84696f0df210710445784c504ed748 > -Do a split on apollo-ios > - git subtree split --prefix=apollo-ios --squash --rejoin Same issue: $ git subtree split --prefix=apollo-ios --squash --rejoin fatal: could not rev-parse split hash b852c0aa1fd5ab9e1323da92b606ad3f2211e111 from commit b48030c3eb6e2faf4bff981c5c63ca72aceecdfa hint: hash might be a tag, try fetching it from the subtree repository: hint: git fetch <subtree-repository> b852c0aa1fd5ab9e1323da92b606ad3f2211e111 I didn't try to get farther than this, as it seems that some instructions might be missing. [...] > So this commit makes a change to the processing of commits for the > split command in order to ignore non-mainline commits from other > subtrees such as apollo-ios in the above breakdown by adding a new > function 'should_ignore_subtree_commit' which is called during > 'process_split_commit'. This allows the split/rejoin processing to > still function as expected but removes all of the unnecessary > processing that takes place currently which greatly inflates the > processing time. In the above example, previously the final split > would take ~10-12 minutes, while after this fix it takes seconds. Nice! Except for the above issues in the commit message, the rest of the patch looks good to me, thanks! ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v5] subtree: fix split processing with multiple subtrees present 2023-11-30 20:33 ` Christian Couder @ 2023-11-30 21:01 ` Zach FettersMoore 0 siblings, 0 replies; 31+ messages in thread From: Zach FettersMoore @ 2023-11-30 21:01 UTC (permalink / raw) To: Christian Couder; +Cc: Zach FettersMoore via GitGitGadget, git >> To see this in practice you can use the open source GitHub repo >> 'apollo-ios-dev' and do the following in order: >> >> -Make a changes to a file in 'apollo-ios'A and 'apollo-ios-codegen' > It looks like there is a spurious A after 'apollo-ios' in the line above. Thanks for catching that, definitely a typo on my part. >> directories >> -Create a commit containing these changes >> -Do a split on apollo-ios-codegen >> - git subtree split --prefix=apollo-ios-codegen --squash --rejoin > I might be doing something stupid or wrong, but I get the following: > > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin > fatal: could not rev-parse split hash > cc70a7d49e84696f0df210710445784c504ed748 from commit > 360f068ea0d57f250621ab7dbe205313f52a0e98 > hint: hash might be a tag, try fetching it from the subtree repository: > hint: git fetch <subtree-repository> cc70a7d49e84696f0df210710445784c504ed748 Updated this to include doing a fetch to ensure all remote repo info is available locally. >> -Do a split on apollo-ios >> - git subtree split --prefix=apollo-ios --squash --rejoin > Same issue: > > $ git subtree split --prefix=apollo-ios --squash --rejoin > fatal: could not rev-parse split hash > b852c0aa1fd5ab9e1323da92b606ad3f2211e111 from commit > b48030c3eb6e2faf4bff981c5c63ca72aceecdfa > hint: hash might be a tag, try fetching it from the subtree repository: > hint: git fetch <subtree-repository> b852c0aa1fd5ab9e1323da92b606ad3f2211e111 > > I didn't try to get farther than this, as it seems that some > instructions might be missing. Same as above, added extra instruction to do a fetch first. Also added a little extra info that the issue may present after the first split in the instructions depending on the current state of the repo being used. Also added a way to do the same steps with the changes applied to see that it resolves the issue. >> So this commit makes a change to the processing of commits for the >> split command in order to ignore non-mainline commits from other >> subtrees such as apollo-ios in the above breakdown by adding a new >> function 'should_ignore_subtree_commit' which is called during >> 'process_split_commit'. This allows the split/rejoin processing to >> still function as expected but removes all of the unnecessary >> processing that takes place currently which greatly inflates the >> processing time. In the above example, previously the final split >> would take ~10-12 minutes, while after this fix it takes seconds. > Nice! > > Except for the above issues in the commit message, the rest of the > patch looks good to me, thanks! Great! Thanks for the review and guidance! ^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v6] subtree: fix split processing with multiple subtrees present 2023-11-28 21:17 ` [PATCH v5] " Zach FettersMoore via GitGitGadget 2023-11-30 20:33 ` Christian Couder @ 2023-12-01 14:54 ` Zach FettersMoore via GitGitGadget 2023-12-04 11:08 ` Christian Couder 2025-08-21 3:13 ` subtree: [v2.44 regression] split may produce different history Colin Stagner 1 sibling, 2 replies; 31+ messages in thread From: Zach FettersMoore via GitGitGadget @ 2023-12-01 14:54 UTC (permalink / raw) To: git; +Cc: Christian Couder, Zach FettersMoore, Zach FettersMoore From: Zach FettersMoore <zach.fetters@apollographql.com> When there are multiple subtrees present in a repository and they are all using 'git subtree split', the 'split' command can take a significant (and constantly growing) amount of time to run even when using the '--rejoin' flag. This is due to the fact that when processing commits to determine the last known split to start from when looking for changes, if there has been a split/merge done from another subtree there will be 2 split commits, one mainline and one subtree, for the second subtree that are part of the processing. The non-mainline subtree split commit will cause the processing to always need to search the entire history of the given subtree as part of its processing even though those commits are totally irrelevant to the current subtree split being run. To see this in practice you can use the open source GitHub repo 'apollo-ios-dev' and do the following in order: -Make a changes to a file in 'apollo-ios' and 'apollo-ios-codegen' directories -Create a commit containing these changes -Do a split on apollo-ios-codegen - Do a fetch on the subtree repo - git fetch git@github.com:apollographql/apollo-ios-codegen.git - git subtree split --prefix=apollo-ios-codegen --squash --rejoin - Depending on the current state of the 'apollo-ios-dev' repo you may see the issue at this point if the last split was on apollo-ios -Do a split on apollo-ios - Do a fetch on the subtree repo - git fetch git@github.com:apollographql/apollo-ios.git - git subtree split --prefix=apollo-ios --squash --rejoin -Make changes to a file in apollo-ios-codegen -Create a commit containing the change(s) -Do a split on apollo-ios-codegen - git subtree split --prefix=apollo-ios-codegen --squash --rejoin -To see that the patch fixes the issue you can use the custom subtree script in the repo so following the same steps as above, except instead of using 'git subtree ...' for the commands use 'git-subtree.sh ...' for the commands You will see that the final split is looking for the last split on apollo-ios-codegen to use as it's starting point to process commits. Since there is a split commit from apollo-ios in between the 2 splits run on apollo-ios-codegen, the processing ends up traversing the entire history of apollo-ios which increases the time it takes to do a split based on how long of a history apollo-ios has, while none of these commits are relevant to the split being done on apollo-ios-codegen. So this commit makes a change to the processing of commits for the split command in order to ignore non-mainline commits from other subtrees such as apollo-ios in the above breakdown by adding a new function 'should_ignore_subtree_commit' which is called during 'process_split_commit'. This allows the split/rejoin processing to still function as expected but removes all of the unnecessary processing that takes place currently which greatly inflates the processing time. In the above example, previously the final split would take ~10-12 minutes, while after this fix it takes seconds. Added a test to validate that the proposed fix solves the issue. The test accomplishes this by checking the output of the split command to ensure the output from the progress of 'process_split_commit' function that represents the 'extracount' of commits processed remains at 0, meaning none of the commits from the second subtree were processed. This was tested against the original functionality to show the test failed, and then with this fix to show the test passes. This illustrated that when using multiple subtrees, A and B, when doing a split on subtree B, the processing does not traverse the entire history of subtree A which is unnecessary and would cause the 'extracount' of processed commits to climb based on the number of commits in the history of subtree A. Signed-off-by: Zach FettersMoore <zach.fetters@apollographql.com> --- subtree: fix split processing with multiple subtrees present When there are multiple subtrees in a repo and git subtree split --rejoin is being used for the subtrees, the processing of commits for a new split can take a significant (and constantly growing) amount of time because the split commits from other subtrees cause the processing to have to scan the entire history of the other subtree(s). This patch filters out the other subtree split commits that are unnecessary for the split commit processing. Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1587%2FBobaFetters%2Fzf%2Fmulti-subtree-processing-v6 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1587/BobaFetters/zf/multi-subtree-processing-v6 Pull-Request: https://github.com/gitgitgadget/git/pull/1587 Range-diff vs v5: 1: e7445a95f30 ! 1: 2a65ec0e4df subtree: fix split processing with multiple subtrees present @@ Commit message To see this in practice you can use the open source GitHub repo 'apollo-ios-dev' and do the following in order: - -Make a changes to a file in 'apollo-ios'A and 'apollo-ios-codegen' + -Make a changes to a file in 'apollo-ios' and 'apollo-ios-codegen' directories -Create a commit containing these changes -Do a split on apollo-ios-codegen + - Do a fetch on the subtree repo + - git fetch git@github.com:apollographql/apollo-ios-codegen.git - git subtree split --prefix=apollo-ios-codegen --squash --rejoin + - Depending on the current state of the 'apollo-ios-dev' repo + you may see the issue at this point if the last split was on + apollo-ios -Do a split on apollo-ios + - Do a fetch on the subtree repo + - git fetch git@github.com:apollographql/apollo-ios.git - git subtree split --prefix=apollo-ios --squash --rejoin -Make changes to a file in apollo-ios-codegen -Create a commit containing the change(s) -Do a split on apollo-ios-codegen - git subtree split --prefix=apollo-ios-codegen --squash --rejoin + -To see that the patch fixes the issue you can use the custom subtree + script in the repo so following the same steps as above, except + instead of using 'git subtree ...' for the commands use + 'git-subtree.sh ...' for the commands You will see that the final split is looking for the last split on apollo-ios-codegen to use as it's starting point to process contrib/subtree/git-subtree.sh | 30 +++++++++++++++++++++- contrib/subtree/t/t7900-subtree.sh | 40 ++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+), 1 deletion(-) diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh index e0c5d3b0de6..a0bf958ea66 100755 --- a/contrib/subtree/git-subtree.sh +++ b/contrib/subtree/git-subtree.sh @@ -778,6 +778,22 @@ ensure_valid_ref_format () { die "fatal: '$1' does not look like a ref" } +# Usage: check if a commit from another subtree should be +# ignored from processing for splits +should_ignore_subtree_split_commit () { + assert test $# = 1 + local rev="$1" + if test -n "$(git log -1 --grep="git-subtree-dir:" $rev)" + then + if test -z "$(git log -1 --grep="git-subtree-mainline:" $rev)" && + test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $rev)" + then + return 0 + fi + fi + return 1 +} + # Usage: process_split_commit REV PARENTS process_split_commit () { assert test $# = 2 @@ -963,7 +979,19 @@ cmd_split () { eval "$grl" | while read rev parents do - process_split_commit "$rev" "$parents" + if should_ignore_subtree_split_commit "$rev" + then + continue + fi + parsedparents='' + for parent in $parents + do + if ! should_ignore_subtree_split_commit "$parent" + then + parsedparents="$parsedparents$parent " + fi + done + process_split_commit "$rev" "$parsedparents" done || exit $? latest_new=$(cache_get latest_new) || exit $? diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh index 49a21dd7c9c..ca4df5be832 100755 --- a/contrib/subtree/t/t7900-subtree.sh +++ b/contrib/subtree/t/t7900-subtree.sh @@ -385,6 +385,46 @@ test_expect_success 'split sub dir/ with --rejoin' ' ) ' +# Tests that commits from other subtrees are not processed as +# part of a split. +# +# This test performs the following: +# - Creates Repo with subtrees 'subA' and 'subB' +# - Creates commits in the repo including changes to subtrees +# - Runs the following 'split' and commit' commands in order: +# - Perform 'split' on subtree A +# - Perform 'split' on subtree B +# - Create new commits with changes to subtree A and B +# - Perform split on subtree A +# - Check that the commits in subtree B are not processed +# as part of the subtree A split +test_expect_success 'split with multiple subtrees' ' + subtree_test_create_repo "$test_count" && + subtree_test_create_repo "$test_count/subA" && + subtree_test_create_repo "$test_count/subB" && + test_create_commit "$test_count" main1 && + test_create_commit "$test_count/subA" subA1 && + test_create_commit "$test_count/subA" subA2 && + test_create_commit "$test_count/subA" subA3 && + test_create_commit "$test_count/subB" subB1 && + git -C "$test_count" fetch ./subA HEAD && + git -C "$test_count" subtree add --prefix=subADir FETCH_HEAD && + git -C "$test_count" fetch ./subB HEAD && + git -C "$test_count" subtree add --prefix=subBDir FETCH_HEAD && + test_create_commit "$test_count" subADir/main-subA1 && + test_create_commit "$test_count" subBDir/main-subB1 && + git -C "$test_count" subtree split --prefix=subADir \ + --squash --rejoin -m "Sub A Split 1" && + git -C "$test_count" subtree split --prefix=subBDir \ + --squash --rejoin -m "Sub B Split 1" && + test_create_commit "$test_count" subADir/main-subA2 && + test_create_commit "$test_count" subBDir/main-subB2 && + git -C "$test_count" subtree split --prefix=subADir \ + --squash --rejoin -m "Sub A Split 2" && + test "$(git -C "$test_count" subtree split --prefix=subBDir \ + --squash --rejoin -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = "" +' + test_expect_success 'split sub dir/ with --rejoin from scratch' ' subtree_test_create_repo "$test_count" && test_create_commit "$test_count" main1 && base-commit: bda494f4043963b9ec9a1ecd4b19b7d1cd9a0518 -- gitgitgadget ^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present 2023-12-01 14:54 ` [PATCH v6] " Zach FettersMoore via GitGitGadget @ 2023-12-04 11:08 ` Christian Couder 2023-12-11 15:39 ` Zach FettersMoore 2025-08-21 3:13 ` subtree: [v2.44 regression] split may produce different history Colin Stagner 1 sibling, 1 reply; 31+ messages in thread From: Christian Couder @ 2023-12-04 11:08 UTC (permalink / raw) To: Zach FettersMoore via GitGitGadget; +Cc: git, Zach FettersMoore On Fri, Dec 1, 2023 at 3:54 PM Zach FettersMoore via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Zach FettersMoore <zach.fetters@apollographql.com> > > When there are multiple subtrees present in a repository and they are > all using 'git subtree split', the 'split' command can take a > significant (and constantly growing) amount of time to run even when > using the '--rejoin' flag. This is due to the fact that when processing > commits to determine the last known split to start from when looking > for changes, if there has been a split/merge done from another subtree > there will be 2 split commits, one mainline and one subtree, for the > second subtree that are part of the processing. The non-mainline > subtree split commit will cause the processing to always need to search > the entire history of the given subtree as part of its processing even > though those commits are totally irrelevant to the current subtree > split being run. > > To see this in practice you can use the open source GitHub repo > 'apollo-ios-dev' and do the following in order: > > -Make a changes to a file in 'apollo-ios' and 'apollo-ios-codegen' > directories > -Create a commit containing these changes > -Do a split on apollo-ios-codegen > - Do a fetch on the subtree repo > - git fetch git@github.com:apollographql/apollo-ios-codegen.git > - git subtree split --prefix=apollo-ios-codegen --squash --rejoin Now I get the following without your patch at this step: $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin [...]/libexec/git-core/git-subtree: 318: Maximum function recursion depth (1000) reached Line 318 in git-subtree.sh contains the following: missed=$(cache_miss "$@") || exit $? With your patch it seems to work: $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin Merge made by the 'ort' strategy. e274aed3ba6d0659fb4cc014587cf31c1e8df7f4 > - Depending on the current state of the 'apollo-ios-dev' repo > you may see the issue at this point if the last split was on > apollo-ios I guess I see it, but it seems a bit different for me than what you describe. Otherwise your patch looks good to me now. Thanks, Christian. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present 2023-12-04 11:08 ` Christian Couder @ 2023-12-11 15:39 ` Zach FettersMoore 2023-12-12 16:06 ` Christian Couder 0 siblings, 1 reply; 31+ messages in thread From: Zach FettersMoore @ 2023-12-11 15:39 UTC (permalink / raw) To: Christian Couder; +Cc: Zach FettersMoore via GitGitGadget, git >> >> From: Zach FettersMoore <zach.fetters@apollographql.com> >> >> When there are multiple subtrees present in a repository and they are >> all using 'git subtree split', the 'split' command can take a >> significant (and constantly growing) amount of time to run even when >> using the '--rejoin' flag. This is due to the fact that when processing >> commits to determine the last known split to start from when looking >> for changes, if there has been a split/merge done from another subtree >> there will be 2 split commits, one mainline and one subtree, for the >> second subtree that are part of the processing. The non-mainline >> subtree split commit will cause the processing to always need to search >> the entire history of the given subtree as part of its processing even >> though those commits are totally irrelevant to the current subtree >> split being run. >> >> To see this in practice you can use the open source GitHub repo >> 'apollo-ios-dev' and do the following in order: >> >> -Make a changes to a file in 'apollo-ios' and 'apollo-ios-codegen' >> directories >> -Create a commit containing these changes >> -Do a split on apollo-ios-codegen >> - Do a fetch on the subtree repo >> - git fetch git@github.com:apollographql/apollo-ios-codegen.git >> - git subtree split --prefix=apollo-ios-codegen --squash --rejoin > Now I get the following without your patch at this step: > > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin > [...]/libexec/git-core/git-subtree: 318: Maximum function recursion > depth (1000) reached > > Line 318 in git-subtree.sh contains the following: > > missed=$(cache_miss "$@") || exit $? > > With your patch it seems to work: > > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin > Merge made by the 'ort' strategy. > e274aed3ba6d0659fb4cc014587cf31c1e8df7f4 Looking into this some it looks like it could be a bash config difference? My machine always runs it all the way through vs failing for recursion depth. Although that would also be an issue which is solved by this fix. >> - Depending on the current state of the 'apollo-ios-dev' repo >> you may see the issue at this point if the last split was on >> apollo-ios > I guess I see it, but it seems a bit different for me than what you describe. > > Otherwise your patch looks good to me now. Yea I hadn't accounted for/realized that some folks may see a recursion depth error vs it just taking a long time like it does for me. Also what I was saying with the apollo-ios-dev repo is you may not need all the steps to see the issue, because its possible the state of the repo is already in a position to display the issue just by doing a split on apollo-ios-codegen. Great! Thanks again for all the feedback and guidance! Is there anything else I need to do to get this across the finish line and merged in? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present 2023-12-11 15:39 ` Zach FettersMoore @ 2023-12-12 16:06 ` Christian Couder 2023-12-12 22:28 ` Junio C Hamano 2023-12-20 15:25 ` Christian Couder 0 siblings, 2 replies; 31+ messages in thread From: Christian Couder @ 2023-12-12 16:06 UTC (permalink / raw) To: Zach FettersMoore; +Cc: Zach FettersMoore via GitGitGadget, git On Mon, Dec 11, 2023 at 4:39 PM Zach FettersMoore <zach.fetters@apollographql.com> wrote: > > >> > >> From: Zach FettersMoore <zach.fetters@apollographql.com> > >> To see this in practice you can use the open source GitHub repo > >> 'apollo-ios-dev' and do the following in order: > >> > >> -Make a changes to a file in 'apollo-ios' and 'apollo-ios-codegen' > >> directories > >> -Create a commit containing these changes > >> -Do a split on apollo-ios-codegen > >> - Do a fetch on the subtree repo > >> - git fetch git@github.com:apollographql/apollo-ios-codegen.git > >> - git subtree split --prefix=apollo-ios-codegen --squash --rejoin > > > Now I get the following without your patch at this step: > > > > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin > > [...]/libexec/git-core/git-subtree: 318: Maximum function recursion > > depth (1000) reached > > > > Line 318 in git-subtree.sh contains the following: > > > > missed=$(cache_miss "$@") || exit $? > > > > With your patch it seems to work: > > > > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin > > Merge made by the 'ort' strategy. > > e274aed3ba6d0659fb4cc014587cf31c1e8df7f4 > > Looking into this some it looks like it could be a bash config > difference? My machine always runs it all the way through vs > failing for recursion depth. Although that would also be an issue > which is solved by this fix. I use Ubuntu where /bin/sh is dash so my current guess is that dash might have a smaller recursion limit than bash. I just found https://stackoverflow.com/questions/69493528/git-subtree-maximum-function-recursion-depth which seems to agree. I will try to test using bash soon. > >> - Depending on the current state of the 'apollo-ios-dev' repo > >> you may see the issue at this point if the last split was on > >> apollo-ios > > > I guess I see it, but it seems a bit different for me than what you describe. > > > > Otherwise your patch looks good to me now. > > Yea I hadn't accounted for/realized that some folks may see a recursion > depth error vs it just taking a long time like it does for me. Also what > I was saying with the apollo-ios-dev repo is you may not need all the steps > to see the issue, because its possible the state of the repo is already > in a position to display the issue just by doing a split on > apollo-ios-codegen. > > Great! Thanks again for all the feedback and guidance! Is there anything > else I need to do to get this across the finish line and merged in? Hopefully I will be able to confirm I see the same error as you with bash soon, and it will be enough to get it merged. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present 2023-12-12 16:06 ` Christian Couder @ 2023-12-12 22:28 ` Junio C Hamano 2023-12-13 15:20 ` Zach FettersMoore 2023-12-20 15:25 ` Christian Couder 1 sibling, 1 reply; 31+ messages in thread From: Junio C Hamano @ 2023-12-12 22:28 UTC (permalink / raw) To: Christian Couder Cc: Zach FettersMoore, Zach FettersMoore via GitGitGadget, git Christian Couder <christian.couder@gmail.com> writes: >> > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin >> > Merge made by the 'ort' strategy. >> > e274aed3ba6d0659fb4cc014587cf31c1e8df7f4 >> >> Looking into this some it looks like it could be a bash config >> difference? My machine always runs it all the way through vs >> failing for recursion depth. Although that would also be an issue >> which is solved by this fix. > > I use Ubuntu where /bin/sh is dash so my current guess is that dash > might have a smaller recursion limit than bash. That sounds quite bad. Does it have to be recursive (iow, if we can rewrite the logic to be iterative instead, that would be a much better way to fix the issue)? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present 2023-12-12 22:28 ` Junio C Hamano @ 2023-12-13 15:20 ` Zach FettersMoore 2024-01-03 16:33 ` Christian Couder 0 siblings, 1 reply; 31+ messages in thread From: Zach FettersMoore @ 2023-12-13 15:20 UTC (permalink / raw) To: Junio C Hamano; +Cc: Christian Couder, Zach FettersMoore via GitGitGadget, git Christian Couder <christian.couder@gmail.com> writes: >>> > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin >>> > Merge made by the 'ort' strategy. >>> > e274aed3ba6d0659fb4cc014587cf31c1e8df7f4 >>> >>> Looking into this some it looks like it could be a bash config >>> difference? My machine always runs it all the way through vs >>> failing for recursion depth. Although that would also be an issue >>> which is solved by this fix. >> >> I use Ubuntu where /bin/sh is dash so my current guess is that dash >> might have a smaller recursion limit than bash. > > That sounds quite bad. Does it have to be recursive (iow, if we can > rewrite the logic to be iterative instead, that would be a much better > way to fix the issue)? I don't think an iterative vs recursive approach fixes this particular issue, the root of the issue this patch is fixing is that lots of commits from the history of subtrees not being acted upon are being processed when they don't need to be. So the iterative approach would likely resolve the recursion limit issue for some shells, but in my instance I don't see a recursion limit error, it just takes an extraordinary amount of time to run the split command because of all the unnecessary processing which needs to be avoided which this patch fixes. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present 2023-12-13 15:20 ` Zach FettersMoore @ 2024-01-03 16:33 ` Christian Couder 0 siblings, 0 replies; 31+ messages in thread From: Christian Couder @ 2024-01-03 16:33 UTC (permalink / raw) To: Zach FettersMoore; +Cc: Junio C Hamano, Zach FettersMoore via GitGitGadget, git (Sorry for replying only to Zach instead of everyone previously.) On Wed, Dec 13, 2023 at 4:20 PM Zach FettersMoore <zach.fetters@apollographql.com> wrote: > > Christian Couder <christian.couder@gmail.com> writes: > > >>> > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin > >>> > Merge made by the 'ort' strategy. > >>> > e274aed3ba6d0659fb4cc014587cf31c1e8df7f4 > >>> > >>> Looking into this some it looks like it could be a bash config > >>> difference? My machine always runs it all the way through vs > >>> failing for recursion depth. Although that would also be an issue > >>> which is solved by this fix. > >> > >> I use Ubuntu where /bin/sh is dash so my current guess is that dash > >> might have a smaller recursion limit than bash. > > > > That sounds quite bad. Does it have to be recursive (iow, if we can > > rewrite the logic to be iterative instead, that would be a much better > > way to fix the issue)? > > I don't think an iterative vs recursive approach fixes this > particular issue, the root of the issue this patch is fixing > is that lots of commits from the history of subtrees not > being acted upon are being processed when they don't need to > be. So the iterative approach would likely resolve the > recursion limit issue for some shells, but in my instance > I don't see a recursion limit error, it just takes an > extraordinary amount of time to run the split command > because of all the unnecessary processing which needs to be > avoided which this patch fixes. Fixing possible recursion might be an improvement on top of your patch. But without your patch the test case it describes would anyway take a lot more time than seems necessary. So I agree that your patch should definitely be merged anyway. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present 2023-12-12 16:06 ` Christian Couder 2023-12-12 22:28 ` Junio C Hamano @ 2023-12-20 15:25 ` Christian Couder 2024-01-25 10:09 ` Christian Couder 1 sibling, 1 reply; 31+ messages in thread From: Christian Couder @ 2023-12-20 15:25 UTC (permalink / raw) To: Zach FettersMoore; +Cc: Zach FettersMoore via GitGitGadget, git On Tue, Dec 12, 2023 at 5:06 PM Christian Couder <christian.couder@gmail.com> wrote: > > On Mon, Dec 11, 2023 at 4:39 PM Zach FettersMoore > <zach.fetters@apollographql.com> wrote: > > > > >> > > >> From: Zach FettersMoore <zach.fetters@apollographql.com> > > > >> To see this in practice you can use the open source GitHub repo > > >> 'apollo-ios-dev' and do the following in order: > > >> > > >> -Make a changes to a file in 'apollo-ios' and 'apollo-ios-codegen' > > >> directories > > >> -Create a commit containing these changes > > >> -Do a split on apollo-ios-codegen > > >> - Do a fetch on the subtree repo > > >> - git fetch git@github.com:apollographql/apollo-ios-codegen.git > > >> - git subtree split --prefix=apollo-ios-codegen --squash --rejoin > > > > > Now I get the following without your patch at this step: > > > > > > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin > > > [...]/libexec/git-core/git-subtree: 318: Maximum function recursion > > > depth (1000) reached > > > > > > Line 318 in git-subtree.sh contains the following: > > > > > > missed=$(cache_miss "$@") || exit $? > > > > > > With your patch it seems to work: > > > > > > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin > > > Merge made by the 'ort' strategy. > > > e274aed3ba6d0659fb4cc014587cf31c1e8df7f4 > > > > Looking into this some it looks like it could be a bash config > > difference? My machine always runs it all the way through vs > > failing for recursion depth. Although that would also be an issue > > which is solved by this fix. > > I use Ubuntu where /bin/sh is dash so my current guess is that dash > might have a smaller recursion limit than bash. > > I just found https://stackoverflow.com/questions/69493528/git-subtree-maximum-function-recursion-depth > which seems to agree. > > I will try to test using bash soon. Sorry, to not have tried earlier before with bash. Now I have tried it and yeah it works fine with you patch, while without it the last step of the reproduction recipe takes a lot of time and results in a core dump: /home/christian/libexec/git-core/git-subtree: line 924: 857920 Done eval "$grl" 857921 Segmentation fault (core dumped) | while read rev parents; do process_split_commit "$rev" "$parents"; done So overall I think your patch is great! Thanks! ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present 2023-12-20 15:25 ` Christian Couder @ 2024-01-25 10:09 ` Christian Couder 2024-01-25 16:38 ` Junio C Hamano 0 siblings, 1 reply; 31+ messages in thread From: Christian Couder @ 2024-01-25 10:09 UTC (permalink / raw) To: Zach FettersMoore, Junio C Hamano; +Cc: Zach FettersMoore via GitGitGadget, git It seems that this topic has fallen into the cracks or something, while the associated pch looks good to me. On Wed, Dec 20, 2023 at 4:25 PM Christian Couder <christian.couder@gmail.com> wrote: > > On Tue, Dec 12, 2023 at 5:06 PM Christian Couder > <christian.couder@gmail.com> wrote: > > > > On Mon, Dec 11, 2023 at 4:39 PM Zach FettersMoore > > <zach.fetters@apollographql.com> wrote: > > > > > > >> > > > >> From: Zach FettersMoore <zach.fetters@apollographql.com> > > > > > >> To see this in practice you can use the open source GitHub repo > > > >> 'apollo-ios-dev' and do the following in order: > > > >> > > > >> -Make a changes to a file in 'apollo-ios' and 'apollo-ios-codegen' > > > >> directories > > > >> -Create a commit containing these changes > > > >> -Do a split on apollo-ios-codegen > > > >> - Do a fetch on the subtree repo > > > >> - git fetch git@github.com:apollographql/apollo-ios-codegen.git > > > >> - git subtree split --prefix=apollo-ios-codegen --squash --rejoin > > > > > > > Now I get the following without your patch at this step: > > > > > > > > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin > > > > [...]/libexec/git-core/git-subtree: 318: Maximum function recursion > > > > depth (1000) reached > > > > > > > > Line 318 in git-subtree.sh contains the following: > > > > > > > > missed=$(cache_miss "$@") || exit $? > > > > > > > > With your patch it seems to work: > > > > > > > > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin > > > > Merge made by the 'ort' strategy. > > > > e274aed3ba6d0659fb4cc014587cf31c1e8df7f4 > > > > > > Looking into this some it looks like it could be a bash config > > > difference? My machine always runs it all the way through vs > > > failing for recursion depth. Although that would also be an issue > > > which is solved by this fix. > > > > I use Ubuntu where /bin/sh is dash so my current guess is that dash > > might have a smaller recursion limit than bash. > > > > I just found https://stackoverflow.com/questions/69493528/git-subtree-maximum-function-recursion-depth > > which seems to agree. > > > > I will try to test using bash soon. > > Sorry, to not have tried earlier before with bash. > > Now I have tried it and yeah it works fine with you patch, while > without it the last step of the reproduction recipe takes a lot of > time and results in a core dump: > > /home/christian/libexec/git-core/git-subtree: line 924: 857920 Done > eval "$grl" > 857921 Segmentation fault (core dumped) | while read rev parents; do > process_split_commit "$rev" "$parents"; > done > > So overall I think your patch is great! Thanks! ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present 2024-01-25 10:09 ` Christian Couder @ 2024-01-25 16:38 ` Junio C Hamano 2024-01-25 18:52 ` Christian Couder 0 siblings, 1 reply; 31+ messages in thread From: Junio C Hamano @ 2024-01-25 16:38 UTC (permalink / raw) To: Christian Couder Cc: Zach FettersMoore, Zach FettersMoore via GitGitGadget, git Christian Couder <christian.couder@gmail.com> writes: > It seems that this topic has fallen into the cracks or something, > while the associated pch looks good to me. Yeah, it wasn't clear to me that your message you are responding to was your Reviewed-by:. If I recall my impression correctly from the time I skimmed its proposed log message the last time, it focused on describing a single failure case the author encountered in the real world and said that the patch changed the behaviour to correct that single case, and was not very clear if it was meant as a general fix. Is the patch text, including its proposed patch description, satisfactory to you? In other words, is the above your Reviewed-by:? Thanks for pinging the thread. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present 2024-01-25 16:38 ` Junio C Hamano @ 2024-01-25 18:52 ` Christian Couder 2024-01-25 18:56 ` Junio C Hamano 0 siblings, 1 reply; 31+ messages in thread From: Christian Couder @ 2024-01-25 18:52 UTC (permalink / raw) To: Junio C Hamano; +Cc: Zach FettersMoore, Zach FettersMoore via GitGitGadget, git On Thu, Jan 25, 2024 at 5:38 PM Junio C Hamano <gitster@pobox.com> wrote: > > Christian Couder <christian.couder@gmail.com> writes: > > > It seems that this topic has fallen into the cracks or something, > > while the associated pch looks good to me. > > Yeah, it wasn't clear to me that your message you are responding to > was your Reviewed-by:. If I recall my impression correctly from the > time I skimmed its proposed log message the last time, it focused on > describing a single failure case the author encountered in the real > world and said that the patch changed the behaviour to correct that > single case, and was not very clear if it was meant as a general > fix. Is the patch text, including its proposed patch description, > satisfactory to you? In other words, is the above your Reviewed-by:? Yes, it's satisfactory for me, and I am Ok to give my Reviewed-by:, thanks! ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present 2024-01-25 18:52 ` Christian Couder @ 2024-01-25 18:56 ` Junio C Hamano 0 siblings, 0 replies; 31+ messages in thread From: Junio C Hamano @ 2024-01-25 18:56 UTC (permalink / raw) To: Christian Couder Cc: Zach FettersMoore, Zach FettersMoore via GitGitGadget, git Christian Couder <christian.couder@gmail.com> writes: > On Thu, Jan 25, 2024 at 5:38 PM Junio C Hamano <gitster@pobox.com> wrote: >> >> Christian Couder <christian.couder@gmail.com> writes: >> >> > It seems that this topic has fallen into the cracks or something, >> > while the associated pch looks good to me. >> >> Yeah, it wasn't clear to me that your message you are responding to >> was your Reviewed-by:. If I recall my impression correctly from the >> time I skimmed its proposed log message the last time, it focused on >> describing a single failure case the author encountered in the real >> world and said that the patch changed the behaviour to correct that >> single case, and was not very clear if it was meant as a general >> fix. Is the patch text, including its proposed patch description, >> satisfactory to you? In other words, is the above your Reviewed-by:? > > Yes, it's satisfactory for me, and I am Ok to give my Reviewed-by:, thanks! Thanks. Will queue. ^ permalink raw reply [flat|nested] 31+ messages in thread
* subtree: [v2.44 regression] split may produce different history 2023-12-01 14:54 ` [PATCH v6] " Zach FettersMoore via GitGitGadget 2023-12-04 11:08 ` Christian Couder @ 2025-08-21 3:13 ` Colin Stagner 1 sibling, 0 replies; 31+ messages in thread From: Colin Stagner @ 2025-08-21 3:13 UTC (permalink / raw) To: Zach FettersMoore, Zach FettersMoore via GitGitGadget, git Cc: Christian Couder 98ba49ccc247 likely introduces a regression in "git subtree split" [1] [2]. For some inputs, the split history is incomplete and does not match previous git versions. For git subtree split -P somedir if the history of `somedir` also contains *squashed* subtree *merges*, the split history may be incomplete. MWE follows: ```bash git init mwe && cd mwe # create history we will subtree merge later git checkout --orphan deeper touch two_deep && git add two_deep git commit -m 'deeper: a nested subtree' # create top-level project history with one # subproject in a directory sub/ git checkout --orphan main git reset --hard echo 'A test for git-subtree'>README.txt mkdir sub && touch sub/README.sub.txt git add . git commit -m 'Initial commit' # add "deeper" branch as sub/deeper # the --squash is important here since it omits # "git-subtree-mainline:", which 98ba49ccc247 # looks for in `should_ignore_subtree_split_commit()` git subtree add --squash -P sub/deeper deeper ``` Now `git ls-tree --name-only -r main` looks like this: README.txt sub/README.sub.txt sub/deeper/two_deep We can split `sub` off as its own top-level history. ```bash git ls-tree -r --name-only -- \ "$(git subtree.sh split -P sub)" ``` Before the patch, that looks like: README.sub.txt deeper/two_deep Which is correct. After the patch, there is only: README.sub.txt which is missing the entire `deeper/` directory. (The hash output from `split` is also different.) I suspect the test in `should_ignore_subtree_split_commit ()` inadvertently rejects commits that should be kept. I tested with Git binaries from v2.43 on Ubuntu [3]. [1]: https://git.kernel.org/pub/scm/git/git.git/commit/?id=98ba49ccc247c3521659aa3d43c970e8978922c5 [2]: https://lore.kernel.org/all/pull.1587.v6.git.1701442494319.gitgitgadget@gmail.com/ [3]: http://archive.ubuntu.com/ubuntu/pool/main/g/git/git_2.43.0-1ubuntu7.3_amd64.deb ^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2025-08-21 3:51 UTC | newest] Thread overview: 31+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-09-18 20:05 [PATCH] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget 2023-09-18 23:31 ` Junio C Hamano 2023-09-19 1:04 ` Junio C Hamano 2023-10-26 19:59 ` Zach FettersMoore 2023-09-22 16:25 ` [PATCH v2 0/2] " Zach FettersMoore via GitGitGadget 2023-09-22 16:25 ` [PATCH v2 1/2] " Zach FettersMoore via GitGitGadget 2023-09-22 16:25 ` [PATCH v2 2/2] subtree: changing location of commit ignore processing Zach FettersMoore via GitGitGadget 2023-09-29 20:32 ` [PATCH v3 0/3] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget 2023-09-29 20:32 ` [PATCH v3 1/3] " Zach FettersMoore via GitGitGadget 2023-09-29 20:32 ` [PATCH v3 2/3] subtree: changing location of commit ignore processing Zach FettersMoore via GitGitGadget 2023-09-29 20:33 ` [PATCH v3 3/3] subtree: adding test to validate fix Zach FettersMoore via GitGitGadget 2023-10-17 20:02 ` Zach FettersMoore 2023-10-26 19:17 ` [PATCH v4] subtree: fix split processing with multiple subtrees present Zach FettersMoore via GitGitGadget 2023-11-18 11:28 ` Christian Couder 2023-11-28 21:04 ` Zach FettersMoore 2023-11-28 21:17 ` [PATCH v5] " Zach FettersMoore via GitGitGadget 2023-11-30 20:33 ` Christian Couder 2023-11-30 21:01 ` Zach FettersMoore 2023-12-01 14:54 ` [PATCH v6] " Zach FettersMoore via GitGitGadget 2023-12-04 11:08 ` Christian Couder 2023-12-11 15:39 ` Zach FettersMoore 2023-12-12 16:06 ` Christian Couder 2023-12-12 22:28 ` Junio C Hamano 2023-12-13 15:20 ` Zach FettersMoore 2024-01-03 16:33 ` Christian Couder 2023-12-20 15:25 ` Christian Couder 2024-01-25 10:09 ` Christian Couder 2024-01-25 16:38 ` Junio C Hamano 2024-01-25 18:52 ` Christian Couder 2024-01-25 18:56 ` Junio C Hamano 2025-08-21 3:13 ` subtree: [v2.44 regression] split may produce different history Colin Stagner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).