git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] contrib/subtree: fix split with squashed subtrees
@ 2025-08-24 19:10 Colin Stagner
  2025-09-01 13:54 ` Phillip Wood
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Colin Stagner @ 2025-08-24 19:10 UTC (permalink / raw)
  To: git; +Cc: Zach FettersMoore, Christian Couder, Patrik Weiskircher,
	Colin Stagner

98ba49ccc2 (subtree: fix split processing with multiple subtrees
present, 2023-12-01) increases the performance of

    git subtree split --prefix=subA

by ignoring subtree merges which are outside of `subA/`. It also
introduces a regression. Subtree merges that should be retained
are incorrectly ignored if they:

1. are nested under `subA/`; and
2. are merged with `--squash`.

For example, a subtree merged like:

    git subtree merge --squash --prefix=subA/subB "$rev"
    #                 ^^^^^^^^          ^^^^

is erroneously ignored during a split of `subA`. This causes
missing tree files and different commit hashes starting in
git v2.44.0-rc0.

The method:

    should_ignore_subtree_split_commit REV

should test only if REV is a subtree commit, but the combination of

    git log -1 --grep=...

actually searches all *parent* commits until a `--grep` match is
discovered. Limit these checks to the current REV only.

Tests now cover nested subtrees.

Signed-off-by: Colin Stagner <ask+git@howdoi.land>
---

Notes:
    The unit test changes in t7900-subtree.sh demonstrate the bug.
    
    See also:
    
    * <pull.1587.v5.git.1701206267300.gitgitgadget@gmail.com>
    * <c9e8f54f-2594-4092-ae41-f1da73e97f6e@howdoi.land>

 contrib/subtree/git-subtree.sh     |  6 +--
 contrib/subtree/t/t7900-subtree.sh | 70 ++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+), 3 deletions(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index 3fddba797c..139049351d 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -788,17 +788,17 @@ ensure_valid_ref_format () {
 # Usage: check if a commit from another subtree should be
 # ignored from processing for splits
 should_ignore_subtree_split_commit () {
 	assert test $# = 1
 	local rev="$1"
-	if test -n "$(git log -1 --grep="git-subtree-dir:" $rev)"
+	if test -n "$(git log -1 --grep="git-subtree-dir:" "$rev^!")"
 	then
-		if test -z "$(git log -1 --grep="git-subtree-mainline:" $rev)" &&
-			test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $rev)"
+		if test -z "$(git log -1 --grep="git-subtree-mainline:" "$rev^!")" &&
+			test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" "$rev^!")"
 		then
 			return 0
 		fi
 	fi
 	return 1
 }
 
 # Usage: process_split_commit REV PARENTS
diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh
index 3edbb33af4..1ddc213621 100755
--- a/contrib/subtree/t/t7900-subtree.sh
+++ b/contrib/subtree/t/t7900-subtree.sh
@@ -68,6 +68,34 @@ test_create_pre2_32_repo () {
 	git -C "$1-clone" replace HEAD^2 $new_commit
 }
 
+# test_create_subtree_add REPO ORPHAN PREFIX FILENAME ...
+#
+# Create a simple subtree on a new branch named ORPHAN in REPO.
+# The subtree is then merged into the current branch of REPO,
+# under PREFIX. The generated subtree has has one commit
+# with subject and tag FILENAME with a single file "FILENAME.t"
+#
+# When this method returns:
+# - the current branch of REPO will have file PREFIX/FILENAME.t
+# - REPO will have a branch named ORPHAN with subtree history
+#
+# additional arguments are forwarded to "subtree add"
+test_create_subtree_add () {
+	(
+		cd "$1" &&
+		orphan="$2" &&
+		prefix="$3" &&
+		filename="$4" &&
+		shift 4 &&
+		last="$(git branch --show-current)" &&
+		git checkout --orphan "$orphan" &&
+		git rm -rf . &&
+		test_commit "$filename" &&
+		git checkout "$last" &&
+		git subtree add --prefix="$prefix" "$@" "$orphan"
+	)
+}
+
 test_expect_success 'shows short help text for -h' '
 	test_expect_code 129 git subtree -h >out 2>err &&
 	test_must_be_empty err &&
@@ -426,6 +454,48 @@ test_expect_success 'split with multiple subtrees' '
 		--squash --rejoin -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = ""
 '
 
+# When subtree split-ing a directory that has other subtree
+# *merges* underneath it, the split must include those subtrees.
+# This test creates a nested subtree, `subA/subB`, and tests
+# that the tree is correct after a subtree split of `subA/`.
+# The test covers:
+# - An initial `subtree add`; and
+# - A follow-up `subtree merge`
+# both with and without `--squashed`.
+for is_squashed in '' 'y';
+do
+	test_expect_success "split keeps nested ${is_squashed:+--squash }subtrees that are part of the split" '
+		subtree_test_create_repo "$test_count" &&
+		(
+			cd "$test_count" &&
+			mkdir subA &&
+			test_commit subA/file1 &&
+			git branch -m main &&
+			test_create_subtree_add \
+				. mksubtree subA/subB file2 ${is_squashed:+--squash} &&
+			test -e subA/file1.t &&
+			test -e subA/subB/file2.t &&
+			git subtree split --prefix=subA --branch=bsplit &&
+			git checkout bsplit &&
+			test -e file1.t &&
+			test -e subB/file2.t &&
+			git checkout mksubtree &&
+			git branch -D bsplit &&
+			test_commit file3 &&
+			git checkout main &&
+			git subtree merge \
+				${is_squashed:+--squash} \
+				--prefix=subA/subB mksubtree &&
+			test -e subA/subB/file3.t &&
+			git subtree split --prefix=subA --branch=bsplit &&
+			git checkout bsplit &&
+			test -e file1.t &&
+			test -e subB/file2.t &&
+			test -e subB/file3.t
+		)
+	'
+done
+
 test_expect_success 'split sub dir/ with --rejoin from scratch' '
 	subtree_test_create_repo "$test_count" &&
 	test_create_commit "$test_count" main1 &&

base-commit: c44beea485f0f2feaf460e2ac87fdd5608d63cf0
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] contrib/subtree: fix split with squashed subtrees
  2025-08-24 19:10 [PATCH] contrib/subtree: fix split with squashed subtrees Colin Stagner
@ 2025-09-01 13:54 ` Phillip Wood
  2025-09-01 20:43   ` Colin Stagner
  2025-09-05  2:27 ` [PATCH v2] " Colin Stagner
  2025-09-10  3:11 ` [PATCH v3] " Colin Stagner
  2 siblings, 1 reply; 15+ messages in thread
From: Phillip Wood @ 2025-09-01 13:54 UTC (permalink / raw)
  To: Colin Stagner, git
  Cc: Zach FettersMoore, Christian Couder, Patrik Weiskircher

Hi Colin

On 24/08/2025 20:10, Colin Stagner wrote:
> 98ba49ccc2 (subtree: fix split processing with multiple subtrees
> present, 2023-12-01) increases the performance of
> 
>      git subtree split --prefix=subA
> 
> by ignoring subtree merges which are outside of `subA/`. It also
> introduces a regression. Subtree merges that should be retained
> are incorrectly ignored if they:
> 
> 1. are nested under `subA/`; and
> 2. are merged with `--squash`.
> 
> For example, a subtree merged like:
> 
>      git subtree merge --squash --prefix=subA/subB "$rev"
>      #                 ^^^^^^^^          ^^^^
> 
> is erroneously ignored during a split of `subA`. This causes
> missing tree files and different commit hashes starting in
> git v2.44.0-rc0.
> 
> The method:
> 
>      should_ignore_subtree_split_commit REV
> 
> should test only if REV is a subtree commit, but the combination of
> 
>      git log -1 --grep=...
> 
> actually searches all *parent* commits until a `--grep` match is
> discovered. Limit these checks to the current REV only.

Thanks for the clear explanation of the problem and the proposed solution.

> Tests now cover nested subtrees.

Great
> diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
> index 3fddba797c..139049351d 100755
> --- a/contrib/subtree/git-subtree.sh
> +++ b/contrib/subtree/git-subtree.sh
> @@ -788,17 +788,17 @@ ensure_valid_ref_format () {
>   # Usage: check if a commit from another subtree should be
>   # ignored from processing for splits
>   should_ignore_subtree_split_commit () {
>   	assert test $# = 1
>   	local rev="$1"
> -	if test -n "$(git log -1 --grep="git-subtree-dir:" $rev)"
> +	if test -n "$(git log -1 --grep="git-subtree-dir:" "$rev^!")"

This makes sense as we only want to grep the current commit. We could 
drop the "-1" as we're only considering a single commit.

>   	then
> -		if test -z "$(git log -1 --grep="git-subtree-mainline:" $rev)" &&
> -			test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $rev)"
> +		if test -z "$(git log -1 --grep="git-subtree-mainline:" "$rev^!")" &&
> +			test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" "$rev^!")"

I'm less sure about this change. Is the second test checking making sure 
we don't prune this commit if it has an ancestor that is a subtree merge 
for the subtree we're interested in? It would be very helpful if Zach 
could comment on what was intended here.

If it turns out that all three tests only want to consider a single 
commit then it would be be more efficient to run a single git command 
and check the output with something like

	git show -s 
--format='%(trailers:key=git-subtree-dir,key=git-subtree-mainline' $rev 
| while read trailer
		do
			# check trailers here using case "$trailer"
		done

Thanks

Phillip

>   		then
>   			return 0
>   		fi
>   	fi
>   	return 1
>   }
>   
>   # Usage: process_split_commit REV PARENTS
> diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh
> index 3edbb33af4..1ddc213621 100755
> --- a/contrib/subtree/t/t7900-subtree.sh
> +++ b/contrib/subtree/t/t7900-subtree.sh
> @@ -68,6 +68,34 @@ test_create_pre2_32_repo () {
>   	git -C "$1-clone" replace HEAD^2 $new_commit
>   }
>   
> +# test_create_subtree_add REPO ORPHAN PREFIX FILENAME ...
> +#
> +# Create a simple subtree on a new branch named ORPHAN in REPO.
> +# The subtree is then merged into the current branch of REPO,
> +# under PREFIX. The generated subtree has has one commit
> +# with subject and tag FILENAME with a single file "FILENAME.t"
> +#
> +# When this method returns:
> +# - the current branch of REPO will have file PREFIX/FILENAME.t
> +# - REPO will have a branch named ORPHAN with subtree history
> +#
> +# additional arguments are forwarded to "subtree add"
> +test_create_subtree_add () {
> +	(
> +		cd "$1" &&
> +		orphan="$2" &&
> +		prefix="$3" &&
> +		filename="$4" &&
> +		shift 4 &&
> +		last="$(git branch --show-current)" &&
> +		git checkout --orphan "$orphan" &&
> +		git rm -rf . &&
> +		test_commit "$filename" &&
> +		git checkout "$last" &&
> +		git subtree add --prefix="$prefix" "$@" "$orphan"
> +	)
> +}
> +
>   test_expect_success 'shows short help text for -h' '
>   	test_expect_code 129 git subtree -h >out 2>err &&
>   	test_must_be_empty err &&
> @@ -426,6 +454,48 @@ test_expect_success 'split with multiple subtrees' '
>   		--squash --rejoin -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = ""
>   '
>   
> +# When subtree split-ing a directory that has other subtree
> +# *merges* underneath it, the split must include those subtrees.
> +# This test creates a nested subtree, `subA/subB`, and tests
> +# that the tree is correct after a subtree split of `subA/`.
> +# The test covers:
> +# - An initial `subtree add`; and
> +# - A follow-up `subtree merge`
> +# both with and without `--squashed`.
> +for is_squashed in '' 'y';
> +do
> +	test_expect_success "split keeps nested ${is_squashed:+--squash }subtrees that are part of the split" '
> +		subtree_test_create_repo "$test_count" &&
> +		(
> +			cd "$test_count" &&
> +			mkdir subA &&
> +			test_commit subA/file1 &&
> +			git branch -m main &&
> +			test_create_subtree_add \
> +				. mksubtree subA/subB file2 ${is_squashed:+--squash} &&
> +			test -e subA/file1.t &&
> +			test -e subA/subB/file2.t &&
> +			git subtree split --prefix=subA --branch=bsplit &&
> +			git checkout bsplit &&
> +			test -e file1.t &&
> +			test -e subB/file2.t &&
> +			git checkout mksubtree &&
> +			git branch -D bsplit &&
> +			test_commit file3 &&
> +			git checkout main &&
> +			git subtree merge \
> +				${is_squashed:+--squash} \
> +				--prefix=subA/subB mksubtree &&
> +			test -e subA/subB/file3.t &&
> +			git subtree split --prefix=subA --branch=bsplit &&
> +			git checkout bsplit &&
> +			test -e file1.t &&
> +			test -e subB/file2.t &&
> +			test -e subB/file3.t
> +		)
> +	'
> +done
> +
>   test_expect_success 'split sub dir/ with --rejoin from scratch' '
>   	subtree_test_create_repo "$test_count" &&
>   	test_create_commit "$test_count" main1 &&
> 
> base-commit: c44beea485f0f2feaf460e2ac87fdd5608d63cf0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] contrib/subtree: fix split with squashed subtrees
  2025-09-01 13:54 ` Phillip Wood
@ 2025-09-01 20:43   ` Colin Stagner
  2025-09-02 13:22     ` Phillip Wood
  0 siblings, 1 reply; 15+ messages in thread
From: Colin Stagner @ 2025-09-01 20:43 UTC (permalink / raw)
  To: git, phillip.wood
  Cc: Zach FettersMoore, Christian Couder, Patrik Weiskircher,
	Colin Stagner

On 9/1/25 08:54, Phillip Wood wrote:

Colin Stagner <ask+git@howdoi.land> writes:

>> -    if test -n "$(git log -1 --grep="git-subtree-dir:" $rev)"
>> +    if test -n "$(git log -1 --grep="git-subtree-dir:" "$rev^!")"
> 
> We could drop the "-1" as we're only considering a single commit.

Concur.

>> -        if test -z "$(git log -1 --grep="git-subtree-mainline:" 
>> $rev)" &&
>> -            test -z "$(git log -1 --grep="git-subtree-dir: 
>> $arg_prefix$" $rev)"
>> +        if test -z "$(git log -1 --grep="git-subtree-mainline:" 
>> "$rev^!")" &&
>> +            test -z "$(git log -1 --grep="git-subtree-dir: 
>> $arg_prefix$" "$rev^!")"
>
> I'm less sure about this change. Is the second test checking
> making sure we don't prune this commit if it has an ancestor
> that is a subtree merge for the subtree we're interested in?

The outer loop in git-subtree.sh:983 appears to iterate from the root 
commit forwards… and not from the HEAD backwards.

     git rev-list --topo-order --reverse --parents $rev $unrevs
     #                         ^^^^^^^^^

Since the iteration is ancestor-first, I'm having difficulty seeing why 
`should_ignore_subtree_split_commit()` would want to do an ancestor 
traversal at all. It already sees the commits ancestor-first. But there 
could be a reason that I don't know.


Here is a more long-winded breakdown of these tests. From what I can 
determine:

     test -z "$(git log -1 --grep="git-subtree-mainline:" "$rev")

excludes squashed commits created from

     git subtree merge --prefix subM --squash srcBranch

The --squash creates two commits:

1. A single-parent "Squashed 'subM/' content from", which
    squashes the changes from srcBranch. This commit's tree
    is like the one on srcBranch. It does not have the `subM/`
    prefix.

2. A merge commit which rewrites the tree in (1) to add
    the `subM/` leading directory, then merge it with the
    current branch. The merge commit doesn't have any
    `git-subtree:` trailers.

We must exclude (1) since the trees aren't actually compatible with 
HEAD. (They don't have the `subM` prefix). We must keep (2). The above 
`test -z` appears to do this.


I am *much* less certain about the second test:

    test -z "$(git log -1 \
               --grep="git-subtree-dir: $arg_prefix$" $rev)"

I think this was intended to keep the mainline portion from a previous 
`git subtree split --rejoin`. But if I remove this `test -z`, all the 
unit tests still pass—including mine. There may not be any test coverage 
for this line. I will probably omit this `test` from v2.


> It would be very helpful if Zach could comment on what was intended here.

Yes, this would aid my understanding a lot.


> If it turns out that all three tests only want to consider a single 
> commit then it would be be more efficient to run a single git command 
> and check the output with something like
> 
>      git show -s --format='%(trailers:key=git-subtree-dir,key=git- 
> subtree-mainline' $rev | while read trailer
>          do
>              # check trailers here using case "$trailer"
>          done

This is a cleaner approach, and I'll explore it for v2. Any objection to 
long options like `--no-patch` instead of `-s`? I find these are better 
for scripts since there's less hunting around in man pages.

Thanks for your review,

Colin


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] contrib/subtree: fix split with squashed subtrees
  2025-09-01 20:43   ` Colin Stagner
@ 2025-09-02 13:22     ` Phillip Wood
  2025-09-02 14:57       ` Phillip Wood
  0 siblings, 1 reply; 15+ messages in thread
From: Phillip Wood @ 2025-09-02 13:22 UTC (permalink / raw)
  To: Colin Stagner, git, phillip.wood
  Cc: Zach FettersMoore, Christian Couder, Patrik Weiskircher

On 01/09/2025 21:43, Colin Stagner wrote:
> On 9/1/25 08:54, Phillip Wood wrote:
> Colin Stagner <ask+git@howdoi.land> writes:
>>> -    if test -n "$(git log -1 --grep="git-subtree-dir:" $rev)"
>>> +    if test -n "$(git log -1 --grep="git-subtree-dir:" "$rev^!")"
>>
>> We could drop the "-1" as we're only considering a single commit.
> 
> Concur.
> 
>>> -        if test -z "$(git log -1 --grep="git-subtree-mainline:" 
>>> $rev)" &&
>>> -            test -z "$(git log -1 --grep="git-subtree-dir: 
>>> $arg_prefix$" $rev)"
>>> +        if test -z "$(git log -1 --grep="git-subtree-mainline:" 
>>> "$rev^!")" &&
>>> +            test -z "$(git log -1 --grep="git-subtree-dir: 
>>> $arg_prefix$" "$rev^!")"
>>
>> I'm less sure about this change. Is the second test checking
>> making sure we don't prune this commit if it has an ancestor
>> that is a subtree merge for the subtree we're interested in?
> 
> The outer loop in git-subtree.sh:983 appears to iterate from the root 
> commit forwards… and not from the HEAD backwards.
> 
>      git rev-list --topo-order --reverse --parents $rev $unrevs
>      #                         ^^^^^^^^^
> 
> Since the iteration is ancestor-first, I'm having difficulty seeing why 
> `should_ignore_subtree_split_commit()` would want to do an ancestor 
> traversal at all. It already sees the commits ancestor-first. But there 
> could be a reason that I don't know.

Ah, I was only looking at this patch, not how it was called. That begs 
the question "what's the point of these checks if we've already visited 
all the ancestors anyway". I think the answer is that it is pruning the 
recursion that happens in check_parent() and checking the commits that 
come from that rev-list command is pointless. The regression test 
introduced with this function only looks at $extracount which comes from 
the recursion. I haven't looked too closely but it would be nice if we 
could move this check so it is only run when check_parents() is recursing.

> Here is a more long-winded breakdown of these tests. From what I can 
> determine:
> 
>      test -z "$(git log -1 --grep="git-subtree-mainline:" "$rev")
> 
> excludes squashed commits created from
> 
>      git subtree merge --prefix subM --squash srcBranch
 > > The --squash creates two commits:
> 
> 1. A single-parent "Squashed 'subM/' content from", which
>     squashes the changes from srcBranch. This commit's tree
>     is like the one on srcBranch. It does not have the `subM/`
>     prefix.
Yes, and the commit message comes from squash_msg() which adds the 
"git-subtree-dir:" and "git-subtree-split:" trailers but not 
"git-subtree-mainline:".

> 2. A merge commit which rewrites the tree in (1) to add
>     the `subM/` leading directory, then merge it with the
>     current branch. The merge commit doesn't have any
>     `git-subtree:` trailers.

Indeed. Running

	git subtree split --squash --rejoin --prefix subM

will create a squash commit as above and a merge with a commit message 
created by rejoin_msg() and contains the "git-subtree-dir:", 
"git-subtree-split:" and "git-subtree-mainline:" trailers.

> We must exclude (1) since the trees aren't actually compatible with 
> HEAD. (They don't have the `subM` prefix). We must keep (2). The above 
> `test -z` appears to do this.

So we'll exclude squashed merges but not squashed splits? I think you're 
right that we don't want "git log" to walk the history here.

> I am *much* less certain about the second test:
> 
>     test -z "$(git log -1 \
>                --grep="git-subtree-dir: $arg_prefix$" $rev)"
> 
> I think this was intended to keep the mainline portion from a previous 
> `git subtree split --rejoin`. But if I remove this `test -z`, all the 
> unit tests still pass—including mine. There may not be any test coverage 
> for this line. I will probably omit this `test` from v2.

I'm not very familiar with git-subtree but I thought this was ensuring 
that we did not exclude the ancestors of a squash or split that involves 
the subtree that we're interested in. I wouldn't be surprised if the 
test coverage was lacking.
>> It would be very helpful if Zach could comment on what was intended here.
> 
> Yes, this would aid my understanding a lot.

and mine too.

>> If it turns out that all three tests only want to consider a single 
>> commit then it would be be more efficient to run a single git command 
>> and check the output with something like
>>
>>      git show -s --format='%(trailers:key=git-subtree-dir,key=git- 
>> subtree-mainline' $rev | while read trailer
>>          do
>>              # check trailers here using case "$trailer"
>>          done
> 
> This is a cleaner approach, and I'll explore it for v2. Any objection to 
> long options like `--no-patch` instead of `-s`? I find these are better 
> for scripts since there's less hunting around in man pages.

Sure, I used '-s' out of habit but '--no-patch' would be clearer 
(especially as I can't see any link between the short and long option 
names in this case)

Thanks

Phillip


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] contrib/subtree: fix split with squashed subtrees
  2025-09-02 13:22     ` Phillip Wood
@ 2025-09-02 14:57       ` Phillip Wood
  2025-09-04  1:34         ` Colin Stagner
  0 siblings, 1 reply; 15+ messages in thread
From: Phillip Wood @ 2025-09-02 14:57 UTC (permalink / raw)
  To: Colin Stagner, git, phillip.wood
  Cc: Zach FettersMoore, Christian Couder, Patrik Weiskircher

On 02/09/2025 14:22, Phillip Wood wrote:
> On 01/09/2025 21:43, Colin Stagner wrote:
>> On 9/1/25 08:54, Phillip Wood wrote:
>> Colin Stagner <ask+git@howdoi.land> writes:
>>
>> The outer loop in git-subtree.sh:983 appears to iterate from the root 
>> commit forwards… and not from the HEAD backwards.
>>
>>      git rev-list --topo-order --reverse --parents $rev $unrevs
>>      #                         ^^^^^^^^^
>>
>> Since the iteration is ancestor-first, I'm having difficulty seeing 
>> why `should_ignore_subtree_split_commit()` would want to do an 
>> ancestor traversal at all. It already sees the commits ancestor-first. 
>> But there could be a reason that I don't know.
> 
> Ah, I was only looking at this patch, not how it was called. That begs 
> the question "what's the point of these checks if we've already visited 
> all the ancestors anyway". I think the answer is that it is pruning the 
> recursion that happens in check_parent() and checking the commits that 
> come from that rev-list command is pointless. The regression test 
> introduced with this function only looks at $extracount which comes from 
> the recursion. I haven't looked too closely but it would be nice if we 
> could move this check so it is only run when check_parents() is recursing.

Sorry that's not quite right. check_parents() recurses into 
process_split_commit() rather than the loop that call 
should_ignore_subtree_split_commit(). I think what this check does do is 
prune some parents which stops check_parents() from recursing into other 
subtrees so the check is in the right place.

Thanks

Phillip


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] contrib/subtree: fix split with squashed subtrees
  2025-09-02 14:57       ` Phillip Wood
@ 2025-09-04  1:34         ` Colin Stagner
  0 siblings, 0 replies; 15+ messages in thread
From: Colin Stagner @ 2025-09-04  1:34 UTC (permalink / raw)
  To: Phillip Wood, git, phillip.wood
  Cc: Zach FettersMoore, Christian Couder, Patrik Weiskircher

On 9/2/25 09:57, Phillip Wood wrote:
> On 01/09/2025 21:43, Colin Stagner wrote:
>>
>> The outer loop in git-subtree.sh:983 appears to iterate from the root 
>> commit forwards… and not from the HEAD backwards.
>>
>>      git rev-list --topo-order --reverse --parents $rev $unrevs
>>      #                         ^^^^^^^^^
>>
>> Since the iteration is ancestor-first, I'm having difficulty seeing 
>> why `should_ignore_subtree_split_commit()` would want to do an 
>> ancestor traversal at all. It already sees the commits ancestor- 
>> first.
> check_parents() recurses into process_split_commit() rather than the
> loop that call should_ignore_subtree_split_commit(). I think what this
> check does do is prune some parents which stops check_parents() from
> recursing into other subtrees so the check is in the right place.

I agree. In the original patch [1], Zach indicated that the check 
results in a significant speedup for rejoin-heavy repos. The check is 
clearly doing something.

Performance improvements may be possible. Instead of looking at commits 
one at a time, this operation might be faster as part of a one-shot 
HEAD-to-root traversal:

    git log --grep 'for stuff' --format='%(trailers:...)' $unrev..HEAD

Commits that are deemed "uninteresting" or unnecessary could then be 
provided, in bulk, as negative refs to the `git rev-list` traversal.

But my plan is to make the smallest and most portable maint-2.44 bugfix. 
I think that non-essential performance changes are a task for later.


>> I am *much* less certain about the second test:
>> 
>>      test -z "$(git log -1 \
>>                 --grep="git-subtree-dir: $arg_prefix$" $rev)"
>> 
>> If I remove this `test -z`, all the unit tests still pass—including mine. There may not be any test coverage for this line.
> I'm not very familiar with git-subtree but I thought this was ensuring 
> that we did not exclude the ancestors of a squash or split that involves 
> the subtree that we're interested in.

It does, but I am still having problems finding commits that actually 
trigger it.

It appears that `find_existing_splits()` in git-subtree.sh:459 filters 
out the commits that the `test -z` I quoted above would otherwise 
detect. `find_existing_splits()` searches for a previous --rejoin commit 
to use as an `$unrev` stopping point for the rev-walk. It searches for 
commits matching

     git log --grep="^git-subtree-dir: $dir/*\$"

in combination with `git-subtree-mainline:`.

This is essentially the same test as in 
`should_ignore_subtree_split_commit()`.

In --ignore-joins mode, `find_existing_splits()` looks for different 
commits. I experimented a bit with adding --ignore-joins to some of the 
existing unit tests, but I still could not find any instance where this 
`test -z` makes a difference.

That said... I am inclined to keep this second test. The bug I am 
patching is the result of an overzealous prune. The last thing I want to 
do is to inadvertently prune commits we need for the sake of a 
performance boost.


> I wouldn't be surprised if the test coverage was lacking.

There don't appear to be any tests at all for --ignore-joins, aside from 
option parsing.


[1]: 98ba49ccc2 (subtree: fix split processing with multiple subtrees 
present, 2023-12-01)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2] contrib/subtree: fix split with squashed subtrees
  2025-08-24 19:10 [PATCH] contrib/subtree: fix split with squashed subtrees Colin Stagner
  2025-09-01 13:54 ` Phillip Wood
@ 2025-09-05  2:27 ` Colin Stagner
  2025-09-08 15:21   ` Phillip Wood
  2025-09-10  3:11 ` [PATCH v3] " Colin Stagner
  2 siblings, 1 reply; 15+ messages in thread
From: Colin Stagner @ 2025-09-05  2:27 UTC (permalink / raw)
  To: git, phillip.wood, Phillip Wood
  Cc: Zach FettersMoore, Christian Couder, Patrik Weiskircher,
	Colin Stagner

98ba49ccc2 (subtree: fix split processing with multiple subtrees
present, 2023-12-01) increases the performance of

    git subtree split --prefix=subA

by ignoring subtree merges which are outside of `subA/`. It also
introduces a regression. Subtree merges that should be retained
are incorrectly ignored if they:

1. are nested under `subA/`; and
2. are merged with `--squash`.

For example, a subtree merged like:

    git subtree merge --squash --prefix=subA/subB "$rev"
    #                 ^^^^^^^^          ^^^^

is erroneously ignored during a split of `subA`. This causes
missing tree files and different commit hashes starting in
git v2.44.0-rc0.

The method:

    should_ignore_subtree_split_commit REV

should test only a single commit REV, but the combination of

    git log -1 --grep=...

actually searches all *parent* commits until a `--grep` match is
discovered.

Rewrite this method to test only one REV at a time. Extract commit
information with a single `git` call as opposed to three. The
`test` conditions for rejecting a commit remain unchanged.

Unit tests now cover nested subtrees.

Signed-off-by: Colin Stagner <ask+git@howdoi.land>
---

Notes:
    This bugfix patch is intended for maint-2.44 and up.
    
    v2 rewrites `should_ignore_subtree_split_commit()` completely. In
    addition to the review comments, v2 also:
    
    * adds `--no-show-signature` to align with
      8841b5222c (subtree: fix add and pull for GPG-signed commits,
      2018-02-23)
    
    * removes use of `local` from `should_ignore_subtree_split_commit`.
      `local` is not part of POSIX sh. Other uses of `local` remain in
      untouched code.
    
    * unit tests are unchanged since v1.

 contrib/subtree/git-subtree.sh     | 36 +++++++++++----
 contrib/subtree/t/t7900-subtree.sh | 70 ++++++++++++++++++++++++++++++
 2 files changed, 98 insertions(+), 8 deletions(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index 5dab3f506c..c3cd60d341 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -783,24 +783,44 @@ ensure_clean () {
 # Usage: ensure_valid_ref_format REF
 ensure_valid_ref_format () {
 	assert test $# = 1
 	git check-ref-format "refs/heads/$1" ||
 		die "fatal: '$1' does not look like a ref"
 }
 
-# Usage: check if a commit from another subtree should be
+# Usage: should_ignore_subtree_split_commit REV
+#
+# Check if REV is a commit from another subtree and should be
 # ignored from processing for splits
 should_ignore_subtree_split_commit () {
 	assert test $# = 1
-	local rev="$1"
+
+	git show \
+		--no-patch \
+		--no-show-signature \
+		--format='%(trailers:key=git-subtree-dir,key=git-subtree-mainline)' \
+		"$1" |
+	(
+	have_mainline=
+	subtree_dir=
+
+	while read -r trailer val
+	do
+		case "$trailer" in
+		(git-subtree-dir:)
+			subtree_dir="${val%/}" ;;
+		(git-subtree-mainline:)
+			have_mainline=y ;;
+		esac
+	done
+
-	if test -n "$(git log -1 --grep="git-subtree-dir:" $rev)"
+	if test -n "${subtree_dir:-}" &&
+		test -z "${have_mainline:-}" &&
+		test "${subtree_dir}" != "$arg_prefix"
 	then
-		if test -z "$(git log -1 --grep="git-subtree-mainline:" $rev)" &&
-			test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $rev)"
-		then
-			return 0
-		fi
+		return 0
 	fi
 	return 1
+	)
 }
 
 # Usage: process_split_commit REV PARENTS
diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh
index ca4df5be83..8bd45e7be7 100755
--- a/contrib/subtree/t/t7900-subtree.sh
+++ b/contrib/subtree/t/t7900-subtree.sh
@@ -67,6 +67,34 @@ test_create_pre2_32_repo () {
 	git -C "$1-clone" replace HEAD^2 $new_commit
 }
 
+# test_create_subtree_add REPO ORPHAN PREFIX FILENAME ...
+#
+# Create a simple subtree on a new branch named ORPHAN in REPO.
+# The subtree is then merged into the current branch of REPO,
+# under PREFIX. The generated subtree has has one commit
+# with subject and tag FILENAME with a single file "FILENAME.t"
+#
+# When this method returns:
+# - the current branch of REPO will have file PREFIX/FILENAME.t
+# - REPO will have a branch named ORPHAN with subtree history
+#
+# additional arguments are forwarded to "subtree add"
+test_create_subtree_add () {
+	(
+		cd "$1" &&
+		orphan="$2" &&
+		prefix="$3" &&
+		filename="$4" &&
+		shift 4 &&
+		last="$(git branch --show-current)" &&
+		git checkout --orphan "$orphan" &&
+		git rm -rf . &&
+		test_commit "$filename" &&
+		git checkout "$last" &&
+		git subtree add --prefix="$prefix" "$@" "$orphan"
+	)
+}
+
 test_expect_success 'shows short help text for -h' '
 	test_expect_code 129 git subtree -h >out 2>err &&
 	test_must_be_empty err &&
@@ -425,6 +453,48 @@ test_expect_success 'split with multiple subtrees' '
 		--squash --rejoin -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = ""
 '
 
+# When subtree split-ing a directory that has other subtree
+# *merges* underneath it, the split must include those subtrees.
+# This test creates a nested subtree, `subA/subB`, and tests
+# that the tree is correct after a subtree split of `subA/`.
+# The test covers:
+# - An initial `subtree add`; and
+# - A follow-up `subtree merge`
+# both with and without `--squashed`.
+for is_squashed in '' 'y';
+do
+	test_expect_success "split keeps nested ${is_squashed:+--squash }subtrees that are part of the split" '
+		subtree_test_create_repo "$test_count" &&
+		(
+			cd "$test_count" &&
+			mkdir subA &&
+			test_commit subA/file1 &&
+			git branch -m main &&
+			test_create_subtree_add \
+				. mksubtree subA/subB file2 ${is_squashed:+--squash} &&
+			test -e subA/file1.t &&
+			test -e subA/subB/file2.t &&
+			git subtree split --prefix=subA --branch=bsplit &&
+			git checkout bsplit &&
+			test -e file1.t &&
+			test -e subB/file2.t &&
+			git checkout mksubtree &&
+			git branch -D bsplit &&
+			test_commit file3 &&
+			git checkout main &&
+			git subtree merge \
+				${is_squashed:+--squash} \
+				--prefix=subA/subB mksubtree &&
+			test -e subA/subB/file3.t &&
+			git subtree split --prefix=subA --branch=bsplit &&
+			git checkout bsplit &&
+			test -e file1.t &&
+			test -e subB/file2.t &&
+			test -e subB/file3.t
+		)
+	'
+done
+
 test_expect_success 'split sub dir/ with --rejoin from scratch' '
 	subtree_test_create_repo "$test_count" &&
 	test_create_commit "$test_count" main1 &&

base-commit: 09669c729af92144fde84e97d358759b5b42b555
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] contrib/subtree: fix split with squashed subtrees
  2025-09-05  2:27 ` [PATCH v2] " Colin Stagner
@ 2025-09-08 15:21   ` Phillip Wood
  2025-09-10  1:56     ` Colin Stagner
  0 siblings, 1 reply; 15+ messages in thread
From: Phillip Wood @ 2025-09-08 15:21 UTC (permalink / raw)
  To: Colin Stagner, git, phillip.wood
  Cc: Zach FettersMoore, Christian Couder, Patrik Weiskircher

Hi Colin

This is looking good. I've left a few comments below, especially on the 
new test.

On 05/09/2025 03:27, Colin Stagner wrote:
>   
> -# Usage: check if a commit from another subtree should be
> +# Usage: should_ignore_subtree_split_commit REV
> +#
> +# Check if REV is a commit from another subtree and should be
>   # ignored from processing for splits
>   should_ignore_subtree_split_commit () {
>   	assert test $# = 1
> -	local rev="$1"
> +
> +	git show \
> +		--no-patch \
> +		--no-show-signature \
> +		--format='%(trailers:key=git-subtree-dir,key=git-subtree-mainline)' \
> +		"$1" |
> +	(
> +	have_mainline=
> +	subtree_dir=
> +
> +	while read -r trailer val
> +	do
> +		case "$trailer" in
> +		(git-subtree-dir:)
> +			subtree_dir="${val%/}" ;;
> +		(git-subtree-mainline:)
> +			have_mainline=y ;;
> +		esac
> +	done

This looks good, we run git log once and then parse the trailers. We do 
not use the optional '(' in case statements in our code though.

> -	if test -n "$(git log -1 --grep="git-subtree-dir:" $rev)"
> +	if test -n "${subtree_dir:-}" &&
> +		test -z "${have_mainline:-}" &&
> +		test "${subtree_dir}" != "$arg_prefix"

If we have a git-subtree-dir: trailer whose value does not match the 
subtree we're interested in and there is no git-subtree-mainline: 
trailer then we skip this commit - good. What's the idea behind using 
"${var:-}" rather than "{var}"?

>   	then
> -		if test -z "$(git log -1 --grep="git-subtree-mainline:" $rev)" &&
> -			test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $rev)"
> -		then
> -			return 0
> -		fi
> +		return 0
>   	fi
>   	return 1
> +	)
>   }
>   
>   # Usage: process_split_commit REV PARENTS
> diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh
> index ca4df5be83..8bd45e7be7 100755
> --- a/contrib/subtree/t/t7900-subtree.sh
> +++ b/contrib/subtree/t/t7900-subtree.sh
> @@ -67,6 +67,34 @@ test_create_pre2_32_repo () {
>   	git -C "$1-clone" replace HEAD^2 $new_commit
>   }
>   
> +# test_create_subtree_add REPO ORPHAN PREFIX FILENAME ...
> +#
> +# Create a simple subtree on a new branch named ORPHAN in REPO.
> +# The subtree is then merged into the current branch of REPO,
> +# under PREFIX. The generated subtree has has one commit
> +# with subject and tag FILENAME with a single file "FILENAME.t"
> +#
> +# When this method returns:
> +# - the current branch of REPO will have file PREFIX/FILENAME.t
> +# - REPO will have a branch named ORPHAN with subtree history
> +#
> +# additional arguments are forwarded to "subtree add"
> +test_create_subtree_add () {
> +	(
> +		cd "$1" &&
> +		orphan="$2" &&
> +		prefix="$3" &&
> +		filename="$4" &&
> +		shift 4 &&
> +		last="$(git branch --show-current)" &&
> +		git checkout --orphan "$orphan" &&
> +		git rm -rf . &&

If you use "git switch --orphan" that clears the worktree for you

> +		test_commit "$filename" &&
> +		git checkout "$last" &&

I think this could be "git checkout @{-1}" and then we'd avoid having to 
run "git branch" above

> +		git subtree add --prefix="$prefix" "$@" "$orphan"
> +	)
> +}
> +
>   test_expect_success 'shows short help text for -h' '
>   	test_expect_code 129 git subtree -h >out 2>err &&
>   	test_must_be_empty err &&
> @@ -425,6 +453,48 @@ test_expect_success 'split with multiple subtrees' '
>   		--squash --rejoin -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = ""
>   '
>   
> +# When subtree split-ing a directory that has other subtree
> +# *merges* underneath it, the split must include those subtrees.
> +# This test creates a nested subtree, `subA/subB`, and tests
> +# that the tree is correct after a subtree split of `subA/`.
> +# The test covers:
> +# - An initial `subtree add`; and
> +# - A follow-up `subtree merge`
> +# both with and without `--squashed`.
> +for is_squashed in '' 'y';

no need for ';' at the end of the line

> +do
> +	test_expect_success "split keeps nested ${is_squashed:+--squash }subtrees that are part of the split" '
> +		subtree_test_create_repo "$test_count" &&
> +		(
> +			cd "$test_count" &&
> +			mkdir subA &&
> +			test_commit subA/file1 &&
> +			git branch -m main &&

For tests that depend on the default branch name you can add

	GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME

to the start of the file before it sources test-lib.sh. Or we could 
change subtree_create_repo() to pass all its arguments along to 
test_create_repo() and use "subtree_create_repo --initial-branch=main 
$test_count"

> +			test_create_subtree_add \
> +				. mksubtree subA/subB file2 ${is_squashed:+--squash} &&
> +			test -e subA/file1.t &&

We have test_path_is_file() for this which prints a useful diagnostic 
message it it fails

Thanks

Phillip

> +			test -e subA/subB/file2.t &&
> +			git subtree split --prefix=subA --branch=bsplit &&
> +			git checkout bsplit &&
> +			test -e file1.t &&
> +			test -e subB/file2.t &&
> +			git checkout mksubtree &&
> +			git branch -D bsplit &&
> +			test_commit file3 &&
> +			git checkout main &&
> +			git subtree merge \
> +				${is_squashed:+--squash} \
> +				--prefix=subA/subB mksubtree &&
> +			test -e subA/subB/file3.t &&
> +			git subtree split --prefix=subA --branch=bsplit &&
> +			git checkout bsplit &&
> +			test -e file1.t &&
> +			test -e subB/file2.t &&
> +			test -e subB/file3.t
> +		)
> +	'
> +done
> +
>   test_expect_success 'split sub dir/ with --rejoin from scratch' '
>   	subtree_test_create_repo "$test_count" &&
>   	test_create_commit "$test_count" main1 &&
> 
> base-commit: 09669c729af92144fde84e97d358759b5b42b555


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] contrib/subtree: fix split with squashed subtrees
  2025-09-08 15:21   ` Phillip Wood
@ 2025-09-10  1:56     ` Colin Stagner
  2025-09-10  2:02       ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Colin Stagner @ 2025-09-10  1:56 UTC (permalink / raw)
  To: phillip.wood, git; +Cc: Zach FettersMoore, Christian Couder, Patrik Weiskircher

Phillip,

Hello again! I have adopted your recommendations everywhere except for 
`git checkout @{-1}`. Details below.

On 9/8/25 10:21, Phillip Wood wrote:
> On 05/09/2025 03:27, Colin Stagner wrote:

>> +    while read -r trailer val
>> +    do
>> +        case "$trailer" in
>> +        (git-subtree-dir:)
>> +            subtree_dir="${val%/}" ;;
>> +        (git-subtree-mainline:)
>> +            have_mainline=y ;;
>> +        esac
>> +    done
> 
> We do not use the optional '(' in case statements

Will fix in v3.

> 
>> -    if test -n "$(git log -1 --grep="git-subtree-dir:" $rev)"
>> +    if test -n "${subtree_dir:-}" &&
>> +        test -z "${have_mainline:-}" &&
>> +        test "${subtree_dir}" != "$arg_prefix"
> 
> What's the idea behind using "${var:-}" rather than "{var}"?

I write a lot of shell scripts that run "set -u" (aka "set -o nounset"), 
so I do this a lot when testing for empty vars. In this case, it's not 
actually necessary since `have_mainline` is explicitly defined above. 
And we don't run `set -u` anyway.

Will remove from v3.


>> +test_create_subtree_add () {
>> +    (
>> +        cd "$1" &&
>> +        orphan="$2" &&
>> +        prefix="$3" &&
>> +        filename="$4" &&
>> +        shift 4 &&
>> +        last="$(git branch --show-current)" &&
>> +        git checkout --orphan "$orphan" &&
>> +        git rm -rf . &&
> 
> If you use "git switch --orphan" that clears the worktree for you

Very useful. I'll start using it in v3.


>> +        test_commit "$filename" &&
>> +        git checkout "$last" &&
> 
> I think this could be "git checkout @{-1}" and then we'd avoid having to 
> run "git branch" above

I experimented with this, but I couldn't get it to work on git 2.44. 
Although the reflog shows the refs I expect, using

     git switch '@{-1}'

dies with

     fatal: invalid reference: @{-1}

checkout doesn't work either. Perhaps there is something about --orphan 
that is messing up the history.

I could make `test_create_subtree_add` take a mainline branch name, 
but... unless there's something unsound about v2, I think we should just 
keep v2. `git branch --show-current` looks like well-defined porcelain.

Any other ideas?


>> +# The test covers:
>> +# - An initial `subtree add`; and
>> +# - A follow-up `subtree merge`
>> +# both with and without `--squashed`.
>> +for is_squashed in '' 'y';
> 
> no need for ';' at the end of the line

Fixed for v3.

>> +        subtree_test_create_repo "$test_count" &&
>> +        (
>> +            cd "$test_count" &&
>> +            mkdir subA &&
>> +            test_commit subA/file1 &&
>> +            git branch -m main &&
> 
> For tests that depend on the default branch name you can add
> 
>      GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
>      export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> 
> to the start of the file before it sources test-lib.sh.

GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main looks very common, so I'll go 
with that for v3.


>> +            test_create_subtree_add \
>> +                . mksubtree subA/subB file2 ${is_squashed:+--squash} &&
>> +            test -e subA/file1.t &&
> 
> We have test_path_is_file() for this which prints a useful diagnostic 
> message

Fixed all occurrences in v3.


Colin


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] contrib/subtree: fix split with squashed subtrees
  2025-09-10  1:56     ` Colin Stagner
@ 2025-09-10  2:02       ` Junio C Hamano
  2025-09-10  3:00         ` Colin Stagner
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2025-09-10  2:02 UTC (permalink / raw)
  To: Colin Stagner
  Cc: phillip.wood, git, Zach FettersMoore, Christian Couder,
	Patrik Weiskircher

Colin Stagner <ask+git@howdoi.land> writes:

>>> -    if test -n "$(git log -1 --grep="git-subtree-dir:" $rev)"
>>> +    if test -n "${subtree_dir:-}" &&
>>> +        test -z "${have_mainline:-}" &&
>>> +        test "${subtree_dir}" != "$arg_prefix"
>> What's the idea behind using "${var:-}" rather than "{var}"?
>
> I write a lot of shell scripts that run "set -u" (aka "set -o
> nounset"), so I do this a lot when testing for empty vars. In this
> case, it's not actually necessary since `have_mainline` is explicitly
> defined above. And we don't run `set -u` anyway.

Besides, "if test -n ${subtree_dir-}" without colon would be the
more proper way for those who care about "set -u", wouldn't it?  It
is not that you want to substitute with an empty string that comes
between that "-" and "}" when subtree_dir is unset or set to empty.
You are preparing for the case where the variable is truly not set,
and the variable being set to an empty string is not something you
are worried about.  THe same for ${have_mainline:-}.

>> If you use "git switch --orphan" that clears the worktree for you
>
> Very useful. I'll start using it in v3.

Excellent suggestion.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] contrib/subtree: fix split with squashed subtrees
  2025-09-10  2:02       ` Junio C Hamano
@ 2025-09-10  3:00         ` Colin Stagner
  2025-09-10 15:10           ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Colin Stagner @ 2025-09-10  3:00 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: phillip.wood, git, Zach FettersMoore, Christian Couder,
	Patrik Weiskircher

On 9/9/25 21:02, Junio C Hamano wrote:
> Besides, "if test -n ${subtree_dir-}" without colon would be the
> more proper way for those who care about "set -u", wouldn't it?  It
> is not that you want to substitute with an empty string that comes
> between that "-" and "}" when subtree_dir is unset or set to empty.
> You are preparing for the case where the variable is truly not set,
> and the variable being set to an empty string is not something you
> are worried about.
Yes, "test -n ${subtree_dir-}" is definitely the more correct expression.

At the very real risk of embarrassing myself in public today... in the 
particular case of a "test -n," is there actually an appreciable 
difference? Either way, the output of the substitution is empty if the 
input is empty or undefined. Here, "test -n ${subtree_dir:-}" is merely 
less efficient. Right?

The difference between "${x:-}" vs "${x-}" really starts to matter if 
you want to permit the empty string (or not). It also matters if you 
call a command that has side effects.

(And in the context of this patch, neither are necessary.)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v3] contrib/subtree: fix split with squashed subtrees
  2025-08-24 19:10 [PATCH] contrib/subtree: fix split with squashed subtrees Colin Stagner
  2025-09-01 13:54 ` Phillip Wood
  2025-09-05  2:27 ` [PATCH v2] " Colin Stagner
@ 2025-09-10  3:11 ` Colin Stagner
  2025-09-10  9:39   ` Phillip Wood
  2 siblings, 1 reply; 15+ messages in thread
From: Colin Stagner @ 2025-09-10  3:11 UTC (permalink / raw)
  To: git, phillip.wood, Phillip Wood
  Cc: Zach FettersMoore, Christian Couder, Patrik Weiskircher,
	Colin Stagner

98ba49ccc2 (subtree: fix split processing with multiple subtrees
present, 2023-12-01) increases the performance of

    git subtree split --prefix=subA

by ignoring subtree merges which are outside of `subA/`. It also
introduces a regression. Subtree merges that should be retained
are incorrectly ignored if they:

1. are nested under `subA/`; and
2. are merged with `--squash`.

For example, a subtree merged like:

    git subtree merge --squash --prefix=subA/subB "$rev"
    #                 ^^^^^^^^          ^^^^

is erroneously ignored during a split of `subA`. This causes
missing tree files and different commit hashes starting in
git v2.44.0-rc0.

The method:

    should_ignore_subtree_split_commit REV

should test only a single commit REV, but the combination of

    git log -1 --grep=...

actually searches all *parent* commits until a `--grep` match is
discovered.

Rewrite this method to test only one REV at a time. Extract commit
information with a single `git` call as opposed to three. The
`test` conditions for rejecting a commit remain unchanged.

Unit tests now cover nested subtrees.

Signed-off-by: Colin Stagner <ask+git@howdoi.land>
---

Notes:
    This bugfix patch is intended for maint-2.44 and up.

 contrib/subtree/git-subtree.sh     | 36 +++++++++++----
 contrib/subtree/t/t7900-subtree.sh | 71 ++++++++++++++++++++++++++++++
 2 files changed, 99 insertions(+), 8 deletions(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index 5dab3f506c..ad9b9b0191 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -783,24 +783,44 @@ ensure_clean () {
 # Usage: ensure_valid_ref_format REF
 ensure_valid_ref_format () {
 	assert test $# = 1
 	git check-ref-format "refs/heads/$1" ||
 		die "fatal: '$1' does not look like a ref"
 }
 
-# Usage: check if a commit from another subtree should be
+# Usage: should_ignore_subtree_split_commit REV
+#
+# Check if REV is a commit from another subtree and should be
 # ignored from processing for splits
 should_ignore_subtree_split_commit () {
 	assert test $# = 1
-	local rev="$1"
+
+	git show \
+		--no-patch \
+		--no-show-signature \
+		--format='%(trailers:key=git-subtree-dir,key=git-subtree-mainline)' \
+		"$1" |
+	(
+	have_mainline=
+	subtree_dir=
+
+	while read -r trailer val
+	do
+		case "$trailer" in
+		git-subtree-dir:)
+			subtree_dir="${val%/}" ;;
+		git-subtree-mainline:)
+			have_mainline=y ;;
+		esac
+	done
+
-	if test -n "$(git log -1 --grep="git-subtree-dir:" $rev)"
+	if test -n "${subtree_dir}" &&
+		test -z "${have_mainline}" &&
+		test "${subtree_dir}" != "$arg_prefix"
 	then
-		if test -z "$(git log -1 --grep="git-subtree-mainline:" $rev)" &&
-			test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $rev)"
-		then
-			return 0
-		fi
+		return 0
 	fi
 	return 1
+	)
 }
 
 # Usage: process_split_commit REV PARENTS
diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh
index ca4df5be83..25be40e12b 100755
--- a/contrib/subtree/t/t7900-subtree.sh
+++ b/contrib/subtree/t/t7900-subtree.sh
@@ -9,6 +9,9 @@ This test verifies the basic operation of the add, merge, split, pull,
 and push subcommands of git subtree.
 '
 
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
 TEST_DIRECTORY=$(pwd)/../../../t
 . "$TEST_DIRECTORY"/test-lib.sh
 
@@ -67,6 +70,33 @@ test_create_pre2_32_repo () {
 	git -C "$1-clone" replace HEAD^2 $new_commit
 }
 
+# test_create_subtree_add REPO ORPHAN PREFIX FILENAME ...
+#
+# Create a simple subtree on a new branch named ORPHAN in REPO.
+# The subtree is then merged into the current branch of REPO,
+# under PREFIX. The generated subtree has has one commit
+# with subject and tag FILENAME with a single file "FILENAME.t"
+#
+# When this method returns:
+# - the current branch of REPO will have file PREFIX/FILENAME.t
+# - REPO will have a branch named ORPHAN with subtree history
+#
+# additional arguments are forwarded to "subtree add"
+test_create_subtree_add () {
+	(
+		cd "$1" &&
+		orphan="$2" &&
+		prefix="$3" &&
+		filename="$4" &&
+		shift 4 &&
+		last="$(git branch --show-current)" &&
+		git switch --orphan "$orphan" &&
+		test_commit "$filename" &&
+		git checkout "$last" &&
+		git subtree add --prefix="$prefix" "$@" "$orphan"
+	)
+}
+
 test_expect_success 'shows short help text for -h' '
 	test_expect_code 129 git subtree -h >out 2>err &&
 	test_must_be_empty err &&
@@ -425,6 +455,47 @@ test_expect_success 'split with multiple subtrees' '
 		--squash --rejoin -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = ""
 '
 
+# When subtree split-ing a directory that has other subtree
+# *merges* underneath it, the split must include those subtrees.
+# This test creates a nested subtree, `subA/subB`, and tests
+# that the tree is correct after a subtree split of `subA/`.
+# The test covers:
+# - An initial `subtree add`; and
+# - A follow-up `subtree merge`
+# both with and without `--squashed`.
+for is_squashed in '' 'y'
+do
+	test_expect_success "split keeps nested ${is_squashed:+--squash }subtrees that are part of the split" '
+		subtree_test_create_repo "$test_count" &&
+		(
+			cd "$test_count" &&
+			mkdir subA &&
+			test_commit subA/file1 &&
+			test_create_subtree_add \
+				. mksubtree subA/subB file2 ${is_squashed:+--squash} &&
+			test_path_is_file subA/file1.t &&
+			test_path_is_file subA/subB/file2.t &&
+			git subtree split --prefix=subA --branch=bsplit &&
+			git checkout bsplit &&
+			test_path_is_file file1.t &&
+			test_path_is_file subB/file2.t &&
+			git checkout mksubtree &&
+			git branch -D bsplit &&
+			test_commit file3 &&
+			git checkout main &&
+			git subtree merge \
+				${is_squashed:+--squash} \
+				--prefix=subA/subB mksubtree &&
+			test_path_is_file subA/subB/file3.t &&
+			git subtree split --prefix=subA --branch=bsplit &&
+			git checkout bsplit &&
+			test_path_is_file file1.t &&
+			test_path_is_file subB/file2.t &&
+			test_path_is_file subB/file3.t
+		)
+	'
+done
+
 test_expect_success 'split sub dir/ with --rejoin from scratch' '
 	subtree_test_create_repo "$test_count" &&
 	test_create_commit "$test_count" main1 &&

base-commit: 09669c729af92144fde84e97d358759b5b42b555
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] contrib/subtree: fix split with squashed subtrees
  2025-09-10  3:11 ` [PATCH v3] " Colin Stagner
@ 2025-09-10  9:39   ` Phillip Wood
  2025-09-11 16:01     ` Junio C Hamano
  0 siblings, 1 reply; 15+ messages in thread
From: Phillip Wood @ 2025-09-10  9:39 UTC (permalink / raw)
  To: Colin Stagner, git, phillip.wood
  Cc: Zach FettersMoore, Christian Couder, Patrik Weiskircher

Hi Colin

Thanks for working on this. I'm not particularly familiar with 
git-subtree but as far as I can see this version looks good.

Thanks

Phillip

On 10/09/2025 04:11, Colin Stagner wrote:
> 98ba49ccc2 (subtree: fix split processing with multiple subtrees
> present, 2023-12-01) increases the performance of
> 
>      git subtree split --prefix=subA
> 
> by ignoring subtree merges which are outside of `subA/`. It also
> introduces a regression. Subtree merges that should be retained
> are incorrectly ignored if they:
> 
> 1. are nested under `subA/`; and
> 2. are merged with `--squash`.
> 
> For example, a subtree merged like:
> 
>      git subtree merge --squash --prefix=subA/subB "$rev"
>      #                 ^^^^^^^^          ^^^^
> 
> is erroneously ignored during a split of `subA`. This causes
> missing tree files and different commit hashes starting in
> git v2.44.0-rc0.
> 
> The method:
> 
>      should_ignore_subtree_split_commit REV
> 
> should test only a single commit REV, but the combination of
> 
>      git log -1 --grep=...
> 
> actually searches all *parent* commits until a `--grep` match is
> discovered.
> 
> Rewrite this method to test only one REV at a time. Extract commit
> information with a single `git` call as opposed to three. The
> `test` conditions for rejecting a commit remain unchanged.
> 
> Unit tests now cover nested subtrees.
> 
> Signed-off-by: Colin Stagner <ask+git@howdoi.land>
> ---
> 
> Notes:
>      This bugfix patch is intended for maint-2.44 and up.
> 
>   contrib/subtree/git-subtree.sh     | 36 +++++++++++----
>   contrib/subtree/t/t7900-subtree.sh | 71 ++++++++++++++++++++++++++++++
>   2 files changed, 99 insertions(+), 8 deletions(-)
> 
> diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
> index 5dab3f506c..ad9b9b0191 100755
> --- a/contrib/subtree/git-subtree.sh
> +++ b/contrib/subtree/git-subtree.sh
> @@ -783,24 +783,44 @@ ensure_clean () {
>   # Usage: ensure_valid_ref_format REF
>   ensure_valid_ref_format () {
>   	assert test $# = 1
>   	git check-ref-format "refs/heads/$1" ||
>   		die "fatal: '$1' does not look like a ref"
>   }
>   
> -# Usage: check if a commit from another subtree should be
> +# Usage: should_ignore_subtree_split_commit REV
> +#
> +# Check if REV is a commit from another subtree and should be
>   # ignored from processing for splits
>   should_ignore_subtree_split_commit () {
>   	assert test $# = 1
> -	local rev="$1"
> +
> +	git show \
> +		--no-patch \
> +		--no-show-signature \
> +		--format='%(trailers:key=git-subtree-dir,key=git-subtree-mainline)' \
> +		"$1" |
> +	(
> +	have_mainline=
> +	subtree_dir=
> +
> +	while read -r trailer val
> +	do
> +		case "$trailer" in
> +		git-subtree-dir:)
> +			subtree_dir="${val%/}" ;;
> +		git-subtree-mainline:)
> +			have_mainline=y ;;
> +		esac
> +	done
> +
> -	if test -n "$(git log -1 --grep="git-subtree-dir:" $rev)"
> +	if test -n "${subtree_dir}" &&
> +		test -z "${have_mainline}" &&
> +		test "${subtree_dir}" != "$arg_prefix"
>   	then
> -		if test -z "$(git log -1 --grep="git-subtree-mainline:" $rev)" &&
> -			test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $rev)"
> -		then
> -			return 0
> -		fi
> +		return 0
>   	fi
>   	return 1
> +	)
>   }
>   
>   # Usage: process_split_commit REV PARENTS
> diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh
> index ca4df5be83..25be40e12b 100755
> --- a/contrib/subtree/t/t7900-subtree.sh
> +++ b/contrib/subtree/t/t7900-subtree.sh
> @@ -9,6 +9,9 @@ This test verifies the basic operation of the add, merge, split, pull,
>   and push subcommands of git subtree.
>   '
>   
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> +
>   TEST_DIRECTORY=$(pwd)/../../../t
>   . "$TEST_DIRECTORY"/test-lib.sh
>   
> @@ -67,6 +70,33 @@ test_create_pre2_32_repo () {
>   	git -C "$1-clone" replace HEAD^2 $new_commit
>   }
>   
> +# test_create_subtree_add REPO ORPHAN PREFIX FILENAME ...
> +#
> +# Create a simple subtree on a new branch named ORPHAN in REPO.
> +# The subtree is then merged into the current branch of REPO,
> +# under PREFIX. The generated subtree has has one commit
> +# with subject and tag FILENAME with a single file "FILENAME.t"
> +#
> +# When this method returns:
> +# - the current branch of REPO will have file PREFIX/FILENAME.t
> +# - REPO will have a branch named ORPHAN with subtree history
> +#
> +# additional arguments are forwarded to "subtree add"
> +test_create_subtree_add () {
> +	(
> +		cd "$1" &&
> +		orphan="$2" &&
> +		prefix="$3" &&
> +		filename="$4" &&
> +		shift 4 &&
> +		last="$(git branch --show-current)" &&
> +		git switch --orphan "$orphan" &&
> +		test_commit "$filename" &&
> +		git checkout "$last" &&
> +		git subtree add --prefix="$prefix" "$@" "$orphan"
> +	)
> +}
> +
>   test_expect_success 'shows short help text for -h' '
>   	test_expect_code 129 git subtree -h >out 2>err &&
>   	test_must_be_empty err &&
> @@ -425,6 +455,47 @@ test_expect_success 'split with multiple subtrees' '
>   		--squash --rejoin -d -m "Sub B Split 1" 2>&1 | grep -w "\[1\]")" = ""
>   '
>   
> +# When subtree split-ing a directory that has other subtree
> +# *merges* underneath it, the split must include those subtrees.
> +# This test creates a nested subtree, `subA/subB`, and tests
> +# that the tree is correct after a subtree split of `subA/`.
> +# The test covers:
> +# - An initial `subtree add`; and
> +# - A follow-up `subtree merge`
> +# both with and without `--squashed`.
> +for is_squashed in '' 'y'
> +do
> +	test_expect_success "split keeps nested ${is_squashed:+--squash }subtrees that are part of the split" '
> +		subtree_test_create_repo "$test_count" &&
> +		(
> +			cd "$test_count" &&
> +			mkdir subA &&
> +			test_commit subA/file1 &&
> +			test_create_subtree_add \
> +				. mksubtree subA/subB file2 ${is_squashed:+--squash} &&
> +			test_path_is_file subA/file1.t &&
> +			test_path_is_file subA/subB/file2.t &&
> +			git subtree split --prefix=subA --branch=bsplit &&
> +			git checkout bsplit &&
> +			test_path_is_file file1.t &&
> +			test_path_is_file subB/file2.t &&
> +			git checkout mksubtree &&
> +			git branch -D bsplit &&
> +			test_commit file3 &&
> +			git checkout main &&
> +			git subtree merge \
> +				${is_squashed:+--squash} \
> +				--prefix=subA/subB mksubtree &&
> +			test_path_is_file subA/subB/file3.t &&
> +			git subtree split --prefix=subA --branch=bsplit &&
> +			git checkout bsplit &&
> +			test_path_is_file file1.t &&
> +			test_path_is_file subB/file2.t &&
> +			test_path_is_file subB/file3.t
> +		)
> +	'
> +done
> +
>   test_expect_success 'split sub dir/ with --rejoin from scratch' '
>   	subtree_test_create_repo "$test_count" &&
>   	test_create_commit "$test_count" main1 &&
> 
> base-commit: 09669c729af92144fde84e97d358759b5b42b555


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2] contrib/subtree: fix split with squashed subtrees
  2025-09-10  3:00         ` Colin Stagner
@ 2025-09-10 15:10           ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2025-09-10 15:10 UTC (permalink / raw)
  To: Colin Stagner
  Cc: phillip.wood, git, Zach FettersMoore, Christian Couder,
	Patrik Weiskircher

Colin Stagner <ask+git@howdoi.land> writes:

> On 9/9/25 21:02, Junio C Hamano wrote:
>> Besides, "if test -n ${subtree_dir-}" without colon would be the
>> more proper way for those who care about "set -u", wouldn't it?  It
>> is not that you want to substitute with an empty string that comes
>> between that "-" and "}" when subtree_dir is unset or set to empty.
>> You are preparing for the case where the variable is truly not set,
>> and the variable being set to an empty string is not something you
>> are worried about.
> Yes, "test -n ${subtree_dir-}" is definitely the more correct expression.
>
> At the very real risk of embarrassing myself in public today... in the
> particular case of a "test -n," is there actually an appreciable
> difference? Either way, the output of the substitution is empty if the
> input is empty or undefined. Here, "test -n ${subtree_dir:-}" is
> merely less efficient. Right?
>
> The difference between "${x:-}" vs "${x-}" really starts to matter if
> you want to permit the empty string (or not). It also matters if you
> call a command that has side effects.
>
> (And in the context of this patch, neither are necessary.)

Correct.  There is no practical difference.

Your explanation for using the "default values" parameter expansion
in this script, knowing that "set -u" is not in use, being it is out
of inertia, I would have expected them to be written in a way that
is suitable when "set -u" is in use, which is without colon.  Doing
something "different" on a variable that is set but set to an empty
string is not something you would want to do to deal with "set -u",
so I found it strange to see the colon there.

There is no practical difference, since the "default value"
specified is an empty string, so a variable set to an empty string
will use the empty string between ":-" and "}" instead of its value
that is another empty string, and you can tell these two empty
strings apart in the result ;-)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v3] contrib/subtree: fix split with squashed subtrees
  2025-09-10  9:39   ` Phillip Wood
@ 2025-09-11 16:01     ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2025-09-11 16:01 UTC (permalink / raw)
  To: Phillip Wood
  Cc: Colin Stagner, git, phillip.wood, Zach FettersMoore,
	Christian Couder, Patrik Weiskircher

Phillip Wood <phillip.wood123@gmail.com> writes:

> Hi Colin
>
> Thanks for working on this. I'm not particularly familiar with
> git-subtree but as far as I can see this version looks good.
>
> Thanks
>
> Phillip

Thanks, both of you.  Queued.


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2025-09-11 16:01 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-24 19:10 [PATCH] contrib/subtree: fix split with squashed subtrees Colin Stagner
2025-09-01 13:54 ` Phillip Wood
2025-09-01 20:43   ` Colin Stagner
2025-09-02 13:22     ` Phillip Wood
2025-09-02 14:57       ` Phillip Wood
2025-09-04  1:34         ` Colin Stagner
2025-09-05  2:27 ` [PATCH v2] " Colin Stagner
2025-09-08 15:21   ` Phillip Wood
2025-09-10  1:56     ` Colin Stagner
2025-09-10  2:02       ` Junio C Hamano
2025-09-10  3:00         ` Colin Stagner
2025-09-10 15:10           ` Junio C Hamano
2025-09-10  3:11 ` [PATCH v3] " Colin Stagner
2025-09-10  9:39   ` Phillip Wood
2025-09-11 16:01     ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).