public inbox for git@vger.kernel.org
From: Colin Stagner <ask+git@howdoi.land>
To: git@vger.kernel.org,
	Zach FettersMoore <zach.fetters@apollographql.com>,
	Christian Heusel <christian@heusel.eu>,
	george@mail.dietrich.pub
Cc: Colin Stagner <ask+git@howdoi.land>,
	Christian Hesse <list@eworm.de>,
	Phillip Wood <phillip.wood@dunelm.org.uk>,
	Junio C Hamano <gitster@pobox.com>
Subject: [PATCH 3/3] contrib/subtree: process out-of-prefix subtrees
Date: Sun, 15 Feb 2026 14:18:47 -0600
Message-ID: <20260215201906.889951-4-ask+git@howdoi.land>
In-Reply-To: <20260215201906.889951-1-ask+git@howdoi.land>

`should_ignore_subtree_split_commit` detects commits from subtrees
outside of the current --prefix and ignores them. This can speed
up splits of repositories that have many subtrees.
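
For reviewers, here is a rough model of the heuristic being removed. It is a hypothetical sketch and not the git-subtree code. `would_ignore` is an invented name. It reads a commit message on stdin and scans every line, whereas the real function used `git show --format='%(trailers:...)'` and looked only at the trailer block. The sketch shows that a rebased "Squashed 'subB/' content" commit is skipped when splitting --prefix=subA, because it keeps its git-subtree-dir trailer but has no git-subtree-mainline trailer.

```shell
#!/bin/sh
# Hypothetical model of the removed should_ignore_subtree_split_commit.
# Usage: would_ignore PREFIX < commit-message
# Returns 0 ("ignore this commit") when the message names a subtree
# directory other than PREFIX and carries no git-subtree-mainline trailer.
would_ignore () {
	prefix=$1
	subtree_dir=
	have_mainline=
	while IFS= read -r line
	do
		case "$line" in
		"git-subtree-dir: "*)
			subtree_dir=${line#git-subtree-dir: }
			subtree_dir=${subtree_dir%/} ;;
		"git-subtree-mainline: "*)
			have_mainline=y ;;
		esac
	done
	test -n "$subtree_dir" &&
	test -z "$have_mainline" &&
	test "$subtree_dir" != "$prefix"
}

# A rebased squash commit: trailers survive, the commit layout does not.
msg="Squashed 'subB/' content from commit 'badf00d'

git-subtree-dir: subB
git-subtree-split: badf00d"

if printf '%s\n' "$msg" | would_ignore subA
then
	echo "skipped by the old heuristic"
else
	echo "processed"
fi
```

The trailers alone cannot tell whether the commit still has the layout that `git subtree merge` originally produced.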

Since its inception [1], every iteration of this logic [2], [3]
has incorrectly excluded commits, which alters the split history.
The split history and its commit hashes are part of the API
contract, so this is not permissible.
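
The old `cmd_split` loop also dropped ignored commits from every parent list, and that is how a skipped commit changes the split history. The following hypothetical model of that loop uses invented names (`IGNORED`, `is_ignored`, `filter_history`) and a linear history A <- B <- C in which B is ignored. It shows that C reaches process_split_commit with no parents, so its link to A is lost.

```shell
#!/bin/sh
# Hypothetical stand-in for commits that should_ignore_subtree_split_commit
# would have matched (space-delimited so membership tests are exact).
IGNORED=" B "

is_ignored () {
	case "$IGNORED" in
	*" $1 "*) return 0 ;;
	esac
	return 1
}

# Reads "REV PARENT..." lines, oldest first, and prints the parent list
# that process_split_commit would have received under the old loop.
filter_history () {
	while read -r rev parents
	do
		if is_ignored "$rev"
		then
			continue
		fi
		kept=
		for parent in $parents
		do
			if ! is_ignored "$parent"
			then
				kept="$kept$parent "
			fi
		done
		kept=${kept% }
		echo "$rev parents: ${kept:-(none)}"
	done
}

# C's only parent is B, so after filtering C looks like a root commit.
printf '%s\n' 'A' 'B A' 'C B' | filter_history
```

This model prints `C parents: (none)`, even though C descends from A. The new 'split with rebased subtree commit' test checks the resulting history depth to catch this kind of truncation.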

While a commit from a different subtree may look like it doesn't
contribute anything to a split, sometimes it does. Merge commits
are a particular hot spot. For these, the pruning logic in
`copy_or_skip` performs:

1. a check for "treesame" parents
2. two different common ancestry checks

These checks operate on the **split history**, not the input
history. The split history omits commits that do not affect the
--prefix. This can significantly alter the ancestry of a merge.
In order to determine if `copy_or_skip` will skip a merge, it
is likely necessary to compute all the split history... which
is what `should_ignore_subtree_split_commit` tries to avoid.

To make this logic API-preserving, we could gate it behind a
new CLI argument. However, the present implementation actually
slows down splits in many cases, so this is not done here.

Remove the `should_ignore_subtree_split_commit` logic. This
fixes the regression reported in [4].

[1]: 98ba49ccc2 (subtree: fix split processing with multiple subtrees present, 2023-12-01)
[2]: 83f9dad7d6 (contrib/subtree: fix split with squashed subtrees, 2025-09-09)
[3]: 28a7e27cff (contrib/subtree: detect rewritten subtree commits, 2026-01-09)
[4]: <20251230170719.845029-1-george@mail.dietrich.pub>

Reported-by: George <george@mail.dietrich.pub>
Reported-by: Christian Heusel <christian@heusel.eu>
Signed-off-by: Colin Stagner <ask+git@howdoi.land>
---
 contrib/subtree/git-subtree.sh     | 50 +---------------------
 contrib/subtree/t/t7900-subtree.sh | 68 ++++++++++++++++++++++++++++--
 2 files changed, 65 insertions(+), 53 deletions(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index 17106d1a72..ba9fb2ee5d 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -785,42 +785,6 @@ ensure_valid_ref_format () {
 		die "fatal: '$1' does not look like a ref"
 }
 
-# Usage: should_ignore_subtree_split_commit REV
-#
-# Check if REV is a commit from another subtree and should be
-# ignored from processing for splits
-should_ignore_subtree_split_commit () {
-	assert test $# = 1
-
-	git show \
-		--no-patch \
-		--no-show-signature \
-		--format='%(trailers:key=git-subtree-dir,key=git-subtree-mainline)' \
-		"$1" |
-	(
-	have_mainline=
-	subtree_dir=
-
-	while read -r trailer val
-	do
-		case "$trailer" in
-		git-subtree-dir:)
-			subtree_dir="${val%/}" ;;
-		git-subtree-mainline:)
-			have_mainline=y ;;
-		esac
-	done
-
-	if test -n "${subtree_dir}" &&
-		test -z "${have_mainline}" &&
-		test "${subtree_dir}" != "$arg_prefix"
-	then
-		return 0
-	fi
-	return 1
-	)
-}
-
 # Usage: process_split_commit REV PARENTS
 process_split_commit () {
 	assert test $# = 2
@@ -1006,19 +970,7 @@ cmd_split () {
 	eval "$grl" |
 	while read rev parents
 	do
-		if should_ignore_subtree_split_commit "$rev"
-		then
-			continue
-		fi
-		parsedparents=''
-		for parent in $parents
-		do
-			if ! should_ignore_subtree_split_commit "$parent"
-			then
-				parsedparents="$parsedparents$parent "
-			fi
-		done
-		process_split_commit "$rev" "$parsedparents"
+		process_split_commit "$rev" "$parents"
 	done || exit $?
 
 	latest_new=$(cache_get latest_new) || exit $?
diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh
index dad8dea63a..05a774ad47 100755
--- a/contrib/subtree/t/t7900-subtree.sh
+++ b/contrib/subtree/t/t7900-subtree.sh
@@ -428,8 +428,7 @@ test_expect_success 'split sub dir/ with --rejoin' '
 # 	- Perform 'split' on subtree B
 # 	- Create new commits with changes to subtree A and B
 # 	- Perform split on subtree A
-# 	- Check that the commits in subtree B are not processed
-#			as part of the subtree A split
+# 	- Check for expected history
 test_expect_success 'split with multiple subtrees' '
 	subtree_test_create_repo "$test_count" &&
 	subtree_test_create_repo "$test_count/subA" &&
@@ -458,8 +457,8 @@ test_expect_success 'split with multiple subtrees' '
 		--squash --rejoin -m "Sub A Split 2" -b a2 &&
 	test "$(git -C "$test_count" rev-list --count main..a2)" -eq 2 &&
 	test "$(git -C "$test_count" rev-list --count a1..a2)" -eq 1 &&
-	test "$(git -C "$test_count" subtree split --prefix=subBDir \
-		--squash --rejoin -d -m "Sub B Split 1" -b b2 2>&1 | grep -w "\[1\]")" = "" &&
+	git -C "$test_count" subtree split --prefix=subBDir \
+		--squash --rejoin -d -m "Sub B Split 1" -b b2 &&
 	test "$(git -C "$test_count" rev-list --count main..b2)" -eq 2 &&
 	test "$(git -C "$test_count" rev-list --count b1..b2)" -eq 1
 '
@@ -507,6 +506,67 @@ do
 	'
 done
 
+# Usually,
+#
+#    git subtree merge -P subA --squash f00...
+#
+# makes two commits, in this order:
+#
+# 1. Squashed 'subA/' content from commit f00...
+# 2. Merge commit (1) as 'subA'
+#
+# Commit 1 updates the subtree but does *not* rewrite paths.
+# Commit 2 rewrites all trees to start with `subA/`
+#
+# Commit 1 either has no parents or depends only on other
+# "Squashed 'subA/' content" commits.
+#
+# For merge without --squash, subtree produces just one commit:
+# a merge commit with git-subtree trailers.
+#
+# In either case, if the user rebases these commits, they will
+# still have the git-subtree-* trailers… but will NOT have
+# the layout described above.
+#
+# Test that subsequent `git subtree split` are not confused by this.
+test_expect_success 'split with rebased subtree commit' '
+	subtree_test_create_repo "$test_count" &&
+	(
+		cd "$test_count" &&
+		test_commit file0 &&
+		test_create_subtree_add \
+			. mksubtree subA file1 --squash &&
+		test_path_is_file subA/file1.t &&
+		mkdir subB &&
+		test_commit subB/bfile &&
+		git commit --amend -F - <<'EOF' &&
+Squashed '\''subB/'\'' content from commit '\''badf00da911bbe895347b4b236f5461d55dc9877'\''
+
+Simulate a cherry-picked or rebased subtree commit.
+
+git-subtree-dir: subB
+git-subtree-split: badf00da911bbe895347b4b236f5461d55dc9877
+EOF
+		test_commit subA/file2 &&
+		test_commit subB/bfile2 &&
+		git commit --amend -F - <<'EOF' &&
+Split '\''subB/'\'' into commit '\''badf00da911bbe895347b4b236f5461d55dc9877'\''
+
+Simulate a cherry-picked or rebased subtree commit.
+
+git-subtree-dir: subB
+git-subtree-mainline: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+git-subtree-split: badf00da911bbe895347b4b236f5461d55dc9877
+EOF
+		git subtree split --prefix=subA --branch=bsplit &&
+		git checkout bsplit &&
+		test_path_is_file file1.t &&
+		test_path_is_file file2.t &&
+		test "$(last_commit_subject)" = "subA/file2" &&
+		test "$(git rev-list --count bsplit)" -eq 2
+	)
+'
+
 test_expect_success 'split sub dir/ with --rejoin from scratch' '
 	subtree_test_create_repo "$test_count" &&
 	test_create_commit "$test_count" main1 &&
-- 
2.43.0

