Git development
* [PATCH 0/3] contrib/subtree: reduce recursion during split
@ 2026-02-15 20:17 Colin Stagner
  2026-02-15 20:17 ` [PATCH 1/3] contrib/subtree: reduce function side-effects Colin Stagner
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Colin Stagner @ 2026-02-15 20:17 UTC (permalink / raw)
  To: git, Christian Heusel, george
  Cc: Colin Stagner, Christian Hesse, Phillip Wood, Junio C Hamano

* cs/subtree-split-recursion: when processing large history
  graphs on Debian or Ubuntu, "git subtree" can die with a
  "recursion depth reached" error. Reduce recursion.

On Debian's POSIX sh, shell recursion is artificially limited
to 1000 calls. You can check if your sh has limited recursion
with:

    #!/bin/sh
    recurse() {
        r=$(( r + 1 ))
        test "$r" -le 1000 || { echo OK; exit; }
        recurse
    } && r=0 && recurse

Depending on the history graph, subtree split can recurse deeply
enough to encounter this limit. Rewrite the rejoin-deepening
algorithm to reduce recursive calls.

Colin Stagner (3):
  contrib/subtree: reduce function side-effects
  contrib/subtree: functionalize split traversal
  contrib/subtree: reduce recursion during split

 contrib/subtree/git-subtree.sh | 95 +++++++++++++++++++++++++++++++---
 1 file changed, 88 insertions(+), 7 deletions(-)


base-commit: 852829b3dd2fe4e7c7fc4d8badde644cf1b66c74
-- 
2.43.0



* [PATCH 1/3] contrib/subtree: reduce function side-effects
  2026-02-15 20:17 [PATCH 0/3] contrib/subtree: reduce recursion during split Colin Stagner
@ 2026-02-15 20:17 ` Colin Stagner
  2026-02-15 20:17 ` [PATCH 2/3] contrib/subtree: functionalize split traversal Colin Stagner
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Colin Stagner @ 2026-02-15 20:17 UTC (permalink / raw)
  To: git, Christian Heusel, george
  Cc: Colin Stagner, Christian Hesse, Phillip Wood, Junio C Hamano

`process_subtree_split_trailer()` communicates its return value
to the caller by setting a variable (`sub`) that is also defined
by the calling function. This is both unclear and encourages
side-effects.

Invoke this function in a sub-shell instead.

Signed-off-by: Colin Stagner <ask+git@howdoi.land>
---
 contrib/subtree/git-subtree.sh | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index 17106d1a72..1cdf39a481 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -370,6 +370,10 @@ try_remove_previous () {
 }
 
 # Usage: process_subtree_split_trailer SPLIT_HASH MAIN_HASH [REPOSITORY]
+#
+# Parse SPLIT_HASH as a commit. If the commit is not found, fetches
+# REPOSITORY and tries again. If found, prints full commit hash.
+# Otherwise, dies.
 process_subtree_split_trailer () {
 	assert test $# -ge 2
 	assert test $# -le 3
@@ -397,6 +401,7 @@ process_subtree_split_trailer () {
 			die "$fail_msg"
 		fi
 	fi
+	echo "${sub}"
 }
 
 # Usage: find_latest_squash DIR [REPOSITORY]
@@ -429,7 +434,7 @@ find_latest_squash () {
 			main="$b"
 			;;
 		git-subtree-split:)
-			process_subtree_split_trailer "$b" "$sq" "$repository"
+			sub="$(process_subtree_split_trailer "$b" "$sq" "$repository")" || exit 1
 			;;
 		END)
 			if test -n "$sub"
@@ -486,7 +491,7 @@ find_existing_splits () {
 			main="$b"
 			;;
 		git-subtree-split:)
-			process_subtree_split_trailer "$b" "$sq" "$repository"
+			sub="$(process_subtree_split_trailer "$b" "$sq" "$repository")" || exit 1
 			;;
 		END)
 			debug "Main is: '$main'"
-- 
2.43.0



* [PATCH 2/3] contrib/subtree: functionalize split traversal
  2026-02-15 20:17 [PATCH 0/3] contrib/subtree: reduce recursion during split Colin Stagner
  2026-02-15 20:17 ` [PATCH 1/3] contrib/subtree: reduce function side-effects Colin Stagner
@ 2026-02-15 20:17 ` Colin Stagner
  2026-02-15 20:17 ` [PATCH 3/3] contrib/subtree: reduce recursion during split Colin Stagner
  2026-03-05 23:55 ` [PATCH v2 0/3] " Colin Stagner
  3 siblings, 0 replies; 22+ messages in thread
From: Colin Stagner @ 2026-02-15 20:17 UTC (permalink / raw)
  To: git, Christian Heusel, george
  Cc: Colin Stagner, Christian Hesse, Phillip Wood, Junio C Hamano

`git subtree split` requires an ancestor-first history traversal.
Refactor the existing rev-list traversal into its own function,
`find_commits_to_split`.

Pass unrevs via stdin to avoid limits on the maximum length of
command-line arguments. Also remove an unnecessary `eval`.
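
As a rough illustration of the stdin approach (not part of this patch), the
sketch below builds a throwaway repository and feeds one exclusion to
`git rev-list --stdin`. The repository layout, the `commit` helper, and the
variable names are assumptions made only for this example.

```shell
#!/bin/sh
# Build a throwaway repository with three empty commits: A <- B <- C.
repo="$(mktemp -d)" || exit 1
cd "$repo" || exit 1
git init -q .

# Hypothetical helper: create an empty commit with a fixed identity.
commit () {
	git -c user.name=demo -c user.email=demo@example.invalid \
		commit -q --allow-empty -m "$1"
}
commit A && a="$(git rev-parse HEAD)"
commit B && b="$(git rev-parse HEAD)"
commit C && c="$(git rev-parse HEAD)"

# Exclude A (and its ancestors) by writing "^<hash>" to stdin, the way
# an UNREVS entry would be passed; the starting revision stays on the
# command line.
listed="$(echo "^$a" |
	git rev-list --topo-order --reverse --parents --stdin "$c")"
count="$(echo "^$a" | git rev-list --count --stdin "$c")"

echo "$listed"
echo "count=$count"
```

Each output line is a commit hash followed by its parent hashes, oldest
commit first. `--count` prints only the number of commits that would be
listed.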

Signed-off-by: Colin Stagner <ask+git@howdoi.land>
---
 contrib/subtree/git-subtree.sh | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index 1cdf39a481..7a62ef7504 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -516,6 +516,31 @@ find_existing_splits () {
 	done || exit $?
 }
 
+# Usage: find_commits_to_split REV UNREVS [ARGS...]
+#
+# List each commit to split, with its parents.
+#
+# Specify the starting REV for the split, which is usually
+# a branch tip. Populate UNREVS with the last --rejoin for
+# this prefix, if any. Typically, `subtree split` ignores
+# history prior to the last --rejoin... unless and until it
+# becomes necessary to consider it. `find_existing_splits` is
+# a convenient source of UNREVS.
+#
+# Remaining arguments are passed to rev-list.
+#
+# Outputs commits in ancestor-first order, one per line, with
+# parent information. Outputs all parents before any child.
+find_commits_to_split() {
+	assert test $# -ge 2
+	rev="$1"
+	unrevs="$2"
+	shift 2
+
+	echo "$unrevs" |
+	git rev-list --topo-order --reverse --parents --stdin "$rev" "$@"
+}
+
 # Usage: copy_commit REV TREE FLAGS_STR
 copy_commit () {
 	assert test $# = 3
@@ -1003,12 +1028,11 @@ cmd_split () {
 	# We can't restrict rev-list to only $dir here, because some of our
 	# parents have the $dir contents the root, and those won't match.
 	# (and rev-list --follow doesn't seem to solve this)
-	grl='git rev-list --topo-order --reverse --parents $rev $unrevs'
-	revmax=$(eval "$grl" | wc -l)
+	revmax="$(find_commits_to_split "$rev" "$unrevs" --count)"
 	revcount=0
 	createcount=0
 	extracount=0
-	eval "$grl" |
+	find_commits_to_split "$rev" "$unrevs" |
 	while read rev parents
 	do
 		if should_ignore_subtree_split_commit "$rev"
-- 
2.43.0



* [PATCH 3/3] contrib/subtree: reduce recursion during split
  2026-02-15 20:17 [PATCH 0/3] contrib/subtree: reduce recursion during split Colin Stagner
  2026-02-15 20:17 ` [PATCH 1/3] contrib/subtree: reduce function side-effects Colin Stagner
  2026-02-15 20:17 ` [PATCH 2/3] contrib/subtree: functionalize split traversal Colin Stagner
@ 2026-02-15 20:17 ` Colin Stagner
  2026-03-05 23:55 ` [PATCH v2 0/3] " Colin Stagner
  3 siblings, 0 replies; 22+ messages in thread
From: Colin Stagner @ 2026-02-15 20:17 UTC (permalink / raw)
  To: git, Christian Heusel, george
  Cc: Colin Stagner, Christian Hesse, Phillip Wood, Junio C Hamano

On Debian-alikes, POSIX sh has a hardcoded recursion depth
of 1000. This limit operates like bash's `$FUNCNEST` [1], but
it does not actually respect `$FUNCNEST`. This is non-standard
behavior. On other distros, the sh recursion depth is limited
only by the available stack size.

With certain history graphs, subtree splits are recursive—with
one recursion per commit. Attempting to split complex repos that
have thousands of commits, like [2], may fail on these distros.

Reduce the amount of recursion required by eagerly discovering
the complete range of commits to process.

The recursion is a side-effect of the rejoin-finder in
`find_existing_splits`. Rejoin mode, as in

    git subtree split --rejoin -b hax main ...

improves the speed of later splits by merging the split history
back into `main`. This gives the splitting algorithm a stopping
point. The rejoin maps one commit on `main` to one split commit
on `hax`. If we encounter this commit, we know that it maps to
`hax`.

But this is only a single point in the history. Many splits
require history from before the rejoin. See patch content for
examples.

If pre-rejoin history is required, `check_parents` recursively
discovers each individual parent, with one recursion per commit.
The recursion deepens the entire tree, even if an older rejoin
is available. This quickly overwhelms the Debian sh stack.

Instead of recursively processing each commit, process *all* the
commits back to the next obvious starting point: i.e., either the
next-oldest --rejoin or the beginning of history. This is where the
recursion is likely to stop anyway.

While this still requires recursion, it is *considerably* less
recursive.
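
The difference in nesting can be sketched with a toy model. This is not
git-subtree code: it assumes a linear history in which commit N has the
single parent N-1, and every function and variable name here is made up for
the illustration.

```shell
#!/bin/sh
# Toy model, not git-subtree code: commit N has the single parent N-1,
# and $processed is the newest commit handled so far (-1 means none).
depth=0
max_depth=0
enter () {
	depth=$((depth + 1))
	test "$depth" -le "$max_depth" || max_depth=$depth
}
leave () {
	depth=$((depth - 1))
}

# Old shape: a missing parent is handled by recursing into it, so the
# nesting grows by one level per unprocessed ancestor.
process_recursive () {
	enter
	if test "$1" -gt 0 && test "$(($1 - 1))" -gt "$processed"
	then
		process_recursive "$(($1 - 1))"
	fi
	processed=$1
	leave
}

# New shape: walk every missing ancestor oldest-first in a loop; each
# call returns before the next one begins.
process_one () {
	enter
	processed=$1
	leave
}
process_batched () {
	enter
	i=$((processed + 1))
	while test "$i" -le "$1"
	do
		process_one "$i"
		i=$((i + 1))
	done
	leave
}

processed=-1; max_depth=0
process_recursive 200
recursive_depth=$max_depth

processed=-1; max_depth=0
process_batched 200
batched_depth=$max_depth

echo "recursive: $recursive_depth, batched: $batched_depth"
```

In this model the recursive shape nests once per missing commit, while the
batched shape stays at a constant depth however long the chain is. The chain
length of 200 is chosen to stay well under the 1000-call limit.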

[1]: https://www.gnu.org/software/bash/manual/html_node/Bash-Variables.html#index-FUNCNEST

[2]: https://github.com/christian-heusel/aur.git

Signed-off-by: Colin Stagner <ask+git@howdoi.land>
---
 contrib/subtree/git-subtree.sh | 56 ++++++++++++++++++++++++++++++++--
 1 file changed, 54 insertions(+), 2 deletions(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index 7a62ef7504..54d7151a50 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -312,6 +312,46 @@ cache_miss () {
 }
 
 # Usage: check_parents [REVS...]
+#
+# During a split, check that every commit in REVS has already been
+# processed via `process_split_commit`. If not, deepen the history
+# until it is.
+#
+# Commits authored by `subtree split` have to be created in the
+# same order as every other git commit: ancestor-first, with new
+# commits building on old commits. The traversal order normally
+# ensures this is the case, but it also excludes --rejoin commits
+# by default.
+#
+# The --rejoin tells us, "this mainline commit is equivalent to
+# this split commit." The relationship is only known for that
+# exact commit---and not before or after it. Frequently, commits
+# prior to a rejoin are not needed... but, just as often, they
+# are! Consider this history graph:
+#
+#              --D---
+#             /      \
+#         A--B--C--R--X--Y    main
+#                 /     /
+#          a--b--c     /      split
+#              \      /
+#               --e--/
+#
+# The main branch has commits A, B, and C. main is split into
+# commits a, b, and c. The split history is rejoined at R.
+#
+# There are at least two cases where we might need the A-B-C
+# history that is prior to R:
+#
+# 1. Commit D is based on history prior to R, but
+#    it isn't merged into mainline until after R.
+#
+# 2. Commit e is based on old split history. It is merged
+#    back into mainline with a subtree merge. Again, this
+#    happens after R.
+#
+# check_parents detects these cases and deepens the history
+# to the next available rejoin.
 check_parents () {
 	missed=$(cache_miss "$@") || exit $?
 	local indent=$(($indent + 1))
@@ -319,8 +359,20 @@ check_parents () {
 	do
 		if ! test -r "$cachedir/notree/$miss"
 		then
-			debug "incorrect order: $miss"
-			process_split_commit "$miss" ""
+			debug "found commit excluded by --rejoin: $miss. skipping to the next --rejoin..."
+			unrevs="$(find_existing_splits "$dir" "$miss" "$repository")" || exit 1
+
+			find_commits_to_split "$miss" "$unrevs" |
+			while read -r rev parents
+			do
+				process_split_commit "$rev" "$parents"
+			done
+
+			if ! test -r "$cachedir/$miss" &&
+				! test -r "$cachedir/notree/$miss"
+			then
+				die "failed to deepen history at $miss"
+			fi
 		fi
 	done
 }
-- 
2.43.0



* [PATCH v2 0/3] contrib/subtree: reduce recursion during split
  2026-02-15 20:17 [PATCH 0/3] contrib/subtree: reduce recursion during split Colin Stagner
                   ` (2 preceding siblings ...)
  2026-02-15 20:17 ` [PATCH 3/3] contrib/subtree: reduce recursion during split Colin Stagner
@ 2026-03-05 23:55 ` Colin Stagner
  2026-03-05 23:55   ` [PATCH v2 1/3] contrib/subtree: reduce function side-effects Colin Stagner
                     ` (4 more replies)
  3 siblings, 5 replies; 22+ messages in thread
From: Colin Stagner @ 2026-03-05 23:55 UTC (permalink / raw)
  To: git, Christian Heusel, george
  Cc: Christian Hesse, Phillip Wood, Junio C Hamano, Colin Stagner

* cs/subtree-split-recursion: when processing large history
  graphs on Debian or Ubuntu, "git subtree" can die with a
  "recursion depth reached" error. Reduce recursion.

On Debian's POSIX sh, shell recursion is artificially limited
to 1000 calls. You can check if your sh has limited recursion
with:

    #!/bin/sh
    recurse() {
        r=$(( r + 1 ))
        test "$r" -le 1000 || { echo OK; exit; }
        recurse
    } && r=0 && recurse

Depending on the history graph, subtree split can recurse deeply
enough to encounter this limit. Rewrite the rejoin-deepening
algorithm to reduce recursive calls.

---
Changes in v2:
- Rebase on master

---
Colin Stagner (3):
      contrib/subtree: reduce function side-effects
      contrib/subtree: functionalize split traversal
      contrib/subtree: reduce recursion during split

 contrib/subtree/git-subtree.sh | 95 ++++++++++++++++++++++++++++++++++++++----
 1 file changed, 88 insertions(+), 7 deletions(-)
---
base-commit: 628a66ccf68d141d57d06e100c3514a54b31d6b7
change-id: 20260304-cs-subtree-split-recursion-6a7083cf9163

Best regards,
--  
Colin Stagner <ask+git@howdoi.land>



* [PATCH v2 1/3] contrib/subtree: reduce function side-effects
  2026-03-05 23:55 ` [PATCH v2 0/3] " Colin Stagner
@ 2026-03-05 23:55   ` Colin Stagner
  2026-03-05 23:55   ` [PATCH v2 2/3] contrib/subtree: functionalize split traversal Colin Stagner
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Colin Stagner @ 2026-03-05 23:55 UTC (permalink / raw)
  To: git, Christian Heusel, george
  Cc: Christian Hesse, Phillip Wood, Junio C Hamano, Colin Stagner

`process_subtree_split_trailer()` communicates its return value
to the caller by setting a variable (`sub`) that is also defined
by the calling function. This is both unclear and encourages
side-effects.

Invoke this function in a sub-shell instead.

Signed-off-by: Colin Stagner <ask+git@howdoi.land>
---
 contrib/subtree/git-subtree.sh | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index 791fd8260c..bae5d9170b 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -373,6 +373,10 @@ try_remove_previous () {
 }
 
 # Usage: process_subtree_split_trailer SPLIT_HASH MAIN_HASH [REPOSITORY]
+#
+# Parse SPLIT_HASH as a commit. If the commit is not found, fetches
+# REPOSITORY and tries again. If found, prints full commit hash.
+# Otherwise, dies.
 process_subtree_split_trailer () {
 	assert test $# -ge 2
 	assert test $# -le 3
@@ -400,6 +404,7 @@ process_subtree_split_trailer () {
 			die "$fail_msg"
 		fi
 	fi
+	echo "${sub}"
 }
 
 # Usage: find_latest_squash DIR [REPOSITORY]
@@ -432,7 +437,7 @@ find_latest_squash () {
 			main="$b"
 			;;
 		git-subtree-split:)
-			process_subtree_split_trailer "$b" "$sq" "$repository"
+			sub="$(process_subtree_split_trailer "$b" "$sq" "$repository")" || exit 1
 			;;
 		END)
 			if test -n "$sub"
@@ -489,7 +494,7 @@ find_existing_splits () {
 			main="$b"
 			;;
 		git-subtree-split:)
-			process_subtree_split_trailer "$b" "$sq" "$repository"
+			sub="$(process_subtree_split_trailer "$b" "$sq" "$repository")" || exit 1
 			;;
 		END)
 			debug "Main is: '$main'"

-- 
2.43.0



* [PATCH v2 2/3] contrib/subtree: functionalize split traversal
  2026-03-05 23:55 ` [PATCH v2 0/3] " Colin Stagner
  2026-03-05 23:55   ` [PATCH v2 1/3] contrib/subtree: reduce function side-effects Colin Stagner
@ 2026-03-05 23:55   ` Colin Stagner
  2026-03-05 23:55   ` [PATCH v2 3/3] contrib/subtree: reduce recursion during split Colin Stagner
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Colin Stagner @ 2026-03-05 23:55 UTC (permalink / raw)
  To: git, Christian Heusel, george
  Cc: Christian Hesse, Phillip Wood, Junio C Hamano, Colin Stagner

`git subtree split` requires an ancestor-first history traversal.
Refactor the existing rev-list traversal into its own function,
`find_commits_to_split`.

Pass unrevs via stdin to avoid limits on the maximum length of
command-line arguments. Also remove an unnecessary `eval`.

Signed-off-by: Colin Stagner <ask+git@howdoi.land>
---
 contrib/subtree/git-subtree.sh | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index bae5d9170b..c1756b3e74 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -519,6 +519,31 @@ find_existing_splits () {
 	done || exit $?
 }
 
+# Usage: find_commits_to_split REV UNREVS [ARGS...]
+#
+# List each commit to split, with its parents.
+#
+# Specify the starting REV for the split, which is usually
+# a branch tip. Populate UNREVS with the last --rejoin for
+# this prefix, if any. Typically, `subtree split` ignores
+# history prior to the last --rejoin... unless and until it
+# becomes necessary to consider it. `find_existing_splits` is
+# a convenient source of UNREVS.
+#
+# Remaining arguments are passed to rev-list.
+#
+# Outputs commits in ancestor-first order, one per line, with
+# parent information. Outputs all parents before any child.
+find_commits_to_split() {
+	assert test $# -ge 2
+	rev="$1"
+	unrevs="$2"
+	shift 2
+
+	echo "$unrevs" |
+	git rev-list --topo-order --reverse --parents --stdin "$rev" "$@"
+}
+
 # Usage: copy_commit REV TREE FLAGS_STR
 copy_commit () {
 	assert test $# = 3
@@ -976,12 +1001,11 @@ cmd_split () {
 	# We can't restrict rev-list to only $dir here, because some of our
 	# parents have the $dir contents the root, and those won't match.
 	# (and rev-list --follow doesn't seem to solve this)
-	grl='git rev-list --topo-order --reverse --parents $rev $unrevs'
-	revmax=$(eval "$grl" | wc -l)
+	revmax="$(find_commits_to_split "$rev" "$unrevs" --count)"
 	revcount=0
 	createcount=0
 	extracount=0
-	eval "$grl" |
+	find_commits_to_split "$rev" "$unrevs" |
 	while read rev parents
 	do
 		process_split_commit "$rev" "$parents"

-- 
2.43.0



* [PATCH v2 3/3] contrib/subtree: reduce recursion during split
  2026-03-05 23:55 ` [PATCH v2 0/3] " Colin Stagner
  2026-03-05 23:55   ` [PATCH v2 1/3] contrib/subtree: reduce function side-effects Colin Stagner
  2026-03-05 23:55   ` [PATCH v2 2/3] contrib/subtree: functionalize split traversal Colin Stagner
@ 2026-03-05 23:55   ` Colin Stagner
  2026-03-13 22:51   ` [PATCH v2 0/3] " Junio C Hamano
  2026-04-16 13:25   ` Ian Jackson
  4 siblings, 0 replies; 22+ messages in thread
From: Colin Stagner @ 2026-03-05 23:55 UTC (permalink / raw)
  To: git, Christian Heusel, george
  Cc: Christian Hesse, Phillip Wood, Junio C Hamano, Colin Stagner

On Debian-alikes, POSIX sh has a hardcoded recursion depth
of 1000. This limit operates like bash's `$FUNCNEST` [1], but
it does not actually respect `$FUNCNEST`. This is non-standard
behavior. On other distros, the sh recursion depth is limited
only by the available stack size.

With certain history graphs, subtree splits are recursive—with
one recursion per commit. Attempting to split complex repos that
have thousands of commits, like [2], may fail on these distros.

Reduce the amount of recursion required by eagerly discovering
the complete range of commits to process.

The recursion is a side-effect of the rejoin-finder in
`find_existing_splits`. Rejoin mode, as in

    git subtree split --rejoin -b hax main ...

improves the speed of later splits by merging the split history
back into `main`. This gives the splitting algorithm a stopping
point. The rejoin maps one commit on `main` to one split commit
on `hax`. If we encounter this commit, we know that it maps to
`hax`.

But this is only a single point in the history. Many splits
require history from before the rejoin. See patch content for
examples.

If pre-rejoin history is required, `check_parents` recursively
discovers each individual parent, with one recursion per commit.
The recursion deepens the entire tree, even if an older rejoin
is available. This quickly overwhelms the Debian sh stack.

Instead of recursively processing each commit, process *all* the
commits back to the next obvious starting point: i.e., either the
next-oldest --rejoin or the beginning of history. This is where the
recursion is likely to stop anyway.

While this still requires recursion, it is *considerably* less
recursive.

[1]: https://www.gnu.org/software/bash/manual/html_node/Bash-Variables.html#index-FUNCNEST

[2]: https://github.com/christian-heusel/aur.git

Signed-off-by: Colin Stagner <ask+git@howdoi.land>
---
 contrib/subtree/git-subtree.sh | 56 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 54 insertions(+), 2 deletions(-)

diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index c1756b3e74..c649a9e393 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -315,6 +315,46 @@ cache_miss () {
 }
 
 # Usage: check_parents [REVS...]
+#
+# During a split, check that every commit in REVS has already been
+# processed via `process_split_commit`. If not, deepen the history
+# until it is.
+#
+# Commits authored by `subtree split` have to be created in the
+# same order as every other git commit: ancestor-first, with new
+# commits building on old commits. The traversal order normally
+# ensures this is the case, but it also excludes --rejoin commits
+# by default.
+#
+# The --rejoin tells us, "this mainline commit is equivalent to
+# this split commit." The relationship is only known for that
+# exact commit---and not before or after it. Frequently, commits
+# prior to a rejoin are not needed... but, just as often, they
+# are! Consider this history graph:
+#
+#              --D---
+#             /      \
+#         A--B--C--R--X--Y    main
+#                 /     /
+#          a--b--c     /      split
+#              \      /
+#               --e--/
+#
+# The main branch has commits A, B, and C. main is split into
+# commits a, b, and c. The split history is rejoined at R.
+#
+# There are at least two cases where we might need the A-B-C
+# history that is prior to R:
+#
+# 1. Commit D is based on history prior to R, but
+#    it isn't merged into mainline until after R.
+#
+# 2. Commit e is based on old split history. It is merged
+#    back into mainline with a subtree merge. Again, this
+#    happens after R.
+#
+# check_parents detects these cases and deepens the history
+# to the next available rejoin.
 check_parents () {
 	missed=$(cache_miss "$@") || exit $?
 	local indent=$(($indent + 1))
@@ -322,8 +362,20 @@ check_parents () {
 	do
 		if ! test -r "$cachedir/notree/$miss"
 		then
-			debug "incorrect order: $miss"
-			process_split_commit "$miss" ""
+			debug "found commit excluded by --rejoin: $miss. skipping to the next --rejoin..."
+			unrevs="$(find_existing_splits "$dir" "$miss" "$repository")" || exit 1
+
+			find_commits_to_split "$miss" "$unrevs" |
+			while read -r rev parents
+			do
+				process_split_commit "$rev" "$parents"
+			done
+
+			if ! test -r "$cachedir/$miss" &&
+				! test -r "$cachedir/notree/$miss"
+			then
+				die "failed to deepen history at $miss"
+			fi
 		fi
 	done
 }

-- 
2.43.0



* Re: [PATCH v2 0/3] contrib/subtree: reduce recursion during split
  2026-03-05 23:55 ` [PATCH v2 0/3] " Colin Stagner
                     ` (2 preceding siblings ...)
  2026-03-05 23:55   ` [PATCH v2 3/3] contrib/subtree: reduce recursion during split Colin Stagner
@ 2026-03-13 22:51   ` Junio C Hamano
  2026-03-13 23:06     ` Junio C Hamano
  2026-04-16 13:25   ` Ian Jackson
  4 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2026-03-13 22:51 UTC (permalink / raw)
  To: Christian Heusel, george, Christian Hesse, Phillip Wood
  Cc: git, Colin Stagner

Colin Stagner <ask+git@howdoi.land> writes:

> * cs/subtree-split-recursion: when processing large history
>   graphs on Debian or Ubuntu, "git subtree" can die with a
>   "recursion depth reached" error. Reduce recursion.
>
> On Debian's POSIX sh, shell recursion is artificially limited
> to 1000 calls. You can check if your sh has limited recursion
> with:
>
>     #!/bin/sh
>     recurse() {
>         r=$(( r + 1 ))
>         test "$r" -le 1000 || { echo OK; exit; }
>         recurse
>     } && r=0 && recurse
>
> Depending on the history graph, subtree split can recurse deeply
> enough to encounter this limit. Rewrite the rejoin-deepening
> algorithm to reduce recursive calls.
>
> ---
> Changes in v2:
> - Rebase on master

We have seen two iterations of this series without anybody
commenting on it.  Is it a sign that the topic, or possibly "git
subtree" itself, is of interest to nobody?  Or is it that it is so
well done that nobody had any comment on it?

I don't use "git subtree" myself, and I do not know of anybody who
will scream at me if I break it by merging an unreviewed patch, so I
can merge it without worrying too much about fallout personally, but
that is a tad irresponsible as the maintainer ;-)

So...?  Any volunteers among those who have a higher stake in the
program than I do (which admittedly is not a high bar to cross)?

Thanks.



* Re: [PATCH v2 0/3] contrib/subtree: reduce recursion during split
  2026-03-13 22:51   ` [PATCH v2 0/3] " Junio C Hamano
@ 2026-03-13 23:06     ` Junio C Hamano
  2026-04-15 17:58       ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2026-03-13 23:06 UTC (permalink / raw)
  To: Colin Stagner
  Cc: Christian Heusel, george, Christian Hesse, Phillip Wood, git

Junio C Hamano <gitster@pobox.com> writes:

> Colin Stagner <ask+git@howdoi.land> writes:
>
>> * cs/subtree-split-recursion: when processing large history
>>   graphs on Debian or Ubuntu, "git subtree" can die with a
>>   "recursion depth reached" error. Reduce recursion.
>>
>> On Debian's POSIX sh, shell recursion is artificially limited
>> to 1000 calls. You can check if your sh has limited recursion
>> with:
>>
>>     #!/bin/sh
>>     recurse() {
>>         r=$(( r + 1 ))
>>         test "$r" -le 1000 || { echo OK; exit; }
>>         recurse
>>     } && r=0 && recurse
>>
>> Depending on the history graph, subtree split can recurse deeply
>> enough to encounter this limit. Rewrite the rejoin-deepening
>> algorithm to reduce recursive calls.
>>
>> ---
>> Changes in v2:
>> - Rebase on master
>
> We have seen two iterations of this series without anybody
> commenting on it.  Is it a sign that the topic, or possibly "git
> subtree" itself, is of interest to nobody?  Or is it that it is so
> well done that nobody had any comment on it?
>
> I don't use "git subtree" myself, and I do not know of anybody who
> will scream at me if I break it by merging an unreviewed patch, so I
> can merge it without worrying too much about fallout personally, but
> that is a tad irresponsible as the maintainer ;-)
>
> So...?  Any volunteers among those who have a higher stake in the
> program than I do (which admittedly is not a high bar to cross)?

FWIW, I can see that [1/3] is a benign clean-up that should not
change any semantics.  [2/3] talks about the variable $sub, which is
used elsewhere, is not protected from getting overwritten by running
the function inside a subprocess, but I do not know if updates to
other variables (like $b, $sq, $repository, but not $fail_msg,
$hint1 and $hint2 that are used only in this function) want to be
seen after the calls to this function outside (and do not want to
find out myself---I'd rather want to see somebody else with stakes
in "git subtree" to verify), but otherwise the change looks benign
to me.  I have no idea if what [3/3] does is sensible or not (and
again, I'd rather want to see somebody with stakes to double check).
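
For anyone checking the variable-visibility question, a minimal sketch of the
general shell behavior follows. It does not use the real function:
`resolve_trailer`, `scratch`, and `failing` are hypothetical stand-ins, and
the only claim is about how command substitution treats assignments and exit
status.

```shell
#!/bin/sh
# Hypothetical stand-in for process_subtree_split_trailer: it assigns
# to $sub and to a scratch variable, then prints the result on stdout.
resolve_trailer () {
	scratch="set inside the function"
	sub="full-hash-of-$1"
	echo "$sub"
}

sub=""
scratch="set by the caller"

# Command substitution runs the function in a subshell, so only its
# stdout and exit status reach the caller; assignments are discarded.
result="$(resolve_trailer abc123)" || exit 1

echo "result=$result"
echo "sub=$sub"
echo "scratch=$scratch"

# An exit inside the subshell ends only the subshell. The caller has to
# check the status itself, which is the role of "|| exit 1" in the patch.
failing () { exit 3; }
status=0
out="$(failing)" || status=$?
echo "status=$status"
```

So any variable the function used to update for its callers, other than the
value it now prints, would stop being visible after the change. Whether any
caller relies on that is exactly what still needs review.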

Thanks.


* Re: [PATCH v2 0/3] contrib/subtree: reduce recursion during split
  2026-03-13 23:06     ` Junio C Hamano
@ 2026-04-15 17:58       ` Junio C Hamano
  2026-04-15 21:39         ` Ben Knoble
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2026-04-15 17:58 UTC (permalink / raw)
  To: git; +Cc: Colin Stagner, Christian Heusel, george, Christian Hesse,
	Phillip Wood

Junio C Hamano <gitster@pobox.com> writes:

>>> Depending on the history graph, subtree split can recurse deeply
>>> enough to encounter this limit. Rewrite the rejoin-deepening
>>> algorithm to reduce recursive calls.
>>>
>>> ---
>>> Changes in v2:
>>> - Rebase on master
>>
>> We have seen two iterations of this series without anybody
>> commenting on it.  Is it a sign that the topic, or possibly "git
>> subtree" itself, is of interest to nobody?  Or is it that it is so
>> well done that nobody had any comment on it?
>>
>> I don't use "git subtree" myself, and I do not know of anybody who
>> will scream at me if I break it by merging an unreviewed patch, so I
>> can merge it without worrying too much about fallout personally, but
>> that is a tad irresponsible as the maintainer ;-)
>>
>> So...?  Any volunteers among those who have a higher stake in the
>> program than I do (which admittedly is not a high bar to cross)?
>
> FWIW, I can see that [1/3] is a benign clean-up that should not
> change any semantics.  [2/3] talks about the variable $sub, which is
> used elsewhere, is not protected ...
> ... in "git subtree" to verify), but otherwise the change looks benign
> to me.  I have no idea if what [3/3] does is sensible or not (and
> again, I'd rather want to see somebody with stakes to double check).

So, yet not any volunteers?


* Re: [PATCH v2 0/3] contrib/subtree: reduce recursion during split
  2026-04-15 17:58       ` Junio C Hamano
@ 2026-04-15 21:39         ` Ben Knoble
  0 siblings, 0 replies; 22+ messages in thread
From: Ben Knoble @ 2026-04-15 21:39 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Colin Stagner, Christian Heusel, george, Christian Hesse,
	Phillip Wood


> On 15 Apr 2026, at 13:58, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Junio C Hamano <gitster@pobox.com> writes:
> 
>>>> Depending on the history graph, subtree split can recurse deeply
>>>> enough to encounter this limit. Rewrite the rejoin-deepening
>>>> algorithm to reduce recursive calls.
>>>> 
>>>> ---
>>>> Changes in v2:
>>>> - Rebase on master
>>> 
>>> We have seen two iterations of this series without anybody
>>> commenting on it.  Is it a sign that the topic, or possibly "git
>>> subtree" itself, is of interest to nobody?  Or is it that it is so
>>> well done that nobody had any comment on it?
>>> 
>>> I don't use "git subtree" myself, and I do not know of anybody who
>>> will scream at me if I break it by merging an unreviewed patch, so I
>>> can merge it without worrying too much about fallout personally, but
>>> that is a tad irresponsible as the maintainer ;-)
>>> 
>>> So...?  Any volunteers among those who have a higher stake in the
>>> program than I do (which admittedly is not a high bar to cross)?
>> 
>> FWIW, I can see that [1/3] is a benign clean-up that should not
>> change any semantics.  [2/3] talks about the variable $sub, which is
>> used elsewhere, is not protected ...
>> ... in "git subtree" to verify), but otherwise the change looks benign
>> to me.  I have no idea if what [3/3] does is sensible or not (and
>> again, I'd rather want to see somebody with stakes to double check).
> 
> So, still no volunteers?

I have shared this with some folks who I thought would have a stake in
the matter, but the dearth of replies is evident :)

* Re: [PATCH v2 0/3] contrib/subtree: reduce recursion during split
  2026-03-05 23:55 ` [PATCH v2 0/3] " Colin Stagner
                     ` (3 preceding siblings ...)
  2026-03-13 22:51   ` [PATCH v2 0/3] " Junio C Hamano
@ 2026-04-16 13:25   ` Ian Jackson
  2026-04-16 19:34     ` Junio C Hamano
  2026-04-17  4:50     ` Colin Stagner
  4 siblings, 2 replies; 22+ messages in thread
From: Ian Jackson @ 2026-04-16 13:25 UTC (permalink / raw)
  To: Colin Stagner
  Cc: git, Christian Heusel, george, Christian Hesse, Phillip Wood,
	Junio C Hamano

Colin Stagner writes ("[PATCH v2 0/3] contrib/subtree: reduce recursion during split"):
> On Debian's POSIX sh, shell recursion is artificially limited
> to 1000 calls. You can check if your sh has limited recursion
> with:

FTR Debian supports multiple options for /bin/sh.  The shell in
question, with the limit that's troubling us, is dash.

> Depending on the history graph, subtree split can recurse deeply
> enough to encounter this limit. Rewrite the rejoin-deepening
> algorithm to reduce recursive calls.

Hi.  I'm a git-subtree user and indeed I was the one who reported the
bug Colin is trying to fix.  I would be happy to do a code review of
these changes.

However, before I get stuck into that, which seems like it will
involve some serious staring at shell code, I'd like to ask what seems
like a logically prior question:

Why not run the script under bash in non-POSIX mode instead?  I think
that would sidestep the problem.  If you don't want this program to
always depend on bash, you could have a little snippet at the top to
re-exec with bash if (1) it's available (2) we don't seem to be
running under bash already.  (Presumably the Debian package of git
would need to Recommend bash then.)
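
A minimal sketch of the re-exec idea described above. This is not code
from git-subtree.sh: the function names and the SUBTREE_REEXEC guard
variable are hypothetical, chosen only for illustration. It assumes that
bash sets BASH_VERSION and that running the script as `bash "$0"` would
avoid dash's recursion limit.

```shell
#!/bin/sh
# Hypothetical sketch: decide whether to re-exec under bash.
# None of these names exist in git-subtree.sh.

# Succeeds if the current interpreter looks like bash
# (bash sets BASH_VERSION; dash does not).
running_under_bash() {
	test -n "${BASH_VERSION-}"
}

# Succeeds if a re-exec is worthwhile: we are not under bash, we have
# not already tried once (the guard prevents an exec loop), and a bash
# binary can be found on PATH.
should_reexec_bash() {
	! running_under_bash &&
	test -z "${SUBTREE_REEXEC-}" &&
	command -v bash >/dev/null 2>&1
}

# At the very top of the script one might then write (left commented
# out so this sketch only defines functions):
#
#	if should_reexec_bash
#	then
#		SUBTREE_REEXEC=1
#		export SUBTREE_REEXEC
#		exec bash "$0" "$@"
#	fi
```

The guard variable is there so that a bash which somehow fails the
BASH_VERSION check cannot cause an endless chain of re-execs.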

TBH I was quite surprised, when I reported this bug some time ago, to
find that git-subtree was written in shell.  If it had been me I would
probably have used Rust and libgit2.

Ian.

-- 
Ian Jackson <ijackson@chiark.greenend.org.uk>   These opinions are my own.  

Pronouns: they/he.  If I emailed you from @fyvzl.net or @evade.org.uk,
that is a private address which bypasses my fierce spamfilter.

* Re: [PATCH v2 0/3] contrib/subtree: reduce recursion during split
  2026-04-16 13:25   ` Ian Jackson
@ 2026-04-16 19:34     ` Junio C Hamano
  2026-04-17  4:50     ` Colin Stagner
  1 sibling, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2026-04-16 19:34 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Colin Stagner, git, Christian Heusel, george, Christian Hesse,
	Phillip Wood

Ian Jackson <ijackson@chiark.greenend.org.uk> writes:

> TBH I was quite surprised, when I reported this bug some time ago, to
> find that git-subtree was written in shell.  If it had been me I would
> probably have used Rust and libgit2.

FWIW, it is my impression that on this mailing list, "git subtree"
is treated as more or less abandonware.  Patches to it often do not
attract any reviewers: neither the original author nor those who
have subsequently touched it.

If you want to take it over and rewrite it with firmer commitment to
maintain it better (which unfortunately is not a high bar), that may
be appreciated by its users.

* Re: [PATCH v2 0/3] contrib/subtree: reduce recursion during split
  2026-04-16 13:25   ` Ian Jackson
  2026-04-16 19:34     ` Junio C Hamano
@ 2026-04-17  4:50     ` Colin Stagner
  2026-04-19 19:55       ` Ian Jackson
  1 sibling, 1 reply; 22+ messages in thread
From: Colin Stagner @ 2026-04-17  4:50 UTC (permalink / raw)
  To: Ian Jackson
  Cc: git, Christian Heusel, george, Christian Hesse, Phillip Wood,
	Junio C Hamano

On 4/16/26 08:25, Ian Jackson wrote:

> FTR Debian supports multiple options for /bin/sh.  The shell in
> question, with the limit that's troubling us, is dash.

Correct, I experience this behavior in dash.

> Why not run the script under bash in non-POSIX mode instead?  I think
> that would sidestep the problem. 

Our coding guidelines favor POSIX constructs over non-POSIX constructs, 
including for shell scripts [1]. POSIX helps us stay portable.

I'm not convinced that adding more shell interpreters to the mix would 
be a net win in terms of stability or consistency. This patch series 
addresses issues that arise from different implementations of sh. Adding 
bash vs sh to the mix will probably just make more bugs.


> If it had been me I would probably have used Rust and libgit2.

git-subtree has been around since 2009, so you would have first needed 
to invent Rust. :-) That said, a native Rust version of 
git-subtree-split would be much faster and easier to read.


Thanks for looking at this,

Colin

[1]: https://git-scm.com/docs/CodingGuidelines


* Re: [PATCH v2 0/3] contrib/subtree: reduce recursion during split
  2026-04-17  4:50     ` Colin Stagner
@ 2026-04-19 19:55       ` Ian Jackson
  2026-04-20  1:09         ` Ben Knoble
                           ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Ian Jackson @ 2026-04-19 19:55 UTC (permalink / raw)
  To: Colin Stagner
  Cc: git, Christian Heusel, george, Christian Hesse, Phillip Wood,
	Junio C Hamano

Colin Stagner writes ("Re: [PATCH v2 0/3] contrib/subtree: reduce recursion during split"):
> That said, a native Rust version of 
> git-subtree-split would be much faster and easier to read.

I prototyped something along the lines of the algorithm I described
earlier.  It is very fast, as expected.

The output looks plausible when I look at it by eye, but there are
some things that I need to look at more closely.  I should think some
more about invariants and tests.

Overall, I think this is worth pursuing.


Algorithm

I don't think it is going to be possible to precisely reproduce the
output of the existing git-subtree split.  Indeed the existing
git-subtree split is a bit cavalier with metadata (eg `committer` [1])
which probably ought to be changed in any case.

Even so, it should be possible to avoid foolishly rewriting the whole
history of the subtree, since we can stop at all the merges made by
"git-subtree merge", which are easily detectable by the extra metadata
keyword fields in the commit message.


Packaging

Before I go much further, how do we think this would best be packaged?
Currently my experiment is a standalone Rust package using
dependencies ("crates" as Rust calls them) from current Debian stable
("trixie"). [2]  I haven't tried it with recent deps from upstream
crates.io.  There is not currently any entanglement with git.git; the
repository is accessed using libgit2 via Rust's git2 wrapper (and there
are no tests yet).

I'm tempted to continue this way and rewrite the other git-subtree
subcommands too, since they don't look that hard.  Using git.git
offers some packaging and testing continuity but the dependency
situation might become annoying.

It will probably be possible to make a Rust package which will build
with both recent upstream dependencies, and (say) Debian stable.
Going back much more than that is going to be awkward.

I see there's already some Rust in git.git:contrib/libgit-rs but that
looks like a proof of concept.

Regards,
Ian.

[1] I don't think it's justifiable to convert a commit from the
downstream, into the subtree split version, and retain the original
committer line.  That can violate many people's expectations.
Here's an example from another context:
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1124226

That means we need to use a dummy committer in split commits, and put
the original committer into the message.  We should name the original
downstream commit in the commit message too.

The dummy committer needs to be a fixed string: changing it would
cause history proliferation (maybe even leading to unnecessary merge
conflicts).
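
A rough sketch of what footnote [1] proposes, under stated assumptions:
the dummy identity and the two trailer names below are hypothetical
placeholders, not something git-subtree does today. The function only
builds the message text; the commented usage relies on the standard
GIT_COMMITTER_NAME and GIT_COMMITTER_EMAIL environment variables and
`git commit-tree -F`.

```shell
#!/bin/sh
# Hypothetical sketch: a fixed dummy committer plus a message that
# records where the split commit came from.  All names are illustrative.
DUMMY_COMMITTER_NAME='git-subtree split'
DUMMY_COMMITTER_EMAIL='git-subtree@invalid'

# Print the message for a split commit: the original message, a blank
# line, then the original commit ID and the original committer.
#   $1 = original commit message
#   $2 = original (downstream) commit ID
#   $3 = original committer, e.g. "Name <email>"
split_commit_message() {
	printf '%s\n\n' "$1"
	printf 'Original-commit: %s\n' "$2"
	printf 'Original-committer: %s\n' "$3"
}

# Hypothetical usage in a repository (not run here); $tree and $parent
# would come from the split traversal:
#
#	split_commit_message "$msg" "$rev" "$committer" >msgfile
#	GIT_COMMITTER_NAME=$DUMMY_COMMITTER_NAME \
#	GIT_COMMITTER_EMAIL=$DUMMY_COMMITTER_EMAIL \
#	git commit-tree "$tree" -p "$parent" -F msgfile
```

Because the committer identity is constant, repeating the split over
the same input would not yield different commits on account of who ran
it; the committer date would presumably need to be made deterministic
as well, which this sketch does not address.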

[2] I wrote a blog post

   How to use Rust on Debian (and Ubuntu, etc.)
   https://diziet.dreamwidth.org/18122.html

which explains why this is a good approach.

-- 
Ian Jackson <ijackson@chiark.greenend.org.uk>   These opinions are my own.  

Pronouns: they/he.  If I emailed you from @fyvzl.net or @evade.org.uk,
that is a private address which bypasses my fierce spamfilter.

* Re: [PATCH v2 0/3] contrib/subtree: reduce recursion during split
  2026-04-19 19:55       ` Ian Jackson
@ 2026-04-20  1:09         ` Ben Knoble
  2026-04-20  1:50         ` Junio C Hamano
  2026-04-20  9:57         ` Ian Jackson
  2 siblings, 0 replies; 22+ messages in thread
From: Ben Knoble @ 2026-04-20  1:09 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Colin Stagner, git, Christian Heusel, george, Christian Hesse,
	Phillip Wood, Junio C Hamano, brian m. carlson

I didn’t see bmc on cc, so added. I think they have some thoughts on how to organize Rust code with Git. Might be relevant, even though this is contrib/

> 
> On 19 Apr 2026, at 15:56, Ian Jackson <ijackson@chiark.greenend.org.uk> wrote:
> 
> Colin Stagner writes ("Re: [PATCH v2 0/3] contrib/subtree: reduce recursion during split"):
>> That said, a native Rust version of
>> git-subtree-split would be much faster and easier to read.
> 
> I prototyped something along the lines of the algorithm I described
> earlier.  It is very fast, as expected.
> 
> The output looks plausible when I look at it by eye, but there are
> some things that I need to look at more closely.  I should think some
> more about invariants and tests.
> 
> Overall, I think this is worth pursuing.
> 
> 
> Algorithm
> 
> I don't think it is going to be possible to precisely reproduce the
> output of the existing git-subtree split.  Indeed the existing
> git-subtree split is a bit cavalier with metadata (eg `committer` [1])
> which probably ought to be changed in any case.
> 
> Even so, it should be possible to avoid foolishly rewriting the whole
> history of the subtree, since we can stop at all the merges made by
> "git-subtree merge", which are easily detectable by the extra metadata
> keyword fields in the commit message.
> 
> 
> Packaging
> 
> Before I go much further, how do we think this would best be packaged?
> Currently my experiment is a standalone Rust package using
> dependencies ("crates" as Rust calls them) from current Debian stable
> ("trixie"). [2]  I haven't tried it with recent deps from upstream
> crates.io.  There is not currently any entanglement with git.git; the
> repository is accessed using libgit2 via Rust's git2 wrapper (and there
> are no tests yet).
> 
> I'm tempted to continue this way and rewrite the other git-subtree
> subcommands too, since they don't look that hard.  Using git.git
> offers some packaging and testing continuity but the dependency
> situation might become annoying.
> 
> It will probably be possible to make a Rust package which will build
> with both recent upstream dependencies, and (say) Debian stable.
> Going back much more than that is going to be awkward.
> 
> I see there's already some Rust in git.git:contrib/libgit-rs but that
> looks like a proof of concept.
> 
> Regards,
> Ian.
> 
> [1] I don't think it's justifiable to convert a commit from the
> downstream, into the subtree split version, and retain the original
> committer line.  That can violate many people's expectations.
> Here's an example from another context:
>  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1124226
> 
> That means we need to use a dummy committer in split commits, and put
> the original committer into the message.  We should name the original
> downstream commit in the commit message too.
> 
> The dummy committer needs to be a fixed string: changing it would
> cause history proliferation (maybe even leading to unnecessary merge
> conflicts).
> 
> [2] I wrote a blog post
> 
>   How to use Rust on Debian (and Ubuntu, etc.)
>   https://diziet.dreamwidth.org/18122.html
> 
> which explains why this is a good approach.
> 
> --
> Ian Jackson <ijackson@chiark.greenend.org.uk>   These opinions are my own.  
> 
> Pronouns: they/he.  If I emailed you from @fyvzl.net or @evade.org.uk,
> that is a private address which bypasses my fierce spamfilter.
> 

* Re: [PATCH v2 0/3] contrib/subtree: reduce recursion during split
  2026-04-19 19:55       ` Ian Jackson
  2026-04-20  1:09         ` Ben Knoble
@ 2026-04-20  1:50         ` Junio C Hamano
  2026-04-20  9:57         ` Ian Jackson
  2 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2026-04-20  1:50 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Colin Stagner, git, Christian Heusel, george, Christian Hesse,
	Phillip Wood

Ian Jackson <ijackson@chiark.greenend.org.uk> writes:

> Colin Stagner writes ("Re: [PATCH v2 0/3] contrib/subtree: reduce recursion during split"):
>> That said, a native Rust version of 
>> git-subtree-split would be much faster and easier to read.
>
> I prototyped something along the lines of the algorithm I described
> earlier.  It is very fast, as expected.
>
> The output looks plausible when I look at it by eye, but there are
> some things that I need to look at more closely.  I should think some
> more about invariants and tests.
>
> Overall, I think this is worth pursuing.

;-).

> Before I go much further, how do we think this would best be packaged?

My preference is (as it has always been)

 (1) host it somewhere outside of my tree,

 (2) replace contrib/subtree/* with a single file
     contrib/subtree/README that leads people to the new location.

The preference is not limited to subtree but generally applies to
things in contrib/.  I prefer to see them graduate from this project
and stand on their own, when they do not have a strong dependency on
the git-core project.  From a technical point of view, this is
especially true if your plan is to depend on libgit2, as it is not our
dependency.  Back when Git was very young, it did make sense to have
related-but-not-quite-Git things (like gitk and git-gui) shipped to
give them visibility, but we passed that stage 15 years ago.

Thanks.

* Re: [PATCH v2 0/3] contrib/subtree: reduce recursion during split
  2026-04-19 19:55       ` Ian Jackson
  2026-04-20  1:09         ` Ben Knoble
  2026-04-20  1:50         ` Junio C Hamano
@ 2026-04-20  9:57         ` Ian Jackson
  2026-04-21  5:07           ` Colin Stagner
  2026-04-22 17:12           ` git-subtree rewrite Ian Jackson
  2 siblings, 2 replies; 22+ messages in thread
From: Ian Jackson @ 2026-04-20  9:57 UTC (permalink / raw)
  To: Colin Stagner
  Cc: git, Christian Heusel, george, Christian Hesse, Phillip Wood,
	Junio C Hamano

Ian Jackson writes ("Re: [PATCH v2 0/3] contrib/subtree: reduce recursion during split"):
> Algorithm
> 
> I don't think it is going to be possible to precisely reproduce the
> output of the existing git-subtree split.  Indeed the existing
> git-subtree split is a bit cavalier with metadata (eg `committer` [1])
> which probably ought to be changed in any case.
> 
> Even so, it should be possible to avoid foolishly rewriting the whole
> history of the subtree, since we can stop at all the merges made by
> "git-subtree merge", which are easily detectable by the extra metadata
> keyword fields in the commit message.

This last part turns out to be false.

It is only `git-subtree add` that puts this metadata in the commit
message; `git-subtree merge` doesn't.  This makes it very hard to
distinguish a subtree merge from (say) a merge of a branch that
predates the subtree add.
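
To make the distinction concrete, here is a small sketch that checks a
commit message for the metadata in question.  It assumes the metadata
line has the form `git-subtree-dir: <path>`; the two function names are
hypothetical.  A commit message without that line, such as an ordinary
merge message, is simply not recognised.

```shell
#!/bin/sh
# Sketch: inspect a commit message (read on stdin) for subtree metadata.
# Function names are illustrative, not part of git-subtree.

# Succeeds if the message contains a "git-subtree-dir:" line.
has_subtree_metadata() {
	grep -q '^git-subtree-dir: '
}

# Print the directory recorded in the metadata, if any.
subtree_dir_of() {
	sed -n 's/^git-subtree-dir: //p'
}

# Hypothetical usage against a real repository (not run here):
#
#	git log -1 --format=%B "$commit" | has_subtree_metadata
```

A check like this can only ever find commits that carry the metadata;
it has no way to tell an unannotated subtree merge apart from any other
merge, which is the difficulty described above.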

I need to think about this some more but I doubt this can be made to
work well without more significant changes, including to the data
model.  There would have to be some kind of compatibility arrangement
to handle existing histories.

Colin, is that OK with you?  If you would prefer, I could choose a
different name for the resulting program.  My preference would be,
with your consent, to continue to call it "git-subtree", version 2.

Regards,
Ian.

-- 
Ian Jackson <ijackson@chiark.greenend.org.uk>   These opinions are my own.  

Pronouns: they/he.  If I emailed you from @fyvzl.net or @evade.org.uk,
that is a private address which bypasses my fierce spamfilter.

* Re: [PATCH v2 0/3] contrib/subtree: reduce recursion during split
  2026-04-20  9:57         ` Ian Jackson
@ 2026-04-21  5:07           ` Colin Stagner
  2026-04-22  9:43             ` Johannes Schindelin
  2026-04-22 17:12           ` git-subtree rewrite Ian Jackson
  1 sibling, 1 reply; 22+ messages in thread
From: Colin Stagner @ 2026-04-21  5:07 UTC (permalink / raw)
  To: Ian Jackson
  Cc: git, Christian Heusel, george, Christian Hesse, Phillip Wood,
	Junio C Hamano

On 4/20/26 04:57, Ian Jackson wrote:

> I need to think about this some more but I doubt this can be made to
> work well without more significant changes, including to the data
> model.  There would have to be some kind of compatibility arrangement
> to handle existing histories.

I would take a look at the test-cases for git-subtree.sh, which document 
some of the kinds of issues you will encounter. They may help you test 
compatibility.

Anything you can do to limit breakage to "opt-in" points-in-time only 
would be greatly appreciated.

> Colin, is that OK with you?

You can name it and develop it however you like. No need to ask 
permission here.

(For the record, I'm also not the maintainer of contrib/git-subtree. 
I've just been trying to fix a few issues with it.)

> If you would prefer, I could choose a different name for the
> resulting program.

If I were writing it, I would give the new program a different name but 
perhaps provide a "compile-time" way to set it to "git-subtree" instead. 
My reason for this is that it may need to exist with the legacy script 
for a while, and it's good to be able to tell them apart.

Colin


* Re: [PATCH v2 0/3] contrib/subtree: reduce recursion during split
  2026-04-21  5:07           ` Colin Stagner
@ 2026-04-22  9:43             ` Johannes Schindelin
  0 siblings, 0 replies; 22+ messages in thread
From: Johannes Schindelin @ 2026-04-22  9:43 UTC (permalink / raw)
  To: Colin Stagner
  Cc: Ian Jackson, git, Christian Heusel, george, Christian Hesse,
	Phillip Wood, Junio C Hamano

Hi Colin & Ian,

On Tue, 21 Apr 2026, Colin Stagner wrote:

> On 4/20/26 04:57, Ian Jackson wrote:
> 
> > I need to think about this some more but I doubt this can be made to
> > work well without more significant changes, including to the data
> > model.  There would have to be some kind of compatibility arrangement
> > to handle existing histories.
> 
> I would take a look at the test-cases for git-subtree.sh, which document 
> some of the kinds of issues you will encounter. They may help you test 
> compatibility.
> 
> Anything you can do to limit breakage to "opt-in" points-in-time only 
> would be greatly appreciated.
> 
> > Colin, is that OK with you?
> 
> You can name it and develop it however you like. No need to ask 
> permission here.
> 
> (For the record, I'm also not the maintainer of contrib/git-subtree. 
> I've just been trying to fix a few issues with it.)
> 
> > If you would prefer, I could choose a different name for the
> > resulting program.
> 
> If I were writing it, I would give the new program a different name but 
> perhaps provide a "compile-time" way to set it to "git-subtree" instead. 
> My reason for this is that it may need to exist with the legacy script 
> for a while, and it's good to be able to tell them apart.

I just wanted to chime in to cheer you on; I've been following this
Rust-based `git-subtree` idea with interest.  As you may already know,
Git for Windows has been shipping `git subtree` with its installers for
ages, and given the abysmal performance characteristics of shell-based
Git commands on Windows, it would be really good to replace the
shell-scripted version with the Rust version (also to ensure proper
error handling, which is hard to make comprehensive in Unix shell
scripts).

Thank you for pushing this forward!

Ciao,
Johannes

* git-subtree rewrite
  2026-04-20  9:57         ` Ian Jackson
  2026-04-21  5:07           ` Colin Stagner
@ 2026-04-22 17:12           ` Ian Jackson
  1 sibling, 0 replies; 22+ messages in thread
From: Ian Jackson @ 2026-04-22 17:12 UTC (permalink / raw)
  To: Avery Pennarun
  Cc: Colin Stagner, git, Christian Heusel, george, Christian Hesse,
	Phillip Wood, Junio C Hamano

Hi, Avery.

tl;dr:
  Do you object if I use the name git-subtree for my rewrite?

  I intend it to be forward compatible with existing git-subtree
  histories and existing command line invocations.


I've been looking into your git-subtree program.  Thanks for it;
it is definitely solving a very real problem reasonably well. [1]
However, I think the existing implementation (in shell) and data model
need some work.

I have done some experiments, with enough success that I have more or
less decided to try to rewrite git-subtree.

I am intending to make my rewrite able to work with existing histories
(ie, projects which have done git-subtree add and git-subtree merge).
I intend to support the existing command line interface, although I
may improve that later.

I am also hoping to be able to define the data model more formally.

The git maintainers and others on the git mailing list seem reasonably
enthusiastic about all this.  My nascent rewrite is a standalone
Rust package, and the plan would be for it to obsolete the shell
script in git.git/contrib, but live outside the git project itself.

I would like to call my new program "git-subtree" and have it use
(and extend) the existing `git-subtree-...:` metadata that
`git-subtree add` puts into its generated commits.

Obviously there are compatibility, packaging, and deployment
considerations, which I'm keeping in mind.  I don't want to break
anyone downstream.  So I will proceed reasonably cautiously.

I hope this is all OK with you.  If not, or if you have questions,
please let me know, using reply-all to this email (so the mailing list
gets a copy).

If I don't hear from you I will go ahead.  The actual programming work
is going to take a while so watch this space but not too closely :-).

Regards,
Ian.

[1] See also my blog post
   Never use git submodules
   https://diziet.dreamwidth.org/14666.html

-- 
Ian Jackson <ijackson@chiark.greenend.org.uk>   These opinions are my own.  

Pronouns: they/he.  If I emailed you from @fyvzl.net or @evade.org.uk,
that is a private address which bypasses my fierce spamfilter.


Thread overview: 22+ messages
2026-02-15 20:17 [PATCH 0/3] contrib/subtree: reduce recursion during split Colin Stagner
2026-02-15 20:17 ` [PATCH 1/3] contrib/subtree: reduce function side-effects Colin Stagner
2026-02-15 20:17 ` [PATCH 2/3] contrib/subtree: functionalize split traversal Colin Stagner
2026-02-15 20:17 ` [PATCH 3/3] contrib/subtree: reduce recursion during split Colin Stagner
2026-03-05 23:55 ` [PATCH v2 0/3] " Colin Stagner
2026-03-05 23:55   ` [PATCH v2 1/3] contrib/subtree: reduce function side-effects Colin Stagner
2026-03-05 23:55   ` [PATCH v2 2/3] contrib/subtree: functionalize split traversal Colin Stagner
2026-03-05 23:55   ` [PATCH v2 3/3] contrib/subtree: reduce recursion during split Colin Stagner
2026-03-13 22:51   ` [PATCH v2 0/3] " Junio C Hamano
2026-03-13 23:06     ` Junio C Hamano
2026-04-15 17:58       ` Junio C Hamano
2026-04-15 21:39         ` Ben Knoble
2026-04-16 13:25   ` Ian Jackson
2026-04-16 19:34     ` Junio C Hamano
2026-04-17  4:50     ` Colin Stagner
2026-04-19 19:55       ` Ian Jackson
2026-04-20  1:09         ` Ben Knoble
2026-04-20  1:50         ` Junio C Hamano
2026-04-20  9:57         ` Ian Jackson
2026-04-21  5:07           ` Colin Stagner
2026-04-22  9:43             ` Johannes Schindelin
2026-04-22 17:12           ` git-subtree rewrite Ian Jackson
