* [PATCH v2 1/3] contrib/subtree: reduce function side-effects
From: Colin Stagner @ 2026-03-05 23:55 UTC
To: git, Christian Heusel, george
Cc: Christian Hesse, Phillip Wood, Junio C Hamano, Colin Stagner
`process_subtree_split_trailer()` communicates its return value
to the caller by setting a variable (`sub`) that is also defined
by the calling function. This is both unclear and encourages
side-effects.
Invoke this function in a sub-shell instead.
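
As a rough illustration of the pattern (not code from the patch), the sketch below returns a value on stdout and captures it with command substitution. `resolve_hash` and `lookup` are hypothetical names. Because a `die`-style `exit` inside `$(...)` only ends the sub-shell, the caller must check the status itself. The patch uses `|| exit 1`; the sketch uses `|| return 1` so it can be exercised without ending the calling shell.

```shell
#!/bin/sh
# Hypothetical helper: prints its result on stdout, or fails.
resolve_hash () {
    case "$1" in
    "")
        echo "fatal: empty hash" >&2
        exit 1    # inside $(...), this ends only the sub-shell
        ;;
    *)
        echo "resolved-$1"
        ;;
    esac
}

# Hypothetical caller: capture stdout, then propagate failure explicitly.
lookup () {
    sub="$(resolve_hash "$1")" || return 1
    echo "sub is $sub"
}
```

Calling `lookup abc` prints `sub is resolved-abc`, while `lookup ""` returns non-zero without printing a result.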
Signed-off-by: Colin Stagner <ask+git@howdoi.land>
---
contrib/subtree/git-subtree.sh | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index 791fd8260c..bae5d9170b 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -373,6 +373,10 @@ try_remove_previous () {
}
# Usage: process_subtree_split_trailer SPLIT_HASH MAIN_HASH [REPOSITORY]
+#
+# Parse SPLIT_HASH as a commit. If the commit is not found, fetches
+# REPOSITORY and tries again. If found, prints full commit hash.
+# Otherwise, dies.
process_subtree_split_trailer () {
assert test $# -ge 2
assert test $# -le 3
@@ -400,6 +404,7 @@ process_subtree_split_trailer () {
die "$fail_msg"
fi
fi
+ echo "${sub}"
}
# Usage: find_latest_squash DIR [REPOSITORY]
@@ -432,7 +437,7 @@ find_latest_squash () {
main="$b"
;;
git-subtree-split:)
- process_subtree_split_trailer "$b" "$sq" "$repository"
+ sub="$(process_subtree_split_trailer "$b" "$sq" "$repository")" || exit 1
;;
END)
if test -n "$sub"
@@ -489,7 +494,7 @@ find_existing_splits () {
main="$b"
;;
git-subtree-split:)
- process_subtree_split_trailer "$b" "$sq" "$repository"
+ sub="$(process_subtree_split_trailer "$b" "$sq" "$repository")" || exit 1
;;
END)
debug "Main is: '$main'"
--
2.43.0
* [PATCH v2 2/3] contrib/subtree: functionalize split traversal
From: Colin Stagner @ 2026-03-05 23:55 UTC
To: git, Christian Heusel, george
Cc: Christian Hesse, Phillip Wood, Junio C Hamano, Colin Stagner
`git subtree split` requires an ancestor-first history traversal.
Refactor the existing rev-list traversal into its own function,
`find_commits_to_split`.
Pass unrevs via stdin to avoid limits on the maximum length of
command-line arguments. Also remove an unnecessary `eval`.
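
A minimal sketch of the stdin approach, assuming a `git` binary on `PATH`. The function name `list_commits` is a hypothetical stand-in that mirrors the shape of `find_commits_to_split`. The exclusions (`^<commit>` lines) travel on stdin via `git rev-list --stdin`, so their number is not bounded by the argument-length limit, and no `eval` is needed because nothing is re-parsed by the shell.

```shell
#!/bin/sh
# Hypothetical stand-in: list_commits REV UNREVS [ARGS...]
# UNREVS holds zero or more "^<commit>" exclusions, one per line.
list_commits () {
    rev="$1"
    unrevs="$2"
    shift 2

    # rev-list reads additional revisions from stdin with --stdin;
    # any remaining arguments (e.g. --count) are passed through.
    printf '%s\n' "$unrevs" |
    git rev-list --topo-order --reverse --parents --stdin "$rev" "$@"
}
```

For example, `list_commits HEAD "^$old_rejoin" --count` would count the commits reachable from `HEAD` but not from `$old_rejoin` (a placeholder variable).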
Signed-off-by: Colin Stagner <ask+git@howdoi.land>
---
contrib/subtree/git-subtree.sh | 30 +++++++++++++++++++++++++++---
1 file changed, 27 insertions(+), 3 deletions(-)
diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index bae5d9170b..c1756b3e74 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -519,6 +519,31 @@ find_existing_splits () {
done || exit $?
}
+# Usage: find_commits_to_split REV UNREVS [ARGS...]
+#
+# List each commit to split, with its parents.
+#
+# Specify the starting REV for the split, which is usually
+# a branch tip. Populate UNREVS with the last --rejoin for
+# this prefix, if any. Typically, `subtree split` ignores
+# history prior to the last --rejoin... unless it
+# becomes necessary to consider it. `find_existing_splits` is
+# a convenient source of UNREVS.
+#
+# Remaining arguments are passed to rev-list.
+#
+# Outputs commits in ancestor-first order, one per line, with
+# parent information. Outputs all parents before any child.
+find_commits_to_split() {
+ assert test $# -ge 2
+ rev="$1"
+ unrevs="$2"
+ shift 2
+
+ echo "$unrevs" |
+ git rev-list --topo-order --reverse --parents --stdin "$rev" "$@"
+}
+
# Usage: copy_commit REV TREE FLAGS_STR
copy_commit () {
assert test $# = 3
@@ -976,12 +1001,11 @@ cmd_split () {
# We can't restrict rev-list to only $dir here, because some of our
# parents have the $dir contents the root, and those won't match.
# (and rev-list --follow doesn't seem to solve this)
- grl='git rev-list --topo-order --reverse --parents $rev $unrevs'
- revmax=$(eval "$grl" | wc -l)
+ revmax="$(find_commits_to_split "$rev" "$unrevs" --count)"
revcount=0
createcount=0
extracount=0
- eval "$grl" |
+ find_commits_to_split "$rev" "$unrevs" |
while read rev parents
do
process_split_commit "$rev" "$parents"
--
2.43.0
* [PATCH v2 3/3] contrib/subtree: reduce recursion during split
From: Colin Stagner @ 2026-03-05 23:55 UTC
To: git, Christian Heusel, george
Cc: Christian Hesse, Phillip Wood, Junio C Hamano, Colin Stagner
On Debian-alikes, POSIX sh has a hardcoded recursion depth
of 1000. This limit operates like bash's `$FUNCNEST` [1], but
it does not actually respect `$FUNCNEST`. This is non-standard
behavior. On other distros, the sh recursion depth is limited
only by the available stack size.
With certain history graphs, subtree splits are recursive—with
one recursion per commit. Attempting to split complex repos that
have thousands of commits, like [2], may fail on these distros.
Reduce the amount of recursion required by eagerly discovering
the complete range of commits to process.
The recursion is a side-effect of the rejoin-finder in
`find_existing_splits`. Rejoin mode, as in

    git subtree split --rejoin -b hax main ...

improves the speed of later splits by merging the split history
back into `main`. This gives the splitting algorithm a stopping
point. The rejoin maps one commit on `main` to one split commit
on `hax`. If we encounter this commit, we know that it maps to
`hax`.
But this is only a single point in the history. Many splits
require history from before the rejoin. See patch content for
examples.
If pre-rejoin history is required, `check_parents` recursively
discovers each individual parent, with one recursion per commit.
The recursion deepens the entire tree, even if an older rejoin
is available. This quickly overwhelms the Debian sh stack.
Instead of recursively processing each commit, process *all* the
commits back to the next obvious starting point: i.e., either the
next-oldest --rejoin or the beginning of history. This is where the
recursion is likely to stop anyway.
While this still requires recursion, it is *considerably* less
recursive.
[1]: https://www.gnu.org/software/bash/manual/html_node/Bash-Variables.html#index-FUNCNEST
[2]: https://github.com/christian-heusel/aur.git
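
The difference between per-commit recursion and eager range processing can be sketched with a toy model. This is an illustration under simplifying assumptions, not the patch's algorithm: history is a linear chain where commit N has the single parent N-1, and `done_upto` is a hypothetical stand-in for the split cache. All function names are invented for the sketch.

```shell
#!/bin/sh
# Toy model: commit N has one parent, N-1; commit 0 is the root.
process_commit () {
    depth=$((depth + 1))
    test "$depth" -le "$max_depth" || max_depth=$depth
    # A parent must be processed before its child. If it is missing
    # from the "cache", recurse into it first.
    if test "$1" -gt 0 && test "$(($1 - 1))" -gt "$done_upto"
    then
        process_commit "$(($1 - 1))"
    fi
    done_upto=$1
    depth=$((depth - 1))
}

# Start at the tip and discover each parent on demand:
# one level of recursion per commit.
split_recursive () {
    done_upto=-1 depth=0 max_depth=0
    process_commit "$1"
    echo "$max_depth"
}

# Walk the whole range ancestor-first, so every parent is already
# processed when its child is reached.
split_batched () {
    done_upto=-1 depth=0 max_depth=0
    i=0
    while test "$i" -le "$1"
    do
        process_commit "$i"
        i=$((i + 1))
    done
    echo "$max_depth"
}
```

With 201 commits (0 through 200), `split_recursive 200` reports a maximum depth of 201, while `split_batched 200` reports 1.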
Signed-off-by: Colin Stagner <ask+git@howdoi.land>
---
contrib/subtree/git-subtree.sh | 56 ++++++++++++++++++++++++++++++++++++++++--
1 file changed, 54 insertions(+), 2 deletions(-)
diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh
index c1756b3e74..c649a9e393 100755
--- a/contrib/subtree/git-subtree.sh
+++ b/contrib/subtree/git-subtree.sh
@@ -315,6 +315,46 @@ cache_miss () {
}
# Usage: check_parents [REVS...]
+#
+# During a split, check that every commit in REVS has already been
+# processed via `process_split_commit`. If not, deepen the history
+# until it is.
+#
+# Commits authored by `subtree split` have to be created in the
+# same order as every other git commit: ancestor-first, with new
+# commits building on old commits. The traversal order normally
+# ensures this is the case, but it also excludes --rejoin commits
+# by default.
+#
+# The --rejoin tells us, "this mainline commit is equivalent to
+# this split commit." The relationship is only known for that
+# exact commit---and not before or after it. Frequently, commits
+# prior to a rejoin are not needed... but, just as often, they
+# are! Consider this history graph:
+#
+# --D---
+# / \
+# A--B--C--R--X--Y main
+# / /
+# a--b--c / split
+# \ /
+# --e--/
+#
+# The main branch has commits A, B, and C. main is split into
+# commits a, b, and c. The split history is rejoined at R.
+#
+# There are at least two cases where we might need the A-B-C
+# history that is prior to R:
+#
+# 1. Commit D is based on history prior to R, but
+# it isn't merged into mainline until after R.
+#
+# 2. Commit e is based on old split history. It is merged
+# back into mainline with a subtree merge. Again, this
+# happens after R.
+#
+# check_parents detects these cases and deepens the history
+# to the next available rejoin.
check_parents () {
missed=$(cache_miss "$@") || exit $?
local indent=$(($indent + 1))
@@ -322,8 +362,20 @@ check_parents () {
do
if ! test -r "$cachedir/notree/$miss"
then
- debug "incorrect order: $miss"
- process_split_commit "$miss" ""
+ debug "found commit excluded by --rejoin: $miss. skipping to the next --rejoin..."
+ unrevs="$(find_existing_splits "$dir" "$miss" "$repository")" || exit 1
+
+ find_commits_to_split "$miss" "$unrevs" |
+ while read -r rev parents
+ do
+ process_split_commit "$rev" "$parents"
+ done
+
+ if ! test -r "$cachedir/$miss" &&
+ ! test -r "$cachedir/notree/$miss"
+ then
+ die "failed to deepen history at $miss"
+ fi
fi
done
}
--
2.43.0
* Re: [PATCH v2 0/3] contrib/subtree: reduce recursion during split
From: Junio C Hamano @ 2026-03-13 22:51 UTC
To: Christian Heusel, george, Christian Hesse, Phillip Wood
Cc: git, Colin Stagner
Colin Stagner <ask+git@howdoi.land> writes:
> * cs/subtree-split-recursion: when processing large history
> graphs on Debian or Ubuntu, "git subtree" can die with a
> "recursion depth reached" error. Reduce recursion.
>
> On Debian's POSIX sh, shell recursion is artificially limited
> to 1000 calls. You can check if your sh has limited recursion
> with:
>
> #!/bin/sh
> recurse() {
> r=$(( r + 1 ))
> test "$r" -le 1000 || { echo OK; exit; }
> recurse
> } && r=0 && recurse
>
> Depending on the history graph, subtree split can recurse deeply
> enough to encounter this limit. Rewrite the rejoin-deepening
> algorithm to reduce recursive calls.
>
> ---
> Changes in v2:
> - Rebase on master
We have seen two iterations of this series without anybody
commenting on it. Is it a sign that the topic, or possibly "git
subtree" itself, is of interest to nobody? Or is it that it is so
well done that nobody had any comment on it?
I don't use "git subtree" myself, and I do not know of anybody who
will scream at me if I break it by merging an unreviewed patch, so I
can merge it without worrying too much about fallout personally, but
that is a tad irresponsible as the maintainer ;-)
So...? Any volunteers among those who have a higher stake in the
program than I do (which admittedly is not a high bar to cross)?
Thanks.
* Re: [PATCH v2 0/3] contrib/subtree: reduce recursion during split
From: Junio C Hamano @ 2026-03-13 23:06 UTC
To: Colin Stagner
Cc: Christian Heusel, george, Christian Hesse, Phillip Wood, git
Junio C Hamano <gitster@pobox.com> writes:
> Colin Stagner <ask+git@howdoi.land> writes:
>
>> * cs/subtree-split-recursion: when processing large history
>> graphs on Debian or Ubuntu, "git subtree" can die with a
>> "recursion depth reached" error. Reduce recursion.
>>
>> On Debian's POSIX sh, shell recursion is artificially limited
>> to 1000 calls. You can check if your sh has limited recursion
>> with:
>>
>> #!/bin/sh
>> recurse() {
>> r=$(( r + 1 ))
>> test "$r" -le 1000 || { echo OK; exit; }
>> recurse
>> } && r=0 && recurse
>>
>> Depending on the history graph, subtree split can recurse deeply
>> enough to encounter this limit. Rewrite the rejoin-deepening
>> algorithm to reduce recursive calls.
>>
>> ---
>> Changes in v2:
>> - Rebase on master
>
> We have seen two iterations of this series without anybody
> commenting on it. Is it a sign that the topic, or possibly "git
> subtree" itself, is of interest to nobody? Or is it that it is so
> well done that nobody had any comment on it?
>
> I don't use "git subtree" myself, and I do not know of anybody who
> will scream at me if I break it by merging an unreviewed patch, so I
> can merge it without worrying too much about fallout personally, but
> that is a tad irresponsible as the maintainer ;-)
>
> So...? Any volunteers among those who have a higher stake in the
> program than I do (which admittedly is not a high bar to cross)?
FWIW, I can see that [1/3] is a benign clean-up that should not
change any semantics. [2/3] talks about the variable $sub, which is
used elsewhere, is not protected from getting overwritten by running
the function inside a subprocess, but I do not know if updates to
other variables (like $b, $sq, $repository, but not $fail_msg,
$hint1 and $hint2 that are used only in this function) want to be
seen after the calls to this function outside (and do not want to
find out myself---I'd rather want to see somebody else with stakes
in "git subtree" to verify), but otherwise the change looks benign
to me. I have no idea if what [3/3] does is sensible or not (and
again, I'd rather want to see somebody with stakes to double check).
Thanks.
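
For anyone checking the question about variable updates, here is a small sketch of the shell behavior involved. `parse_trailer` and `demo` are hypothetical names, not functions from git-subtree.sh. Assignments made inside a command substitution happen in a sub-shell and are discarded when it ends. Only the captured stdout reaches the caller.

```shell
#!/bin/sh
# Hypothetical function that prints a result and also sets variables.
parse_trailer () {
    fail_msg="could not parse $1"    # side effect, lost in a sub-shell
    sub="inner-$1"
    echo "$sub"
}

demo () {
    sub="" fail_msg="unset"
    # Only stdout crosses the sub-shell boundary.
    sub="$(parse_trailer abc)"
    echo "sub=$sub fail_msg=$fail_msg"
}
```

Running `demo` prints `sub=inner-abc fail_msg=unset`: the caller sees the echoed value, but not the assignment to `fail_msg`. Whether any caller in git-subtree.sh relies on such an assignment is exactly what a reviewer would need to confirm.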