* git subtree bugs (mishandled merges, recursion depth)
@ 2024-07-17 16:55 Ian Jackson
2026-04-16 1:26 ` Colin Stagner
0 siblings, 1 reply; 6+ messages in thread
From: Ian Jackson @ 2024-07-17 16:55 UTC (permalink / raw)
To: git
I have what ought to be a fairly straightforward situation that
git-subtree seems to be mishandling.
Steps to reproduce:
    git clone https://gitlab.torproject.org/tpo/core/arti.git
    cd arti
    git checkout 01d02118cdda30636e606fc1a89b3e04f28b8ad1
    git subtree split -P maint/rust-maint-common
Expected behaviour:
git subtree (hopefully fairly rapidly) prints the commitid of the
tip of a branch suitable for merging back to the upstream repo, which
is at https://gitlab.torproject.org/tpo/core//rust-maint-common
The resulting history ought to have a few dozen commits,
most of which are the upstream history of the subtree.
Actual behaviour (git 2.45.2, Debian amd64 1:2.45.2-1 .deb):
    $ git subtree split -P maint/rust-maint-common
    /usr/lib/git-core/git-subtree: 318: Maximum function recursion depth (1000) reached
    $
Actual behaviour (git 2.20.1, Debian ancient 1:2.20.1-2+deb10u9):
Takes a very long time. Eventually produces an output commit
which has most of arti.git#main in its history.
Notes about the source repository:
The state of arti.git:maint/rust-maint-common is the result of the
following:
(i) create a new rust-maint-common.git, and add and edit files
(many of these changes came via gitlab MRs, there are merges)
(ii) in arti.git, `git subtree add`, and make further changes,
to files both within and without the subtree
(iii) Make a gitlab MR from (ii) and merge it into arti.git#main.
(resulting in a fairly merge-rich history)
https://gitlab.torproject.org/tpo/core/arti/-/merge_requests/2267
A workaround:
If I check out main^2 (01d02118cdda30636e606fc1a89b3e04f28b8ad1^2)
and run git-subtree split using the ancient version of git, it still
takes ages, but the output is correct. So the old version of git has
a bug meaning it can produce highly excessive output, when merges are
present.
This workaround is only available because right now the history of
the subtree's files, within arti.git, is fairly simple.
With the new version of git, I get the "recursion depth" error,
regardless.
Thanks for your attention.
Ian.
--
Ian Jackson <ijackson@chiark.greenend.org.uk> These opinions are my own.
Pronouns: they/he. If I emailed you from @fyvzl.net or @evade.org.uk,
that is a private address which bypasses my fierce spamfilter.
* Re: git subtree bugs (mishandled merges, recursion depth)
2024-07-17 16:55 git subtree bugs (mishandled merges, recursion depth) Ian Jackson
@ 2026-04-16 1:26 ` Colin Stagner
2026-04-16 14:31 ` Ian Jackson
0 siblings, 1 reply; 6+ messages in thread
From: Colin Stagner @ 2026-04-16 1:26 UTC (permalink / raw)
To: Ian Jackson; +Cc: git
Hello Ian, does this git-subtree issue still affect you?
On 7/17/24 11:55, Ian Jackson wrote:
> Steps to reproduce:
>
> git clone https://gitlab.torproject.org/tpo/core/arti.git
> cd arti
> git checkout 01d02118cdda30636e606fc1a89b3e04f28b8ad1
> git subtree split -P maint/rust-maint-common
>
> Actual behaviour (git 2.45.2, Debian amd64 1:2.45.2-1 .deb):
>
> $ git subtree split -P maint/rust-maint-common
> /usr/lib/git-core/git-subtree: 318: Maximum function recursion depth (1000) reached
> $
On Debian's POSIX sh, shell recursion is artificially limited to 1000
calls. This is not typical behavior; most distros I've tested do not cap
it. bash has a configurable recursion depth limit, but sh ignores it.
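As a generic illustration of why such a cap bites, compare a recursive walk with an iterative one in POSIX sh. This is only a sketch: it is not the proposed patch and not code from git-subtree, and both function names are invented for the example.

```shell
#!/bin/sh
# Hypothetical illustration only; neither function comes from git-subtree.

# Recursive walk: one nested function call per remaining step.
# In a shell that caps function recursion at 1000, a large enough
# argument would hit that cap.
walk_recursive () {
    if [ "$1" -gt 0 ]
    then
        walk_recursive $(($1 - 1))
    fi
}

# Iterative walk: the same number of steps, but constant call depth.
# Prints the number of steps taken.
walk_iterative () {
    n=$1
    steps=0
    while [ "$n" -gt 0 ]
    do
        n=$((n - 1))
        steps=$((steps + 1))
    done
    echo "$steps"
}
```

Something like `walk_iterative 5000` completes regardless of any recursion cap, whereas `walk_recursive 5000` depends on how deep the shell allows function calls to nest.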
I've proposed a fix for the recursion depth issue in:
<https://lore.kernel.org/git/20260305-cs-subtree-split-recursion-v2-0-7266be870ba9@howdoi.land>
If you have the time, I'd appreciate some testing and/or a code review.
> Expected behaviour:
>
> The resulting history ought to have a few dozen commits,
> most of which are the upstream history of the subtree.
> Actual behaviour (git 2.20.1, Debian ancient 1:2.20.1-2+deb10u9):
>
> Takes a very long time. Eventually produces an output commit
> which has most of arti.git#main in its history.
Even with my patch series applied, there are many more than a "few dozen
commits" in the history. For me this splits as
9a2422685e6cc05625f47a1fe709f1908f31fc87
with 12307 commits in the history graph.
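For anyone reproducing that number: one way to count the commits reachable from a split result is `git rev-list --count`. The message does not say how the figure was obtained, so treat this as an assumption; the helper name is made up.

```shell
# count_history is a hypothetical helper name.
# Prints the number of commits reachable from the commit given as $1.
count_history () {
    git rev-list --count "$1"
}
```

It could be used as `count_history "$(git subtree split -P maint/rust-maint-common)"` in a checkout where the split succeeds.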
The reason for this is likely e7b07376e5 (Merge branch
'rs/subtree-fixes', 2018-10-26), which was merged around that time.
Previous versions discarded too much history, and that patch series
added more merge-base ancestry checks.
When merges come into play, the task of choosing which history is
"important" and which history is "not important" is not always clear-cut.
Colin
* Re: git subtree bugs (mishandled merges, recursion depth)
2026-04-16 1:26 ` Colin Stagner
@ 2026-04-16 14:31 ` Ian Jackson
2026-04-17 4:14 ` Colin Stagner
0 siblings, 1 reply; 6+ messages in thread
From: Ian Jackson @ 2026-04-16 14:31 UTC (permalink / raw)
To: Colin Stagner; +Cc: git
Colin Stagner writes ("Re: git subtree bugs (mishandled merges, recursion depth)"):
> On 7/17/24 11:55, Ian Jackson wrote:
> > Actual behaviour (git 2.20.1, Debian ancient 1:2.20.1-2+deb10u9):
> >
> > Takes a very long time. Eventually produces an output commit
> > which has most of arti.git#main in its history.
>
> Even with my patch series applied, there are many more than a "few dozen
> commits" in the history. For me this splits as
Hi. (For future reference, that patch series is
[PATCH v2 0/3] contrib/subtree: reduce recursion during split
in the other thread.)
> 9a2422685e6cc05625f47a1fe709f1908f31fc87
>
> with 12307 commits in the history graph.
>
> The reason for this is likely e7b07376e5 (Merge branch
> 'rs/subtree-fixes', 2018-10-26), which was merged around that time.
> Previous versions discarded too much history, and that patch series
> added more merge-base ancestry checks.
>
> When merges come into play, the task of choosing which history is
> "important" and which history is "not important" is not always clear-cut.
I have some thoughts about this.
I didn't find a formal description of git-subtree's data model, or how
git subtree split works, precisely. So I'm going to make some
suppositions.
I observe that git-subtree split doesn't record any metadata in the
split versions of the commits (for example, the downstream project
commitid they were split from).
Repeated splits ought ideally not to constantly generate additional
material. So the algorithm ought to be deterministic. An easy way to
do that is to make splitting a pure function from downstream commits
to subtree commits.
If one can run git subtree split on every commit in the downstream
that has a git subtree merge as an ancestor, then one might think that
means the split must produce as many commits as there are in the
downstream.
But we can map multiple downstream commits to the same subtree
commit. Consider the cases, for some downstream commit D.
0. D is a single parent commit that *does* change the subtree.
   This becomes a new commit with parent split(D~).

1. D is a single parent commit that doesn't change the subtree:
   We reuse the parent's split: split(D) = split(D~)

2. D is a multi-parent commit. Determine \forall{i} split(D^i).
   Discard all split(D^i) which are ancestors of any split(D^j).
   If any remaining split(D^i) is not subtree-treesame D,
   or there is more than one remaining split(D^i),
   construct a new commit with those remaining split(D^i) as parents.
   Otherwise all remaining split(D^i) are the same,
   and they are treesame to D, so discard: split(D) = split(D^i).

3. D is a subtree merge commit. split(D^1) is explicitly stated
   in the git-subtree metadata. Calculate split(D^0) as above.
   Then calculate split(D) according to point 2.

In fact, 0 and 1 are special cases of 2.
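The individual tests that cases 0 to 2 rely on can each be expressed with existing git plumbing. The following is only a sketch of those building blocks, not a prototype of the whole algorithm and not code from git-subtree. All function names are hypothetical, and the prefix argument is assumed to be given without a trailing slash.

```shell
# All names below are hypothetical; this is not git-subtree code.

# Case 0 versus case 1: succeeds if single-parent commit $1 changes
# anything under prefix $2 relative to its parent.
changes_subtree () {
    ! git diff --quiet "$1~" "$1" -- "$2"
}

# Case 2, "discard" step: prints only those of the given commits that
# are not ancestors of another commit in the list.
independent_parents () {
    git merge-base --independent "$@"
}

# Case 2, treesame test: succeeds if the tree of split commit $1 is
# identical to the subtree at prefix $3 inside downstream commit $2.
subtree_treesame () {
    [ "$(git rev-parse --verify -q "$1^{tree}")" = \
      "$(git rev-parse --verify -q "$2:$3")" ]
}
```

A prototype would still need to memoise split(D) per downstream commit and create the new commits (for example with `git commit-tree`); neither is shown here.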
Do you think it would be worth me prototyping this? I think at least
for my case it would produce considerably fewer commits, but until I
try it that's just guesswork.
Ian.
* Re: git subtree bugs (mishandled merges, recursion depth)
2026-04-16 14:31 ` Ian Jackson
@ 2026-04-17 4:14 ` Colin Stagner
0 siblings, 0 replies; 6+ messages in thread
From: Colin Stagner @ 2026-04-17 4:14 UTC (permalink / raw)
To: Ian Jackson; +Cc: git
On 4/16/26 09:31, Ian Jackson wrote:
> Colin Stagner writes ("Re: git subtree bugs (mishandled merges, recursion depth)"):
>> When merges come into play, the task of choosing which history is
>> "important" and which history is "not important" is not always clear-cut.
>
> I have some thoughts about this.
>
> I didn't find a formal description of git-subtree's data model, or how
> git subtree split works, precisely. So I'm going to make some
> suppositions.
>
> I observe that git-subtree split doesn't record any metadata in the
> split versions of the commits (for example, the downstream project
> commitid they were split from).
This would be helpful information to have, but git-subtree does not
record it in general.
`split --rejoin` mode *does* record the parent→split commitid mapping.
But this is only recorded in the input history, and only for a single
commit. --rejoin actually isn't as helpful as it may first appear.
Keep in mind that no part of history is fixed. The input history, the
split history, or both might get rebased. If that happens, the
original input→split commitid mapping is either unhelpful or misleading.
> Repeated splits ought ideally not to constantly generate additional
> material. So the algorithm ought to be deterministic.
The algorithm is deterministic. It does need to reconstruct the complete
split history every time, which is inefficient.
The real issue is this line from the git-subtree manual page:
Repeated splits of exactly the same history are guaranteed
to be identical (i.e. to produce the same commit IDs) as
long as the settings passed to split (such as --annotate)
are the same.
which means that git-subtree SHOULDN'T change how splits work… but it
has anyway. There have been many fixes to subtree-split over the years
that have changed its behavior in history-incompatible ways. This
guarantee really hasn't held up.
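A quick way to check the weaker, same-version form of that guarantee is to run the same split twice and compare the commit IDs. This is a minimal sketch with a made-up helper name. It says nothing about stability across git versions, which would need the outputs of two different installations to be compared.

```shell
# split_is_repeatable is a hypothetical helper name.
# Runs the same split twice for prefix $1 and succeeds only if both
# runs print the same commit ID. Returns 2 if a split itself fails.
split_is_repeatable () {
    first=$(git subtree split -P "$1") || return 2
    second=$(git subtree split -P "$1") || return 2
    [ "$first" = "$second" ]
}
```

For the repository in this thread that would be `split_is_repeatable maint/rust-maint-common`.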
We really need something committed, like a config file, that records:
1. how the user wants splits to happen *now*; and
2. how the old split history was split before
That way, we can introduce new functionality or approaches without
introducing breakage.
> An easy way to do that is to make splitting a pure
> function from downstream commits to subtree commits.
>
> If one can run git subtree split on every commit in the downstream
> that has a git subtree merge as an ancestor, then one might think that
> means the split must produce as many commits as there are in the
> downstream.
Part of the problem here may be that `subtree split` is not particularly
aware of `subtree merge`. All split sees is that the tree has a
different prefix. Changing this would introduce history-breakage, and it
would be good to make it opt-in.
You can also perform subtree merges without `git-subtree`. Merges done
this way don't record any information in the trailers.
> But we can map multiple downstream commits to the same subtree
> commit. Consider the cases, for some downstream commit D.
>
> 0. D is a single parent commit that *does* change the subtree.
> This becomes a new commit with parent split(D~).
>
> 1. D is a single parent commit that doesn't change the subtree:
> We reuse the parent's split: split(D) = split(D~)
I believe both (0) and (1) are current behavior.
> 2. D is a multi-parent commit. Determine \forall{i} split(D^i).
> Discard all split(D^i) which are ancestors of any split(D^j).
> If any remaining split(D^i) is not subtree-treesame D,
> or there is more than one remaining split(D^i),
> construct a new commit with those remaining split(D^i) as parents.
> Otherwise all remaining split(D^i) are the same,
> and they are treesame to D, so discard: split(D) = split(D^i).
The merge processing is where a lot of history-breaking changes have
occurred. There are probably lots of edge-cases to discover along the
way. I recommend drawing lots and lots of pictures.
> 3. D is a subtree merge commit. split(D^1) is explicitly stated
> in the git-subtree metadata. Calculate split(D^0) as above.
> Then calculate split(D) according to point 2.
In a git-subtree-merge, the merge always has two parents. The split
history is always on the second parent. The trailers are mostly useful
to tell you what the prefix was… but you should still check the commitid
to verify that no rebase has happened.
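That check might look roughly like the following. It is a sketch under the assumption that the merge commit carries a `git-subtree-split:` trailer, as written by `git subtree add`; commits without that trailer simply fail the check. The function name is hypothetical.

```shell
# split_trailer_matches_parent is a hypothetical helper name.
# Succeeds if commit $1 has a git-subtree-split: trailer and the commit
# it names is still the second parent of $1.
split_trailer_matches_parent () {
    recorded=$(git log -1 --format=%B "$1" |
        sed -n 's/^git-subtree-split: *//p' | tail -n 1)
    [ -n "$recorded" ] &&
    [ "$(git rev-parse --verify -q "$recorded^{commit}")" = \
      "$(git rev-parse --verify -q "$1^2")" ]
}
```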
> Do you think it would be worth me prototyping this?
If it's worth it to you, then it's worth it. If it's good, others will
want it too.
I've tried lots of git workflows, including submodules and subtrees. But
I've found that a regular, plain `git merge`—with a "merge upwards"
workflow—is by far the fastest, easiest, and most flexible. Whenever
possible, I arrange projects so they can work this way.
I do use plenty of subtree merges to bring in dependencies. subtree
merges work great for this.
At least for now, a `subtree split` does not undo a `subtree merge`. If
I'm looking to submit changes I've made on top of a subtree merge, I use
format-patch
    git format-patch --relative=my/subtree/prefix ...
to make patches which are layout-compatible with the original repo. Then
I `git am` the patches in the original repo as a topic branch. This
works well for my purposes.
I only use `split` to produce repos with read-only views of
subdirectories. This is mostly due to a project design quirk that is
beyond my control.
If I fork a project, I'll just fork the entire thing. No need to split it.
This is how I work, but you'll find plenty of other opinions out there.
Best of luck,
Colin
Thread overview: 6+ messages
2024-07-17 16:55 git subtree bugs (mishandled merges, recursion depth) Ian Jackson
2026-04-16 1:26 ` Colin Stagner
2026-04-16 14:31 ` Ian Jackson
2026-04-17 4:14 ` Colin Stagner
-- strict thread matches above, loose matches on Subject: below --
2024-07-17 16:49 Ian Jackson
2024-07-17 16:31 Ian Jackson