From: Ian Jackson <ijackson@chiark.greenend.org.uk>
To: Colin Stagner <ask+git@howdoi.land>
Cc: git@vger.kernel.org
Subject: Re: git subtree bugs (mishandled merges, recursion depth)
Date: Thu, 16 Apr 2026 15:31:05 +0100
Message-ID: <27104.62121.658449.222834@chiark.greenend.org.uk>
In-Reply-To: <e9611b58-3886-4f04-8f49-16d140ebfc15@howdoi.land>

Colin Stagner writes ("Re: git subtree bugs (mishandled merges, recursion depth)"):
> On 7/17/24 11:55, Ian Jackson wrote:
> > Actual behaviour (git 2.20.1, Debian ancient 1:2.20.1-2+deb10u9):
> > 
> >  Takes a very long time.  Eventually produces an output commit
> >  which has most of arti.git#main in its history.
> 
> Even with my patch series applied, there are many more than a "few dozen 
> commits" in the history. For me this splits as

Hi.  (For future reference, that patch series is
  [PATCH v2 0/3] contrib/subtree: reduce recursion during split
in the other thread.)

>      9a2422685e6cc05625f47a1fe709f1908f31fc87
> 
> with 12307 commits in the history graph.
> 
> The reason for this is likely e7b07376e5 (Merge branch 
> 'rs/subtree-fixes', 2018-10-26), which was merged around that time. 
> Previous versions discarded too much history, and that patch series 
> added more merge-base ancestry checks.
> 
> When merges come into play, the task of choosing which history is 
> "important" and which history is "not important" is not always clear-cut.

I have some thoughts about this.

I didn't find a formal description of git-subtree's data model, or how
git subtree split works, precisely.  So I'm going to make some
suppositions.

I observe that git-subtree split doesn't record any metadata in the
split versions of the commits (for example, the downstream project
commitid they were split from).

Repeated splits ought ideally not to constantly generate additional
material.  So the algorithm ought to be deterministic.  An easy way to
do that is to make splitting a pure function from downstream commits
to subtree commits.
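
To illustrate the determinism point, here is a minimal sketch.  It
assumes a toy content-addressed commit id (not git's real object
format), and the names toy_commit_id and split_commit are hypothetical.
If every field of the split commit is derived from the downstream
commit alone, re-running the split gives back the same id.

```python
import hashlib


def toy_commit_id(tree, parents, author, date, message):
    """Toy content-addressed id: equal inputs always give an equal id."""
    payload = repr((tree, parents, author, date, message)).encode()
    return hashlib.sha1(payload).hexdigest()


def split_commit(downstream, split_parents):
    """Hypothetical pure split step: reads nothing but its arguments.

    No clock, environment or counter is consulted, so the result
    depends only on the downstream commit and its parents' splits.
    """
    return toy_commit_id(
        downstream["subtree"],
        split_parents,
        downstream["author"],
        downstream["date"],
        downstream["message"],
    )


d = {
    "subtree": "tree-1",
    "author": "A U Thor",
    "date": "2024-07-17T16:55:00Z",
    "message": "change the subtree",
}
first = split_commit(d, ())
second = split_commit(d, ())
assert first == second  # a repeated split creates nothing new
```

If the date were instead taken from the time of the split, the two
ids would differ, and each run would add fresh commits.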

If one can run git subtree split on every commit in the downstream
that has a git subtree merge as an ancestor, then one might think that
means the split must produce as many commits as there are in the
downstream.

But we can map multiple downstream commits to the same subtree
commit.  Consider the cases, for some downstream commit D.

 0. D is a single parent commit that *does* change the subtree.
    This becomes a new commit with parent split(D~).

 1. D is a single parent commit that doesn't change the subtree:
    We reuse the parent's split: split(D) = split(D~)

 2. D is a multi-parent commit.  Determine \forall{i} split(D^i).
    Discard all split(D^i) which are ancestors of any split(D^j).
    If any remaining split(D^i) is not subtree-treesame to D,
    or there is more than one remaining split(D^i),
    construct a new commit with those remaining split(D^i) as parents.
    Otherwise all remaining split(D^i) are the same,
    and they are treesame to D, so discard: split(D) = split(D^i).

 3. D is a subtree merge commit.  split(D^2) is explicitly stated
    in the git-subtree metadata.  Calculate split(D^1) as above.
    Then calculate split(D) according to point 2.

In fact, 0 and 1 are special cases of 2.
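
The cases above can be sketched on a toy commit graph.  This is only
an illustration under stated assumptions, not git-subtree's actual
code.  Commit, make_split, is_ancestor and split_all are hypothetical
names, trees are plain strings, and commits are assumed to be listed
parents-first.  Two things the cases do not spell out are also
assumptions here: a commit with no subtree and no split parents maps
to None, and a root commit that has the subtree gets a parentless
split commit.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """Toy downstream commit (hypothetical model, not git's format)."""
    parents: tuple[str, ...]        # ids of downstream parent commits
    subtree: Optional[str]          # subtree content; None if absent
    recorded: Optional[str] = None  # case 3: split id stated in metadata


def make_split(store, tree, parents):
    """Create (or re-find) a split commit; its id depends only on content."""
    sid = hashlib.sha1(repr((tree, parents)).encode()).hexdigest()[:12]
    store[sid] = (tree, parents)
    return sid


def is_ancestor(store, a, b):
    """True if split a is a strict ancestor of split b (iterative walk)."""
    stack, seen = list(store[b][1]), set()
    while stack:
        s = stack.pop()
        if s == a:
            return True
        if s not in seen:
            seen.add(s)
            stack.extend(store[s][1])
    return False


def split_all(commits, store):
    """Map each downstream commit id to a split id, or None."""
    result = {}
    for cid, c in commits.items():  # assumed order: parents before children
        cands = [result[p] for p in c.parents]
        if c.recorded is not None:
            cands.append(c.recorded)
        uniq = []  # drop missing splits and duplicates, keeping order
        for s in cands:
            if s is not None and s not in uniq:
                uniq.append(s)
        # Discard every candidate that is an ancestor of another one.
        remaining = [s for s in uniq
                     if not any(is_ancestor(store, s, t) for t in uniq)]
        if c.subtree is None and not remaining:
            result[cid] = None  # assumption: subtree not present yet
        elif len(remaining) == 1 and store[remaining[0]][0] == c.subtree:
            result[cid] = remaining[0]  # treesame: reuse, no new commit
        else:
            result[cid] = make_split(store, c.subtree, tuple(remaining))
    return result


commits = {
    "A": Commit((), "t1"),          # root that already has the subtree
    "B": Commit(("A",), "t1"),      # case 1: subtree untouched
    "C": Commit(("B",), "t2"),      # case 0: subtree changed
    "D": Commit(("A",), "t1"),      # side branch, subtree untouched
    "M": Commit(("C", "D"), "t2"),  # case 2: merge that adds nothing new
}
store = {}
result = split_all(commits, store)
print(len(commits), "downstream commits ->", len(store), "split commits")
```

In this toy graph the five downstream commits collapse to two split
commits: B and D reuse the split of A, and the merge M reuses the
split of C, because split(D) is an ancestor of split(C) and C is
subtree-treesame to M.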

Do you think it would be worth me prototyping this?  I think at least
for my case it would produce considerably fewer commits, but until I
try it that's just guesswork.

Ian.

-- 
Ian Jackson <ijackson@chiark.greenend.org.uk>   These opinions are my own.  

Pronouns: they/he.  If I emailed you from @fyvzl.net or @evade.org.uk,
that is a private address which bypasses my fierce spamfilter.

