Git development
* [Bug] Git subtree regression
@ 2025-12-26 19:58 dev
  2025-12-30 17:07 ` george
  0 siblings, 1 reply; 11+ messages in thread
From: dev @ 2025-12-26 19:58 UTC (permalink / raw)
  To: git

Thank you for filling out a Git bug report!
Please answer the following questions to help us understand your issue.

What did you do before the bug happened? (Steps to reproduce your issue)

I use git subtrees to manage the monorepo `https://github.com/athena-framework/athena`.
With git 2.52.0, I add a remote for, say, the `clock` component via `git remote add clock git@github.com:athena-framework/clock.git`,
then run `git subtree push --prefix="src/components/clock" "clock" master`.

What did you expect to happen? (Expected behavior)

I expected it to work and say `Everything up-to-date`, because it is up to date.

What happened instead? (Actual behavior)

It fails because of:

```
To github.com:athena-framework/clock.git
 ! [rejected]        0efb3d9858e3bfee65165508aeeacc50417c9a99 -> master (non-fast-forward)
error: failed to push some refs to 'github.com:athena-framework/clock.git'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. If you want to integrate the remote changes,
hint: use 'git pull' before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
```

What's different between what you expected and what actually happened?

This seems to be a regression introduced by https://github.com/git/git/commit/83f9dad7d6fb5988b68f80b25bd87c68693195dd, as the same push worked before that commit and fails after it.

Anything else you want to add:

I did some initial exploration and it might have something to do with the `clock` component originally being added via `git subtree add --squash`.
For another component:

- git 2.51.1: split produces 92 commits, properly connected to original repo history
- git 2.52.0: split produces 8 commits, disconnected history with a new root

The `git-subtree-split:` marker in the squash commit body doesn't seem to be honored in 2.52.0.

Please review the rest of the bug report below.
You can delete any lines you don't wish to share.


[System Info]
git version:
git version 2.52.0
cpu: x86_64
built from commit: 9a2fb147f2c61d0cab52c883e7e26f5b7948e3ed
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
rust: enabled
libcurl: 8.17.0
OpenSSL: OpenSSL 3.6.0 1 Oct 2025
zlib-ng: 2.2.5
SHA-1: SHA1_DC
SHA-256: SHA256_BLK
default-ref-format: files
default-hash: sha1
uname: Linux 6.18.2-arch2-1 #1 SMP PREEMPT_DYNAMIC Thu, 18 Dec 2025 18:00:18 +0000 x86_64
compiler info: gnuc: 15.2
libc info: glibc: 2.42
$SHELL (typically, interactive shell): /bin/bash




* Re: [Bug] Git subtree regression
  2025-12-26 19:58 [Bug] Git subtree regression dev
@ 2025-12-30 17:07 ` george
  2026-01-04  4:52   ` Colin Stagner
  0 siblings, 1 reply; 11+ messages in thread
From: george @ 2025-12-30 17:07 UTC (permalink / raw)
  To: george; +Cc: git

---
I explored this more and think I found the root cause.
Commit `83f9dad7d6fb5988b68f80b25bd87c68693195dd` changed `should_ignore_subtree_split_commit()` to examine only a commit's own trailers via `git show --format='%(trailers:...)'`.
The old code used `git log -1 --grep=...` which had the important side effect of searching through ancestor commits.

In a multi-subtree monorepo with this topology:

```
  main:    A---B---M---E    (B = subtree add --squash for subA)
                  /
  feature:   C---D          (D = subtree add --squash for subB)
```

When splitting `subA`, commits `C` and `D` from the feature branch should be **ignored** because they belong to a branch that only contains `subB`, not `subA`.

## Old behavior (2.51.1)

`git log -1 --grep="git-subtree-dir:"` on commit `C` would traverse ancestors and find `D`'s subtree markers for `subB`, correctly identifying `C` as belonging to another subtree's branch.

## New behavior (2.52.0)

`git show` on commit `C` finds no trailers (regular commits don't have them), so `C` is **not** ignored.
This breaks the split because both parents of the merge `M` are processed, but `C` has no cache entry, leading to disconnected history.
Thus, split operations produce fewer commits than expected with broken parent chains, breaking push/pull workflows to upstream subtree repositories.
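
The difference between the two lookups can be sketched in a throwaway repository. This is only an illustration, not the code in `git-subtree`; the temporary repo and the commit messages are made up for the example.

```shell
#!/bin/sh
# Sketch: contrast an ancestry search with a single-commit trailer lookup.
set -e

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.email "test@example.com"
git config user.name "Test User"

# Older commit: carries a subtree-style trailer in its last paragraph.
git commit -q --allow-empty -m "Squashed 'subB/' content" -m "git-subtree-dir: subB"
# Newer commit: an ordinary commit with no trailers.
git commit -q --allow-empty -m "ordinary commit"

# Ancestry search: walks back from HEAD until some commit message matches.
grep_hit=$(git log -1 --grep="^git-subtree-dir:" --format=%s HEAD)

# Single-commit lookup: reads only HEAD's own trailers.
own_trailer=$(git show -s --format='%(trailers:key=git-subtree-dir,valueonly)' HEAD)

echo "log --grep found:   ${grep_hit:-<nothing>}"
echo "HEAD's own trailer: ${own_trailer:-<nothing>}"
```

Here `git log -1 --grep` reports the older squash-style commit even though it was started at `HEAD`, while the `%(trailers)` lookup on `HEAD` comes back empty.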

## Reproduction

This can be reproduced via my monorepo: https://github.com/athena-framework/athena

```bash
# 2.51.1 produces correct result (matches the number of commits in the `athena-framework/clock` repo)
$ git subtree split --prefix="src/components/clock"
4ee66f8198b2532110b75a36575e363ccccff47e  # 20 commits, connected to remote

# 2.52.0 produces broken result
$ git subtree split --prefix="src/components/clock"
0efb3d9858e3bfee65165508aeeacc50417c9a99  # 7 commits, disconnected

# The commits have identical trees but different parents:
$ git cat-file -p 4ee66f8198b2532110b75a36575e363ccccff47e
tree 8333b0cbb2a10528f8c803812af7a8e603e70367
parent d72f22f28ca5ed57ef3c2df74f0abd5569ac5934  # Connected to 19-commit history

$ git cat-file -p 0efb3d9858e3bfee65165508aeeacc50417c9a99
tree 8333b0cbb2a10528f8c803812af7a8e603e70367
parent 81c5dbe70ce26a7758fbe7f87b3ce0704043cfb1  # Only 6 commits, disconnected
```
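
One way to check whether a split result is still connected to the upstream history is `git merge-base --is-ancestor`. The sketch below wraps it in a hypothetical helper, `split_is_connected`, and exercises it in a throwaway repository with stand-in commits. In the real monorepo the arguments would be the split hash and the remote-tracking ref, presumably something like `clock/master` (that ref name is an assumption).

```shell
#!/bin/sh
set -e

# Hypothetical helper: succeeds when the upstream tip ($2) is an ancestor
# of the split commit ($1), i.e. pushing the split would fast-forward.
split_is_connected() {
    git merge-base --is-ancestor "$2" "$1"
}

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.email "test@example.com"
git config user.name "Test User"

# Stand-in for the upstream tip.
git commit -q --allow-empty -m "upstream tip"
upstream=$(git rev-parse HEAD)

# Stand-in for a good split: builds on top of the upstream tip.
git commit -q --allow-empty -m "local change"
good_split=$(git rev-parse HEAD)

# Stand-in for a broken split: same tree, but a brand-new root commit.
bad_split=$(git commit-tree -m "disconnected root" "$(git rev-parse 'HEAD^{tree}')")

for candidate in "$good_split" "$bad_split"; do
    if split_is_connected "$candidate" "$upstream"; then
        echo "$candidate: connected to upstream"
    else
        echo "$candidate: disconnected from upstream"
    fi
done
```

A disconnected result is the case in which `subtree push` gets rejected as a non-fast-forward.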


* Re: [Bug] Git subtree regression
  2025-12-30 17:07 ` george
@ 2026-01-04  4:52   ` Colin Stagner
  2026-01-04 14:27     ` george
  0 siblings, 1 reply; 11+ messages in thread
From: Colin Stagner @ 2026-01-04  4:52 UTC (permalink / raw)
  To: george; +Cc: git

Hello, George!

Thanks for looking into this.

On 12/30/25 11:07, george@mail.dietrich.pub wrote:
> ---
> I explored this more and think I found the root cause.
> Commit `83f9dad7d6fb5988b68f80b25bd87c68693195dd` changed `should_ignore_subtree_split_commit()` to examine only a commit's own trailers via `git show --format='%(trailers:...)'`.
> The old code used `git log -1 --grep=...` which had the important side effect of searching through ancestor commits.

The old `--grep=...` approach was introduced as a performance speedup 
for large splits. I don't believe the original author intended to alter 
the split result, but the old approach inadvertently did in some cases.


> # 2.52.0 produces broken result
> $ git subtree split --prefix="src/components/clock"
> 0efb3d9858e3bfee65165508aeeacc50417c9a99  # 7 commits,

On v2.52.0 on my machine, I get an error instead:

      fatal: could not rev-parse split hash d0ed70566b3e962fbff71145d8155986b48c6885 from commit 5817d4435bf448f526c3b0049f00e6500277e4bb

I presume I need more history than just master to make this work.

Can you test this split command in git v2.43.7? This is before 
`should_ignore_subtree_split_commit()` was introduced.

I'd like to distill this down into a minimum working example that 
doesn't depend on an external repo like athena. Namely, some shell 
instructions that start from an empty `git init` and create a repo with 
the bug condition. That way, we know exactly and narrowly what sort of 
history graph produces the bug. I think I have almost enough information 
here to do that, but you're welcome to try writing an MWE yourself.

> In a multi-subtree monorepo with this topology:
> 
> ```
>    main:    A---B---M---E    (B = subtree add --squash for subA)
>                    /
>    feature:   C---D          (D = subtree add --squash for subB)
> ```

Just to verify: in this example, is commit M a normal merge commit? Or 
is it also created with subtree?

Colin




* Re: [Bug] Git subtree regression
  2026-01-04  4:52   ` Colin Stagner
@ 2026-01-04 14:27     ` george
  2026-01-05  3:36       ` Colin Stagner
  0 siblings, 1 reply; 11+ messages in thread
From: george @ 2026-01-04 14:27 UTC (permalink / raw)
  To: ask+git; +Cc: george, git

---

Ahh, yes. It seems you also need to add `clock` as a remote and fetch it:
```
$ git clone git@github.com:athena-framework/athena.git
$ cd athena
$ git remote add clock git@github.com:athena-framework/clock.git
$ git fetch clock
$ git subtree split --prefix="src/components/clock"
0efb3d9858e3bfee65165508aeeacc50417c9a99
```

I wasn't able to use _exactly_ 2.43.7, but I was able to use 2.43.0 which would still be before that other change.
It also produced the expected commit hash, unlike 2.52.0.
It was also significantly faster: ~9s, versus ~26s for 2.51.1.
2.52.0 was better at ~14s, but of course produces the wrong hash.

The script below reproduces the issue quite well and points to the likely root cause.
It does seem one component was added differently, as a non-merge commit, which seems to break things.
Looking at the Athena monorepo, this can be somewhat confirmed via https://github.com/athena-framework/athena/commits/master/?after=ee21a41e9dfc969e759b532d45c0c0faa21876d6+0.
There, the first two commits show up as verified, whereas the commits I normally create with `git subtree add --squash` and push directly to main show up as unverified.

```
#!/bin/bash
#
# THE BUG:
# When a commit's direct parent is a squash commit for a DIFFERENT subtree,
# and that squash commit's ancestry includes OUR subtree's squash commit,
# the split breaks.
#
# Old code: `git log -1 --grep` searches ancestry, finds our marker → don't ignore
# New code: only checks parent's own trailers → ignores → breaks parent chain
#
# This pattern occurs when subtree squash commits are cherry-picked or rebased
# into a linear history (instead of the normal merge structure).

set -e

KEEP_TMPDIR="${KEEP_TMPDIR:-}"

TMPDIR=$(mktemp -d)
echo "Working directory: $TMPDIR"

cleanup() {
    if [ -n "$KEEP_TMPDIR" ]; then
        echo "Preserving temp directory: $TMPDIR"
    else
        rm -rf "$TMPDIR"
    fi
}
trap cleanup EXIT

create_repo() {
    local repo="$1"
    git init -b main "$repo"
    git -C "$repo" config user.email "test@test.com"
    git -C "$repo" config user.name "Test User"
    git -C "$repo" config log.date relative
}

create_commit() {
    local repo="$1"
    local name="$2"
    (
        cd "$repo"
        mkdir -p "$(dirname "$name")"
        echo "$name" > "$name"
        git add "$name"
        git commit -m "$name"
    )
}

cd "$TMPDIR"

echo "=== Creating repositories ==="

create_repo monorepo
create_repo subA
create_repo subB

echo "=== Creating upstream commits ==="

create_commit subA subA1
create_commit subA subA2
create_commit subB subB1
create_commit subB subB2

echo "=== Setting up monorepo with linear squash structure ==="

# Initial commit
create_commit monorepo main1

# Add subA with --squash (normal way - creates merge)
git -C monorepo fetch ../subA HEAD
git -C monorepo subtree add --prefix=subA --squash FETCH_HEAD

# Make a change in subA
create_commit monorepo subA/change1

# Now we simulate cherry-picking JUST the squash commit for subB
# (This is what seems to have happened in the athena repo)
# First, get subB ready
git -C monorepo fetch ../subB HEAD

# Create a LINEAR squash commit for subB (simulating cherry-pick of just the squash commit)
# This is the key pattern that triggers the bug - a squash commit as a regular linear commit
(
    cd monorepo
    mkdir -p subB
    git -C ../subB archive HEAD | tar -x -C subB
    git add subB
    # Create a squash-style commit with subtree trailers but as a LINEAR commit
    # Trailers must be in the last paragraph, separated by blank line
    subB_short=$(git -C ../subB rev-parse --short HEAD)
    subB_full=$(git -C ../subB rev-parse HEAD)
    git commit -F - <<EOF
Squashed 'subB/' content from commit $subB_short

git-subtree-dir: subB
git-subtree-split: $subB_full
EOF
)

echo ""
echo "=== Key structure: subB squash is a LINEAR commit, not a merge ==="
git -C monorepo log -1 --format='%H %s' HEAD
echo "Parent count: $(git -C monorepo cat-file -p HEAD | grep -c '^parent')"

# Now make a commit that touches subA
# This commit's parent is the subB squash commit (linear)
create_commit monorepo subA/change2

echo ""
echo "=== Repository structure ==="
git -C monorepo log --oneline --graph

# Verify the squash commit's ancestry includes subA's marker
subB_squash=$(git -C monorepo rev-parse HEAD^)
echo ""
echo "=== Checking ancestry of subB squash commit ($subB_squash) ==="
echo "Looking for subA marker in ancestry..."
if git -C monorepo log -1 --grep="git-subtree-dir: subA" "$subB_squash" --oneline 2>/dev/null; then
    echo "  FOUND - old code would search this and NOT ignore"
else
    echo "  NOT FOUND - test setup may be incomplete"
fi

echo ""
echo "=== Running subtree split on subA ==="

split_hash=$(git -C monorepo subtree split --prefix=subA 2>/dev/null)
echo "Split hash: $split_hash"

split_count=$(git -C monorepo rev-list --count "$split_hash")
echo "Commits in split: $split_count"

echo ""
echo "=== Split history ==="
git -C monorepo log --oneline "$split_hash"

echo ""
echo "=== Result ==="

# Expected: 4 commits (2 upstream + 2 local changes)
if [ "$split_count" -ge 4 ]; then
    echo "PASS: Split produced connected history ($split_count commits)"
    exit 0
else
    echo "FAIL: Split produced disconnected history (only $split_count commits, expected >= 4)"
    exit 1
fi
```


* Re: [Bug] Git subtree regression
  2026-01-04 14:27     ` george
@ 2026-01-05  3:36       ` Colin Stagner
  2026-01-06  4:55         ` george
  0 siblings, 1 reply; 11+ messages in thread
From: Colin Stagner @ 2026-01-05  3:36 UTC (permalink / raw)
  To: george; +Cc: git

On 1/4/26 08:27, george@mail.dietrich.pub wrote:

> It does seem one component was added differently, as a non-merge commit, which seems to break things.

> ```
> # Create a LINEAR squash commit for subB (simulating cherry-pick of just the squash commit)
> # This is the key pattern that triggers the bug - a squash commit as a regular linear commit
> (
>      cd monorepo
>      mkdir -p subB
>      git -C ../subB archive HEAD | tar -x -C subB
>      git add subB
>      # Create a squash-style commit with subtree trailers but as a LINEAR commit
>      # Trailers must be in the last paragraph, separated by blank line
>      subB_short=$(git -C ../subB rev-parse --short HEAD)
>      subB_full=$(git -C ../subB rev-parse HEAD)
>      git commit -F - <<EOF
> Squashed 'subB/' content from commit $subB_short
>
> git-subtree-dir: subB
> git-subtree-split: $subB_full
> EOF
> )
> ```

Yes, this is very likely to cause breakage.

Normally,

     git subtree merge -P subA --squash

makes two commits, in this order:

1. Squashed 'subA/' content from commit f00...
2. Merge commit (1) as 'subA'

Commit 1 updates the subtree but does *not* rewrite paths. If you `git 
show` one, you will see that it has files like

     subA1
     subA2

and *not* subA/subA1.

The path rewrite actually takes place in Commit 2 (the merge), via the 
`-Xsubtree` merge strategy option.

`should_ignore_subtree_split_commit` tries to search for commits like 
(1), which all have the `git-subtree-*` trailer. Normally, these commits 
either have:

* no parents, if they result from a new `git subtree add --squash`; OR

* only parents which are also "Squashed 'subA/' content," if
   they result from a follow-up `git subtree merge --squash`

We can safely ignore these commits—and all of their parents—during a 
`subtree split` if they belong to a different subtree.

Of course, that heuristic doesn't work if the commit has been rebased 
onto other unrelated history—which is what happened in your repo.
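
The shape described above can be sketched roughly as follows. This is a hypothetical helper written for illustration, not the code in `git-subtree`: `looks_like_pristine_squash` accepts a commit only if it carries a `git-subtree-dir` trailer and every parent (if any) carries one too. The throwaway repository and commit messages are likewise made up.

```shell
#!/bin/sh
set -e

# Print the git-subtree-dir trailer value of commit $1 (empty if absent).
subtree_dir_of() {
    git show -s --format='%(trailers:key=git-subtree-dir,valueonly)' "$1"
}

# Hypothetical check: a trailer on the commit itself, and on every parent.
looks_like_pristine_squash() {
    [ -n "$(subtree_dir_of "$1")" ] || return 1
    for parent in $(git show -s --format=%P "$1"); do
        [ -n "$(subtree_dir_of "$parent")" ] || return 1
    done
    return 0
}

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.email "test@example.com"
git config user.name "Test User"

tree=$(git write-tree)   # empty tree; contents do not matter here

# A squash commit with no parents, as after a fresh `subtree add --squash`.
pristine=$(git commit-tree -m "Squashed 'subB/' content" \
    -m "git-subtree-dir: subB" "$tree")

# The same kind of commit sitting on top of ordinary history.
base=$(git commit-tree -m "ordinary commit" "$tree")
rebased=$(git commit-tree -p "$base" -m "Squashed 'subB/' content" \
    -m "git-subtree-dir: subB" "$tree")

for commit in "$pristine" "$rebased"; do
    if looks_like_pristine_squash "$commit"; then
        echo "$commit: squash commit with only squash ancestry"
    else
        echo "$commit: has a parent from unrelated history"
    fi
done
```

For the second commit, skipping "the commit and all of its parents" would also skip ordinary history, which is the situation a rebased squash commit creates.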

I suspect the best way out may be to remove the 
`should_ignore_subtree_split_commit` heuristic entirely. It is mostly 
useful for repos that use `split --rejoin` a lot, and the check itself 
is slow. WDYT?


> There, the first two commits show up as verified, whereas the commits I normally create with `git subtree add --squash` and push directly to main show up as unverified.

git v2.51.0 also adds --gpg-sign compatibility to subtree. Perhaps this 
is what you are seeing?


> It seems you also need to add `clock` as a remote and fetch it:

Ah, thanks.

Personally, I'm a big advocate for the monorepo layout. In my 
experience, it makes almost every task easier and faster.

Colin



* Re: [Bug] Git subtree regression
  2026-01-05  3:36       ` Colin Stagner
@ 2026-01-06  4:55         ` george
  2026-01-10  1:25           ` Colin Stagner
  0 siblings, 1 reply; 11+ messages in thread
From: george @ 2026-01-06  4:55 UTC (permalink / raw)
  To: ask+git; +Cc: george, git

> I suspect the best way out may be to remove the `should_ignore_subtree_split_commit` heuristic entirely.
> It is mostly useful for repos that use `split --rejoin` a lot, and the check itself is slow. WDYT?

I think this sounds reasonable, yeah. We should be sure to include a test case to capture this going forward too, of course.
This would at least fix the breaking change, and some other heuristic could be applied in the future.

> git v2.51.0 also adds --gpg-sign compatibility to subtree.
> Perhaps this is what you are seeing?

Hmm, I don't think so as I'm just learning about this now.
Will give it a shot next time I have to add another component tho!


* Re: [Bug] Git subtree regression
  2026-01-06  4:55         ` george
@ 2026-01-10  1:25           ` Colin Stagner
  2026-01-10 17:22             ` george
  0 siblings, 1 reply; 11+ messages in thread
From: Colin Stagner @ 2026-01-10  1:25 UTC (permalink / raw)
  To: george; +Cc: git

George,

Can you have a look at the patch in 
<20260110011811.788219-1-ask+git@howdoi.land> and see if it solves this 
issue?

Colin




* Re: [Bug] Git subtree regression
  2026-01-10  1:25           ` Colin Stagner
@ 2026-01-10 17:22             ` george
  2026-02-15 20:36               ` Colin Stagner
  0 siblings, 1 reply; 11+ messages in thread
From: george @ 2026-01-10 17:22 UTC (permalink / raw)
  To: ask+git; +Cc: george, git

I did! Thank you so much! It seems it not only produces the correct commit hash but is also quite a bit more performant.

```sh
$ git --version
git version 2.51.1

$ time git subtree split --prefix="src/components/clock"
4ee66f8198b2532110b75a36575e363ccccff47e

real    0m32.971s
user    0m18.856s
sys     0m14.627s

$ git --version
git version 2.52.0

$ time git subtree split --prefix="src/components/clock"
0efb3d9858e3bfee65165508aeeacc50417c9a99

real    0m18.680s
user    0m7.698s
sys     0m12.842s

$ /home/george/dev/git/git/git --version
git version 2.52.0.408.gecb62f5599

$ time /home/george/dev/git/git/git subtree split --prefix="src/components/clock"
4ee66f8198b2532110b75a36575e363ccccff47e

real    0m10.816s
user    0m3.909s
sys     0m7.755s
```

Thanks again!


* Re: [Bug] Git subtree regression
  2026-01-10 17:22             ` george
@ 2026-02-15 20:36               ` Colin Stagner
  2026-02-16 21:25                 ` D. Ben Knoble
  2026-02-18  4:29                 ` george
  0 siblings, 2 replies; 11+ messages in thread
From: Colin Stagner @ 2026-02-15 20:36 UTC (permalink / raw)
  To: george; +Cc: git

George,

My original patch for this issue introduced other regressions and needed 
to be reverted. I don't recommend using it.

Instead, can you take a look at:

  https://lore.kernel.org/git/20260215201748.889866-1-ask+git@howdoi.land/

which removes the "ignore other splits" optimization altogether. After 
some research, I suspect that this optimization may not have enough 
information to work correctly and preserve history in all cases.

I'd also appreciate testing of

  https://lore.kernel.org/git/20260215201748.889866-1-ask+git@howdoi.land/

which fixes a "recursion depth exceeded" bug on Debian/Ubuntu.

I've CC'd you on both of these patch series.

I have tested both of these on selected subdirectories of your athena 
repository. They seem to work. But I'd appreciate it if you could look 
at all the splits you normally do and see if the patches correctly 
preserve history for you.

Thanks,

Colin



* Re: [Bug] Git subtree regression
  2026-02-15 20:36               ` Colin Stagner
@ 2026-02-16 21:25                 ` D. Ben Knoble
  2026-02-18  4:29                 ` george
  1 sibling, 0 replies; 11+ messages in thread
From: D. Ben Knoble @ 2026-02-16 21:25 UTC (permalink / raw)
  To: Colin Stagner; +Cc: george, git

On Mon, Feb 16, 2026 at 3:26 PM Colin Stagner <ask+git@howdoi.land> wrote:
>
> George,
>
> My original patch for this issue introduced other regressions and needed
> to be reverted. I don't recommend using it.
>
> Instead, can you take a look at:
>
>
> https://lore.kernel.org/git/20260215201748.889866-1-ask+git@howdoi.land/
[snip]
> https://lore.kernel.org/git/20260215201748.889866-1-ask+git@howdoi.land/
>
> which fixes a "recursion depth exceeded" bug on Debian/Ubuntu.
>
> I've CC'd you on both of these patch series.

JFYI: looks like you pasted the same link twice ;)

-- 
D. Ben Knoble


* Re: [Bug] Git subtree regression
  2026-02-15 20:36               ` Colin Stagner
  2026-02-16 21:25                 ` D. Ben Knoble
@ 2026-02-18  4:29                 ` george
  1 sibling, 0 replies; 11+ messages in thread
From: george @ 2026-02-18  4:29 UTC (permalink / raw)
  To: ask+git; +Cc: george, git

Ahh bummer, thanks for the follow-up. I'm on Arch and don't seem to suffer from the same recursion limitation that Debian/Ubuntu do. Because of that, I'll defer checking out that set of patches to someone else.

I did, however, check out the other one and can confirm it looks good! I went through each of the components and asserted that the split hash matches the latest commit on each of the related repos. I also asserted that `subtree push` results in an `Everything up-to-date` message. Thanks again for all your work on this.
