[RFC] git-rebase-clean: mitigating a “semantic conflict cascade” during rebase

* [RFC] git-rebase-clean: mitigating a “semantic conflict cascade” during rebase
@ 2025-09-16 21:39 Antonio Mennillo
  2025-09-16 22:14 ` brian m. carlson
  0 siblings, 1 reply; 8+ messages in thread
From: Antonio Mennillo @ 2025-09-16 21:39 UTC (permalink / raw)
  To: git

Hi Git community,

I’m Antonio Mennillo, a self-taught developer (6 years). I’d like to ask for
technical feedback on a workflow issue I’ve observed during rebase, and share a
small userland tool I wrote to mitigate it.

Problem (observation, possibly a known limitation rather than a bug):
When rebasing feature branches whose commits are semantically interdependent,
Git replays commits one by one. In practice this can trigger a
cascading conflict, similar to a loop. Example:

 - Commit 1: add interface IUserService
 - Commit 6: add UserServiceImpl (depends on 1)
 - Commit 11: change IUserService signature
 - Commits 12–15: update implementation/tests to match

During rebase, conflicts may appear at 1 and again at 6/11, forcing the user to
remember prior resolutions and reconstruct intent across commits. If I’m
mischaracterizing the model, I’d appreciate a correction. I’m sharing this
humbly to verify whether this is expected behavior or if there is prior art I
should be aware of.

Mitigation (userland workflow): I built `git-rebase-clean`, which
squashes the feature branch first and then rebases. This concentrates
conflict resolution into a single atomic step with the full final
context visible. The obvious trade-off is commit history granularity:
you lose individual commits but gain atomic conflict resolution. In my
experience this reduces repeated/conflicting resolutions across
dependent commits.
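
The squash-then-rebase pattern described above can be tried by hand with stock Git commands. The following is a minimal sketch in a throwaway repository, not the `git-rebase-clean` implementation; the file contents, branch names, and commit messages are invented for illustration.

```shell
#!/bin/sh
# Sketch only: a manual squash-then-rebase in a throwaway repository.
# This is NOT the git-rebase-clean implementation; all names are made up.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Example"
git config user.email "example@example.com"

printf 'line1\nline2\nline3\n' > file.txt
git add file.txt
git commit -q -m "base"

# Feature branch: two commits that both touch line 2.
git checkout -q -b feature
printf 'line1\nfeature-a\nline3\n' > file.txt
git commit -q -am "feature: first attempt"
printf 'line1\nfeature-b\nline3\n' > file.txt
git commit -q -am "feature: final version"

# Meanwhile main changes the same line.
git checkout -q main
printf 'line1\nmain-change\nline3\n' > file.txt
git commit -q -am "main: conflicting change"

# Squash the feature branch into one commit on top of the merge base.
git checkout -q feature
git reset -q --soft "$(git merge-base main feature)"
git commit -q -m "feature (squashed)"

# Rebase the single commit; the conflict is resolved once, against the
# final state of the feature branch.
if ! git rebase main >/dev/null 2>&1; then
    printf 'line1\nfeature-b\nline3\n' > file.txt   # hand-written resolution
    git add file.txt
    GIT_EDITOR=true git rebase --continue >/dev/null 2>&1
fi

feature_commits=$(git rev-list --count main..feature)
echo "commits on feature after rebase: $feature_commits"
```

The trade-off mentioned above is visible in the last line: the two original feature commits are gone and only the squashed one remains.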

Usage (data points):
- 5+ months in develop/UAT; used in a production environment for the first time today
- 30-40 feature branches processed
- High reduction in conflict-resolution time
- Only incidents observed: some IDEs (e.g., IntelliJ) occasionally
fail to load all conflict markers when many conflicts concentrate in
the single squashed commit, requiring fallback to command-line
resolution

Repository: https://github.com/anthem87/clean-rebase
License: GNU AGPL-3.0-or-later for v2.x (earlier tags remain MIT for
historical reasons)

Questions:
1) Is this "semantic dependency" pain point considered an expected
limitation of git rebase, or perhaps an inherent trade-off in the
design?
2) Would you see value in documenting this squash-then-rebase pattern
in Git's guidance (or `contrib/`), as one possible workaround?
3) Are there existing internals, extensions, or directions you
recommend for handling dependent commits more gracefully?

Context note (authorship & licensing):
There has been growing interest around this tool after real-world use
within my company. To avoid misunderstandings about authorship and
usage rights, I transitioned new releases to GNU AGPL-3.0 while older
tags remain MIT; my goal is to preserve clear attribution and set
appropriate terms. If this licensing discussion is out of scope for
the list, please disregard this note.

I’m happy to adapt the tool to community standards if there is interest.
Thank you for Git and for your time.

Best regards,
Antonio Mennillo
Naples, Italy

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [RFC] git-rebase-clean: mitigating a “semantic conflict cascade” during rebase
  2025-09-16 21:39 [RFC] git-rebase-clean: mitigating a “semantic conflict cascade” during rebase Antonio Mennillo
@ 2025-09-16 22:14 ` brian m. carlson
  2025-09-18 17:02   ` Antonio Mennillo
  0 siblings, 1 reply; 8+ messages in thread
From: brian m. carlson @ 2025-09-16 22:14 UTC (permalink / raw)
  To: Antonio Mennillo; +Cc: git

On 2025-09-16 at 21:39:46, Antonio Mennillo wrote:
> Hi Git community,

Hi,

> Problem (observation, possibly a known limitation rather than a bug):
> When rebasing feature branches whose commits are semantically interdependent,
> Git replays commits one by one. In practice this can trigger a
> cascading conflict, similar to a loop. Example:
> 
>  - Commit 1: add interface IUserService
>  - Commit 6: add UserServiceImpl (depends on 1)
>  - Commit 11: change IUserService signature
>  - Commits 12–15: update implementation/tests to match

Usually we would recommend that each commit be atomic.  That is, each
commit should compile and pass all of the tests, so commits 12–15 would
be part of commit 11.

> During rebase, conflicts may appear at 1 and again at 6/11, forcing the user to
> remember prior resolutions and reconstruct intent across commits. If I’m
> mischaracterizing the model, I’d appreciate a correction. I’m sharing this
> humbly to verify whether this is expected behavior or if there is prior art I
> should be aware of.

Are you familiar with `git rerere`?  I have just done a complicated
rebase of this sort and it remembers resolutions for you so you don't
have to.
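
For readers who have not used it: rerere is off by default and is enabled through the `rerere.enabled` setting. The sketch below is a minimal illustration in a throwaway repository. It uses a merge rather than a rebase only to keep the example short, and the file contents and branch names are invented.

```shell
#!/bin/sh
# Sketch only: rerere records a hand-made resolution and reuses it later.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"
git config rerere.enabled true          # turn on "reuse recorded resolution"

printf 'line1\nline2\nline3\n' > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b feature
printf 'line1\nfeature\nline3\n' > file.txt
git commit -q -am "feature change"

git checkout -q main
printf 'line1\nmain\nline3\n' > file.txt
git commit -q -am "main change"

# First merge: conflict; rerere records the conflicted state.
git merge feature >/dev/null 2>&1 || true
printf 'line1\nmain+feature\nline3\n' > file.txt   # resolve by hand
git add file.txt
git commit -q --no-edit                 # rerere records the resolution here

# Throw the merge away and repeat it.
git reset -q --hard HEAD~1
git merge feature >/dev/null 2>&1 || true

# The working-tree file already carries the earlier resolution.
resolved_line=$(sed -n 2p file.txt)
echo "line 2 after the second merge: $resolved_line"
```

Note that the second merge still stops and asks for the result to be committed; rerere only fills in the file contents, so the resolution can be reviewed before continuing.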

> Mitigation (userland workflow): I built `git-rebase-clean`, which
> squashes the feature branch first and then rebases. This concentrates
> conflict resolution into a single atomic step with the full final
> context visible. The obvious trade-off is commit history granularity:
> you lose individual commits but gain atomic conflict resolution. In my
> experience this reduces repeated/conflicting resolutions across
> dependent commits.

I think most people on this list will consider losing the history
unacceptable, so I don't think this is a thing we'll want to encourage.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

* Re: [RFC] git-rebase-clean: mitigating a “semantic conflict cascade” during rebase
  2025-09-16 22:14 ` brian m. carlson
@ 2025-09-18 17:02   ` Antonio Mennillo
  2025-09-18 22:22     ` brian m. carlson
  0 siblings, 1 reply; 8+ messages in thread
From: Antonio Mennillo @ 2025-09-18 17:02 UTC (permalink / raw)
  To: git; +Cc: brian m. carlson

Hi Brian,

Thank you for the thoughtful response, and please excuse me if I am a
bit clumsy; this is my first interaction with the Git community.

> Usually we would recommend that each commit be atomic...

I perfectly understand this best practice. In my context, working with
many junior developers, it is not always possible to enforce atomic
commits. In some “emergency” situations we squashed just to make
progress, even if that meant losing granularity.

Your explanation really helped me see things more clearly, and I appreciated
the constructive (non-toxic) way you put it.

> Are you familiar with `git rerere`?...

Yes, rerere helps with repeated textual conflicts, but the scenario I
worry about is semantic: if commit M changes an interface and commit N
implements it, then changing M during rebase can make N semantically
invalid. rerere cannot solve that: it remembers markers, but cannot
understand that “this implementation no longer matches the interface.”

> I think most people on this list will consider losing the history unacceptable...

That feedback inspired me to improve the tool. Version 2.0.0 (literally
born today, thanks to your comment) now works differently:

1. Temporarily squash the feature branch (so the rebase is effectively `1/1`)
2. Resolve conflicts once, with the full final context
3. Automatically re-expand the squash by replaying saved diffs, restoring
   the original commits (with new hashes, as expected in any rebase)

In integration tests this preserves detailed history while avoiding cascading
conflicts. All 23 tests currently pass (including empty commits, binary files,
special characters, conflict abort/continue). Of course, real-world
usage is still pending.

This workflow essentially resolves conflicts with full context, then
reconstructs granular history: a kind of “best of both worlds.”

I would be very interested in whether the community sees this worth
exploring as an **experimental rebase strategy** (something like
`git rebase --squash-restore`), or if it overlaps with existing
mechanisms I am not aware of.

Code: https://github.com/anthem87/clean-rebase/tree/v2.0.0

Best regards,
Antonio

* Re: [RFC] git-rebase-clean: mitigating a “semantic conflict cascade” during rebase
  2025-09-18 17:02   ` Antonio Mennillo
@ 2025-09-18 22:22     ` brian m. carlson
  2025-09-19 21:14       ` Antonio Mennillo
  0 siblings, 1 reply; 8+ messages in thread
From: brian m. carlson @ 2025-09-18 22:22 UTC (permalink / raw)
  To: Antonio Mennillo; +Cc: git

On 2025-09-18 at 17:02:59, Antonio Mennillo wrote:
> Hi Brian,
> 
> Thank you for the thoughtful response, and please excuse me if I am a
> bit clumsy; this is my first interaction with the Git community.
> 
> > Usually we would recommend that each commit be atomic...
> 
> I perfectly understand this best practice. In my context, working with
> many junior developers, it is not always possible to enforce atomic
> commits. In some “emergency” situations we squashed just to make
> progress, even if that meant losing granularity.

Yeah, I've seen that at previous employers.  Even where I work now, I
definitely see differing levels of interest in atomic commits with nice
commit messages.

> Your explanation really helped me see things more clearly, and I appreciated
> the constructive (non-toxic) way you put it.

Well, thank you.

> Yes, rerere helps with repeated textual conflicts, but the scenario I
> worry about is semantic: if commit M changes an interface and commit N
> implements it, then changing M during rebase can make N semantically
> invalid. rerere cannot solve that: it remembers markers, but cannot
> understand that “this implementation no longer matches the interface.”

Yes, this is effectively unavoidable because Git doesn't know the
structure of your code or data.  For instance, in Rust, I could add a
new mandatory-to-implement method to a trait and then all
implementations would need an update.

> That feedback inspired me to improve the tool. Version 2.0.0 (literally
> born today, thanks to your comment) now works differently:
> 
> 1. Temporarily squash the feature branch (so the rebase is effectively `1/1`)
> 2. Resolve conflicts once, with the full final context
> 3. Automatically re-expand the squash by replaying saved diffs, restoring
>    the original commits (with new hashes, as expected in any rebase)

You haven't described exactly how this works, but it looks like it
performs a fixed set of transforms (e.g., symbol renames).  That can be
useful, but it isn't sufficient in a lot of cases.

For instance, in some work on Git I'm doing, I need to include a header
file for a function.  It was previously included in another header, but
my rebase onto the main branch had it removed, so I need to add it into
the C file I'm working on.  The only way I can 100% know which commit
needs that change programmatically is to try to compile on each commit
until it fails, and insert it there.  This, of course, requires atomic
commits.
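
The "compile on each commit until it fails" approach can be automated with `git rebase --exec`, which runs a command after each replayed commit and stops at the first failure. A minimal sketch follows, assuming a trivial file-existence check as a stand-in for a real build; the file names and the check itself are invented for illustration.

```shell
#!/bin/sh
# Sketch only: let `git rebase --exec` find the first commit that fails a
# check. The check stands in for a compile step; file names are made up.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo "helper" > helper.txt
git add helper.txt
git commit -q -m "base with helper"

git checkout -q -b feature
echo "docs" > docs.txt && git add docs.txt && git commit -q -m "add docs"
echo "needs helper" > use.txt && git add use.txt && git commit -q -m "use helper"
echo "more" > more.txt && git add more.txt && git commit -q -m "add more"

# main drops the helper, like a header that is no longer included.
git checkout -q main
git rm -q helper.txt
git commit -q -m "remove helper"

# The check fails whenever use.txt exists without helper.txt.
check='! test -f use.txt || test -f helper.txt'

git checkout -q feature
if ! git rebase --exec "$check" main >/dev/null 2>&1; then
    failing=$(git log -1 --format=%s)   # the commit the check stopped on
    echo "restored helper" > helper.txt
    git add helper.txt
    git commit -q --amend --no-edit     # put the fix into that commit
    GIT_EDITOR=true git rebase --continue >/dev/null 2>&1
fi

echo "check first failed at: $failing"
```

This only points at the right commit if every earlier commit passes the check on its own, which is the atomic-commit requirement mentioned above.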

In a lot of cases, I use git-autofixup[0], which looks at changed lines
in the diff to determine which commit to squash the changes into. It's a
great idea, but it has limitations, since sometimes a change logically
attaches to a commit but isn't close to it in the context of the diff
(such as my header inclusion change above), and sometimes I've changed
similar lines in an earlier commit A (where my change should be
squashed) as well as in later commit B, yet it gets attributed to B
accidentally.  It's convenient and generally works well, but I have to
be careful not to trust it entirely.
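
git-autofixup is a third-party tool; the built-in Git mechanism for folding a later change into an earlier commit is `git commit --fixup` followed by `git rebase --autosquash`. The sketch below picks the target commit by hand, which is the step a tool like git-autofixup tries to automate. File names and commit messages are invented for illustration.

```shell
#!/bin/sh
# Sketch only: fold a correction into an earlier commit with a fixup commit.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b feature
echo "alpha" > a.txt && git add a.txt && git commit -q -m "add a"
target=$(git rev-parse HEAD)            # the commit the fix belongs to
echo "beta" > b.txt && git add b.txt && git commit -q -m "add b"

# A later correction that logically belongs to "add a".
echo "alpha, corrected" > a.txt
git commit -q -a --fixup="$target"      # creates a "fixup! add a" commit

# --autosquash moves the fixup next to its target and folds it in.
# GIT_SEQUENCE_EDITOR=true accepts the generated todo list unchanged.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main >/dev/null 2>&1

commit_count=$(git rev-list --count main..feature)
a_in_first_commit=$(git show HEAD~1:a.txt)
echo "commits: $commit_count, a.txt in first commit: $a_in_first_commit"
```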

> I would be very interested in whether the community sees this worth
> exploring as an **experimental rebase strategy** (something like
> `git rebase --squash-restore`), or if it overlaps with existing
> mechanisms I am not aware of.

I think the fact that it adds a dependency on tree-sitter is kind of a
non-starter for us.  We're currently having a mildly contentious
discussion over the use of Rust in Git and it won't be mandatory until
Git 3.0.

It certainly covers some use cases, but it looks like it only handles
deletions, file renames, and symbol renames, and the latter only for a
small set of file types.  There's a lot of cases that it doesn't handle,
and I'm afraid that people will end up disappointed by expecting it to
magically handle a lot more than it does.  That doesn't mean it isn't a
useful and valuable tool for many use cases, but users may have much
higher expectations that it will magically solve their problems when
that isn't the case.  I have some tools that I've written myself that
are like that, for instance.

I'll also point out that we try to avoid hard-coding known languages.
People use lots of languages with Git, some of which are not even known
to Vim, which has an exceptionally complete list of file types and
syntax highlighting.  People also do non-code things with Git; for
instance, I store my creative writing in a repository, and other people
use Git for working on documentation or technical books.  As much as
possible, we want tooling that works for at least generic text files and
ideally, all files.
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA

* Re: [RFC] git-rebase-clean: mitigating a “semantic conflict cascade” during rebase
  2025-09-18 22:22     ` brian m. carlson
@ 2025-09-19 21:14       ` Antonio Mennillo
  2025-09-20 19:13         ` Elijah Newren
  0 siblings, 1 reply; 8+ messages in thread
From: Antonio Mennillo @ 2025-09-19 21:14 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git

Hi Brian,

Thank you again for the feedback; it gave me a fundamental piece of
the puzzle about Git's design constraints that I hadn't fully considered.

You're right about tree-sitter being inappropriate for Git core. That
approach is too narrow, covering only specific languages. My v2.0.0
already prioritizes Git's existing plumbing to track content changes
universally, with tree-sitter as an optional enhancement. This stays
content-agnostic and removes what I would call a source of
nondeterminism and heuristic complexity that is better avoided.

To summarize the mechanism:

1) Squash the feature branch into a single commit: temporarily
compressing the branch makes all internal semantic dependencies
visible at once, avoiding cascade conflicts during step-by-step rebase
with non-atomic commits.

2) Resolve conflicts once with full context: the semantic issues are
"suspended" in this unified state where everything is coherent.

3) Restore original commits by replaying saved diffs: using a
hash-based approach with git's native diff/apply machinery, the tool
remains file-type agnostic. The restored commits may still break
individually (e.g., won't compile in isolation), but the branch is now
rebased with granular history preserved.

To address your header file example specifically: when common.h no
longer includes stdlib.h after rebasing onto main, you would indeed
need to add #include <stdlib.h> during conflict resolution. With my
tool, you'd add this include once while resolving the squashed
commit's conflicts. The fix then persists through all restored commits
- so instead of potentially fixing the same missing header multiple
times across commits during a normal rebase, you handle it once.
You're correct that the tool cannot determine which specific commit
should ideally contain this fix (that would require compilation
testing), but it prevents the cascading resolution burden while
preserving the granular history.

The key point: this doesn't fix non-atomic commits. The semantic
problems remain if you examine individual commits. But it prevents
these issues from causing redundant conflict resolution during rebase
by compressing the timeline of the branch into a single instant "t"
where all the non-atomic semantic misalignments appear simultaneously.

The tool trades ideal per-commit correctness for efficiency in
real-world scenarios where atomic commits aren't always achievable
(junior developers, emergency fixes, legacy codebases).

Regarding expectations: documentation will be clear that this only
reduces redundant conflict resolution. It doesn't make commits atomic
or understand semantics. It manages semantic inconsistency by
compression, in the sense of reduced dimensionality. The defects remain in
history but no longer interfere with rebasing.

I apologize if the code evolution seems rushed - v2.0.0 was developed
quickly to establish a working implementation of the core concept. For
v3.0.0, I plan to remove tree-sitter entirely to achieve complete
language-agnosticism. I'll also need to properly evaluate edge cases
like submodules and merge commits through integration testing, though
the squash operation should hopefully be quite independent from these
structures. My testing focus will be on ensuring determinism, proper
rollback capabilities, and full idempotency across all supported
scenarios.

The current implementation is available at
https://github.com/anthem87/clean-rebase/tree/v2.0.0 if you'd like to
explore the approach.

Thanks again for your patience and attention.

Best regards,
Antonio

* Re: [RFC] git-rebase-clean: mitigating a “semantic conflict cascade” during rebase
  2025-09-19 21:14       ` Antonio Mennillo
@ 2025-09-20 19:13         ` Elijah Newren
  2025-09-21 14:06           ` Antonio Mennillo
  0 siblings, 1 reply; 8+ messages in thread
From: Elijah Newren @ 2025-09-20 19:13 UTC (permalink / raw)
  To: Antonio Mennillo; +Cc: brian m. carlson, git

Hi,

On Fri, Sep 19, 2025 at 2:17 PM Antonio Mennillo
<antoniomennillo87@gmail.com> wrote:
>
> Hi Brian,
>
> Thank you again for the feedback; it gave me a fundamental piece of
> the puzzle about Git's design constraints that I hadn't fully considered.
>
> You're right about tree-sitter being inappropriate for Git core. That
> approach is too narrow, covering only specific languages. My v2.0.0
> already prioritizes Git's existing plumbing to track content changes
> universally, with tree-sitter as an optional enhancement. This stays
> content-agnostic and removes what I would call a source of
> nondeterminism and heuristic complexity that is better avoided.
>
> To summarize the mechanism:
>
> 1) Squash the feature branch into a single commit: temporarily
> compressing the branch makes all internal semantic dependencies
> visible at once, avoiding cascade conflicts during step-by-step rebase
> with non-atomic commits.
>
> 2) Resolve conflicts once with full context: the semantic issues are
> "suspended" in this unified state where everything is coherent.
>
> 3) Restore original commits by replaying saved diffs: using a
> hash-based approach with git's native diff/apply machinery, the tool
> remains file-type agnostic. The restored commits may still break
> individually (e.g., won't compile in isolation), but the branch is now
> rebased with granular history preserved.

I don't understand how this is possible, other than in trivial cases.
Lots of questions:

How exactly is replaying the saved diffs supposed to work?  Replaying
those diffs would run into conflicts, otherwise you wouldn't have been
trying to search for a modified strategy in the first place.  How do
you resolve those?  Are you assuming all conflicts are orthogonal,
i.e. that the conflicts from patches A and B are never on overlapping
or even nearby lines to each other?  In such a case, there wouldn't be
any "cascade of conflicts" to resolve anyway, so the approach would be
unnecessary.  And if this isn't true, then I don't see how you can
resolve all the conflicts at once and magically split them out into
separate commits and replay things in a way that doesn't have
conflicts -- at least not without further human intervention.
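
The concern about replaying saved diffs can be reproduced in a few lines. This is a minimal sketch, assuming the saved diff is a plain `git diff` of one feature commit; how the tool actually stores and replays diffs is not described in this thread, and the file contents are invented.

```shell
#!/bin/sh
# Sketch only: a diff saved from the old base no longer applies once the
# new base has changed the same line.
set -e
repo=$(mktemp -d)
patch=$(mktemp)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

printf 'line1\nline2\nline3\n' > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b feature
printf 'line1\nfeature-a\nline3\n' > file.txt
git commit -q -am "feature: first attempt"

# Save the feature commit as a plain diff against its parent.
git diff HEAD~1 HEAD > "$patch"

# main rewrites the same line after the fork point.
git checkout -q main
printf 'line1\nmain-change\nline3\n' > file.txt
git commit -q -am "main: conflicting change"

# Ask git whether the saved diff would apply to the new base.
if git apply --check "$patch" 2>/dev/null; then
    applies=yes
else
    applies=no
fi
echo "saved diff applies to new main: $applies"
```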

And that was focusing on content conflicts.  What do you do with
non-content conflicts, such as file mode differences, file/directory
conflicts, symlink/submodule conflicts, rename conflicts, directory
rename conflicts, etc.?

But going back to content conflicts, are you somehow sidestepping the
need to assume orthogonality?  Perhaps when you attempt to replay and
hit a conflict, you just take the end-result for that hunk range for
the first patch that touches it, and for any other patch that touches
that hunk range you simply discard the changes for those since they've
been squashed into the first patch?  Or something similar?

As another way to look at this problem, what about cases where
individual commits being replayed will conflict, but merging the end
result has no conflicts?  (In other words, only the intermediate
states had a conflict of some type -- a simple example of this is
someone making a change and then shortly later reverting it.)  In such
a case, by resolving conflicts only at the end, there was no
resolution of the conflicts experienced by those intermediate states.
What do you do there?
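
The change-then-revert case can be reproduced as follows. This is a minimal sketch with invented file contents: the feature branch has no net change relative to the fork point, yet a commit-by-commit rebase still stops on a conflict.

```shell
#!/bin/sh
# Sketch only: intermediate commits conflict although the end result does not.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

printf 'line1\nline2\nline3\n' > file.txt
git add file.txt
git commit -q -m "base"

# Feature branch: make a change, then revert it.
git checkout -q -b feature
printf 'line1\nexperiment\nline3\n' > file.txt
git commit -q -am "try experiment"
git revert --no-edit HEAD >/dev/null

# main changes the same line.
git checkout -q main
printf 'line1\nmain-change\nline3\n' > file.txt
git commit -q -am "main change"

# Net effect of the feature branch relative to the fork point: nothing.
if git diff --quiet "$(git merge-base main feature)" feature; then
    net_change=no
else
    net_change=yes
fi

# Replaying commit by commit still stops on the first commit.
git checkout -q feature
if git rebase main >/dev/null 2>&1; then
    rebase_stopped=no
else
    rebase_stopped=yes
    git rebase --abort
fi
echo "net change: $net_change, rebase stopped on a conflict: $rebase_stopped"
```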

What about cases like
https://git-scm.com/docs/gitfaq#Documentation/gitfaq.txt-IfImakeachangeontwobranchesbutrevertitononewhydoesthemergeofthosebranchesincludethechange
(both sides of history make a change, and one side also reverts that
change) -- by merging the squashed result, you'd end up with the
change being applied, but when you try to split it out into separate
commits, you have "apply the change" followed by "revert the change"
-- how does that result in the change being applied in the end result?
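
The FAQ scenario can be checked in a throwaway repository. This is a minimal sketch with invented file contents: both branches make the same change, the feature branch then reverts it, and the merge result still contains the change.

```shell
#!/bin/sh
# Sketch only: a three-way merge keeps a change that one side reverted.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

printf 'line1\nline2\nline3\n' > file.txt
git add file.txt
git commit -q -m "base"

# Feature branch: make the change, then revert it.
git checkout -q -b feature
printf 'line1\nchanged\nline3\n' > file.txt
git commit -q -am "make the change"
git revert --no-edit HEAD >/dev/null

# main makes the same change and keeps it.
git checkout -q main
printf 'line1\nchanged\nline3\n' > file.txt
git commit -q -am "make the same change"

# The merge only compares base, main tip, and feature tip. The feature tip
# equals the base, so the change made on main wins without any conflict.
git merge -q --no-edit feature >/dev/null 2>&1

merged_line=$(sed -n 2p file.txt)
echo "line 2 after merge: $merged_line"
```

Replaying the feature commits one by one on top of this result would instead end with the revert applied last, which is the discrepancy the question above is about.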

Are there cases where your algorithm feels indistinguishable from
squashing all the commits, having the user resolve conflicts, and then
randomly assigning the hunks to a completely new set of commits
(beyond the obvious differences in commit messages and perhaps number
of commits)?  Every algorithm I can think of for automatically playing
existing commits while only resolving conflicts at the end would seem
to sometimes suffer from this kind of downside, but it's entirely
possible I'm only thinking of a few brute force algorithms and I'm
missing something clever that you might be doing.

> To address your header file example specifically: when common.h no
> longer includes stdlib.h after rebasing onto main, you would indeed
> need to add #include <stdlib.h> during conflict resolution. With my
> tool, you'd add this include once while resolving the squashed
> commit's conflicts. The fix then persists through all restored commits
> - so instead of potentially fixing the same missing header multiple
> times across commits during a normal rebase, you handle it once.
> You're correct that the tool cannot determine which specific commit
> should ideally contain this fix (that would require compilation
> testing), but it prevents the cascading resolution burden while
> preserving the granular history.

I think there may have been a misunderstanding here.  The purpose
wasn't granular history, it was meaningful logical commits; those tend
to come in granular sizes, but granularity was more of a side-effect
rather than the purpose. Having a disfigured granular history is quite
different than meaningful logical commits.

> The key point: this doesn't fix non-atomic commits. The semantic
> problems remain if you examine individual commits. But it prevents
> these issues from causing redundant conflict resolution during rebase
> by compressing the timeline of the branch into a single instant "t"
> where all the non-atomic semantic misalignments appear simultaneously.
>
> The tool trades ideal per-commit correctness for efficiency in
> real-world scenarios where atomic commits aren't always achievable
> (junior developers, emergency fixes, legacy codebases).

I'm pretty biased here, but let me take exception to your wording
that these are not achievable cases.  I'm going to push back on that,
rather strongly.  In fact, I'd go further and say that legacy
codebases are where logical small commits make the *most* sense
(sometimes with small new projects the "why" and the "what" you're
doing is obvious and so there's not much to write in a commit message,
and anyone can follow along, whereas in a legacy codebase you need to
carefully document why you are making changes and ensure each change
in isolation makes sense so others can figure out why each line of
code does what it does).  In regards to junior developers, letting
junior developers develop bad habits (and even facilitating such) just
leads to them later being senior developers with bad habits; I think
it'd be better to take the time to coach them on proper practices
instead.  And in the remaining case, what you describe sounds to me
like a rushed emergency fix, and I feel like rushed emergency fixes
result in more emergencies later.

I understand that many are reluctant to put in the effort to make
clear, logical commits, but I feel that each of your cases is an
example of unwillingness or lack of training, not an actual
inability.

As noted above, I'm quite biased here, and would rather encourage
people to make clear, logical, easy-to-follow-and-review-years-later
steps that are documented as commits.  I think tools which encourage
or facilitate doing the opposite are things that should be shunned.
(I hate squash-merging in GitHub for this reason, and GitHub's lack of
focus on treating commit messages as important first-class material
for review has long bothered me deeply, as I've commented elsewhere.)
Others are free to disagree with me and pursue their own path, but
that's my position.

> Regarding expectations: documentation will be clear that this only
> reduces redundant conflict resolution. It doesn't make commits atomic
> or understand semantics. It manages semantic inconsistency by
> compression, intended as reduced dimensionality. The defects remain in
> history but no longer interfere with rebasing.
>
> I apologize if the code evolution seems rushed - v2.0.0 was developed
> quickly to establish a working implementation of the core concept. For
> v3.0.0, I plan to remove tree-sitter entirely to achieve complete
> language-agnosticism. I'll also need to properly evaluate edge cases
> like submodules and merge commits through integration testing, though
> the squash operation should hopefully be quite independent from these
> structures. My testing focus will be on ensuring determinism, proper
> rollback capabilities, and full idempotency across all supported
> scenarios.
>
> The current implementation is available at
> https://github.com/anthem87/clean-rebase/tree/v2.0.0 if you'd like to
> explore the approach.
>
> Thanks again for the patience and the attention.

I hope my email doesn't come across as too negative.  I'm a big fan of
exploring and attempting new things.  While I am biased against tools
that work against logical & bisectable changes, and I'm worried your
tool is going to face an awful lot of problematic cases if you
continue with it, hopefully my email provides more context for you and
alerts you to some cases you might need to investigate if you want to
push the tool further.

Even if I am not in the camp that'd be interested in using your tool,
I am very curious if you have clever ideas for any of the special
cases I listed above and how to handle them; sometimes any such ideas
might have ways of being wielded elsewhere.


* Re: [RFC] git-rebase-clean: mitigating a “semantic conflict cascade” during rebase
  2025-09-20 19:13         ` Elijah Newren
@ 2025-09-21 14:06           ` Antonio Mennillo
  2025-09-23 23:33             ` Antonio Mennillo
  0 siblings, 1 reply; 8+ messages in thread
From: Antonio Mennillo @ 2025-09-21 14:06 UTC (permalink / raw)
  To: Elijah Newren; +Cc: brian m. carlson, git

Hi,

Sorry for the earlier duplicate. I realized my email was rejected by
the mailing list for being HTML, and I have no PC available right now,
so I am resending it in plain text.

Thank you again for your detailed and thoughtful response. I apologize
for my delayed reply. I'm currently traveling and away from my
development environment, which is making it difficult to properly
address all the important technical points you've raised.

I don't find your emails negative; quite the opposite. As I said
before, they offer valuable insight and shared experience.

Please expect my detailed response by Wednesday at the latest. I
appreciate your patience and the time you've taken to engage so deeply
with this proposal.

Best regards,
Antonio


On Sat, Sep 20, 2025 at 9:14 PM Elijah Newren
<newren@gmail.com> wrote:
>
> Hi,
>
> On Fri, Sep 19, 2025 at 2:17 PM Antonio Mennillo
> <antoniomennillo87@gmail.com> wrote:
> >
> > Hi Brian,
> >
> > Thank you again for the feedback, it gave me a fundamental piece of
> > puzzle about Git's design constraints that I hadn't fully considered.
> >
> > You're right about tree-sitter being inappropriate for Git core. That
> > approach is too narrow, covering only specific languages. My v2.0.0
> > already prioritizes Git's existing plumbing to track content changes
> > universally, with tree-sitter as an optional enhancement. This stays
> > content-agnostic and removes that which I can define as a source of
> > nondeterminism and heuristic complexity that it's elegant to avoid.
> >
> > To summarize the mechanism:
> >
> > 1) Squash the feature branch into a single commit: temporarily
> > compressing the branch makes all internal semantic dependencies
> > visible at once, avoiding cascade conflicts during step-by-step rebase
> > with non-atomic commits.
> >
> > 2) Resolve conflicts once with full context: the semantic issues are
> > "suspended" in this unified state where everything is coherent.
> >
> > 3) Restore original commits by replaying saved diffs: using a
> > hash-based approach with git's native diff/apply machinery, the tool
> > remains file-type agnostic. The restored commits may still break
> > individually (e.g., won't compile in isolation), but the branch is now
> > rebased with granular history preserved.
>
> I don't understand how this is possible, other than in trivial cases.
> Lots of questions:
>
> How exactly is replaying the saved diffs supposed to work?  Replaying
> those diffs would run into conflicts, otherwise you wouldn't have been
> trying to search for a modified strategy in the first place.  How do
> you resolve those?  Are you assuming all conflicts are orthogonal,
> i.e. that the conflicts from patches A and B are never on overlapping
> or even nearby lines to each other?  In such a case, there wouldn't be
> any "cascade of conflicts" to resolve anyway, so the approach would be
> unnecessary.  And if this isn't true, then I don't see how you can
> resolve all the conflicts at once and magically split them out into
> separate commits and replay things in a way that doesn't have
> conflicts -- at least not without further human intervention.
>
> And that was focusing on content conflicts.  What do you do with
> non-content conflicts, such as file mode differences, file/directory
> conflicts, symlink/submodule conflicts, rename conflict, directory
> rename conflict, etc.?
>
> But going back to content conflicts, are you somehow sidestepping the
> need to assume orthogonality?  Perhaps when you attempt to replay and
> hit a conflict, you just take the end-result for that hunk range for
> the first patch that touches it, and for any other patch that touches
> that hunk range you simply discard the changes for those since they've
> been squashed into the first patch?  Or something similar?
>
> As another way to look at this problem, what about cases where
> individual commits being replayed will conflict, but merging the end
> result has no conflicts?  (In other words, only the intermediate
> states had a conflict of some type -- a simple example of this is
> someone making a change and then shortly later reverting it.)  In such
> a case, by resolving conflicts only at the end, there was no
> resolution of the conflicts experienced by those intermediate states.
> What do you do there?
>
> What about cases like
> https://git-scm.com/docs/gitfaq#Documentation/gitfaq.txt-IfImakeachangeontwobranchesbutrevertitononewhydoesthemergeofthosebranchesincludethechange
> (both sides of history make a change, and one side also reverts that
> change) -- by merging the squashed result, you'd end up with the
> change being applied, but when you try to split it out into separate
> commits, you have "apply the change" followed by "revert the change"
> -- how does that result in the change being applied in the end result?
>
> Are there cases where your algorithm feels indistinguishable from
> squashing all the commits, having the user resolve conflicts, and then
> randomly assigning the hunks to a completely new set of commits
> (beyond the obvious differences in commit messages and perhaps number
> of commits)?  Every algorithm I can think of for automatically playing
> existing commits while only resolving conflicts at the end would seem
> to sometimes suffer from this kind of downside, but it's entirely
> possible I'm only thinking of a few brute force algorithms and I'm
> missing something clever that you might be doing.
>
> > To address your header file example specifically: when common.h no
> > longer includes stdlib.h after rebasing onto main, you would indeed
> > need to add #include <stdlib.h> during conflict resolution. With my
> > tool, you'd add this include once while resolving the squashed
> > commit's conflicts. The fix then persists through all restored commits
> > - so instead of potentially fixing the same missing header multiple
> > times across commits during a normal rebase, you handle it once.
> > You're correct that the tool cannot determine which specific commit
> > should ideally contain this fix (that would require compilation
> > testing), but it prevents the cascading resolution burden while
> > preserving the granular history.
>
> I think there may have been a misunderstanding here.  The purpose
> wasn't granular history, it was meaningful logical commits; those tend
> to come in granular sizes, but granularity was more of a side-effect
> rather than the purpose. Having a disfigured granular history is quite
> different than meaningful logical commits.
>
> > The key point: this doesn't fix non-atomic commits. The semantic
> > problems remain if you examine individual commits. But it prevents
> > these issues from causing redundant conflict resolution during rebase
> > by compressing the timeline of the branch in a single instant "t"
> > where the semantic non-atomic misalignment appears simultaneously.
> >
> > The tool trades ideal per-commit correctness for efficiency in
> > real-world scenarios where atomic commits aren't always achievable
> > (junior developers, emergency fixes, legacy codebases).
>
> I'm pretty biased here, but let me take exception with your wording
> that these are not achievable cases.  I'm going to push back on that,
> rather strongly.  In fact, I'd go further and say that legacy
> codebases are where logical small commits make the *most* sense
> (sometimes with small new projects the "why" and the "what" you're
> doing is obvious and so there's not much to write in a commit message,
> and anyone can follow along, whereas in a legacy codebase you need to
> carefully document why you are making changes and ensure each change
> in isolation makes sense so others can figure out why each line of
> code does what it does).  In regards to junior developers, letting
> junior developers develop bad habits (and even facilitating such) just
> leads to them later being senior developers with bad habits; I think
> it'd be better to take the time to coach them on proper practices
> instead.  And in the remaining case, what you describe sounds to me
> like a rushed emergency fix, and I feel like rushed emergency fixes
> result in more emergencies later.
>
> I understand that many are reluctant to put in the effort to make
> clear, logical commits, but I feel that each of your cases are
> examples of unwillingness or lack of training, not an actual
> inability.
>
> As noted above, I'm quite biased here, and would rather encourage
> people to make clear, logical, easy-to-follow-and-review-years-later
> steps that are documented as commits.  I think tools which encourage
> or facilitate doing the opposite are things that should be shunned.
> (I hate squash-merging in GitHub for this reason, and GitHub's lack of
> focus on treating commit messages as important first-class material
> for review has long bothered me deeply, as I've commented elsewhere.)
> Others are free to disagree with me and pursue their own path, but
> that's my position.
>
> > Regarding expectations: documentation will be clear that this only
> > reduces redundant conflict resolution. It doesn't make commits atomic
> > or understand semantics. It manages semantic inconsistency by
> > compression, intended as reduced dimensionality. The defects remain in
> > history but no longer interfere with rebasing.
> >
> > I apologize if the code evolution seems rushed - v2.0.0 was developed
> > quickly to establish a working implementation of the core concept. For
> > v3.0.0, I plan to remove tree-sitter entirely to achieve complete
> > language-agnosticism. I'll also need to properly evaluate edge cases
> > like submodules and merge commits through integration testing, though
> > the squash operation should hopefully be quite independent from these
> > structures. My testing focus will be on ensuring determinism, proper
> > rollback capabilities, and full idempotency across all supported
> > scenarios.
> >
> > The current implementation is available at
> > https://github.com/anthem87/clean-rebase/tree/v2.0.0 if you'd like to
> > explore the approach.
> >
> > Thanks again for the patience and the attention.
>
> I hope my email doesn't come across as too negative.  I'm a big fan of
> exploring and attempting new things.  While I am biased against tools
> that work against logical & bisectable changes, and I'm worried your
> tool is going to face an awful lot of problematic cases if you
> continue with it, hopefully my email provides more context for you and
> alerts you to some cases you might need to investigate if you want to
> push the tool further.
>
> Even if I am not in the camp that'd be interested in using your tool,
> I am very curious if you have clever ideas for any of the special
> cases I listed above and how to handle them; sometimes any such ideas
> might have ways of being wielded elsewhere.


* Re: [RFC] git-rebase-clean: mitigating a “semantic conflict cascade” during rebase
  2025-09-21 14:06           ` Antonio Mennillo
@ 2025-09-23 23:33             ` Antonio Mennillo
  0 siblings, 0 replies; 8+ messages in thread
From: Antonio Mennillo @ 2025-09-23 23:33 UTC (permalink / raw)
  To: Elijah Newren; +Cc: brian m. carlson, git

Hi Brian, Elijah,

I need to acknowledge that my initial reasoning around this tool
developed quite rapidly, driven both by enthusiasm for sharing results
and by pressure from growing interest in my work network. Now, with
more time to reflect, I can analyze your comments more rationally and
see all of their implications.

The "semantic conflict" example should be considered only as a
limiting use case I've seen in emergency situations. It was not
intended as the definition of the tool; it was my mistake to put it in
the subject line of this thread. The actual purpose is to consolidate
rebase conflicts into a single step (one-pass conflict resolution) via
squash, ideally reducing the number of resolution rounds from N to 1.
For instance, rebasing a 20-commit feature branch with 5 conflict
points currently requires 5 separate git rebase --continue cycles.
This tool consolidates those into a single resolution step, then aims
to reconstruct the commits. On large codebases, such as multi-module
Java projects, a rebase driven from an IDE becomes significantly easier
to handle with one consolidated conflict step rather than dozens,
reducing both cognitive load and the chance of errors.
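
To make the N-to-1 claim concrete, here is a toy Python model of the
example above. It is only a sketch under loud assumptions: a "file" is a
dict of named regions, every commit touches exactly one region, a
conflict is always resolved by keeping the branch side, and the names
merge3, replay and squash are hypothetical (they are not part of git or
of the tool).

```python
def merge3(base, ours, theirs):
    """Three-way merge of one region: returns (value, conflicted)."""
    if ours == theirs or theirs == base:
        return ours, False
    if ours == base:
        return theirs, False
    return theirs, True  # assumption: the user always keeps the branch side


def replay(onto, base, commits):
    """Replay commits (dicts of region -> text) onto `onto`, counting stops."""
    state, parent, stops = dict(onto), dict(base), 0
    for commit in commits:
        conflicted = False
        for region, theirs in commit.items():
            value, clash = merge3(parent[region], state[region], theirs)
            state[region] = value
            conflicted = conflicted or clash
        stops += conflicted  # one `git rebase --continue` cycle per stopped commit
        parent = {**parent, **commit}
    return state, stops


def squash(commits):
    """Collapse the branch into one commit holding its final content."""
    combined = {}
    for commit in commits:
        combined.update(commit)
    return combined


# Hypothetical data: 20 one-region commits, 5 regions also changed upstream.
base = {f"r{i}": "base" for i in range(20)}
upstream = {**base, **{f"r{i}": "upstream" for i in (3, 7, 11, 15, 19)}}
commits = [{f"r{i}": f"feature {i}"} for i in range(20)]

per_commit_state, per_commit_stops = replay(upstream, base, commits)
squashed_state, squashed_stops = replay(upstream, base, [squash(commits)])

print(per_commit_stops, squashed_stops)  # prints "5 1"
```

In this model the squashed run stops once but still has to resolve the
same five regions, which is the sense in which only the number of
iterations is reduced.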

I fully understand and respect the importance of atomic commits. To be
explicit, this tool:

- does not make non-atomic commits atomic
- does not guarantee bisectability
- does not replace developer discipline

It only reduces the number of conflict resolution iterations. Poor
commit atomicity will still strongly affect the number of conflicting
files. The "temporal flattening" I mentioned earlier refers simply to
this consolidation.

From a technical standpoint, I am developing a different approach that
properly addresses the concerns you raised. This new implementation
works directly with Git's internal structures to ensure deterministic
behavior, specifically handling revert detection, intermediate-only
conflicts, non-content conflicts, and attribution at appropriate
granularity levels.
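
The "intermediate-only conflicts" case can be pictured with a toy
Python model. This is a sketch, not the new implementation: one config
line stands in for a file, and merge3 and CONFLICT are hypothetical
names for a simplified three-way merge of a single region.

```python
CONFLICT = object()  # sentinel: both sides changed the region differently


def merge3(base, ours, theirs):
    """Simplified three-way merge of a single region."""
    if ours == theirs or theirs == base:
        return ours
    if ours == base:
        return theirs
    return CONFLICT


merge_base = "timeout = 30"
upstream = "timeout = 60"                  # the new base changed the line
branch = ["timeout = 10", "timeout = 30"]  # commit 1 changes it, commit 2 reverts

# Squash first: the branch's net change is empty, so upstream wins cleanly.
squashed = merge3(merge_base, upstream, branch[-1])

# Replay commit 1 alone: it conflicts, and no resolution was ever recorded.
first = merge3(merge_base, upstream, branch[0])

print(squashed == upstream, first is CONFLICT)  # prints "True True"
```

The squashed merge gives the user nothing to resolve here, so any
attempt to restore commit 1 on the new base has to obtain a resolution
from somewhere else.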

I'm targeting this weekend for a working implementation with
integration test coverage.

The goal remains pragmatic: helping developers handle complex rebases
more efficiently when dealing with real-world constraints. What
becomes easier is avoiding redundant conflict resolution cycles.

The consequences of poor commit atomicity remain unchanged and can
only be addressed by strict commit discipline, as the Git philosophy
rightly points out.

Best regards,
Antonio


Thread overview: 8+ messages
2025-09-16 21:39 [RFC] git-rebase-clean: mitigating a “semantic conflict cascade” during rebase Antonio Mennillo
2025-09-16 22:14 ` brian m. carlson
2025-09-18 17:02   ` Antonio Mennillo
2025-09-18 22:22     ` brian m. carlson
2025-09-19 21:14       ` Antonio Mennillo
2025-09-20 19:13         ` Elijah Newren
2025-09-21 14:06           ` Antonio Mennillo
2025-09-23 23:33             ` Antonio Mennillo
