first-class conflicts?


* first-class conflicts?
@ 2023-11-06 21:17 Sandra Snan
  2023-11-06 22:01 ` Dragan Simic
  2023-11-07  8:16 ` Elijah Newren
  0 siblings, 2 replies; 25+ messages in thread
From: Sandra Snan @ 2023-11-06 21:17 UTC (permalink / raw)
  To: git

Is this feature from jj also a good idea for git?
https://martinvonz.github.io/jj/v0.11.0/conflicts/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: first-class conflicts?
  2023-11-06 21:17 first-class conflicts? Sandra Snan
@ 2023-11-06 22:01 ` Dragan Simic
  2023-11-06 22:34   ` Sandra Snan
  2023-11-06 22:34   ` rsbecker
  2023-11-07  8:16 ` Elijah Newren
  1 sibling, 2 replies; 25+ messages in thread
From: Dragan Simic @ 2023-11-06 22:01 UTC (permalink / raw)
  To: Sandra Snan; +Cc: git

On 2023-11-06 22:17, Sandra Snan wrote:
> Is this feature from jj also a good idea for git?
> https://martinvonz.github.io/jj/v0.11.0/conflicts/

Hmm, that's quite interesting, but frankly it makes little sense to me.  
See, the source code in a repository should always be in a compileable 
or runnable state, in each and every commit, so going against that rule 
wouldn't make much sense.  Just think about various CI/CD tools that 
also expect the same.

* Re: first-class conflicts?
  2023-11-06 22:01 ` Dragan Simic
@ 2023-11-06 22:34   ` Sandra Snan
  2023-11-06 22:34   ` rsbecker
  1 sibling, 0 replies; 25+ messages in thread
From: Sandra Snan @ 2023-11-06 22:34 UTC (permalink / raw)
  To: git

I've sometimes merged stuff in and almost not noticed that I had a conflict 
in there, and in those cases the code wasn't compilable even though I was 
using vanilla git.

* RE: first-class conflicts?
  2023-11-06 22:01 ` Dragan Simic
  2023-11-06 22:34   ` Sandra Snan
@ 2023-11-06 22:34   ` rsbecker
  2023-11-06 22:45     ` Sandra Snan
  1 sibling, 1 reply; 25+ messages in thread
From: rsbecker @ 2023-11-06 22:34 UTC (permalink / raw)
  To: 'Dragan Simic', 'Sandra Snan'; +Cc: git

On November 6, 2023 5:01 PM, Dragan Simic wrote:
>On 2023-11-06 22:17, Sandra Snan wrote:
>> Is this feature from jj also a good idea for git?
>> https://martinvonz.github.io/jj/v0.11.0/conflicts/
>
>Hmm, that's quite interesting, but frankly it makes little sense to me.
>See, the source code in a repository should always be in a compileable or
>runnable state, in each and every commit, so going against that rule wouldn't
>make much sense.  Just think about various CI/CD tools that also expect the
>same.

It seems to me, perhaps naively, that the longer a conflict persists in a
repository, the greater the potential for chaotic results. There are,
notably, at least two fundamental types of conflicts:

1. Content conflict, where a point in a file is modified in two (or n)
branches being combined, is what git tries to ensure never happens. The
longer such a conflict exists in a file, the further the file drifts from a
buildable or consistent state, and the harder the conflict is likely to be
to resolve.

2. Semantic conflicts, where unrelated modification points cause
incompatibilities, are much harder to resolve and quantify - many are, in
fact, undetectable from a computational standpoint (as in detecting general
semantic conflicts is an uncomputable problem). The longer those persist,
partly when they are missed by pull requests/code reviews, the more
persistent a defect can become.

3. I am avoiding matters such as code optimization conflicts which are
outside the scope of the proposal.

In either case, storing conflicts in the integration branches of a
repository is, in my view, a bad thing that eventually can make the
repository unsustainable. I will concede that keeping conflicts around in
non-integration branches may have intellectual value for recording research
and development progress.

This is just my opinion.
Randall

--
Brief whoami: NonStop&UNIX developer since approximately
UNIX(421664400)
NonStop(211288444200000000)
-- In real life, I talk too much.

* Re: RE: first-class conflicts?
  2023-11-06 22:34   ` rsbecker
@ 2023-11-06 22:45     ` Sandra Snan
  2023-11-07  0:50       ` Theodore Ts'o
  2023-11-07 11:23       ` Phillip Wood
  0 siblings, 2 replies; 25+ messages in thread
From: Sandra Snan @ 2023-11-06 22:45 UTC (permalink / raw)
  To: git, Dragan Simic, rsbecker

Randall, thank you for that.

I did mean of the first type, pure content conflicts (just like the examples 
on that jj page).

I have just sometimes wished git could be a little more aware of them beyond 
just storing them with ASCII art in the files themselves (and alerting / 
warning when they happen but I often can't properly see those warnings flash 
by so I end up having to search for the conflict markers manually). So if 
conflicts are a thing that *can* happen, it'd be better if vc could know 
about them which would make some of the rebases simpler as in jj. That doesn't 
mean we wanna adopt the jj workflow of deliberately checking in conflicts 
(not even locally), just be able to deal with them better if it does happen.

I dunno… and I've really appreciated the naysayers so far, helps me sort 
out my thoughts in this. I personally really prefer the vanilla "explicit 
staging" workflow (with magit) over jj, got, gitless etc. I'm more scared 
of overcommitting by mistake than undercommitting. But this one feature 
seemed to me that it might be really good: just having the vc be aware of 
the conflicts it has created.

* Re: first-class conflicts?
  2023-11-06 22:45     ` Sandra Snan
@ 2023-11-07  0:50       ` Theodore Ts'o
  2023-11-11  1:31         ` Junio C Hamano
  2023-11-07 11:23       ` Phillip Wood
  1 sibling, 1 reply; 25+ messages in thread
From: Theodore Ts'o @ 2023-11-07  0:50 UTC (permalink / raw)
  To: Sandra Snan; +Cc: git, Dragan Simic, rsbecker

On Mon, Nov 06, 2023 at 10:45:03PM +0000, Sandra Snan wrote:
> Randall, thank you for that.
> 
> I have just sometimes wished git could be a little more aware of them beyond
> just storing them with ASCII art in the files themselves (and alerting /
> warning when they happen but I often can't properly see those warnings flash
> by so I end up having to search for the conflict markers manually). So if
> conflicts are a thing that *can* happen, it'd be better if vc could know
> about them which would make some of the rebases simpler as in jj. That
> doesn't mean we wanna adopt the jj workflow of deliberately checking in
> conflicts (not even locally), just be able to deal with them better if it
> does happen.

Well, if you miss them, "git status" does show that there are conflicts:

   Unmerged paths:
     (use "git add <file>..." to mark resolution)
           both modified:   version.h

And if you attempt to commit the merge without resolving the
conflicts, git won't let you:

   error: Committing is not possible because you have unmerged files.
   hint: Fix them up in the work tree, and then use 'git add/rm <file>'
   hint: as appropriate to mark resolution and make a commit.

So it's hard to miss the indications of the content conflict, because
if you try to commit without resolving them, it's not a warning, it's
an outright error.

Cheers,

					- Ted
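
As a rough illustration of the check Ted describes, the sketch below
filters the machine-readable form of "git status" for unmerged entries.
The function name unmerged_paths is made up for this example; the
two-letter codes are the unmerged states listed in git-status(1). It
assumes paths without unusual characters (for anything else, the -z
output format would be the safer input).

```shell
# Read "git status --porcelain" (v1) lines on stdin and print only the
# paths whose XY status marks them as unmerged.
# Unmerged codes per git-status(1): DD AU UD UA DU AA UU.
# unmerged_paths is a hypothetical helper name, not a git command.
unmerged_paths() {
    while IFS= read -r line; do
        case "$line" in
            DD\ *|AU\ *|UD\ *|UA\ *|DU\ *|AA\ *|UU\ *)
                # Strip the two status characters and the separating space.
                printf '%s\n' "${line#?? }"
                ;;
        esac
    done
}

# Intended use inside a repository:
#   git status --porcelain | unmerged_paths
```

An empty result would mean no path is still waiting for "git add" to
mark its resolution.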

* Re: first-class conflicts?
  2023-11-07  0:50       ` Theodore Ts'o
@ 2023-11-11  1:31         ` Junio C Hamano
  2023-11-11  7:48           ` Sandra Snan
  2023-11-12 15:21           ` Theodore Ts'o
  0 siblings, 2 replies; 25+ messages in thread
From: Junio C Hamano @ 2023-11-11  1:31 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Sandra Snan, git, Dragan Simic, rsbecker

"Theodore Ts'o" <tytso@mit.edu> writes:

> And if you attempt to commit the merge without resolving the
> conflicts, git won't let you:
>
>    error: Committing is not possible because you have unmerged files.
>    hint: Fix them up in the work tree, and then use 'git add/rm <file>'
>    hint: as appropriate to mark resolution and make a commit.
>
> So it's hard to miss the indications of the content conflict, because
> if you try to commit without resolving them, it's not a warning, it's
> an outright error.

Correct but with a caveat: it is too easy for lazy folks to
circumvent the safety by mistake with "commit -a".

I wonder if it would help users to add a new configuration option
for those who want to live safer that tells "commit -a" to leave
unmerged paths alone and require the unmerged paths to be added
explicitly (which may have to extend to cover things like "add -u"
and "add .").

Perhaps not.  I often find myself doing "git add -u" after resolving
conflicts and re-reading the result, without an explicit pathspec.

* Re: first-class conflicts?
  2023-11-11  1:31         ` Junio C Hamano
@ 2023-11-11  7:48           ` Sandra Snan
  2023-11-12 15:21           ` Theodore Ts'o
  1 sibling, 0 replies; 25+ messages in thread
From: Sandra Snan @ 2023-11-11  7:48 UTC (permalink / raw)
  To: git

Junio C Hamano <gitster@pobox.com> writes:
> Correct but with a caveat: it is too easy for lazy folks to 
> circumvent the safety by mistake with "commit -a". 

Lazy and ignorant like myself because I didn't know -a was that
dangerous. Thank you both!

* Re: first-class conflicts?
  2023-11-11  1:31         ` Junio C Hamano
  2023-11-11  7:48           ` Sandra Snan
@ 2023-11-12 15:21           ` Theodore Ts'o
  2023-11-12 23:25             ` Junio C Hamano
  1 sibling, 1 reply; 25+ messages in thread
From: Theodore Ts'o @ 2023-11-12 15:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sandra Snan, git, Dragan Simic, rsbecker

On Sat, Nov 11, 2023 at 10:31:54AM +0900, Junio C Hamano wrote:
> Correct but with a caveat: it is too easy for lazy folks to
> circumvent the safety by mistake with "commit -a".
> 
> I wonder if it would help users to add a new configuration option
> for those who want to live safer that tells "commit -a" to leave
> unmerged paths alone and require the unmerged paths to be added
> explicitly (which may have to extend to cover things like "add -u"
> and "add .").
> 
> Perhaps not.  I often find myself doing "git add -u" after resolving
> conflicts and re-reading the result, without an explicit pathspec.

Maybe the configuration option would also forbid "git add -u" from
adding diffs with conflict markers unless --force is added?

I dunno.  I personally wouldn't use it myself, because I've always
made a point of running "git diff", or "git status", and almost
always, a command like "make -j16 && make -j16 check" (or an aliased
equivalent) before committing a merge.

But that's because I'm a paranoid s.o.b. and in my long career, I've
learned that "you can't be paranoid enough", and "hope is not a
strategy".  :-)

					- Ted
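
The conflict-marker check discussed here can be approximated outside git
with a plain grep. This is only a heuristic sketch, not what git itself
does: the function names are hypothetical, and the pattern simply looks
for lines that begin with seven '<', '|', '=' or '>' characters in the
shape git writes them.

```shell
# Succeed (exit 0) if the file named by $1 contains a line that looks
# like a git conflict marker. Heuristic only: a file that legitimately
# contains such a line will also match.
has_conflict_markers() {
    grep -E -q '^(<{7} |[|]{7} |={7}$|>{7} )' -- "$1"
}

# Check every file given as an argument, name the offenders on stderr,
# and return non-zero if any were found.
report_conflict_markers() {
    found=0
    for f in "$@"; do
        if has_conflict_markers "$f"; then
            printf 'conflict markers in %s\n' "$f" >&2
            found=1
        fi
    done
    return "$found"
}

# Possible use before staging (file names are placeholders):
#   report_conflict_markers version.h main.c && git add version.h main.c
```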

* Re: first-class conflicts?
  2023-11-12 15:21           ` Theodore Ts'o
@ 2023-11-12 23:25             ` Junio C Hamano
  0 siblings, 0 replies; 25+ messages in thread
From: Junio C Hamano @ 2023-11-12 23:25 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Sandra Snan, git, Dragan Simic, rsbecker

"Theodore Ts'o" <tytso@mit.edu> writes:

> On Sat, Nov 11, 2023 at 10:31:54AM +0900, Junio C Hamano wrote:
>> ... 
>> I wonder if it would help users to add a new configuration option
>> for those who want to live safer that tells "commit -a" to leave
>> unmerged paths alone and require the unmerged paths to be added
>> explicitly (which may have to extend to cover things like "add -u"
>> and "add .").
>> 
>> Perhaps not.  I often find myself doing "git add -u" after resolving
>> conflicts and re-reading the result, without an explicit pathspec.
>
> Maybe the configuration option would also forbid "git add -u" from
> adding diffs with conflict markers unless --force is added?

Historically we left it to pre-commit hooks, but I agree that
protection at the time of "git add" may be more helpful.

I also alluded to being careful about "git add" with an overly vague
pathspec (like "."  to add everything addable under the sun), but I
do not think it is possible to define "overly vague" in a way that
satisfies everybody (would "git add \*.h" be still overly vague when
5% of your header files have conflicts in the merge you are
concluding?) and keep the new users safe.

Unless the configuration forbids patterns and say "each and every
individual path must be named to add and resolve conflicted paths",
that is.  Come to think of it, that may not be too bad.

> I dunno.  I personally wouldn't use it myself, because I've always
> made a point of running "git diff", or "git status", and almost
> always, a command like "make -j16 && make -j16 check" (or an aliased
> equivalent) before committing a merge.
>
> But that's because I'm a paranoid s.o.b. and in my long career, I've
> learned that "you can't be paranoid enough", and "hope is not a
> strategy".  :-)

Being careful and paranoid is good ;-) I wouldn't use it myself,
either, but the discussion started while trying to allay new users'
worries about recording a half-resolved state by mistake, and in
that context, I think it would have non-empty audiences.

Thanks.

* Re: first-class conflicts?
  2023-11-06 22:45     ` Sandra Snan
  2023-11-07  0:50       ` Theodore Ts'o
@ 2023-11-07 11:23       ` Phillip Wood
  2023-11-07 11:24         ` Sandra Snan
  1 sibling, 1 reply; 25+ messages in thread
From: Phillip Wood @ 2023-11-07 11:23 UTC (permalink / raw)
  To: Sandra Snan, git, Dragan Simic, rsbecker

Hi Sandra

On 06/11/2023 22:45, Sandra Snan wrote:
> Randall, thank you for that.
> 
> I did mean of the first type, pure content conflicts (just like the 
> examples on that jj page).
> 
> I have just sometimes wished git could be a little more aware of them 
> beyond just storing them with ASCII art in the files themselves (and 
> alerting / warning when they happen but I often can't properly see those 
> warnings flash by so I end up having to search for the conflict markers 
> manually). So if conflicts are a thing that *can* happen, it'd be better 
> if vc could know about them which would make some of the rebases simpler 
> as in jj. That doesn't mean we wanna adopt the jj workflow of 
> deliberately checking in conflicts (not even locally), just be able to 
> deal with them better if it does happen.
> 
> I dunno… and I've really appreciated the naysayers so far, helps me sort 
> out my thoughts in this. I personally really prefer the vanilla 
> "explicit staging" workflow (with magit) over jj, got, gitless etc. I'm 
> more scared of overcommitting by mistake than undercommitting. But this 
> one feature seemed to me that it might be really good: just having the 
> vc be aware of the conflicts it has created.

If you run "git status" it will list the files that have conflicts as 
"unmerged". To prevent "git commit" from creating a commit that contains 
conflict markers you can use a pre-commit hook that runs "git diff 
--cached --check". The sample hook that is created by default does this, 
to activate it run

	mv .git/hooks/pre-commit.sample .git/hooks/pre-commit

in the main worktree. You can also run "git config commit.verbose true" 
to make "git commit" show the diff of the changes that will be committed 
below the commit message when you're editing the message.

Best Wishes

Phillip
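
A hand-written hook along the same lines might look like the sketch
below. It is not the shipped sample hook; precommit_check and the GIT
variable are names invented for this example (the variable only exists
so the command can be swapped out when experimenting).

```shell
# Command used to invoke git; overridable, which is an assumption of
# this sketch rather than anything git requires.
GIT=${GIT:-git}

# Fail if the staged changes introduce conflict markers or whitespace
# errors, which is what "git diff --cached --check" reports.
precommit_check() {
    if ! "$GIT" diff --cached --check; then
        echo "pre-commit: staged changes contain conflict markers or whitespace errors" >&2
        return 1
    fi
}

# In an actual .git/hooks/pre-commit file (which must be executable),
# the last line would be:
#   precommit_check || exit 1
```

A non-zero exit from the hook makes "git commit" stop before the commit
is created, unless the hook is bypassed with --no-verify.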

* Re: RE: first-class conflicts?
  2023-11-07 11:23       ` Phillip Wood
@ 2023-11-07 11:24         ` Sandra Snan
  0 siblings, 0 replies; 25+ messages in thread
From: Sandra Snan @ 2023-11-07 11:24 UTC (permalink / raw)
  To: git, Dragan Simic, rsbecker

That is wonderful! Thank you so much, Phillip! 👍🏻


* Re: first-class conflicts?
  2023-11-06 21:17 first-class conflicts? Sandra Snan
  2023-11-06 22:01 ` Dragan Simic
@ 2023-11-07  8:16 ` Elijah Newren
  2023-11-07  8:21   ` Dragan Simic
                     ` (2 more replies)
  1 sibling, 3 replies; 25+ messages in thread
From: Elijah Newren @ 2023-11-07  8:16 UTC (permalink / raw)
  To: Sandra Snan; +Cc: git

On Mon, Nov 6, 2023 at 1:26 PM Sandra Snan
<sandra.snan@idiomdrottning.org> wrote:
>
> Is this feature from jj also a good idea for git?
> https://martinvonz.github.io/jj/v0.11.0/conflicts/

Martin talked about this and other features at Git Merge 2022, a
little over a year ago.  I talked to him in more depth about these
while there.  I personally think he has some really interesting
features here, though at the time, I thought that the additional
object type might be too much to ask for in a Git change, and it was
an intrinsic part of the implementation back then.

Martin also gave us an update at the 2023 Git Contributors summit, and
in particular noted a significant implementation change to not have
per-file storage of conflicts, but rather storing at the commit level
the multiple conflicting trees involved.  That model might be
something we could implement in Git.  And if we did, it'd solve
various issues such as people wanting to be able to stash conflicts,
or wanting to be able to partially resolve conflicts and fix it up
later, or be able to collaboratively resolve conflicts without having
everyone have access to the same checkout.

But we'd also have to be careful and think through usecases, including
in the surrounding community.  People would probably want to ensure
that e.g. "Protected" or "Integration" branches don't accept
fetches or pushes of conflicted commits, git status would probably
need some special warnings or notices, git checkout would probably
benefit from additional warnings/notices checks for those cases, git
log should probably display conflicted commits differently, we'd need
to add special handling for higher order conflicts (e.g. a merge with
conflicts is itself involved in a merge) probably similar to what jj
has done, and audit a lot of other code paths to see what would be
needed.

I think it'd be really interesting to at least investigate, but it'd
also be a lot of work, and I already have several other things I've
been wanting to get back to for over a year and haven't succeeded in
generating more time for Git.

Anyway, just my $0.02.
Elijah

* Re: first-class conflicts?
  2023-11-07  8:16 ` Elijah Newren
@ 2023-11-07  8:21   ` Dragan Simic
  2023-11-07  9:16   ` Sandra Snan
  2023-11-07 11:49   ` Phillip Wood
  2 siblings, 0 replies; 25+ messages in thread
From: Dragan Simic @ 2023-11-07  8:21 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Sandra Snan, git

On 2023-11-07 09:16, Elijah Newren wrote:
> But we'd also have to be careful and think through usecases, including
> in the surrounding community.  People would probably want to ensure
> that e.g. "Protected" or "Integration" branches don't accept
> fetches or pushes of conflicted commits, git status would probably
> need some special warnings or notices, git checkout would probably
> benefit from additional warnings/notices checks for those cases, git
> log should probably display conflicted commits differently, we'd need
> to add special handling for higher order conflicts (e.g. a merge with
> conflicts is itself involved in a merge) probably similar to what jj
> has done, and audit a lot of other code paths to see what would be
> needed.

That would be a truly _massive_ project.

* Re: first-class conflicts?
  2023-11-07  8:16 ` Elijah Newren
  2023-11-07  8:21   ` Dragan Simic
@ 2023-11-07  9:16   ` Sandra Snan
  2023-11-07 11:49   ` Phillip Wood
  2 siblings, 0 replies; 25+ messages in thread
From: Sandra Snan @ 2023-11-07  9:16 UTC (permalink / raw)
  To: git

Elijah Newren <newren@gmail.com> writes:
> Martin talked about this and other features at Git Merge 2022, a 
> little over a year ago.

That is something I should've checked or searched for before 
starting this thread, in hindsight. Thank you, Elijah, for letting 
me know that.

> And if we did, it'd solve various issues such as people wanting 
> to be able to stash conflicts, or wanting to be able to 
> partially resolve conflicts and fix it up later, or be able to 
> collaboratively resolve conflicts without having everyone have 
> access to the same checkout. 

One feature I would really like and maybe vanilla git can already 
do this today and I just don't know how, but just becoming more 
aware of conflicts, of when there's a conflict in the commit.

> git status would probably need some special warnings or notices, 
> git checkout would probably benefit from additional 
> warnings/notices checks for those cases, git log should probably 
> display conflicted commits differently

That's exactly what I dream of! I wouldn't wanna commit conflicts 
deliberately, just that I'm paranoid that I might have some failed 
merges and three way diffs in code that I missed when they flashed 
by on the screen.

> it'd also be a lot of work

That is for sure. And don't get me wrong, it's not a feature I
personally really need or am clamoring for. Thank you so much for the
thoughtful explanation.

* Re: first-class conflicts?
  2023-11-07  8:16 ` Elijah Newren
  2023-11-07  8:21   ` Dragan Simic
  2023-11-07  9:16   ` Sandra Snan
@ 2023-11-07 11:49   ` Phillip Wood
  2023-11-07 17:38     ` Martin von Zweigbergk
  2023-11-08  6:31     ` Elijah Newren
  2 siblings, 2 replies; 25+ messages in thread
From: Phillip Wood @ 2023-11-07 11:49 UTC (permalink / raw)
  To: Elijah Newren, Sandra Snan; +Cc: git, Martin von Zweigbergk, Randall S. Becker

Hi Elijah

[I've cc'd Martin to see if he has anything to add about how "jj" 
manages the issues around storing conflicts.]

On 07/11/2023 08:16, Elijah Newren wrote:
> On Mon, Nov 6, 2023 at 1:26 PM Sandra Snan
> <sandra.snan@idiomdrottning.org> wrote:
>>
>> Is this feature from jj also a good idea for git?
>> https://martinvonz.github.io/jj/v0.11.0/conflicts/
> 
> Martin talked about this and other features at Git Merge 2022, a
> little over a year ago.  I talked to him in more depth about these
> while there.  I personally think he has some really interesting
> features here, though at the time, I thought that the additional
> object type might be too much to ask for in a Git change, and it was
> an intrinsic part of the implementation back then.
> 
> Martin also gave us an update at the 2023 Git Contributors summit, and
> in particular noted a significant implementation change to not have
> per-file storage of conflicts, but rather storing at the commit level
> the multiple conflicting trees involved.  That model might be
> something we could implement in Git.  And if we did, it'd solve
> various issues such as people wanting to be able to stash conflicts,
> or wanting to be able to partially resolve conflicts and fix it up
> later, or be able to collaboratively resolve conflicts without having
> everyone have access to the same checkout.

One thing to think about if we ever want to implement this is what other 
data we need to store along with the conflict trees to preserve the 
context in which the conflict was created. For example the files that 
are read by "git commit" when it commits a conflict resolution. For a 
single cherry-pick/revert it would probably be fairly straight forward 
to store CHERRY_PICK_HEAD/REVERT_HEAD and add it as a parent so it gets 
transferred along with the conflicts. For a sequence of cherry-picks or 
a rebase it is more complicated to preserve the context of the conflict. 
Even "git merge" can create several files in addition to MERGE_HEAD 
which are read when the conflict resolution is committed.

> But we'd also have to be careful and think through usecases, including
> in the surrounding community.  People would probably want to ensure
> that e.g. "Protected" or "Integration" branches don't accept
> fetches or pushes of conflicted commits,

I think this is a really important point, while it can be useful to 
share conflicts so they can be collaboratively resolved we don't want to 
propagate them into "stable" or production branches. I wonder how 'jj' 
handles this.

> git status would probably
> need some special warnings or notices, git checkout would probably
> benefit from additional warnings/notices checks for those cases, git
> log should probably display conflicted commits differently, we'd need
> to add special handling for higher order conflicts (e.g. a merge with
> conflicts is itself involved in a merge) probably similar to what jj
> has done, and audit a lot of other code paths to see what would be
> needed.

As you point out there is a lot more to this than just being able to 
store the conflict data in a commit - in many ways I think that is the 
easiest part of the solution to sharing conflicts.

Best Wishes

Phillip
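
The state files mentioned above (MERGE_HEAD, CHERRY_PICK_HEAD,
REVERT_HEAD, and the rebase state directories) live in the git directory
today, which is why that context does not travel with a commit. A
minimal sketch of inspecting them follows; pending_operation is a
hypothetical helper, and it only looks at the presence of these entries,
not at their contents.

```shell
# Report which interrupted operation, if any, a git directory records.
# $1 is the path to the git directory (typically ".git").
pending_operation() {
    gitdir=$1
    if [ -d "$gitdir/rebase-merge" ]; then
        echo rebase
    elif [ -d "$gitdir/rebase-apply" ]; then
        # This directory is shared by the apply rebase backend and "git am".
        echo rebase-or-am
    elif [ -f "$gitdir/CHERRY_PICK_HEAD" ]; then
        echo cherry-pick
    elif [ -f "$gitdir/REVERT_HEAD" ]; then
        echo revert
    elif [ -f "$gitdir/MERGE_HEAD" ]; then
        echo merge
    else
        echo none
    fi
}

# Possible use from the top of the main worktree:
#   pending_operation .git
```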


* Re: first-class conflicts?
  2023-11-07 11:49   ` Phillip Wood
@ 2023-11-07 17:38     ` Martin von Zweigbergk
  2023-11-08  7:31       ` Elijah Newren
  2023-11-09 14:50       ` phillip.wood123
  2023-11-08  6:31     ` Elijah Newren
  1 sibling, 2 replies; 25+ messages in thread
From: Martin von Zweigbergk @ 2023-11-07 17:38 UTC (permalink / raw)
  To: phillip.wood; +Cc: Elijah Newren, Sandra Snan, git, Randall S. Becker

(new attempt in plain text)

On Tue, Nov 7, 2023 at 3:49 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
>
> Hi Elijah
>
> [I've cc'd Martin to see if he has anything to add about how "jj"
> manages the issues around storing conflicts.]
>
> On 07/11/2023 08:16, Elijah Newren wrote:
> > On Mon, Nov 6, 2023 at 1:26 PM Sandra Snan
> > <sandra.snan@idiomdrottning.org> wrote:
> >>
> >> Is this feature from jj also a good idea for git?
> >> https://martinvonz.github.io/jj/v0.11.0/conflicts/
> >
> > Martin talked about this and other features at Git Merge 2022, a
> > little over a year ago.  I talked to him in more depth about these
> > while there.  I personally think he has some really interesting
> > features here, though at the time, I thought that the additional
> > object type might be too much to ask for in a Git change, and it was
> > an intrinsic part of the implementation back then.
> >
> > Martin also gave us an update at the 2023 Git Contributors summit, and
> > in particular noted a significant implementation change to not have
> > per-file storage of conflicts, but rather storing at the commit level
> > the multiple conflicting trees involved.  That model might be
> > something we could implement in Git.  And if we did, it'd solve
> > various issues such as people wanting to be able to stash conflicts,
> > or wanting to be able to partially resolve conflicts and fix it up
> > later, or be able to collaboratively resolve conflicts without having
> > everyone have access to the same checkout.
>
> One thing to think about if we ever want to implement this is what other
> data we need to store along with the conflict trees to preserve the
> context in which the conflict was created. For example the files that
> are read by "git commit" when it commits a conflict resolution. For a
> single cherry-pick/revert it would probably be fairly straight forward
> to store CHERRY_PICK_HEAD/REVERT_HEAD and add it as a parent so it gets
> transferred along with the conflicts. For a sequence of cherry-picks or
> a rebase it is more complicated to preserve the context of the conflict.
> Even "git merge" can create several files in addition to MERGE_HEAD
> which are read when the conflict resolution is committed.

Good point. We actually don't store any extra data in jj. The old
per-path conflict model was prepared for having some label associated
with each term of the conflict but we never actually used it.

If we add such metadata, it would probably have to be something that
makes sense even after pushing the conflict to another repo, so it
probably shouldn't be commit ids, unless we made sure to also push
those commits. Also note that if you `jj restore --from <commit with
conflict>`, you can get a conflict into a commit that didn't have
conflicts previously. Or if you already had conflicts in the
destination commit, your root trees (the multiple root trees
constituting the conflict) will now have conflicts that potentially
were created by two completely unrelated operations, so you would kind
of need different labels for different paths.

https://github.com/martinvonz/jj/issues/1176 has some more discussion
about this.

> > But we'd also have to be careful and think through usecases, including
> > in the surrounding community.  People would probably want to ensure
> > that e.g. "Protected" or "Integration" branches don't accept
> > fetches or pushes of conflicted commits,
>
> I think this is a really important point, while it can be useful to
> share conflicts so they can be collaboratively resolved we don't want to
> propagate them into "stable" or production branches. I wonder how 'jj'
> handles this.

Agreed. `jj git push` refuses to push commits with conflicts, because
it's very unlikely that the remote will be able to make any sense of
it. Our commit backend at Google does support conflicts, so users can
check out each other's conflicted commits there (except that we
haven't even started dogfooding yet).

> > git status would probably
> > need some special warnings or notices, git checkout would probably
> > benefit from additional warnings/notices checks for those cases, git
> > log should probably display conflicted commits differently, we'd need
> > to add special handling for higher order conflicts (e.g. a merge with
> > conflicts is itself involved in a merge) probably similar to what jj
> > has done, and audit a lot of other code paths to see what would be
> > needed.
>
> As you point out there is a lot more to this than just being able to
> store the conflict data in a commit - in many ways I think that is the
> easiest part of the solution to sharing conflicts.

Yes, I think it would be a very large project. Unlike jj, Git of
course has to worry about backwards compatibility. For example, you
would have to decide if your goal - even in the long term - is to make
`git rebase` etc. not get interrupted due to conflicts.

* Re: first-class conflicts?
  2023-11-07 17:38     ` Martin von Zweigbergk
@ 2023-11-08  7:31       ` Elijah Newren
  2023-11-08 18:22         ` Martin von Zweigbergk
  2023-11-09 14:50       ` phillip.wood123
  1 sibling, 1 reply; 25+ messages in thread
From: Elijah Newren @ 2023-11-08  7:31 UTC (permalink / raw)
  To: Martin von Zweigbergk; +Cc: phillip.wood, Sandra Snan, git, Randall S. Becker

Hi Martin,

On Tue, Nov 7, 2023 at 9:38 AM Martin von Zweigbergk
<martinvonz@google.com> wrote:
>
[...]
> > One thing to think about if we ever want to implement this is what other
> > data we need to store along with the conflict trees to preserve the
> > context in which the conflict was created. For example the files that
> > are read by "git commit" when it commits a conflict resolution. For a
> > single cherry-pick/revert it would probably be fairly straight forward
> > to store CHERRY_PICK_HEAD/REVERT_HEAD and add it as a parent so it gets
> > transferred along with the conflicts. For a sequence of cherry-picks or
> > a rebase it is more complicated to preserve the context of the conflict.
> > Even "git merge" can create several files in addition to MERGE_HEAD
> > which are read when the conflict resolution is committed.
>
> Good point. We actually don't store any extra data in jj. The old
> per-path conflict model was prepared for having some label associated
> with each term of the conflict but we never actually used it.
>
> If we add such metadata, it would probably have to be something that
> makes sense even after pushing the conflict to another repo, so it
> probably shouldn't be commit ids, unless we made sure to also push
> those commits. Also note that if you `jj restore --from <commit with
> conflict>`, you can get a conflict into a commit that didn't have
> conflicts previously. Or if you already had conflicts in the
> destination commit, your root trees (the multiple root trees
> constituting the conflict) will now have conflicts that potentially
> were created by two completely unrelated operations, so you would kind
> of need different labels for different paths.
>
> https://github.com/martinvonz/jj/issues/1176 has some more discussion
> about this.

Interesting link; thanks for sharing.

I am curious more about the data you do store.  My fuzzy memory is
that you store a commit header involving something of the form "A + B
- C", where those are all commit IDs.  Is that correct?  Is this in
addition to a normal "tree" header as in Git, or are one of A or B
found in the tree header?  I think you said there was also the
possibility for more than three terms.  Are those for when a
conflicted commit is merged with another branch that adds more
conflicts, or are there other cases too?  (Octopus merges?)

What about recursive merges, i.e. merges where the two sides do not
have a unique merge base?  What is the form of those?  (Would "- C" be
replaced by "- C1 - C2 - ... - Cn"?  Or would we create the virtual
merge base V and then do a " - V"?  Or do we only have "A + B"?)

You previously mentioned that if someone goes to edit a commit with
conflicts, and resolves the conflicts in just one file, then you can
modify each of the trees A, B, and C such that a merging of those
trees gives the partially resolved result.  How does one do that with
special conflicts, such as:
   * User modifies file D on both sides of history, in conflicting
ways, and also renames D -> E on one side of history.  User checks out
this conflicted commit and fixes the conflicts in E (but not other
files) and does a "git add E".  When they go to commit, does the
machinery need a mapping to figure out that it needs to adjust "D" in
two of the trees while adjusting "E" in the other?
   * Similar to the above, but the side that doesn't rename D renames
olddir/ -> newdir/, and the side that renames D instead renames
D->olddir/E.  For this case, the file will end up at newdir/E; do we
need the backward mapping from newdir/E to both olddir/E and D?
   * Slightly different than the above: User renames D -> E on one
side of history, and D -> F on the other.  That's a rename/rename
(1to2) conflict.  User checks out this conflicted commit and does a
"git add F", marking it as okay, but leaving E conflicted.  How can
one adjust the tree such that no conflict for F appears, but one still
appears for E?
   * Similar to above with an extra wrinkle: User renames D -> E on
one side of history, and on the other side both renames D -> F and
adds a slightly different file named E.  That's both a rename/rename
(1to2) conflict for E & F, and an add/add conflict for E.  User
checks out this conflicted commit and resolves the textual conflict in E
(in favor of the "other side"), and does a "git add E", marking it as
resolved.  When they go to commit, we not only need to worry about
making sure a conflict for F appears, we also need to figure out how
to adjust the tree such that the merge result gives you the expected
value in E without affecting F.  How can that be done?

On the first two bullet points, there's no such thing as a reverse
mapping from conflicted files to original files from previous commits
in current Git.  Creating one, if possible, would be a fair amount of
work.  But, I'm not so sure it's even possible, due to the fact that
conflicts and files do not always have one-to-one (or even one-to-many
or many-to-one) relationships; many-to-many relationships can exist, as
I've started alluding to in the last two bullet points (see also
https://github.com/git/git/blob/98009afd24e2304bf923a64750340423473809ff/Documentation/git-merge-tree.txt#L266-L271).
In fact, they can get even more complicated (e.g.
https://github.com/git/git/blob/master/t/t6422-merge-rename-corner-cases.sh#L1017-L1022).

> > > But we'd also have to be careful and think through usecases, including
> > > in the surrounding community.  People would probably want to ensure
> > > that e.g. "Protected" or "Integration" branches don't accept
> > > fetches or pushes of conflicted commits,
> >
> > I think this is a really important point, while it can be useful to
> > share conflicts so they can be collaboratively resolved we don't want to
> > propagate them into "stable" or production branches. I wonder how 'jj'
> > handles this.
>
> Agreed. `jj git push` refuses to push commits with conflicts, because
> it's very unlikely that the remote will be able to make any sense of
> it. Our commit backend at Google does support conflicts, so users can
> check out each other's conflicted commits there (except that we
> haven't even started dogfooding yet).

I'm curious to hear what happens when you do start dogfooding, on
projects with many developers and which are jj-only.  Do commits with
conflicts accidentally end up in mainline branches, or are there good
ways to make sure they don't hit anything considered stable?

> > > git status would probably
> > > need some special warnings or notices, git checkout would probably
> > > benefit from additional warnings/notices checks for those cases, git
> > > log should probably display conflicted commits differently, we'd need
> > > to add special handling for higher order conflicts (e.g. a merge with
> > > conflicts is itself involved in a merge) probably similar to what jj
> > > has done, and audit a lot of other code paths to see what would be
> > > needed.
> >
> > As you point out there is a lot more to this than just being able to
> > store the conflict data in a commit - in many ways I think that is the
> > easiest part of the solution to sharing conflicts.
>
> Yes, I think it would be a very large project. Unlike jj, Git of
> course has to worry about backwards compatibility. For example, you
> would have to decide if your goal - even in the long term - is to make
> `git rebase` etc. not get interrupted due to conflicts.

...and whether to copy jj's other feature in this area in some form:
auto-rebasing any descendants when you checkout and amend an old
commit (e.g. to resolve conflicts).  :-)


* Re: first-class conflicts?
  2023-11-08  7:31       ` Elijah Newren
@ 2023-11-08 18:22         ` Martin von Zweigbergk
  2023-11-10 21:41           ` Elijah Newren
  0 siblings, 1 reply; 25+ messages in thread
From: Martin von Zweigbergk @ 2023-11-08 18:22 UTC (permalink / raw)
  To: Elijah Newren; +Cc: phillip.wood, Sandra Snan, git, Randall S. Becker

Hi Elijah,


On Tue, Nov 7, 2023 at 11:31 PM Elijah Newren <newren@gmail.com> wrote:
>
> Hi Martin,
>
> On Tue, Nov 7, 2023 at 9:38 AM Martin von Zweigbergk
> <martinvonz@google.com> wrote:
> >
> [...]
> > > One thing to think about if we ever want to implement this is what other
> > > data we need to store along with the conflict trees to preserve the
> > > context in which the conflict was created. For example the files that
> > > are read by "git commit" when it commits a conflict resolution. For a
> > > single cherry-pick/revert it would probably be fairly straightforward
> > > to store CHERRY_PICK_HEAD/REVERT_HEAD and add it as a parent so it gets
> > > transferred along with the conflicts. For a sequence of cherry-picks or
> > > a rebase it is more complicated to preserve the context of the conflict.
> > > Even "git merge" can create several files in addition to MERGE_HEAD
> > > which are read when the conflict resolution is committed.
> >
> > Good point. We actually don't store any extra data in jj. The old
> > per-path conflict model was prepared for having some label associated
> > with each term of the conflict but we never actually used it.
> >
> > If we add such metadata, it would probably have to be something that
> > makes sense even after pushing the conflict to another repo, so it
> > probably shouldn't be commit ids, unless we made sure to also push
> > those commits. Also note that if you `jj restore --from <commit with
> > conflict>`, you can get a conflict into a commit that didn't have
> > conflicts previously. Or if you already had conflicts in the
> > destination commit, your root trees (the multiple root trees
> > constituting the conflict) will now have conflicts that potentially
> > were created by two completely unrelated operations, so you would kind
> > of need different labels for different paths.
> >
> > https://github.com/martinvonz/jj/issues/1176 has some more discussion
> > about this.
>
> Interesting link; thanks for sharing.
>
> I am curious more about the data you do store.  My fuzzy memory is
> that you store a commit header involving something of the form "A + B
> - C", where those are all commit IDs.  Is that correct?

We actually store it outside the Git repo (together with the "change
id"). We have avoided using commit headers because I wasn't sure how
well different tools deal with unexpected commit headers, and because
I wanted commits to be indistinguishable from commits created by a
regular Git binary. The latter argument doesn't apply to commits with
conflicts since those are clearly not from a regular Git binary
anyway, and we don't allow pushing them to a remote.

>  Is this in
> addition to a normal "tree" header as in Git, or are one of A or B
> found in the tree header?

It's in addition. For the tree, we actually write a tree object with
three subtrees:

.jjconflict-base-0: C
.jjconflict-side-0: A
.jjconflict-side-1: B

The tree is not authoritative - we use the Git-external storage for
that. The reason we write the trees is mostly to prevent them from
getting GC'd. Also, if a user does `git checkout <conflicted commit>`,
they'll see those subdirectories and will hopefully be reminded that
they did something odd (perhaps we should drop the leading `.` so `ls`
will show them...). They can also diff the directories in a diff tool
if they like.

>  I think you said there was also the
> possibility for more than three terms.  Are those for when a
> conflicted commit is merged with another branch that adds more
> conflicts, or are there other cases too?  (Octopus merges?)

Yes, they can happen in both of those cases you mention. More
generally, whenever you apply a diff between two trees onto another
tree, you might end up with a higher-arity conflict. So merging in
another branch can do that, or doing an octopus merge (which is the
same thing at the tree level, just different at the commit level), or
rebasing or reverting a commit.

We simplify conflicts algebraically, so rebasing a commit multiple
times does not increase the arity - the intermediate parents were both
added and removed and thus cancel out. These simple algorithms for
simplifying conflicts are encapsulated in
https://github.com/martinvonz/jj/blob/main/lib/src/merge.rs. Most of
them are independent of the type of values being merged; they can be
used for doing algebra on tree ids, content hunks, refs, etc. (in the
test cases, we mostly merge integers because integer literals are
compact).
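
As a rough illustration of the cancellation described above, here is a
minimal Python sketch. It is not jj's code from merge.rs; the function
name `simplify` and the representation of a conflict as two lists
(added terms and removed terms) are assumptions made for this example.
Rebasing a commit with tree T from P1 onto P2, and then onto P3, gives
P3 + (P2 + T - P1) - P2, where the intermediate parent P2 cancels out.

```python
from collections import Counter


def simplify(adds, removes):
    """Cancel every term that appears on both the added and removed side.

    `adds` and `removes` are lists of hashable terms (tree ids, integers,
    ...). Returns the (adds, removes) that are left after cancellation.
    """
    remaining_removes = Counter(removes)
    kept_adds = []
    for term in adds:
        if remaining_removes[term] > 0:
            # This added term cancels one matching removed term.
            remaining_removes[term] -= 1
        else:
            kept_adds.append(term)
    kept_removes = []
    for term in removes:
        if remaining_removes[term] > 0:
            kept_removes.append(term)
            remaining_removes[term] -= 1
    return kept_adds, kept_removes


# Rebasing T from P1 onto P2 and then onto P3: P3 + (P2 + T - P1) - P2
print(simplify(["P3", "P2", "T"], ["P2", "P1"]))
```

The arity stays at three terms (P3 + T - P1) no matter how many
intermediate parents the commit was rebased through, because each
intermediate parent appears once as an add and once as a remove.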

> What about recursive merges, i.e. merges where the two sides do not
> have a unique merge base.  What is the form of those?  (Would "- C" be
> replaced by "- C1 - C2 - ... - Cn"?  Or would we create the virtual
> merge base V and then do a " - V"?  Or do we only have "A + B"?)

We do that by recursively creating a virtual tree just like Git does,
I think (https://github.com/martinvonz/jj/blob/084b99e1e2c42c40f2d52038cdc97687b76fed89/lib/src/rewrite.rs#L56-L71).
I think the main difference is that by modeling conflicts, we can
avoid recursive conflict markers (if that's what Git does), and we can
even automatically resolve some cases where the virtual tree has a
conflict.

> You previously mentioned that if someone goes to edit a commit with
> conflicts, and resolves the conflicts in just one file, then you can
> modify each of the trees A, B, and C such that a merging of those
> trees gives the partially resolved result.  How does one do that with
> special conflicts, such as:
>    * User modifies file D on both sides of history, in conflicting
> ways, and also renames D -> E on one side of history.  User checks out
> this conflicted commit and fixes the conflicts in E (but not other
> files) and does a "git add E".  When they go to commit, does the
> machinery need a mapping to figure out that it needs to adjust "D" in
> two of the trees while adjusting "E" in the other?
>    * Similar to the above, but the side that doesn't rename D renames
> olddir/ -> newdir/, and the side that renames D instead renames
> D->olddir/E.  For this case, the file will end up at newdir/E; do we
> need the backward mapping from newdir/E to both olddir/E and D?
>    * Slightly different than the above: User renames D -> E on one
> side of history, and D -> F on the other.  That's a rename/rename
> (1to2) conflict.  User checks out this conflicted commit and does a
> "git add F", marking it as okay, but leaving E conflicted.  How can
> one adjust the tree such that no conflict for F appears, but one still
> appears for E?
>    * Similar to above with an extra wrinkle: User renames D -> E on
> one side of history, and on the other side both renames D -> F and
> adds a slightly different file named E.  That's both a rename/rename
> (1to2) conflict for E & F, and an add/add conflict for E.  User
> checks out this conflicted commit and resolves the textual conflict in E
> (in favor of the "other side"), and does a "git add E", marking it as
> resolved.  When they go to commit, we not only need to worry about
> making sure a conflict for F appears, we also need to figure out how
> to adjust the tree such that the merge result gives you the expected
> value in E without affecting F.  How can that be done?
>
> On the first two bullet points, there's no such thing as a reverse
> mapping from conflicted files to original files from previous commits
> in current Git.  Creating one, if possible, would be a fair amount of
> work.  But, I'm not so sure it's even possible, due to the fact that
> conflicts and files do not always have one-to-one (or even one-to-many
> or many-to-one) relationships; many-to-many relationships can exist, as
> I've started alluding to in the last two bullet points (see also
> https://github.com/git/git/blob/98009afd24e2304bf923a64750340423473809ff/Documentation/git-merge-tree.txt#L266-L271).
> In fact, they can get even more complicated (e.g.
> https://github.com/git/git/blob/master/t/t6422-merge-rename-corner-cases.sh#L1017-L1022).

Great questions! We don't have support for renames, so we haven't had
to worry about these things. We have talked a little about divergent
renames and the need for recording that in the commit so we can tell
the user about it and maybe ask them which name they want to keep. I
had not considered the interaction with partial conflict resolution,
so thanks for bringing that up. I don't have any answers now, but
we'll probably need to start thinking about this soon.

> > > > But we'd also have to be careful and think through usecases, including
> > > > in the surrounding community.  People would probably want to ensure
> > > > that e.g. "Protected" or "Integration" branches don't accept
> > > > fetches or pushes of conflicted commits,
> > >
> > > I think this is a really important point, while it can be useful to
> > > share conflicts so they can be collaboratively resolved we don't want to
> > > propagate them into "stable" or production branches. I wonder how 'jj'
> > > handles this.
> >
> > Agreed. `jj git push` refuses to push commits with conflicts, because
> > it's very unlikely that the remote will be able to make any sense of
> > it. Our commit backend at Google does support conflicts, so users can
> > check out each other's conflicted commits there (except that we
> > haven't even started dogfooding yet).
>
> I'm curious to hear what happens when you do start dogfooding, on
> projects with many developers and which are jj-only.  Do commits with
> conflicts accidentally end up in mainline branches, or are there good
> ways to make sure they don't hit anything considered stable?

That won't happen at Google because our source of truth for "merged
PRs" (in GitHub-speak) is in our existing VCS. We will necessarily
have to translate from jj's data model to its data model before a
commit can even be sent for review.

>
> > > > git status would probably
> > > > need some special warnings or notices, git checkout would probably
> > > > benefit from additional warnings/notices checks for those cases, git
> > > > log should probably display conflicted commits differently, we'd need
> > > > to add special handling for higher order conflicts (e.g. a merge with
> > > > conflicts is itself involved in a merge) probably similar to what jj
> > > > has done, and audit a lot of other code paths to see what would be
> > > > needed.
> > >
> > > As you point out there is a lot more to this than just being able to
> > > store the conflict data in a commit - in many ways I think that is the
> > > easiest part of the solution to sharing conflicts.
> >
> > Yes, I think it would be a very large project. Unlike jj, Git of
> > course has to worry about backwards compatibility. For example, you
> > would have to decide if your goal - even in the long term - is to make
> > `git rebase` etc. not get interrupted due to conflicts.
>
> ...and whether to copy jj's other feature in this area in some form:
> auto-rebasing any descendants when you checkout and amend an old
> commit (e.g. to resolve conflicts).  :-)


* Re: first-class conflicts?
  2023-11-08 18:22         ` Martin von Zweigbergk
@ 2023-11-10 21:41           ` Elijah Newren
  2023-11-12  7:05             ` Martin von Zweigbergk
  0 siblings, 1 reply; 25+ messages in thread
From: Elijah Newren @ 2023-11-10 21:41 UTC (permalink / raw)
  To: Martin von Zweigbergk; +Cc: phillip.wood, Sandra Snan, git, Randall S. Becker

Hi Martin,

On Wed, Nov 8, 2023 at 10:23 AM Martin von Zweigbergk
<martinvonz@google.com> wrote:
> On Tue, Nov 7, 2023 at 11:31 PM Elijah Newren <newren@gmail.com> wrote:
> > On Tue, Nov 7, 2023 at 9:38 AM Martin von Zweigbergk
> > <martinvonz@google.com> wrote:
> > >
[...]
> > I am curious more about the data you do store.  My fuzzy memory is
> > that you store a commit header involving something of the form "A + B
> > - C", where those are all commit IDs.  Is that correct?
>
> We actually store it outside the Git repo (together with the "change
> id"). We have avoided using commit headers because I wasn't sure how
> well different tools deal with unexpected commit headers, and because
> I wanted commits to be indistinguishable from commits created by a
> regular Git binary. The latter argument doesn't apply to commits with
> conflicts since those are clearly not from a regular Git binary
> anyway, and we don't allow pushing them to a remote.
>
> >  Is this in
> > addition to a normal "tree" header as in Git, or are one of A or B
> > found in the tree header?
>
> It's in addition. For the tree, we actually write a tree object with
> three subtrees:
>
> .jjconflict-base-0: C
> .jjconflict-side-0: A
> .jjconflict-side-1: B
>
> The tree is not authoritative - we use the Git-external storage for
> that. The reason we write the trees is mostly to prevent them from
> getting GC'd.

Oh, that seems like a clever way to handle reachability and make sure
the relevant trees are automatically included in any pushes or pulls.

> Also, if a user does `git checkout <conflicted commit>`,
> they'll see those subdirectories and will hopefully be reminded that
> they did something odd (perhaps we should drop the leading `.` so `ls`
> will show them...). They can also diff the directories in a diff tool
> if they like.

Oh, so they don't get a regular top-level looking tree with
possibly-conflicted-files present?  Or is this in addition to the
regular repository contents?  If in addition, are you worried about
users ever creating real entries named ".jjconflict-base-<N>" in their
repository?

> >  I think you said there was also the
> > possibility for more than three terms.  Are those for when a
> > conflicted commit is merged with another branch that adds more
> > conflicts, or are there other cases too?  (Octopus merges?)
>
> Yes, they can happen in both of those cases you mention. More
> generally, whenever you apply a diff between two trees onto another
> tree, you might end up with a higher-arity conflict. So merging in
> another branch can do that, or doing an octopus merge (which is the
> same thing at the tree level, just different at the commit level), or
> rebasing or reverting a commit.
>
> We simplify conflicts algebraically, so rebasing a commit multiple
> times does not increase the arity - the intermediate parents were both
> added and removed and thus cancel out. These simple algorithms for
> simplifying conflicts are encapsulated in
> https://github.com/martinvonz/jj/blob/main/lib/src/merge.rs. Most of
> them are independent of the type of values being merged; they can be
> used for doing algebra on tree ids, content hunks, refs, etc. (in the
> test cases, we mostly merge integers because integer literals are
> compact).

It's done on content hunks as well?  That's interesting.

When exactly would it be done on refs, though?  I'm not following that one.

And what else is in that "etc."?

> > What about recursive merges, i.e. merges where the two sides do not
> > have a unique merge base.  What is the form of those?  (Would "- C" be
> > replaced by "- C1 - C2 - ... - Cn"?  Or would we create the virtual
> > merge base V and then do a " - V"?  Or do we only have "A + B"?)
>
> We do that by recursively creating a virtual tree just like Git does,
> I think (https://github.com/martinvonz/jj/blob/084b99e1e2c42c40f2d52038cdc97687b76fed89/lib/src/rewrite.rs#L56-L71).
> I think the main difference is that by modeling conflicts, we can
> avoid recursive conflict markers (if that's what Git does), and we can
> even automatically resolve some cases where the virtual tree has a
> conflict.

Okay, but that talks about the mechanics of creating a recursive
merge, omitting all the details about how the conflict header is
written when you record the merge.  Is the virtual merge base
represented in the algebraic "A + B - C" expressions, or is the "- C"
part omitted?  If it is represented, and the virtual merge base had
conflicts which you could not automatically resolve, what exactly does
the conflicted header for the outer merge get populated with?

[...]

> Great questions! We don't have support for renames, so we haven't had
> to worry about these things. We have talked a little about divergent
> renames and the need for recording that in the commit so we can tell
> the user about it and maybe ask them which name they want to keep. I
> had not considered the interaction with partial conflict resolution,
> so thanks for bringing that up. I don't have any answers now, but
> we'll probably need to start thinking about this soon.

I was wondering if that might be the answer.  When you do tackle this,
I'd be interested to hear your thoughts.  I'm wondering if we just
need to augment the data in the conflict header to handle such cases
(though I guess this could risk having commit objects that are
significantly bigger than normal in theoretical cases where many such
paths are involved?)

> > I'm curious to hear what happens when you do start dogfooding, on
> > projects with many developers and which are jj-only.  Do commits with
> > conflicts accidentally end up in mainline branches, or are there good
> > ways to make sure they don't hit anything considered stable?
>
> That won't happen at Google because our source of truth for "merged
> PRs" (in GitHub-speak) is in our existing VCS. We will necessarily
> have to translate from jj's data model to its data model before a
> commit can even be sent for review.

That makes sense, but I was just hoping we'd have an example to look
to for how to keep things safe if we were to implement this.  Sadly, I
don't think we have the benefit of relying on folks to first push
their commits into some other VCS which lacks this feature.  ;-)


* Re: first-class conflicts?
  2023-11-10 21:41           ` Elijah Newren
@ 2023-11-12  7:05             ` Martin von Zweigbergk
  0 siblings, 0 replies; 25+ messages in thread
From: Martin von Zweigbergk @ 2023-11-12  7:05 UTC (permalink / raw)
  To: Elijah Newren; +Cc: phillip.wood, Sandra Snan, git, Randall S. Becker

On Fri, Nov 10, 2023 at 1:41 PM Elijah Newren <newren@gmail.com> wrote:
>
> Hi Martin,
>
> On Wed, Nov 8, 2023 at 10:23 AM Martin von Zweigbergk
> <martinvonz@google.com> wrote:
> > On Tue, Nov 7, 2023 at 11:31 PM Elijah Newren <newren@gmail.com> wrote:
> > > On Tue, Nov 7, 2023 at 9:38 AM Martin von Zweigbergk
> > > <martinvonz@google.com> wrote:
> > > >
> [...]
> > > I am curious more about the data you do store.  My fuzzy memory is
> > > that you store a commit header involving something of the form "A + B
> > > - C", where those are all commit IDs.  Is that correct?
> >
> > We actually store it outside the Git repo (together with the "change
> > id"). We have avoided using commit headers because I wasn't sure how
> > well different tools deal with unexpected commit headers, and because
> > I wanted commits to be indistinguishable from commits created by a
> > regular Git binary. The latter argument doesn't apply to commits with
> > conflicts since those are clearly not from a regular Git binary
> > anyway, and we don't allow pushing them to a remote.
> >
> > >  Is this in
> > > addition to a normal "tree" header as in Git, or are one of A or B
> > > found in the tree header?
> >
> > It's in addition. For the tree, we actually write a tree object with
> > three subtrees:
> >
> > .jjconflict-base-0: C
> > .jjconflict-side-0: A
> > .jjconflict-side-1: B
> >
> > The tree is not authoritative - we use the Git-external storage for
> > that. The reason we write the trees is mostly to prevent them from
> > getting GC'd.
>
> Oh, that seems like a clever way to handle reachability and make sure
> the relevant trees are automatically included in any pushes or pulls.
>
> > Also, if a user does `git checkout <conflicted commit>`,
> > they'll see those subdirectories and will hopefully be reminded that
> > they did something odd (perhaps we should drop the leading `.` so `ls`
> > will show them...). They can also diff the directories in a diff tool
> > if they like.
>
> Oh, so they don't get a regular top-level looking tree with
> possibly-conflicted-files present? Or is this in addition to the
> regular repository contents?

They get a regular tree with conflict markers if they use `jj
checkout`, but not if they use `git checkout`.

> If in addition, are you worried about
> users ever creating real entries named ".jjconflict-base-<N>" in their
> repository?

I'm not worried about that since it's not the source of truth, so at
most they waste some time.

By the way, if the user did use `git checkout` and got those
`.jjconflict-*` directories in the working copy, and then ran a `jj`
command afterwards, then jj would think that the conflict was resolved
by replacing the conflicted paths (and all other paths!) by those
`.jjconflict-*` directories :) The user would probably realize their
mistake pretty quickly and run `jj abandon` to discard those changes.

>
> > >  I think you said there was also the
> > > possibility for more than three terms.  Are those for when a
> > > conflicted commit is merged with another branch that adds more
> > > conflicts, or are there other cases too?  (Octopus merges?)
> >
> > Yes, they can happen in both of those cases you mention. More
> > generally, whenever you apply a diff between two trees onto another
> > tree, you might end up with a higher-arity conflict. So merging in
> > another branch can do that, or doing an octopus merge (which is the
> > same thing at the tree level, just different at the commit level), or
> > rebasing or reverting a commit.
> >
> > We simplify conflicts algebraically, so rebasing a commit multiple
> > times does not increase the arity - the intermediate parents were both
> > added and removed and thus cancel out. These simple algorithms for
> > simplifying conflicts are encapsulated in
> > https://github.com/martinvonz/jj/blob/main/lib/src/merge.rs. Most of
> > them are independent of the type of values being merged; they can be
> > used for doing algebra on tree ids, content hunks, refs, etc. (in the
> > test cases, we mostly merge integers because integer literals are
> > compact).
>
> It's done on content hunks as well?  That's interesting.

Yes, when merging trees, we start at the root tree and try to resolve
conflicts at the tree entry level (i.e. without reading file
contents). I think git does the same. If that's not enough we need to
recurse into subtrees or file contents. When merging files, we find
matching regions of the inputs and use the same algorithm on the
individual chunks between the matching regions.
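
The entry-level step described above can be sketched roughly as
follows. This is an illustrative Python sketch, not jj's or Git's
implementation: trees are modeled as flat dicts from path to an opaque
id, and `merge_trees` is a hypothetical name. Only ids are compared, so
no file contents are read; paths that both sides changed differently
are reported as conflicts, which is where a real implementation would
recurse into subtrees or file contents.

```python
def merge_trees(base, side1, side2):
    """Three-way merge of path -> id dicts, comparing ids only.

    Returns (merged, conflicts): `merged` holds the paths that resolved
    trivially, `conflicts` lists the paths that need a closer look.
    A missing path is treated as the value None (absent).
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(side1) | set(side2)):
        b, s1, s2 = base.get(path), side1.get(path), side2.get(path)
        if s1 == s2:
            result = s1      # both sides agree (or neither changed)
        elif s1 == b:
            result = s2      # only side 2 changed this path
        elif s2 == b:
            result = s1      # only side 1 changed this path
        else:
            conflicts.append(path)
            continue
        if result is not None:
            merged[path] = result  # None means the path was deleted
    return merged, conflicts


base = {"X": "x0", "Y": "y0", "Z": "z0"}
side1 = {"X": "x1", "Y": "y1", "Z": "z0"}
side2 = {"X": "x0", "Y": "y2"}
print(merge_trees(base, side1, side2))
```

In this example X resolves to side 1's change, Z resolves to side 2's
deletion, and only Y is left as a conflict.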

>
> When exactly would it be done on refs, though?  I'm not following that one.

First of all, note that jj allows refs to be in a conflicted state
similar to how trees can be in a conflicted state. We merge refs for a
few different reasons. If you run two concurrent operations on a repo,
we merge any changes to the refs. We do the same thing when you fetch
branches from a remote. For example, if you've fetched branch "main"
from a remote, then moved it locally, and then you fetch again from
the remote, we'll attempt to merge those refs. We use the same
function for merging there, but if it fails, we then also
automatically resolve two operations moving the branch forward
different amounts (e.g. one operation moves a ref from X~10 to X~5
while the other moves it forward to X, we resolve to X).
https://github.com/martinvonz/jj/blob/main/docs/technical/concurrency.md
talks a bit more about that.
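
A rough sketch of that two-step ref merge, in Python. This is not jj's
code; `merge_ref` and the `is_ancestor` callback are hypothetical names
for this example, and the fallback here simply picks the descendant
when one target is an ancestor of the other, which is an assumption
about what "moving the branch forward different amounts" means.

```python
def merge_ref(base, left, right, is_ancestor):
    """Three-way merge of a ref target.

    `base` is the old target, `left` and `right` are the targets set by
    the two operations. `is_ancestor(a, b)` must return True when commit
    `a` is an ancestor of (or equal to) commit `b`.
    Returns the merged target, or None if the ref stays conflicted.
    """
    # Step 1: the generic three-way rule.
    if left == right:
        return left
    if left == base:
        return right
    if right == base:
        return left
    # Step 2: both sides moved the ref. If one target is ahead of the
    # other on the same line of history, take the one furthest ahead.
    if is_ancestor(left, right):
        return right
    if is_ancestor(right, left):
        return left
    return None


# Toy linear history: commits are integers, smaller means older.
linear = lambda a, b: a <= b
# One operation moves the ref from X~10 (0) to X~5 (5), the other to X (10).
print(merge_ref(0, 5, 10, linear))
```

With a real commit graph, `is_ancestor` would walk parent links; the
integer ordering above only stands in for a linear history.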

>
> And what else is in that "etc."?

I think it's only individual file ids (blob ids) and the executable
bit. If a file's content changed and its executable bit changed, we
use the same algorithm for each of those pieces of information.

>
> > > What about recursive merges, i.e. merges where the two sides do not
> > > have a unique merge base.  What is the form of those?  (Would "- C" be
> > > replaced by "- C1 - C2 - ... - Cn"?  Or would we create the virtual
> > > merge base V and then do a " - V"?  Or do we only have "A + B"?)
> >
> > We do that by recursively creating a virtual tree just like Git does,
> > I think (https://github.com/martinvonz/jj/blob/084b99e1e2c42c40f2d52038cdc97687b76fed89/lib/src/rewrite.rs#L56-L71).
> > I think the main difference is that by modeling conflicts, we can
> > avoid recursive conflict markers (if that's what Git does), and we can
> > even automatically resolve some cases where the virtual tree has a
> > conflict.
>
> Okay, but that talks about the mechanics of creating a recursive
> merge, omitting all the details about how the conflict header is
> written when you record the merge.  Is the virtual merge base
> represented in the algebraic "A + B - C" expressions, or is the "- C"
> part omitted?  If it is represented, and the virtual merge base had
> conflicts which you could not automatically resolve, what exactly does
> the conflicted header for the outer merge get populated with?

I think we're talking about the state in F below, right?

  F
 / \
/   \
D   E
|\ /|
| X |
|/ \|
B   C
\   /
 \ /
  A

The virtual commit/tree, which we can think of as sitting where the X
is in the graph, would have state V=B+C-A. The state at F would have
D+E-V=D+E-(B+C-A)=D+(E-C)+(A-B). This is encoded in `Merge::flatten()`
here:  https://github.com/martinvonz/jj/blob/e3a1e5b80ed9124091baa4d920cc9e8124c1f559/lib/src/merge.rs#L421-L451.
It's not specific to recursive merge; we run into the same kind of
higher-arity conflicts on regular octopus merges or repeated merges
(if you don't resolve conflicts in between).
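
The algebra above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not a port of
`Merge::flatten()`: the `Merge` class, `resolved`, and the string tree
names are hypothetical, and the order of terms in the result is an
arbitrary choice that may differ from what jj produces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Merge:
    """A conflict written as a sum: adds[0] + adds[1] + ... - removes[0] - ..."""
    adds: tuple
    removes: tuple

    def __post_init__(self):
        # A well-formed expression always has one more positive term.
        if len(self.adds) != len(self.removes) + 1:
            raise ValueError("need exactly one more add than remove")


def resolved(value):
    """Wrap a plain (non-conflicted) value as a one-term Merge."""
    return Merge((value,), ())


def flatten(outer):
    """Turn a Merge whose terms are themselves Merges into a flat Merge."""
    adds, removes = [], []
    for inner in outer.adds:
        # A positive inner term keeps its signs.
        adds.extend(inner.adds)
        removes.extend(inner.removes)
    for inner in outer.removes:
        # A negative inner term has its signs swapped: -(B+C-A) = A-B-C.
        adds.extend(inner.removes)
        removes.extend(inner.adds)
    return Merge(tuple(adds), tuple(removes))


V = Merge(("B", "C"), ("A",))                    # virtual base: B + C - A
F = Merge((resolved("D"), resolved("E")), (V,))  # D + E - V
flat = flatten(F)
```

Here `flat` has adds (D, E, A) and removes (B, C), i.e. D+E+A-B-C,
which is the same expression as D+(E-C)+(A-B).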

Oh, I should also say that we don't store the unmodified trees in
these expressions. Instead, for anything we can automatically resolve,
we replace those parts of the trees. So even if A, B, and C differ at
paths X, Y, and Z, the trees we associate with V might only differ at
path Y if that's the only path we couldn't resolve. IIRC, I did it
that way because it seemed wasteful to re-attempt the merge at paths X
and Z every time we rewrite the commit. I *think* it rarely matters in
practice, but it feels like it could in some cases (maybe where two
sides make the same changes).

>
> [...]
>
> > Great questions! We don't have support for renames, so we haven't had
> > to worry about these things. We have talked a little about divergent
> > renames and the need for recording that in the commit so we can tell
> > the user about it and maybe ask them which name they want to keep. I
> > had not considered the interaction with partial conflict resolution,
> > so thanks for bringing that up. I don't have any answers now, but
> > we'll probably need to start thinking about this soon.
>
> I was wondering if that might be the answer.  When you do tackle this,
> I'd be interested to hear your thoughts.  I'm wondering if we just
> need to augment the data in the conflict header to handle such cases
> (though I guess this could risk having commit objects that are
> significantly bigger than normal in theoretical cases where many such
> paths are involved?)

Yes, that's what I've been thinking, but I think the only thing I had
been thinking of storing was for "divergent renames" (A->B on one
side, A->C on the other). Will let you know when we start thinking
about this for real. Thanks again for your input!

>
> > > I'm curious to hear what happens when you do start dogfooding, on
> > > projects with many developers and which are jj-only.  Do commits with
> > > conflicts accidentally end up in mainline branches, or are there good
> > > ways to make sure they don't hit anything considered stable?
> >
> > That won't happen at Google because our source of truth for "merged
> > PRs" (in GitHub-speak) is in our existing VCS. We will necessarily
> > have to translate from jj's data model to that VCS's data model before a
> > commit can even be sent for review.
>
> That makes sense, but I was just hoping we'd have an example to look
> to for how to keep things safe if we were to implement this.  Sadly, I
> don't think we have the benefit of relying on folks to first push
> their commits into some other VCS which lacks this feature.  ;-)

It might be best to disallow pushing conflicts to start with. It
should also be easy to add a hook on the server to disallow it only to
certain branches.
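
A server-side check along those lines might look roughly like the
Python sketch below. It assumes Git's documented pre-receive input
(one "<old> <new> <ref>" line per updated ref on stdin). Everything
else is hypothetical: Git has no conflicted commits today, so
`is_conflicted` is a caller-supplied predicate, and `rejected_refs` and
the branch patterns are made up for illustration. The sketch only looks
at the new tip of each ref, not at every commit being pushed.

```python
import fnmatch


def rejected_refs(stdin_lines, protected_patterns, is_conflicted):
    """Return the protected refs whose new tip is a conflicted commit."""
    rejected = []
    for line in stdin_lines:
        old, new, ref = line.split()
        if set(new) == {"0"}:
            # An all-zero new value means the ref is being deleted.
            continue
        protected = any(
            fnmatch.fnmatchcase(ref, pattern) for pattern in protected_patterns
        )
        if protected and is_conflicted(new):
            rejected.append(ref)
    return rejected


# Example input as a pre-receive hook would see it; ids are placeholders.
updates = [
    "a" * 40 + " " + "b" * 40 + " refs/heads/main",
    "a" * 40 + " " + "c" * 40 + " refs/heads/wip/conflict-demo",
]
blocked = rejected_refs(
    updates,
    protected_patterns=["refs/heads/main", "refs/heads/release/*"],
    is_conflicted=lambda commit_id: True,  # pretend every tip is conflicted
)
```

A real hook would exit non-zero when `blocked` is non-empty, so that
conflicted commits can still be pushed to unprotected branches such as
the "wip" one above.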


* Re: first-class conflicts?
  2023-11-07 17:38     ` Martin von Zweigbergk
  2023-11-08  7:31       ` Elijah Newren
@ 2023-11-09 14:50       ` phillip.wood123
  1 sibling, 0 replies; 25+ messages in thread
From: phillip.wood123 @ 2023-11-09 14:50 UTC (permalink / raw)
  To: Martin von Zweigbergk, phillip.wood
  Cc: Elijah Newren, Sandra Snan, git, Randall S. Becker

Hi Martin

On 07/11/2023 17:38, Martin von Zweigbergk wrote:
> (new attempt in plain text)

Oh, the joys of the mailing list! Thanks for your comments below and in 
your reply to Elijah, I found them really helpful to get a better 
understanding of how 'jj' handles this.

Best Wishes

Phillip

> On Tue, Nov 7, 2023 at 3:49 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
>>
>> Hi Elijah
>>
>> [I've cc'd Martin to see if he has anything to add about how "jj"
>> manages the issues around storing conflicts.]
>>
>> On 07/11/2023 08:16, Elijah Newren wrote:
>>> On Mon, Nov 6, 2023 at 1:26 PM Sandra Snan
>>> <sandra.snan@idiomdrottning.org> wrote:
>>>>
>>>> Is this feature from jj also a good idea for git?
>>>> https://martinvonz.github.io/jj/v0.11.0/conflicts/
>>>
>>> Martin talked about this and other features at Git Merge 2022, a
>>> little over a year ago.  I talked to him in more depth about these
>>> while there.  I personally think he has some really interesting
>>> features here, though at the time, I thought that the additional
>>> object type might be too much to ask for in a Git change, and it was
>>> an intrinsic part of the implementation back then.
>>>
>>> Martin also gave us an update at the 2023 Git Contributors summit, and
>>> in particular noted a significant implementation change to not have
>>> per-file storage of conflicts, but rather storing at the commit level
>>> the multiple conflicting trees involved.  That model might be
>>> something we could implement in Git.  And if we did, it'd solve
>>> various issues such as people wanting to be able to stash conflicts,
>>> or wanting to be able to partially resolve conflicts and fix it up
>>> later, or be able to collaboratively resolve conflicts without having
>>> everyone have access to the same checkout.
>>
>> One thing to think about if we ever want to implement this is what other
>> data we need to store along with the conflict trees to preserve the
>> context in which the conflict was created. For example the files that
>> are read by "git commit" when it commits a conflict resolution. For a
>> single cherry-pick/revert it would probably be fairly straightforward
>> to store CHERRY_PICK_HEAD/REVERT_HEAD and add it as a parent so it gets
>> transferred along with the conflicts. For a sequence of cherry-picks or
>> a rebase it is more complicated to preserve the context of the conflict.
>> Even "git merge" can create several files in addition to MERGE_HEAD
>> which are read when the conflict resolution is committed.
> 
> Good point. We actually don't store any extra data in jj. The old
> per-path conflict model was prepared for having some label associated
> with each term of the conflict but we never actually used it.
> 
> If we add such metadata, it would probably have to be something that
> makes sense even after pushing the conflict to another repo, so it
> probably shouldn't be commit ids, unless we made sure to also push
> those commits. Also note that if you `jj restore --from <commit with
> conflict>`, you can get a conflict into a commit that didn't have
> conflicts previously. Or if you already had conflicts in the
> destination commit, your root trees (the multiple root trees
> constituting the conflict) will now have conflicts that potentially
> were created by two completely unrelated operations, so you would kind
> of need different labels for different paths.
> 
> https://github.com/martinvonz/jj/issues/1176 has some more discussion
> about this.
> 
>>> But we'd also have to be careful and think through usecases, including
>>> in the surrounding community.  People would probably want to ensure
>>> that e.g. "Protected" or "Integration" branches don't accept
>>> fetches or pushes of conflicted commits,
>>
>> I think this is a really important point, while it can be useful to
>> share conflicts so they can be collaboratively resolved we don't want to
>> propagate them into "stable" or production branches. I wonder how 'jj'
>> handles this.
> 
> Agreed. `jj git push` refuses to push commits with conflicts, because
> it's very unlikely that the remote will be able to make any sense of
> it. Our commit backend at Google does support conflicts, so users can
> check out each other's conflicted commits there (except that we
> haven't even started dogfooding yet).
> 
>>> git status would probably
>>> need some special warnings or notices, git checkout would probably
>>> benefit from additional warnings/notices checks for those cases, git
>>> log should probably display conflicted commits differently, we'd need
>>> to add special handling for higher order conflicts (e.g. a merge with
>>> conflicts is itself involved in a merge) probably similar to what jj
>>> has done, and audit a lot of other code paths to see what would be
>>> needed.
>>
>> As you point out there is a lot more to this than just being able to
>> store the conflict data in a commit - in many ways I think that is the
>> easiest part of the solution to sharing conflicts.
> 
> Yes, I think it would be a very large project. Unlike jj, Git of
> course has to worry about backwards compatibility. For example, you
> would have to decide if your goal - even in the long term - is to make
> `git rebase` etc. not get interrupted due to conflicts.


* Re: first-class conflicts?
  2023-11-07 11:49   ` Phillip Wood
  2023-11-07 17:38     ` Martin von Zweigbergk
@ 2023-11-08  6:31     ` Elijah Newren
  2023-11-09 14:45       ` Phillip Wood
  1 sibling, 1 reply; 25+ messages in thread
From: Elijah Newren @ 2023-11-08  6:31 UTC (permalink / raw)
  To: phillip.wood; +Cc: Sandra Snan, git, Martin von Zweigbergk, Randall S. Becker

Hi Phillip,

On Tue, Nov 7, 2023 at 3:49 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
>
> Hi Elijah
>
> [I've cc'd Martin to see if he has anything to add about how "jj"
> manages the issues around storing conflicts.]

+1.  I'll add some other questions for him too while we're at it,
separately in this thread.

[...]

> > Martin also gave us an update at the 2023 Git Contributors summit, and
> > in particular noted a significant implementation change to not have
> > per-file storage of conflicts, but rather storing at the commit level
> > the multiple conflicting trees involved.  That model might be
> > something we could implement in Git.  And if we did, it'd solve
> > various issues such as people wanting to be able to stash conflicts,
> > or wanting to be able to partially resolve conflicts and fix it up
> > later, or be able to collaboratively resolve conflicts without having
> > everyone have access to the same checkout.
>
> One thing to think about if we ever want to implement this is what other
> data we need to store along with the conflict trees to preserve the
> context in which the conflict was created. For example the files that
> are read by "git commit" when it commits a conflict resolution. For a
> single cherry-pick/revert it would probably be fairly straightforward
> to store CHERRY_PICK_HEAD/REVERT_HEAD and add it as a parent so it gets
> transferred along with the conflicts.

This is a great thing to think about and bring up.  However, I'm not
sure what part of it actually needs to be preserved; in fact, it's not
clear to me that any of it needs preserving -- especially not the
files read by "git commit".  A commit was already created, after all.

It seems that CHERRY_PICK_HEAD/REVERT_HEAD files exist primarily to
clue in that we are in-the-middle-of-<op>, and the conflict header
(the "tree A + tree B - tree C" thing; whatever that's called)
similarly provides signal that the commit still has conflicts.
Secondarily, these files contain information about the tree we came
from and its parent tree, which allows users to investigate the diff
between those...but that information is also available from the
conflict header in the recorded commit.  The CHERRY_PICK_HEAD and
REVERT_HEAD files could also be used to access the commit message, but
that would have been stored in the conflicted commit as well.  Are
there any other pieces of information I'm missing?

> For a sequence of cherry-picks or
> a rebase it is more complicated to preserve the context of the conflict.

I think the big piece here is whether we also want to adopt jj's
behavior of automatically rebasing all descendant commits when
checking out and amending some historical commit (or at least having
the option of doing so).  That behavior allows users to amend commits
to resolve conflicts without figuring out complicated interactive
rebases to fix all the descendant commits across all relevant
branches.  Without that feature, I agree this might be a bit more
difficult, but with that feature, I'm having a hard time figuring out
what context we actually need to preserve for a sequence of
cherry-picks or a rebase.

Digging into a few briefly...

Many of the state files are about the status of the in-progress
operation (todo-list, numbers of commits done and to do, what should
be done with not-yet-handled commits, temporary refs corresponding to
temporary labels that need to be deleted, rescheduling failed execs,
dropping or keeping redundant commits, etc.), but if the operation has
completed and new commits created (potentially with multiple files
with conflict headers), I don't see how this information is useful
anymore.

There are some special state files related to half-completed
operations (e.g. squash commits when we haven't yet reached the final
one in the sequence, a file to note that we want to edit a commit
message once the user has finished resolving conflicts, whether we
need to create a new root commit), but again, the operation has
completed and commits have been created with appropriate parentage and
commit messages so I don't think these are useful anymore either.

Other state files are related to things needing to be done at the end
of the operation, like invoking the post-rewrite hook or popping the
autostash (with knowledge of what was rewritten to what).  But the
operation would have been completed and those things done already, so
I don't see how this is necessary either.

Some state files are for controlling how commits are created (setting
committer date to author date, gpg signing options, whether to add
signoff), but, again, commits have already been created, and can be
further amended as the user wants (hopefully including resolving the
conflicts).

The biggest issue is perhaps that REBASE_HEAD is used in the
implementation of `git rebase --show-current-patch`, but all
information stored in that is still accessible -- the commit message
is stored in the commit, the author time is stored in the commit, and
the trees involved are in the conflict header.  The only thing missing
is committer timestamp, which isn't relevant anyway.

The only ones I'm pausing a bit on are the strategy and
strategy-options.  Those might be useful somehow...but I can't
currently quite put my finger on explaining how they would be useful
and I'm not sure they are.

Am I missing anything?

> Even "git merge" can create several files in addition to MERGE_HEAD
> which are read when the conflict resolution is committed.

That's a good one to bring up too, but I'm not sure I understand how
these could be useful to preserve either.  Am I missing something?  My
breakdown:
   * MERGE_HEAD: was recorded in the commit as a second parent, so we
already have that info
   * MERGE_MSG: was recorded in the commit as the commit message, so
again we already have that info
   * MERGE_AUTOSTASH: irrelevant since the stashed stuff isn't part of
the commit and was in fact unstashed after the
merge-commit-with-conflicts was created
   * MERGE_MODE: irrelevant since it's only used for reducing heads at
time of git-commit, and git-commit has already been run
   * MERGE_RR: I think this is irrelevant; the conflict record (tree A
+ tree B - tree C) lets us redo the merge if needed to get the list of
conflicted files and textual conflicts found therein

So I don't see how any of the information in these files needs to be
recorded as additional auxiliary information.  However, that last item
might depend upon the strategy and strategy-options, which currently
is not recorded...hmm....

> > But we'd also have to be careful and think through usecases, including
> > in the surrounding community.  People would probably want to ensure
> > that e.g. "Protected" or "Integration" branches don't accept
> > fetches or pushes of conflicted commits,
>
> I think this is a really important point, while it can be useful to
> share conflicts so they can be collaboratively resolved we don't want to
> propagate them into "stable" or production branches. I wonder how 'jj'
> handles this.

Yeah, figuring this out might be the biggest sticking point.

> > git status would probably
> > need some special warnings or notices, git checkout would probably
> > benefit from additional warnings/notices checks for those cases, git
> > log should probably display conflicted commits differently, we'd need
> > to add special handling for higher order conflicts (e.g. a merge with
> > conflicts is itself involved in a merge) probably similar to what jj
> > has done, and audit a lot of other code paths to see what would be
> > needed.
>
> As you point out there is a lot more to this than just being able to
> store the conflict data in a commit - in many ways I think that is the
> easiest part of the solution to sharing conflicts.

Yeah, another one I just thought of is that the trees referenced in
the conflicts would also need to affect reachability computations, to
make sure both that they don't get gc'ed and that they are
transferred when appropriate.  There are lots of things that would be
involved in implementing this idea.


* Re: first-class conflicts?
  2023-11-08  6:31     ` Elijah Newren
@ 2023-11-09 14:45       ` Phillip Wood
  2023-11-10 22:57         ` Elijah Newren
  0 siblings, 1 reply; 25+ messages in thread
From: Phillip Wood @ 2023-11-09 14:45 UTC (permalink / raw)
  To: Elijah Newren, phillip.wood
  Cc: Sandra Snan, git, Martin von Zweigbergk, Randall S. Becker

Hi Elijah

On 08/11/2023 06:31, Elijah Newren wrote:
> Hi Phillip,
> 
> On Tue, Nov 7, 2023 at 3:49 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
>>
>> Hi Elijah
>>
>> [I've cc'd Martin to see if he has anything to add about how "jj"
>> manages the issues around storing conflicts.]
> 
> +1.  I'll add some other questions for him too while we're at it,
> separately in this thread.
> 
> [...]
> 
>>> Martin also gave us an update at the 2023 Git Contributors summit, and
>>> in particular noted a significant implementation change to not have
>>> per-file storage of conflicts, but rather storing at the commit level
>>> the multiple conflicting trees involved.  That model might be
>>> something we could implement in Git.  And if we did, it'd solve
>>> various issues such as people wanting to be able to stash conflicts,
>>> or wanting to be able to partially resolve conflicts and fix it up
>>> later, or be able to collaboratively resolve conflicts without having
>>> everyone have access to the same checkout.
>>
>> One thing to think about if we ever want to implement this is what other
>> data we need to store along with the conflict trees to preserve the
>> context in which the conflict was created. For example the files that
>> are read by "git commit" when it commits a conflict resolution. For a
>> single cherry-pick/revert it would probably be fairly straightforward
>> to store CHERRY_PICK_HEAD/REVERT_HEAD and add it as a parent so it gets
>> transferred along with the conflicts.
> 
> This is a great thing to think about and bring up.  However, I'm not
> sure what part of it actually needs to be preserved; in fact, it's not
> clear to me that any of it needs preserving -- especially not the
> files read by "git commit".  A commit was already created, after all.
> 
> It seems that CHERRY_PICK_HEAD/REVERT_HEAD files exist primarily to
> clue in that we are in-the-middle-of-<op>, and the conflict header
> (the "tree A + tree B - tree C" thing; whatever that's called)
> similarly provides signal that the commit still has conflicts.
> Secondarily, these files contain information about the tree we came
> from and its parent tree, which allows users to investigate the diff
> between those...but that information is also available from the
> conflict header in the recorded commit.  The CHERRY_PICK_HEAD and
> REVERT_HEAD files could also be used to access the commit message, but
> that would have been stored in the conflicted commit as well.  Are
> there any other pieces of information I'm missing?

Mainly that I'm an idiot and forgot we were actually creating a commit 
and can store the message and authorship there! More seriously I think 
being able to inspect the commit being cherry-picked (including the 
original commit message) is useful so we'd need to recreate something 
like CHERRY_PICK_HEAD when the conflict commit is checked out. 
Recreating CHERRY_PICK_HEAD is useful for "git status" as well. I think 
that means storing a little more than just the "tree A + tree B - tree 
C" thing.

>> For a sequence of cherry-picks or
>> a rebase it is more complicated to preserve the context of the conflict.
> 
> I think the big piece here is whether we also want to adopt jj's
> behavior of automatically rebasing all descendant commits when
> checking out and amending some historical commit (or at least having
> the option of doing so).  That behavior allows users to amend commits
> to resolve conflicts without figuring out complicated interactive
> rebases to fix all the descendant commits across all relevant
> branches.

That's a potentially attractive option which is fairly simple to 
implement locally as I think you can use the commit DAG to find all the 
descendants though that could be expensive if there are lots of 
branches. However, if we're going to share conflicts I think we'd need 
something like "hg evolve" - if I push a commit with conflicts and you 
base some work on it and then I resolve the conflict and push again you 
would want your work to be rebased onto my conflict resolution. To 
handle "rebase --exec" we could store the exec command and run it when 
the conflicts are resolved.

Also I wonder how annoying it would be in cases where I just want to 
rebase and resolve the conflicts now. At the moment "git rebase" stops 
at the conflict, with this feature I'd have to go and checkout the 
conflicted commit and fix the conflicts after the rebase had finished.

> Without that feature, I agree this might be a bit more
> difficult,

Yes, when I wrote my original message I was imagining that we'd stop at 
the first conflicting pick and store all the rebase state like some kind 
of stash on steroids so it could be continued when the conflict was 
resolved. It would be much simpler to try and avoid that.

> but with that feature, I'm having a hard time figuring out
> what context we actually need to preserve for a sequence of
> cherry-picks or a rebase.
>  
> Digging into a few briefly...
> 
> Many of the state files are about the status of the in-progress
> operation (todo-list, numbers of commits done and to do, what should
> be done with not-yet-handled commits, temporary refs corresponding to
> temporary labels that need to be deleted, rescheduling failed execs,
> dropping or keeping redundant commits, etc.), but if the operation has
> completed and new commits created (potentially with multiple files
> with conflict headers), I don't see how this information is useful
> anymore.

Agreed

> There are some special state files related to half-completed
> operations (e.g. squash commits when we haven't yet reached the final
> one in the sequence, a file to note that we want to edit a commit
> message once the user has finished resolving conflicts, whether we
> need to create a new root commit), but again, the operation has
> completed and commits have been created with appropriate parentage and
> commit messages so I don't think these are useful anymore either.

Yes, though we may want to remember which commits were squashed together 
so the user can inspect that when resolving conflicts.

> Other state files are related to things needing to be done at the end
> of the operation, like invoking the post-rewrite hook or popping the
> autostash (with knowledge of what was rewritten to what).  But the
> operation would have been completed and those things done already, so
> I don't see how this is necessary either.

Agreed

> Some state files are for controlling how commits are created (setting
> committer date to author date, gpg signing options, whether to add
> signoff), but, again, commits have already been created, and can be
> further amended as the user wants (hopefully including resolving the
> conflicts).

Agreed

> The biggest issue is perhaps that REBASE_HEAD is used in the
> implementation of `git rebase --show-current-patch`, but all
> information stored in that is still accessible -- the commit message
> is stored in the commit, the author time is stored in the commit, and
> the trees involved are in the conflict header.  The only thing missing
> is committer timestamp, which isn't relevant anyway.

The commit message may have been edited so we lose the original message 
but I'm not sure how important that is.

> The only ones I'm pausing a bit on are the strategy and
> strategy-options.  Those might be useful somehow...but I can't
> currently quite put my finger on explaining how they would be useful
> and I'm not sure they are.

I can't think of an immediate use for them. When we re-create conflicts 
we do it per-file based on the index entries created by the original 
merge so I don't think we need to know anything about the strategy or 
strategy-options.

> Am I missing anything?

exec commands? If the user runs "git rebase --exec" and there are 
conflicts then we'd need to defer running the exec commands until the 
conflicts are resolved. For something like "git rebase --exec 'make 
test'" that should be fine. I wonder if there are corner cases where the 
exec command changes HEAD though.

>> Even "git merge" can create several files in addition to MERGE_HEAD
>> which are read when the conflict resolution is committed.
> 
> That's a good one to bring up too, but I'm not sure I understand how
> these could be useful to preserve either.  Am I missing something?  My
> breakdown:
>     * MERGE_HEAD: was recorded in the commit as a second parent, so we
> already have that info
>     * MERGE_MSG: was recorded in the commit as the commit message, so
> again we already have that info
>     * MERGE_AUTOSTASH: irrelevant since the stashed stuff isn't part of
> the commit and was in fact unstashed after the
> merge-commit-with-conflicts was created
>     * MERGE_MODE: irrelevant since it's only used for reducing heads at
> time of git-commit, and git-commit has already been run
>     * MERGE_RR: I think this is irrelevant; the conflict record (tree A
> + tree B - tree C) lets us redo the merge if needed to get the list of
> conflicted files and textual conflicts found therein
> 
> So I don't see how any of the information in these files needs to be
> recorded as additional auxiliary information.  However, that last item
> might depend upon the strategy and strategy-options, which currently
> is not recorded...hmm....

Yes, as we're creating some kind of commit we don't need to preserve 
those files separately.

>>> But we'd also have to be careful and think through usecases, including
>>> in the surrounding community.  People would probably want to ensure
>>> that e.g. "Protected" or "Integration" branches don't accept
>>> fetches or pushes of conflicted commits,
>>
>> I think this is a really important point, while it can be useful to
>> share conflicts so they can be collaboratively resolved we don't want to
>> propagate them into "stable" or production branches. I wonder how 'jj'
>> handles this.
> 
> Yeah, figuring this out might be the biggest sticking point.

Indeed

>>> git status would probably
>>> need some special warnings or notices, git checkout would probably
>>> benefit from additional warnings/notices checks for those cases, git
>>> log should probably display conflicted commits differently, we'd need
>>> to add special handling for higher order conflicts (e.g. a merge with
>>> conflicts is itself involved in a merge) probably similar to what jj
>>> has done, and audit a lot of other code paths to see what would be
>>> needed.
>>
>> As you point out there is a lot more to this than just being able to
>> store the conflict data in a commit - in many ways I think that is the
>> easiest part of the solution to sharing conflicts.
> 
> Yeah, another one I just thought of is that the trees referenced in
> the conflicts would also need to affect reachability computations, to
> make sure both that they don't get gc'ed and that they are
> transferred when appropriate.  There are lots of things that would be
> involved in implementing this idea.

Yes, it would certainly be lots of work.

Best Wishes

Phillip


* Re: first-class conflicts?
  2023-11-09 14:45       ` Phillip Wood
@ 2023-11-10 22:57         ` Elijah Newren
  0 siblings, 0 replies; 25+ messages in thread
From: Elijah Newren @ 2023-11-10 22:57 UTC (permalink / raw)
  To: phillip.wood; +Cc: Sandra Snan, git, Martin von Zweigbergk, Randall S. Becker

Hi Phillip,

On Thu, Nov 9, 2023 at 6:45 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
>
[...]
> > This is a great thing to think about and bring up.  However, I'm not
> > sure what part of it actually needs to be preserved; in fact, it's not
> > clear to me that any of it needs preserving -- especially not the
> > files read by "git commit".  A commit was already created, after all.
> >
> > It seems that CHERRY_PICK_HEAD/REVERT_HEAD files exist primarily to
> > clue in that we are in-the-middle-of-<op>, and the conflict header
> > (the "tree A + tree B - tree C" thing; whatever that's called)
> > similarly provides signal that the commit still has conflicts.
> > Secondarily, these files contain information about the tree we came
> > from and its parent tree, which allows users to investigate the diff
> > between those...but that information is also available from the
> > conflict header in the recorded commit.  The CHERRY_PICK_HEAD and
> > REVERT_HEAD files could also be used to access the commit message, but
> > that would have been stored in the conflicted commit as well.  Are
> > there any other pieces of information I'm missing?
>
> Mainly that I'm an idiot and forgot we were actually creating a commit
> and can store the message and authorship there!

You're definitely not an idiot.  The whole problem space is new and
different, so it's easy to overlook or forget certain details, and
even to make completely different assumptions than others and have no
one aware that we're operating with similar sounding but entirely
different mental models.

> More seriously I think
> being able to inspect the commit being cherry-picked (including the
> original commit message) is useful so we'd need to recreate something
> like CHERRY_PICK_HEAD when the conflict commit is checked out.

So, I see a few issues with this:

1) Even if we were to create CHERRY_PICK_HEAD as you envision, that
doesn't necessarily guarantee the user can view the original commit
because they may not have it.  It may have been a local-only commit
that wasn't pushed or pulled to the person who is now investigating
it.

2a) You highlight the original commit message, but if someone doesn't
want to immediately resolve conflicts, why would they be modifying the
commit message?

2b) Even if users did want to modify the commit message without
resolving conflicts, how would they do so?  Rebasing has
interactivity, but cherry-picking doesn't.  And interactivity seems to
be something people probably wouldn't use together with storing
conflicts; the point of interactivity is to tweak things further and
fix them up, suggesting they'd want to be running in
address-conflicts-now mode.

> Recreating CHERRY_PICK_HEAD is useful for "git status" as well.

"git status" uses this file to determine if it should display
information about currently being in the middle of a cherry-pick
operation.  Putting such a file in place would thus be misleading,
because we aren't in a cherry-pick operation anymore; that has
completed already.  I would not expect the suggested commands printed
by git-status while it thinks we're in such a state (namely, "git
cherry-pick [--continue|--skip|--abort]") to work either.  So, I'd
argue it would be a bug to create such a file when checking out a
conflicted-commit.
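
To make that concrete, here is a rough Python sketch of the kind of
check being described.  It is not git's actual implementation (which
is in C); the function name and the mapping are made up for
illustration.  Only the file names CHERRY_PICK_HEAD and REVERT_HEAD
come from this thread.

```python
from pathlib import Path
from typing import Optional

# Marker files named in this thread; their presence in the git
# directory signals an operation that stopped partway through.
# (The mapping and function below are hypothetical, for illustration.)
IN_PROGRESS_MARKERS = {
    "CHERRY_PICK_HEAD": "cherry-pick",
    "REVERT_HEAD": "revert",
}


def operation_in_progress(git_dir: str) -> Optional[str]:
    """Return the name of the half-finished operation, or None."""
    for marker, operation in IN_PROGRESS_MARKERS.items():
        if (Path(git_dir) / marker).exists():
            return operation
    return None
```

With a check shaped like this, recreating CHERRY_PICK_HEAD on checkout
of a conflicted commit would report "cherry-pick" even though no
cherry-pick is running.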

Of course, we would want git-status to display information about the
current commit being conflicted, but I think that could be based on
the simple conflict header without additional info.
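
As a very rough illustration of what "based on the simple conflict
header" could mean, here is a Python sketch.  Everything in it (the
ConflictHeader class, its fields, the message wording) is an
assumption for the sake of discussion; no such format exists in git
today.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ConflictHeader:
    """Hypothetical 'tree A + tree B - tree C' record on a commit."""
    adds: Tuple[str, ...]      # trees that are added (A and B)
    removes: Tuple[str, ...]   # trees that are subtracted (C)

    def describe(self) -> str:
        terms = " + ".join(f"tree {t}" for t in self.adds)
        for t in self.removes:
            terms += f" - tree {t}"
        return terms


def status_message(header: Optional[ConflictHeader]) -> str:
    """What a status-like command might print for the current commit."""
    if header is None:
        return "nothing to resolve"
    return f"current commit has unresolved conflicts ({header.describe()})"
```

Nothing here consults any in-progress state files; the header alone is
enough to tell the user the commit is conflicted.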

> I think
> that means storing a little more than just the "tree A + tree B - tree
> C" thing.

I'm totally willing to believe there will be cases where more info is
needed.  I'm suspecting that conflicts with certain kinds of renames,
or which were performed with certain types of strategies or strategy
options might be some examples.  But I'm not sure I'm understanding
why CHERRY_PICK_HEAD should be one of those cases.

> > I think the big piece here is whether we also want to adopt jj's
> > behavior of automatically rebasing all descendant commits when
> > checking out and amending some historical commit (or at least having
> > the option of doing so).  That behavior allows users to amend commits
> > to resolve conflicts without figuring out complicated interactive
> > rebases to fix all the descendant commits across all relevant
> > branches.
>
> That's a potentially attractive option which is fairly simple to
> implement locally as I think you can use the commit DAG to find all the
> descendants though that could be expensive if there are lots of
> branches. However, if we're going to share conflicts I think we'd need
> something like "hg evolve" - if I push a commit with conflicts and you
> base some work on it and then I resolve the conflict and push again you
> would want your work to be rebased onto my conflict resolution.

Ooh, that's an interesting point.
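
For the local case, the "use the commit DAG to find all the
descendants" step might look something like the Python sketch below.
It assumes the history is available as a plain mapping from each
commit to its parents; the function name and that representation are
made up for illustration, and this is not how git or jj implement it.

```python
from collections import defaultdict, deque
from typing import Dict, List


def descendants_in_rebase_order(parents: Dict[str, List[str]],
                                start: str) -> List[str]:
    """List every descendant of `start`, parents before children."""
    # Invert the parent links so we can walk towards the branch tips.
    children = defaultdict(list)
    for commit, commit_parents in parents.items():
        for parent in commit_parents:
            children[parent].append(commit)

    # Breadth-first walk to collect all descendants of `start`.
    seen = set()
    queue = deque([start])
    while queue:
        commit = queue.popleft()
        for child in children[commit]:
            if child not in seen:
                seen.add(child)
                queue.append(child)

    # Kahn's algorithm restricted to the descendants, so that a commit
    # is only emitted once all of its descendant parents were emitted.
    pending = {c: sum(1 for p in parents[c] if p in seen) for c in seen}
    ready = sorted(c for c, n in pending.items() if n == 0)
    order = []
    while ready:
        commit = ready.pop(0)
        order.append(commit)
        for child in sorted(children[commit]):
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)
    return order
```

The cost concern is visible here: building the child links touches
every commit reachable from every branch, not just the descendants.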

> To handle "rebase --exec" we could store the exec command and run it when
> the conflicts are resolved.

So, my assumption is that even if we add the ability to commit
conflicts and even if we default to auto-committing them during
cherry-picks or non-interactive rebases, there will still be people
who want to resolve conflicts as they are hit rather than
auto-committing them, and thus that stop-on-conflict should always be
an option.  In the world where a user has this choice, I think it'd be
rare for users to want to auto-commit conflicts with --exec.  I'd
suggest that --exec, and even --interactive, would default to stopping
on conflicts and waiting for the user to resolve even if
auto-commit-on-conflict is the default in other cases.

That leaves me wondering if there are any cases where users want to
auto-commit conflicts in conjunction with --exec, which I'm already
struggling to come up with, _and_ that would further want the exec
commands to be preserved in the conflicted commits (and any descendant
commits?) for later usage.  Maybe there's a case for that, but I'm not
coming up with it right now.

Also, another way of looking at this is that my current mental model
is that the cherry-pick or rebase operation is completed once it has
handled each of the commits in its list; the operation does not extend
until all the conflicts in the commits it creates are resolved.  The
fact that rebases do not extend until conflicts are resolved is
important because you can later further rebase conflicted-commits (as
Martin alludes to in his emails); considering the old rebase(s) to
still be in progress while a new one starts might get excessively
complex to handle.  The reason all of this matters to --exec is that
--exec is part of the rebase operation; once the rebase operation is
done, the --exec stuff is also done.  (And thus, if you don't want
--exec to run on conflicted commits, then don't opt for
auto-committing conflicts.)

> Also I wonder how annoying it would be in cases where I just want to
> rebase and resolve the conflicts now. At the moment "git rebase" stops
> at the conflict, with this feature I'd have to go and checkout the
> conflicted commit and fix the conflicts after the rebase had finished.

I agree that would often be annoying.  Personally, I think that
auto-committing conflicts as a feature should at most be an option
(even if perhaps the default in some cases), not a new mandatory
worldview.  And I'm currently not convinced that even if it were
implemented it should be the default in any cases.

> > Without that feature, I agree this might be a bit more
> > difficult,
>
> Yes, when I wrote my original message I was imagining that we'd stop at
> the first conflicting pick and store all the rebase state like some kind
> of stash on steroids so it could be continued when the conflict was
> resolved. It would be much simpler to try and avoid that.

Yeah, this is an example of how completely different mental models we
can come up with when none of us (other than Martin) know much about
the problem space.  I suspect there are at least a few more examples
like this where we still have very different mental models, and
perhaps some gems to be found by mixing and matching them.

> > There are some special state files related to half-completed
> > operations (e.g. squash commits when we haven't yet reached the final
> > one in the sequence, a file to note that we want to edit a commit
> > message once the user has finished resolving conflicts, whether we
> > need to create a new root commit), but again, the operation has
> > completed and commits have been created with appropriate parentage and
> > commit messages so I don't think these are useful anymore either.
>
> Yes, though we may want to remember which commits were squashed together
> so the user can inspect that when resolving conflicts.

Ooh, that's interesting...though it does run into the problem of users
not having access to the original commits.

> > The biggest issue is perhaps that REBASE_HEAD is used in the
> > implementation of `git rebase --show-current-patch`, but all
> > information stored in that is still accessible -- the commit message
> > is stored in the commit, the author time is stored in the commit, and
> > the trees involved are in the conflict header.  The only thing missing
> > is committer timestamp, which isn't relevant anyway.
>
> The commit message may have been edited so we lose the original message
> but I'm not sure how important that is.

Is this a reversal from your comment earlier in your email about the
importance of the original commit message for CHERRY_PICK_HEAD?  :-)

> > The only ones I'm pausing a bit on are the strategy and
> > strategy-options.  Those might be useful somehow...but I can't
> > currently quite put my finger on explaining how they would be useful
> > and I'm not sure they are.
>
> I can't think of an immediate use for them. When we re-create conflicts
> we do it per-file based on the index entries created by the original
> merge so I don't think we need to know anything about the strategy or
> strategy-options.

But we don't have index entries.  We only have trees in this
conflicted commit, and when users check it out, they probably expect
conflicted index entries to be put into place.  Can we correctly
regenerate the right conflicted index entries from the original trees
without the strategy and strategy-options command line flags?  I
suspect there might be problems here, and user-defined merge
strategies could really throw a wrench in the works.  Hmm...

> > Am I missing anything?
>
> exec commands? If the user runs "git rebase --exec" and there are
> conflicts then we'd need to defer running the exec commands until the
> conflicts are resolved. For something like "git rebase --exec 'make
> test'" that should be fine. I wonder if there are corner cases where the
> exec command changes HEAD though.

We talked about exec commands above, as well as the assumption whether
auto-committing conflicts should be mandatory vs. an option, so I
won't repeat that here.  It was definitely a very interesting topic to
bring up though; thanks!

[...]

> Yes, it would certainly be lots of work.

...but even if none of us get time and prioritization to work on it, I
personally find it a really interesting topic to discuss and explore.
Thanks for joining in and bringing up many good points!


end of thread, other threads:[~2023-11-12 23:25 UTC | newest]

Thread overview: 25+ messages
2023-11-06 21:17 first-class conflicts? Sandra Snan
2023-11-06 22:01 ` Dragan Simic
2023-11-06 22:34   ` Sandra Snan
2023-11-06 22:34   ` rsbecker
2023-11-06 22:45     ` Sandra Snan
2023-11-07  0:50       ` Theodore Ts'o
2023-11-11  1:31         ` Junio C Hamano
2023-11-11  7:48           ` Sandra Snan
2023-11-12 15:21           ` Theodore Ts'o
2023-11-12 23:25             ` Junio C Hamano
2023-11-07 11:23       ` Phillip Wood
2023-11-07 11:24         ` Sandra Snan
2023-11-07  8:16 ` Elijah Newren
2023-11-07  8:21   ` Dragan Simic
2023-11-07  9:16   ` Sandra Snan
2023-11-07 11:49   ` Phillip Wood
2023-11-07 17:38     ` Martin von Zweigbergk
2023-11-08  7:31       ` Elijah Newren
2023-11-08 18:22         ` Martin von Zweigbergk
2023-11-10 21:41           ` Elijah Newren
2023-11-12  7:05             ` Martin von Zweigbergk
2023-11-09 14:50       ` phillip.wood123
2023-11-08  6:31     ` Elijah Newren
2023-11-09 14:45       ` Phillip Wood
2023-11-10 22:57         ` Elijah Newren
