* first-class conflicts? @ 2023-11-06 21:17 Sandra Snan 2023-11-06 22:01 ` Dragan Simic 2023-11-07 8:16 ` Elijah Newren 0 siblings, 2 replies; 25+ messages in thread From: Sandra Snan @ 2023-11-06 21:17 UTC (permalink / raw) To: git Is this feature from jj also a good idea for git? https://martinvonz.github.io/jj/v0.11.0/conflicts/ ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: first-class conflicts? 2023-11-06 21:17 first-class conflicts? Sandra Snan @ 2023-11-06 22:01 ` Dragan Simic 2023-11-06 22:34 ` Sandra Snan 2023-11-06 22:34 ` rsbecker 2023-11-07 8:16 ` Elijah Newren 1 sibling, 2 replies; 25+ messages in thread From: Dragan Simic @ 2023-11-06 22:01 UTC (permalink / raw) To: Sandra Snan; +Cc: git On 2023-11-06 22:17, Sandra Snan wrote: > Is this feature from jj also a good idea for git? > https://martinvonz.github.io/jj/v0.11.0/conflicts/ Hmm, that's quite interesting, but frankly it makes little sense to me. See, the source code in a repository should always be in a compileable or runnable state, in each and every commit, so going against that rule wouldn't make much sense. Just think about various CI/CD tools that also expect the same. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: first-class conflicts? 2023-11-06 22:01 ` Dragan Simic @ 2023-11-06 22:34 ` Sandra Snan 2023-11-06 22:34 ` rsbecker 1 sibling, 0 replies; 25+ messages in thread From: Sandra Snan @ 2023-11-06 22:34 UTC (permalink / raw) To: git [-- Attachment #1: Type: text/plain, Size: 169 bytes --] I've sometimes merged stuff in and almost not notice that I had a conflict in there and in those cases the code wasn't compilable even though I was using vanilla git. ^ permalink raw reply [flat|nested] 25+ messages in thread
* RE: first-class conflicts? 2023-11-06 22:01 ` Dragan Simic 2023-11-06 22:34 ` Sandra Snan @ 2023-11-06 22:34 ` rsbecker 2023-11-06 22:45 ` Sandra Snan 1 sibling, 1 reply; 25+ messages in thread From: rsbecker @ 2023-11-06 22:34 UTC (permalink / raw) To: 'Dragan Simic', 'Sandra Snan'; +Cc: git On November 6, 2023 5:01 PM, Dragan Simic wrote: >On 2023-11-06 22:17, Sandra Snan wrote: >> Is this feature from jj also a good idea for git? >> https://martinvonz.github.io/jj/v0.11.0/conflicts/ > >Hmm, that's quite interesting, but frankly it makes little sense to me. >See, the source code in a repository should always be in a compileable or runnable >state, in each and every commit, so going against that rule wouldn't make much >sense. Just think about various CI/CD tools that also expect the same. It seems to me, perhaps naively, that the longer a conflict persists in a repository, the greater the potential for chaotic results. There are, notably, at least two fundamental types of conflicts: 1. Content conflict, where a point in a file is modified in two (or n) branches being combined, is what git tries to ensure never happens. The longer such a conflict exists in a file, the greater the variance from a buildable or consistent state will persist and will likely be increasingly harder to resolve. 2. Semantic conflicts, where unrelated modification points cause incompatibilities are much harder to resolve and quantify - many are, in fact, undetectable from a computational standpoint (as in detecting general semantic conflicts is an uncomputable problem). The longer those persist, partly when they are missed by pull requests/code reviews, the more persistent a defect can become. 3. I am avoiding matters such as code optimization conflicts which are outside the scope of the proposal. In either case, storing conflicts in the integration branches of a repository is, in my view, a bad thing that eventually can make the repository unsustainable. I will concede that keeping conflicts around in non-integration branches may have intellectual value for recording research and development progress. This is just my opinion. Randall -- Brief whoami: NonStop&UNIX developer since approximately UNIX(421664400) NonStop(211288444200000000) -- In real life, I talk too much. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: RE: first-class conflicts? 2023-11-06 22:34 ` rsbecker @ 2023-11-06 22:45 ` Sandra Snan 2023-11-07 0:50 ` Theodore Ts'o 2023-11-07 11:23 ` Phillip Wood 0 siblings, 2 replies; 25+ messages in thread From: Sandra Snan @ 2023-11-06 22:45 UTC (permalink / raw) To: git, Dragan Simic, rsbecker [-- Attachment #1: Type: text/plain, Size: 1141 bytes --] Randall, thank you for that. I did mean of the first type, pure content conflicts (just like the examples on that jj page). I just have sometimes wish git could be a little more aware of them beyond just storing them with ASCII art in the files themselves (and alerting / warning when they happen but I often can't properly see those warnings flash by so I end up having to search for the conflict markers manually). So if conflicts are a thing that *can* happen, it'd be better if vc could know about them which would make some of the rebases simpler as in jj. That doesn't mean we wanna adopt the jj workflow of deliberately checking in conflicts (not even locally), just be able to deal with them better if it does happen. I dunno… and I've really appreciated the naysayers so far, helps me sort out my thoughts in this. I personally really prefer the vanilla "explicit staging" workflow (with magit) over jj, got, gitless etc. I'm more scared of overcommitting by mistake than undercommitting. But this one feature seemed to me that it might be really good: just having the vc be aware of the conflicts it has created. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: first-class conflicts? 2023-11-06 22:45 ` Sandra Snan @ 2023-11-07 0:50 ` Theodore Ts'o 2023-11-11 1:31 ` Junio C Hamano 2023-11-07 11:23 ` Phillip Wood 1 sibling, 1 reply; 25+ messages in thread From: Theodore Ts'o @ 2023-11-07 0:50 UTC (permalink / raw) To: Sandra Snan; +Cc: git, Dragan Simic, rsbecker On Mon, Nov 06, 2023 at 10:45:03PM +0000, Sandra Snan wrote: > Randall, thank you for that. > > I just have sometimes wish git could be a little more aware of them beyond > just storing them with ASCII art in the files themselves (and alerting / > warning when they happen but I often can't properly see those warnings flash > by so I end up having to search for the conflict markers manually). So if > conflicts are a thing that *can* happen, it'd be better if vc could know > about them which would make some of the rebases simpler as in jj. That > doesn't mean we wanna adopt the jj workflow of deliberately checking in > conflicts (not even locally), just be able to deal with them better if it > does happen. Well, if you miss them, "git status" does show that there are conflicts: Unmerged paths: (use "git add <file>..." to mark resolution) both modified: version.h And if you attempt to commit the merge without resolving the conflicts, git won't let you: error: Committing is not possible because you have unmerged files. hint: Fix them up in the work tree, and then use 'git add/rm <file>' hint: as appropriate to mark resolution and make a commit. So it's hard to miss the indications of the content conflict, because if you try to commit without resolving them, it's not a warning, it's an outright error. Cheers, - Ted ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: first-class conflicts? 2023-11-07 0:50 ` Theodore Ts'o @ 2023-11-11 1:31 ` Junio C Hamano 2023-11-11 7:48 ` Sandra Snan 2023-11-12 15:21 ` Theodore Ts'o 0 siblings, 2 replies; 25+ messages in thread From: Junio C Hamano @ 2023-11-11 1:31 UTC (permalink / raw) To: Theodore Ts'o; +Cc: Sandra Snan, git, Dragan Simic, rsbecker "Theodore Ts'o" <tytso@mit.edu> writes: > And if you attempt to commit the merge without resolving the > conflicts, git won't let you: > > error: Committing is not possible because you have unmerged files. > hint: Fix them up in the work tree, and then use 'git add/rm <file>' > hint: as appropriate to mark resolution and make a commit. > > So it's hard to miss the indications of the content conflict, because > if you try to commit without resolving them, it's not a warning, it's > an outright error. Correct but with a caveat: it is too easy for lazy folks to circumvent the safety by mistake with "commit -a". I wonder if it would help users to add a new configuration option for those who want to live safer that tells "commit -a" to leave unmerged paths alone and require the unmerged paths to be added explicitly (which may have to extend to cover things like "add -u" and "add ."). Perhaps not. I often find myself doing "git add -u" after resolving conflicts and re-reading the result, without an explicit pathspec. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: first-class conflicts? 2023-11-11 1:31 ` Junio C Hamano @ 2023-11-11 7:48 ` Sandra Snan 2023-11-12 15:21 ` Theodore Ts'o 1 sibling, 0 replies; 25+ messages in thread From: Sandra Snan @ 2023-11-11 7:48 UTC (permalink / raw) To: git Junio C Hamano <gitster@pobox.com> writes: > Correct but with a caveat: it is too easy for lazy folks to > circumvent the safety by mistake with "commit -a". Lazy and ignorant like myself because I didn't know -a was that dangerous. Thank you both! ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: first-class conflicts? 2023-11-11 1:31 ` Junio C Hamano 2023-11-11 7:48 ` Sandra Snan @ 2023-11-12 15:21 ` Theodore Ts'o 2023-11-12 23:25 ` Junio C Hamano 1 sibling, 1 reply; 25+ messages in thread From: Theodore Ts'o @ 2023-11-12 15:21 UTC (permalink / raw) To: Junio C Hamano; +Cc: Sandra Snan, git, Dragan Simic, rsbecker On Sat, Nov 11, 2023 at 10:31:54AM +0900, Junio C Hamano wrote: > Correct but with a caveat: it is too easy for lazy folks to > circumvent the safety by mistake with "commit -a". > > I wonder if it would help users to add a new configuration option > for those who want to live safer that tells "commit -a" to leave > unmerged paths alone and require the unmerged paths to be added > explicitly (which may have to extend to cover things like "add -u" > and "add ."). > > Perhaps not. I often find myself doing "git add -u" after resolving > conflicts and re-reading the result, without an explicit pathspec. Maybe the configuration option would also forbit "git add -u" from adding diffs with conflict markers unless --force is added? I dunno. I personally wouldn't use it myself, because I've always made a point of running "git diff", or "git status", and almost always, a command like "make -j16 && make -j16 check" (or an aliased equivalent) before commiting a merge. But that's because I'm a paranoid s.o.b. and in my long career, I've learned is that "you can't be paranoid enough", and "hope is not a strategy". :-) - Ted ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: first-class conflicts? 2023-11-12 15:21 ` Theodore Ts'o @ 2023-11-12 23:25 ` Junio C Hamano 0 siblings, 0 replies; 25+ messages in thread From: Junio C Hamano @ 2023-11-12 23:25 UTC (permalink / raw) To: Theodore Ts'o; +Cc: Sandra Snan, git, Dragan Simic, rsbecker "Theodore Ts'o" <tytso@mit.edu> writes: > On Sat, Nov 11, 2023 at 10:31:54AM +0900, Junio C Hamano wrote: >> ... >> I wonder if it would help users to add a new configuration option >> for those who want to live safer that tells "commit -a" to leave >> unmerged paths alone and require the unmerged paths to be added >> explicitly (which may have to extend to cover things like "add -u" >> and "add ."). >> >> Perhaps not. I often find myself doing "git add -u" after resolving >> conflicts and re-reading the result, without an explicit pathspec. > > Maybe the configuration option would also forbit "git add -u" from > adding diffs with conflict markers unless --force is added? Historically we left it to pre-commit hooks, but I agree that protection at the time of "git add" may be more helpful. I also alluded to being careful about "git add" with an overly vague pathspec (like "." to add everything addable under the sun), but I do not think it is possible to define "overly vague" in a way that satisfies everybody (would "git add \*.h" be still overly vague when 5% of your header files have conflicts in the merge you are concluding?) and keep the new users safe. Unless the configuration forbids patterns and say "each and every individual path must be named to add and resolve conflicted paths", that is. Come to think of it, that may not be too bad. > I dunno. I personally wouldn't use it myself, because I've always > made a point of running "git diff", or "git status", and almost > always, a command like "make -j16 && make -j16 check" (or an aliased > equivalent) before commiting a merge. > > But that's because I'm a paranoid s.o.b. and in my long career, I've > learned is that "you can't be paranoid enough", and "hope is not a > strategy". :-) Being careful and paranoid is good ;-) I wouldn't use it myself, either, but the discussion started while trying to allay new users' worries about recording a half-resolved state by mistake, and in that context, I think it would have non-empty audiences. Thanks. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: first-class conflicts? 2023-11-06 22:45 ` Sandra Snan 2023-11-07 0:50 ` Theodore Ts'o @ 2023-11-07 11:23 ` Phillip Wood 2023-11-07 11:24 ` Sandra Snan 1 sibling, 1 reply; 25+ messages in thread From: Phillip Wood @ 2023-11-07 11:23 UTC (permalink / raw) To: Sandra Snan, git, Dragan Simic, rsbecker Hi Sandra On 06/11/2023 22:45, Sandra Snan wrote: > Randall, thank you for that. > > I did mean of the first type, pure content conflicts (just like the > examples on that jj page). > > I just have sometimes wish git could be a little more aware of them > beyond just storing them with ASCII art in the files themselves (and > alerting / warning when they happen but I often can't properly see those > warnings flash by so I end up having to search for the conflict markers > manually). So if conflicts are a thing that *can* happen, it'd be better > if vc could know about them which would make some of the rebases simpler > as in jj. That doesn't mean we wanna adopt the jj workflow of > deliberately checking in conflicts (not even locally), just be able to > deal with them better if it does happen. > > I dunno… and I've really appreciated the naysayers so far, helps me sort > out my thoughts in this. I personally really prefer the vanilla > "explicit staging" workflow (with magit) over jj, got, gitless etc. I'm > more scared of overcommitting by mistake than undercommitting. But this > one feature seemed to me that it might be really good: just having the > vc be aware of the conflicts it has created. If you run "git status" it will list the files that have conflicts as "unmerged". To prevent "git commit" from creating a commit that contains conflict markers you can use a pre-commit hook that runs "git diff --cached--check". The sample hook that is created by default does this, to activate it run mv .git/hooks/pre-commit.sample .git/hooks/pre-commit in the main worktree. You can also run "git config commit.verbose true" to make "git commit" show the diff of the changes that will be committed below the commit message when you're editing the message. Best Wishes Phillip ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: RE: first-class conflicts? 2023-11-07 11:23 ` Phillip Wood @ 2023-11-07 11:24 ` Sandra Snan 0 siblings, 0 replies; 25+ messages in thread From: Sandra Snan @ 2023-11-07 11:24 UTC (permalink / raw) To: git, Dragan Simic, rsbecker That is wonderful! Thank you so much, Phillip! 👍🏻 ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: first-class conflicts? 2023-11-06 21:17 first-class conflicts? Sandra Snan 2023-11-06 22:01 ` Dragan Simic @ 2023-11-07 8:16 ` Elijah Newren 2023-11-07 8:21 ` Dragan Simic ` (2 more replies) 1 sibling, 3 replies; 25+ messages in thread From: Elijah Newren @ 2023-11-07 8:16 UTC (permalink / raw) To: Sandra Snan; +Cc: git On Mon, Nov 6, 2023 at 1:26 PM Sandra Snan <sandra.snan@idiomdrottning.org> wrote: > > Is this feature from jj also a good idea for git? > https://martinvonz.github.io/jj/v0.11.0/conflicts/ Martin talked about this and other features at Git Merge 2022, a little over a year ago. I talked to him in more depth about these while there. I personally think he has some really interesting features here, though at the time, I thought that the additional object type might be too much to ask for in a Git change, and it was an intrinsic part of the implementation back then. Martin also gave us an update at the 2023 Git Contributors summit, and in particular noted a significant implementation change to not have per-file storage of conflicts, but rather storing at the commit level the multiple conflicting trees involved. That model might be something we could implement in Git. And if we did, it'd solve various issues such as people wanting to be able to stash conflicts, or wanting to be able to partially resolve conflicts and fix it up later, or be able to collaboratively resolve conflicts without having everyone have access to the same checkout. But we'd also have to be careful and think through usecases, including in the surrounding community. People would probably want to ensure that e.g. "Protected" or "Integration" branches don't get accept fetches or pushes of conflicted commits, git status would probably need some special warnings or notices, git checkout would probably benefit from additional warnings/notices checks for those cases, git log should probably display conflicted commits differently, we'd need to add special handling for higher order conflicts (e.g. a merge with conflicts is itself involved in a merge) probably similar to what jj has done, and audit a lot of other code paths to see what would be needed. I think it'd be really interesting to at least investigate, but it'd also be a lot of work, and I already have several other things I've been wanting to get back to for over a year and haven't succeeded in generating more time for Git. Anyway, just my $0.02. Elijah ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: first-class conflicts? 2023-11-07 8:16 ` Elijah Newren @ 2023-11-07 8:21 ` Dragan Simic 2023-11-07 9:16 ` Sandra Snan 2023-11-07 11:49 ` Phillip Wood 2 siblings, 0 replies; 25+ messages in thread From: Dragan Simic @ 2023-11-07 8:21 UTC (permalink / raw) To: Elijah Newren; +Cc: Sandra Snan, git On 2023-11-07 09:16, Elijah Newren wrote: > But we'd also have to be careful and think through usecases, including > in the surrounding community. People would probably want to ensure > that e.g. "Protected" or "Integration" branches don't get accept > fetches or pushes of conflicted commits, git status would probably > need some special warnings or notices, git checkout would probably > benefit from additional warnings/notices checks for those cases, git > log should probably display conflicted commits differently, we'd need > to add special handling for higher order conflicts (e.g. a merge with > conflicts is itself involved in a merge) probably similar to what jj > has done, and audit a lot of other code paths to see what would be > needed. That would be a truly _massive_ project. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: first-class conflicts? 2023-11-07 8:16 ` Elijah Newren 2023-11-07 8:21 ` Dragan Simic @ 2023-11-07 9:16 ` Sandra Snan 2023-11-07 11:49 ` Phillip Wood 2 siblings, 0 replies; 25+ messages in thread From: Sandra Snan @ 2023-11-07 9:16 UTC (permalink / raw) To: git Elijah Newren <newren@gmail.com> writes: > Martin talked about this and other features at Git Merge 2022, a > little over a year ago. That is something I should've checked or searched for before starting this thread, in hindsight. Thank you, Elijah, for letting me know that. > And if we did, it'd solve various issues such as people wanting > to be able to stash conflicts, or wanting to be able to > partially resolve conflicts and fix it up later, or be able to > collaboratively resolve conflicts without having everyone have > access to the same checkout. One feature I would really like and maybe vanilla git can already do this today and I just don't know how, but just becoming more aware of conflicts, of when there's a conflict in the commit. > git status would probably need some special warnings or notices, > git checkout would probably benefit from additional > warnings/notices checks for those cases, git log should probably > display conflicted commits differently That's exactly what I dream of! I wouldn't wanna commit conflicts deliberately, just that I'm paranoid that I might have some failed merges and three way diffs in code that I missed when they flashed by on the screen. > it'd also be a lot of work That is for sure. And don't get me wrong, it's not a feature I personally really need or am clamoring for. Thank you so much for the thoughtful explanation. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: first-class conflicts? 2023-11-07 8:16 ` Elijah Newren 2023-11-07 8:21 ` Dragan Simic 2023-11-07 9:16 ` Sandra Snan @ 2023-11-07 11:49 ` Phillip Wood 2023-11-07 17:38 ` Martin von Zweigbergk 2023-11-08 6:31 ` Elijah Newren 2 siblings, 2 replies; 25+ messages in thread From: Phillip Wood @ 2023-11-07 11:49 UTC (permalink / raw) To: Elijah Newren, Sandra Snan; +Cc: git, Martin von Zweigbergk, Randall S. Becker Hi Elijah [I've cc'd Martin to see if he has anything to add about how "jj" manages the issues around storing conflicts.] On 07/11/2023 08:16, Elijah Newren wrote: > On Mon, Nov 6, 2023 at 1:26 PM Sandra Snan > <sandra.snan@idiomdrottning.org> wrote: >> >> Is this feature from jj also a good idea for git? >> https://martinvonz.github.io/jj/v0.11.0/conflicts/ > > Martin talked about this and other features at Git Merge 2022, a > little over a year ago. I talked to him in more depth about these > while there. I personally think he has some really interesting > features here, though at the time, I thought that the additional > object type might be too much to ask for in a Git change, and it was > an intrinsic part of the implementation back then. > > Martin also gave us an update at the 2023 Git Contributors summit, and > in particular noted a significant implementation change to not have > per-file storage of conflicts, but rather storing at the commit level > the multiple conflicting trees involved. That model might be > something we could implement in Git. And if we did, it'd solve > various issues such as people wanting to be able to stash conflicts, > or wanting to be able to partially resolve conflicts and fix it up > later, or be able to collaboratively resolve conflicts without having > everyone have access to the same checkout. One thing to think about if we ever want to implement this is what other data we need to store along with the conflict trees to preserve the context in which the conflict was created. For example the files that are read by "git commit" when it commits a conflict resolution. For a single cherry-pick/revert it would probably be fairly straight forward to store CHERRY_PICK_HEAD/REVERT_HEAD and add it as a parent so it gets transferred along with the conflicts. For a sequence of cherry-picks or a rebase it is more complicated to preserve the context of the conflict. Even "git merge" can create several files in addition to MERGE_HEAD which are read when the conflict resolution is committed. > But we'd also have to be careful and think through usecases, including > in the surrounding community. People would probably want to ensure > that e.g. "Protected" or "Integration" branches don't get accept > fetches or pushes of conflicted commits, I think this is a really important point, while it can be useful to share conflicts so they can be collaboratively resolved we don't want to propagate them into "stable" or production branches. I wonder how 'jj' handles this. > git status would probably > need some special warnings or notices, git checkout would probably > benefit from additional warnings/notices checks for those cases, git > log should probably display conflicted commits differently, we'd need > to add special handling for higher order conflicts (e.g. a merge with > conflicts is itself involved in a merge) probably similar to what jj > has done, and audit a lot of other code paths to see what would be > needed. As you point out there is a lot more to this than just being able to store the conflict data in a commit - in many ways I think that is the easiest part of the solution to sharing conflicts. Best Wishes Phillip ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: first-class conflicts? 2023-11-07 11:49 ` Phillip Wood @ 2023-11-07 17:38 ` Martin von Zweigbergk 2023-11-08 7:31 ` Elijah Newren 2023-11-09 14:50 ` phillip.wood123 2023-11-08 6:31 ` Elijah Newren 1 sibling, 2 replies; 25+ messages in thread From: Martin von Zweigbergk @ 2023-11-07 17:38 UTC (permalink / raw) To: phillip.wood; +Cc: Elijah Newren, Sandra Snan, git, Randall S. Becker (new attempt in plain text) On Tue, Nov 7, 2023 at 3:49 AM Phillip Wood <phillip.wood123@gmail.com> wrote: > > Hi Elijah > > [I've cc'd Martin to see if he has anything to add about how "jj" > manages the issues around storing conflicts.] > > On 07/11/2023 08:16, Elijah Newren wrote: > > On Mon, Nov 6, 2023 at 1:26 PM Sandra Snan > > <sandra.snan@idiomdrottning.org> wrote: > >> > >> Is this feature from jj also a good idea for git? > >> https://martinvonz.github.io/jj/v0.11.0/conflicts/ > > > > Martin talked about this and other features at Git Merge 2022, a > > little over a year ago. I talked to him in more depth about these > > while there. I personally think he has some really interesting > > features here, though at the time, I thought that the additional > > object type might be too much to ask for in a Git change, and it was > > an intrinsic part of the implementation back then. > > > > Martin also gave us an update at the 2023 Git Contributors summit, and > > in particular noted a significant implementation change to not have > > per-file storage of conflicts, but rather storing at the commit level > > the multiple conflicting trees involved. That model might be > > something we could implement in Git. And if we did, it'd solve > > various issues such as people wanting to be able to stash conflicts, > > or wanting to be able to partially resolve conflicts and fix it up > > later, or be able to collaboratively resolve conflicts without having > > everyone have access to the same checkout. > > One thing to think about if we ever want to implement this is what other > data we need to store along with the conflict trees to preserve the > context in which the conflict was created. For example the files that > are read by "git commit" when it commits a conflict resolution. For a > single cherry-pick/revert it would probably be fairly straight forward > to store CHERRY_PICK_HEAD/REVERT_HEAD and add it as a parent so it gets > transferred along with the conflicts. For a sequence of cherry-picks or > a rebase it is more complicated to preserve the context of the conflict. > Even "git merge" can create several files in addition to MERGE_HEAD > which are read when the conflict resolution is committed. Good point. We actually don't store any extra data in jj. The old per-path conflict model was prepared for having some label associated with each term of the conflict but we never actually used it. If we add such metadata, it would probably have to be something that makes sense even after pushing the conflict to another repo, so it probably shouldn't be commit ids, unless we made sure to also push those commits. Also note that if you `jj restore --from <commit with conflict>`, you can get a conflict into a commit that didn't have conflicts previously. Or if you already had conflicts in the destination commit, your root trees (the multiple root trees constituting the conflict) will now have conflicts that potentially were created by two completely unrelated operations, so you would kind of need different labels for different paths. https://github.com/martinvonz/jj/issues/1176 has some more discussion about this. > > But we'd also have to be careful and think through usecases, including > > in the surrounding community. People would probably want to ensure > > that e.g. "Protected" or "Integration" branches don't get accept > > fetches or pushes of conflicted commits, > > I think this is a really important point, while it can be useful to > share conflicts so they can be collaboratively resolved we don't want to > propagate them into "stable" or production branches. I wonder how 'jj' > handles this. Agreed. `jj git push` refuses to push commits with conflicts, because it's very unlikely that the remote will be able to make any sense of it. Our commit backend at Google does support conflicts, so users can check out each other's conflicted commits there (except that we haven't even started dogfooding yet). > > git status would probably > > need some special warnings or notices, git checkout would probably > > benefit from additional warnings/notices checks for those cases, git > > log should probably display conflicted commits differently, we'd need > > to add special handling for higher order conflicts (e.g. a merge with > > conflicts is itself involved in a merge) probably similar to what jj > > has done, and audit a lot of other code paths to see what would be > > needed. > > As you point out there is a lot more to this than just being able to > store the conflict data in a commit - in many ways I think that is the > easiest part of the solution to sharing conflicts. Yes, I think it would be a very large project. Unlike jj, Git of course has to worry about backwards compatibility. For example, you would have to decide if your goal - even in the long term - is to make `git rebase` etc. not get interrupted due to conflicts. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: first-class conflicts? 2023-11-07 17:38 ` Martin von Zweigbergk @ 2023-11-08 7:31 ` Elijah Newren 2023-11-08 18:22 ` Martin von Zweigbergk 2023-11-09 14:50 ` phillip.wood123 1 sibling, 1 reply; 25+ messages in thread From: Elijah Newren @ 2023-11-08 7:31 UTC (permalink / raw) To: Martin von Zweigbergk; +Cc: phillip.wood, Sandra Snan, git, Randall S. Becker Hi Martin, On Tue, Nov 7, 2023 at 9:38 AM Martin von Zweigbergk <martinvonz@google.com> wrote: > [...] > > One thing to think about if we ever want to implement this is what other > > data we need to store along with the conflict trees to preserve the > > context in which the conflict was created. For example the files that > > are read by "git commit" when it commits a conflict resolution. For a > > single cherry-pick/revert it would probably be fairly straight forward > > to store CHERRY_PICK_HEAD/REVERT_HEAD and add it as a parent so it gets > > transferred along with the conflicts. For a sequence of cherry-picks or > > a rebase it is more complicated to preserve the context of the conflict. > > Even "git merge" can create several files in addition to MERGE_HEAD > > which are read when the conflict resolution is committed. > > Good point. We actually don't store any extra data in jj. The old > per-path conflict model was prepared for having some label associated > with each term of the conflict but we never actually used it. > > If we add such metadata, it would probably have to be something that > makes sense even after pushing the conflict to another repo, so it > probably shouldn't be commit ids, unless we made sure to also push > those commits. Also note that if you `jj restore --from <commit with > conflict>`, you can get a conflict into a commit that didn't have > conflicts previously. Or if you already had conflicts in the > destination commit, your root trees (the multiple root trees > constituting the conflict) will now have conflicts that potentially > were created by two completely unrelated operations, so you would kind > of need different labels for different paths. > > https://github.com/martinvonz/jj/issues/1176 has some more discussion > about this. Interesting link; thanks for sharing. I am curious more about the data you do store. My fuzzy memory is that you store a commit header involving something of the form "A + B - C", where those are all commit IDs. Is that correct? Is this in addition to a normal "tree" header as in Git, or are one of A or B found in the tree header? I think you said there was also the possibility for more than three terms. Are those for when a conflicted commit is merged with another branch that adds more conflicts, or are there other cases too? (Octopus merges?) What about recursive merges, i.e. merges where the two sides do not have a unique merge base. What is the form of those? (Would "- C" be replaced by "- C1 - C2 - ... - Cn"? Or would we create the virtual merge base V and then do a " - V"? Or do we only have "A + B"?) You previously mentioned that if someone goes to edit a commit with conflicts, and resolves the conflicts in just one file, then you can modify each of the trees A, B, and C such that a merging of those trees gives the partially resolved result. How does one do that with special conflicts, such as: * User modifies file D on both sides of history, in conflicting ways, and also renames D -> E on one side of history. User checks out this conflicted commit and fixes the conflicts in E (but not other files) and does a "git add E". When they go to commit, does the machinery need a mapping to figure out that it needs to adjust "D" in two of the trees while adjusting "E" in the other? * Similar to the above, but the side that doesn't rename D renames olddir/ -> newdir/, and the side that renames D instead renames D->olddir/E. For this case, the file will end up at newdir/E; do we need the backward mapping from newdir/E to both olddir/E and D? * Slightly different than the above: User renames D -> E on one side of history, and D -> F on the other. That's a rename/rename (1to2) conflict. User checks out this conflicted commit and does a "git add F", marking it as okay, but leaving E conflicted. How can one adjust the tree such that no conflict for F appears, but one still appears for E? * Similar to above with an extra wrinkle: User renames D -> E on one side of history, and on the other side both renames D -> F and adds a slightly different file named E. That's both a rename/rename (1to2) conflict for E & F, and an add/add conflict for E. Users checks out this conflicted commit and resolves textual conflict in E (in favor of the "other side"), and does a "git add E", marking it as resolved. When they go to commit, we not only need to worry about making sure a conflict for F appears, we also need to figure out how to adjust the tree such that the merge result gives you the expected value in E without affecting F. How can that be done? On the first two bullet points, there's no such thing as a reverse mapping from conflicted files to original files from previous commits in current Git. Creating one, if possible, would be a fair amount of work. But, I'm not so sure it's even possible, due to the fact that conflicts and files do not always have one-to-one (or even one-to-many or many-to-one) relationships; many-to-many relationship can exist, as I've started alluding to in the last two bullet points (see also https://github.com/git/git/blob/98009afd24e2304bf923a64750340423473809ff/Documentation/git-merge-tree.txt#L266-L271). In fact, they can get even more complicated (e.g. https://github.com/git/git/blob/master/t/t6422-merge-rename-corner-cases.sh#L1017-L1022). > > > But we'd also have to be careful and think through usecases, including > > > in the surrounding community. People would probably want to ensure > > > that e.g. "Protected" or "Integration" branches don't get accept > > > fetches or pushes of conflicted commits, > > > > I think this is a really important point, while it can be useful to > > share conflicts so they can be collaboratively resolved we don't want to > > propagate them into "stable" or production branches. I wonder how 'jj' > > handles this. > > Agreed. `jj git push` refuses to push commits with conflicts, because > it's very unlikely that the remote will be able to make any sense of > it. Our commit backend at Google does support conflicts, so users can > check out each other's conflicted commits there (except that we > haven't even started dogfooding yet). I'm curious to hear what happens when you do start dogfooding, on projects with many developers and which are jj-only. Do commits with conflicts accidentally end up in mainline branches, or are there good ways to make sure they don't hit anything considered stable? > > > git status would probably > > > need some special warnings or notices, git checkout would probably > > > benefit from additional warnings/notices checks for those cases, git > > > log should probably display conflicted commits differently, we'd need > > > to add special handling for higher order conflicts (e.g. a merge with > > > conflicts is itself involved in a merge) probably similar to what jj > > > has done, and audit a lot of other code paths to see what would be > > > needed. > > > > As you point out there is a lot more to this than just being able to > > store the conflict data in a commit - in many ways I think that is the > > easiest part of the solution to sharing conflicts. > > Yes, I think it would be a very large project. Unlike jj, Git of > course has to worry about backwards compatibility. For example, you > would have to decide if your goal - even in the long term - is to make > `git rebase` etc. not get interrupted due to conflicts. ...and whether to copy jj's other feature in this area in some form: auto-rebasing any descendants when you checkout and amend an old commit (e.g. to resolve conflicts). :-) ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: first-class conflicts? 2023-11-08 7:31 ` Elijah Newren @ 2023-11-08 18:22 ` Martin von Zweigbergk 2023-11-10 21:41 ` Elijah Newren 0 siblings, 1 reply; 25+ messages in thread From: Martin von Zweigbergk @ 2023-11-08 18:22 UTC (permalink / raw) To: Elijah Newren; +Cc: phillip.wood, Sandra Snan, git, Randall S. Becker Hi Elijah, On Tue, Nov 7, 2023 at 11:31 PM Elijah Newren <newren@gmail.com> wrote: > > Hi Martin, > > On Tue, Nov 7, 2023 at 9:38 AM Martin von Zweigbergk > <martinvonz@google.com> wrote: > > > [...] > > > One thing to think about if we ever want to implement this is what other > > > data we need to store along with the conflict trees to preserve the > > > context in which the conflict was created. For example the files that > > > are read by "git commit" when it commits a conflict resolution. For a > > > single cherry-pick/revert it would probably be fairly straight forward > > > to store CHERRY_PICK_HEAD/REVERT_HEAD and add it as a parent so it gets > > > transferred along with the conflicts. For a sequence of cherry-picks or > > > a rebase it is more complicated to preserve the context of the conflict. > > > Even "git merge" can create several files in addition to MERGE_HEAD > > > which are read when the conflict resolution is committed. > > > > Good point. We actually don't store any extra data in jj. The old > > per-path conflict model was prepared for having some label associated > > with each term of the conflict but we never actually used it. > > > > If we add such metadata, it would probably have to be something that > > makes sense even after pushing the conflict to another repo, so it > > probably shouldn't be commit ids, unless we made sure to also push > > those commits. Also note that if you `jj restore --from <commit with > > conflict>`, you can get a conflict into a commit that didn't have > > conflicts previously. Or if you already had conflicts in the > > destination commit, your root trees (the multiple root trees > > constituting the conflict) will now have conflicts that potentially > > were created by two completely unrelated operations, so you would kind > > of need different labels for different paths. > > > > https://github.com/martinvonz/jj/issues/1176 has some more discussion > > about this. > > Interesting link; thanks for sharing. > > I am curious more about the data you do store. My fuzzy memory is > that you store a commit header involving something of the form "A + B > - C", where those are all commit IDs. Is that correct? We actually store it outside the Git repo (together with the "change id"). We have avoided using commit headers because I wasn't sure how well different tools deal with unexpected commit headers, and because I wanted commits to be indistinguishable from commits created by a regular Git binary. The latter argument doesn't apply to commits with conflicts since those are clearly not from a regular Git binary anyway, and we don't allow pushing them to a remote. > Is this in > addition to a normal "tree" header as in Git, or are one of A or B > found in the tree header? It's in addition. For the tree, we actually write a tree object with three subtrees: .jjconflict-base-0: C .jjconflict-side-0: A .jjconflict-side-1: B The tree is not authoritative - we use the Git-external storage for that. The reason we write the trees is mostly to prevent them from getting GC'd. Also, if a user does `git checkout <conflicted commit>`, they'll see those subdirectories and will hopefully be reminded that they did something odd (perhaps we should drop the leading `.` so `ls` will show them...). They can also diff the directories in a diff tool if they like. > I think you said there was also the > possibility for more than three terms. Are those for when a > conflicted commit is merged with another branch that adds more > conflicts, or are there other cases too? (Octopus merges?) Yes, they can happen in both of those cases you mention. More generally, whenever you apply a diff between two trees onto another tree, you might end up with a higher-arity conflict. So merging in another branch can do that, or doing an octopus merge (which is the same thing at the tree level, just different at the commit level), or rebasing or reverting a commit. We simplify conflicts algebraically, so rebasing a commit multiple times does not increase the arity - the intermediate parents were both added and removed and thus cancel out. These simple algorithms for simplifying conflicts are encapsulated in https://github.com/martinvonz/jj/blob/main/lib/src/merge.rs. Most of them are independent of the type of values being merged; they can be used for doing algebra on tree ids, content hunks, refs, etc. (in the test cases, we mostly merge integers because integer literals are compact). > What about recursive merges, i.e. merges where the two sides do not > have a unique merge base. What is the form of those? (Would "- C" be > replaced by "- C1 - C2 - ... - Cn"? Or would we create the virtual > merge base V and then do a " - V"? Or do we only have "A + B"?) We do that by recursively creating a virtual tree just like Git does, I think (https://github.com/martinvonz/jj/blob/084b99e1e2c42c40f2d52038cdc97687b76fed89/lib/src/rewrite.rs#L56-L71). I think the main difference is that by modeling conflicts, we can avoid recursive conflict markers (if that's what Git does), and we can even automatically resolve some cases where the virtual tree has a conflict. > You previously mentioned that if someone goes to edit a commit with > conflicts, and resolves the conflicts in just one file, then you can > modify each of the trees A, B, and C such that a merging of those > trees gives the partially resolved result. How does one do that with > special conflicts, such as: > * User modifies file D on both sides of history, in conflicting > ways, and also renames D -> E on one side of history. User checks out > this conflicted commit and fixes the conflicts in E (but not other > files) and does a "git add E". When they go to commit, does the > machinery need a mapping to figure out that it needs to adjust "D" in > two of the trees while adjusting "E" in the other? > * Similar to the above, but the side that doesn't rename D renames > olddir/ -> newdir/, and the side that renames D instead renames > D->olddir/E. For this case, the file will end up at newdir/E; do we > need the backward mapping from newdir/E to both olddir/E and D? > * Slightly different than the above: User renames D -> E on one > side of history, and D -> F on the other. That's a rename/rename > (1to2) conflict. User checks out this conflicted commit and does a > "git add F", marking it as okay, but leaving E conflicted. How can > one adjust the tree such that no conflict for F appears, but one still > appears for E? > * Similar to above with an extra wrinkle: User renames D -> E on > one side of history, and on the other side both renames D -> F and > adds a slightly different file named E. That's both a rename/rename > (1to2) conflict for E & F, and an add/add conflict for E. Users > checks out this conflicted commit and resolves textual conflict in E > (in favor of the "other side"), and does a "git add E", marking it as > resolved. When they go to commit, we not only need to worry about > making sure a conflict for F appears, we also need to figure out how > to adjust the tree such that the merge result gives you the expected > value in E without affecting F. How can that be done? > > On the first two bullet points, there's no such thing as a reverse > mapping from conflicted files to original files from previous commits > in current Git. Creating one, if possible, would be a fair amount of > work. But, I'm not so sure it's even possible, due to the fact that > conflicts and files do not always have one-to-one (or even one-to-many > or many-to-one) relationships; many-to-many relationship can exist, as > I've started alluding to in the last two bullet points (see also > https://github.com/git/git/blob/98009afd24e2304bf923a64750340423473809ff/Documentation/git-merge-tree.txt#L266-L271). > In fact, they can get even more complicated (e.g. > https://github.com/git/git/blob/master/t/t6422-merge-rename-corner-cases.sh#L1017-L1022). Great questions! We don't have support for renames, so we haven't had to worry about these things. We have talked a little about divergent renames and the need for recording that in the commit so we can tell the user about it and maybe ask them which name they want to keep. I had not considered the interaction with partial conflict resolution, so thanks for bringing that up. I don't have any answers now, but we'll probably need to start thinking about this soon. > > > > But we'd also have to be careful and think through usecases, including > > > > in the surrounding community. People would probably want to ensure > > > > that e.g. "Protected" or "Integration" branches don't get accept > > > > fetches or pushes of conflicted commits, > > > > > > I think this is a really important point, while it can be useful to > > > share conflicts so they can be collaboratively resolved we don't want to > > > propagate them into "stable" or production branches. I wonder how 'jj' > > > handles this. > > > > Agreed. `jj git push` refuses to push commits with conflicts, because > > it's very unlikely that the remote will be able to make any sense of > > it. Our commit backend at Google does support conflicts, so users can > > check out each other's conflicted commits there (except that we > > haven't even started dogfooding yet). > > I'm curious to hear what happens when you do start dogfooding, on > projects with many developers and which are jj-only. Do commits with > conflicts accidentally end up in mainline branches, or are there good > ways to make sure they don't hit anything considered stable? That won't happen at Google because our source of truth for "merged PRs" (in GitHub-speak) is in our existing VCS. We will necessarily have to translate from jj's data model to its data model before a commit can even be sent for review. > > > > > git status would probably > > > > need some special warnings or notices, git checkout would probably > > > > benefit from additional warnings/notices checks for those cases, git > > > > log should probably display conflicted commits differently, we'd need > > > > to add special handling for higher order conflicts (e.g. a merge with > > > > conflicts is itself involved in a merge) probably similar to what jj > > > > has done, and audit a lot of other code paths to see what would be > > > > needed. > > > > > > As you point out there is a lot more to this than just being able to > > > store the conflict data in a commit - in many ways I think that is the > > > easiest part of the solution to sharing conflicts. > > > > Yes, I think it would be a very large project. Unlike jj, Git of > > course has to worry about backwards compatibility. For example, you > > would have to decide if your goal - even in the long term - is to make > > `git rebase` etc. not get interrupted due to conflicts. > > ...and whether to copy jj's other feature in this area in some form: > auto-rebasing any descendants when you checkout and amend an old > commit (e.g. to resolve conflicts). :-) ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: first-class conflicts? 2023-11-08 18:22 ` Martin von Zweigbergk @ 2023-11-10 21:41 ` Elijah Newren 2023-11-12 7:05 ` Martin von Zweigbergk 0 siblings, 1 reply; 25+ messages in thread From: Elijah Newren @ 2023-11-10 21:41 UTC (permalink / raw) To: Martin von Zweigbergk; +Cc: phillip.wood, Sandra Snan, git, Randall S. Becker Hi Martin, On Wed, Nov 8, 2023 at 10:23 AM Martin von Zweigbergk <martinvonz@google.com> wrote: > On Tue, Nov 7, 2023 at 11:31 PM Elijah Newren <newren@gmail.com> wrote: > > On Tue, Nov 7, 2023 at 9:38 AM Martin von Zweigbergk > > <martinvonz@google.com> wrote: > > > [...] > > I am curious more about the data you do store. My fuzzy memory is > > that you store a commit header involving something of the form "A + B > > - C", where those are all commit IDs. Is that correct? > > We actually store it outside the Git repo (together with the "change > id"). We have avoided using commit headers because I wasn't sure how > well different tools deal with unexpected commit headers, and because > I wanted commits to be indistinguishable from commits created by a > regular Git binary. The latter argument doesn't apply to commits with > conflicts since those are clearly not from a regular Git binary > anyway, and we don't allow pushing them to a remote. > > > Is this in > > addition to a normal "tree" header as in Git, or are one of A or B > > found in the tree header? > > It's in addition. For the tree, we actually write a tree object with > three subtrees: > > .jjconflict-base-0: C > .jjconflict-side-0: A > .jjconflict-side-1: B > > The tree is not authoritative - we use the Git-external storage for > that. The reason we write the trees is mostly to prevent them from > getting GC'd. Oh, that seems like a clever way to handle reachability and make sure the relevant trees are automatically included in any pushes or pulls. > Also, if a user does `git checkout <conflicted commit>`, > they'll see those subdirectories and will hopefully be reminded that > they did something odd (perhaps we should drop the leading `.` so `ls` > will show them...). They can also diff the directories in a diff tool > if they like. Oh, so they don't get a regular top-level looking tree with possibly-conflicted-files present? Or is this in addition to the regular repository contents? If in addition, are you worried about users ever creating real entries named ".jjconflict-base-<N>" in their repository? > > I think you said there was also the > > possibility for more than three terms. Are those for when a > > conflicted commit is merged with another branch that adds more > > conflicts, or are there other cases too? (Octopus merges?) > > Yes, they can happen in both of those cases you mention. More > generally, whenever you apply a diff between two trees onto another > tree, you might end up with a higher-arity conflict. So merging in > another branch can do that, or doing an octopus merge (which is the > same thing at the tree level, just different at the commit level), or > rebasing or reverting a commit. > > We simplify conflicts algebraically, so rebasing a commit multiple > times does not increase the arity - the intermediate parents were both > added and removed and thus cancel out. These simple algorithms for > simplifying conflicts are encapsulated in > https://github.com/martinvonz/jj/blob/main/lib/src/merge.rs. Most of > them are independent of the type of values being merged; they can be > used for doing algebra on tree ids, content hunks, refs, etc. (in the > test cases, we mostly merge integers because integer literals are > compact). It's done on content hunks as well? That's interesting. When exactly would it be done on refs, though? I'm not following that one. And what else is in that "etc."? > > What about recursive merges, i.e. merges where the two sides do not > > have a unique merge base. What is the form of those? (Would "- C" be > > replaced by "- C1 - C2 - ... - Cn"? Or would we create the virtual > > merge base V and then do a " - V"? Or do we only have "A + B"?) > > We do that by recursively creating a virtual tree just like Git does, > I think (https://github.com/martinvonz/jj/blob/084b99e1e2c42c40f2d52038cdc97687b76fed89/lib/src/rewrite.rs#L56-L71). > I think the main difference is that by modeling conflicts, we can > avoid recursive conflict markers (if that's what Git does), and we can > even automatically resolve some cases where the virtual tree has a > conflict. Okay, but that talks about the mechanics of creating a recursive merge, omitting all the details about how the conflict header is written when you record the merge. Is the virtual merge base represented in the algebraic "A + B - C" expressions, or is the "- C" part omitted? If it is represented, and the virtual merge base had conflicts which you could not automatically resolve, what exactly does the conflicted header for the outer merge get populated with? [...] > Great questions! We don't have support for renames, so we haven't had > to worry about these things. We have talked a little about divergent > renames and the need for recording that in the commit so we can tell > the user about it and maybe ask them which name they want to keep. I > had not considered the interaction with partial conflict resolution, > so thanks for bringing that up. I don't have any answers now, but > we'll probably need to start thinking about this soon. I was wondering if that might be the answer. When you do tackle this, I'd be interested to hear your thoughts. I'm wondering if we just need to augment the data in the conflict header to handle such cases (though I guess this could risk having commit objects that are significantly bigger than normal in theoretical cases where many such paths are involved?) > > I'm curious to hear what happens when you do start dogfooding, on > > projects with many developers and which are jj-only. Do commits with > > conflicts accidentally end up in mainline branches, or are there good > > ways to make sure they don't hit anything considered stable? > > That won't happen at Google because our source of truth for "merged > PRs" (in GitHub-speak) is in our existing VCS. We will necessarily > have to translate from jj's data model to its data model before a > commit can even be sent for review. That makes sense, but I was just hoping we'd have an example to look to for how to keep things safe if we were to implement this. Sadly, I don't think we have the benefit of relying on folks to first push their commits into some other VCS which lacks this feature. ;-) ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: first-class conflicts? 2023-11-10 21:41 ` Elijah Newren @ 2023-11-12 7:05 ` Martin von Zweigbergk 0 siblings, 0 replies; 25+ messages in thread From: Martin von Zweigbergk @ 2023-11-12 7:05 UTC (permalink / raw) To: Elijah Newren; +Cc: phillip.wood, Sandra Snan, git, Randall S. Becker On Fri, Nov 10, 2023 at 1:41 PM Elijah Newren <newren@gmail.com> wrote: > > Hi Martin, > > On Wed, Nov 8, 2023 at 10:23 AM Martin von Zweigbergk > <martinvonz@google.com> wrote: > > On Tue, Nov 7, 2023 at 11:31 PM Elijah Newren <newren@gmail.com> wrote: > > > On Tue, Nov 7, 2023 at 9:38 AM Martin von Zweigbergk > > > <martinvonz@google.com> wrote: > > > > > [...] > > > I am curious more about the data you do store. My fuzzy memory is > > > that you store a commit header involving something of the form "A + B > > > - C", where those are all commit IDs. Is that correct? > > > > We actually store it outside the Git repo (together with the "change > > id"). We have avoided using commit headers because I wasn't sure how > > well different tools deal with unexpected commit headers, and because > > I wanted commits to be indistinguishable from commits created by a > > regular Git binary. The latter argument doesn't apply to commits with > > conflicts since those are clearly not from a regular Git binary > > anyway, and we don't allow pushing them to a remote. > > > > > Is this in > > > addition to a normal "tree" header as in Git, or are one of A or B > > > found in the tree header? > > > > It's in addition. For the tree, we actually write a tree object with > > three subtrees: > > > > .jjconflict-base-0: C > > .jjconflict-side-0: A > > .jjconflict-side-1: B > > > > The tree is not authoritative - we use the Git-external storage for > > that. The reason we write the trees is mostly to prevent them from > > getting GC'd. > > Oh, that seems like a clever way to handle reachability and make sure > the relevant trees are automatically included in any pushes or pulls. > > > Also, if a user does `git checkout <conflicted commit>`, > > they'll see those subdirectories and will hopefully be reminded that > > they did something odd (perhaps we should drop the leading `.` so `ls` > > will show them...). They can also diff the directories in a diff tool > > if they like. > > Oh, so they don't get a regular top-level looking tree with > possibly-conflicted-files present? Or is this in addition to the > regular repository contents? They get a regular tree with conflict markers if they use `jj checkout`, but not if they use `git checkout`. > If in addition, are you worried about > users ever creating real entries named ".jjconflict-base-<N>" in their > repository? I'm not worried about that since it's not the source of truth, so at most they waste some time. By the way, if the user did use `git checkout` and got those `.jjconflict-*` directories in the working copy, and then ran a `jj` command afterwards, then jj would think that the conflict was resolved by replacing the conflicted paths (and all other paths!) by those `.jjconflict-*` directories :) The user would probably realize their mistake pretty quickly and run `jj abandon` to discard those changes. > > > > I think you said there was also the > > > possibility for more than three terms. Are those for when a > > > conflicted commit is merged with another branch that adds more > > > conflicts, or are there other cases too? (Octopus merges?) > > > > Yes, they can happen in both of those cases you mention. More > > generally, whenever you apply a diff between two trees onto another > > tree, you might end up with a higher-arity conflict. So merging in > > another branch can do that, or doing an octopus merge (which is the > > same thing at the tree level, just different at the commit level), or > > rebasing or reverting a commit. > > > > We simplify conflicts algebraically, so rebasing a commit multiple > > times does not increase the arity - the intermediate parents were both > > added and removed and thus cancel out. These simple algorithms for > > simplifying conflicts are encapsulated in > > https://github.com/martinvonz/jj/blob/main/lib/src/merge.rs. Most of > > them are independent of the type of values being merged; they can be > > used for doing algebra on tree ids, content hunks, refs, etc. (in the > > test cases, we mostly merge integers because integer literals are > > compact). > > It's done on content hunks as well? That's interesting. Yes, when merging trees, we start at the root tree and try to resolve conflicts at the tree entry level (i.e. without reading file contents). I think git does the same. If that's not enough we need to recurse into subtrees or file contents. When merging files, we find matching regions of the inputs and use the same algorithm on the individual chunks between the matching regions. > > When exactly would it be done on refs, though? I'm not following that one. First of all, note that jj allows refs to be in a conflicted state similar to how trees can be in a conflicted state. We merge refs for a few different reasons. If you run two concurrent operations on a repo, we merge any changes to the refs. We do the same thing when you fetch branches from a remote. For example, if you've fetched branch "main" from a remote, then moved it locally, and then you fetch again from the remote, we'll attempt to merge those refs. We use the same function for merging there, but if it fails, we then also automatically resolve two operations moving the branch forward different amounts (e.g. one operation moves a ref from X~10 to X~5 while the other moves it forward to X, we resolve to X). https://github.com/martinvonz/jj/blob/main/docs/technical/concurrency.md talks a bit more about that. > > And what else is in that "etc."? I think it's only individual file ids (blob ids) and the executable bit. If a file's content changed and its executable bit changed, we use the same algorithm for each of those pieces of information. > > > > What about recursive merges, i.e. merges where the two sides do not > > > have a unique merge base. What is the form of those? (Would "- C" be > > > replaced by "- C1 - C2 - ... - Cn"? Or would we create the virtual > > > merge base V and then do a " - V"? Or do we only have "A + B"?) > > > > We do that by recursively creating a virtual tree just like Git does, > > I think (https://github.com/martinvonz/jj/blob/084b99e1e2c42c40f2d52038cdc97687b76fed89/lib/src/rewrite.rs#L56-L71). > > I think the main difference is that by modeling conflicts, we can > > avoid recursive conflict markers (if that's what Git does), and we can > > even automatically resolve some cases where the virtual tree has a > > conflict. > > Okay, but that talks about the mechanics of creating a recursive > merge, omitting all the details about how the conflict header is > written when you record the merge. Is the virtual merge base > represented in the algebraic "A + B - C" expressions, or is the "- C" > part omitted? If it is represented, and the virtual merge base had > conflicts which you could not automatically resolve, what exactly does > the conflicted header for the outer merge get populated with? I think we're talking about the state in F below, right? F / \ / \ D E |\ /| | X | |/ \| B C \ / \ / A The virtual commit/tree, which we can think of as sitting where the X is in the graph, would have state V=B+C-A. The state at F would have D+E-V=D+E-(B+C-A)=D+(E-C)+(A-B). This is encoded in `Merge::flatten()` here: https://github.com/martinvonz/jj/blob/e3a1e5b80ed9124091baa4d920cc9e8124c1f559/lib/src/merge.rs#L421-L451. It's not specific to recursive merge; we run into the same kind of higher-arity conflicts on regular octopus merges or repeated merges (if you don't resolve conflicts in between). Oh, I should also say that we don't store the unmodified trees in these expressions. Instead, for anything we can automatically resolve, we replace those parts of the trees. So even if A, B, and C differ at paths X, Y, and Z, the trees we associate with V might only differ at path Y if that's the only path we couldn't resolve. IIRC, I did it that way because it seemed wasteful to re-attempt the merge at paths X and Z every time we rewrite the commit. I *think* it rarely matters in practice, but it feels like it could in some cases (maybe where two sides make the same changes). > > [...] > > > Great questions! We don't have support for renames, so we haven't had > > to worry about these things. We have talked a little about divergent > > renames and the need for recording that in the commit so we can tell > > the user about it and maybe ask them which name they want to keep. I > > had not considered the interaction with partial conflict resolution, > > so thanks for bringing that up. I don't have any answers now, but > > we'll probably need to start thinking about this soon. > > I was wondering if that might be the answer. When you do tackle this, > I'd be interested to hear your thoughts. I'm wondering if we just > need to augment the data in the conflict header to handle such cases > (though I guess this could risk having commit objects that are > significantly bigger than normal in theoretical cases where many such > paths are involved?) Yes, that's what I've been thinking, but I think the only thing I had been thinking of storing was for "divergent renames" (A->B on one side, A->C on the other). Will let you know when we start thinking about this for real. Thanks again for your input! > > > > I'm curious to hear what happens when you do start dogfooding, on > > > projects with many developers and which are jj-only. Do commits with > > > conflicts accidentally end up in mainline branches, or are there good > > > ways to make sure they don't hit anything considered stable? > > > > That won't happen at Google because our source of truth for "merged > > PRs" (in GitHub-speak) is in our existing VCS. We will necessarily > > have to translate from jj's data model to its data model before a > > commit can even be sent for review. > > That makes sense, but I was just hoping we'd have an example to look > to for how to keep things safe if we were to implement this. Sadly, I > don't think we have the benefit of relying on folks to first push > their commits into some other VCS which lacks this feature. ;-) It might be best to disallow pushing conflicts to start with. It should also be easy to add a hook on the server to disallow it only to certain branches. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: first-class conflicts? 2023-11-07 17:38 ` Martin von Zweigbergk 2023-11-08 7:31 ` Elijah Newren @ 2023-11-09 14:50 ` phillip.wood123 1 sibling, 0 replies; 25+ messages in thread From: phillip.wood123 @ 2023-11-09 14:50 UTC (permalink / raw) To: Martin von Zweigbergk, phillip.wood Cc: Elijah Newren, Sandra Snan, git, Randall S. Becker Hi Martin On 07/11/2023 17:38, Martin von Zweigbergk wrote: > (new attempt in plain text) Oh, the joys of the mailing list! Thanks for your comments below and in your reply to Elijah, I found them really helpful to get a better understanding of how 'jj' handles this. Best Wishes Phillip > On Tue, Nov 7, 2023 at 3:49 AM Phillip Wood <phillip.wood123@gmail.com> wrote: >> >> Hi Elijah >> >> [I've cc'd Martin to see if he has anything to add about how "jj" >> manages the issues around storing conflicts.] >> >> On 07/11/2023 08:16, Elijah Newren wrote: >>> On Mon, Nov 6, 2023 at 1:26 PM Sandra Snan >>> <sandra.snan@idiomdrottning.org> wrote: >>>> >>>> Is this feature from jj also a good idea for git? >>>> https://martinvonz.github.io/jj/v0.11.0/conflicts/ >>> >>> Martin talked about this and other features at Git Merge 2022, a >>> little over a year ago. I talked to him in more depth about these >>> while there. I personally think he has some really interesting >>> features here, though at the time, I thought that the additional >>> object type might be too much to ask for in a Git change, and it was >>> an intrinsic part of the implementation back then. >>> >>> Martin also gave us an update at the 2023 Git Contributors summit, and >>> in particular noted a significant implementation change to not have >>> per-file storage of conflicts, but rather storing at the commit level >>> the multiple conflicting trees involved. That model might be >>> something we could implement in Git. And if we did, it'd solve >>> various issues such as people wanting to be able to stash conflicts, >>> or wanting to be able to partially resolve conflicts and fix it up >>> later, or be able to collaboratively resolve conflicts without having >>> everyone have access to the same checkout. >> >> One thing to think about if we ever want to implement this is what other >> data we need to store along with the conflict trees to preserve the >> context in which the conflict was created. For example the files that >> are read by "git commit" when it commits a conflict resolution. For a >> single cherry-pick/revert it would probably be fairly straight forward >> to store CHERRY_PICK_HEAD/REVERT_HEAD and add it as a parent so it gets >> transferred along with the conflicts. For a sequence of cherry-picks or >> a rebase it is more complicated to preserve the context of the conflict. >> Even "git merge" can create several files in addition to MERGE_HEAD >> which are read when the conflict resolution is committed. > > Good point. We actually don't store any extra data in jj. The old > per-path conflict model was prepared for having some label associated > with each term of the conflict but we never actually used it. > > If we add such metadata, it would probably have to be something that > makes sense even after pushing the conflict to another repo, so it > probably shouldn't be commit ids, unless we made sure to also push > those commits. Also note that if you `jj restore --from <commit with > conflict>`, you can get a conflict into a commit that didn't have > conflicts previously. Or if you already had conflicts in the > destination commit, your root trees (the multiple root trees > constituting the conflict) will now have conflicts that potentially > were created by two completely unrelated operations, so you would kind > of need different labels for different paths. > > https://github.com/martinvonz/jj/issues/1176 has some more discussion > about this. > >>> But we'd also have to be careful and think through usecases, including >>> in the surrounding community. People would probably want to ensure >>> that e.g. "Protected" or "Integration" branches don't get accept >>> fetches or pushes of conflicted commits, >> >> I think this is a really important point, while it can be useful to >> share conflicts so they can be collaboratively resolved we don't want to >> propagate them into "stable" or production branches. I wonder how 'jj' >> handles this. > > Agreed. `jj git push` refuses to push commits with conflicts, because > it's very unlikely that the remote will be able to make any sense of > it. Our commit backend at Google does support conflicts, so users can > check out each other's conflicted commits there (except that we > haven't even started dogfooding yet). > >>> git status would probably >>> need some special warnings or notices, git checkout would probably >>> benefit from additional warnings/notices checks for those cases, git >>> log should probably display conflicted commits differently, we'd need >>> to add special handling for higher order conflicts (e.g. a merge with >>> conflicts is itself involved in a merge) probably similar to what jj >>> has done, and audit a lot of other code paths to see what would be >>> needed. >> >> As you point out there is a lot more to this than just being able to >> store the conflict data in a commit - in many ways I think that is the >> easiest part of the solution to sharing conflicts. > > Yes, I think it would be a very large project. Unlike jj, Git of > course has to worry about backwards compatibility. For example, you > would have to decide if your goal - even in the long term - is to make > `git rebase` etc. not get interrupted due to conflicts. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: first-class conflicts? 2023-11-07 11:49 ` Phillip Wood 2023-11-07 17:38 ` Martin von Zweigbergk @ 2023-11-08 6:31 ` Elijah Newren 2023-11-09 14:45 ` Phillip Wood 1 sibling, 1 reply; 25+ messages in thread From: Elijah Newren @ 2023-11-08 6:31 UTC (permalink / raw) To: phillip.wood; +Cc: Sandra Snan, git, Martin von Zweigbergk, Randall S. Becker Hi Phillip, On Tue, Nov 7, 2023 at 3:49 AM Phillip Wood <phillip.wood123@gmail.com> wrote: > > Hi Elijah > > [I've cc'd Martin to see if he has anything to add about how "jj" > manages the issues around storing conflicts.] +1. I'll add some other questions for him too while we're at it, separately in this thread. [...] > > Martin also gave us an update at the 2023 Git Contributors summit, and > > in particular noted a significant implementation change to not have > > per-file storage of conflicts, but rather storing at the commit level > > the multiple conflicting trees involved. That model might be > > something we could implement in Git. And if we did, it'd solve > > various issues such as people wanting to be able to stash conflicts, > > or wanting to be able to partially resolve conflicts and fix it up > > later, or be able to collaboratively resolve conflicts without having > > everyone have access to the same checkout. > > One thing to think about if we ever want to implement this is what other > data we need to store along with the conflict trees to preserve the > context in which the conflict was created. For example the files that > are read by "git commit" when it commits a conflict resolution. For a > single cherry-pick/revert it would probably be fairly straight forward > to store CHERRY_PICK_HEAD/REVERT_HEAD and add it as a parent so it gets > transferred along with the conflicts. This is a great thing to think about and bring up. However, I'm not sure what part of it actually needs to be preserved; in fact, it's not clear to me that any of it needs preserving -- especially not the files read by "git commit". A commit was already created, after all. It seems that CHERRY_PICK_HEAD/REVERT_HEAD files exist primarily to clue in that we are in-the-middle-of-<op>, and the conflict header (the "tree A + tree B - tree C" thing; whatever that's called) similarly provides signal that the commit still has conflicts. Secondarily, these files contain information about the tree we came from and its parent tree, which allows users to investigate the diff between those...but that information is also available from the conflict header in the recorded commit. The CHERRY_PICK_HEAD and REVERT_HEAD files could also be used to access the commit message, but that would have been stored in the conflicted commit as well. Are there any other pieces of information I'm missing? > For a sequence of cherry-picks or > a rebase it is more complicated to preserve the context of the conflict. I think the big piece here is whether we also want to adopt jj's behavior of automatically rebasing all descendant commits when checking out and amending some historical commit (or at least having the option of doing so). That behavior allows users to amend commits to resolve conflicts without figuring out complicated interactive rebases to fix all the descendant commits across all relevant branches. Without that feature, I agree this might be a bit more difficult, but with that feature, I'm having a hard time figuring out what context we actually need to preserve for a sequence of cherry-picks or a rebase. Digging into a few briefly... Many of the state files are about the status of the in-progress operation (todo-list, numbers of commits done and to do, what should be done with not-yet-handled commits, temporary refs corresponding to temporary labels that need to be deleted, rescheduling failed execs, dropping or keeping redundant commits, etc.), but if the operation has completed and new commits created (potentially with multiple files with conflict headers), I don't see how this information is useful anymore. There are some special state files related to half-completed operations (e.g. squash commits when we haven't yet reached the final one in the sequence, a file to note that we want to edit a commit message once the user has finished resolving conflicts, whether we need to create a new root commit), but again, the operation has completed and commits have been created with appropriate parentage and commit messages so I don't think these are useful anymore either. Other state files are related to things needing to be done at the end of the operation, like invoke the post-rewrite hook or pop the autostash (with knowledge of what was rewritten to what). But the operation would have been completed and those things done already, so I don't see how this is necessary either. Some state files are for controlling how commits are created (setting committer date to author date, gpg signing options, whether to add signoff), but, again, commits have already been created, and can be further amended as the user wants (hopefully including resolving the conflicts). The biggest issue is perhaps that REBASE_HEAD is used in the implementation of `git rebase --show-current-patch`, but all information stored in that is still accessible -- the commit message is stored in the commit, the author time is stored in the commit, and the trees involved are in the conflict header. The only thing missing is committer timestamp, which isn't relevant anyway. The only ones I'm pausing a bit on are the strategy and strategy-options. Those might be useful somehow...but I can't currently quite put my finger on explaining how they would be useful and I'm not sure they are. Am I missing anything? > Even "git merge" can create several files in addition to MERGE_HEAD > which are read when the conflict resolution is committed. That's a good one to bring up too, but I'm not sure I understand how these could be useful to preserve either. Am I missing something? My breakdown: * MERGE_HEAD: was recorded in the commit as a second parent, so we already have that info * MERGE_MSG: was recorded in the commit as the commit message, so again we already have that info * MERGE_AUTOSTASH: irrelevant since the stashed stuff isn't part of the commit and was in fact unstashed after the merge-commit-with-conflicts was created * MERGE_MODE: irrelevant since it's only used for reducing heads at time of git-commit, and git-commit has already been run * MERGE_RR: I think this is irrelevant; the conflict record (tree A + tree B - tree C) lets us redo the merge if needed to get the list of conflicted files and textual conflicts found therein So I don't see how any of the information in these files need to be recorded as additional auxiliary information. However, that last item might depend upon the strategy and strategy-options, which currently is not recorded...hmm.... > > But we'd also have to be careful and think through usecases, including > > in the surrounding community. People would probably want to ensure > > that e.g. "Protected" or "Integration" branches don't get accept > > fetches or pushes of conflicted commits, > > I think this is a really important point, while it can be useful to > share conflicts so they can be collaboratively resolved we don't want to > propagate them into "stable" or production branches. I wonder how 'jj' > handles this. Yeah, figuring this out might be the biggest sticking point. > > git status would probably > > need some special warnings or notices, git checkout would probably > > benefit from additional warnings/notices checks for those cases, git > > log should probably display conflicted commits differently, we'd need > > to add special handling for higher order conflicts (e.g. a merge with > > conflicts is itself involved in a merge) probably similar to what jj > > has done, and audit a lot of other code paths to see what would be > > needed. > > As you point out there is a lot more to this than just being able to > store the conflict data in a commit - in many ways I think that is the > easiest part of the solution to sharing conflicts. Yeah, another one I just thought of is that the trees referenced in the conflicts would also need to affect reachability computations as well, to make sure they both don't get gc'ed and that they are transferred when appropriate. There are lots of things that would be involved in implementing this idea. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: first-class conflicts? 2023-11-08 6:31 ` Elijah Newren @ 2023-11-09 14:45 ` Phillip Wood 2023-11-10 22:57 ` Elijah Newren 0 siblings, 1 reply; 25+ messages in thread From: Phillip Wood @ 2023-11-09 14:45 UTC (permalink / raw) To: Elijah Newren, phillip.wood Cc: Sandra Snan, git, Martin von Zweigbergk, Randall S. Becker Hi Elijah On 08/11/2023 06:31, Elijah Newren wrote: > Hi Phillip, > > On Tue, Nov 7, 2023 at 3:49 AM Phillip Wood <phillip.wood123@gmail.com> wrote: >> >> Hi Elijah >> >> [I've cc'd Martin to see if he has anything to add about how "jj" >> manages the issues around storing conflicts.] > > +1. I'll add some other questions for him too while we're at it, > separately in this thread. > > [...] > >>> Martin also gave us an update at the 2023 Git Contributors summit, and >>> in particular noted a significant implementation change to not have >>> per-file storage of conflicts, but rather storing at the commit level >>> the multiple conflicting trees involved. That model might be >>> something we could implement in Git. And if we did, it'd solve >>> various issues such as people wanting to be able to stash conflicts, >>> or wanting to be able to partially resolve conflicts and fix it up >>> later, or be able to collaboratively resolve conflicts without having >>> everyone have access to the same checkout. >> >> One thing to think about if we ever want to implement this is what other >> data we need to store along with the conflict trees to preserve the >> context in which the conflict was created. For example the files that >> are read by "git commit" when it commits a conflict resolution. For a >> single cherry-pick/revert it would probably be fairly straight forward >> to store CHERRY_PICK_HEAD/REVERT_HEAD and add it as a parent so it gets >> transferred along with the conflicts. > > This is a great thing to think about and bring up. However, I'm not > sure what part of it actually needs to be preserved; in fact, it's not > clear to me that any of it needs preserving -- especially not the > files read by "git commit". A commit was already created, after all. > > It seems that CHERRY_PICK_HEAD/REVERT_HEAD files exist primarily to > clue in that we are in-the-middle-of-<op>, and the conflict header > (the "tree A + tree B - tree C" thing; whatever that's called) > similarly provides signal that the commit still has conflicts. > Secondarily, these files contain information about the tree we came > from and its parent tree, which allows users to investigate the diff > between those...but that information is also available from the > conflict header in the recorded commit. The CHERRY_PICK_HEAD and > REVERT_HEAD files could also be used to access the commit message, but > that would have been stored in the conflicted commit as well. Are > there any other pieces of information I'm missing? Mainly that I'm an idiot and forgot we were actually creating a commit and can store the message and authorship there! More seriously I think being able to inspect the commit being cherry-picked (including the original commit message) is useful so we'd need to recreate something like CHERRY_PICK_HEAD when the conflict commit is checked out. Recreating CHERRY_PICK_HEAD is useful for "git status" as well. I think that means storing a little more that just the "tree A + tree B - tree C" thing. >> For a sequence of cherry-picks or >> a rebase it is more complicated to preserve the context of the conflict. > > I think the big piece here is whether we also want to adopt jj's > behavior of automatically rebasing all descendant commits when > checking out and amending some historical commit (or at least having > the option of doing so). That behavior allows users to amend commits > to resolve conflicts without figuring out complicated interactive > rebases to fix all the descendant commits across all relevant > branches. That's a potentially attractive option which is fairly simple to implement locally as I think you can use the commit DAG to find all the descendants though that could be expensive if there are lots of branches. However, if we're going to share conflicts I think we'd need something like "hg evolve" - if I push a commit with conflicts and you base some work on it and then I resolve the conflict and push again you would want to your work to be rebased onto my conflict resolution. To handle "rebase --exec" we could store the exec command and run it when the conflicts are resolved. Also I wonder how annoying it would be in cases where I just want to rebase and resolve the conflicts now. At the moment "git rebase" stops at the conflict, with this feature I'd have to go and checkout the conflicted commit and fix the conflicts after the rebase had finished. > Without that feature, I agree this might be a bit more > difficult, Yes, when I wrote my original message I was imagining that we'd stop at the first conflicting pick and store all the rebase state like some kind of stash on steroids so it could be continued when the conflict was resolved. It would be much simpler to try and avoid that. > but with that feature, I'm having a hard time figuring out > what context we actually need to preserve for a sequence of > cherry-picks or a rebase. > > Digging into a few briefly... > > Many of the state files are about the status of the in-progress > operation (todo-list, numbers of commits done and to do, what should > be done with not-yet-handled commits, temporary refs corresponding to > temporary labels that need to be deleted, rescheduling failed execs, > dropping or keeping redundant commits, etc.), but if the operation has > completed and new commits created (potentially with multiple files > with conflict headers), I don't see how this information is useful > anymore. Agreed > There are some special state files related to half-completed > operations (e.g. squash commits when we haven't yet reached the final > one in the sequence, a file to note that we want to edit a commit > message once the user has finished resolving conflicts, whether we > need to create a new root commit), but again, the operation has > completed and commits have been created with appropriate parentage and > commit messages so I don't think these are useful anymore either. Yes, though we may want to remember which commits were squashed together so the user can inspect that when resolving conflicts. > Other state files are related to things needing to be done at the end > of the operation, like invoke the post-rewrite hook or pop the > autostash (with knowledge of what was rewritten to what). But the > operation would have been completed and those things done already, so > I don't see how this is necessary either. Agreed > Some state files are for controlling how commits are created (setting > committer date to author date, gpg signing options, whether to add > signoff), but, again, commits have already been created, and can be > further amended as the user wants (hopefully including resolving the > conflicts). Agreed > The biggest issue is perhaps that REBASE_HEAD is used in the > implementation of `git rebase --show-current-patch`, but all > information stored in that is still accessible -- the commit message > is stored in the commit, the author time is stored in the commit, and > the trees involved are in the conflict header. The only thing missing > is committer timestamp, which isn't relevant anyway. The commit message may have been edited so we lose the original message but I'm not sure how important that is. > The only ones I'm pausing a bit on are the strategy and > strategy-options. Those might be useful somehow...but I can't > currently quite put my finger on explaining how they would be useful > and I'm not sure they are. I can't think of an immediate use for them. When we re-create conflicts we do it per-file based on the index entries created by the original merge so I don't think we need to know anything about the strategy or strategy-options. > Am I missing anything? exec commands? If the user runs "git rebase --exec" and there are conflicts then we'd need to defer running the exec commands until the conflicts are resolved. For something like "git rebase --exec 'make test'" that should be fine. I wonder if there are corner cases where the exec command changes HEAD though. >> Even "git merge" can create several files in addition to MERGE_HEAD >> which are read when the conflict resolution is committed. > > That's a good one to bring up too, but I'm not sure I understand how > these could be useful to preserve either. Am I missing something? My > breakdown: > * MERGE_HEAD: was recorded in the commit as a second parent, so we > already have that info > * MERGE_MSG: was recorded in the commit as the commit message, so > again we already have that info > * MERGE_AUTOSTASH: irrelevant since the stashed stuff isn't part of > the commit and was in fact unstashed after the > merge-commit-with-conflicts was created > * MERGE_MODE: irrelevant since it's only used for reducing heads at > time of git-commit, and git-commit has already been run > * MERGE_RR: I think this is irrelevant; the conflict record (tree A > + tree B - tree C) lets us redo the merge if needed to get the list of > conflicted files and textual conflicts found therein > > So I don't see how any of the information in these files need to be > recorded as additional auxiliary information. However, that last item > might depend upon the strategy and strategy-options, which currently > is not recorded...hmm.... Yes, as we're creating some kind of commit we don't need to preserve those files separately. >>> But we'd also have to be careful and think through usecases, including >>> in the surrounding community. People would probably want to ensure >>> that e.g. "Protected" or "Integration" branches don't get accept >>> fetches or pushes of conflicted commits, >> >> I think this is a really important point, while it can be useful to >> share conflicts so they can be collaboratively resolved we don't want to >> propagate them into "stable" or production branches. I wonder how 'jj' >> handles this. > > Yeah, figuring this out might be the biggest sticking point. Indeed >>> git status would probably >>> need some special warnings or notices, git checkout would probably >>> benefit from additional warnings/notices checks for those cases, git >>> log should probably display conflicted commits differently, we'd need >>> to add special handling for higher order conflicts (e.g. a merge with >>> conflicts is itself involved in a merge) probably similar to what jj >>> has done, and audit a lot of other code paths to see what would be >>> needed. >> >> As you point out there is a lot more to this than just being able to >> store the conflict data in a commit - in many ways I think that is the >> easiest part of the solution to sharing conflicts. > > Yeah, another one I just thought of is that the trees referenced in > the conflicts would also need to affect reachability computations as > well, to make sure they both don't get gc'ed and that they are > transferred when appropriate. There are lots of things that would be > involved in implementing this idea. Yes, it would certainly be lots of work. Best Wishes Phillip ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: first-class conflicts? 2023-11-09 14:45 ` Phillip Wood @ 2023-11-10 22:57 ` Elijah Newren 0 siblings, 0 replies; 25+ messages in thread From: Elijah Newren @ 2023-11-10 22:57 UTC (permalink / raw) To: phillip.wood; +Cc: Sandra Snan, git, Martin von Zweigbergk, Randall S. Becker Hi Phillip, On Thu, Nov 9, 2023 at 6:45 AM Phillip Wood <phillip.wood123@gmail.com> wrote: > [...] > > This is a great thing to think about and bring up. However, I'm not > > sure what part of it actually needs to be preserved; in fact, it's not > > clear to me that any of it needs preserving -- especially not the > > files read by "git commit". A commit was already created, after all. > > > > It seems that CHERRY_PICK_HEAD/REVERT_HEAD files exist primarily to > > clue in that we are in-the-middle-of-<op>, and the conflict header > > (the "tree A + tree B - tree C" thing; whatever that's called) > > similarly provides signal that the commit still has conflicts. > > Secondarily, these files contain information about the tree we came > > from and its parent tree, which allows users to investigate the diff > > between those...but that information is also available from the > > conflict header in the recorded commit. The CHERRY_PICK_HEAD and > > REVERT_HEAD files could also be used to access the commit message, but > > that would have been stored in the conflicted commit as well. Are > > there any other pieces of information I'm missing? > > Mainly that I'm an idiot and forgot we were actually creating a commit > and can store the message and authorship there! You're definitely not an idiot. The whole problem space is new and different, so it's easy to overlook or forget certain details, and even to make completely different assumptions than others and have no one aware that we're operating with similar sounding but entirely different mental models. > More seriously I think > being able to inspect the commit being cherry-picked (including the > original commit message) is useful so we'd need to recreate something > like CHERRY_PICK_HEAD when the conflict commit is checked out. So, I see a few issues with this: 1) Even if we were to create CHERRY_PICK_HEAD as you envision, that doesn't necessarily guarantee the user can view the original commit because they may not have it. It may have been a local-only commit that wasn't pushed or pulled to the person who is now investigating it. 2a) You highlight the original commit message, but if someone doesn't want to immediately resolve conflicts, why would they be modifying the commit message? 2b) Even if users did want to modify the commit message without resolving conflicts, how would they do so? Rebasing has interactivity, but cherry-picking doesn't. And interactivity seems to be something people probably wouldn't use together with storing conflicts; the point of interactivity is to tweak things further and fix them up, suggesting they'd want to be running in address-conflicts-now mode. > Recreating CHERRY_PICK_HEAD is useful for "git status" as well. "git status" uses this file to determine if it should display information about currently being in the middle of a cherry-pick operation. Putting such a file in place would thus be misleading, because we aren't in a cherry-pick operation anymore; that has completed already. I would not expect the suggested commands printed by git-status while it thinks we're in such a state (namely, "git cherry-pick [--continue|--skip|--abort]") to work either. So, I'd argue it would be a bug to create such a file when checking out a conflicted-commit. Of course, we would want git-status to display information about the current commit being conflicted, but I think that could be based on the simple conflict header without additional info. > I think > that means storing a little more that just the "tree A + tree B - tree > C" thing. I'm totally willing to believe there will be cases where more info is needed. I'm suspecting that conflicts with certain kinds of renames, or which were performed with certain types of strategies or strategy options might be some examples. But I'm not sure I'm understanding why CHERRY_PICK_HEAD should be one of those cases. > > I think the big piece here is whether we also want to adopt jj's > > behavior of automatically rebasing all descendant commits when > > checking out and amending some historical commit (or at least having > > the option of doing so). That behavior allows users to amend commits > > to resolve conflicts without figuring out complicated interactive > > rebases to fix all the descendant commits across all relevant > > branches. > > That's a potentially attractive option which is fairly simple to > implement locally as I think you can use the commit DAG to find all the > descendants though that could be expensive if there are lots of > branches. However, if we're going to share conflicts I think we'd need > something like "hg evolve" - if I push a commit with conflicts and you > base some work on it and then I resolve the conflict and push again you > would want to your work to be rebased onto my conflict resolution. Ooh, that's an interesting point. > To handle "rebase --exec" we could store the exec command and run it when > the conflicts are resolved. So, my assumption is that even if we add the ability to commit conflicts and even if we default to auto-committing them during cherry-picks or non-interactive rebases, there will still be people who want to resolve conflicts as they are hit rather than auto-committing them, and thus that stop-on-conflict should always be an option. In the world where a user has this choice, I think it'd be rare for users to want to auto-commit conflicts with --exec. I'd suggest that --exec, and even --interactive, would default to stopping on conflicts and waiting for the user to resolve even if auto-commit-on-conflict is the default in other cases. That leaves me wondering if there are any cases where users want to auto-commit conflicts in.conjunction with --exec, which I'm already struggling to come up with, _and_ that would further want the exec commands to be preserved in the conflicted commits (and any descendant commits?) for later usage. Maybe there's a case for that, but I'm not coming up with it right now. Also, another way of looking at this is that my current mental model is that the cherry-pick or rebase operation is completed once it has handled each of the commits in its list; the operation does not extend until all the conflicts in the commits it creates are resolved. The fact that rebases do not extend until conflicts are resolved is important because you can later further rebase conflicted-commits (as Martin alludes to in his emails); considering the old rebase(s) to still be in progress while a new one starts might get excessively complex to handle. The reason all of this matters to --exec is that --exec is part of the rebase operation; once the rebase operation is done, the --exec stuff is also done. (And thus, if you don't want --exec to run on conflicted commits, then don't opt for auto-committing conflicts.). > Also I wonder how annoying it would be in cases where I just want to > rebase and resolve the conflicts now. At the moment "git rebase" stops > at the conflict, with this feature I'd have to go and checkout the > conflicted commit and fix the conflicts after the rebase had finished. I agree that would often be annoying. Personally, I think that auto-committing conflicts as a feature should at most be an option (even if perhaps the default in some cases), not a new mandatory worldview. And I'm currently not convinced that even if it were implemented it should be the default in any cases. > > Without that feature, I agree this might be a bit more > > difficult, > > Yes, when I wrote my original message I was imagining that we'd stop at > the first conflicting pick and store all the rebase state like some kind > of stash on steroids so it could be continued when the conflict was > resolved. It would be much simpler to try and avoid that. Yeah, this is an example of how completely different mental models we can come up with when none of us (other than Martin) know much about the problem space. I suspect there's at least a few more examples like this where we still have very different mental models, and perhaps some gems to be found by mixing and matching them. > > There are some special state files related to half-completed > > operations (e.g. squash commits when we haven't yet reached the final > > one in the sequence, a file to note that we want to edit a commit > > message once the user has finished resolving conflicts, whether we > > need to create a new root commit), but again, the operation has > > completed and commits have been created with appropriate parentage and > > commit messages so I don't think these are useful anymore either. > > Yes, though we may want to remember which commits were squashed together > so the user can inspect that when resolving conflicts. Ooh, that's interesting...though it does run into the problem of users not having access to the original commits. > > The biggest issue is perhaps that REBASE_HEAD is used in the > > implementation of `git rebase --show-current-patch`, but all > > information stored in that is still accessible -- the commit message > > is stored in the commit, the author time is stored in the commit, and > > the trees involved are in the conflict header. The only thing missing > > is committer timestamp, which isn't relevant anyway. > > The commit message may have been edited so we lose the original message > but I'm not sure how important that is. Is this a reversal from your comment earlier in your email about the importance of the original commit message for CHERRY_PICK_HEAD? :-) > > The only ones I'm pausing a bit on are the strategy and > > strategy-options. Those might be useful somehow...but I can't > > currently quite put my finger on explaining how they would be useful > > and I'm not sure they are. > > I can't think of an immediate use for them. When we re-create conflicts > we do it per-file based on the index entries created by the original > merge so I don't think we need to know anything about the strategy or > strategy-options. But we don't have index entries. We only have trees in this conflicted commit, and when users check it out, they probably expect conflicted index entries to be put into place. Can we correctly regenerate the right conflicted index entries from the original trees without the strategy and strategy-options command line flags? I suspect there might be problems here, and user-defined merge strategies could really throw a wrench in the works. Hmm... > > Am I missing anything? > > exec commands? If the user runs "git rebase --exec" and there are > conflicts then we'd need to defer running the exec commands until the > conflicts are resolved. For something like "git rebase --exec 'make > test'" that should be fine. I wonder if there are corner cases where the > exec command changes HEAD though. We talked about exec commands above, as well as the assumption whether auto-committing conflicts should be mandatory vs. an option, so I won't repeat that here. It was definitely a very interesting topic to bring up though; thanks! [...] > Yes, it would certainly be lots of work. ...but even if none of us get time and prioritization to work on it, I personally find it a really interesting topic to discuss and explore. Thanks for joining in and bringing up many good points! ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2023-11-12 23:25 UTC | newest] Thread overview: 25+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-11-06 21:17 first-class conflicts? Sandra Snan 2023-11-06 22:01 ` Dragan Simic 2023-11-06 22:34 ` Sandra Snan 2023-11-06 22:34 ` rsbecker 2023-11-06 22:45 ` Sandra Snan 2023-11-07 0:50 ` Theodore Ts'o 2023-11-11 1:31 ` Junio C Hamano 2023-11-11 7:48 ` Sandra Snan 2023-11-12 15:21 ` Theodore Ts'o 2023-11-12 23:25 ` Junio C Hamano 2023-11-07 11:23 ` Phillip Wood 2023-11-07 11:24 ` Sandra Snan 2023-11-07 8:16 ` Elijah Newren 2023-11-07 8:21 ` Dragan Simic 2023-11-07 9:16 ` Sandra Snan 2023-11-07 11:49 ` Phillip Wood 2023-11-07 17:38 ` Martin von Zweigbergk 2023-11-08 7:31 ` Elijah Newren 2023-11-08 18:22 ` Martin von Zweigbergk 2023-11-10 21:41 ` Elijah Newren 2023-11-12 7:05 ` Martin von Zweigbergk 2023-11-09 14:50 ` phillip.wood123 2023-11-08 6:31 ` Elijah Newren 2023-11-09 14:45 ` Phillip Wood 2023-11-10 22:57 ` Elijah Newren
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).