git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Rename conflicts in the index
@ 2013-03-13 14:08 Edward Thomson
  2013-03-13 18:05 ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Edward Thomson @ 2013-03-13 14:08 UTC (permalink / raw)
  To: git@vger.kernel.org

When a rename conflict occurs, the information about the conflict is
written to stdout and the index is updated as if the conflict were a
simpler conflict that did not involve renames.  This doesn't give a lot of
information to users after the fact - a status of "added in ours" does not
provide a lot of details about what occurred.

In reimplementations, we face a similar challenge.  Unfortunately, we can't
simply print the output to the console, we would like to provide this
information back to the caller by some mechanism.  My preference would be to
return this information in the index - so that the results of "unpack trees"
(if you will) was a single data structure (the index) and so that we could
persist this information to disk.  I've been considering the idea of an
extension that contains more detailed data about conflicts, but I certainly
wouldn't want to proceed without discussing this here further.

I would propose that we store the data about the file in conflict as it
occurred through the renames.  For example, in a rename 1->2 conflict where
A was renamed to both B and C, you would have a single conflict entry
containing the data for A, B and C.  This would allow us to provide more
detailed information to the user - and allow them to (say) choose a single
name to proceed with.

Is this something that has value to core git as well?  Alternately, is
there something particularly stupid about this proposal?

Thanks-
-ed

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Rename conflicts in the index
  2013-03-13 14:08 Rename conflicts in the index Edward Thomson
@ 2013-03-13 18:05 ` Junio C Hamano
  2013-03-13 20:44   ` Edward Thomson
  2013-03-26 18:30   ` Edward Thomson
  0 siblings, 2 replies; 16+ messages in thread
From: Junio C Hamano @ 2013-03-13 18:05 UTC (permalink / raw)
  To: Edward Thomson; +Cc: git@vger.kernel.org

Edward Thomson <ethomson@microsoft.com> writes:

> I would propose that we store the data about the file in conflict as it
> occurred through the renames.  For example, in a rename 1->2 conflict where
> A was renamed to both B and C, you would have a single conflict entry
> containing the data for A, B and C.  This would allow us to provide more
> detailed information to the user - and allow them to (say) choose a single
> name to proceed with.
>
> Is this something that has value to core git as well?  Alternately, is
> there something particularly stupid about this proposal?

I do not offhand see anything particularly stupid; a new optional
index extension section CACHE_EXT_RENAME_CONFLICT might be a good
addition.

Is "one side moves A to B while the other side moves it to C" the
only case, or is it just an example?  Off the top of my head, "one
side moves A to x while the other side moves B to x/y" would also be
something we would want to know.  I am sure there are other cases
that need to be considered.

I do not think we can discuss the design at the concrete level until
the proposal spells out to cover all interesting cases in order for
implementations to agree on the common semantics.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: Rename conflicts in the index
  2013-03-13 18:05 ` Junio C Hamano
@ 2013-03-13 20:44   ` Edward Thomson
  2013-03-26 18:30   ` Edward Thomson
  1 sibling, 0 replies; 16+ messages in thread
From: Edward Thomson @ 2013-03-13 20:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org

Junio C Hamano [mailto:gitster@pobox.com] wrote:
> I do not offhand see anything particularly stupid; a new optional index extension
> section CACHE_EXT_RENAME_CONFLICT might be a good addition.
> 
> Is "one side moves A to B while the other side moves it to C" the only case, or is
> it just an example?  Off the top of my head, "one side moves A to x while the
> other side moves B to x/y" would also be something we would want to know.  I
> am sure there are other cases that need to be considered.

Yes, that was just an example.  Certainly I was intending that all conflicts
that arose from renames would end up here since one can't really reason
why the merge tool created a conflict by looking at the index alone - even
knowing the merge tool's similarity algorithms, this would be awfully
expensive to piece back together, even if the index did contain non-zero
stage entries for all the items that were involved in the conflicts.

That said, my rather naive initial thought was that we could repeat *all*
conflicts in this area.  This would give tools that knew how to understand
this the ability to go to a single place for conflict data, rather than
producing some merge of high-stage entries that comprise non-rename
conflicts and data from the rename conflict area for rename conflicts.

Thanks-
-ed

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: Rename conflicts in the index
  2013-03-13 18:05 ` Junio C Hamano
  2013-03-13 20:44   ` Edward Thomson
@ 2013-03-26 18:30   ` Edward Thomson
  2013-03-26 19:24     ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Edward Thomson @ 2013-03-26 18:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org

Junio C Hamano [mailto:gitster@pobox.com] wrote:
> Edward Thomson <ethomson@microsoft.com> writes:
> 
> > I would propose that we store the data about the file in conflict as
> > it occurred through the renames.  For example, in a rename 1->2
> > conflict where A was renamed to both B and C, you would have a single
> > conflict entry containing the data for A, B and C.  This would allow
> > us to provide more detailed information to the user - and allow them
> > to (say) choose a single name to proceed with.
> >
> > Is this something that has value to core git as well?  Alternately, is
> > there something particularly stupid about this proposal?
> 
> I do not offhand see anything particularly stupid; a new optional index extension
> section CACHE_EXT_RENAME_CONFLICT might be a good addition.
> 
> Is "one side moves A to B while the other side moves it to C" the only case, or is
> it just an example?  Off the top of my head, "one side moves A to x while the
> other side moves B to x/y" would also be something we would want to know.  I
> am sure there are other cases that need to be considered.
> 
> I do not think we can discuss the design at the concrete level until the proposal
> spells out to cover all interesting cases in order for implementations to agree on
> the common semantics.

Sorry about the delay here:  besides getting busy with some other things,
I wanted both a complete writeup and to have taken a pass at a test
implementation this in libgit2 to make sure seemed like a reasonably sensible
approach.

I would propose a new extension, 'CONF', to handle conflict data, differing
from the stage >0 entries in the index in that this extension tracks the
conflicting file across names if the underlying merge engine has support
for renames.

I made an attempt to keep the entry data similar to other entries in the
index.  I would propose that entries in the conflict are as follows:

Flags
  Four octets that describe the conflict.  Data includes:

  0x01  HAS_ANCESTOR
    There is a file in the common ancestor branch that contributes
    to this conflict.  Its data will follow.
  0x02  HAS_OURS
    There is a file in "our" branch that contributes to this conflict.
    Its data will follow.
  0x04  HAS_THEIRS
    There is a file in "their" branch that contributes to this conflict.
    Its data will follow.

  0x08  NAME_CONFLICT_OURS
    This item has a path in "our" branch that overlaps a different
    item in "their" branch.  (Eg, this conflict represents the "our"
    side of a rename/add conflict.)
  0x10  NAME_CONFLICT_THEIRS
    This item has a path in "their" branch that overlaps a different
    item in "our" branch.  (Eg, this conflict represents the "theirs"
    side of a rename/add conflict.)

  0x20  DF_CONFLICT_FILE
    This is the file involved in a directory/file conflict.
  0x40  DF_CONFLICT_CHILD
    This is a child of a directory involved in a directory/file conflict.

  Other bits are reserved.

Conflict Sides
  The data about one side of a conflict will contain:
  mode (ASCII string representation of octal, null-terminated)
  path (null terminated)
  sha1 (raw bytes)

The conflict sides will be written in this order:
  Ancestor (if HAS_ANCESTOR is set)
  Ours (if HAS_OURS is set)
  Theirs (if HAS_THEIRS is set)

I would propose that this not simply track rename conflicts, but all
conflicts.  Having a single canonical location is preferable - if the index
contains a CONF section (and the client supports it), it would use that.
Otherwise, the client would look at stage >0 entries.

I would propose that another extension, 'RSVD', track these conflicts once
they are resolved.  The format would be the same - when a conflict is
resolved from the CONF the entry will be placed as-is in the RSVD.

Examples are not an exhaustive list, but should help elucidate the name
and d/f conflicts:

Normal edit / edit conflict, where A is edited in ours and theirs:

  Conflict one:
    Flags = HAS_ANCESTOR|HAS_OURS|HAS_THEIRS
    Entry 1 = A [Ancestor]
    Entry 2 = B [Ancestor]
    Entry 3 = C [Ancestor]

Rename / add conflict, where A is renamed to B in ours and B is added in
theirs:

  Conflict one:
    Flags = HAS_ANCESTOR|HAS_OURS|NAME_CONFLICT_OURS
    Entry 1 = A [Ancestor]
    Entry 2 = B [Ours]
    Entry 3 = A [Theirs]
  Conflict two:
    Flags = HAS_THEIRS|NAME_CONFLICT_THEIRS
    Entry 1 = File B [Theirs]

D/F conflict, where some file A is deleted in theirs, and a directory
A is created with file child:

  Conflict one:
    Flags = HAS_ANCESTOR|HAS_OURS|HAS_THEIRS|DF_CONFLICT_FILE
    Entry 1 = A [Ancestor]
    Entry 2 = A [Ours]
  Conflict two:
    Flags = HAS_THEIRS|DF_CONFLICT_CHILD
    Entry 1 = A/child [Theirs]

Thanks for your input on this.

-ed

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Rename conflicts in the index
  2013-03-26 18:30   ` Edward Thomson
@ 2013-03-26 19:24     ` Junio C Hamano
  2013-03-26 20:40       ` Edward Thomson
  2013-03-27 17:03       ` Edward Thomson
  0 siblings, 2 replies; 16+ messages in thread
From: Junio C Hamano @ 2013-03-26 19:24 UTC (permalink / raw)
  To: Edward Thomson; +Cc: git@vger.kernel.org

Edward Thomson <ethomson@microsoft.com> writes:

> I would propose a new extension, 'CONF', to handle conflict data, differing
> from the stage >0 entries in the index in that this extension tracks the
> conflicting file across names if the underlying merge engine has support
> for renames.
>
> I made an attempt to keep the entry data similar to other entries in the
> index.  I would propose that entries in the conflict are as follows:
>
> Flags
>   Four octets that describe the conflict.  Data includes:
>
>   0x01  HAS_ANCESTOR
>     There is a file in the common ancestor branch that contributes
>     to this conflict.  Its data will follow.
>   0x02  HAS_OURS
>     There is a file in "our" branch that contributes to this conflict.
>     Its data will follow.
>   0x04  HAS_THEIRS
>     There is a file in "their" branch that contributes to this conflict.
>     Its data will follow.
>
>   0x08  NAME_CONFLICT_OURS
>     This item has a path in "our" branch that overlaps a different
>     item in "their" branch.  (Eg, this conflict represents the "our"
>     side of a rename/add conflict.)
>   0x10  NAME_CONFLICT_THEIRS
>     This item has a path in "their" branch that overlaps a different
>     item in "our" branch.  (Eg, this conflict represents the "theirs"
>     side of a rename/add conflict.)
>
>   0x20  DF_CONFLICT_FILE
>     This is the file involved in a directory/file conflict.
>   0x40  DF_CONFLICT_CHILD
>     This is a child of a directory involved in a directory/file conflict.
>
>   Other bits are reserved.
>
> Conflict Sides
>   The data about one side of a conflict will contain:
>   mode (ASCII string representation of octal, null-terminated)
>   path (null terminated)
>   sha1 (raw bytes)
>
> The conflict sides will be written in this order:
>   Ancestor (if HAS_ANCESTOR is set)
>   Ours (if HAS_OURS is set)
>   Theirs (if HAS_THEIRS is set)

Puzzled.  Most of the above, except NAME_CONFLICT_{OURS,THEIRS}
bits, look totally pointless duplication.

When you are working with Git, you have to be prepared to read from
the datafile like the index that other people (and your previous
version) created, and you also have to make sure you do not make
what you write out unusable by other people without a good reason.

So your tool needs code to see higher stage entries in the main
index to find <mode,sha1> for the conflicted paths even without the
index extension anyway, and if your tool does also perform merges,
you would need to strive for writing the main index with conflicted
entries and implementations that do not yet understand your
extension can keep operating.  For some types of extensions, the
latter may be hard (and that is why I stopped at "you would need to
strive for", and not "you must"), but for the one under discussion,
I do not think it is the case (by the way "CONF" sounds as if it is
some sort of configuration data).

If you are starting a brand new system from scratch, keeping only
the resolved entries in the main index and having a separate section
for conflicts might be also a valid design choice, but you do not
live in that world if you are discussing the design on this mailing
list.

> I would propose that this not simply track rename conflicts, but all
> conflicts.

That is a no starter.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: Rename conflicts in the index
  2013-03-26 19:24     ` Junio C Hamano
@ 2013-03-26 20:40       ` Edward Thomson
  2013-03-27 17:03       ` Edward Thomson
  1 sibling, 0 replies; 16+ messages in thread
From: Edward Thomson @ 2013-03-26 20:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org


Junio C Hamano [mailto:gitster@pobox.com] wrote:
> Edward Thomson <ethomson@microsoft.com> writes:
> > I would propose a new extension, 'CONF', to handle conflict data,
> > differing from the stage >0 entries in the index in that this
> > extension tracks the conflicting file across names if the underlying
> > merge engine has support for renames.
> >
> > I made an attempt to keep the entry data similar to other entries in
> > the index.  I would propose that entries in the conflict are as follows:
> >
> > Flags
> >   Four octets that describe the conflict.  Data includes:
> >
> >   0x01  HAS_ANCESTOR
> >     There is a file in the common ancestor branch that contributes
> >     to this conflict.  Its data will follow.
> >   0x02  HAS_OURS
> >     There is a file in "our" branch that contributes to this conflict.
> >     Its data will follow.
> >   0x04  HAS_THEIRS
> >     There is a file in "their" branch that contributes to this conflict.
> >     Its data will follow.
> >
> >   0x08  NAME_CONFLICT_OURS
> >     This item has a path in "our" branch that overlaps a different
> >     item in "their" branch.  (Eg, this conflict represents the "our"
> >     side of a rename/add conflict.)
> >   0x10  NAME_CONFLICT_THEIRS
> >     This item has a path in "their" branch that overlaps a different
> >     item in "our" branch.  (Eg, this conflict represents the "theirs"
> >     side of a rename/add conflict.)
> >
> >   0x20  DF_CONFLICT_FILE
> >     This is the file involved in a directory/file conflict.
> >   0x40  DF_CONFLICT_CHILD
> >     This is a child of a directory involved in a directory/file conflict.
> >
> >   Other bits are reserved.
> >
> > Conflict Sides
> >   The data about one side of a conflict will contain:
> >   mode (ASCII string representation of octal, null-terminated)
> >   path (null terminated)
> >   sha1 (raw bytes)
> >
> > The conflict sides will be written in this order:
> >   Ancestor (if HAS_ANCESTOR is set)
> >   Ours (if HAS_OURS is set)
> >   Theirs (if HAS_THEIRS is set)
> 
> Puzzled.  Most of the above, except NAME_CONFLICT_{OURS,THEIRS} bits, look
> totally pointless duplication.

Obviously HAS_ANCESTOR / HAS_OURS / HAS_THEIRS is to indicate to a reader
whether there is data to be read or not.  Similar to how a mode of 0
in the REUC indicates that the rest of the record should not be read.)

> When you are working with Git, you have to be prepared to read from the
> datafile like the index that other people (and your previous
> version) created, and you also have to make sure you do not make what you
> write out unusable by other people without a good reason.

I'm acutely aware that you need to be able to read an index that other
people created - that's the problem at hand.  git does not produce an
index that allows anyone (including itself) to reason about rename
conflicts.  It doesn't even bother to write high-stage conflict entries
in many instances, so you can have an instance where git tells you that
a conflict occurred but one of those files is staged anyway, the other
is just dirty in the workdir and you can commit immediately thereafter.

While obviously it's possible to handle this situation (is file A
in conflict?  Look in the rename conflict extension.  Not there?  Okay,
look in the index.)  That's not exactly elegant.  My goal here was
to have a single source for conflicts.

> (by the way "CONF" sounds as if it is some sort of configuration data).

There's only four letters, and not everything's as easy as "TREE".  REUC,
for example, sounds like a donkey, though I suppose it depends on the
language in question.

-ed

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: Rename conflicts in the index
  2013-03-26 19:24     ` Junio C Hamano
  2013-03-26 20:40       ` Edward Thomson
@ 2013-03-27 17:03       ` Edward Thomson
  2013-03-27 17:34         ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Edward Thomson @ 2013-03-27 17:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org

Junio C Hamano [mailto:gitster@pobox.com] wrote:
> Edward Thomson <ethomson@microsoft.com> writes:
> > I would propose that this not simply track rename conflicts, but all
> > conflicts.
> 
> That is a no starter.

So.  Can you explain to me why this would be a non starter?  Can you suggest
some alternate strategy here?

Maybe there's something I'm fundamentally misunderstanding.  It seems that
at present, git will:

1. Detect rename conflicts when performing a merge (at least,
   git-merge-recursive will, which is the default.)

2. If the rename itself caused a conflict (eg, renamed in one side, added in
   the other) then the merge cannot succeed.

3. The resultant index is written as if renames were not detected, which
   means - at best - records the files that went in to the name conflict
   and git status reports an "added in ours" conflict, which is a pretty
   disappointing conflict.  Often, though, many of the files will not
   exist at higher stage entries, since without rename detection, they
   would have not been conflicts.  At worst, one side is staged, there are
   no conflicts in the index and the user can commit (and thus lose the
   other side.)

Thus it's not like we could add some extension that merely records the names
that produced the rename conflicts and point them at the higher stage entries
in the index.  That would require that they actually be in the index.

Thanks-
-ed

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Rename conflicts in the index
  2013-03-27 17:03       ` Edward Thomson
@ 2013-03-27 17:34         ` Junio C Hamano
  2013-03-27 18:53           ` Edward Thomson
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2013-03-27 17:34 UTC (permalink / raw)
  To: Edward Thomson; +Cc: git@vger.kernel.org

Edward Thomson <ethomson@microsoft.com> writes:

> Junio C Hamano [mailto:gitster@pobox.com] wrote:
>> Edward Thomson <ethomson@microsoft.com> writes:
>> > I would propose that this not simply track rename conflicts, but all
>> > conflicts.
>> 
>> That is a no starter.
>
> So.  Can you explain to me why this would be a non starter?

At least two, IIRC.  One is the consequence of the other.

We do not gratuitously break existing implementations.  If no
conflict is stored as higher-stage index entries in an index that
has your index extension, no existing implementation can read a
conflicted index written by your implementation and have users
resolve conflicts.

When a path originally at A is moved to B on only one branch, and
there are content-level conflicts between the changes made by one
branch (while going from A to B) and by the other branch (while
keeping it at A), we would end up having three stages for path B
without any trace of path A.  I do not offhand know how much it
helps to learn A in such a situation in the real life, but we are
indeed losing information, and I do not have any problem with an
extension that records in the index the fact that in the two (of the
three) commits involved in the merge, the path was at A.

But people have been successfully using existing versions of Git
without that information to merge branches with renames, and
resolving the content-level conflicts.  Your tool that
_additionally_ records "This path that currently has three stages
for B was at A in the common ancestor (i.e. stage #1) and that
branch (either stage #2 or stage #3)" does not _have_ _to_ break
these users by removing the three stages for B from the main index.

Also we do not duplicate information unnecessarily.  Nowhere in the
above "we have been losing the fact that two of the three had the
contents we have at path B in the resulting unmerged index at path
A, and that information might be useful as well", there is a reason
to write another copy of mode or SHA-1 for any of the three variants.

As I said, you do not live in the world where you are writing
something like Git from scratch.  Perhaps you do, but then the
result will not be Git and we wouldn't be discussing that system on
this mailing list.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: Rename conflicts in the index
  2013-03-27 17:34         ` Junio C Hamano
@ 2013-03-27 18:53           ` Edward Thomson
  2013-03-27 19:44             ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Edward Thomson @ 2013-03-27 18:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org

Junio C Hamano [mailto:gitster@pobox.com] wrote:
> We do not gratuitously break existing implementations.  If no conflict is stored
> as higher-stage index entries in an index that has your index extension, no
> existing implementation can read a conflicted index written by your
> implementation and have users resolve conflicts.

I'm not suggesting that anybody stop writing >0 stage entries.

> When a path originally at A is moved to B on only one branch, and there are
> content-level conflicts between the changes made by one branch (while going
> from A to B) and by the other branch (while keeping it at A), we would end up
> having three stages for path B without any trace of path A.  I do not offhand
> know how much it helps to learn A in such a situation in the real life, but we are
> indeed losing information, and I do not have any problem with an extension that
> records in the index the fact that in the two (of the
> three) commits involved in the merge, the path was at A.

What you've described is true only for a certain class of rename conflicts,
for example the rename/edit conflict you've described above.

It's also true if you were to rename some item 'a' to 'b' in both branches.
But when 'b' is sufficiently dissimilar to become a rewrite, then I end up
with a rename of a->b on one side and deleting a and adding b on the other.
The result is a mysterious "added by us" conflict:

100644 e2dd530c9f31550a2b0c90773ccde056929d6d66 2       b

Worse yet is if I don't do the rename in my side, but I just add a new b so
that in theirs I've renamed a to b and in mine I have both a and b.  When I
do the merge, I'm told I have conflicts, except that I don't:

100644 08d4f831774aed5d4c6cb496affefd4020dce40c 0       b

The other branch's b is long gone and exists only as a dirty file in the
workdir.

> But people have been successfully using existing versions of Git without that
> information to merge branches with renames, and resolving the content-level
> conflicts.

But you aren't afforded the option to resolve content-level conflicts if you
don't know where the conflict came from.  For example, in a rename 1->2
conflict, we dutifully detect that a was renamed to both b and c and fail,
but that fact is never given to the index.  This conflict could be fed into
a merge tool or, better, automerged, with the user only needing to pick a
path:

100644 421c9102b8562ad227ba773ab1cf6bbed7b7496d 1       a
100644 421c9102b8562ad227ba773ab1cf6bbed7b7496d 3       b
100644 421c9102b8562ad227ba773ab1cf6bbed7b7496d 2       c

I hate to sound like a broken record here, but without some more data in the
index - anywhere, really - any tool that doesn't have the luxury of emitting
data about what happened to stdout certainly can't infer anything about what
happened in the merge.

-ed

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Rename conflicts in the index
  2013-03-27 18:53           ` Edward Thomson
@ 2013-03-27 19:44             ` Junio C Hamano
  2013-04-01 16:14               ` Edward Thomson
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2013-03-27 19:44 UTC (permalink / raw)
  To: Edward Thomson; +Cc: git@vger.kernel.org

Edward Thomson <ethomson@microsoft.com> writes:

> Junio C Hamano [mailto:gitster@pobox.com] wrote:
>> We do not gratuitously break existing implementations.  If no conflict is stored
>> as higher-stage index entries in an index that has your index extension, no
>> existing implementation can read a conflicted index written by your
>> implementation and have users resolve conflicts.
>
> I'm not suggesting that anybody stop writing >0 stage entries.

Ah, OK, then I misread your original message.  You said

> Having a single canonical location is preferable - if the index
> contains a CONF section (and the client supports it), it would use
> that.  Otherwise, the client would look at stage >0 entries.

which I read as "an index with this extension would not have higher
stage entries, an index without the extension records higher stage
entries".  As long as the format will be backward compatible to
allow existing users use existing tools to deal with cases the
existing tools can handle, then that is OK.  I didn't get that
impression which is where my "non starter" came from.

> What you've described is true only for a certain class of rename conflicts,
> for example the rename/edit conflict you've described above.

As you asked me to explain why it was a non starter, I only
illustrated with a "renamed trivially, with content level conflict"
example that shows why dropping higher-stage entries in the main
index would not be acceptable.  The previous message did not even
mean to cover any cases the *new* feature you have in mind is trying
to address.  Again, if it hurts existing users handling cases
existing tools used to handle, that makes it a non starter.

How new feature is designed, and extension is added to help that new
feature, is a different matter.  My original "That's a non starter"
message didn't even go that far.

In any case, the principle of "always record the state 'merge'
stopped to ask for help as higher stage entries to give existing
tools and users a chance to manually resolve, and augment with
optional extension to record additional information that might help,
but do not gratiutously waste bytes on redundant information" would
apply to other exotic cases you would want to tackle with the new
feature, I would think.

If one branch moves path A in the original to path B and the other
one moved it to path C, for example, we can record it in different
ways, even in the main index.

 * Path A may have only stage #1, while path B and C has only stage
   #2 and stage #3 (the user would have to notice these three
   correspond to each other, and resolve manually).

   You would want to annotate "B at stage #2 seems to have been at A
   in the original" (similarly for C#3) if you choose to do so.

 * You can choose to favor "our" choice, and have path B with three
   stages (if we guessed wrong and the user wants to move it to C,
   the user can resolve and then "git mv" the path).

   You would want to annotate "the other side wanted to have B at
   stage #3 at C" in that case.

 * Or you may want to have in the main index both B and C (but not
   A) with all three stages (the user would have to choose which
   one survives, but discarding the other side with "git rm" would
   be easy).

   You would want to annotate the origin of the stage #1 for path B
   and C (these were originally at A), stage #2 for B (the other
   branch wants to have it at C), stage #3 for C (we want to have it
   at B).

There may be other ways, and I do not offhand know what the current
merge-recursive implementation does, but both of the latter two
sound equally usable and reasonable ways, even without the
annotation.  And with your annotation that records different paths,
the conflict may become even easier to resolve.

I still do not need to duplicate <mode, SHA-1> in the extensions to
do the above, or do I?

If the original path A was removed and a new path B was added, with
contents that are modified from A beyond recognition, at the merge
time you wouldn't know where B it came from or where A went, so
annotating A at stage #1 to say "it went to B" is a nonsense.  If
you have algorithm to do so [*1*], you would be better off detecting
it as a rename.


[Footnote]

*1* Instead of a three-way merge that inspects only the endpoints,
    you might get a better rename trail if you looked at the
    histories of both branches.  It would be a lot more expensive
    than the simple three-way, but burning CPU cycles is better than
    burning human neurons.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: Rename conflicts in the index
  2013-03-27 19:44             ` Junio C Hamano
@ 2013-04-01 16:14               ` Edward Thomson
  2013-04-02 20:55                 ` Edward Thomson
  0 siblings, 1 reply; 16+ messages in thread
From: Edward Thomson @ 2013-04-01 16:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org

Junio C Hamano [mailto:gister@pobox.com] wrote:
> As long as
> the format will be backward compatible to allow existing users use existing tools
> to deal with cases the existing tools can handle, then that is OK.  I didn't get that
> impression which is where my "non starter" came from.

I see now.  Thank you for the clarification.  I apologize if I was not clear
about this; indeed, the duplication of data in my proposed extension was
specifically to avoid any compatibility problems amongst clients.

In particular, when we have a rename in ours, edit in theirs conflict, we
store the conflict at the new (ours) path.  If, for example, I rename a->b
in my branch and merge a branch that edits a:

mode hash 1 b
mode hash 2 b
mode hash 3 b

This prohibits us from storing anything else in the theirs side at that
path, so if I were to have added b in their branch in addition to modifying
b, I cannot record it.

I was assuming that any change to this behavior would be a breaking one,
which is where the new section came from.

>  * Path A may have only stage #1, while path B and C has only stage
>    #2 and stage #3 (the user would have to notice these three
>    correspond to each other, and resolve manually).
> 
>    You would want to annotate "B at stage #2 seems to have been at A
>    in the original" (similarly for C#3) if you choose to do so.

If we're going to make changes to the way conflicts are recorded in the
main index, then I would prefer this approach.  It is unambiguous and all
data about all sides are recorded, including the names that items had in
their respective branches.

I would think that this might be a burden on current tools, however.
Now if I rename a->b my just my branch, my conflict will be recorded as:

mode hash 1 a
mode hash 2 b
mode hash 3 a

And current git-status will not look at any rename annotations to know
how to report this.

However, maybe this is not as big a problem as I'm concerned it would be.

>  * You can choose to favor "our" choice, and have path B with three
>    stages (if we guessed wrong and the user wants to move it to C,
>    the user can resolve and then "git mv" the path).

I think this approach suffers from the drawback that the current approach
has, wherein this conflicts when they had path B, also, as noted above.

I think that if you were to put both B and C with all three stages, this
would be problematic for the same reason.

> *1* Instead of a three-way merge that inspects only the endpoints,
>     you might get a better rename trail if you looked at the
>     histories of both branches.  It would be a lot more expensive
>     than the simple three-way, but burning CPU cycles is better than
>     burning human neurons.

For the record, I like this approach very much.  It's not something that
libgit2 will be able to tackle in the near future; we're in a sort of
walk-before-you-can-run situation with merge at the moment, as you can
probably see.  But any improvement that avoids burning neurons is a
valuable one.

Thanks-

-ed

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: Rename conflicts in the index
  2013-04-01 16:14               ` Edward Thomson
@ 2013-04-02 20:55                 ` Edward Thomson
  2013-04-02 21:20                   ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Edward Thomson @ 2013-04-02 20:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org

Edward Thomson [ethomson@microsoft.com] wrote:
> Junio C Hamano [mailto:gister@pobox.com] wrote:
> >  * Path A may have only stage #1, while path B and C has only stage
> >    #2 and stage #3 (the user would have to notice these three
> >    correspond to each other, and resolve manually).
> >
> >    You would want to annotate "B at stage #2 seems to have been at A
> >    in the original" (similarly for C#3) if you choose to do so.
> 
> If we're going to make changes to the way conflicts are recorded in the main
> index, then I would prefer this approach.  It is unambiguous and all data about
> all sides are recorded, including the names that items had in their respective
> branches.

Junio, did you have additional thoughts on this?

What would you like from me to proceed?  If the aforementioned seems
reasonable, I can update Documentation/technical/index-format.txt and
we can iron out the details in that fashion?

Thanks-
-ed

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Rename conflicts in the index
  2013-04-02 20:55                 ` Edward Thomson
@ 2013-04-02 21:20                   ` Junio C Hamano
  2013-04-02 21:29                     ` Edward Thomson
       [not found]                     ` <A54CE3E330039942B33B670D971F857403A0F593@TK5EX14MBXC253.redmond.cor p.microsoft.com>
  0 siblings, 2 replies; 16+ messages in thread
From: Junio C Hamano @ 2013-04-02 21:20 UTC (permalink / raw)
  To: Edward Thomson; +Cc: git@vger.kernel.org

Edward Thomson <ethomson@microsoft.com> writes:

> Edward Thomson [ethomson@microsoft.com] wrote:
>> Junio C Hamano [mailto:gister@pobox.com] wrote:
>> >  * Path A may have only stage #1, while path B and C has only stage
>> >    #2 and stage #3 (the user would have to notice these three
>> >    correspond to each other, and resolve manually).
>> >
>> >    You would want to annotate "B at stage #2 seems to have been at A
>> >    in the original" (similarly for C#3) if you choose to do so.
>> 
>> If we're going to make changes to the way conflicts are recorded in the main
>> index, then I would prefer this approach.  It is unambiguous and all data about
>> all sides are recorded, including the names that items had in their respective
>> branches.
>
> Junio, did you have additional thoughts on this?

Not at this moment.

I think we have covered the principles (do not unnecessarily
duplicate information, do not break existing implementations
unnecessarily, etc.) already, and we know how we want to record "one
side renamed A to B, the other side renamed A to C" case, but I do
not think the discussion covered all cases yet.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: Rename conflicts in the index
  2013-04-02 21:20                   ` Junio C Hamano
@ 2013-04-02 21:29                     ` Edward Thomson
       [not found]                     ` <A54CE3E330039942B33B670D971F857403A0F593@TK5EX14MBXC253.redmond.cor p.microsoft.com>
  1 sibling, 0 replies; 16+ messages in thread
From: Edward Thomson @ 2013-04-02 21:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git@vger.kernel.org

Junio C Hamano [mailto:gitster@pobox.com] wrote:
> Edward Thomson <ethomson@microsoft.com> writes:
> > Junio, did you have additional thoughts on this?
> 
> Not at this moment.
> 
> I think we have covered the principles (do not unnecessarily duplicate
> information, do not break existing implementations unnecessarily, etc.) already,
> and we know how we want to record "one side renamed A to B, the other side
> renamed A to C" case, but I do not think the discussion covered all cases yet.

Sorry, I'm not sure what you're asking for - would you just like some more
examples of what this looks like with aforementioned exotic conflict types?
Or are you looking for something more strict - BNF format, for example?

-ed

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Rename conflicts in the index
       [not found]                     ` <A54CE3E330039942B33B670D971F857403A0F593@TK5EX14MBXC253.redmond.cor p.microsoft.com>
@ 2013-04-03  0:53                       ` Junio C Hamano
  2013-04-03  0:57                         ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2013-04-03  0:53 UTC (permalink / raw)
  To: Edward Thomson; +Cc: git@vger.kernel.org

Edward Thomson <ethomson@microsoft.com> writes:

> Junio C Hamano [mailto:gitster@pobox.com] wrote:
>> Edward Thomson <ethomson@microsoft.com> writes:
>> > Junio, did you have additional thoughts on this?
>> 
>> Not at this moment.
>> 
>> I think we have covered the principles (do not unnecessarily duplicate
>> information, do not break existing implementations unnecessarily, etc.) already,
>> and we know how we want to record "one side renamed A to B, the other side
>> renamed A to C" case, but I do not think the discussion covered all cases yet.
>
> Sorry, I'm not sure what you're asking for - would you just like some more
> examples of what this looks like with aforementioned exotic conflict types?
> Or are you looking for something more strict - BNF format, for example?

Ehh, I wasn't asking for anything ;-)

You asked if I had any additional thoughts, I answered there is
nothing at this moment based on what I saw so far.  It is not my
immediate itch to update the index with more rename information, but
it is yours, so I would imagine you would know what cases you would
want to improve the end user experience better than I do ;-).

If I were solving the issue, I would probably proceed like this:

 * Start from a rough sketch of what extra information I would want
   to store in the new index extension section.

 * Teach read-cache.c to read from the new extension and keep it in
   an in-core data structure, and read from the in-core data
   structure and seriealize it to write to the extension section.

 * Perhaps enhance "update-index" so that it can read textual
   representation of the contents of the new extension section, turn
   it into the in-core representation, so that it can write it out
   to the index file, as a debugging/development aid.

 * Teach read-cache.c to read from the new extension and keep it in
   an in-core data structure.

 * Teach wt-status.c to read from that in-core data structure and
   improve the presentation of the cases I care about using that
   information.  Use the "update-index" development aid to prepare
   various cases you care about.

    - If the kind of information that is stored in the new extension
      turns out to be insufficient, go back to the beginning and
      iterate.

    - If the use the in-core data structure here turns out to be
      awkward, go back one step and iterate.

    - As I cover one more case, I would add a test to the test suite
      so that we would know what cases are covered and what the
      expected end-user presentation should be.

 * Once the result of the above covers all the cases I care about,
   then update merge-recursive.c to prepare the in-core data
   structure to be written out as the extension section.

As I iterate, the rough sketch will hopefully cover all the cases I
care about and I'll be ready to write them down as an update to the
document somewhere in Documentation/technical/api-*.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Rename conflicts in the index
  2013-04-03  0:53                       ` Junio C Hamano
@ 2013-04-03  0:57                         ` Junio C Hamano
  0 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2013-04-03  0:57 UTC (permalink / raw)
  To: Edward Thomson; +Cc: git@vger.kernel.org

Junio C Hamano <gitster@pobox.com> writes:

> If I were solving the issue, I would probably proceed like this:
>
>  * Start from a rough sketch of what extra information I would want
>    to store in the new index extension section.
>
>  * Teach read-cache.c to read from the new extension and keep it in
>    an in-core data structure, and read from the in-core data
>    structure and seriealize it to write to the extension section.
>
>  * Perhaps enhance "update-index" so that it can read textual
>    representation of the contents of the new extension section, turn
>    it into the in-core representation, so that it can write it out
>    to the index file, as a debugging/development aid.
>
>  * Teach read-cache.c to read from the new extension and keep it in
>    an in-core data structure.

Sorry, this is a dup of the second one.  Please ignore.

Also, all "you" in this section should read "I" (because this is a
description of "If I were solving it").

>  * Teach wt-status.c to read from that in-core data structure and
>    improve the presentation of the cases I care about using that
>    information.  Use the "update-index" development aid to prepare
>    various cases you care about.
>
>     - If the kind of information that is stored in the new extension
>       turns out to be insufficient, go back to the beginning and
>       iterate.
>
>     - If the use the in-core data structure here turns out to be
>       awkward, go back one step and iterate.
>
>     - As I cover one more case, I would add a test to the test suite
>       so that we would know what cases are covered and what the
>       expected end-user presentation should be.
>
>  * Once the result of the above covers all the cases I care about,
>    then update merge-recursive.c to prepare the in-core data
>    structure to be written out as the extension section.
>
> As I iterate, the rough sketch will hopefully cover all the cases I
> care about and I'll be ready to write them down as an update to the
> document somewhere in Documentation/technical/api-*.
>
> Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2013-04-03  0:58 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-03-13 14:08 Rename conflicts in the index Edward Thomson
2013-03-13 18:05 ` Junio C Hamano
2013-03-13 20:44   ` Edward Thomson
2013-03-26 18:30   ` Edward Thomson
2013-03-26 19:24     ` Junio C Hamano
2013-03-26 20:40       ` Edward Thomson
2013-03-27 17:03       ` Edward Thomson
2013-03-27 17:34         ` Junio C Hamano
2013-03-27 18:53           ` Edward Thomson
2013-03-27 19:44             ` Junio C Hamano
2013-04-01 16:14               ` Edward Thomson
2013-04-02 20:55                 ` Edward Thomson
2013-04-02 21:20                   ` Junio C Hamano
2013-04-02 21:29                     ` Edward Thomson
     [not found]                     ` <A54CE3E330039942B33B670D971F857403A0F593@TK5EX14MBXC253.redmond.cor p.microsoft.com>
2013-04-03  0:53                       ` Junio C Hamano
2013-04-03  0:57                         ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).