git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 'git notes merge' implementation questions
@ 2010-04-21  7:57 Johan Herland
  2010-04-21 17:12 ` Jonathan Nieder
  2010-04-21 21:29 ` Junio C Hamano
  0 siblings, 2 replies; 10+ messages in thread
From: Johan Herland @ 2010-04-21  7:57 UTC (permalink / raw)
  To: git; +Cc: gitster, johan

Hi,

I'm planning the implementation of 'git notes merge', a new 'git notes'
subcommand that will (in addition to various options not yet determined)
take a <notes_ref> argument, identifying a notes tree to be merged into
the _current_ notes_ref (as specified with --ref, $GIT_NOTES_REF,
core.notesRef, or defaulted to "refs/notes/commits").

This subcommand will typically be used when pulling notes from remotes.
What follows, are my thoughts on how this command should work, and
implementation challenges/problems that I'm not yet sure how to solve:

For merges between notes refs with no common history the merge should be
a straightforward joining of trees. This also covers the "history-less"
notes refs, like Peff's notes-cache, and should work whether the notes
refs point to parentless commits, or point directly to tree objects (if
we want to support that). For merges between notes refs with common
history we want to do a (more or less regular) three-way merge.

In both cases (with or without common history), conflicts may ensue,
and these must of course be resolved in some way. Since the notes refs
have no accompanying worktree, we must find some other way to resolve
conflicts. See the discussion on "conflict resolvers" below for more
details.

I would like to adapt the current merge machinery to do most of the
work for us, but there are some non-trivial challenges in adapting it
to the needs of the 'git notes merge' command. AFAICS, there are two
major problems:


1. Handling different notes-tree fanouts in a merge scenario

   Notes trees adjust their fanout automatically based on the number
   of notes, and we will inevitably have to merge notes trees with
   _different_ fanouts. This must be handled in some fashion, so that
   conflicting notes for the same object - but located at different
   fanouts - are properly conflict-resolved.

   Possible solution: When building the index, add a callback (within
   'read-tree'?) that may manipulate each entry added to the index.
   Provide a callback that simply removes all directory separators,
   thus normalising all note object names to their 40-char SHA1. As a
   result, the merge happens in a large flat "directory" with no
   "subdirs". When converting the index back into a notes tree, we
   obviously need to auto-establish an appropriate fanout.

   Remaining problem: How do we still ignore/skip identical subtrees?

   Alternative solution: As a pre-merge step, require that the notes
   trees are forced/rewritten to a common fanout (rewriting the smaller
   notes tree into the fanout of the larger notes tree).


2. Merging without a worktree

2.1 Problem: 'git merge' operates on the current HEAD.

    Possible solution: Allow 'merge' to update a ref that is not HEAD
    (i.e. remove HEAD from all merge logic). Instead, allow users to
    explicitly specify which ref to update (and thus the first parent
    of the merge).

2.2 Problem: 'git merge' operates on the current index.

    Simple solution: Provide a separate index (using $GIT_INDEX_FILE)
    in which the merge can take place (possibly using the modified
    'read-tree' from above)

2.3 Problem: 'git merge' updates the worktree.

    Simple solution:  When 'merge' has prepared an index (containing
    unmerged entries), _don't_ start looking for a working tree.
    Instead, invoke conflict resolvers on the unmerged entries (see
    below).

2.4 Problem: Resolving conflicts without a worktree.

    Possible solution: Conflict resolvers:

    These are functions/programs that resolve unmerged entries in the
    index without requiring a worktree. There should be several
    built-in/bundled conflict resolvers, and it should be possible for
    the user to choose between them, and also to add _additional_
    (custom) conflict resolvers. The merge machinery should have a
    simple interface against the conflict resolvers, with 3 inputs
    (<base SHA1>, <ours SHA1>, <theirs SHA1>) and 1 output
    (<resulting SHA1>).
    The bundled conflict resolvers should at least include:

    - "ours" (similar to 'git merge-file --ours')
    - "theirs" (similar to 'git merge-file --theirs')
    - "union" (similar to 'git merge-file --union')

    In addition to the fully automatic resolvers, there may be room for
    additional conflict resolvers that require user interaction:

    - "edit" (manually resolve conflicts in an editor)
    - "mergetool" (manually resolve conflicts in the chosen mergetool)

    These resolvers would follow the same interface towards the merge
    machinery, and the resolvers are thus responsible for preparing
    whatever temporary file(s) and/or invoking whatever editors,
    mergetools or other user interactions that are needed.


As you may see from the above, I'm not very familiar with the merge
machinery and manipulation of the index (currently struggling to grok
merge.c...). I hope to re-use as much as possible of the current merge
logic (instead of re-implementing it), but I might be ignorant of
certain details that simplifies, or complicates, the implementation of
'git notes merge'.


Have fun! :)

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 'git notes merge' implementation questions
  2010-04-21  7:57 'git notes merge' implementation questions Johan Herland
@ 2010-04-21 17:12 ` Jonathan Nieder
  2010-04-21 23:55   ` Johan Herland
  2010-04-21 21:29 ` Junio C Hamano
  1 sibling, 1 reply; 10+ messages in thread
From: Jonathan Nieder @ 2010-04-21 17:12 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, gitster

Johan Herland wrote:

> 2. Merging without a worktree

Eh, I am not a fan.  I am thinking it might be better to use something
like contrib/workdir to make a temporary worktree with its own index
and HEAD in .git/tmp-merge-notes and let the conflict resolvers work
there.

Advantages:

 - easy to debug when something goes wrong
 - merge driver can take other unmerged entries into account
 - (if merging manually) the user is not at the mercy of the program.
   Instead of being forced to consider the conflicts in the order git
   wants, she can skip some and go back to them, look at how many
   there are before deciding to start work, resolve some, reboot to
   test a new kernel, resolve some more later, and visualize the
   result.
 - if the unmerged notes are very long, you might need a temporary
   file anyway
 - maybe some day a kind of rename detection could help cope with
   situations like propagation of notes after a rebase

Disadvantages:

 - setting up a new git dir takes some time
 - a checkout with all notes would be insanely huge.  So somehow one
   has to find an appropriate subset to check out.

>     Possible solution: Conflict resolvers:

I think you can do an entirely in-index merge with ‘git read-tree’ and
‘git merge-index’.  If you forbid the per-file merge driver to fail
then this sounds like exactly what you’re talking about.

In my opinion in the case of popping up an editor this is a cruel
thing to do, but in the other cases it’s a good place to start.

Hope that helps,
Jonathan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 'git notes merge' implementation questions
  2010-04-21  7:57 'git notes merge' implementation questions Johan Herland
  2010-04-21 17:12 ` Jonathan Nieder
@ 2010-04-21 21:29 ` Junio C Hamano
  2010-04-22  0:08   ` Johan Herland
  1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2010-04-21 21:29 UTC (permalink / raw)
  To: Johan Herland; +Cc: git

Johan Herland <johan@herland.net> writes:

> For merges between notes refs with no common history the merge should be
> a straightforward joining of trees. This also covers the "history-less"
> notes refs, like Peff's notes-cache, and should work whether the notes
> refs point to parentless commits, or point directly to tree objects (if
> we want to support that). For merges between notes refs with common
> history we want to do a (more or less regular) three-way merge.

Note that a "history-less" merge is just a special three-way merge
pretending as if their common ancestor is an empty tree.  There is no
point in special casing the former.

> In both cases (with or without common history), conflicts may ensue,
> and these must of course be resolved in some way. Since the notes refs
> have no accompanying worktree, we must find some other way to resolve
> conflicts.

I would say we may not even have to "resolve" the conflicts in the usual
sense of the word.

You can run "ls-tree" on three trees, removing '/' from the output to
obtain the list of objects that are annotated, and do a three-way merge at
the object level first.  For the ones that do have diverging changes on
both sides, you just run "git notes append" to add the data from the other
side and be done with it ;-).

That way you don't have to worry about how "git merge" merges things, how
it uses the index, nor how it uses the working tree, as you won't be using
anything from "git merge" at all.

That would be the first step.

The second step would be to designate a special directory that would exist
only during a conflicted "notes merge" in $GIT_DIR, just like MERGE_HEAD
serves as the signal we are in a conflicted merge.  When

    $ git notes merge <other>

is run, if (and only if) you need a manual merge resolution, you would
create this directory, which will have:

 - a temporary index that has the result of the tree level merge, but you
   will:

   (1) only register the entries that have conflicted; and 
   (2) flatten the fan-out structure.

 - files that have conflicted merge result, whose names are 40-byte object
   names that are annotated; and

 - something like MERGE_HEAD to keep track of the <other>, so that you can
   create a merge commit when concluding the merge.

You can chdir to the directory and use "git diff" and "git ls-files -u" to
inspect the conflicts, and run "git add <filename>" to mark a resolved
note.

You would need a separate command ("git notes commit" perhaps) to conclude
the merge.  At that point, you would iterate over this temporary index
(which only has conflicted notes), pulling out the list of <object name
being annotated, the annotation>, add these annotates to produce a new
notes tree, record that tree as a merge commit in the notes namespace, and
finally remove the notes merge working directory.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 'git notes merge' implementation questions
  2010-04-21 17:12 ` Jonathan Nieder
@ 2010-04-21 23:55   ` Johan Herland
  2010-04-22  0:26     ` Jonathan Nieder
  0 siblings, 1 reply; 10+ messages in thread
From: Johan Herland @ 2010-04-21 23:55 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, gitster

On Wednesday 21 April 2010, Jonathan Nieder wrote:
> Johan Herland wrote:
> > 2. Merging without a worktree
> 
> Eh, I am not a fan.  I am thinking it might be better to use something
> like contrib/workdir to make a temporary worktree with its own index
> and HEAD in .git/tmp-merge-notes and let the conflict resolvers work
> there.
> 
> Advantages:
> 
>  - easy to debug when something goes wrong
>  - merge driver can take other unmerged entries into account
>  - (if merging manually) the user is not at the mercy of the program.
>    Instead of being forced to consider the conflicts in the order git
>    wants, she can skip some and go back to them, look at how many
>    there are before deciding to start work, resolve some, reboot to
>    test a new kernel, resolve some more later, and visualize the
>    result.
>  - if the unmerged notes are very long, you might need a temporary
>    file anyway
>  - maybe some day a kind of rename detection could help cope with
>    situations like propagation of notes after a rebase
> 
> Disadvantages:
> 
>  - setting up a new git dir takes some time
>  - a checkout with all notes would be insanely huge.  So somehow one
>    has to find an appropriate subset to check out.
> 
> >     Possible solution: Conflict resolvers:
> I think you can do an entirely in-index merge with ‘git read-tree’ and
> ‘git merge-index’.  If you forbid the per-file merge driver to fail
> then this sounds like exactly what you’re talking about.
> 
> In my opinion in the case of popping up an editor this is a cruel
> thing to do, but in the other cases it’s a good place to start.

Thanks, I'm definitely coming around to the idea that having the user freely 
resolve conflicts in a temporary worktree (that preferably contains only the 
conflicting notes) is better than trying to resolve conflicts _without_ a 
worktree.


...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 'git notes merge' implementation questions
  2010-04-21 21:29 ` Junio C Hamano
@ 2010-04-22  0:08   ` Johan Herland
  2010-04-22  0:12     ` Junio C Hamano
  2010-04-22  0:19     ` Junio C Hamano
  0 siblings, 2 replies; 10+ messages in thread
From: Johan Herland @ 2010-04-22  0:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Wednesday 21 April 2010, Junio C Hamano wrote:
> Johan Herland <johan@herland.net> writes:
> > For merges between notes refs with no common history the merge should
> > be a straightforward joining of trees. This also covers the
> > "history-less" notes refs, like Peff's notes-cache, and should work
> > whether the notes refs point to parentless commits, or point directly
> > to tree objects (if we want to support that). For merges between notes
> > refs with common history we want to do a (more or less regular)
> > three-way merge.
> 
> Note that a "history-less" merge is just a special three-way merge
> pretending as if their common ancestor is an empty tree.  There is no
> point in special casing the former.

Agreed.

> > In both cases (with or without common history), conflicts may ensue,
> > and these must of course be resolved in some way. Since the notes refs
> > have no accompanying worktree, we must find some other way to resolve
> > conflicts.
> 
> I would say we may not even have to "resolve" the conflicts in the usual
> sense of the word.
> 
> You can run "ls-tree" on three trees,

The three trees you mention are (I assume) <base>, <ours> and <theirs>. What 
if there is more than one merge-base? How do we re-use the recursive 
strategy's "pre-merge" of merge-bases?

Next, we will need to be somewhat careful about using "ls-tree", to avoid 
needlessly unpacking subtrees that are identical between <ours> and 
<theirs>.

> removing '/' from the output to
> obtain the list of objects that are annotated, and do a three-way merge
> at the object level first.  For the ones that do have diverging changes
> on both sides, you just run "git notes append" to add the data from the
> other side and be done with it ;-).

I believe "git notes append" might not be suitable for all types of notes, 
and that users would in some cases want to choose other strategies for 
resolving conflicting notes. I'm currently planning to offer three fully 
automatic resolvers: "ours", "theirs" and "union" (corresponding to the 
_ignore, _overwrite and _concatenate combine_notes functions, respectively, 
in notes.h).

> That way you don't have to worry about how "git merge" merges things, how
> it uses the index, nor how it uses the working tree, as you won't be
> using anything from "git merge" at all.

Ok, I'm just worried that it'll force us to re-implement much of the three-
way merge logic that's already implemented in the merge machinery.

> That would be the first step.
> 
> The second step would be to designate a special directory that would
> exist only during a conflicted "notes merge" in $GIT_DIR, just like
> MERGE_HEAD serves as the signal we are in a conflicted merge.  When
> 
>     $ git notes merge <other>
> 
> is run, if (and only if) you need a manual merge resolution, you would
> create this directory, which will have:
> 
>  - a temporary index that has the result of the tree level merge, but you
>    will:
> 
>    (1) only register the entries that have conflicted; and
>    (2) flatten the fan-out structure.
> 
>  - files that have conflicted merge result, whose names are 40-byte
> object names that are annotated; and
> 
>  - something like MERGE_HEAD to keep track of the <other>, so that you
> can create a merge commit when concluding the merge.
> 
> You can chdir to the directory and use "git diff" and "git ls-files -u"
> to inspect the conflicts, and run "git add <filename>" to mark a
> resolved note.
> 
> You would need a separate command ("git notes commit" perhaps) to
> conclude the merge.  At that point, you would iterate over this
> temporary index (which only has conflicted notes), pulling out the list
> of <object name being annotated, the annotation>, add these annotates to
> produce a new notes tree, record that tree as a merge commit in the
> notes namespace, and finally remove the notes merge working directory.

I agree that this is probably a better way to resolve conflicts in notes 
(i.e. better than designing a way to semi-manually resolve conflicts without 
a working tree).

We're then left with a few fully automatic conflict resolvers ("ours", 
"theirs" and "union") which will always succeed (and therefore need no 
special directory), and a "manual" resolver which sets up the special 
directory as you describe above, and instructs the user to resolve the 
merge, followed by 'git notes commit' to conclude the merge.


Thanks for the feedback. (BTW, I'm travelling next week, so don't expect any 
immediate patches from me).


Have fun! :)

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 'git notes merge' implementation questions
  2010-04-22  0:08   ` Johan Herland
@ 2010-04-22  0:12     ` Junio C Hamano
  2010-04-22  8:34       ` Johan Herland
  2010-04-22  0:19     ` Junio C Hamano
  1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2010-04-22  0:12 UTC (permalink / raw)
  To: Johan Herland; +Cc: git

Johan Herland <johan@herland.net> writes:

> Ok, I'm just worried that it'll force us to re-implement much of the three-
> way merge logic that's already implemented in the merge machinery.

There is no way around it, as long as you have that variable fan-out in
the notes structure.  Changing and unstabilizing "merge" for dubious
benefit of code reuse is unacceptable, as the part that deals with
variable fan-out has no benefit to the regular "merge".

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 'git notes merge' implementation questions
  2010-04-22  0:08   ` Johan Herland
  2010-04-22  0:12     ` Junio C Hamano
@ 2010-04-22  0:19     ` Junio C Hamano
  2010-04-22  8:38       ` Johan Herland
  1 sibling, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2010-04-22  0:19 UTC (permalink / raw)
  To: Johan Herland; +Cc: git

Johan Herland <johan@herland.net> writes:

> Next, we will need to be somewhat careful about using "ls-tree", to avoid 
> needlessly unpacking subtrees that are identical between <ours> and 
> <theirs>.

My mentioning of "ls-tree" is only about what needs to be done at the
conceptual level.  In practice, assuming that notes trees have mostly the
same fan-out structure, you would run "diff-tree -r" of (base,ours) and
(base,theirs) pair _without_ anything fancy like rename detection, and
pick out pieces (one tree may have ab/cdx{36} while the other tree may
have abcd/x{36} that are notes about the same object---you treat this as
if it is a partial ls-tree output that pertains only to the different
parts, and make canonical "list of annotated objects" by removing '/'.

All of this is very specific to merging "notes" and normal "merge" does
not even want to know about it; I don't think you can avoid doing this
yourself without touching "merge" if you want to merge "notes" correctly.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 'git notes merge' implementation questions
  2010-04-21 23:55   ` Johan Herland
@ 2010-04-22  0:26     ` Jonathan Nieder
  0 siblings, 0 replies; 10+ messages in thread
From: Jonathan Nieder @ 2010-04-22  0:26 UTC (permalink / raw)
  To: Johan Herland; +Cc: git, gitster

Johan Herland wrote:

> Thanks, I'm definitely coming around to the idea that having the user freely 
> resolve conflicts in a temporary worktree (that preferably contains only the 
> conflicting notes) is better than trying to resolve conflicts _without_ a 
> worktree.

Junio did a good job of making both interfaces --- no-resolution and
resolution-in-a-worktree --- sound very appealing.

Thanks for the hard work,
Jonathan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 'git notes merge' implementation questions
  2010-04-22  0:12     ` Junio C Hamano
@ 2010-04-22  8:34       ` Johan Herland
  0 siblings, 0 replies; 10+ messages in thread
From: Johan Herland @ 2010-04-22  8:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thursday 22 April 2010, Junio C Hamano wrote:
> Johan Herland <johan@herland.net> writes:
> > Ok, I'm just worried that it'll force us to re-implement much of the
> > three- way merge logic that's already implemented in the merge
> > machinery.
> 
> There is no way around it, as long as you have that variable fan-out in
> the notes structure.  Changing and unstabilizing "merge" for dubious
> benefit of code reuse is unacceptable, as the part that deals with
> variable fan-out has no benefit to the regular "merge".

Understood. At some point I merely contemplated whether a generic 
"tree_filter" callback could be added somewhere in the merge machinery, and 
then whether flattening the fanout could be implemented as such a 
"tree_filter" (with an inverse "tree_filter" for reconstructing the fanout 
in the merge result). But it seems the costs outweigh the benefits, even 
more so as I don't yet see any other use case for such a "tree_filter".

In any case, you have already given me lots of other/better options on how 
to implement 'git notes merge'.


Thanks! :)

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 'git notes merge' implementation questions
  2010-04-22  0:19     ` Junio C Hamano
@ 2010-04-22  8:38       ` Johan Herland
  0 siblings, 0 replies; 10+ messages in thread
From: Johan Herland @ 2010-04-22  8:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thursday 22 April 2010, Junio C Hamano wrote:
> Johan Herland <johan@herland.net> writes:
> > Next, we will need to be somewhat careful about using "ls-tree", to
> > avoid needlessly unpacking subtrees that are identical between <ours>
> > and <theirs>.
> 
> My mentioning of "ls-tree" is only about what needs to be done at the
> conceptual level.  In practice, assuming that notes trees have mostly the
> same fan-out structure, you would run "diff-tree -r" of (base,ours) and
> (base,theirs) pair _without_ anything fancy like rename detection, and
> pick out pieces (one tree may have ab/cdx{36} while the other tree may
> have abcd/x{36} that are notes about the same object---you treat this as
> if it is a partial ls-tree output that pertains only to the different
> parts, and make canonical "list of annotated objects" by removing '/'.

Got it. Brilliant, and obvious, really, when I think about it...

> All of this is very specific to merging "notes" and normal "merge" does
> not even want to know about it; I don't think you can avoid doing this
> yourself without touching "merge" if you want to merge "notes" correctly.

Agreed. From your feedback, I now have a much clearer picture on how to 
proceed. Thanks! :)


Have fun!

...Johan

-- 
Johan Herland, <johan@herland.net>
www.herland.net

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2010-04-22  8:38 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-21  7:57 'git notes merge' implementation questions Johan Herland
2010-04-21 17:12 ` Jonathan Nieder
2010-04-21 23:55   ` Johan Herland
2010-04-22  0:26     ` Jonathan Nieder
2010-04-21 21:29 ` Junio C Hamano
2010-04-22  0:08   ` Johan Herland
2010-04-22  0:12     ` Junio C Hamano
2010-04-22  8:34       ` Johan Herland
2010-04-22  0:19     ` Junio C Hamano
2010-04-22  8:38       ` Johan Herland

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).