From: "Shawn O. Pearce" <spearce@spearce.org>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] A new merge stragety 'subtree'.
Date: Sat, 17 Feb 2007 03:45:58 -0500
Message-ID: <20070217084558.GE27864@spearce.org>
In-Reply-To: <7vfy95y2n9.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> wrote:
> The detection of corresponding subtree is done by comparing the
> pathnames and types in the toplevel of the tree.
> 
> Heuristics galore!  That's the git way ;-).

I have some concerns about the match-tree heuristic you are using here.

For example, it is very common for Java projects to have the same
tree "shape".  Just look at egit/jgit for an example; the three
top-level directories are:

	org.spearce.egit.core/
		META-INF/
		build.properties
		plugin.xml
		src/

	org.spearce.egit.ui/
		META-INF/
		build.properties
		plugin.xml
		src/

	org.spearce.jgit/
		META-INF/
		src/

If I were to treat the first two as subprojects, this new subtree
merge strategy might fail here, as it could easily match the
wrong directory.
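
To make the concern concrete, here is a toy sketch that scores each
candidate directory by how many top-level entry names it shares with
an incoming tree.  This is not the scoring used by the patch; the
score helper and the name-only comparison are assumptions made purely
for illustration.

```shell
# Toy scoring (hypothetical helper): count the entry names that two
# whitespace-separated name lists have in common.
score() {
    n=0
    for want in $1; do
        for have in $2; do
            if [ "$want" = "$have" ]; then
                n=$((n + 1))
            fi
        done
    done
    echo "$n"
}

# Top-level entries of an incoming subproject tree shaped like egit.core.
incoming="META-INF build.properties plugin.xml src"

# Top-level entries of the three directories listed above.
core="META-INF build.properties plugin.xml src"
ui="META-INF build.properties plugin.xml src"
jgit="META-INF src"

echo "core: $(score "$incoming" "$core")"
echo "ui:   $(score "$incoming" "$ui")"
echo "jgit: $(score "$incoming" "$jgit")"
```

With names alone, core and ui tie, so a shape-based match has nothing
to choose between them.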


What about a different approach?

In a merge of commit#1 (parent project) and commit#2 (subproject)...

We have the set of merge bases readily available.  We just have
to find out in each merge base where the files went from commit#2,
then modify commit#2 to conform to that same shape.

Really that isn't too different from a rename detection.  In other
words do something like the following:

  a) Scan the parents of the merge base B for a commit that is
  in commit#2's ancestry but not in commit#1's ancestry, except via
  the merge commit B.  Such a parent must be from the project that
  commit#2 is also from.  For the sake of explanation, let's call
  this parent B^2.

  b) Perform a partial rename-diff between B^2 and B.  The magic
  here is we need to discard any path in B that also appears in
  B^1 and B^2, and that has the same SHA-1 as in B^1, before we do
  the rename-diff.

  c) Find the most common prefix within the renamed files.

  d) Fit commit#2 to use that prefix, and merge.
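
Step (b) above could be sketched roughly as follows.  This is only a
sketch under a few assumptions: the function name is made up, it works
on top-level `git ls-tree` listings saved to files (not recursive
ones), and it assumes paths contain no tabs or newlines.

```shell
# drop_unchanged_from_first_parent B.lst B1.lst B2.lst   (hypothetical helper)
#
# Each file holds `git ls-tree <tree>` output, one entry per line:
#     <mode> <type> <sha>TAB<path>
# Prints the entries of B, minus every path that also appears in both
# B^1 and B^2 and carries the same SHA-1 as in B^1.
drop_unchanged_from_first_parent() {
    awk -F '\t' '
        # First file: B^1.  Remember path -> sha.
        FILENAME == ARGV[1] { split($1, f, " "); p1_sha[$2] = f[3]; next }
        # Second file: B^2.  Remember which paths exist.
        FILENAME == ARGV[2] { in_p2[$2] = 1; next }
        # Third file: B.  Keep the entry unless step (b) discards it.
        {
            split($1, f, " ")
            if (($2 in p1_sha) && ($2 in in_p2) && p1_sha[$2] == f[3])
                next
            print
        }
    ' "$2" "$3" "$1"
}

# Possible use, with $B naming the merge base commit:
#   git ls-tree "$B"   >B.lst
#   git ls-tree "$B^1" >B1.lst
#   git ls-tree "$B^2" >B2.lst
#   C=$(drop_unchanged_from_first_parent B.lst B1.lst B2.lst | git mktree)
```

The result is in the format `git mktree` reads, so the filtered tree
can be handed straight to the rename-diff.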


Here's a real example.  In 67c75759 you merged git-gui.git.
67c75759^1 is from git.git, 67c75759^2 is from git-gui.git.

The stock rename-diff:

  $ git diff-tree --abbrev -r -M --diff-filter=MRD 67c75759^2 67c75759
  :100644 100644 c714d38... d99372a... M  .gitignore
  :100755 100755 8fac8cb... 7a10b60... M  GIT-VERSION-GEN
  :100644 100644 fd82d9d... 5d31e6d... M  Makefile
  :100644 100644 b95a137... b95a137... R100       TODO    git-gui/TODO
  :100755 100755 f5010dd... f5010dd... R100       git-gui.sh      git-gui/git-gui.sh

The problem here is that both ^1 and ^2 define the first three paths,
so we think we modified them in the merge rather than moved them.
But these three files match ^1, as we did not do an evil merge here.
That's why they show up as modified in this diff.

Now take 67c75759 and whack those three files (step b above), and rediff:

  $ C=$(git ls-tree 67c75759 | sed '
          /[[:space:]]\.gitignore$/d
          /[[:space:]]GIT-VERSION-GEN$/d
          /[[:space:]]Makefile$/d' | git mktree)
  $ git diff-tree --abbrev -r -M --diff-filter=MRD 67c75759^2 $C
  :100644 100644 c714d38... c714d38... R100       .gitignore      git-gui/.gitignore
  :100755 100755 8fac8cb... 8fac8cb... R100       GIT-VERSION-GEN git-gui/GIT-VERSION-GEN
  :100644 100644 fd82d9d... fd82d9d... R100       Makefile        git-gui/Makefile
  :100644 100644 b95a137... b95a137... R100       TODO    git-gui/TODO
  :100755 100755 f5010dd... f5010dd... R100       git-gui.sh      git-gui/git-gui.sh

Wow, look at that, everything starts with 'git-gui/'!  ;-)

Then we just need to pick the most popular common prefix of all
renamed paths and fit commit#2 to conform to that structure.
Finally we can run the merge through.
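
Picking the most popular prefix could look something like the sketch
below.  The function name is hypothetical, and it assumes the real
tab-separated raw output of `git diff-tree -r -M` on stdin with paths
that need no quoting.  Ties are broken arbitrarily.

```shell
# most_common_added_prefix   (hypothetical helper)
#
# Reads `git diff-tree -r -M` raw output on stdin and prints the prefix
# that renames most often prepended to a path (old -> <prefix>old).
most_common_added_prefix() {
    awk -F '\t' '
        # Rename records end their first field with R<score>.
        $1 ~ / R[0-9]+$/ {
            oldp = $2; newp = $3
            cut = length(newp) - length(oldp)
            # Only count renames where the new name is prefix + old name.
            if (cut > 0 && substr(newp, cut + 1) == oldp)
                count[substr(newp, 1, cut)]++
        }
        END {
            best = ""; max = 0
            for (p in count)
                if (count[p] > max) { max = count[p]; best = p }
            if (max > 0) print best
        }
    '
}

# Possible use:
#   git diff-tree -r -M --diff-filter=R 67c75759^2 "$C" | most_common_added_prefix
```

Renames that do more than add a leading directory are ignored here,
which keeps an unrelated rename inside the subproject from skewing
the vote.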

The (now functional) pretend object stuff can be useful here,
such as to make $C above so we can pass it off to diffcore.


I think popping off the 'git-gui/' prefix would be the same deal,
only we'd be looking at the old names to determine the prefix to pop,
rather than the new names.
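
For the popping direction, a sketch would vote on the old names
instead.  As before, the function name is made up and the input is
assumed to be tab-separated `git diff-tree -r -M` raw output with
unquoted paths.

```shell
# most_common_removed_prefix   (hypothetical helper)
#
# Reads `git diff-tree -r -M` raw output on stdin and prints the prefix
# that renames most often stripped from a path (<prefix>new -> new).
most_common_removed_prefix() {
    awk -F '\t' '
        # Rename records end their first field with R<score>.
        $1 ~ / R[0-9]+$/ {
            oldp = $2; newp = $3
            cut = length(oldp) - length(newp)
            # Only count renames where the old name is prefix + new name.
            if (cut > 0 && substr(oldp, cut + 1) == newp)
                count[substr(oldp, 1, cut)]++
        }
        END {
            best = ""; max = 0
            for (p in count)
                if (count[p] > max) { max = count[p]; best = p }
            if (max > 0) print best
        }
    '
}
```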

We already do rename detection in merge-recursive.  Slapping an extra
rename pass in front of things when it is invoked as merge-subtree
can't hurt performance that much.

Thoughts?

-- 
Shawn.
