git.vger.kernel.org archive mirror
* How to merge by subtree while preserving history?
@ 2009-03-26 22:59 David Reitter
  2009-03-27  7:38 ` Miklos Vajna
  0 siblings, 1 reply; 4+ messages in thread
From: David Reitter @ 2009-03-26 22:59 UTC (permalink / raw)
  To: git

I have two separately developed projects (foo, bar) which I'd like to  
merge; the contents of foo should, initially, go in a subdirectory of  
bar.

I'm aware of two methods.  The first: moving (renaming) everything
within foo into foo-dir, and then just pulling foo into bar.

This works beautifully, except that the big rename causes havoc with
the files' histories, i.e. git-log now needs a "--follow" argument,
and "diff-tree" can't track changes when given the new file name.  No
good.
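
A minimal sketch of the effect described above, using a throwaway repository. The repository name, file name and commit messages are made up for illustration; only the git commands themselves are standard.

```shell
#!/bin/sh
# Sketch: a one-commit mass rename hides earlier history from a plain
# path-limited log.  All names below are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"
git init -q foo
cd foo

printf 'one\n' >a.txt
git add a.txt
git commit -q -m 'foo: add a.txt'
printf 'two\n' >>a.txt
git commit -q -am 'foo: extend a.txt'

# The big rename: move everything into foo-dir/ in a single commit.
mkdir foo-dir
git mv a.txt foo-dir/a.txt
git commit -q -m 'foo: move everything into foo-dir/'

# Plain path limiting stops at the rename commit ...
plain=$(git log --format=%h -- foo-dir/a.txt | grep -c .)
# ... while --follow keeps going across the rename.
followed=$(git log --follow --format=%h -- foo-dir/a.txt | grep -c .)
echo "without --follow: $plain commit(s); with --follow: $followed commit(s)"
```

The two counts differ because path limiting compares trees at the given path only, and the path did not exist before the rename commit.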

I've also tried the method described in [1], but it seems that all  
history is lost here (the text could point this out..)
I've tried to "git pull -s subtree foo master" directly as well, but  
then it put foo into strange places (and lost the history).


So, I'm at a loss.  Suggestions much appreciated.




[1] http://www.kernel.org/pub/software/scm/git/docs/howto/using-merge-subtree.html


* Re: How to merge by subtree while preserving history?
  2009-03-26 22:59 How to merge by subtree while preserving history? David Reitter
@ 2009-03-27  7:38 ` Miklos Vajna
  2009-03-27 16:56   ` David Reitter
  0 siblings, 1 reply; 4+ messages in thread
From: Miklos Vajna @ 2009-03-27  7:38 UTC (permalink / raw)
  To: David Reitter; +Cc: git


On Thu, Mar 26, 2009 at 06:59:51PM -0400, David Reitter <david.reitter@gmail.com> wrote:
> I have two separately developed projects (foo, bar) which I'd like to  
> merge; the contents of foo should, initially, go in a subdirectory of  
> bar.
> 
> I'm aware of two methods:  moving (renaming) everything within foo  
> into foo-dir, and then just pulling foo into bar.

The result of the two methods is the same.

> This works beautifully, except that the big rename causes havoc with
> the files' histories, i.e. git-log now needs a "--follow" argument,
> and "diff-tree" can't track changes when given the new file name.  No
> good.
> 
> I've also tried the method described in [1], but it seems that all  
> history is lost here (the text could point this out..)

Of course it is not lost. :)

Example:

commit f8c62880ef22b74ea6df47bb349ff0743d2a93f9
Merge: f474c52... 52b8ea9...
Author: Junio C Hamano <gitster@pobox.com>
Date:   Sun Mar 1 22:20:52 2009 -0800

    Merge git://git.kernel.org/pub/scm/gitk/gitk

Now do a 'git log f474c52..52b8ea9' and you'll see the merged commits.

But you are right that 'git log -- path' will find only the merge
commit (which is correct: the merged tree objects are not modified when
merging; the resulting tree just has the original tree in a
subdirectory).
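
A small sketch of both observations, on two toy repositories. The names foo, bar, foo-dir, a.txt and b.txt are illustrative assumptions, and `--allow-unrelated-histories` is an assumption about running a newer Git (2.9 or later) than the one current when this thread was written. The merge steps loosely follow the howto cited as [1].

```shell
#!/bin/sh
# Sketch: subtree-style merge, then compare a revision-range log with a
# path-limited log.  All repository and file names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Two unrelated toy projects.
git init -q foo
(cd foo &&
 printf 'one\n' >a.txt && git add a.txt && git commit -q -m 'foo: add a.txt' &&
 printf 'two\n' >>a.txt && git commit -q -am 'foo: extend a.txt')
git init -q bar
cd bar
printf 'bar\n' >b.txt
git add b.txt
git commit -q -m 'bar: add b.txt'

# Record foo as a second parent, then read its tree in under foo-dir/.
git fetch -q ../foo HEAD
git merge -q -s ours --no-commit --allow-unrelated-histories FETCH_HEAD
git read-tree --prefix=foo-dir/ -u FETCH_HEAD
git commit -q -m 'Merge foo as subdirectory foo-dir/'

# The merged commits are reachable from the second parent ...
merged=$(git rev-list --count 'HEAD^1..HEAD^2')
# ... but a path-limited walk only sees the merge commit itself.
by_path=$(git rev-list --count HEAD -- foo-dir/a.txt)
echo "foo commits reachable via the merge: $merged"
echo "commits found for foo-dir/a.txt:     $by_path"
```

The history is all there; it is only invisible to path limiting, because no commit on the foo side ever had a path starting with foo-dir/.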

If this is a one-time operation then I would just use git filter-branch
to move the code to a subdir.
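
A hedged sketch of that one-time approach, again on toy repositories with made-up names. The index filter is adapted from the example in the git-filter-branch manual page; the TAB variable, the `|` sed delimiter (chosen because foo-dir contains a hyphen), `FILTER_BRANCH_SQUELCH_WARNING` and `--allow-unrelated-histories` are assumptions about a reasonably recent Git and are not part of the original suggestion.

```shell
#!/bin/sh
# Sketch: rewrite foo so every commit already has its files under
# foo-dir/, then do an ordinary merge.  All names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer Git: skip the deprecation pause
TAB=$(printf '\t')
export TAB                               # used inside the index filter
work=$(mktemp -d)
cd "$work"

git init -q foo
cd foo
printf 'one\n' >a.txt
git add a.txt
git commit -q -m 'foo: add a.txt'
printf 'two\n' >>a.txt
git commit -q -am 'foo: extend a.txt'

# Prefix every index entry with foo-dir/ in every commit.
git filter-branch --index-filter \
    'git ls-files -s | sed "s|$TAB\"*|&foo-dir/|" |
         GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info &&
     mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"' HEAD >/dev/null
cd ..

git init -q bar
cd bar
printf 'bar\n' >b.txt
git add b.txt
git commit -q -m 'bar: add b.txt'

# A plain merge is enough now; no subtree strategy is needed.
git fetch -q ../foo HEAD
git merge -q --allow-unrelated-histories -m 'Merge rewritten foo' FETCH_HEAD

by_path=$(git rev-list --count HEAD -- foo-dir/a.txt)
echo "commits found for foo-dir/a.txt: $by_path"
```

The price is that every foo commit gets a new object name, so this only suits a history that nobody else has built on yet.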


* Re: How to merge by subtree while preserving history?
  2009-03-27  7:38 ` Miklos Vajna
@ 2009-03-27 16:56   ` David Reitter
  2009-03-27 17:20     ` Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: David Reitter @ 2009-03-27 16:56 UTC (permalink / raw)
  To: Miklos Vajna; +Cc: git

On Mar 27, 2009, at 3:38 AM, Miklos Vajna wrote:

> Now do a 'git log f474c52..52b8ea9' and you'll see the merged commits.

Sure :)
Needless to say, this is not practical and doesn't support people's  
workflow.

For simple renames, "git log --follow" helps, but as soon as you want
to diff one of the listed revisions, restricted to just this one file,
the history becomes invisible again.  Concretely, this breaks the
common Emacs workflow of C-x v l followed by "d".

I'm aware of the content-tracking vs. file-tracking discussion; it's  
all fine, except that file names are meaningful meta-data for some  
content, at least in some projects.  Is there a command that gives me  
the diff  for a revision pair, restricted to what happened to content  
in a given file in the current tree?


> But you are right about that 'git log -- path' will find the merge
> commits only (which is right, as the tree objects are not modified
> when merging, just the resulting tree has the original tree in a
> subdirectory).
>
> If this is a one-time operation then I would just use git
> filter-branch to move the code to a subdir.


For the record:

In the meantime, I managed to move the original files in the CVS
repository (by just moving all the ",v" files and getting rid of
CVSROOT/history, which doesn't seem to be needed).  Then I re-ran
cvsimport, working around a bunch of problems with "cvsps".  In
particular, cvsps / cvsimport could not handle the case where my
repository named "foo" had a subdirectory also called "foo", into
which I had moved all the stuff, so I had to rename the directory to
"bar".  I also had to get rid of cvsps's cache file, either with the
-x argument or by deleting it from the surprising location ~/.cvsps.

Then, I merged with "git pull", noting the rev ID before the merge.

Next, I used "git filter-branch" to rename the directory again from  
BAR to FOO as follows:

git filter-branch --index-filter \
         'git ls-files -s | sed "s-BAR/-FOO/-" |
                 GIT_INDEX_FILE=$GIT_INDEX_FILE.new \
                         git update-index --index-info &&
          mv $GIT_INDEX_FILE.new $GIT_INDEX_FILE' <last-rev-before-merge>..

Finally, I had to run "git gc" to prune 200 MB worth of objects (it
told me I had 500k objects overall).


--
http://aquamacs.org -- Aquamacs: Emacs on Mac OS X


* Re: How to merge by subtree while preserving history?
  2009-03-27 16:56   ` David Reitter
@ 2009-03-27 17:20     ` Junio C Hamano
  0 siblings, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2009-03-27 17:20 UTC (permalink / raw)
  To: David Reitter; +Cc: Miklos Vajna, git

David Reitter <david.reitter@gmail.com> writes:

> ...  Is there a command that gives me
> the diff  for a revision pair, restricted to what happened to content
> in a given file in the current tree?

You can get a half of it from blame (and I presume the other half by
running the procedure in reverse).

"git blame" has an obscure switch -S that lets you lie about the ancestry
by allowing you to install a graft (this is primarily used by the annotate
operation of git-cvsserver).

Suppose you have revisions A and B, and a lot of code in a file F in the
original revision A migrated to many other places in a later revision B
over time.  You want to see where each and every line in F from A ended up
in B.

To compute this, you pretend as if the history originates at B (i.e. B is
the root commit), and A is a direct descendant of it, and blame each and
every line of F in A, with a very aggressive setting.  E.g.

	{
		echo $(git rev-parse A) $(git rev-parse B)
		echo $(git rev-parse B)
	} >tmp-graft
	git blame -C -C -C -w -S tmp-graft A -- F

I'll leave it as an exercise to the readers how to compute "where in A
did each and every line of G in B come from?"
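
The graft trick can be tried on a toy history. This is only a sketch: the repository, the three-line file F and the commit messages are invented, A and B are shell variables standing in for the two revisions, and `--line-porcelain` is used here (an assumption about a newer Git) so that every blamed line carries its own header and can be counted with grep.

```shell
#!/bin/sh
# Sketch: count how many lines of F from the older commit A survive in
# the newer commit B, by pretending that B is a root commit and that A
# is its direct descendant.  All names below are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo

printf 'alpha\nbeta\ngamma\n' >F
git add F
git commit -q -m 'A: original F'
A=$(git rev-parse HEAD)

printf 'alpha\nBETA rewritten\ngamma\n' >F
git commit -q -am 'B: rewrite one line'
B=$(git rev-parse HEAD)

# Fake ancestry: the only parent of A is B, and B has no parents.
graft="$work/tmp-graft"
{
	echo "$A $B"
	echo "$B"
} >"$graft"

# Lines of F in A that blame hands over to B are the ones that also
# exist in B; the rest stay attributed to A.
surviving=$(git blame -C -C -C -w --line-porcelain -S "$graft" "$A" -- F |
	grep -c "^$B ")
total=$(git cat-file -p "$A:F" | grep -c '')
echo "$surviving of $total lines of F from A survive in B"
```

The graft file is read only by this one blame invocation, so the repository's real history is left untouched.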

Note that in order for this to work, it needs a fix to "blame -S" that I
posted about 10 days ago: aa9ea77 (blame: read custom grafts given by -S
before calling setup_revisions(), 2009-03-18); the fix is sitting in 'pu',
because as far as I know nobody other than me has cared about the
breakage, at least until now.

I've attached a script that uses this trick to compute "How much of what
Linus originally wrote still survives."  People who attended GitTogether'08
may have seen the result.

---

#!/bin/sh
# How much of the very original version from Linus survives?

_x40='[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]'
_x40="$_x40$_x40$_x40$_x40$_x40$_x40$_x40$_x40"

initial=$(git rev-parse --verify e83c5163316f89bfbde7d9ab23ca2e25604af290) &&
this=$(git rev-parse --verify ${1-HEAD}^0) || exit

tmp="/var/tmp/Linus.$$"
trap 'rm -f "$tmp".*' 0

# We blame each file in the initial revision pretending as if it is a
# direct descendant of the given version, and also pretend that the
# latter is a root commit.  This way, lines in the initial revision
# that survived to the other version can be identified (they will be
# attributed to the other version).
graft="$tmp.graft" &&
{
	echo "$initial $this"
	echo "$this"
} >"$graft" || exit

opts='-C -C -C -w'

git ls-tree -r "$initial" |
while read mode type sha1 name
do
	git blame $opts --porcelain -S "$graft" "$initial" -- "$name" |
	sed -ne "s/^\($_x40\) .*/\1/p" |
	sort |
	uniq -c | {
		# There are only two commits in the fake history, so
		# there will be at most two lines of output from the above.
		read cnt1 commit1
		read cnt2 commit2
		if test -z "$commit2"
		then
			cnt2=0
		fi
		if test "$initial" != "$commit1"
		then
			cnt_surviving=$cnt1
		else
			cnt_surviving=$cnt2
		fi
		cnt_total=$(( $cnt1 + $cnt2 ))
		echo "$cnt_surviving $cnt_total	$name"
	}
done | {
	total=0
	surviving=0
	while read s t n
	do
		total=$(( $total + $t )) surviving=$(( $surviving + $s ))
		printf "%6d / %-6d	%s\n" "$s" "$t" "$n"
	done
	printf "%6d / %-6d	%s\n" $surviving $total Total
}
