git.vger.kernel.org archive mirror
* blame follows renames, but log doesn't
@ 2007-06-19  1:10 Martin Langhoff
  2007-06-19  1:34 ` Sam Vilain
  2007-06-19  7:19 ` Theodore Tso
  0 siblings, 2 replies; 10+ messages in thread
From: Martin Langhoff @ 2007-06-19  1:10 UTC
  To: Git Mailing List

Hi all,

when I show git to newbies or demo it to people using other SCMs, and
we get to the rename part of the conversation, I discuss and show how
GIT's approach is significantly better than explicit recording of
renames.

One great example is git-blame -- actually more spectacular with the
recent git gui blame improvements. But git-log still doesn't do it.

If I say
   git blame git-cvsimport.perl  # goes to the true origin like a champ
   git log git-cvsimport.perl # stops at the Big Tool Rename
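
To make the difference concrete, here is a toy Python model of the two
walks. It is only a sketch under stated assumptions, not git code: the
commit ids, the HISTORY layout, the old file name and the log_path
helper are all made up for illustration.

```python
# Toy model: history is newest-first. Each commit lists the paths it touched
# and the renames it performed as {new_path: old_path}. All names are made up.
HISTORY = [
    ("c4", {"git-cvsimport.perl"}, {}),
    ("c3", {"git-cvsimport.perl"}, {"git-cvsimport.perl": "old-cvsimport-name"}),
    ("c2", {"old-cvsimport-name"}, {}),
    ("c1", {"old-cvsimport-name"}, {}),
]


def log_path(history, path, follow=False):
    """Return ids of commits touching `path`, optionally following renames."""
    seen = []
    for commit_id, touched, renames in history:
        if path in touched:
            seen.append(commit_id)
            if follow and path in renames:
                # Continue the walk under the name the file had before.
                path = renames[path]
    return seen


# A path-limited log stops at the rename commit.
print(log_path(HISTORY, "git-cvsimport.perl"))
# A blame-like walk keeps going under the old name.
print(log_path(HISTORY, "git-cvsimport.perl", follow=True))
```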

In a thread in May, Linus posted a PoC patch to get git-blame to do it
http://marc.info/?l=git&m=117347893211567&w=2 , and outlined the
reasons why it'd be wrong to try to do that in git-log -- but it
didn't come to happen :-/

cg-log used to have some Perl logic that could do this -- it didn't
always work, but I'm sometimes tempted to go back to it, and review
it.

Linus said:
> But it's an example of the fact that yes, git can do this, but we're so
> stupid that we don't really accept it.

And I'm sure people can cope with git blame --log path/to/file and we
can add a note to the git-log manpage about renames being reported by
blame instead.

And I kind of hate having to reply to things like these

    http://www.markshuttleworth.com/archives/125


cheers


martin

* Re: blame follows renames, but log doesn't
  2007-06-19  1:10 blame follows renames, but log doesn't Martin Langhoff
@ 2007-06-19  1:34 ` Sam Vilain
  2007-06-19  7:19 ` Theodore Tso
  1 sibling, 0 replies; 10+ messages in thread
From: Sam Vilain @ 2007-06-19  1:34 UTC
  To: Martin Langhoff; +Cc: Git Mailing List

Martin Langhoff wrote:
> And I kind of hate having to reply to things like these
> 
>     http://www.markshuttleworth.com/archives/125

I think that there should be clear conventions for how to place such
breadcrumbs in the commit log, that can be suitably ignored or honoured.

At least these two things fit into this category:

  1. renaming.  A comment on a changelog entry saying "I moved this file
     from A to B in this commit".  With all of the user friendliness and
     limitations this implies (oh, you got the information wrong or
     didn't put it in?  oh well, now history is b0rked forever, HAND)

  2. cherry picking.  bzr uses patch UUIDs, with all of the user
     friendliness and limitations this implies (oh, you merged that
     patch and accidentally didn't pick any changes?  whoops, it's
     in your history anyway so never try to merge that again).

Perhaps also there should be other conventions for how to encode other
strange data out of the namespace of the filesystem ("in a different
dimension", perhaps) like "file attributes".

Sam.

* Re: blame follows renames, but log doesn't
  2007-06-19  1:10 blame follows renames, but log doesn't Martin Langhoff
  2007-06-19  1:34 ` Sam Vilain
@ 2007-06-19  7:19 ` Theodore Tso
  2007-06-19  8:31   ` Martin Langhoff
                     ` (2 more replies)
  1 sibling, 3 replies; 10+ messages in thread
From: Theodore Tso @ 2007-06-19  7:19 UTC
  To: Martin Langhoff; +Cc: Git Mailing List

On Tue, Jun 19, 2007 at 01:10:28PM +1200, Martin Langhoff wrote:
> 
> when I show git to newbies or demo it to people using other SCMs, and
> we get to the rename part of the conversation, I discuss and show how
> GIT's approach is significantly better than explicit recording of
> renames.
> 
> One great example is git-blame -- actually more spectacular with the
> recent git gui blame improvements. But git-log still doesn't do it.

Actually, the bigger missing gap is merges.  Suppose in the
development branch, you rename a whole bunch of files.  (For example,
foo_super.c got moved to foo/super.c, foo_inode.c got moved to
foo/inode.c, etc.)

Now suppose there are fixes made in the stable branch, in the original
foo_super.c and foo_inode.c files.  Ideally you would want to be able
to pull those changes into the development branch, where the files
have new names, and have the changes be applied to foo/super.c and
foo/inode.c in the development branch.

I was recently talking to someone who is still using BitKeeper, and he
cited this scenario as one of the reasons why his project is still
using BK; he'd like to move to git, but this is a critical piece of
functionality for him. 

						- Ted

* Re: blame follows renames, but log doesn't
  2007-06-19  7:19 ` Theodore Tso
@ 2007-06-19  8:31   ` Martin Langhoff
  2007-06-19  8:39   ` Junio C Hamano
  2007-06-19  9:54   ` Steven Grimm
  2 siblings, 0 replies; 10+ messages in thread
From: Martin Langhoff @ 2007-06-19  8:31 UTC
  To: Theodore Tso; +Cc: Git Mailing List

On 6/19/07, Theodore Tso <tytso@mit.edu> wrote:
> Actually, the bigger missing gap is merges.  Suppose in the
> development branch, you rename a whole bunch of files.  (For example,
> foo_super.c got moved to foo/super.c, foo_inode.c got moved to
> foo/inode.c, etc.)

I thought that the "recursive" strategy covered this - though I don't
work on a tree that merges across branches with renames, so my
experience is _very_ limited.

From Documentation/merge-strategies.txt:

  Additionally this can detect and handle merges involving
  renames.  This is the default merge strategy when
  pulling or merging one branch.

cheers


m

* Re: blame follows renames, but log doesn't
  2007-06-19  7:19 ` Theodore Tso
  2007-06-19  8:31   ` Martin Langhoff
@ 2007-06-19  8:39   ` Junio C Hamano
  2007-06-19  9:54   ` Steven Grimm
  2 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2007-06-19  8:39 UTC
  To: Theodore Tso; +Cc: Martin Langhoff, Git Mailing List

Theodore Tso <tytso@mit.edu> writes:

> Actually, the bigger missing gap is merges.  Suppose in the
> development branch, you rename a whole bunch of files.  (For example,
> foo_super.c got moved to foo/super.c, foo_inode.c got moved to
> foo/inode.c, etc.)
>
> Now suppose there are fixes made in the stable branch, in the original
> foo_super.c and foo_inode.c files.  Ideally you would want to be able
> to pull those changes into the development branch, where the files
> have new names, and have the changes be applied to foo/super.c and
> foo/inode.c in the development branch.

That happens already with merge-recursive code, which has been
the default since late November 2005 (v0.99.9k and later should
have it).

What does _not_ happen is if foo_fixes.c was _created_ in the
stable branch.  A merge that tries to forward port such a fix
would not move the foo_fixes.c to foo/fixes.c.
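
The distinction can be sketched at the path level in Python. This is
an illustration only, assuming the renames on the development side are
already known as a plain mapping; forward_port_targets and its
argument shapes are hypothetical and are not how merge-recursive is
written.

```python
def forward_port_targets(stable_changes, dev_renames):
    """Map each path touched on stable to where the change lands on dev.

    stable_changes: {path: "modified" | "added"} relative to the merge base.
    dev_renames:    {old_path: new_path} detected between the base and dev.
    """
    targets = {}
    for path, kind in stable_changes.items():
        if kind == "modified" and path in dev_renames:
            # A change to an existing file follows that file's rename.
            targets[path] = dev_renames[path]
        else:
            # A newly created file has no rename to follow, so it stays put.
            targets[path] = path
    return targets


dev_renames = {"foo_super.c": "foo/super.c", "foo_inode.c": "foo/inode.c"}
stable_changes = {"foo_super.c": "modified", "foo_fixes.c": "added"}
print(forward_port_targets(stable_changes, dev_renames))
```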

* Re: blame follows renames, but log doesn't
  2007-06-19  7:19 ` Theodore Tso
  2007-06-19  8:31   ` Martin Langhoff
  2007-06-19  8:39   ` Junio C Hamano
@ 2007-06-19  9:54   ` Steven Grimm
  2007-06-19 18:28     ` Directory renames (was Re: blame follows renames, but log doesn't) Steven Grimm
  2007-06-20 22:11     ` blame follows renames, but log doesn't Jakub Narebski
  2 siblings, 2 replies; 10+ messages in thread
From: Steven Grimm @ 2007-06-19  9:54 UTC
  To: Theodore Tso; +Cc: Martin Langhoff, Git Mailing List

Theodore Tso wrote:
> Actually, the bigger missing gap is merges.  Suppose in the
> development branch, you rename a whole bunch of files.  (For example,
> foo_super.c got moved to foo/super.c, foo_inode.c got moved to
> foo/inode.c, etc.)
>
> Now suppose there are fixes made in the stable branch, in the original
> foo_super.c and foo_inode.c files.  Ideally you would want to be able
> to pull those changes into the development branch, where the files
> have new names, and have the changes be applied to foo/super.c and
> foo/inode.c in the development branch.
>   

I believe git handles this case already, actually. I've seen this work 
just fine many times.

What git doesn't handle, but BitKeeper does, is applying directory 
renames to newly created files. I rename the "lib" directory to "util", 
you create a new file lib/strings.c and update lib/Makefile to compile 
it. I pull from you. Under BitKeeper, I will get util/strings.c and the 
change will be applied to my util/Makefile. git will create a brand-new 
"lib" directory containing nothing but the new file, but since the 
Makefile existed before, it will (correctly) apply your change to my 
util/Makefile, which will then break my build because it will refer to a 
file that doesn't exist in the Makefile's directory.

This has bitten me a few times in real life, e.g. in cases where I'm 
importing a third-party source tarfile and reorganizing it a little to 
fit it into my local build system. Every time they add a new source 
file, I have to go manually clean up after it rather than just merging 
the vendor branch into mine like I can do when they don't add anything. 
It is not frequent enough to be a major hassle for me but it sure is 
annoying when it happens (especially since sometimes the build *doesn't* 
break and it takes a while to notice a newly created file isn't where it 
should be.)

-Steve

* Directory renames (was Re: blame follows renames, but log doesn't)
  2007-06-19  9:54   ` Steven Grimm
@ 2007-06-19 18:28     ` Steven Grimm
  2007-06-20 20:18       ` Sam Vilain
  2007-06-20 22:11     ` blame follows renames, but log doesn't Jakub Narebski
  1 sibling, 1 reply; 10+ messages in thread
From: Steven Grimm @ 2007-06-19 18:28 UTC
  To: Theodore Tso; +Cc: Martin Langhoff, Git Mailing List

Directory renames can break in more cases than just adding new files. 
Here's a demonstration based on a situation I ran into a few minutes ago 
on a real-world project.

$ git init
$ mkdir a
$ echo some contents > a/file1
$ echo other contents > a/file2
$ git add a
$ git commit -m "initial commit"

So far so good. Now there's a revision where the files happen to be 
identical. (On a new branch for later convenience.)

$ git checkout -b modifybranch
$ echo other contents > a/file1
$ git commit -a -m "commit that makes both files identical"

Now we rename the directory. (Again, the new branch will be used later.)

$ git checkout -b renamebranch
$ git mv a b
$ git commit -m "rename directory"

Make a change to file2.

$ echo more contents from renamebranch >> b/file2
$ git commit -a -m "add contents to file2"

Where did file2's contents come from?

$ git blame b/file2
a7dbcfdc a/file1 (Steven Grimm 2007-06-19 10:36:20 -0700 1) other contents
69a87194 b/file2 (Steven Grimm 2007-06-19 10:43:29 -0700 2) more contents

Which is wrong. The history of file2 has nothing to do with the history 
of file1, but git blame thinks it does. However, that's not what blew up 
on me in my real-world test; it gets better. Let's say one of the files 
changed in a different branch.

$ git checkout master
$ echo a change to file2 in master >> a/file2
$ git commit -a -m "file2 changed"

$ git merge renamebranch
Removed a/file1
Merge made by recursive.
 a/file1            |    1 -
 a/file2 => b/file1 |    0
 b/file2            |    2 ++
 3 files changed, 2 insertions(+), 1 deletions(-)
 delete mode 100644 a/file1
 rename a/file2 => b/file1 (100%)
 create mode 100644 b/file2

And this is just completely broken. What it *should* do is give me 
b/file1 with "other contents" from renamebranch, and b/file2 with a 
merge conflict since I added a different line to it in each branch. 
Instead, the merge succeeds with no conflict and applies the change in 
my current branch to the wrong file:

$ cat b/file1
other contents
a change to file2 in master
$ cat b/file2
other contents
more contents from renamebranch
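
One plausible reading of the pairing in the merge output above is that
renames are matched on content alone, with exact matches taken first.
The Python sketch below models that on the merge base and the tip of
renamebranch. It is a rough model, not git's implementation: the
detect_renames helper and its 0.5 threshold are assumptions, and
difflib's ratio merely stands in for git's real similarity estimate.

```python
from difflib import SequenceMatcher


def detect_renames(base, tip, threshold=0.5):
    """Pair paths deleted since `base` with paths added in `tip` by content only.

    base, tip: {path: file contents}. Returns (renames, deleted, added).
    """
    deleted = {p: c for p, c in base.items() if p not in tip}
    added = {p: c for p, c in tip.items() if p not in base}
    renames = {}

    # Pass 1: exact content matches win, regardless of file name.
    for new, content in sorted(added.items()):
        for old in sorted(deleted):
            if old not in renames and deleted[old] == content:
                renames[old] = new
                break

    # Pass 2: leftover files are paired only if they are similar enough.
    for new, content in sorted(added.items()):
        if new in renames.values():
            continue
        candidates = [(SequenceMatcher(None, deleted[old], content).ratio(), old)
                      for old in sorted(deleted) if old not in renames]
        if candidates:
            score, old = max(candidates)
            if score >= threshold:
                renames[old] = new

    unpaired_old = sorted(p for p in deleted if p not in renames)
    unpaired_new = sorted(p for p in added if p not in renames.values())
    return renames, unpaired_old, unpaired_new


# Merge base vs. the tip of renamebranch, as in the demonstration above.
base = {"a/file1": "some contents\n", "a/file2": "other contents\n"}
renamebranch = {
    "b/file1": "other contents\n",
    "b/file2": "other contents\nmore contents from renamebranch\n",
}
print(detect_renames(base, renamebranch))
```

In this model a/file2 is claimed by b/file1 because their contents are
identical, which leaves a/file1 looking deleted and b/file2 looking
brand new. File names never enter the decision.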

There's one more variation on this theme that's broken in a similar but 
not identical way (this didn't happen in my real-world scenario but I 
ran into it while coming up with the above test case):

$ git checkout modifybranch
$ echo a change to file2 in modifybranch >> a/file2
$ git commit -a -m "a change in modifybranch"
$ git merge renamebranch
CONFLICT (delete/modify): a/file2 deleted in renamebranch and modified 
in HEAD. Version HEAD of a/file2 left in tree.
Automatic merge failed; fix conflicts and then commit the result.

This should definitely be a merge conflict, but it shouldn't be *this* 
merge conflict. What I'd expect here would be b/file1 == "other 
contents" and b/file2 with conflict markers.

$ ls a b
a:
file2

b:
file1   file2
$ cat a/file2
other contents
a change to file2 in modifybranch
$ cat b/file1
other contents
$ cat b/file2
other contents
more contents from renamebranch

In other words, no conflict markers at all, and I still have the old 
directory "a" with one of its two files in original form, but not the 
other file.

Hope that's illuminating or at least interesting to someone.

-Steve

* Re: Directory renames (was Re: blame follows renames, but log doesn't)
  2007-06-19 18:28     ` Directory renames (was Re: blame follows renames, but log doesn't) Steven Grimm
@ 2007-06-20 20:18       ` Sam Vilain
  2007-06-20 20:59         ` Steven Grimm
  0 siblings, 1 reply; 10+ messages in thread
From: Sam Vilain @ 2007-06-20 20:18 UTC
  To: Steven Grimm, Git Mailing List

Steven Grimm wrote:
> Hope that's illuminating or at least interesting to someone.

I didn't review your test cases in detail, but they seemed to suffer
from what I call "over-trivialization"; the heuristic methods don't work
very well for these non-real-world test cases because they're not long
enough.  Are you confident that these deficiencies are still there with
longer examples?

Sam.

* Re: Directory renames (was Re: blame follows renames, but log doesn't)
  2007-06-20 20:18       ` Sam Vilain
@ 2007-06-20 20:59         ` Steven Grimm
  0 siblings, 0 replies; 10+ messages in thread
From: Steven Grimm @ 2007-06-20 20:59 UTC
  To: Sam Vilain; +Cc: Git Mailing List

Sam Vilain wrote:
> I didn't review your test cases in detail, but they seemed to suffer
> from what I call "over-trivialization"; the heuristic methods don't work
> very well for these non-real-world test cases because they're not long
> enough.  Are you confident that these deficiencies are still there with
> longer examples?
>   

Those test cases were a demonstration of something I actually ran into 
on a real-world project yesterday. The test cases are trivial and short 
simply to make them easy to follow in an email message. If you 
substitute longer contents for the test files in my example, you will 
see the exact same behavior. The real file in question is around 2KB 
long, not a monster but presumably long enough that the heuristics 
should work.

Also -- though this doesn't happen to be relevant in the case where I 
ran into this -- not all files in real-world projects are huge. If the 
heuristics break on small test-case files then they will break on small 
real-world files too. If nothing else, a real-world project can itself 
contain trivial test data (for testing the project, not testing the 
version control system) in the form of lots of small files with similar 
or identical contents.

-Steve

* Re: blame follows renames, but log doesn't
  2007-06-19  9:54   ` Steven Grimm
  2007-06-19 18:28     ` Directory renames (was Re: blame follows renames, but log doesn't) Steven Grimm
@ 2007-06-20 22:11     ` Jakub Narebski
  1 sibling, 0 replies; 10+ messages in thread
From: Jakub Narebski @ 2007-06-20 22:11 UTC
  To: git

Steven Grimm wrote:

> What git doesn't handle, but BitKeeper does, is applying directory 
> renames to newly created files. I rename the "lib" directory to "util", 
> you create a new file lib/strings.c and update lib/Makefile to compile 
> it. I pull from you. Under BitKeeper, I will get util/strings.c and the 
> change will be applied to my util/Makefile. git will create a brand-new 
> "lib" directory containing nothing but the new file, but since the 
> Makefile existed before, it will (correctly) apply your change to my 
> util/Makefile, which will then break my build because it will refer to a 
> file that doesn't exist in the Makefile's directory.
> 
> This has bitten me a few times in real life, e.g. in cases where I'm 
> importing a third-party source tarfile and reorganizing it a little to 
> fit it into my local build system. Every time they add a new source 
> file, I have to go manually clean up after it rather than just merging 
> the vendor branch into mine like I can do when they don't add anything. 
> It is not frequent enough to be a major hassle for me but it sure is 
> annoying when it happens (especially since sometimes the build *doesn't* 
> break and it takes a while to notice a newly created file isn't where it 
> should be.)

I think git can at least try to detect this situation, and perhaps
even resolve it automatically. Namely, if we add a file with the
path <dirname>/<basename>, and the <dirname> tree exists in the
ancestor but does not exist in the other branch, this is CONFLICT(add)
(or something like that), with an appropriate explanation.

One way to resolve this CONFLICT(add) automatically would be to check where
all the files in the no longer existing <dirname> moved to, and if they are all
of the form <newdir>/<somename> then we should add the <dirname>/<basename>
file as <newdir>/<basename>. If some of them were moved to another
directory, for example if the contents of one directory got split into two
directories, this is a conflict which cannot be resolved automatically
(CONFLICT(add/multiple) or something like that perhaps?). And I guess that
SCMs which _track_ renaming of directories, like Bazaar-NG, would NOT detect
this as a conflict, and would happily add the file to a perhaps wrong directory.
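
A minimal Python sketch of that resolution rule, assuming the caller
has already established that <dirname> is gone on the other branch and
that the other branch's renames are available as a plain mapping. The
names place_added_file and AddConflict, and the lib/io.c entry in the
usage example, are made up for illustration; nothing here is git code.

```python
import posixpath


class AddConflict(Exception):
    """Raised when the old directory's files moved to more than one place."""


def place_added_file(added_path, renames):
    """Pick a path for a file added under a directory the other side renamed.

    renames: {old_path: new_path} for files the other branch moved.
    """
    dirname, basename = posixpath.split(added_path)
    # Collect every directory that files from `dirname` ended up in.
    new_dirs = {posixpath.dirname(new)
                for old, new in renames.items()
                if posixpath.dirname(old) == dirname}
    if not new_dirs:
        return added_path  # no evidence that the directory moved
    if len(new_dirs) == 1:
        return posixpath.join(new_dirs.pop(), basename)
    raise AddConflict(f"{dirname}/ was split into {sorted(new_dirs)}")


renames = {"lib/Makefile": "util/Makefile", "lib/io.c": "util/io.c"}
print(place_added_file("lib/strings.c", renames))
```

Raising an exception for the split-directory case mirrors the idea
that this situation needs a human decision rather than a guess.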

Or we could reuse rename detection, taking modes+filenames as tree contents,
or perhaps the set of file contents as tree contents, for our content based
rename detection.
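
As a rough illustration of treating the set of file names as a tree's
"contents", the Python sketch below scores candidate directories by
Jaccard similarity. Jaccard is a stand-in chosen here for simplicity,
not git's similarity measure, and tree_similarity,
best_directory_match and the 0.5 threshold are all hypothetical.

```python
def tree_similarity(old_names, new_names):
    """Jaccard similarity between two sets of file names."""
    old_names, new_names = set(old_names), set(new_names)
    if not old_names and not new_names:
        return 1.0
    return len(old_names & new_names) / len(old_names | new_names)


def best_directory_match(old_entries, candidates, threshold=0.5):
    """Return the candidate directory whose file names match best, or None.

    candidates: {directory: set of file names in that directory}.
    """
    scored = [(tree_similarity(old_entries, names), directory)
              for directory, names in sorted(candidates.items())]
    if not scored:
        return None
    score, directory = max(scored)
    return directory if score >= threshold else None


old_lib = {"Makefile", "io.c"}
candidates = {"util": {"Makefile", "io.c", "strings.h"}, "doc": {"README"}}
print(best_directory_match(old_lib, candidates))
```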


P.S. Allow me to remind you of a rename _detection_ success story, sent here some
time ago by Johannes Schindelin in the "Rename handling" thread:
  Message-ID: <Pine.LNX.4.63.0703210120230.22628@wbgn013.biozentrum.uni-wuerzburg.de>
  http://permalink.gmane.org/gmane.comp.version-control.git/42770

JS> By now, there have been enough arguments _for_ automatic rename detection, 
JS> but I'll add another one.
JS> 
JS> A colleague of mine worked on a certain file in a branch, where he copied 
JS> the file to another location, and heavily modified it. He did that in a 
JS> branch, and when he was satisfied with the result, he deleted the old 
JS> file, since he liked the new location better.
JS> 
JS> Now, when I pulled, imagine my surprise (knowing the history of the file), 
JS> when the pull reported a rename with a substantial similarity!
JS> 
JS> So, the automatic renamer did an awesome job.

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git
