* blame follows renames, but log doesn't
@ 2007-06-19 1:10 Martin Langhoff
2007-06-19 1:34 ` Sam Vilain
2007-06-19 7:19 ` Theodore Tso
0 siblings, 2 replies; 10+ messages in thread
From: Martin Langhoff @ 2007-06-19 1:10 UTC (permalink / raw)
To: Git Mailing List
Hi all,
when I show git to newbies or demo it to people using other SCMs, and
we get to the rename part of the conversation, I discuss and show how
GIT's approach is significantly better than explicit recording of
renames.
One great example is git-blame -- actually more spectacular with the
recent git gui blame improvements. But git-log still doesn't do it.
If I say
git blame git-cvsimport.perl # goes to the true origin like a champ
git log git-cvsimport.perl # stops at the Big Tool Rename
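
For readers on a later git: releases made after this thread gained a
--follow option for git log on a single path. What follows is a minimal
sketch of the difference, not something from the original discussion. It
assumes a throwaway repository, and the file names old-name.txt and
new-name.txt are made up for illustration.

```shell
# Sketch only: throwaway repository, hypothetical file names.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'line one\nline two\nline three\n' > old-name.txt
git add old-name.txt
git commit -q -m "add file under its old name"

git mv old-name.txt new-name.txt
git commit -q -m "rename the file"

# A plain path-limited log only sees commits touching the new path.
plain=$(git log --oneline -- new-name.txt | wc -l | tr -d ' ')
# --follow keeps going across the rename of this single file.
followed=$(git log --oneline --follow -- new-name.txt | wc -l | tr -d ' ')

echo "plain log: $plain commit(s), with --follow: $followed commit(s)"
```

The plain log stops at the rename commit, while --follow also lists the
commit that created the file under its old name.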
In a thread in May Linus posted a PoC patch to get git-blame to do it
http://marc.info/?l=git&m=117347893211567&w=2 , and outlined the
reasons why it'd be wrong to try to do that in git-log -- but it
didn't come to happen :-/
cg-log used to have some Perl logic that could do this -- it didn't
always work, but I'm sometimes tempted to go back to it, and review
it.
Linus said:
> But it's an example of the fact that yes, git can do this, but we're so
> stupid that we don't really accept it.
And I'm sure people can cope with git blame --log path/to/file and we
can add a note to git-log manpage about renames being reported by
blame instead.
And I kind of hate having to reply to things like these
http://www.markshuttleworth.com/archives/125
cheers
martin
* Re: blame follows renames, but log doesn't
2007-06-19 1:10 blame follows renames, but log doesn't Martin Langhoff
@ 2007-06-19 1:34 ` Sam Vilain
2007-06-19 7:19 ` Theodore Tso
1 sibling, 0 replies; 10+ messages in thread
From: Sam Vilain @ 2007-06-19 1:34 UTC (permalink / raw)
To: Martin Langhoff; +Cc: Git Mailing List
Martin Langhoff wrote:
> And I kind of hate having to reply to things like these
>
> http://www.markshuttleworth.com/archives/125
I think that there should be clear conventions for how to place such
breadcrumbs in the commit log, that can be suitably ignored or honoured.
At least these two things fit into this category:
1. renaming. A comment on a changelog entry saying "I moved this file
from A to B in this commit". With all of the user friendliness and
limitations this implies (oh, you got the information wrong or
didn't put it in? oh well, now history is b0rked forever, HAND)
2. cherry picking. bzr uses patch UUIDs, with all of the user
friendliness and limitations this implies (oh, you merged that
patch and accidentally didn't pick any changes? whoops, it's
in your history anyway so never try to merge that again).
Perhaps also there should be other conventions for how to encode other
strange data out of the namespace of the filesystem ("in a different
dimension", perhaps) like "file attributes".
Sam.
* Re: blame follows renames, but log doesn't
2007-06-19 1:10 blame follows renames, but log doesn't Martin Langhoff
2007-06-19 1:34 ` Sam Vilain
@ 2007-06-19 7:19 ` Theodore Tso
2007-06-19 8:31 ` Martin Langhoff
` (2 more replies)
1 sibling, 3 replies; 10+ messages in thread
From: Theodore Tso @ 2007-06-19 7:19 UTC (permalink / raw)
To: Martin Langhoff; +Cc: Git Mailing List
On Tue, Jun 19, 2007 at 01:10:28PM +1200, Martin Langhoff wrote:
>
> when I show git to newbies or demo it to people using other SCMs, and
> we get to the rename part of the conversation, I discuss and show how
> GIT's approach is significantly better than explicit recording of
> renames.
>
> One great example is git-blame -- actually more spectacular with the
> recent git gui blame improvements. But git-log still doesn't do it.
Actually, the bigger missing gap is merges. Suppose in the
development branch, you rename a whole bunch of files. (For example,
foo_super.c got moved to foo/super.c, foo_inode.c got moved to
foo/inode.c, etc.)
Now suppose there are fixes made in the stable branch, in the original
foo_super.c and foo_inode.c files. Ideally you would want to be able
to pull those changes into the development branch, where the files
have new names, and have the changes be applied to foo/super.c and
foo/inode.c in the development branch.
I was recently talking to someone who is still using BitKeeper, and he
cited this scenario as one of the reasons why his project is still
using BK; he'd like to move to git, but this is a critical piece of
functionality for him.
- Ted
* Re: blame follows renames, but log doesn't
2007-06-19 7:19 ` Theodore Tso
@ 2007-06-19 8:31 ` Martin Langhoff
2007-06-19 8:39 ` Junio C Hamano
2007-06-19 9:54 ` Steven Grimm
2 siblings, 0 replies; 10+ messages in thread
From: Martin Langhoff @ 2007-06-19 8:31 UTC (permalink / raw)
To: Theodore Tso; +Cc: Git Mailing List
On 6/19/07, Theodore Tso <tytso@mit.edu> wrote:
> Actually, the bigger missing gap is merges. Suppose in the
> development branch, you rename a whole bunch of files. (For example,
> foo_super.c got moved to foo/super.c, foo_inode.c got moved to
> foo/inode.c, etc.)
I thought that the "recursive" strategy covered this - though I don't
work on a tree that merges across branches with renames, so my
experience is _very_ limited.
From Documentation/merge-strategies.txt:
Additionally this can detect and handle merges involving
renames. This is the default merge strategy when
pulling or merging one branch.
cheers
m
* Re: blame follows renames, but log doesn't
2007-06-19 7:19 ` Theodore Tso
2007-06-19 8:31 ` Martin Langhoff
@ 2007-06-19 8:39 ` Junio C Hamano
2007-06-19 9:54 ` Steven Grimm
2 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2007-06-19 8:39 UTC (permalink / raw)
To: Theodore Tso; +Cc: Martin Langhoff, Git Mailing List
Theodore Tso <tytso@mit.edu> writes:
> Actually, the bigger missing gap is merges. Suppose in the
> development branch, you rename a whole bunch of files. (For example,
> foo_super.c got moved to foo/super.c, foo_inode.c got moved to
> foo/inode.c, etc.)
>
> Now suppose there are fixes made in the stable branch, in the original
> foo_super.c and foo_inode.c files. Ideally you would want to be able
> to pull those changes into the development branch, where the files
> have new names, and have the changes be applied to foo/super.c and
> foo/inode.c in the development branch.
That happens already with merge-recursive code, which has been
the default since late November 2005 (v0.99.9k and later should
have it).
What does _not_ happen is if foo_fixes.c was _created_ in the
stable branch. A merge that tries to forward port such a fix
would not move the foo_fixes.c to foo/fixes.c.
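
The first case can be tried out with a small throwaway repository. This is
only a sketch under assumptions: the branch names "stable" and "devel" and
the file contents are invented for illustration, and the exact merge
messages differ between git versions.

```shell
# Sketch only: branch names and file contents are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'int foo_super_init(void)\n{\n    return 0;\n}\n' > foo_super.c
printf 'demo tree\n' > README
git add foo_super.c README
git commit -q -m "initial commit"
git branch -M stable

# Development branch: move the file into a subdirectory.
git checkout -q -b devel
mkdir foo
git mv foo_super.c foo/super.c
git commit -q -m "move foo_super.c to foo/super.c"

# Stable branch: fix the file under its original name.
git checkout -q stable
printf 'int foo_super_init(void)\n{\n    return 1;\n}\n' > foo_super.c
git commit -q -a -m "fix foo_super_init in stable"

# Merge the fix into the development branch.
git checkout -q devel
git merge -q -m "merge stable into devel" stable

cat foo/super.c
```

After the merge the fix from stable shows up in foo/super.c, and no
foo_super.c is recreated at the top level.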
* Re: blame follows renames, but log doesn't
2007-06-19 7:19 ` Theodore Tso
2007-06-19 8:31 ` Martin Langhoff
2007-06-19 8:39 ` Junio C Hamano
@ 2007-06-19 9:54 ` Steven Grimm
2007-06-19 18:28 ` Directory renames (was Re: blame follows renames, but log doesn't) Steven Grimm
2007-06-20 22:11 ` blame follows renames, but log doesn't Jakub Narebski
2 siblings, 2 replies; 10+ messages in thread
From: Steven Grimm @ 2007-06-19 9:54 UTC (permalink / raw)
To: Theodore Tso; +Cc: Martin Langhoff, Git Mailing List
Theodore Tso wrote:
> Actually, the bigger missing gap is merges. Suppose in the
> development branch, you rename a whole bunch of files. (For example,
> foo_super.c got moved to foo/super.c, foo_inode.c got moved to
> foo/inode.c, etc.)
>
> Now suppose there are fixes made in the stable branch, in the original
> foo_super.c and foo_inode.c files. Ideally you would want to be able
> to pull those changes into the development branch, where the files
> have new names, and have the changes be applied to foo/super.c and
> foo/inode.c in the development branch.
>
I believe git handles this case already, actually. I've seen this work
just fine many times.
What git doesn't handle, but BitKeeper does, is applying directory
renames to newly created files. I rename the "lib" directory to "util",
you create a new file lib/strings.c and update lib/Makefile to compile
it. I pull from you. Under BitKeeper, I will get util/strings.c and the
change will be applied to my util/Makefile. git will create a brand-new
"lib" directory containing nothing but the new file, but since the
Makefile existed before, it will (correctly) apply your change to my
util/Makefile, which will then break my build because it will refer to a
file that doesn't exist in the Makefile's directory.
This has bitten me a few times in real life, e.g. in cases where I'm
importing a third-party source tarfile and reorganizing it a little to
fit it into my local build system. Every time they add a new source
file, I have to go manually clean up after it rather than just merging
the vendor branch into mine like I can do when they don't add anything.
It is not frequent enough to be a major hassle for me but it sure is
annoying when it happens (especially since sometimes the build *doesn't*
break and it takes a while to notice a newly created file isn't where it
should be.)
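
A minimal sketch of that scenario, using a throwaway repository. Assumptions:
the branch names "mine" and "yours" and the file contents are invented here.
Later git releases taught the merge machinery about directory renames, so the
sketch passes -c merge.directoryRenames=false to approximate the behaviour
described above on releases that honour that setting.

```shell
# Sketch only: branch names and file contents are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

mkdir lib
printf 'OBJS = util.o\n\nall: $(OBJS)\n' > lib/Makefile
printf 'int util(void) { return 0; }\n' > lib/util.c
git add lib
git commit -q -m "initial commit"
git branch -M mine

# "You" add a new file and hook it into the Makefile.
git checkout -q -b yours
printf 'int strings(void) { return 0; }\n' > lib/strings.c
printf 'OBJS = util.o strings.o\n\nall: $(OBJS)\n' > lib/Makefile
git add lib
git commit -q -m "add lib/strings.c"

# "I" rename the directory.
git checkout -q mine
git mv lib util
git commit -q -m "rename lib to util"

# Pull your change in, with directory rename detection switched off.
git -c merge.directoryRenames=false merge -q -m "merge yours" yours

ls lib util
cat util/Makefile
```

The Makefile change lands in util/Makefile, but strings.c is left behind in a
freshly recreated lib directory.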
-Steve
* Directory renames (was Re: blame follows renames, but log doesn't)
2007-06-19 9:54 ` Steven Grimm
@ 2007-06-19 18:28 ` Steven Grimm
2007-06-20 20:18 ` Sam Vilain
2007-06-20 22:11 ` blame follows renames, but log doesn't Jakub Narebski
1 sibling, 1 reply; 10+ messages in thread
From: Steven Grimm @ 2007-06-19 18:28 UTC (permalink / raw)
To: Theodore Tso; +Cc: Martin Langhoff, Git Mailing List
Directory renames can break in more cases than just adding new files.
Here's a demonstration based on a situation I ran into a few minutes ago
on a real-world project.
$ git init
$ mkdir a
$ echo some contents > a/file1
$ echo other contents > a/file2
$ git add a
$ git commit -m "initial commit"
So far so good. Now there's a revision where the files happen to be
identical. (On a new branch for later convenience.)
$ git checkout -b modifybranch
$ echo other contents > a/file1
$ git commit -a -m "commit that makes both files identical"
Now we rename the directory. (Again, the new branch will be used later.)
$ git checkout -b renamebranch
$ git mv a b
$ git commit -m "rename directory"
Make a change to file2.
$ echo more contents from renamebranch >> b/file2
$ git commit -a -m "add contents to file2"
Where did file2's contents come from?
$ git blame b/file2
a7dbcfdc a/file1 (Steven Grimm 2007-06-19 10:36:20 -0700 1) other contents
69a87194 b/file2 (Steven Grimm 2007-06-19 10:43:29 -0700 2) more contents
Which is wrong. The history of file2 has nothing to do with the history
of file1, but git blame thinks it does. However, that's not what blew up
on me in my real-world test; it gets better. Let's say one of the files
changed in a different branch.
$ git checkout master
$ echo a change to file2 in master >> a/file2
$ git commit -a -m "file2 changed"
$ git merge renamebranch
Removed a/file1
Merge made by recursive.
a/file1 | 1 -
a/file2 => b/file1 | 0
b/file2 | 2 ++
3 files changed, 2 insertions(+), 1 deletions(-)
delete mode 100644 a/file1
rename a/file2 => b/file1 (100%)
create mode 100644 b/file2
And this is just completely broken. What it *should* do is give me
b/file1 with "other contents" from renamebranch, and b/file2 with a
merge conflict since I added a different line to it in each branch.
Instead, the merge succeeds with no conflict and applies the change in
my current branch to the wrong file:
$ cat b/file1
other contents
a change to file2 in master
$ cat b/file2
other contents
more contents from renamebranch
There's one more variation on this theme that's broken in a similar but
not identical way (this didn't happen in my real-world scenario but I
ran into it while coming up with the above test case):
$ git checkout modifybranch
$ echo a change to file2 in modifybranch >> a/file2
$ git commit -a -m "a change in modifybranch"
$ git merge renamebranch
CONFLICT (delete/modify): a/file2 deleted in renamebranch and modified
in HEAD. Version HEAD of a/file2 left in tree.
Automatic merge failed; fix conflicts and then commit the result.
This should definitely be a merge conflict, but it shouldn't be *this*
merge conflict. What I'd expect here would be b/file1 == "other
contents" and b/file2 with conflict markers.
$ ls a b
a:
file2
b:
file1 file2
$ cat a/file2
other contents
a change to file2 in modifybranch
$ cat b/file1
other contents
$ cat b/file2
other contents
more contents from renamebranch
In other words, no conflict markers at all, and I still have the old
directory "a" with one of its two files in original form, but not the
other file.
Hope that's illuminating or at least interesting to someone.
-Steve
* Re: Directory renames (was Re: blame follows renames, but log doesn't)
2007-06-19 18:28 ` Directory renames (was Re: blame follows renames, but log doesn't) Steven Grimm
@ 2007-06-20 20:18 ` Sam Vilain
2007-06-20 20:59 ` Steven Grimm
0 siblings, 1 reply; 10+ messages in thread
From: Sam Vilain @ 2007-06-20 20:18 UTC (permalink / raw)
To: Steven Grimm, Git Mailing List
Steven Grimm wrote:
> Hope that's illuminating or at least interesting to someone.
I didn't review your test cases in detail, but they seemed to suffer
from what I call "over-trivialization"; the heuristic methods don't work
very well for these non-real-world test cases because they're not long
enough. Are you confident that these deficiencies are still there with
longer examples?
Sam.
* Re: Directory renames (was Re: blame follows renames, but log doesn't)
2007-06-20 20:18 ` Sam Vilain
@ 2007-06-20 20:59 ` Steven Grimm
0 siblings, 0 replies; 10+ messages in thread
From: Steven Grimm @ 2007-06-20 20:59 UTC (permalink / raw)
To: Sam Vilain; +Cc: Git Mailing List
Sam Vilain wrote:
> I didn't review your test cases in detail, but they seemed to suffer
> from what I call "over-trivialization"; the heuristic methods don't work
> very well for these non-real-world test cases because they're not long
> enough. Are you confident that these deficiencies are still there with
> longer examples?
>
Those test cases were a demonstration of something I actually ran into
on a real-world project yesterday. The test cases are trivial and short
simply to make them easy to follow in an email message. If you
substitute longer contents for the test files in my example, you will
see the exact same behavior. The real file in question is around 2KB
long, not a monster but presumably long enough that the heuristics
should work.
Also -- though this doesn't happen to be relevant in the case where I
ran into this -- not all files in real-world projects are huge. If the
heuristics break on small test-case files then they will break on small
real-world files too. If nothing else, a real-world project can itself
contain trivial test data (for testing the project, not testing the
version control system) in the form of lots of small files with similar
or identical contents.
-Steve
* Re: blame follows renames, but log doesn't
2007-06-19 9:54 ` Steven Grimm
2007-06-19 18:28 ` Directory renames (was Re: blame follows renames, but log doesn't) Steven Grimm
@ 2007-06-20 22:11 ` Jakub Narebski
1 sibling, 0 replies; 10+ messages in thread
From: Jakub Narebski @ 2007-06-20 22:11 UTC (permalink / raw)
To: git
Steven Grimm wrote:
> What git doesn't handle, but BitKeeper does, is applying directory
> renames to newly created files. I rename the "lib" directory to "util",
> you create a new file lib/strings.c and update lib/Makefile to compile
> it. I pull from you. Under BitKeeper, I will get util/strings.c and the
> change will be applied to my util/Makefile. git will create a brand-new
> "lib" directory containing nothing but the new file, but since the
> Makefile existed before, it will (correctly) apply your change to my
> util/Makefile, which will then break my build because it will refer to a
> file that doesn't exist in the Makefile's directory.
>
> This has bitten me a few times in real life, e.g. in cases where I'm
> importing a third-party source tarfile and reorganizing it a little to
> fit it into my local build system. Every time they add a new source
> file, I have to go manually clean up after it rather than just merging
> the vendor branch into mine like I can do when they don't add anything.
> It is not frequent enough to be a major hassle for me but it sure is
> annoying when it happens (especially since sometimes the build *doesn't*
> break and it takes a while to notice a newly created file isn't where it
> should be.)
I think git can at least try to detect this situation, and perhaps
even resolve it automatically. Namely, if we add a file with the
following <path>: <dirname>/<basename>, and the <dirname> tree exists in
the ancestor but does not exist in the other branch, this is CONFLICT(add)
(or something like that), with an appropriate explanation.
One way to resolve this CONFLICT(add) automatically would be to check where
all the files in the no longer existing <dirname> moved to, and if they all are
of the form <newdir>/<somename> then we should add the <dirname>/<basename>
file under <newdir>/<basename>. If some of them were moved to another
directory, for example if the contents of one directory got split into two
directories, this is a conflict which cannot be resolved automatically
(CONFLICT(add/multiple) or something like that perhaps?). And I guess that
SCMs which _track_ renaming of directories, like Bazaar-NG, would NOT detect
this as a conflict, and would happily add to perhaps the wrong directory.
Or we could reuse rename detection, taking modes+filenames as tree contents,
or perhaps set of file contents as tree contents, for our content based
rename detection.
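
The "where did all the files go" check can be sketched in a few lines of
shell. This is an illustration under assumptions, not git code: the helper
name suggest_newdir is made up, and it reads "old<TAB>new" rename pairs on
standard input (such pairs could be derived from the rename entries of
git diff -M --name-status). It only looks at files directly inside the old
directory.

```shell
# suggest_newdir DIR (hypothetical helper): read "old<TAB>new" rename
# pairs on stdin. If every renamed file that lived directly in DIR moved
# to the same directory, print that directory and return 0. Otherwise
# (split across directories, or no renames at all) print nothing and
# return 1.
suggest_newdir() {
    awk -F '\t' -v dir="$1/" '
        index($1, dir) == 1 {
            rest = substr($1, length(dir) + 1)
            if (rest ~ "/") next           # skip files in subdirectories
            dest = $2
            if (dest ~ "/") sub("/[^/]*$", "", dest)   # drop the basename
            else dest = "."                            # moved to top level
            if (!(dest in seen)) { seen[dest] = 1; n++; last = dest }
        }
        END {
            if (n == 1) { print last; exit 0 }
            exit 1
        }
    '
}

# All files of lib/ moved to util/: the new file can follow them.
if newdir=$(printf 'lib/Makefile\tutil/Makefile\nlib/util.c\tutil/util.c\n' |
            suggest_newdir lib); then
    echo "would add lib/strings.c as $newdir/strings.c"
fi

# lib/ was split into two directories: no automatic answer.
if ! printf 'lib/a.c\tutil/a.c\nlib/b.c\tcore/b.c\n' | suggest_newdir lib; then
    echo "ambiguous: lib/ was split, needs manual resolution"
fi
```

Returning a failure status for the split case leaves room for the caller to
report it as a conflict instead of guessing a destination.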
P.S. Allow me to remind you of a rename _detection_ success story, sent here some
time ago by Johannes Schindelin in the "Rename handling" thread:
Message-ID: <Pine.LNX.4.63.0703210120230.22628@wbgn013.biozentrum.uni-wuerzburg.de>
http://permalink.gmane.org/gmane.comp.version-control.git/42770
JS> By now, there have been enough arguments _for_ automatic rename detection,
JS> but I'll add another one.
JS>
JS> A colleague of mine worked on a certain file in a branch, where he copied
JS> the file to another location, and heavily modified it. He did that in a
JS> branch, and when he was satisfied with the result, he deleted the old
JS> file, since he liked the new location better.
JS>
JS> Now, when I pulled, imagine my surprise (knowing the history of the file),
JS> when the pull reported a rename with a substantial similarity!
JS>
JS> So, the automatic renamer did an awesome job.
--
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git