* blame follows renames, but log doesn't
From: Martin Langhoff @ 2007-06-19 1:10 UTC
To: Git Mailing List
Hi all,
when I show git to newbies or demo it to people using other SCMs, and
we get to the rename part of the conversation, I discuss and show how
GIT's approach is significantly better than explicit recording of
renames.
One great example is git-blame -- actually more spectacular with the
recent git gui blame improvements. But git-log still doesn't do it.
If I say

    git blame git-cvsimport.perl   # goes to the true origin like a champ
    git log git-cvsimport.perl     # stops at the Big Tool Rename

In a thread in May, Linus posted a PoC patch to get git-blame to do it
(http://marc.info/?l=git&m=117347893211567&w=2) and outlined the
reasons why it'd be wrong to try to do that in git-log, but nothing
came of it :-/
cg-log used to have some Perl logic that could do this -- it didn't
always work, but I'm sometimes tempted to go back to it, and review
it.
Linus said:
> But it's an example of the fact that yes, git can do this, but we're so
> stupid that we don't really accept it.
And I'm sure people can cope with git blame --log path/to/file and we
can add a note to git-log manpage about renames being reported by
blame instead.
And I kind of hate having to reply to things like these
http://www.markshuttleworth.com/archives/125
cheers
martin
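
A note for readers who want to try the blame/log comparison above: later Git releases gained a --follow option for git log, which continues the history of a single file across renames. The sketch below builds a throwaway repository to show the difference. The file names, commit messages and identity are made up for the demonstration and are not taken from the thread.

```shell
#!/bin/sh
# Sketch: plain `git log -- <path>` versus `git log --follow -- <path>`.
# Every name below (old-name.txt, new-name.txt, the "demo" identity) is
# hypothetical and exists only for this demonstration.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Throwaway identity so the commits work without any global configuration.
GIT_AUTHOR_NAME=demo
GIT_AUTHOR_EMAIL=demo@example.invalid
GIT_COMMITTER_NAME=demo
GIT_COMMITTER_EMAIL=demo@example.invalid
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

printf 'line one\nline two\nline three\n' > old-name.txt
git add old-name.txt
git commit -q -m "add file under its old name"

git mv old-name.txt new-name.txt
git commit -q -m "rename the file"

# Path-limited log: only commits that touch the current path are listed,
# so the history appears to start at the rename.
plain=$(git log --format=%s -- new-name.txt)

# --follow keeps going past the rename (it works for one file at a time).
followed=$(git log --format=%s --follow -- new-name.txt)

echo "without --follow:"
echo "$plain"
echo "with --follow:"
echo "$followed"
```

The first listing contains only the rename commit, while the second also reaches the commit that created the file under its old name.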
* Re: blame follows renames, but log doesn't
From: Sam Vilain @ 2007-06-19 1:34 UTC
To: Martin Langhoff; +Cc: Git Mailing List

Martin Langhoff wrote:
> And I kind of hate having to reply to things like these
>
> http://www.markshuttleworth.com/archives/125

I think that there should be clear conventions for how to place such
breadcrumbs in the commit log, that can be suitably ignored or honoured.

At least these two things fit into this category:

1. renaming. A comment on a changelog entry saying "I moved this file
   from A to B in this commit". With all of the user friendliness and
   limitations this implies (oh, you got the information wrong or didn't
   put it in? oh well, now history is b0rked forever, HAND)

2. cherry picking. bzr uses patch UUIDs, with all of the user
   friendliness and limitations this implies (oh, you merged that patch
   and accidentally didn't pick any changes? whoops, it's in your
   history anyway so never try to merge that again).

Perhaps also there should be other conventions for how to encode other
strange data out of the namespace of the filesystem ("in a different
dimension", perhaps) like "file attributes".

Sam.

* Re: blame follows renames, but log doesn't
From: Theodore Tso @ 2007-06-19 7:19 UTC
To: Martin Langhoff; +Cc: Git Mailing List

On Tue, Jun 19, 2007 at 01:10:28PM +1200, Martin Langhoff wrote:
> when I show git to newbies or demo it to people using other SCMs, and
> we get to the rename part of the conversation, I discuss and show how
> GIT's approach is significantly better than explicit recording of
> renames.
>
> One great example is git-blame -- actually more spectacular with the
> recent git gui blame improvements. But git-log still doesn't do it.

Actually, the bigger missing gap is merges. Suppose in the
development branch, you rename a whole bunch of files. (For example,
foo_super.c got moved to foo/super.c, foo_inode.c got moved to
foo/inode.c, etc.)

Now suppose there are fixes made in the stable branch, in the original
foo_super.c and foo_inode.c files. Ideally you would want to be able
to pull those changes into the development branch, where the files
have new names, and have the changes be applied to foo/super.c and
foo/inode.c in the development branch.

I was recently talking to someone who is still using BitKeeper, and he
cited this scenario as one of the reasons why his project is still
using BK; he'd like to move to git, but this is a critical piece of
functionality for him.

- Ted

* Re: blame follows renames, but log doesn't
From: Martin Langhoff @ 2007-06-19 8:31 UTC
To: Theodore Tso; +Cc: Git Mailing List

On 6/19/07, Theodore Tso <tytso@mit.edu> wrote:
> Actually, the bigger missing gap is merges. Suppose in the
> development branch, you rename a whole bunch of files. (For example,
> foo_super.c got moved to foo/super.c, foo_inode.c got moved to
> foo/inode.c, etc.)

I thought that the "recursive" strategy covered this - though I don't
work on a tree that merges across branches with renames, so my
experience is _very_ limited.

From Documentation/merge-strategies.txt:

    Additionally this can detect and handle merges involving renames.
    This is the default merge strategy when pulling or merging one
    branch.

cheers

m

* Re: blame follows renames, but log doesn't
From: Junio C Hamano @ 2007-06-19 8:39 UTC
To: Theodore Tso; +Cc: Martin Langhoff, Git Mailing List

Theodore Tso <tytso@mit.edu> writes:

> Actually, the bigger missing gap is merges. Suppose in the
> development branch, you rename a whole bunch of files. (For example,
> foo_super.c got moved to foo/super.c, foo_inode.c got moved to
> foo/inode.c, etc.)
>
> Now suppose there are fixes made in the stable branch, in the original
> foo_super.c and foo_inode.c files. Ideally you would want to be able
> to pull those changes into the development branch, where the files
> have new names, and have the changes be applied to foo/super.c and
> foo/inode.c in the development branch.

That happens already with merge-recursive code, which has been
the default since late November 2005 (v0.99.9k and later should
have it).

What does _not_ happen is if foo_fixes.c was _created_ in the
stable branch. A merge that tries to forward port such a fix
would not move the foo_fixes.c to foo/fixes.c.

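
The rename-following merge described in the message above can be reproduced in a throwaway repository. This is only a sketch under stated assumptions: the file names follow Ted's example, while the branch names, file contents, commit messages and identity are invented for the demonstration. It covers just the case reported as working (a fix to an existing file that was renamed on the other branch), not the newly created file case.

```shell
#!/bin/sh
# Sketch: a fix made to foo_super.c on one branch is merged into a branch
# where the same file has been renamed to foo/super.c.
# Branch names, contents and the "demo" identity are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .

GIT_AUTHOR_NAME=demo
GIT_AUTHOR_EMAIL=demo@example.invalid
GIT_COMMITTER_NAME=demo
GIT_COMMITTER_EMAIL=demo@example.invalid
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

# A file with a few lines, so the merge has content to work with.
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "original line $i"
done > foo_super.c
git add foo_super.c
git commit -q -m "initial import"
base=$(git rev-parse HEAD)

# "stable": the fix is made to the file under its old name.
git checkout -q -b stable
echo "fix made on stable" >> foo_super.c
git commit -q -a -m "fix in foo_super.c"

# "development": starting from the same base, the file is only renamed.
git checkout -q -b development "$base"
mkdir foo
git mv foo_super.c foo/super.c
git commit -q -m "move foo_super.c to foo/super.c"

# Merge the fix into the branch that carries the new name.
git merge -q -m "merge stable into development" stable

# The fix should now be in foo/super.c and the old path should be gone.
grep "fix made on stable" foo/super.c
ls
```

After the merge, the appended line shows up in foo/super.c and no foo_super.c is recreated at the top level.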
* Re: blame follows renames, but log doesn't
From: Steven Grimm @ 2007-06-19 9:54 UTC
To: Theodore Tso; +Cc: Martin Langhoff, Git Mailing List

Theodore Tso wrote:
> Actually, the bigger missing gap is merges. Suppose in the
> development branch, you rename a whole bunch of files. (For example,
> foo_super.c got moved to foo/super.c, foo_inode.c got moved to
> foo/inode.c, etc.)
>
> Now suppose there are fixes made in the stable branch, in the original
> foo_super.c and foo_inode.c files. Ideally you would want to be able
> to pull those changes into the development branch, where the files
> have new names, and have the changes be applied to foo/super.c and
> foo/inode.c in the development branch.
>

I believe git handles this case already, actually. I've seen this work
just fine many times.

What git doesn't handle, but BitKeeper does, is applying directory
renames to newly created files. I rename the "lib" directory to "util",
you create a new file lib/strings.c and update lib/Makefile to compile
it. I pull from you. Under BitKeeper, I will get util/strings.c and the
change will be applied to my util/Makefile. git will create a brand-new
"lib" directory containing nothing but the new file, but since the
Makefile existed before, it will (correctly) apply your change to my
util/Makefile, which will then break my build because it will refer to a
file that doesn't exist in the Makefile's directory.

This has bitten me a few times in real life, e.g. in cases where I'm
importing a third-party source tarfile and reorganizing it a little to
fit it into my local build system. Every time they add a new source
file, I have to go manually clean up after it rather than just merging
the vendor branch into mine like I can do when they don't add anything.
It is not frequent enough to be a major hassle for me but it sure is
annoying when it happens (especially since sometimes the build *doesn't*
break and it takes a while to notice a newly created file isn't where it
should be.)

-Steve

* Directory renames (was Re: blame follows renames, but log doesn't)
From: Steven Grimm @ 2007-06-19 18:28 UTC
To: Theodore Tso; +Cc: Martin Langhoff, Git Mailing List

Directory renames can break in more cases than just adding new files.
Here's a demonstration based on a situation I ran into a few minutes ago
on a real-world project.

    $ git init
    $ mkdir a
    $ echo some contents > a/file1
    $ echo other contents > a/file2
    $ git add a
    $ git commit -m "initial commit"

So far so good. Now there's a revision where the files happen to be
identical. (On a new branch for later convenience.)

    $ git checkout -b modifybranch
    $ echo other contents > a/file1
    $ git commit -a -m "commit that makes both files identical"

Now we rename the directory. (Again, the new branch will be used later.)

    $ git checkout -b renamebranch
    $ git mv a b
    $ git commit -m "rename directory"

Make a change to file2.

    $ echo more contents from renamebranch >> b/file2
    $ git commit -a -m "add contents to file2"

Where did file2's contents come from?

    $ git blame b/file2
    a7dbcfdc a/file1 (Steven Grimm 2007-06-19 10:36:20 -0700 1) other contents
    69a87194 b/file2 (Steven Grimm 2007-06-19 10:43:29 -0700 2) more contents

Which is wrong. The history of file2 has nothing to do with the history
of file1, but git blame thinks it does.

However, that's not what blew up on me in my real-world test; it gets
better. Let's say one of the files changed in a different branch.

    $ git checkout master
    $ echo a change to file2 in master >> a/file2
    $ git commit -a -m "file2 changed"
    $ git merge renamebranch
    Removed a/file1
    Merge made by recursive.
     a/file1            |    1 -
     a/file2 => b/file1 |    0
     b/file2            |    2 ++
     3 files changed, 2 insertions(+), 1 deletions(-)
     delete mode 100644 a/file1
     rename a/file2 => b/file1 (100%)
     create mode 100644 b/file2

And this is just completely broken. What it *should* do is give me
b/file1 with "other contents" from renamebranch, and b/file2 with a
merge conflict since I added a different line to it in each branch.
Instead, the merge succeeds with no conflict and applies the change in
my current branch to the wrong file:

    $ cat b/file1
    other contents
    a change to file2 in master
    $ cat b/file2
    other contents
    more contents from renamebranch

There's one more variation on this theme that's broken in a similar but
not identical way (this didn't happen in my real-world scenario but I
ran into it while coming up with the above test case):

    $ git checkout modifybranch
    $ echo a change to file2 in modifybranch >> a/file2
    $ git commit -a -m "a change in modifybranch"
    $ git merge renamebranch
    CONFLICT (delete/modify): a/file2 deleted in renamebranch and modified in HEAD. Version HEAD of a/file2 left in tree.
    Automatic merge failed; fix conflicts and then commit the result.

This should definitely be a merge conflict, but it shouldn't be *this*
merge conflict. What I'd expect here would be b/file1 == "other
contents" and b/file2 with conflict markers.

    $ ls a b
    a:
    file2

    b:
    file1 file2
    $ cat a/file2
    other contents
    a change to file2 in modifybranch
    $ cat b/file1
    other contents
    $ cat b/file2
    other contents
    more contents from renamebranch

In other words, no conflict markers at all, and I still have the old
directory "a" with one of its two files in original form, but not the
other file.

Hope that's illuminating or at least interesting to someone.

-Steve

* Re: Directory renames (was Re: blame follows renames, but log doesn't)
From: Sam Vilain @ 2007-06-20 20:18 UTC
To: Steven Grimm, Git Mailing List

Steven Grimm wrote:
> Hope that's illuminating or at least interesting to someone.

I didn't review your test cases in detail, but they seemed to suffer
from what I call "over-trivialization"; the heuristic methods don't work
very well for these non-real-world test cases because they're not long
enough. Are you confident that these deficiencies are still there with
longer examples?

Sam.

* Re: Directory renames (was Re: blame follows renames, but log doesn't)
From: Steven Grimm @ 2007-06-20 20:59 UTC
To: Sam Vilain; +Cc: Git Mailing List

Sam Vilain wrote:
> I didn't review your test cases in detail, but they seemed to suffer
> from what I call "over-trivialization"; the heuristic methods don't work
> very well for these non-real-world test cases because they're not long
> enough. Are you confident that these deficiencies are still there with
> longer examples?
>

Those test cases were a demonstration of something I actually ran into
on a real-world project yesterday. The test cases are trivial and short
simply to make them easy to follow in an email message. If you
substitute longer contents for the test files in my example, you will
see the exact same behavior. The real file in question is around 2KB
long, not a monster but presumably long enough that the heuristics
should work.

Also -- though this doesn't happen to be relevant in the case where I
ran into this -- not all files in real-world projects are huge. If the
heuristics break on small test-case files then they will break on small
real-world files too. If nothing else, a real-world project can itself
contain trivial test data (for testing the project, not testing the
version control system) in the form of lots of small files with similar
or identical contents.

-Steve

* Re: blame follows renames, but log doesn't
From: Jakub Narebski @ 2007-06-20 22:11 UTC
To: git

Steven Grimm wrote:

> What git doesn't handle, but BitKeeper does, is applying directory
> renames to newly created files. I rename the "lib" directory to "util",
> you create a new file lib/strings.c and update lib/Makefile to compile
> it. I pull from you. Under BitKeeper, I will get util/strings.c and the
> change will be applied to my util/Makefile. git will create a brand-new
> "lib" directory containing nothing but the new file, but since the
> Makefile existed before, it will (correctly) apply your change to my
> util/Makefile, which will then break my build because it will refer to a
> file that doesn't exist in the Makefile's directory.
>
> This has bitten me a few times in real life, e.g. in cases where I'm
> importing a third-party source tarfile and reorganizing it a little to
> fit it into my local build system. Every time they add a new source
> file, I have to go manually clean up after it rather than just merging
> the vendor branch into mine like I can do when they don't add anything.
> It is not frequent enough to be a major hassle for me but it sure is
> annoying when it happens (especially since sometimes the build *doesn't*
> break and it takes a while to notice a newly created file isn't where it
> should be.)

I think git can at least try to detect this situation, and perhaps even
resolve this automatically. Namely, if we add a file with the following
<path>: <dirname>/<basename>, and the <dirname> tree does exist in the
ancestor but does not exist in the other branch, this is CONFLICT(add)
(or something like that), with appropriate explanation.

One way to resolve this CONFLICT(add) automatically would be to check
where all the files in the no longer existing <dirname> moved to, and if
they all are of the form <newdir>/<somename> then we should add the
<dirname>/<basename> file under <newdir>/<basename>. If some of them
were moved to another directory, for example if the contents of one
directory got split into two directories, this is a conflict which
cannot be resolved automatically (CONFLICT(add/multiple) or something
like that perhaps?). And I guess that SCMs which _track_ renaming of
directories, like Bazaar-NG, would NOT detect this as a conflict, and
would happily add to a perhaps wrong directory.

Or we could reuse rename detection, taking modes+filenames as tree
contents, or perhaps the set of file contents as tree contents, for our
content based rename detection.

P.S. Allow me to remind you of a rename _detection_ success story, sent
here some time ago by Johannes Schindelin in the "Rename handling"
thread:

Message-ID: <Pine.LNX.4.63.0703210120230.22628@wbgn013.biozentrum.uni-wuerzburg.de>
http://permalink.gmane.org/gmane.comp.version-control.git/42770

JS> By now, there have been enough arguments _for_ automatic rename detection,
JS> but I'll add another one.
JS>
JS> A colleague of mine worked on a certain file in a branch, where he copied
JS> the file to another location, and heavily modified it. He did that in a
JS> branch, and when he was satisfied with the result, he deleted the old
JS> file, since he liked the new location better.
JS>
JS> Now, when I pulled, imagine my surprise (knowing the history of the file),
JS> when the pull reported a rename with a substantial similarity!
JS>
JS> So, the automatic renamer did an awesome job.

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

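
The automatic resolution proposed above (find where every file of the vanished directory went, and accept the answer only if there is exactly one target directory) can be sketched as a small filter over rename records. This is an illustration of the idea only, not how git implements anything: the function name guess_new_dir and its exit codes are hypothetical, and the input is assumed to be tab-separated lines in the shape printed by git diff -M --name-status (status, old path, new path). As a simplification it looks only at files directly inside the old directory, not at nested subdirectories.

```shell
#!/bin/sh
# guess_new_dir OLD_DIR   (hypothetical helper, for illustration only)
#
# Reads rename records on stdin, one per line, tab-separated, e.g.
#   R100<TAB>lib/Makefile<TAB>util/Makefile
# and prints the single directory that every file renamed out of OLD_DIR
# moved to.
# Exit status: 0 = exactly one target directory,
#              1 = files were split across several directories
#                  (the "add/multiple" case from the message above),
#              2 = no renames out of OLD_DIR were seen.
guess_new_dir() {
    awk -F '\t' -v old="$1" '
        # Directory part of a path; "." for files at the top level.
        function dirname(p) {
            return (p ~ /\//) ? substr(p, 1, match(p, /\/[^\/]*$/) - 1) : "."
        }
        # Only rename records whose source sits directly in the old directory.
        $1 ~ /^R/ && dirname($2) == old {
            d = dirname($3)
            if (!(d in seen)) { seen[d] = 1; n++; target = d }
        }
        END {
            if (n == 1) { print target; exit 0 }
            exit (n > 1) ? 1 : 2
        }
    '
}

# Sample input standing in for real rename records (made-up paths).
printf 'R100\tlib/Makefile\tutil/Makefile\nR095\tlib/str.c\tutil/str.c\nM\tREADME\n' |
    guess_new_dir lib
# prints: util
```

In a real repository the input could come from something like git diff -M --name-status between the merge base and the branch that renamed the directory; a file newly added under lib/ on the other side would then be placed under the printed directory, and a non-zero exit status would be reported as a conflict instead.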