* Basename matching during rename/copy detection @ 2007-06-21 3:06 Shawn O. Pearce 2007-06-21 3:13 ` Junio C Hamano 2007-06-21 3:42 ` Linus Torvalds 0 siblings, 2 replies; 44+ messages in thread From: Shawn O. Pearce @ 2007-06-21 3:06 UTC (permalink / raw) To: git; +Cc: govindsalinas So Govind Salinas has found an interesting case in the rename detection code: $ git clone git://repo.or.cz/Widgit.git $ git diff -M --raw -r 192e^ 192e | grep .resx :100755 000000 4c8ab79... 0000000... D Form1.resx :100755 100755 9e70146... 9e70146... R100 CommitViewer.resx UI/CommitViewer.resx :100755 100755 90929fd... b40ff98... C091 RepoManager.resx UI/Form1.resx :100755 100755 90929fd... 90929fd... C100 PreferencesEditor.resx UI/PreferencesEditor.resx :100755 100755 90929fd... 90929fd... R100 PreferencesEditor.resx UI/RepoManager.resx :100755 100755 90929fd... 8535007... R097 RepoManager.resx UI/RepoTreeView.resx In this case several files had identical old images, and some kept that old image during the rename. Unfortunately because of the ordering of the files in the tree Git has decided to "rename" the PreferencesEditor.resx file to UI/RepoManager.resx, rather than renaming RepoManager.resx to UI/RepoManager.resx. Go Git. I'm wondering if we shouldn't play the game of trying to match delete/add pairs up by not only similarity, but also by path basename. In the case above its exactly what Govind thought should happen; he moved the file from one directory to another, and didn't even change its content during the move. But Git decided "better" to use a totally different file in the "rename". -- Shawn. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Basename matching during rename/copy detection 2007-06-21 3:06 Basename matching during rename/copy detection Shawn O. Pearce @ 2007-06-21 3:13 ` Junio C Hamano 2007-06-21 8:00 ` Andy Parkins 2007-06-21 3:42 ` Linus Torvalds 1 sibling, 1 reply; 44+ messages in thread From: Junio C Hamano @ 2007-06-21 3:13 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: git, govindsalinas "Shawn O. Pearce" <spearce@spearce.org> writes: > So Govind Salinas has found an interesting case in the rename > detection code: > > $ git clone git://repo.or.cz/Widgit.git > $ git diff -M --raw -r 192e^ 192e | grep .resx > :100755 000000 4c8ab79... 0000000... D Form1.resx > :100755 100755 9e70146... 9e70146... R100 CommitViewer.resx UI/CommitViewer.resx > :100755 100755 90929fd... b40ff98... C091 RepoManager.resx UI/Form1.resx > :100755 100755 90929fd... 90929fd... C100 PreferencesEditor.resx UI/PreferencesEditor.resx > :100755 100755 90929fd... 90929fd... R100 PreferencesEditor.resx UI/RepoManager.resx > :100755 100755 90929fd... 8535007... R097 RepoManager.resx UI/RepoTreeView.resx > > In this case several files had identical old images, and some > kept that old image during the rename. Unfortunately because of > the ordering of the files in the tree Git has decided to "rename" > the PreferencesEditor.resx file to UI/RepoManager.resx, rather than > renaming RepoManager.resx to UI/RepoManager.resx. Go Git. > > I'm wondering if we shouldn't play the game of trying to match > delete/add pairs up by not only similarity, but also by path > basename. In the case above its exactly what Govind thought should > happen; he moved the file from one directory to another, and didn't > even change its content during the move. But Git decided "better" > to use a totally different file in the "rename". Actually, git did not decide anything, and certainly not better. Having many "identical files" in the preimage is just stupid to begin with (if you know they are identical, why are you storing copies, instead of your build procedure to reuse the same file), so the algorithm did not bother finding a better match among "equals". I am not opposed to a patch that says "Ok, these two preimages have identical similarity score, *AND* indeed the preimages have the same contents --- we tiebreak them with other heuristics to help stupid projects better". And I can see basename similarity one of the useful heuristics you could use. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Basename matching during rename/copy detection 2007-06-21 3:13 ` Junio C Hamano @ 2007-06-21 8:00 ` Andy Parkins 2007-06-21 8:07 ` Junio C Hamano 0 siblings, 1 reply; 44+ messages in thread From: Andy Parkins @ 2007-06-21 8:00 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, Shawn O. Pearce, govindsalinas On Thursday 2007 June 21, Junio C Hamano wrote: > Having many "identical files" in the preimage is just stupid to > begin with (if you know they are identical, why are you storing > copies, instead of your build procedure to reuse the same file), > so the algorithm did not bother finding a better match among > "equals". That's a really poor argument; it's not git's place to impose restrictions on what is stored in it. What if it's not a build environment at all but a home directory that's being stored - should no one be allowed to store copies of files because it's "stupid"? What if it's a collection of images that all started out the same, but have gradually had detail added (which is actually what I do in my GUI programs for toolbar images)? What about files that are used as flags, and are all identically empty. None of those seems like an abuse of a VCS to me. In fact, I'd say it's one of git's strengths that a duplicate file in the working tree doesn't take up any extra space in the repository. Andy -- Dr Andy Parkins, M Eng (hons), MIET andyparkins@gmail.com ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Basename matching during rename/copy detection 2007-06-21 8:00 ` Andy Parkins @ 2007-06-21 8:07 ` Junio C Hamano 2007-06-21 9:50 ` Andy Parkins 0 siblings, 1 reply; 44+ messages in thread From: Junio C Hamano @ 2007-06-21 8:07 UTC (permalink / raw) To: Andy Parkins; +Cc: git, Shawn O. Pearce, govindsalinas Andy Parkins <andyparkins@gmail.com> writes: > On Thursday 2007 June 21, Junio C Hamano wrote: > >> Having many "identical files" in the preimage is just stupid to >> begin with (if you know they are identical, why are you storing >> copies, instead of your build procedure to reuse the same file), >> so the algorithm did not bother finding a better match among >> "equals". > > That's a really poor argument; it's not git's place to impose restrictions on > what is stored in it. It's not even an argument, nor an attempt to justify it. It was just an explanation of historical fact "It did not bother". Please re-read the final part of the message, which you omitted from your quote. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Basename matching during rename/copy detection 2007-06-21 8:07 ` Junio C Hamano @ 2007-06-21 9:50 ` Andy Parkins 2007-06-21 11:52 ` Johannes Schindelin 0 siblings, 1 reply; 44+ messages in thread From: Andy Parkins @ 2007-06-21 9:50 UTC (permalink / raw) To: git; +Cc: Junio C Hamano On Thursday 2007 June 21, Junio C Hamano wrote: > It's not even an argument, nor an attempt to justify it. It was > just an explanation of historical fact "It did not bother". > Please re-read the final part of the message, which you omitted > from your quote. I omitted it because I didn't object to that part :-) I appreciated (as always) your practicality in that what you proposed would let people keep their copies. What I was objecting to was the idea that any repository with duplicate files was "stupid". Andy -- Dr Andy Parkins, M Eng (hons), MIET andyparkins@gmail.com ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Basename matching during rename/copy detection 2007-06-21 9:50 ` Andy Parkins @ 2007-06-21 11:52 ` Johannes Schindelin 2007-06-21 12:44 ` Andy Parkins 0 siblings, 1 reply; 44+ messages in thread From: Johannes Schindelin @ 2007-06-21 11:52 UTC (permalink / raw) To: Andy Parkins; +Cc: git, Junio C Hamano Hi, On Thu, 21 Jun 2007, Andy Parkins wrote: > On Thursday 2007 June 21, Junio C Hamano wrote: > > > It's not even an argument, nor an attempt to justify it. It was just > > an explanation of historical fact "It did not bother". Please re-read > > the final part of the message, which you omitted from your quote. > > I appreciated (as always) your practicality in that what you proposed > would let people keep their copies. What I was objecting to was the > idea that any repository with duplicate files was "stupid". FWIW I find it stupid, too. Ciao, Dscho ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Basename matching during rename/copy detection 2007-06-21 11:52 ` Johannes Schindelin @ 2007-06-21 12:44 ` Andy Parkins 2007-06-21 12:53 ` Matthieu Moy 2007-06-21 13:22 ` Johannes Schindelin 0 siblings, 2 replies; 44+ messages in thread From: Andy Parkins @ 2007-06-21 12:44 UTC (permalink / raw) To: git; +Cc: Johannes Schindelin, Junio C Hamano On Thursday 2007 June 21, Johannes Schindelin wrote: > > would let people keep their copies. What I was objecting to was the > > idea that any repository with duplicate files was "stupid". > > FWIW I find it stupid, too. Thanks very much. Okay, as I've been put in the position of defending this, let me give you the use case that has cropped up for me to do this stupid thing. I've got a GUI program with a load of tool buttons. Each of those buttons will, in the final product, be unique images. When I write the program, I want to be able to refer to each of the button images in the correct place. e.g. setImage( NewButton, "path/to/new-button.png" ); setImage( OpenButton, "path/to/open-button.png" ); setImage( SaveButton, "path/to/save-button.png" ); Unfortunately, I can't draw. So, I open up gimp, draw a big red X and save it as new-button.png. Then I copy that file to open-button.png and save-button.png, knowing that at some point in the future, someone will come and replace those red-X images with something appropriate. All those images now go in the repository. Symbolic links are not an option, as it's got to be checkable out on Windows. Tell me what part of that is stupid? Andy -- Dr Andy Parkins, M Eng (hons), MIET andyparkins@gmail.com ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Basename matching during rename/copy detection 2007-06-21 12:44 ` Andy Parkins @ 2007-06-21 12:53 ` Matthieu Moy 2007-06-21 13:10 ` Jeff King 2007-06-21 13:18 ` Johannes Schindelin 2007-06-21 13:22 ` Johannes Schindelin 1 sibling, 2 replies; 44+ messages in thread From: Matthieu Moy @ 2007-06-21 12:53 UTC (permalink / raw) To: Andy Parkins; +Cc: git, Johannes Schindelin, Junio C Hamano Andy Parkins <andyparkins@gmail.com> writes: > On Thursday 2007 June 21, Johannes Schindelin wrote: > >> > would let people keep their copies. What I was objecting to was the >> > idea that any repository with duplicate files was "stupid". >> >> FWIW I find it stupid, too. > > Thanks very much. Okay, as I've been put in the position of defending this, > let me give you the use case that has cropped up for me to do this stupid > thing. Well, why look so far to find an example of people having identical files in their tree? $ cd git $ git-ls-files -z | xargs -0 md5sum | cut -f 1 -d ' ' | wc -l 973 $ git-ls-files -z | xargs -0 md5sum | cut -f 1 -d ' ' | sort | uniq | wc -l 964 $ -- Matthieu ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Basename matching during rename/copy detection 2007-06-21 12:53 ` Matthieu Moy @ 2007-06-21 13:10 ` Jeff King 2007-06-21 13:18 ` Johannes Schindelin 1 sibling, 0 replies; 44+ messages in thread From: Jeff King @ 2007-06-21 13:10 UTC (permalink / raw) To: git; +Cc: Andy Parkins, Johannes Schindelin, Junio C Hamano On Thu, Jun 21, 2007 at 02:53:32PM +0200, Matthieu Moy wrote: > Well, why look so far to find an example of people having identical > files in their tree? > > $ cd git > $ git-ls-files -z | xargs -0 md5sum | cut -f 1 -d ' ' | wc -l > 973 > $ git-ls-files -z | xargs -0 md5sum | cut -f 1 -d ' ' | sort | uniq | wc -l > 964 md5? What is this, CVS? How about: git-ls-files -s | cut -d' ' -f2 | sort | uniq -d | wc -l Your pipeline will also list files in the working directory, which can inflate the number of duplicates (note that git-foo.sh and git-foo will have the same content). -Peff PS Please don't take this to mean I think duplicate files are stupid; I think they can be quite useful. I just wanted to nitpick your shell command. :) ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Basename matching during rename/copy detection 2007-06-21 12:53 ` Matthieu Moy 2007-06-21 13:10 ` Jeff King @ 2007-06-21 13:18 ` Johannes Schindelin 2007-06-21 13:25 ` Matthieu Moy 1 sibling, 1 reply; 44+ messages in thread From: Johannes Schindelin @ 2007-06-21 13:18 UTC (permalink / raw) To: Matthieu Moy; +Cc: Andy Parkins, git, Junio C Hamano Hi, On Thu, 21 Jun 2007, Matthieu Moy wrote: > Well, why look so far to find an example of people having identical > files in their tree? > > $ cd git > $ git-ls-files -z | xargs -0 md5sum | cut -f 1 -d ' ' | wc -l > 973 > $ git-ls-files -z | xargs -0 md5sum | cut -f 1 -d ' ' | sort | uniq | wc -l > 964 > $ Have you checked the files? They are all some blobs in the test scripts. Ciao, Dscho ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Basename matching during rename/copy detection 2007-06-21 13:18 ` Johannes Schindelin @ 2007-06-21 13:25 ` Matthieu Moy 2007-06-21 13:52 ` Johannes Schindelin 0 siblings, 1 reply; 44+ messages in thread From: Matthieu Moy @ 2007-06-21 13:25 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Andy Parkins, git, Junio C Hamano Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > Have you checked the files? They are all some blobs in the test scripts. Yes, but how does it make any difference? You still want git to manage them properly, don't you? -- Matthieu ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Basename matching during rename/copy detection 2007-06-21 13:25 ` Matthieu Moy @ 2007-06-21 13:52 ` Johannes Schindelin 2007-06-21 15:37 ` Steven Grimm 0 siblings, 1 reply; 44+ messages in thread From: Johannes Schindelin @ 2007-06-21 13:52 UTC (permalink / raw) To: Matthieu Moy; +Cc: Andy Parkins, git, Junio C Hamano Hi, On Thu, 21 Jun 2007, Matthieu Moy wrote: > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > Have you checked the files? They are all some blobs in the test scripts. > > Yes, but how does it make any difference? You still want git to manage > them properly, don't you? Yes. And Git explicitely allows what I call stupid. And yes, those _identical_ files in the test suit should probably all be folded into single files, and the places where they are used should reference _that_ single instance. Ciao, Dscho ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Basename matching during rename/copy detection 2007-06-21 13:52 ` Johannes Schindelin @ 2007-06-21 15:37 ` Steven Grimm 2007-06-21 15:53 ` Johannes Schindelin 0 siblings, 1 reply; 44+ messages in thread From: Steven Grimm @ 2007-06-21 15:37 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Matthieu Moy, Andy Parkins, git, Junio C Hamano Johannes Schindelin wrote: > Yes. And Git explicitely allows what I call stupid. And yes, those > _identical_ files in the test suit should probably all be folded into > single files, and the places where they are used should reference _that_ > single instance. > Two files that are identical in the current revision have not necessarily been identical from the beginning. Doing what you suggest will cause you to lose the history of all but one of those files. Files can absolutely become identical in the real world. I know that for a fact because it happened to me just this week (see my "Directory renames" message from a few days ago.) Are you seriously suggesting that every time I unpack an update from a third party, I should go through it and see if they have changed any files such that the contents now match another file in my repository, and if so, I should remove all but one of the copies from my repository and have a build system create it instead? Then undo that work when I unpack another update and the files are no longer identical? Well, no, I know you're not suggesting that, but it's the logical conclusion of the "it's stupid to ever have duplicate files" philosophy. While that approach certainly makes life easier for the version control system, it doesn't exactly make life easier for the *developer*, which is kind of the whole point of why we're here. -Steve ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Basename matching during rename/copy detection 2007-06-21 15:37 ` Steven Grimm @ 2007-06-21 15:53 ` Johannes Schindelin 2007-06-21 16:57 ` Steven Grimm 0 siblings, 1 reply; 44+ messages in thread From: Johannes Schindelin @ 2007-06-21 15:53 UTC (permalink / raw) To: Steven Grimm; +Cc: Matthieu Moy, Andy Parkins, git, Junio C Hamano Hi, On Thu, 21 Jun 2007, Steven Grimm wrote: > Johannes Schindelin wrote: > > Yes. And Git explicitely allows what I call stupid. And yes, those > > _identical_ files in the test suit should probably all be folded into > > single files, and the places where they are used should reference _that_ > > single instance. > > > > Two files that are identical in the current revision have not necessarily > been identical from the beginning. Doing what you suggest will cause you to > lose the history of all but one of those files. > > Files can absolutely become identical in the real world. I know that for a > fact because it happened to me just this week (see my "Directory renames" > message from a few days ago.) No, that message did not convince me. It was way too short on the side of facts. And no, I do not think that two unrelated files can get exactly the same content. Be that as may, even _if_ there were such a case, I'd still try to reuse the same file in the working directory. Just because Git can deal efficiently with millions of identical files does not mean that a working directory can, or worse, human developers. Ciao, Dscho ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Basename matching during rename/copy detection 2007-06-21 15:53 ` Johannes Schindelin @ 2007-06-21 16:57 ` Steven Grimm 0 siblings, 0 replies; 44+ messages in thread From: Steven Grimm @ 2007-06-21 16:57 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Matthieu Moy, Andy Parkins, git, Junio C Hamano Johannes Schindelin wrote: > No, that message did not convince me. It was way too short on the side of > facts. > Short of posting multiple historical versions of the third-party source code in question, I'm not sure what I can do to convince you. And I'd rather not violate the license agreement on that code. I would have thought, though, that the fact that I supplied a detailed, reproducible test case with obviously broken behavior would itself have been pretty convincing. The fact that not all projects contain any short files, or any files whose contents have ever been identical, does not cause git's behavior in that test case to be correct. "It's broken and unfixable" is one thing; "It's broken and we don't care" is another; and "It's broken and we care but it's not at the top of anyone's priority list to fix" is something else again. All of those are fine, but "If it's broken, you are stupid" and "If it's broken, it's a sign your project isn't real" are not. Or, to take another tack on this entirely, it is not the proper function of a version control system to dictate the contents of the projects under its control. It should take whatever we humans throw at it and reproduce those contents faithfully with coherent, non-jumbled history. It should do so even if what we're throwing at it is completely stupid. By the way, I'll toss out one more example of legitimate duplicate files, though admittedly one where you might not care so much about history jumbling: if you have a project that makes use of two GPL libraries or utilities whose source you want to keep locally, e.g. because you are making local modifications, you will have two copies of the GNU "COPYING" file. Neither one produced by a build system (or at least, not by *your* build system) and you are not permitted by the terms of the GPL to publish a copy of either piece of software without a verbatim copy of its license -- it says so right in section 1 of the GPL (the "keep intact" wording.) Removing one of those copies and expecting a build system to reconstruct it after someone clones your repository would arguably be a violation of the GPL. -Steve ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Basename matching during rename/copy detection 2007-06-21 12:44 ` Andy Parkins 2007-06-21 12:53 ` Matthieu Moy @ 2007-06-21 13:22 ` Johannes Schindelin 1 sibling, 0 replies; 44+ messages in thread From: Johannes Schindelin @ 2007-06-21 13:22 UTC (permalink / raw) To: Andy Parkins; +Cc: git, Junio C Hamano Hi, On Thu, 21 Jun 2007, Andy Parkins wrote: > I open up gimp, draw a big red X and save it as new-button.png. Then I > copy that file to open-button.png and save-button.png, knowing that at > some point in the future, someone will come and replace those red-X > images with something appropriate. So you have a couple of identical files in your repo, which are placeholders. That is quite different from what I criticised of being stupid. Well, I would not have checked in the files in your place, but only one: dumb-red-X.png Then, my Makefile would have checked for the existence of, say, my-wonderful-ok-button.png, and if it does not exist yet, copy dumb-red-X.png to it. Now, when somebody comes along, paining the prettiest ok button I ever saw, I copy that over the copy of dumb-red-X.png, and check it in. It has the further bonus that I know exactly which buttons I have to find a suck^Wgifted artist for. Ciao, Dscho ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: Basename matching during rename/copy detection 2007-06-21 3:06 Basename matching during rename/copy detection Shawn O. Pearce 2007-06-21 3:13 ` Junio C Hamano @ 2007-06-21 3:42 ` Linus Torvalds 2007-06-21 11:52 ` [PATCH] diffcore-rename: favour identical basenames Johannes Schindelin 1 sibling, 1 reply; 44+ messages in thread From: Linus Torvalds @ 2007-06-21 3:42 UTC (permalink / raw) To: Shawn O. Pearce; +Cc: git, govindsalinas On Wed, 20 Jun 2007, Shawn O. Pearce wrote: > > I'm wondering if we shouldn't play the game of trying to match > delete/add pairs up by not only similarity, but also by path > basename. I think we should just consider the basename as an "added similarity bonus". IOW, we currently sort purely by data similarity, but how about just adding a small increment for "same base name". We could make it actually use the similarity of the filename itself as the basis for the increment, which would be even better, but the trivial thing is to do something like --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -186,8 +186,11 @@ static int estimate_similarity(struct diff_filespec *src, */ if (!dst->size) score = 0; /* should not happen */ - else + else { score = (int)(src_copied * MAX_SCORE / max_size); + if (basename_same(src, dst)) + score++; + } return score; } and just implement that "basename_same()" function. Or something. I do agree that the filename logically can and probably _should_ count towards the "similarity". The filename _is_ part of the data in the global notion of "content", after all. It's the "index" to the data. Linus ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH] diffcore-rename: favour identical basenames 2007-06-21 3:42 ` Linus Torvalds @ 2007-06-21 11:52 ` Johannes Schindelin 2007-06-21 13:19 ` Jeff King 2007-06-23 5:44 ` [PATCH] diffcore-rename: favour identical basenames Junio C Hamano 0 siblings, 2 replies; 44+ messages in thread From: Johannes Schindelin @ 2007-06-21 11:52 UTC (permalink / raw) To: Linus Torvalds; +Cc: Shawn O. Pearce, git, govindsalinas, gitster When there are several candidates for a rename source, and one of them has an identical basename to the rename target, take that one. Noticed by Govind Salinas, posted by Shawn O. Pearce, partial patch by Linus Torvalds. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- On Wed, 20 Jun 2007, Linus Torvalds wrote: > I think we should just consider the basename as an "added > similarity bonus". > > IOW, we currently sort purely by data similarity, but how about > just adding a small increment for "same base name". > > [patch suggestion snipped, since it is identical what is below] How 'bout this? diffcore-rename.c | 33 ++++++++++++++++++++++++++++++++- t/t4001-diff-rename.sh | 13 +++++++++++++ 2 files changed, 45 insertions(+), 1 deletions(-) diff --git a/diffcore-rename.c b/diffcore-rename.c index 93c40d9..79c984c 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -119,6 +119,21 @@ static int is_exact_match(struct diff_filespec *src, return 0; } +static int basename_same(struct diff_filespec *src, struct diff_filespec *dst) +{ + int src_len = strlen(src->path), dst_len = strlen(dst->path); + while (src_len && dst_len) { + char c1 = src->path[--src_len]; + char c2 = dst->path[--dst_len]; + if (c1 != c2) + return 0; + if (c1 == '/') + return 1; + } + return (!src_len || src->path[src_len - 1] == '/') && + (!dst_len || dst->path[dst_len - 1] == '/'); +} + struct diff_score { int src; /* index in rename_src */ int dst; /* index in rename_dst */ @@ -186,8 +201,11 @@ static int estimate_similarity(struct diff_filespec *src, */ if (!dst->size) score = 0; /* should not happen */ - else + else { score = (int)(src_copied * MAX_SCORE / max_size); + if (basename_same(src, dst)) + score++; + } return score; } @@ -295,9 +313,22 @@ void diffcore_rename(struct diff_options *options) if (rename_dst[i].pair) continue; /* dealt with an earlier round */ for (j = 0; j < rename_src_nr; j++) { + int k; struct diff_filespec *one = rename_src[j].one; if (!is_exact_match(one, two, contents_too)) continue; + + /* see if there is a basename match, too */ + for (k = j; k < rename_src_nr; k++) { + one = rename_src[k].one; + if (basename_same(one, two) && + is_exact_match(one, two, + contents_too)) { + j = k; + break; + } + } + record_rename_pair(i, j, (int)MAX_SCORE); rename_count++; break; /* we are done with this entry */ diff --git a/t/t4001-diff-rename.sh b/t/t4001-diff-rename.sh index 2e3c20d..90c085f 100755 --- a/t/t4001-diff-rename.sh +++ b/t/t4001-diff-rename.sh @@ -64,4 +64,17 @@ test_expect_success \ 'validate the output.' \ 'compare_diff_patch current expected' +test_expect_success 'favour same basenames over different ones' ' + cp path1 another-path && + git add another-path && + git commit -m 1 && + git rm path1 && + mkdir subdir && + git mv another-path subdir/path1 && + git runstatus | grep "renamed: .*path1 -> subdir/path1"' + +test_expect_success 'favour same basenames even with minor differences' ' + git show HEAD:path1 | sed "s/15/16/" > subdir/path1 && + git runstatus | grep "renamed: .*path1 -> subdir/path1"' + test_done -- 1.5.2.2.2822.g027a6-dirty ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH] diffcore-rename: favour identical basenames 2007-06-21 11:52 ` [PATCH] diffcore-rename: favour identical basenames Johannes Schindelin @ 2007-06-21 13:19 ` Jeff King 2007-06-21 14:03 ` Johannes Schindelin ` (2 more replies) 2007-06-23 5:44 ` [PATCH] diffcore-rename: favour identical basenames Junio C Hamano 1 sibling, 3 replies; 44+ messages in thread From: Jeff King @ 2007-06-21 13:19 UTC (permalink / raw) To: Johannes Schindelin Cc: Linus Torvalds, Shawn O. Pearce, git, govindsalinas, gitster On Thu, Jun 21, 2007 at 12:52:11PM +0100, Johannes Schindelin wrote: > When there are several candidates for a rename source, and one of them > has an identical basename to the rename target, take that one. That's a reasonable heuristic, but it unfortunately won't match simple things like: i386_widget.c -> arch/i386/widget.c You really don't care about "is this a good match" as much as providing an order to potential matches. I think something like a Levenshtein distance between the full pathnames would give good results, and would cover almost every situation that the basename heuristic would (there are a few exceptions, like getting "file.c" from either "file2.c" or "foo/file.c", but that seems kind of pathological). Sorry to post without a patch, but I don't have time right this second. I'll add it to the end of my (ever-growing) todo list if you think it's a good idea and don't do it yourself. :) -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] diffcore-rename: favour identical basenames 2007-06-21 13:19 ` Jeff King @ 2007-06-21 14:03 ` Johannes Schindelin 2007-06-21 16:20 ` Linus Torvalds 2007-06-22 1:14 ` Johannes Schindelin 2 siblings, 0 replies; 44+ messages in thread From: Johannes Schindelin @ 2007-06-21 14:03 UTC (permalink / raw) To: Jeff King; +Cc: Linus Torvalds, Shawn O. Pearce, git, govindsalinas, gitster Hi, On Thu, 21 Jun 2007, Jeff King wrote: > On Thu, Jun 21, 2007 at 12:52:11PM +0100, Johannes Schindelin wrote: > > > When there are several candidates for a rename source, and one of them > > has an identical basename to the rename target, take that one. > > That's a reasonable heuristic, but it unfortunately won't match simple > things like: > > i386_widget.c -> arch/i386/widget.c That's right. But every heuristic falls down eventually. Personally, I think basename_same() is good enough, even if the technical challenge to implement a small enough Levenshtein, which still respects directory boundaries somehow (and not just throws them away). Besides, Levenshtein would introduce a ranking, not a boolean value like basename_same(). And that complicates the code. All in all, I'd say Levenshtein is not worth the _result_. Ciao, Dscho ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] diffcore-rename: favour identical basenames 2007-06-21 13:19 ` Jeff King 2007-06-21 14:03 ` Johannes Schindelin @ 2007-06-21 16:20 ` Linus Torvalds 2007-06-21 17:52 ` Junio C Hamano 2007-06-22 15:19 ` Andy Parkins 2007-06-22 1:14 ` Johannes Schindelin 2 siblings, 2 replies; 44+ messages in thread From: Linus Torvalds @ 2007-06-21 16:20 UTC (permalink / raw) To: Jeff King Cc: Johannes Schindelin, Shawn O. Pearce, git, govindsalinas, gitster On Thu, 21 Jun 2007, Jeff King wrote: > On Thu, Jun 21, 2007 at 12:52:11PM +0100, Johannes Schindelin wrote: > > > When there are several candidates for a rename source, and one of them > > has an identical basename to the rename target, take that one. > > That's a reasonable heuristic, but it unfortunately won't match simple > things like: > > i386_widget.c -> arch/i386/widget.c We'e also had things like arch/i386/kernel/pci-pc.c -> arch/i386/kernel/pci/common.c so it's not always the ending of a file that is unchanged, but you still often have some "similarity" of the name (ie the "pci" substring is still common there). So I agree that we can be even better about the heuristics. I don't know how much it *matters* in practice. I do agree with the people who argue that you simply shouldn't depend on these kinds of things, and if you have identical files, and move them around, you really are getting behaviour that doesn't matter. The files are *identical* for christ sake! Following their history, it doesn't matter *which* base you follow, since regardless, they've come to the same point! So in that sense, the current git behaviour is actually perfectly fine. At the same time, I'll argue from a totally theoretical point that the "filename" is obviously part of the data in the tree, and as such, a similarity comparison that takes only the data into account is a bit limited. So while I don't think a user should really care, I also think that keeping the filename as part of the similarity analysis is actually a perfectly logical and valid thing to do withing the git policy of "content is king". The filename *is* part of the content, and it's doubly so when you think about a rename or copy operation, where the whole point of the exercise is as much about the filename as about the data inside the file. Linus ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] diffcore-rename: favour identical basenames 2007-06-21 16:20 ` Linus Torvalds @ 2007-06-21 17:52 ` Junio C Hamano 2007-06-21 18:24 ` Linus Torvalds 2007-06-22 15:19 ` Andy Parkins 1 sibling, 1 reply; 44+ messages in thread From: Junio C Hamano @ 2007-06-21 17:52 UTC (permalink / raw) To: Linus Torvalds Cc: Jeff King, Johannes Schindelin, Shawn O. Pearce, git, govindsalinas, gitster Linus Torvalds <torvalds@linux-foundation.org> writes: > We'e also had things like > > arch/i386/kernel/pci-pc.c -> arch/i386/kernel/pci/common.c > > so it's not always the ending of a file that is unchanged, but you still > often have some "similarity" of the name (ie the "pci" substring is still > common there). This is not an example to draw very useful conclusions, is it? The heuristics to say '-pc => common' is a more likely rename than '-obscure-arch => common' heavily depends on human intelligence in the context of a particular project, the kernel, where there are rules such as "peripherals are tested most widely on PC architectures, so assume that the vendors might have tested their stuff only on PCs". But I do agree that not limiting to basename has values. Taking example from the "I cannot draw so here is a red big X", it is quite possible that two red big X's are replaced with properly rendered icons, while their format modified, like so: images/ok-button.gif => images/buttons/ok.png images/cancel-button.gif => images/buttons/cancel.png This suggests that we might be able to look around to see what other rename src/target candidate files there are, so that we can figure out if there is a common pattern (i.e. in the above example, "patsubst images/%-button.gif,images/buttons/%.png" is what is going on). If we find such a pattern, we can base the assignment of "basename similarity bonus" on that pattern. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] diffcore-rename: favour identical basenames 2007-06-21 17:52 ` Junio C Hamano @ 2007-06-21 18:24 ` Linus Torvalds 0 siblings, 0 replies; 44+ messages in thread From: Linus Torvalds @ 2007-06-21 18:24 UTC (permalink / raw) To: Junio C Hamano Cc: Jeff King, Johannes Schindelin, Shawn O. Pearce, git, govindsalinas On Thu, 21 Jun 2007, Junio C Hamano wrote: > > This is not an example to draw very useful conclusions, is it? > > The heuristics to say '-pc => common' is a more likely rename > than '-obscure-arch => common' heavily depends on human > intelligence in the context Oh, absolutely. I'm just saying that *if* you see two equally weighed content moves, if you then prefer the one that has more in common with the name, that's likely the right choice. In the actual example I gave, there was no ambiguity: the file contents were very obvious. But let's sat that you happened to have an example of two files with 100% identical content that moved, and you had the files -arch/i386/kernel/pci-pc.c -arch/alpha/kernel/pci-pc.c +arch/i386/kernel/pci/common.c +arch/alpha/kernel/pci/common.c to match up, how would you do it? Again: they're all identical files: we can obviously agree that two files got renamed, but what is the pairing. I'd suggest that if you do it by matching up the similarity of the filenames (not necessarily "exact same basename"), you'd actually catch it. In this case, they all have "pci" in them, but the "alpha" similarity would make you select the right one. Similarly, in some other cases, the "pci" might be the thing they have in common, and might be the thing that decides that "oh, those two filenames look like they might be more of a better pair". And yes, all of this would trigger only if the file data content match is non-conclusive. The file data is *more* important, but that doesn't mean that the file name similarity is *totally* unimportant either. Linus ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] diffcore-rename: favour identical basenames 2007-06-21 16:20 ` Linus Torvalds 2007-06-21 17:52 ` Junio C Hamano @ 2007-06-22 15:19 ` Andy Parkins 2007-06-22 15:28 ` Johannes Schindelin 1 sibling, 1 reply; 44+ messages in thread From: Andy Parkins @ 2007-06-22 15:19 UTC (permalink / raw) To: git Cc: Linus Torvalds, Jeff King, Johannes Schindelin, Shawn O. Pearce, govindsalinas, gitster On Thursday 2007 June 21, Linus Torvalds wrote: > The files are *identical* for christ sake! Following their history, it > doesn't matter *which* base you follow, since regardless, they've come to > the same point! > > So in that sense, the current git behaviour is actually perfectly fine. Perhaps not. (Please don't read this as meaning I disagree with your favour-the-identical-filename patch at all - in fact I think that would address the case I give below). What if two files with different filenames and content converge at some point in history, then diverge again? If git is tracking renames merely by content and picks the wrong one, then the history of fileA suddenly becomes the history of fileB. Andy -- Dr Andy Parkins, M Eng (hons), MIET andyparkins@gmail.com ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] diffcore-rename: favour identical basenames 2007-06-22 15:19 ` Andy Parkins @ 2007-06-22 15:28 ` Johannes Schindelin 2007-06-22 17:51 ` Aidan Van Dyk 0 siblings, 1 reply; 44+ messages in thread From: Johannes Schindelin @ 2007-06-22 15:28 UTC (permalink / raw) To: Andy Parkins Cc: git, Linus Torvalds, Jeff King, Shawn O. Pearce, govindsalinas, gitster Hi, On Fri, 22 Jun 2007, Andy Parkins wrote: > What if two files with different filenames and content converge at some > point in history, then diverge again? If git is tracking renames merely > by content and picks the wrong one, then the history of fileA suddenly > becomes the history of fileB. This is becoming highly ethereal. Like "I could imagine that some day in future, some person could devise a device, that might allow you to do something that I can not explain, because I have not even thought of it". IOW show me a reasonable example, and we'll talk business. Ciao, Dscho ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] diffcore-rename: favour identical basenames 2007-06-22 15:28 ` Johannes Schindelin @ 2007-06-22 17:51 ` Aidan Van Dyk 0 siblings, 0 replies; 44+ messages in thread From: Aidan Van Dyk @ 2007-06-22 17:51 UTC (permalink / raw) To: Johannes Schindelin Cc: Andy Parkins, git, Linus Torvalds, Jeff King, Shawn O. Pearce, govindsalinas, gitster [-- Attachment #1: Type: text/plain, Size: 1984 bytes --] * Johannes Schindelin <Johannes.Schindelin@gmx.de> [070622 13:34]: > Hi, > > On Fri, 22 Jun 2007, Andy Parkins wrote: > > > What if two files with different filenames and content converge at some > > point in history, then diverge again? If git is tracking renames merely > > by content and picks the wrong one, then the history of fileA suddenly > > becomes the history of fileB. > > This is becoming highly ethereal. Like "I could imagine that some day in > future, some person could devise a device, that might allow you to do > something that I can not explain, because I have not even thought of it". > > IOW show me a reasonable example, and we'll talk business. The one time the "content-only" rename tracking bit me was the after a merge, resulting in conflicts that were un-nessesary: -*-*-*-*-A-B-C-D \ *-E-* At A, there were 2 files: dir1/foo dir2/foo They were template files that happened to be the same in 2 themes. In E, "foo" was renamed to "foo-bar" in all the template directories. Git detected this not as 2 renames, but as: dir1/foo-bar renamed from dir1/foo dir2/foo-bar copied from dir1/foo dir2/foo deleted Meanwhile, work was happening in B, C, and D, changing foo in both templates identically. When the branch with E was merged back into ABCD, there was a merge conflict with dir2/foo being deleted in one branch, and editit in the other. In this case, the simple "basename" comparison wouldn't have even been enough. But the merge was easy enough (because no edits were made in the E branch to those files, just the renames) that I could resolve it easily. I don't know if preventing this easy-to-fix merge conflict is worth the necessary "likeness of names" necessary to avoid it... a. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave. [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] diffcore-rename: favour identical basenames 2007-06-21 13:19 ` Jeff King 2007-06-21 14:03 ` Johannes Schindelin 2007-06-21 16:20 ` Linus Torvalds @ 2007-06-22 1:14 ` Johannes Schindelin 2007-06-22 5:41 ` Jeff King 2007-06-22 7:17 ` Johannes Sixt 2 siblings, 2 replies; 44+ messages in thread From: Johannes Schindelin @ 2007-06-22 1:14 UTC (permalink / raw) To: Jeff King; +Cc: Linus Torvalds, Shawn O. Pearce, git, govindsalinas, gitster Hi, On Thu, 21 Jun 2007, Jeff King wrote: > I think something like a Levenshtein distance between the full pathnames > would give good results, and would cover almost every situation that the > basename heuristic would (there are a few exceptions, like getting > "file.c" from either "file2.c" or "foo/file.c", but that seems kind of > pathological). Well, now you only have to test if it makes sense: -- snipsnap -- [PATCH] diffcore-rename: replace basename_same() heuristics by Levenshtein Instead of insisting on identical basenames, try the levenshtein distance. Basically, if there are multiple rename source candidates, take the one with the smallest Levenshtein distance. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> --- The dangerous thing is that the score can get negative now. Makefile | 4 ++-- diffcore-rename.c | 42 +++++++++++++++--------------------------- levenshtein.c | 39 +++++++++++++++++++++++++++++++++++++++ levenshtein.h | 6 ++++++ 4 files changed, 62 insertions(+), 29 deletions(-) create mode 100644 levenshtein.c create mode 100644 levenshtein.h diff --git a/Makefile b/Makefile index 74b69fb..e015833 100644 --- a/Makefile +++ b/Makefile @@ -303,12 +303,12 @@ LIB_H = \ run-command.h strbuf.h tag.h tree.h git-compat-util.h revision.h \ tree-walk.h log-tree.h dir.h path-list.h unpack-trees.h builtin.h \ utf8.h reflog-walk.h patch-ids.h attr.h decorate.h progress.h \ - mailmap.h remote.h + mailmap.h remote.h levenshtein.h DIFF_OBJS = \ diff.o diff-lib.o diffcore-break.o diffcore-order.o \ diffcore-pickaxe.o diffcore-rename.o tree-diff.o combine-diff.o \ - diffcore-delta.o log-tree.o + diffcore-delta.o log-tree.o levenshtein.o LIB_OBJS = \ blob.o commit.o connect.o csum-file.o cache-tree.o base85.o \ diff --git a/diffcore-rename.c b/diffcore-rename.c index 79c984c..41448c9 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -4,6 +4,7 @@ #include "cache.h" #include "diff.h" #include "diffcore.h" +#include "levenshtein.h" /* Table of rename/copy destinations */ @@ -119,21 +120,6 @@ static int is_exact_match(struct diff_filespec *src, return 0; } -static int basename_same(struct diff_filespec *src, struct diff_filespec *dst) -{ - int src_len = strlen(src->path), dst_len = strlen(dst->path); - while (src_len && dst_len) { - char c1 = src->path[--src_len]; - char c2 = dst->path[--dst_len]; - if (c1 != c2) - return 0; - if (c1 == '/') - return 1; - } - return (!src_len || src->path[src_len - 1] == '/') && - (!dst_len || dst->path[dst_len - 1] == '/'); -} - struct diff_score { int src; /* index in rename_src */ int dst; /* index in rename_dst */ @@ -201,11 +187,9 @@ static int estimate_similarity(struct diff_filespec *src, */ if (!dst->size) score = 0; /* should not happen */ - else { - score = (int)(src_copied * MAX_SCORE / max_size); - if (basename_same(src, dst)) - score++; - } + else + score = (int)(src_copied * MAX_SCORE / max_size) + - levenshtein(src->path, dst->path); return score; } @@ -313,20 +297,24 @@ void diffcore_rename(struct diff_options *options) if (rename_dst[i].pair) continue; /* dealt with an earlier round */ for (j = 0; j < rename_src_nr; j++) { - int k; + int k, distance; struct diff_filespec *one = rename_src[j].one; if (!is_exact_match(one, two, contents_too)) continue; + distance = levenshtein(one->path, two->path); /* see if there is a basename match, too */ for (k = j; k < rename_src_nr; k++) { + int d2; one = rename_src[k].one; - if (basename_same(one, two) && - is_exact_match(one, two, - contents_too)) { - j = k; - break; - } + if (!is_exact_match(one, two, + contents_too)) + continue; + d2 = levenshtein(one->path, two->path); + if (d2 > distance) + continue; + distance = d2; + j = k; } record_rename_pair(i, j, (int)MAX_SCORE); diff --git a/levenshtein.c b/levenshtein.c new file mode 100644 index 0000000..80ef860 --- /dev/null +++ b/levenshtein.c @@ -0,0 +1,39 @@ +#include "cache.h" +#include "levenshtein.h" + +int levenshtein(const char *string1, const char *string2) +{ + int len1 = strlen(string1), len2 = strlen(string2); + int *row1 = xmalloc(sizeof(int) * (len2 + 1)); + int *row2 = xmalloc(sizeof(int) * (len2 + 1)); + int i, j; + + for (j = 1; j <= len2; j++) + row1[j] = j; + for (i = 0; i < len1; i++) { + int *dummy; + + row2[0] = i + 1; + for (j = 0; j < len2; j++) { + /* substitution */ + row2[j + 1] = row1[j] + (string1[i] != string2[j]); + /* insertion */ + if (row2[j + 1] > row1[j + 1] + 1) + row2[j + 1] = row1[j + 1] + 1; + /* deletion */ + if (row2[j + 1] > row2[j] + 1) + row2[j + 1] = row2[j] + 1; + } + + dummy = row1; + row1 = row2; + row2 = dummy; + } + + i = row1[len2]; + free(row1); + free(row2); + + return i; +} + diff --git a/levenshtein.h b/levenshtein.h new file mode 100644 index 0000000..74a6626 --- /dev/null +++ b/levenshtein.h @@ -0,0 +1,6 @@ +#ifndef LEVENSHTEIN_H +#define LEVENSHTEIN_H + +int levenshtein(const char *string1, const char *string2); + +#endif -- 1.5.2.2.2822.g027a6-dirty ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH] diffcore-rename: favour identical basenames 2007-06-22 1:14 ` Johannes Schindelin @ 2007-06-22 5:41 ` Jeff King 2007-06-22 10:22 ` Johannes Schindelin 2007-06-22 7:17 ` Johannes Sixt 1 sibling, 1 reply; 44+ messages in thread From: Jeff King @ 2007-06-22 5:41 UTC (permalink / raw) To: Johannes Schindelin Cc: Linus Torvalds, Shawn O. Pearce, git, govindsalinas, gitster On Fri, Jun 22, 2007 at 02:14:43AM +0100, Johannes Schindelin wrote: > @@ -313,20 +297,24 @@ void diffcore_rename(struct diff_options *options) > if (rename_dst[i].pair) > continue; /* dealt with an earlier round */ > for (j = 0; j < rename_src_nr; j++) { > - int k; > + int k, distance; > struct diff_filespec *one = rename_src[j].one; > if (!is_exact_match(one, two, contents_too)) > continue; > > + distance = levenshtein(one->path, two->path); > /* see if there is a basename match, too */ > for (k = j; k < rename_src_nr; k++) { This loop can start at k = j+1, since otherwise we are just checking rename_src[j] against itself. > +int levenshtein(const char *string1, const char *string2) > +{ > + int len1 = strlen(string1), len2 = strlen(string2); > + int *row1 = xmalloc(sizeof(int) * (len2 + 1)); > + int *row2 = xmalloc(sizeof(int) * (len2 + 1)); > + int i, j; > + > + for (j = 1; j <= len2; j++) > + row1[j] = j; This loop must start at j=0, not j=1; otherwise you have an undefined value in row1[0], which gets read when setting row2[1], and you get a totally meaningless distance (I got -1209667248 on my test case!). -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] diffcore-rename: favour identical basenames 2007-06-22 5:41 ` Jeff King @ 2007-06-22 10:22 ` Johannes Schindelin 0 siblings, 0 replies; 44+ messages in thread From: Johannes Schindelin @ 2007-06-22 10:22 UTC (permalink / raw) To: Jeff King; +Cc: Linus Torvalds, Shawn O. Pearce, git, govindsalinas, gitster Hi, On Fri, 22 Jun 2007, Jeff King wrote: > On Fri, Jun 22, 2007 at 02:14:43AM +0100, Johannes Schindelin wrote: > > > @@ -313,20 +297,24 @@ void diffcore_rename(struct diff_options *options) > > if (rename_dst[i].pair) > > continue; /* dealt with an earlier round */ > > for (j = 0; j < rename_src_nr; j++) { > > - int k; > > + int k, distance; > > struct diff_filespec *one = rename_src[j].one; > > if (!is_exact_match(one, two, contents_too)) > > continue; > > > > + distance = levenshtein(one->path, two->path); > > /* see if there is a basename match, too */ > > for (k = j; k < rename_src_nr; k++) { > > This loop can start at k = j+1, since otherwise we are just checking > rename_src[j] against itself. Right. > > +int levenshtein(const char *string1, const char *string2) > > +{ > > + int len1 = strlen(string1), len2 = strlen(string2); > > + int *row1 = xmalloc(sizeof(int) * (len2 + 1)); > > + int *row2 = xmalloc(sizeof(int) * (len2 + 1)); > > + int i, j; > > + > > + for (j = 1; j <= len2; j++) > > + row1[j] = j; > > This loop must start at j=0, not j=1; otherwise you have an undefined > value in row1[0], which gets read when setting row2[1], and you get > a totally meaningless distance (I got -1209667248 on my test case!). Sorry for that. I originally had an xcalloc in there, and did not look at that loop afterwards. And I completely forgot that on my laptop (on which I did this patch), I had forgotten to add ALL_CFLAGS += -DXMALLOC_POISON=1 to config.mak. Ciao, Dscho ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] diffcore-rename: favour identical basenames 2007-06-22 1:14 ` Johannes Schindelin 2007-06-22 5:41 ` Jeff King @ 2007-06-22 7:17 ` Johannes Sixt 2007-06-22 10:39 ` Johannes Schindelin 1 sibling, 1 reply; 44+ messages in thread From: Johannes Sixt @ 2007-06-22 7:17 UTC (permalink / raw) To: git Johannes Schindelin wrote: > The dangerous thing is that the score can get negative now. > ... > + score = (int)(src_copied * MAX_SCORE / max_size) > + - levenshtein(src->path, dst->path); Does that also mean that you can't ever have a rename with a score of 100%? (I haven't studied the algorithms and assume that levenshtein(a,b) == 0 only if a==b, and that without the -levenshtein(...) the score can grow to 100%.) -- Hannes ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] diffcore-rename: favour identical basenames 2007-06-22 7:17 ` Johannes Sixt @ 2007-06-22 10:39 ` Johannes Schindelin 2007-06-22 10:52 ` 100% (was: [PATCH] diffcore-rename: favour identical basenames) David Kastrup 0 siblings, 1 reply; 44+ messages in thread From: Johannes Schindelin @ 2007-06-22 10:39 UTC (permalink / raw) To: Johannes Sixt; +Cc: git Hi, On Fri, 22 Jun 2007, Johannes Sixt wrote: > Johannes Schindelin wrote: > > The dangerous thing is that the score can get negative now. > > ... > > + score = (int)(src_copied * MAX_SCORE / max_size) > > + - levenshtein(src->path, dst->path); > > Does that also mean that you can't ever have a rename with a score of > 100%? > > (I haven't studied the algorithms and assume that levenshtein(a,b) == 0 > only if a==b, and that without the -levenshtein(...) the score can grow > to 100%.) There is a different code path for identical contents. So yes, you can still hit 100%, but it is now much, much harder to hit a score close to 100% [*1*]. The obviously correct way to do this is to have a subscore, and use it _strictly_ only when the score is identical. I see two ways to do this properly: - introduce a name_distance struct member, just below the score. This means that estimate_similarity has to "return" two values instead of one, and score_compare gets a bit more complex, too. Or - change the score to unsigned long, and shift the score to higher bits, adding a constant minus the Levenshtein distance. It is safe to assume that the filenames are shorter than 16384 bytes (PATH_MAX is actually much smaller than that), and even if two filenames of that length are completely different, the distance can not be larger than twice that number, i.e. 16384 deletions + 16384 insertions. Therefore, you could pick 32768 as that constant. However, I find both solutions ugly. Besides, I am not interested in the feature myself, only the implementation of Levenshtein was interesting, and I thought I just post the code here. So I did only the minimal stuff on top of the interesting one to make it sort of work. If somebody wants to pick up the ball, be my guest, because I am out of that game. Ciao, Dscho Footnote: *1* Actually, it is not _that_ bad. The score is not a value between 0 and 100, IOW it is _not_ what you see in the output of "diff -M". It is an unsigned short between 0 and MAX_SCORE, which is defined in diffcore.h as 60000.0. The Levenshtein distance between two filenames cannot be larger than the sum of their lengths, so it should be relatively safe. That is, if you don't have such insanely long paths as e.g. egit. But even there, the paths share most of their directories, and therefore the distances should be much, much smaller in real life. ^ permalink raw reply [flat|nested] 44+ messages in thread
* 100% (was: [PATCH] diffcore-rename: favour identical basenames) 2007-06-22 10:39 ` Johannes Schindelin @ 2007-06-22 10:52 ` David Kastrup 2007-06-22 12:49 ` Johannes Schindelin 0 siblings, 1 reply; 44+ messages in thread From: David Kastrup @ 2007-06-22 10:52 UTC (permalink / raw) To: git > Footnote: > > *1* Actually, it is not _that_ bad. The score is not a value between 0 and > 100, IOW it is _not_ what you see in the output of "diff -M". It is an > unsigned short between 0 and MAX_SCORE, which is defined in > diffcore.h as 60000.0. > > The Levenshtein distance between two filenames cannot be larger than > the sum of their lengths, so it should be relatively safe. That is, if > you don't have such insanely long paths as e.g. egit. But even there, > the paths share most of their directories, and therefore the distances > should be much, much smaller in real life. As a note aside: would it be possible to always round downwards when computing similarities or converting between them? I very much would like to see the 100% figure reserved for identity. This is particularly relevant when interpreting the output of git-diff --name-status with regard to R100, C100 and similar flags. -- David Kastrup ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: 100% (was: [PATCH] diffcore-rename: favour identical basenames) 2007-06-22 10:52 ` 100% (was: [PATCH] diffcore-rename: favour identical basenames) David Kastrup @ 2007-06-22 12:49 ` Johannes Schindelin [not found] ` <86abusi1fw.fsf@lola.quinscape.zz> 0 siblings, 1 reply; 44+ messages in thread From: Johannes Schindelin @ 2007-06-22 12:49 UTC (permalink / raw) To: David Kastrup; +Cc: git Hi, On Fri, 22 Jun 2007, David Kastrup wrote: > As a note aside: would it be possible to always round downwards when > computing similarities or converting between them? I'd rather not. This would be counterintuitive. People expect rounded values. > I very much would like to see the 100% figure reserved for identity. > This is particularly relevant when interpreting the output of git-diff > --name-status with regard to R100, C100 and similar flags. You should never depend on the output of --name-status if you're interested in identifying identical files, but on the object names. Ciao, Dscho ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <86abusi1fw.fsf@lola.quinscape.zz>]
* Re: 100% [not found] ` <86abusi1fw.fsf@lola.quinscape.zz> @ 2007-06-23 1:31 ` Johannes Schindelin 2007-06-23 10:18 ` 100% René Scharfe 0 siblings, 1 reply; 44+ messages in thread From: Johannes Schindelin @ 2007-06-23 1:31 UTC (permalink / raw) To: David Kastrup; +Cc: git Hi, On Fri, 22 Jun 2007, David Kastrup wrote: > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > On Fri, 22 Jun 2007, David Kastrup wrote: > > > >> As a note aside: would it be possible to always round downwards when > >> computing similarities or converting between them? > > > > I'd rather not. This would be counterintuitive. People expect rounded > > values. > > Which people? Me, for one. Thank you very much. > The people I know will expect "100% identical" or even "100.0% > identical" to mean identical, period. They will be quite surprised to > hear that "99.95%" is supposed to be included. Granted, 100.0% means as close as you can get to "completely" with 4 digits. But if you have an integer, you better use the complete range, rather than arbitrarily make one number more important than others. For if you see an integer, you usually assume a rounded value. If you don't, you're hopeless. > Also, for any kind of decision made upon percentages, it is much more > relevant to be able to draw a line at 50% rather than at 49.5%. I do see too many people in my day job who take the numbers they see for absolute truths, so I cannot take that statement seriously, sorry. > Could you name a _single_ use case where rounding down could cause an > actual problem or even inconvenience for people? Could you name a _single_ use case where it does not? I mean, honestly, really. Really, really, really. A number is only a weak _indicator_, and an integer even more so, for what is _really_ going on. > >> I very much would like to see the 100% figure reserved for identity. > >> This is particularly relevant when interpreting the output of > >> git-diff --name-status with regard to R100, C100 and similar flags. > > > > You should never depend on the output of --name-status if you're > > interested in identifying identical files, but on the object names. > > Which is rather inconvenient. Frankly, I am getting bored. This argument crops up ever so often. "If you did that, _I_ could be more lazy, and the _hell_ with other people who expect otherwise!". No, really. > I _know_ that one can't rely on the output of --name-status right now. And I _know_ that you can't rely on integer numbers. Or _any_ number which is not _completely_ precise. Really, I am getting bored with this discussion. Ciao, Dscho ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: 100% 2007-06-23 1:31 ` 100% Johannes Schindelin @ 2007-06-23 10:18 ` René Scharfe 2007-06-23 10:56 ` 100% Johannes Schindelin 0 siblings, 1 reply; 44+ messages in thread From: René Scharfe @ 2007-06-23 10:18 UTC (permalink / raw) To: Johannes Schindelin; +Cc: David Kastrup, git Johannes Schindelin schrieb: > On Fri, 22 Jun 2007, David Kastrup wrote: >> The people I know will expect "100% identical" or even "100.0% >> identical" to mean identical, period. They will be quite surprised to >> hear that "99.95%" is supposed to be included. > > Granted, 100.0% means as close as you can get to "completely" with 4 > digits. But if you have an integer, you better use the complete range, > rather than arbitrarily make one number more important than others. > > For if you see an integer, you usually assume a rounded value. If you > don't, you're hopeless. Why hopeless? It's a useful convention to define "100%" as "complete (not rounded)". See it this way: 50% of the time, a given percent value will be shown as one point less than it's "true" value, but you gain the ability to indicate full completeness. And that's an interesting piece of information. The price is small given that the needed accuracy is more in the range of 10 percent points (I assume). It's more a question of how to make sure everybody knows what the numbers mean -- but that's why we have a directory named "Documentation". :-D And even a person that hasn't read the docs is unlikely to really get harmed by inexact percentages, right? René ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: 100% 2007-06-23 10:18 ` 100% René Scharfe @ 2007-06-23 10:56 ` Johannes Schindelin 2007-06-23 11:41 ` 100% René Scharfe 2007-06-23 19:33 ` 100% Junio C Hamano 0 siblings, 2 replies; 44+ messages in thread From: Johannes Schindelin @ 2007-06-23 10:56 UTC (permalink / raw) To: René Scharfe; +Cc: David Kastrup, git [-- Attachment #1: Type: TEXT/PLAIN, Size: 975 bytes --] Hi, On Sat, 23 Jun 2007, René Scharfe wrote: > Johannes Schindelin schrieb: > > On Fri, 22 Jun 2007, David Kastrup wrote: > >> The people I know will expect "100% identical" or even "100.0% > >> identical" to mean identical, period. They will be quite surprised to > >> hear that "99.95%" is supposed to be included. > > > > Granted, 100.0% means as close as you can get to "completely" with 4 > > digits. But if you have an integer, you better use the complete range, > > rather than arbitrarily make one number more important than others. > > > > For if you see an integer, you usually assume a rounded value. If you > > don't, you're hopeless. > > Why hopeless? It's a useful convention to define "100%" as "complete > (not rounded)". By the same reasoning, you could say "never round down to 0%, because I want to know when there is no similarity". You cannot be exact when you have to cut off fractions, so why try for _exactly_ one number? Ciao, Dscho ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: 100% 2007-06-23 10:56 ` 100% Johannes Schindelin @ 2007-06-23 11:41 ` René Scharfe 2007-06-23 12:00 ` 100% Johannes Schindelin 2007-06-23 19:33 ` 100% Junio C Hamano 1 sibling, 1 reply; 44+ messages in thread From: René Scharfe @ 2007-06-23 11:41 UTC (permalink / raw) To: Johannes Schindelin; +Cc: David Kastrup, git Johannes Schindelin schrieb: > Hi, > > On Sat, 23 Jun 2007, René Scharfe wrote: > >> Johannes Schindelin schrieb: >>> On Fri, 22 Jun 2007, David Kastrup wrote: >>>> The people I know will expect "100% identical" or even "100.0% >>>> identical" to mean identical, period. They will be quite surprised to >>>> hear that "99.95%" is supposed to be included. >>> Granted, 100.0% means as close as you can get to "completely" with 4 >>> digits. But if you have an integer, you better use the complete range, >>> rather than arbitrarily make one number more important than others. >>> >>> For if you see an integer, you usually assume a rounded value. If you >>> don't, you're hopeless. >> Why hopeless? It's a useful convention to define "100%" as "complete >> (not rounded)". > > By the same reasoning, you could say "never round down to 0%, because I > want to know when there is no similarity". > > You cannot be exact when you have to cut off fractions, so why try for > _exactly_ one number? Because completeness is special. If just one bit was available, I'd use it to indicate equality. That's what the authors of cmp(1) did, too. :) And 0% is not special, at least not in a useful way that I can think of. I.e. there is no practical difference between "no two lines match" and "one percent of the lines match". If you're really interested in similarities with an index below 10% then you'd better work with absolute numbers instead of rounded percentages. If someone came around with an interest in those cases with exactly 0% similarity, then we might need to decide between rounding up or down. But even in that hypothetical situation I think "equality" is still more interesting a data point than "really everything differs". René ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: 100% 2007-06-23 11:41 ` 100% René Scharfe @ 2007-06-23 12:00 ` Johannes Schindelin 2007-06-23 12:11 ` 100% René Scharfe 0 siblings, 1 reply; 44+ messages in thread From: Johannes Schindelin @ 2007-06-23 12:00 UTC (permalink / raw) To: René Scharfe; +Cc: David Kastrup, git [-- Attachment #1: Type: TEXT/PLAIN, Size: 464 bytes --] Hi, On Sat, 23 Jun 2007, René Scharfe wrote: > Johannes Schindelin schrieb: > > > By the same reasoning, you could say "never round down to 0%, because > > I want to know when there is no similarity". > > > > You cannot be exact when you have to cut off fractions, so why try for > > _exactly_ one number? > > Because completeness is special. I am not convinced. My vote is still for the _common_ practice of just rounding. IOW keep it as is. Ciao, Dscho ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: 100% 2007-06-23 12:00 ` 100% Johannes Schindelin @ 2007-06-23 12:11 ` René Scharfe 2007-06-23 12:21 ` 100% Johannes Schindelin 0 siblings, 1 reply; 44+ messages in thread From: René Scharfe @ 2007-06-23 12:11 UTC (permalink / raw) To: Johannes Schindelin; +Cc: David Kastrup, git Johannes Schindelin schrieb: > Hi, > > On Sat, 23 Jun 2007, René Scharfe wrote: > >> Johannes Schindelin schrieb: >> >>> By the same reasoning, you could say "never round down to 0%, because >>> I want to know when there is no similarity". >>> >>> You cannot be exact when you have to cut off fractions, so why try for >>> _exactly_ one number? >> Because completeness is special. > > I am not convinced. My vote is still for the _common_ practice of just > rounding. IOW keep it as is. As I already hinted at, the common result of comparing two files, as done by e.g. cmp(1), is one bit that indicates equality. This information is lost when using up/down rounding, but it is retained when rounding down. It's _not_ common to be unable to determine equality from the result of a file compare. René ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: 100% 2007-06-23 12:11 ` 100% René Scharfe @ 2007-06-23 12:21 ` Johannes Schindelin 2007-06-24 22:23 ` 100% René Scharfe 0 siblings, 1 reply; 44+ messages in thread From: Johannes Schindelin @ 2007-06-23 12:21 UTC (permalink / raw) To: René Scharfe; +Cc: David Kastrup, git [-- Attachment #1: Type: TEXT/PLAIN, Size: 922 bytes --] Hi, On Sat, 23 Jun 2007, René Scharfe wrote: > As I already hinted at, the common result of comparing two files, as > done by e.g. cmp(1), is one bit that indicates equality. This > information is lost when using up/down rounding, but it is retained when > rounding down. It's _not_ common to be unable to determine equality > from the result of a file compare. And as _I_ already hinted, this does not matter. The whole purpose to have a number here instead of a bit is to have a larger range. In practice, I bet that the 100% are really uninteresting. At least here, they are. For example, if you move a Java class from one package into another, you have to change the package name in the file. Guess what, I am perfectly okay if the rename detector says "100% similarity" here. Because if it is closer to 100% than to 99%, dammit, I want to see 100%, not 99%. Nuff said about this subject. Ciao, Dscho ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: 100% 2007-06-23 12:21 ` 100% Johannes Schindelin @ 2007-06-24 22:23 ` René Scharfe 0 siblings, 0 replies; 44+ messages in thread From: René Scharfe @ 2007-06-24 22:23 UTC (permalink / raw) To: Johannes Schindelin; +Cc: David Kastrup, git Johannes Schindelin schrieb: > Hi, > > On Sat, 23 Jun 2007, René Scharfe wrote: > >> As I already hinted at, the common result of comparing two files, as >> done by e.g. cmp(1), is one bit that indicates equality. This >> information is lost when using up/down rounding, but it is retained when >> rounding down. It's _not_ common to be unable to determine equality >> from the result of a file compare. > > And as _I_ already hinted, this does not matter. The whole purpose to have > a number here instead of a bit is to have a larger range. In practice, I > bet that the 100% are really uninteresting. At least here, they are. You would lose your bet since both David and me expressed interest in that pure 100% thing. Rounding down instead of up/down doesn't affect the size of neither the input nor the output range. It affects the boundary of the input range, (-0.499 .. 100.499 versus 0.000 .. 100.999), but I can't find a problem with that. > For example, if you move a Java class from one package into another, you > have to change the package name in the file. Guess what, I am perfectly > okay if the rename detector says "100% similarity" here. Because if it is > closer to 100% than to 99%, dammit, I want to see 100%, not 99%. That uses a side effect of rounding and won't work for small files. And of course (if the file is large enough) there could be other changes "hidden" in a similarity index value of 100% that was rounded up. > Nuff said about this subject. Yes, let's advance this topic to the coding stage. René ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: 100% 2007-06-23 10:56 ` 100% Johannes Schindelin 2007-06-23 11:41 ` 100% René Scharfe @ 2007-06-23 19:33 ` Junio C Hamano 2007-06-23 20:41 ` 100% Johannes Schindelin 1 sibling, 1 reply; 44+ messages in thread From: Junio C Hamano @ 2007-06-23 19:33 UTC (permalink / raw) To: Johannes Schindelin; +Cc: René Scharfe, David Kastrup, git Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > By the same reasoning, you could say "never round down to 0%, because I > want to know when there is no similarity". > > You cannot be exact when you have to cut off fractions, so why try for > _exactly_ one number? R0 or C0 would not happen in real life, so 0% is a moot issue. However, wasn't that you who did follow that "certain numbers are special" logic in diffstat? You advocated "diff --stat" should draw at least one +/- for a patch that adds/removes lines. And I (and others) agreed because zero is special in the context of that application. I think reserving R100 to mean "identical byte sequences" has value, when people look at --name-status output, in the context of "similarity index". ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: 100% 2007-06-23 19:33 ` 100% Junio C Hamano @ 2007-06-23 20:41 ` Johannes Schindelin 0 siblings, 0 replies; 44+ messages in thread From: Johannes Schindelin @ 2007-06-23 20:41 UTC (permalink / raw) To: Junio C Hamano; +Cc: René Scharfe, David Kastrup, git Hi, On Sat, 23 Jun 2007, Junio C Hamano wrote: > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > By the same reasoning, you could say "never round down to 0%, because I > > want to know when there is no similarity". > > > > You cannot be exact when you have to cut off fractions, so why try for > > _exactly_ one number? > > R0 or C0 would not happen in real life, so 0% is a moot issue. It is, but not when you look at the formula. > However, wasn't that you who did follow that "certain numbers > are special" logic in diffstat? > > You advocated "diff --stat" should draw at least one +/- for a > patch that adds/removes lines. And I (and others) agreed > because zero is special in the context of that application. Actually, it was not me, but I implemented the version that we have now. I was reasonably scared that a non-linear diffstat would end up in git, therefore I wrote a linear one. The important thing to not here is that the diffstat as-is makes _better_ use of the limited scale that is available. And as you pointed out, the low end of the scale is not really interesting. The interesting parts are those around 100%. By rounding down you make less use of the available scale. > I think reserving R100 to mean "identical byte sequences" has value, > when people look at --name-status output, in the context of "similarity > index". Ah, whatever. You do what you want. Yes, this interpretation has value. No, it is not the only one that has value. I am much more used to rounding, since at the end of the day it makes better use of the scale, it is commonly used, and therefore _I_ expect it (and no, I will not read the documentation when I expect to know what it means). But hey, I don't care any more. AFAIAC you can change it from rounding to rounding down, and next year to rounding-up. We could even have an algorithm which rounds down only in odd years, and I still would not care. Ciao, Dscho ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH] diffcore-rename: favour identical basenames 2007-06-21 11:52 ` [PATCH] diffcore-rename: favour identical basenames Johannes Schindelin 2007-06-21 13:19 ` Jeff King @ 2007-06-23 5:44 ` Junio C Hamano 1 sibling, 0 replies; 44+ messages in thread From: Junio C Hamano @ 2007-06-23 5:44 UTC (permalink / raw) To: Johannes Schindelin Cc: Linus Torvalds, Shawn O. Pearce, git, govindsalinas, gitster Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > When there are several candidates for a rename source, and one of them > has an identical basename to the rename target, take that one. > > Noticed by Govind Salinas, posted by Shawn O. Pearce, partial patch > by Linus Torvalds. > > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> > --- > > On Wed, 20 Jun 2007, Linus Torvalds wrote: > > > I think we should just consider the basename as an "added > > similarity bonus". Thanks, I obviously agree with both of you. ^ permalink raw reply [flat|nested] 44+ messages in thread
end of thread, other threads:[~2007-06-24 22:23 UTC | newest] Thread overview: 44+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-06-21 3:06 Basename matching during rename/copy detection Shawn O. Pearce 2007-06-21 3:13 ` Junio C Hamano 2007-06-21 8:00 ` Andy Parkins 2007-06-21 8:07 ` Junio C Hamano 2007-06-21 9:50 ` Andy Parkins 2007-06-21 11:52 ` Johannes Schindelin 2007-06-21 12:44 ` Andy Parkins 2007-06-21 12:53 ` Matthieu Moy 2007-06-21 13:10 ` Jeff King 2007-06-21 13:18 ` Johannes Schindelin 2007-06-21 13:25 ` Matthieu Moy 2007-06-21 13:52 ` Johannes Schindelin 2007-06-21 15:37 ` Steven Grimm 2007-06-21 15:53 ` Johannes Schindelin 2007-06-21 16:57 ` Steven Grimm 2007-06-21 13:22 ` Johannes Schindelin 2007-06-21 3:42 ` Linus Torvalds 2007-06-21 11:52 ` [PATCH] diffcore-rename: favour identical basenames Johannes Schindelin 2007-06-21 13:19 ` Jeff King 2007-06-21 14:03 ` Johannes Schindelin 2007-06-21 16:20 ` Linus Torvalds 2007-06-21 17:52 ` Junio C Hamano 2007-06-21 18:24 ` Linus Torvalds 2007-06-22 15:19 ` Andy Parkins 2007-06-22 15:28 ` Johannes Schindelin 2007-06-22 17:51 ` Aidan Van Dyk 2007-06-22 1:14 ` Johannes Schindelin 2007-06-22 5:41 ` Jeff King 2007-06-22 10:22 ` Johannes Schindelin 2007-06-22 7:17 ` Johannes Sixt 2007-06-22 10:39 ` Johannes Schindelin 2007-06-22 10:52 ` 100% (was: [PATCH] diffcore-rename: favour identical basenames) David Kastrup 2007-06-22 12:49 ` Johannes Schindelin [not found] ` <86abusi1fw.fsf@lola.quinscape.zz> 2007-06-23 1:31 ` 100% Johannes Schindelin 2007-06-23 10:18 ` 100% René Scharfe 2007-06-23 10:56 ` 100% Johannes Schindelin 2007-06-23 11:41 ` 100% René Scharfe 2007-06-23 12:00 ` 100% Johannes Schindelin 2007-06-23 12:11 ` 100% René Scharfe 2007-06-23 12:21 ` 100% Johannes Schindelin 2007-06-24 22:23 ` 100% René Scharfe 2007-06-23 19:33 ` 100% Junio C Hamano 2007-06-23 20:41 ` 100% Johannes Schindelin 2007-06-23 5:44 ` [PATCH] diffcore-rename: favour identical basenames Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).