* Pure renames/copies
From: Santi Béjar @ 2005-11-21 12:01 UTC
To: Git Mailing List
Hello:
Is there any way to ask git to find pure renames or copies?
I ask this because it is a much cheaper operation than what -C
and -M do (-M100 does not work), and it can be used when the number
of paths is big, or when you track binary files.
Thanks
Santi

* Re: Pure renames/copies
From: Linus Torvalds @ 2005-11-21 18:37 UTC
To: Santi Béjar; +Cc: Git Mailing List

On Mon, 21 Nov 2005, Santi Béjar wrote:
>
> Is there any way to ask git to find pure renames or copies?

Not directly, but git sure makes it easy for you.

Do this:

	git-diff-tree -r old..new | grep '^:[^<tab>]*0000000000000000000000000000000000000000'

and you'll get all the information (that "<tab>" is obviously the tab
character) you need to do it efficiently (or, if you just want to do one
commit, just do "git-diff-tree -r cmit").

In the git tree, commit 0086e2c854e3af3209915e4ec2f933bcef400050 can act
as a good example of this: the output of

	git-diff-tree -r 0086e2c854e3af3209915e4ec2f933bcef400050

is

	0086e2c854e3af3209915e4ec2f933bcef400050
	:100644 100644 328b399f9fe6e2b668691ab359319f50561cd773 16a8af63f0523cec82faa23f29cee579ac224e82 M .gitignore
	:100644 000000 a8cc5739d7851da3aeca2388d74eb92c464f1732 0000000000000000000000000000000000000000 D Documentation/git-lost+found.txt
	:000000 100644 0000000000000000000000000000000000000000 03156f218bb41b955779207ec2e94120f958fc45 A Documentation/git-lost-found.txt
	:100644 100644 a9d47c115c071694321d076af8a73a06ddd46875 1c32dd5be7156ae0e1142523fe50d84745964793 M Documentation/git.txt
	:100644 100644 b75cb137875b3cdb8746d2e0135e6f2743e2046a 5b2eca897386e17021d2a8a052b0c2759df96447 M Makefile
	:100755 000000 3892f52005d1e36676681806a87ef35dc0689f22 0000000000000000000000000000000000000000 D git-lost+found.sh
	:000000 100755 0000000000000000000000000000000000000000 3892f52005d1e36676681806a87ef35dc0689f22 A git-lost-found.sh

and then after the "grep", you have just

	:100644 000000 a8cc5739d7851da3aeca2388d74eb92c464f1732 0000000000000000000000000000000000000000 D Documentation/git-lost+found.txt
	:000000 100644 0000000000000000000000000000000000000000 03156f218bb41b955779207ec2e94120f958fc45 A Documentation/git-lost-found.txt
	:100755 000000 3892f52005d1e36676681806a87ef35dc0689f22 0000000000000000000000000000000000000000 D git-lost+found.sh
	:000000 100755 0000000000000000000000000000000000000000 3892f52005d1e36676681806a87ef35dc0689f22 A git-lost-found.sh

left, which shows you the new and the deleted files.

Then, look for renames: just match up a new file that has the same SHA1
as a deleted file, and you can see that the change from
"git-lost+found.sh" to "git-lost-found.sh" was exactly such a rename,
because they share the 3892f52005d1e36676681806a87ef35dc0689f22 SHA1.

After rename detection, look at any remaining new files (in the above
example, only

	:000000 100644 0000000000000000000000000000000000000000 03156f218bb41b955779207ec2e94120f958fc45 A Documentation/git-lost-found.txt

would be left), and try to match up the SHA1 of that file with the
result of "git-ls-tree -r $old", i.e. something like

	git-ls-tree -r 0086e2c854e3af3209915e4ec2f933bcef400050^ | grep 03156f218bb41b955779207ec2e94120f958fc45

which in this case is empty (that new file wasn't an exact copy of any
old file; it was a rename+edit, of course).

Very efficient, very simple: you can do it either with a small shell
script (using cut + sort + join + grep), or write a specialized tool
around the git-diff-tree logic.

Of course, arguably "-M100" should really do this optimization for you.
Junio?

		Linus
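Linus's three steps (keep only creations and deletions, pair identical blobs as renames, then look up leftover additions in the old tree as copies) can be sketched in C roughly as below. This is a minimal illustration, not git code: the names raw_entry, match, parse_raw_line(), find_exact() and the MAX_ENTRIES limit are hypothetical, and the sketch assumes the caller already has the "git-diff-tree -r" raw lines (":oldmode newmode oldsha1 newsha1 STATUS<tab>path") and the "git-ls-tree -r $old" lines ("mode type sha1<tab>path") as strings.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 64 /* hypothetical fixed capacity for this sketch */

/* One line of "git-diff-tree -r" raw output (assumed format). */
struct raw_entry {
	char old_sha1[41];
	char new_sha1[41];
	char status;        /* 'A' added, 'D' deleted, 'M' modified, ... */
	char path[256];
};

/* One exact match found by find_exact(). */
struct match {
	char kind;          /* 'R' = exact rename, 'C' = exact copy */
	char from[256];
	char to[256];
};

/* Parse ":oldmode newmode oldsha1 newsha1 STATUS<tab>path".
 * Lines without the leading ':' (e.g. the commit ID line) are rejected. */
static int parse_raw_line(const char *line, struct raw_entry *e)
{
	char old_mode[7], new_mode[7];

	return sscanf(line, ":%6s %6s %40s %40s %c %255[^\n]",
		      old_mode, new_mode, e->old_sha1, e->new_sha1,
		      &e->status, e->path) == 6;
}

static void set_match(struct match *m, char kind, const char *from, const char *to)
{
	m->kind = kind;
	snprintf(m->from, sizeof(m->from), "%s", from);
	snprintf(m->to, sizeof(m->to), "%s", to);
}

/* Returns the number of exact renames/copies stored in out. */
static size_t find_exact(const char **diff_lines, size_t ndiff,
			 const char **tree_lines, size_t ntree,
			 struct match *out, size_t max_out)
{
	struct raw_entry ents[MAX_ENTRIES];
	int done[MAX_ENTRIES] = {0};	/* entry already part of a match */
	size_t n = 0, nout = 0;

	/* Step 1: keep only additions and deletions (what the grep does). */
	for (size_t i = 0; i < ndiff && n < MAX_ENTRIES; i++)
		if (parse_raw_line(diff_lines[i], &ents[n]) &&
		    (ents[n].status == 'A' || ents[n].status == 'D'))
			n++;

	/* Step 2: an added blob identical to a deleted blob is a rename. */
	for (size_t a = 0; a < n; a++) {
		if (ents[a].status != 'A')
			continue;
		for (size_t d = 0; d < n && !done[a]; d++) {
			if (ents[d].status == 'D' && !done[d] && nout < max_out &&
			    strcmp(ents[a].new_sha1, ents[d].old_sha1) == 0) {
				set_match(&out[nout++], 'R', ents[d].path, ents[a].path);
				done[a] = done[d] = 1;
			}
		}
	}

	/* Step 3: a leftover addition whose blob exists in the old tree is a copy. */
	for (size_t a = 0; a < n; a++) {
		if (ents[a].status != 'A' || done[a])
			continue;
		for (size_t t = 0; t < ntree && !done[a]; t++) {
			char sha1[41], path[256];

			if (sscanf(tree_lines[t], "%*s %*s %40s %255[^\n]", sha1, path) == 2 &&
			    nout < max_out && strcmp(sha1, ents[a].new_sha1) == 0) {
				set_match(&out[nout++], 'C', path, ents[a].path);
				done[a] = 1;
			}
		}
	}
	return nout;
}
```

Fed the example diff-tree output with an empty old-tree listing, this sketch should report only the git-lost+found.sh to git-lost-found.sh rename, as in the walkthrough. The nested loops are quadratic; for many paths, sorting both lists by SHA1 and merging them (the cut + sort + join approach) would scale better.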

* Re: Pure renames/copies
From: Junio C Hamano @ 2005-11-21 19:31 UTC
To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

> Of course, arguably "-M100" should really do this optimization for you.
> Junio?

I'd agree.  That is what -M100 should mean.

* Re: Pure renames/copies
From: Junio C Hamano @ 2005-11-21 19:50 UTC
To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

> Of course, arguably "-M100" should really do this optimization for you.
> Junio?

Probably something like this would suffice.

-- >8 --
Subject: rename detection with -M100 means "exact renames only".

When the user is interested in pure renames, there is no point
doing the similarity scores.  This changes the score argument
parsing to special case -M100 (otherwise, it is a precision
scaled value 0 <= v < 1 and would mean 0.1, not 1.0 --- if you
do mean 0.1, you can say -M1), and optimizes the diffcore_rename
transformation to only look at pure renames in that case.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

diff --git a/diff.c b/diff.c
index 0391e8c..0f839c1 100644
--- a/diff.c
+++ b/diff.c
@@ -853,6 +853,10 @@ static int parse_num(const char **cp_p)
 	}
 	*cp_p = cp;
 
+	/* special case: -M100 would mean 1.0 not 0.1 */
+	if (num == 100 && scale == 1000)
+		return MAX_SCORE;
+
 	/* user says num divided by scale and we say internally that
 	 * is MAX_SCORE * num / scale.
 	 */
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 6a9d95d..dba965c 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -307,6 +307,9 @@ void diffcore_rename(struct diff_options
 	if (rename_count == rename_dst_nr)
 		goto cleanup;
 
+	if (minimum_score == MAX_SCORE)
+		goto cleanup;
+
 	num_create = (rename_dst_nr - rename_count);
 	num_src = rename_src_nr;
 	mx = xmalloc(sizeof(*mx) * num_create * num_src);
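To see exactly what the special case changes, here is a standalone sketch of parse_num() as the hunk context implies it looked before the patch, with the new check added. MAX_SCORE is an assumption in this sketch (60000.0, intended to mirror git's diffcore.h); the rest of diff.c is not reproduced.

```c
/* Assumed value: 100% similarity, intended to mirror git's diffcore.h. */
#define MAX_SCORE 60000.0

/* Reads up to five digits as the fraction 0.ddddd, plus the special case
 * that makes "100" mean 1.0 instead of 0.1. */
static int parse_num(const char **cp_p)
{
	int num = 0, scale = 1, cnt = 0, ch;
	const char *cp = *cp_p;

	while ('0' <= (ch = *cp) && ch <= '9') {
		if (cnt++ < 5) {
			/* More than five digits of precision are ignored. */
			scale *= 10;
			num = num * 10 + ch - '0';
		}
		cp++;
	}
	*cp_p = cp;

	/* "100" gives num == 100, scale == 1000 (0.1); treat it as 1.0. */
	if (num == 100 && scale == 1000)
		return (int)MAX_SCORE;

	/* num / scale expressed as a fraction of MAX_SCORE. */
	return (int)(MAX_SCORE * num / scale);
}
```

In this sketch "5" and "50" both give 30000 (50%), "1" gives 6000 (10%), and "100" now gives 60000 instead of 6000. Only that exact spelling is affected: "1000" still parses as 0.1.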

* Re: Pure renames/copies
From: H. Peter Anvin @ 2005-11-21 21:01 UTC
To: Junio C Hamano; +Cc: Linus Torvalds, git

Junio C Hamano wrote:
> Linus Torvalds <torvalds@osdl.org> writes:
>
>>Of course, arguably "-M100" should really do this optimization for you.
>>Junio?
>
> Probably something like this would suffice.
>
> -- >8 --
> Subject: rename detection with -M100 means "exact renames only".
>
> When the user is interested in pure renames, there is no point
> doing the similarity scores.  This changes the score argument
> parsing to special case -M100 (otherwise, it is a precision
> scaled value 0 <= v < 1 and would mean 0.1, not 1.0 --- if you
> do mean 0.1, you can say -M1), and optimizes the diffcore_rename
> transformation to only look at pure renames in that case.
>

Any reason we can't make it take an actual decimal number, like -M1.0 or
-M0.345?  It seems odd and annoying to invent our own notation for
floating-point numbers, especially in userspace.

	-hpa

* Re: Pure renames/copies
From: Junio C Hamano @ 2005-11-21 21:33 UTC
To: H. Peter Anvin; +Cc: git

"H. Peter Anvin" <hpa@zytor.com> writes:

> Any reason we can't make it take an actual decimal number, like -M1.0 or
> -M0.345?  It seems odd and annoying to invent our own notation for
> floating-point numbers, especially in userspace.

No reason we "can't".  As for why we "don't": inertia and nothing
else.  It happened around this time:

	http://marc.theaimsgroup.com/?l=git&m=111654149421574

We could additionally accept a 0 <= x <= 1 decimal number; that
should be a simple patch to diff.c::parse_num().

* Re: Pure renames/copies
From: H. Peter Anvin @ 2005-11-21 21:37 UTC
To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
>
>>Any reason we can't make it take an actual decimal number, like -M1.0 or
>>-M0.345?  It seems odd and annoying to invent our own notation for
>>floating-point numbers, especially in userspace.
>
> No reason we "can't".  As for why we "don't": inertia and nothing
> else.  It happened around this time:
>
>	http://marc.theaimsgroup.com/?l=git&m=111654149421574
>
> We could additionally accept a 0 <= x <= 1 decimal number; that
> should be a simple patch to diff.c::parse_num().

Okay, in that post Linus suggests that -M without an argument should be
== 100% (1.0), thus avoiding having to mess up the meaning of -M100 as
0.100.  It seems like a really odd thing to have -M100 mean something
that's completely out of line with the rest of the meaning.

	-hpa

* Re: Pure renames/copies
From: Junio C Hamano @ 2005-11-21 22:00 UTC
To: H. Peter Anvin; +Cc: git

"H. Peter Anvin" <hpa@zytor.com> writes:

> Okay, in that post Linus suggests that -M without an argument should be
> == 100% (1.0), thus avoiding having to mess up the meaning of -M100 as
> 0.100.  It seems like a really odd thing to have -M100 mean something
> that's completely out of line with the rest of the meaning.

True, but it might be too late to change that; I suspect people
expect -M to do a bit more than pure renames by now.

* Re: Pure renames/copies
From: H. Peter Anvin @ 2005-11-21 22:10 UTC
To: Junio C Hamano; +Cc: git

Junio C Hamano wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
>
>>Okay, in that post Linus suggests that -M without an argument should be
>>== 100% (1.0), thus avoiding having to mess up the meaning of -M100 as
>>0.100.  It seems like a really odd thing to have -M100 mean something
>>that's completely out of line with the rest of the meaning.
>
> True, but it might be too late to change that; I suspect people
> expect -M to do a bit more than pure renames by now.
>

Okay, how about the following?  It lets both -M1.0 and -M100% work, while
keeping everything else compatible, and avoiding artificial special cases.

	-hpa

diff --git a/diff.c b/diff.c
index 0391e8c..df62d2b 100644
--- a/diff.c
+++ b/diff.c
@@ -843,11 +843,19 @@ static int parse_num(const char **cp_p)
 
 	cnt = num = 0;
 	scale = 1;
-	while ('0' <= (ch = *cp) && ch <= '9') {
-		if (cnt++ < 5) {
-			/* We simply ignore more than 5 digits precision. */
-			scale *= 10;
-			num = num * 10 + ch - '0';
+	for(;;) {
+		ch = *cp;
+		if ( ch == '.' ) {
+			scale = 1;
+		} else if ( ch == '%' ) {
+			scale = 100;
+		} else if ( ch >= '0' && ch <= '9' ) {
+			if ( scale < 100000 ) {
+				scale *= 10;
+				num = (num*10) + (ch-'0');
+			}
+		} else {
+			break;
 		}
 		cp++;
 	}
@@ -856,7 +864,7 @@ static int parse_num(const char **cp_p)
 	/* user says num divided by scale and we say internally that
 	 * is MAX_SCORE * num / scale.
 	 */
-	return (MAX_SCORE * num / scale);
+	return (num >= scale) ? MAX_SCORE : (MAX_SCORE * num / scale);
 }
 
 int diff_scoreopt_parse(const char *opt)
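A standalone transcription of this first variant's parse_num(), again assuming MAX_SCORE is 60000.0, makes its behaviour easy to check. In this sketch "1.0", "100%" and the old-style "5" come out as intended, but because '%' replaces the scale instead of multiplying it, "4.5%" is read as 45%, and repeated dots are accepted, so all of "192.168.0.1" is consumed.

```c
/* Assumed value: 100% similarity, intended to mirror git's diffcore.h. */
#define MAX_SCORE 60000.0

/* First proposed parse_num(): '.' restarts the scale, '%' sets the scale
 * to 100, digits extend the number (precision capped at 5 digits). */
static int parse_num(const char **cp_p)
{
	int num = 0, scale = 1, ch;
	const char *cp = *cp_p;

	for (;;) {
		ch = *cp;
		if (ch == '.') {
			scale = 1;		/* every '.' restarts the scale */
		} else if (ch == '%') {
			scale = 100;		/* replaces, not multiplies, the scale */
		} else if (ch >= '0' && ch <= '9') {
			if (scale < 100000) {
				scale *= 10;
				num = num * 10 + (ch - '0');
			}
		} else {
			break;
		}
		cp++;
	}
	*cp_p = cp;

	/* Anything at or above 1.0 is clamped to MAX_SCORE. */
	return (int)((num >= scale) ? MAX_SCORE : (MAX_SCORE * num / scale));
}
```

Those two quirks ("4.5%" giving 27000 rather than 2700, and dotted strings being swallowed whole) are what the follow-up variant addresses.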

* Re: Pure renames/copies
From: H. Peter Anvin @ 2005-11-21 22:17 UTC
To: H. Peter Anvin; +Cc: Junio C Hamano, git

Better variant, which handles stuff like "4.5%" and rejects "192.168.0.1".

Additionally, make sure numbers are unsigned (I'm making them unsigned
long just for the hell of it), to make sure that artificial wraparound
scenarios don't cause harm.

	-hpa

diff --git a/diff.c b/diff.c
index 0391e8c..ffe8a55 100644
--- a/diff.c
+++ b/diff.c
@@ -838,16 +838,29 @@ int diff_opt_parse(struct diff_options *
 
 static int parse_num(const char **cp_p)
 {
-	int num, scale, ch, cnt;
+	unsigned long num, scale;
+	int ch, dot;
 	const char *cp = *cp_p;
 
-	cnt = num = 0;
+	num = 0;
 	scale = 1;
-	while ('0' <= (ch = *cp) && ch <= '9') {
-		if (cnt++ < 5) {
-			/* We simply ignore more than 5 digits precision. */
-			scale *= 10;
-			num = num * 10 + ch - '0';
+	dot = 0;
+	for(;;) {
+		ch = *cp;
+		if ( !dot && ch == '.' ) {
+			scale = 1;
+			dot = 1;
+		} else if ( ch == '%' ) {
+			scale = dot ? scale*100 : 100;
+			cp++;	/* % is always at the end */
+			break;
+		} else if ( ch >= '0' && ch <= '9' ) {
+			if ( scale < 100000 ) {
+				scale *= 10;
+				num = (num*10) + (ch-'0');
+			}
+		} else {
+			break;
 		}
 		cp++;
 	}
@@ -856,7 +869,7 @@ static int parse_num(const char **cp_p)
 	/* user says num divided by scale and we say internally that
 	 * is MAX_SCORE * num / scale.
 	 */
-	return (MAX_SCORE * num / scale);
+	return (num >= scale) ? MAX_SCORE : (MAX_SCORE * num / scale);
 }
 
 int diff_scoreopt_parse(const char *opt)
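The same kind of standalone transcription of this second variant (MAX_SCORE again assumed to be 60000.0) shows "4.5%" now giving 2700, i.e. 4.5% of MAX_SCORE. Note that without Junio's special case, "100" still means 0.1 here.

```c
/* Assumed value: 100% similarity, intended to mirror git's diffcore.h. */
#define MAX_SCORE 60000.0

/* Second proposed parse_num(): only the first '.' starts a fraction, and
 * '%' ends the number while dividing the value by 100. */
static int parse_num(const char **cp_p)
{
	unsigned long num = 0, scale = 1;
	int ch, dot = 0;
	const char *cp = *cp_p;

	for (;;) {
		ch = *cp;
		if (!dot && ch == '.') {
			scale = 1;	/* digits after the dot form the fraction */
			dot = 1;
		} else if (ch == '%') {
			scale = dot ? scale * 100 : 100;
			cp++;		/* '%' is always the last character */
			break;
		} else if (ch >= '0' && ch <= '9') {
			if (scale < 100000) {
				scale *= 10;
				num = num * 10 + (unsigned long)(ch - '0');
			}
		} else {
			break;		/* includes a second '.' */
		}
		cp++;
	}
	*cp_p = cp;

	/* Anything at or above 1.0 is clamped to MAX_SCORE. */
	return (int)((num >= scale) ? MAX_SCORE : (MAX_SCORE * num / scale));
}
```

For "192.168.0.1" this sketch stops at the second '.' and leaves ".0.1" unparsed. The rejection hpa mentions presumably comes from the caller, diff_scoreopt_parse(), refusing trailing characters, which is not reproduced here.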

* Re: Pure renames/copies
From: Santi Bejar @ 2005-11-22 9:03 UTC
To: Junio C Hamano; +Cc: git

>
> Probably something like this would suffice.
>

Ok, thanks.

Now the only issue with my broken repository (it does not have all
the blobs) is that it outputs:

error: unable to find abcde....

for all the src paths, but the result is ok. But I can live with it.

Thanks
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).