* Pure renames/copies
@ 2005-11-21 12:01 Santi Béjar
2005-11-21 18:37 ` Linus Torvalds
0 siblings, 1 reply; 11+ messages in thread
From: Santi Béjar @ 2005-11-21 12:01 UTC (permalink / raw)
To: Git Mailing List
Hello:
Is there any way to ask git to find pure renames or copies?
I ask this because it is a much cheaper operation than what -C
and -M do (-M100 does not work), and it can be used when the number
of paths is big, or when you track binary files.
Thanks
Santi
* Re: Pure renames/copies
2005-11-21 12:01 Pure renames/copies Santi Béjar
@ 2005-11-21 18:37 ` Linus Torvalds
2005-11-21 19:31 ` Junio C Hamano
2005-11-21 19:50 ` Junio C Hamano
0 siblings, 2 replies; 11+ messages in thread
From: Linus Torvalds @ 2005-11-21 18:37 UTC (permalink / raw)
To: Santi Béjar; +Cc: Git Mailing List
On Mon, 21 Nov 2005, Santi Béjar wrote:
>
> Is there any way to ask git to find pure renames or copies?
Not directly, but git sure makes it easy for you.
Do this:
git-diff-tree -r old..new |
grep '^:[^<tab>]*0000000000000000000000000000000000000000'
and you'll get all the information (that "<tab>" is obviously the tab
character) you need to efficiently do it (or, if you want to just do one
commit, just do "git-diff-tree -r cmit").
In the git tree, commit 0086e2c854e3af3209915e4ec2f933bcef400050 can act
as a good example of this: the output of
git-diff-tree -r 0086e2c854e3af3209915e4ec2f933bcef400050
is
0086e2c854e3af3209915e4ec2f933bcef400050
:100644 100644 328b399f9fe6e2b668691ab359319f50561cd773 16a8af63f0523cec82faa23f29cee579ac224e82 M .gitignore
:100644 000000 a8cc5739d7851da3aeca2388d74eb92c464f1732 0000000000000000000000000000000000000000 D Documentation/git-lost+found.txt
:000000 100644 0000000000000000000000000000000000000000 03156f218bb41b955779207ec2e94120f958fc45 A Documentation/git-lost-found.txt
:100644 100644 a9d47c115c071694321d076af8a73a06ddd46875 1c32dd5be7156ae0e1142523fe50d84745964793 M Documentation/git.txt
:100644 100644 b75cb137875b3cdb8746d2e0135e6f2743e2046a 5b2eca897386e17021d2a8a052b0c2759df96447 M Makefile
:100755 000000 3892f52005d1e36676681806a87ef35dc0689f22 0000000000000000000000000000000000000000 D git-lost+found.sh
:000000 100755 0000000000000000000000000000000000000000 3892f52005d1e36676681806a87ef35dc0689f22 A git-lost-found.sh
and then after the "grep", you have just
:100644 000000 a8cc5739d7851da3aeca2388d74eb92c464f1732 0000000000000000000000000000000000000000 D Documentation/git-lost+found.txt
:000000 100644 0000000000000000000000000000000000000000 03156f218bb41b955779207ec2e94120f958fc45 A Documentation/git-lost-found.txt
:100755 000000 3892f52005d1e36676681806a87ef35dc0689f22 0000000000000000000000000000000000000000 D git-lost+found.sh
:000000 100755 0000000000000000000000000000000000000000 3892f52005d1e36676681806a87ef35dc0689f22 A git-lost-found.sh
left, which shows you the new and the deleted files.
Then, look for renames: just match up a new file that has the same SHA1 as
a deleted file, and you can see that the change from "git-lost+found.sh"
to "git-lost-found.sh" was exactly such a pure rename, because they
share the 3892f52005d1e36676681806a87ef35dc0689f22 SHA1.
After rename detection, look at any remaining new files (in the above
example, only
:000000 100644 0000000000000000000000000000000000000000 03156f218bb41b955779207ec2e94120f958fc45 A Documentation/git-lost-found.txt
would be left), and try to match up the SHA1 of that file with the result
of "git-ls-tree -r $old", ie something like
git-ls-tree -r 0086e2c854e3af3209915e4ec2f933bcef400050^ |
grep 03156f218bb41b955779207ec2e94120f958fc45
which in this case is empty (that new file wasn't an exact copy of any old
file, it was a rename+edit, of course).
Very efficient, very simple, you can do it either with a small
shell-script (using cut + sort + join + grep), or write a specialized tool
around the git-diff-tree logic.
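As a rough illustration of the matching step described above, here is a minimal C sketch. It assumes the raw "git-diff-tree -r" lines have already been read into an array of strings; the names raw_entry, parse_raw_line and find_pure_renames are made up for this example and are not part of git. It pairs each added path with a deleted path that has the same blob SHA1. The copy check against "git-ls-tree -r $old" would follow the same pattern and is left out.

```c
#include <stdio.h>
#include <string.h>

/* One parsed line of raw "git-diff-tree -r" output (hypothetical type). */
struct raw_entry {
	char old_sha1[41];
	char new_sha1[41];
	char status;
	char path[256];
};

/*
 * Parse ":<old mode> <new mode> <old sha1> <new sha1> <status><tab><path>".
 * Returns 1 on success, 0 if the line is not a raw diff line
 * (for example the leading commit-id line).
 */
static int parse_raw_line(const char *line, struct raw_entry *e)
{
	unsigned int old_mode, new_mode;

	return sscanf(line, ":%o %o %40s %40s %c %255[^\n]",
		      &old_mode, &new_mode, e->old_sha1, e->new_sha1,
		      &e->status, e->path) == 6;
}

/*
 * For every added path, look for a deleted path with the same blob SHA1.
 * Prints "old -> new" for each match and returns the number of matches.
 * Quadratic and re-parses lines on purpose, to keep the sketch short.
 */
static int find_pure_renames(const char **lines, int nr, FILE *out)
{
	int i, j, found = 0;
	struct raw_entry add, del;

	for (i = 0; i < nr; i++) {
		if (!parse_raw_line(lines[i], &add) || add.status != 'A')
			continue;
		for (j = 0; j < nr; j++) {
			if (!parse_raw_line(lines[j], &del) || del.status != 'D')
				continue;
			if (!strcmp(del.old_sha1, add.new_sha1)) {
				fprintf(out, "%s -> %s\n", del.path, add.path);
				found++;
				break;
			}
		}
	}
	return found;
}
```

Fed the four lines left after the "grep" in the example above, this should report only the git-lost+found.sh to git-lost-found.sh pair. A real tool would sort or hash the SHA1s instead of using the nested loop.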
Of course, arguably "-M100" should really do this optimization for you.
Junio?
Linus
* Re: Pure renames/copies
2005-11-21 18:37 ` Linus Torvalds
@ 2005-11-21 19:31 ` Junio C Hamano
2005-11-21 19:50 ` Junio C Hamano
1 sibling, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2005-11-21 19:31 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
Linus Torvalds <torvalds@osdl.org> writes:
> Of course, arguably "-M100" should really do this optimization for you.
> Junio?
I'd agree. That is what -M100 should mean.
* Re: Pure renames/copies
2005-11-21 18:37 ` Linus Torvalds
2005-11-21 19:31 ` Junio C Hamano
@ 2005-11-21 19:50 ` Junio C Hamano
2005-11-21 21:01 ` H. Peter Anvin
2005-11-22 9:03 ` Santi Bejar
1 sibling, 2 replies; 11+ messages in thread
From: Junio C Hamano @ 2005-11-21 19:50 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
Linus Torvalds <torvalds@osdl.org> writes:
> Of course, arguably "-M100" should really do this optimization for you.
> Junio?
Probably something like this would suffice.
-- >8 --
Subject: rename detection with -M100 means "exact renames only".
When the user is interested in pure renames, there is no point
doing the similarity scores. This changes the score argument
parsing to special case -M100 (otherwise, it is a precision
scaled value 0 <= v < 1 and would mean 0.1, not 1.0 --- if you
do mean 0.1, you can say -M1), and optimizes the diffcore_rename
transformation to only look at pure renames in that case.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
diff --git a/diff.c b/diff.c
index 0391e8c..0f839c1 100644
--- a/diff.c
+++ b/diff.c
@@ -853,6 +853,10 @@ static int parse_num(const char **cp_p)
 	}
 	*cp_p = cp;
 
+	/* special case: -M100 would mean 1.0 not 0.1 */
+	if (num == 100 && scale == 1000)
+		return MAX_SCORE;
+
 	/* user says num divided by scale and we say internally that
 	 * is MAX_SCORE * num / scale.
 	 */
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 6a9d95d..dba965c 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -307,6 +307,9 @@ void diffcore_rename(struct diff_options
 	if (rename_count == rename_dst_nr)
 		goto cleanup;
 
+	if (minimum_score == MAX_SCORE)
+		goto cleanup;
+
 	num_create = (rename_dst_nr - rename_count);
 	num_src = rename_src_nr;
 	mx = xmalloc(sizeof(*mx) * num_create * num_src);
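To make the effect of the special case concrete, here is a standalone sketch of the digit-scaling logic with the -M100 check added. It is pieced together from the lines visible in this thread, not copied from git, and the value of MAX_SCORE below is an assumption chosen for illustration only.

```c
/* Assumed value, for illustration only; the thread does not state it. */
#define MAX_SCORE 60000.0

/*
 * Digits are read as a decimal fraction: "5" means 0.5, "50" means 0.50,
 * "100" would mean 0.100. The special case turns "100" into 1.0 instead.
 */
static int parse_num(const char **cp_p)
{
	int num, scale, ch, cnt;
	const char *cp = *cp_p;

	cnt = num = 0;
	scale = 1;
	while ('0' <= (ch = *cp) && ch <= '9') {
		if (cnt++ < 5) {
			/* more than 5 digits of precision are ignored */
			scale *= 10;
			num = num * 10 + ch - '0';
		}
		cp++;
	}
	*cp_p = cp;

	/* special case: -M100 would mean 1.0 not 0.1 */
	if (num == 100 && scale == 1000)
		return (int)MAX_SCORE;

	/* user says num divided by scale; internally that is
	 * MAX_SCORE * num / scale.
	 */
	return (int)(MAX_SCORE * num / scale);
}
```

With these assumptions, "1" and "10" both come out as one tenth of MAX_SCORE, "50" as half of it, and only the exact string "100" reaches MAX_SCORE, which is what lets diffcore_rename skip the similarity matrix.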
* Re: Pure renames/copies
2005-11-21 19:50 ` Junio C Hamano
@ 2005-11-21 21:01 ` H. Peter Anvin
2005-11-21 21:33 ` Junio C Hamano
2005-11-22 9:03 ` Santi Bejar
1 sibling, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2005-11-21 21:01 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Linus Torvalds, git
Junio C Hamano wrote:
> Linus Torvalds <torvalds@osdl.org> writes:
>
>
>>Of course, arguably "-M100" should really do this optimization for you.
>>Junio?
>
>
> Probably something like this would suffice.
>
> -- >8 --
> Subject: rename detection with -M100 means "exact renames only".
>
> When the user is interested in pure renames, there is no point
> doing the similarity scores. This changes the score argument
> parsing to special case -M100 (otherwise, it is a precision
> scaled value 0 <= v < 1 and would mean 0.1, not 1.0 --- if you
> do mean 0.1, you can say -M1), and optimizes the diffcore_rename
> transformation to only look at pure renames in that case.
>
Any reason we can't make it take an actual decimal number, like -M1.0 or
-M0.345? It seems odd and annoying to invent our own notation for
floating-point numbers, especially in userspace.
-hpa
* Re: Pure renames/copies
2005-11-21 21:01 ` H. Peter Anvin
@ 2005-11-21 21:33 ` Junio C Hamano
2005-11-21 21:37 ` H. Peter Anvin
0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2005-11-21 21:33 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: git
"H. Peter Anvin" <hpa@zytor.com> writes:
> Any reason we can't make it take an actual decimal number, like -M1.0 or
> -M0.345? It seems odd and annoying to invent our own notation for
> floating-point numbers, especially in userspace.
No reason we "can't". As for why we "don't": inertia and nothing
else. It happened around this time:
http://marc.theaimsgroup.com/?l=git&m=111654149421574
We could additionally accept a decimal number 0 <= x <= 1, and that
should be a simple patch to diff.c::parse_num().
* Re: Pure renames/copies
2005-11-21 21:33 ` Junio C Hamano
@ 2005-11-21 21:37 ` H. Peter Anvin
2005-11-21 22:00 ` Junio C Hamano
0 siblings, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2005-11-21 21:37 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Junio C Hamano wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
>
>
>>Any reason we can't make it take an actual decimal number, like -M1.0 or
>>-M0.345? It seems odd and annoying to invent our own notation for
>>floating-point numbers, especially in userspace.
>
>
> No reason we "can't". As for why we "don't": inertia and nothing
> else. It happened around this time:
>
> http://marc.theaimsgroup.com/?l=git&m=111654149421574
>
> We could additionally accept a decimal number 0 <= x <= 1, and that
> should be a simple patch to diff.c::parse_num().
>
Okay, in that post Linus suggests that -M without an argument should be
== 100% (1.0), thus avoiding having to mess up the meaning of -M100 as
0.100. It seems like a really odd thing to have -M100 mean something
that's completely out of line with the rest of the meaning.
-hpa
* Re: Pure renames/copies
2005-11-21 21:37 ` H. Peter Anvin
@ 2005-11-21 22:00 ` Junio C Hamano
2005-11-21 22:10 ` H. Peter Anvin
0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2005-11-21 22:00 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: git
"H. Peter Anvin" <hpa@zytor.com> writes:
> Okay, in that post Linus suggests that -M without an argument should be
> == 100% (1.0), thus avoiding having to mess up the meaning of -M100 as
> 0.100. It seems like a really odd thing to have -M100 mean something
> that's completely out of line with the rest of the meaning.
True, but it might be too late to change that; I suspect people
expect -M to do a bit more than pure renames by now.
* Re: Pure renames/copies
2005-11-21 22:00 ` Junio C Hamano
@ 2005-11-21 22:10 ` H. Peter Anvin
2005-11-21 22:17 ` H. Peter Anvin
0 siblings, 1 reply; 11+ messages in thread
From: H. Peter Anvin @ 2005-11-21 22:10 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Junio C Hamano wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
>
>
>>Okay, in that post Linus suggests that -M without an argument should be
>>== 100% (1.0), thus avoiding having to mess up the meaning of -M100 as
>>0.100. It seems like a really odd thing to have -M100 mean something
>>that's completely out of line with the rest of the meaning.
>
> True, but it might be too late to change that; I suspect people
> expect -M to do a bit more than pure renames by now.
>
Okay, how about the following? It lets both -M1.0 and -M100% work,
while keeping everything else compatible, and avoiding artificial
special cases.
-hpa
[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 932 bytes --]
diff --git a/diff.c b/diff.c
index 0391e8c..df62d2b 100644
--- a/diff.c
+++ b/diff.c
@@ -843,11 +843,19 @@ static int parse_num(const char **cp_p)
 
 	cnt = num = 0;
 	scale = 1;
-	while ('0' <= (ch = *cp) && ch <= '9') {
-		if (cnt++ < 5) {
-			/* We simply ignore more than 5 digits precision. */
-			scale *= 10;
-			num = num * 10 + ch - '0';
+	for(;;) {
+		ch = *cp;
+		if ( ch == '.' ) {
+			scale = 1;
+		} else if ( ch == '%' ) {
+			scale = 100;
+		} else if ( ch >= '0' && ch <= '9' ) {
+			if ( scale < 100000 ) {
+				scale *= 10;
+				num = (num*10) + (ch-'0');
+			}
+		} else {
+			break;
 		}
 		cp++;
 	}
@@ -856,7 +864,7 @@ static int parse_num(const char **cp_p)
 	/* user says num divided by scale and we say internally that
 	 * is MAX_SCORE * num / scale.
 	 */
-	return (MAX_SCORE * num / scale);
+	return (num >= scale) ? MAX_SCORE : (MAX_SCORE * num / scale);
 }
 
 int diff_scoreopt_parse(const char *opt)
* Re: Pure renames/copies
2005-11-21 22:10 ` H. Peter Anvin
@ 2005-11-21 22:17 ` H. Peter Anvin
0 siblings, 0 replies; 11+ messages in thread
From: H. Peter Anvin @ 2005-11-21 22:17 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Junio C Hamano, git
Better variant, which handles stuff like "4.5%" and rejects
"192.168.0.1". Additionally, make sure numbers are unsigned (I'm making
them unsigned long just for the hell of it), to make sure that
artificial wraparound scenarios don't cause harm.
-hpa
[-- Attachment #2: diff --]
[-- Type: text/plain, Size: 1186 bytes --]
diff --git a/diff.c b/diff.c
index 0391e8c..ffe8a55 100644
--- a/diff.c
+++ b/diff.c
@@ -838,16 +838,29 @@ int diff_opt_parse(struct diff_options *
 
 static int parse_num(const char **cp_p)
 {
-	int num, scale, ch, cnt;
+	unsigned long num, scale;
+	int ch, dot;
 	const char *cp = *cp_p;
 
-	cnt = num = 0;
+	num = 0;
 	scale = 1;
-	while ('0' <= (ch = *cp) && ch <= '9') {
-		if (cnt++ < 5) {
-			/* We simply ignore more than 5 digits precision. */
-			scale *= 10;
-			num = num * 10 + ch - '0';
+	dot = 0;
+	for(;;) {
+		ch = *cp;
+		if ( !dot && ch == '.' ) {
+			scale = 1;
+			dot = 1;
+		} else if ( ch == '%' ) {
+			scale = dot ? scale*100 : 100;
+			cp++;	/* % is always at the end */
+			break;
+		} else if ( ch >= '0' && ch <= '9' ) {
+			if ( scale < 100000 ) {
+				scale *= 10;
+				num = (num*10) + (ch-'0');
+			}
+		} else {
+			break;
 		}
 		cp++;
 	}
@@ -856,7 +869,7 @@ static int parse_num(const char **cp_p)
 	/* user says num divided by scale and we say internally that
 	 * is MAX_SCORE * num / scale.
 	 */
-	return (MAX_SCORE * num / scale);
+	return (num >= scale) ? MAX_SCORE : (MAX_SCORE * num / scale);
 }
 
 int diff_scoreopt_parse(const char *opt)
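For readers who want to try the notation outside of git, the patched function can be lifted into a standalone sketch like the one below. The value of MAX_SCORE and the parse_score wrapper are assumptions made for this example: the wrapper simply treats any unparsed trailing text as an error, which is one way a caller could reject input such as "192.168.0.1".

```c
/* Assumed value, for illustration only; the thread does not state it. */
#define MAX_SCORE 60000.0

/* Standalone copy of the parsing loop from the patch above. */
static int parse_num(const char **cp_p)
{
	unsigned long num = 0, scale = 1;
	int ch, dot = 0;
	const char *cp = *cp_p;

	for (;;) {
		ch = *cp;
		if (!dot && ch == '.') {
			/* digits so far were the integer part */
			scale = 1;
			dot = 1;
		} else if (ch == '%') {
			scale = dot ? scale * 100 : 100;
			cp++;	/* % is always at the end */
			break;
		} else if (ch >= '0' && ch <= '9') {
			if (scale < 100000) {
				scale *= 10;
				num = num * 10 + (ch - '0');
			}
		} else {
			break;	/* includes a second '.' */
		}
		cp++;
	}
	*cp_p = cp;

	/* anything at or above 1.0 is clamped to MAX_SCORE */
	return (num >= scale) ? (int)MAX_SCORE
			      : (int)(MAX_SCORE * num / scale);
}

/* Hypothetical wrapper: returns -1 if anything is left unparsed. */
static int parse_score(const char *arg)
{
	int score = parse_num(&arg);

	return *arg ? -1 : score;
}
```

Under these assumptions "1.0" and "100%" both map to MAX_SCORE, "4.5%" maps to 0.045 of it, the old-style "50" still means 0.50, and "192.168.0.1" stops at the second dot and is rejected by the wrapper.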
* Re: Pure renames/copies
2005-11-21 19:50 ` Junio C Hamano
2005-11-21 21:01 ` H. Peter Anvin
@ 2005-11-22 9:03 ` Santi Bejar
1 sibling, 0 replies; 11+ messages in thread
From: Santi Bejar @ 2005-11-22 9:03 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
>
> Probably something like this would suffice.
>
Ok, thanks. Now the only issue with my broken repository (it does not
have all the blobs) is that it outputs:
error: unable to find abcde....
for all the src paths, but the result is ok.
But I can live with it.
Thanks