* --find-copies-harder finds fewer copies/renames than -C does
@ 2011-01-05 17:46 Stefan Haller
2011-01-05 18:59 ` Junio C Hamano
0 siblings, 1 reply; 9+ messages in thread
From: Stefan Haller @ 2011-01-05 17:46 UTC (permalink / raw)
To: git
I was surprised to find out that --find-copies-harder can, under certain
circumstances, detect fewer renames than -C does for a given commit.
After some digging I think I understand now why this is so, but I find
it extremely unintuitive, and would like to know what other people think
about it. I had expected --find-copies-harder to always detect at least
as many copies/renames as -C, and possibly more, but never less.
The case I had is this: I have a repo with about 10.000 files, and a
commit with 14 files being moved to a different folder; half of them
where unmodified moves, the other half had one-line modifications
(similarity index 97% or so).
"git show -C --name-status <commit>" detects all 14 files as renames;
"git show --find-copies-harder --name-status <commit>" only shows the
unmodified moves as renames, the ones with modifications are shown as
delete plus add.
The reason for this seems to be the condition
"num_create * num_src > rename_limit * rename_limit" in diffcore_rename;
--find-copies-harder exeeds the limit, so it turns off inexact rename
dectection *completely*, while -C stays well below the limit.
I had expected --find-copies-harder to still do inexact rename detection
among the changed files in the commit in this case, and turn it off only
for the unmodified files; I'm not familiar enough with the code to tell
whether that would be easy to implement though.
Any thoughts?
--
Stefan Haller
Berlin, Germany
http://www.haller-berlin.de/
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: --find-copies-harder finds fewer copies/renames than -C does
2011-01-05 17:46 --find-copies-harder finds fewer copies/renames than -C does Stefan Haller
@ 2011-01-05 18:59 ` Junio C Hamano
2011-01-06 16:54 ` Stefan Haller
0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2011-01-05 18:59 UTC (permalink / raw)
To: Stefan Haller; +Cc: git
lists@haller-berlin.de (Stefan Haller) writes:
> The reason for this seems to be the condition
> "num_create * num_src > rename_limit * rename_limit" in diffcore_rename;
> --find-copies-harder exeeds the limit, so it turns off inexact rename
> dectection *completely*, while -C stays well below the limit.
>
> I had expected --find-copies-harder to still do inexact rename detection
> among the changed files in the commit in this case, and turn it off only
> for the unmodified files; I'm not familiar enough with the code to tell
> whether that would be easy to implement though.
>
> Any thoughts?
Two. When you can spend unlimited amount of resources, it would feel more
intuitive if -C -C lifted rename-limit altogether. On the other hand, in
a project where the difference does matter (i.e. you have far too many
candidate sources), it is likely that -C -C without rename limit would run
out of memory and not produce any result, so automatic lifting of rename
limit is unacceptable as the default.
Setting rename-limit to match the size of your environment (perhaps do
this once in the config) would make this a non-issue, so coming up with an
automated way to do so might be an interesting exercise.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: --find-copies-harder finds fewer copies/renames than -C does
2011-01-05 18:59 ` Junio C Hamano
@ 2011-01-06 16:54 ` Stefan Haller
2011-01-06 20:42 ` Junio C Hamano
0 siblings, 1 reply; 9+ messages in thread
From: Stefan Haller @ 2011-01-06 16:54 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Junio C Hamano <gitster@pobox.com> wrote:
> lists@haller-berlin.de (Stefan Haller) writes:
>
> > I had expected --find-copies-harder to still do inexact rename detection
> > among the changed files in the commit in this case, and turn it off only
> > for the unmodified files; I'm not familiar enough with the code to tell
> > whether that would be easy to implement though.
> >
> > Any thoughts?
>
> Two. When you can spend unlimited amount of resources, it would feel more
> intuitive if -C -C lifted rename-limit altogether. On the other hand, in
> a project where the difference does matter (i.e. you have far too many
> candidate sources), it is likely that -C -C without rename limit would run
> out of memory and not produce any result, so automatic lifting of rename
> limit is unacceptable as the default.
But what about the suggestion of falling back to -C if
--find-copies-harder exceeds the rename limit, but -C does not?
Wouldn't that be the desired behaviour?
--
Stefan Haller
Berlin, Germany
http://www.haller-berlin.de/
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: --find-copies-harder finds fewer copies/renames than -C does
2011-01-06 16:54 ` Stefan Haller
@ 2011-01-06 20:42 ` Junio C Hamano
2011-01-06 21:50 ` [PATCH 0/3] Falling back "diff -C -C" to "diff -C" more gracefully Junio C Hamano
0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2011-01-06 20:42 UTC (permalink / raw)
To: Stefan Haller; +Cc: git
lists@haller-berlin.de (Stefan Haller) writes:
> But what about the suggestion of falling back to -C if
> --find-copies-harder exceeds the rename limit, but -C does not?
> Wouldn't that be the desired behaviour?
Maybe. Patches welcome ;-)
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 0/3] Falling back "diff -C -C" to "diff -C" more gracefully
2011-01-06 20:42 ` Junio C Hamano
@ 2011-01-06 21:50 ` Junio C Hamano
2011-01-06 21:50 ` [PATCH 1/3] diffcore-rename: refactor "too many candidates" logic Junio C Hamano
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Junio C Hamano @ 2011-01-06 21:50 UTC (permalink / raw)
To: git; +Cc: Stefan Haller
In a project with many paths, "diff -C -C" may have too many rename source
candidates (as it tries to use all existing paths) but the rename source
candidates "diff -C" would use may still fit under the rename detection
limit. Currently, we punt and disable the inexact rename detection
altogether even in such a case.
This weatherballoon series illustrates how diffcore-rename can be tweaked
to allow "-C -C" to fall back to "-C". Somebody should write a test, but
not today ;-).
Junio C Hamano (3):
diffcore-rename: refactor "too many candidates" logic
diffcore-rename: record filepair for rename src
diffcore-rename: fall back to -C when -C -C busts the rename limit
diffcore-rename.c | 97 ++++++++++++++++++++++++++++++++++++++--------------
1 files changed, 71 insertions(+), 26 deletions(-)
--
1.7.4.rc1.214.g2a4f9
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/3] diffcore-rename: refactor "too many candidates" logic
2011-01-06 21:50 ` [PATCH 0/3] Falling back "diff -C -C" to "diff -C" more gracefully Junio C Hamano
@ 2011-01-06 21:50 ` Junio C Hamano
2011-01-06 21:50 ` [PATCH 2/3] diffcore-rename: record filepair for rename src Junio C Hamano
` (2 subsequent siblings)
3 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2011-01-06 21:50 UTC (permalink / raw)
To: git
Move the logic to a separate function, to be enhanced by later patches in
the series.
While at it, swap the condition used in the if statement from "if it is
too big then do this" to "if it would fit then do this".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
diffcore-rename.c | 39 +++++++++++++++++++++++++--------------
1 files changed, 25 insertions(+), 14 deletions(-)
diff --git a/diffcore-rename.c b/diffcore-rename.c
index df41be5..6ab050d 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -414,11 +414,34 @@ static void record_if_better(struct diff_score m[], struct diff_score *o)
m[worst] = *o;
}
+static int too_many_rename_candidates(int num_create,
+ struct diff_options *options)
+{
+ int rename_limit = options->rename_limit;
+ int num_src = rename_src_nr;
+
+ /*
+ * This basically does a test for the rename matrix not
+ * growing larger than a "rename_limit" square matrix, ie:
+ *
+ * num_create * num_src > rename_limit * rename_limit
+ *
+ * but handles the potential overflow case specially (and we
+ * assume at least 32-bit integers)
+ */
+ if (rename_limit <= 0 || rename_limit > 32767)
+ rename_limit = 32767;
+ if ((num_create <= rename_limit || num_src <= rename_limit) &&
+ (num_create * num_src <= rename_limit * rename_limit))
+ return 0;
+
+ return 1;
+}
+
void diffcore_rename(struct diff_options *options)
{
int detect_rename = options->detect_rename;
int minimum_score = options->rename_score;
- int rename_limit = options->rename_limit;
struct diff_queue_struct *q = &diff_queued_diff;
struct diff_queue_struct outq;
struct diff_score *mx;
@@ -484,19 +507,7 @@ void diffcore_rename(struct diff_options *options)
if (!num_create)
goto cleanup;
- /*
- * This basically does a test for the rename matrix not
- * growing larger than a "rename_limit" square matrix, ie:
- *
- * num_create * num_src > rename_limit * rename_limit
- *
- * but handles the potential overflow case specially (and we
- * assume at least 32-bit integers)
- */
- if (rename_limit <= 0 || rename_limit > 32767)
- rename_limit = 32767;
- if ((num_create > rename_limit && num_src > rename_limit) ||
- (num_create * num_src > rename_limit * rename_limit)) {
+ if (too_many_rename_candidates(num_create, options)) {
if (options->warn_on_too_large_rename)
warning("too many files (created: %d deleted: %d), skipping inexact rename detection", num_create, num_src);
goto cleanup;
--
1.7.4.rc1.214.g2a4f9
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/3] diffcore-rename: record filepair for rename src
2011-01-06 21:50 ` [PATCH 0/3] Falling back "diff -C -C" to "diff -C" more gracefully Junio C Hamano
2011-01-06 21:50 ` [PATCH 1/3] diffcore-rename: refactor "too many candidates" logic Junio C Hamano
@ 2011-01-06 21:50 ` Junio C Hamano
2011-01-06 21:50 ` [PATCH 3/3] diffcore-rename: fall back to -C when -C -C busts the rename limit Junio C Hamano
2011-01-06 22:28 ` [PATCH 0/3] Falling back "diff -C -C" to "diff -C" more gracefully Jeff King
3 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2011-01-06 21:50 UTC (permalink / raw)
To: git
This will allow us to later skip unmodified entries added due to "-C -C".
We might also want to do something similar to rename_dst side, but that
would only be for the sake of symmetry and not necessary for this series.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
diffcore-rename.c | 23 ++++++++++++-----------
1 files changed, 12 insertions(+), 11 deletions(-)
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 6ab050d..9ce81b6 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -54,22 +54,23 @@ static struct diff_rename_dst *locate_rename_dst(struct diff_filespec *two,
/* Table of rename/copy src files */
static struct diff_rename_src {
- struct diff_filespec *one;
+ struct diff_filepair *p;
unsigned short score; /* to remember the break score */
} *rename_src;
static int rename_src_nr, rename_src_alloc;
-static struct diff_rename_src *register_rename_src(struct diff_filespec *one,
- unsigned short score)
+static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
{
int first, last;
+ struct diff_filespec *one = p->one;
+ unsigned short score = p->score;
first = 0;
last = rename_src_nr;
while (last > first) {
int next = (last + first) >> 1;
struct diff_rename_src *src = &(rename_src[next]);
- int cmp = strcmp(one->path, src->one->path);
+ int cmp = strcmp(one->path, src->p->one->path);
if (!cmp)
return src;
if (cmp < 0) {
@@ -89,7 +90,7 @@ static struct diff_rename_src *register_rename_src(struct diff_filespec *one,
if (first < rename_src_nr)
memmove(rename_src + first + 1, rename_src + first,
(rename_src_nr - first - 1) * sizeof(*rename_src));
- rename_src[first].one = one;
+ rename_src[first].p = p;
rename_src[first].score = score;
return &(rename_src[first]);
}
@@ -204,7 +205,7 @@ static void record_rename_pair(int dst_index, int src_index, int score)
if (rename_dst[dst_index].pair)
die("internal error: dst already matched.");
- src = rename_src[src_index].one;
+ src = rename_src[src_index].p->one;
src->rename_used++;
src->count++;
@@ -384,7 +385,7 @@ static int find_exact_renames(void)
init_hash(&file_table);
for (i = 0; i < rename_src_nr; i++)
- insert_file_table(&file_table, -1, i, rename_src[i].one);
+ insert_file_table(&file_table, -1, i, rename_src[i].p->one);
for (i = 0; i < rename_dst_nr; i++)
insert_file_table(&file_table, 1, i, rename_dst[i].two);
@@ -472,7 +473,7 @@ void diffcore_rename(struct diff_options *options)
*/
if (p->broken_pair && !p->score)
p->one->rename_used++;
- register_rename_src(p->one, p->score);
+ register_rename_src(p);
}
else if (detect_rename == DIFF_DETECT_COPY) {
/*
@@ -480,7 +481,7 @@ void diffcore_rename(struct diff_options *options)
* one, to indicate ourselves as a user.
*/
p->one->rename_used++;
- register_rename_src(p->one, p->score);
+ register_rename_src(p);
}
}
if (rename_dst_nr == 0 || rename_src_nr == 0)
@@ -526,7 +527,7 @@ void diffcore_rename(struct diff_options *options)
m[j].dst = -1;
for (j = 0; j < rename_src_nr; j++) {
- struct diff_filespec *one = rename_src[j].one;
+ struct diff_filespec *one = rename_src[j].p->one;
struct diff_score this_src;
this_src.score = estimate_similarity(one, two,
minimum_score);
@@ -556,7 +557,7 @@ void diffcore_rename(struct diff_options *options)
dst = &rename_dst[mx[i].dst];
if (dst->pair)
continue; /* already done, either exact or fuzzy. */
- if (rename_src[mx[i].src].one->rename_used)
+ if (rename_src[mx[i].src].p->one->rename_used)
continue;
record_rename_pair(mx[i].dst, mx[i].src, mx[i].score);
rename_count++;
--
1.7.4.rc1.214.g2a4f9
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 3/3] diffcore-rename: fall back to -C when -C -C busts the rename limit
2011-01-06 21:50 ` [PATCH 0/3] Falling back "diff -C -C" to "diff -C" more gracefully Junio C Hamano
2011-01-06 21:50 ` [PATCH 1/3] diffcore-rename: refactor "too many candidates" logic Junio C Hamano
2011-01-06 21:50 ` [PATCH 2/3] diffcore-rename: record filepair for rename src Junio C Hamano
@ 2011-01-06 21:50 ` Junio C Hamano
2011-01-06 22:28 ` [PATCH 0/3] Falling back "diff -C -C" to "diff -C" more gracefully Jeff King
3 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2011-01-06 21:50 UTC (permalink / raw)
To: git
When there are too many paths in the project, the number of rename source
candidates "git diff -C -C" finds will exceed the rename detection limit,
and no inexact rename detection is performed. We however could fall back
to "git diff -C" if the number of paths modified are sufficiently small.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
diffcore-rename.c | 37 +++++++++++++++++++++++++++++++++++--
1 files changed, 35 insertions(+), 2 deletions(-)
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 9ce81b6..4851af3 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -415,11 +415,18 @@ static void record_if_better(struct diff_score m[], struct diff_score *o)
m[worst] = *o;
}
+/*
+ * Returns:
+ * 0 if we are under the limit;
+ * 1 if we need to disable inexact rename detection;
+ * 2 if we would be under the limit if we were given -C instead of -C -C.
+ */
static int too_many_rename_candidates(int num_create,
struct diff_options *options)
{
int rename_limit = options->rename_limit;
int num_src = rename_src_nr;
+ int i;
/*
* This basically does a test for the rename matrix not
@@ -436,6 +443,19 @@ static int too_many_rename_candidates(int num_create,
(num_create * num_src <= rename_limit * rename_limit))
return 0;
+ /* Are we running under -C -C? */
+ if (!DIFF_OPT_TST(options, FIND_COPIES_HARDER))
+ return 1;
+
+ /* Would we bust the limit if we were running under -C? */
+ for (num_src = i = 0; i < rename_src_nr; i++) {
+ if (diff_unmodified_pair(rename_src[i].p))
+ continue;
+ num_src++;
+ }
+ if ((num_create <= rename_limit || num_src <= rename_limit) &&
+ (num_create * num_src <= rename_limit * rename_limit))
+ return 2;
return 1;
}
@@ -446,7 +466,7 @@ void diffcore_rename(struct diff_options *options)
struct diff_queue_struct *q = &diff_queued_diff;
struct diff_queue_struct outq;
struct diff_score *mx;
- int i, j, rename_count;
+ int i, j, rename_count, skip_unmodified = 0;
int num_create, num_src, dst_cnt;
if (!minimum_score)
@@ -508,10 +528,18 @@ void diffcore_rename(struct diff_options *options)
if (!num_create)
goto cleanup;
- if (too_many_rename_candidates(num_create, options)) {
+ switch (too_many_rename_candidates(num_create, options)) {
+ case 1:
if (options->warn_on_too_large_rename)
warning("too many files (created: %d deleted: %d), skipping inexact rename detection", num_create, num_src);
goto cleanup;
+ case 2:
+ if (options->warn_on_too_large_rename)
+ warning("too many files, falling back to -C");
+ skip_unmodified = 1;
+ break;
+ default:
+ break;
}
mx = xcalloc(num_create * NUM_CANDIDATE_PER_DST, sizeof(*mx));
@@ -529,6 +557,11 @@ void diffcore_rename(struct diff_options *options)
for (j = 0; j < rename_src_nr; j++) {
struct diff_filespec *one = rename_src[j].p->one;
struct diff_score this_src;
+
+ if (skip_unmodified &&
+ diff_unmodified_pair(rename_src[j].p))
+ continue;
+
this_src.score = estimate_similarity(one, two,
minimum_score);
this_src.name_score = basename_same(one, two);
--
1.7.4.rc1.214.g2a4f9
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 0/3] Falling back "diff -C -C" to "diff -C" more gracefully
2011-01-06 21:50 ` [PATCH 0/3] Falling back "diff -C -C" to "diff -C" more gracefully Junio C Hamano
` (2 preceding siblings ...)
2011-01-06 21:50 ` [PATCH 3/3] diffcore-rename: fall back to -C when -C -C busts the rename limit Junio C Hamano
@ 2011-01-06 22:28 ` Jeff King
3 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2011-01-06 22:28 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Stefan Haller
On Thu, Jan 06, 2011 at 01:50:03PM -0800, Junio C Hamano wrote:
> Junio C Hamano (3):
> diffcore-rename: refactor "too many candidates" logic
> diffcore-rename: record filepair for rename src
> diffcore-rename: fall back to -C when -C -C busts the rename limit
I read through the series, and it looks good to me. I didn't do any
testing, though. :)
-Peff
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2011-01-06 22:29 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-01-05 17:46 --find-copies-harder finds fewer copies/renames than -C does Stefan Haller
2011-01-05 18:59 ` Junio C Hamano
2011-01-06 16:54 ` Stefan Haller
2011-01-06 20:42 ` Junio C Hamano
2011-01-06 21:50 ` [PATCH 0/3] Falling back "diff -C -C" to "diff -C" more gracefully Junio C Hamano
2011-01-06 21:50 ` [PATCH 1/3] diffcore-rename: refactor "too many candidates" logic Junio C Hamano
2011-01-06 21:50 ` [PATCH 2/3] diffcore-rename: record filepair for rename src Junio C Hamano
2011-01-06 21:50 ` [PATCH 3/3] diffcore-rename: fall back to -C when -C -C busts the rename limit Junio C Hamano
2011-01-06 22:28 ` [PATCH 0/3] Falling back "diff -C -C" to "diff -C" more gracefully Jeff King
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).