From: Linus Torvalds <torvalds@osdl.org>
To: Junio C Hamano <junkio@cox.net>, Git Mailing List <git@vger.kernel.org>
Subject: Fix up diffcore-rename scoring
Date: Sun, 12 Mar 2006 22:26:34 -0800 (PST)
Message-ID: <Pine.LNX.4.64.0603122223160.3618@g5.osdl.org>

The "score" calculation for diffcore-rename was totally broken.

It scaled "score" as

	score = src_copied * MAX_SCORE / dst->size;

which means that you got a 100% similarity score whenever every byte of
dst was copied from src, even if src and dst were different and src was
much larger than dst (e.g. we had copied 85% of the bytes, but _deleted_
the remaining 15%).

That's clearly bogus. We should do the score calculation relative not to 
the destination size, but to the max size of the two.

This seems to fix it.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---
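Not part of the patch: a minimal standalone sketch of the 85%/15% case
described above, comparing the old and new scaling. The helper names
score_vs_dst() and score_vs_max() are hypothetical, and the MAX_SCORE
value of 60000 is an assumption for illustration; the real definition
lives in diffcore.h.

```c
/* Assumed value for illustration; see diffcore.h for the real one. */
#define MAX_SCORE 60000UL

/* Hypothetical helper: the old scaling, relative to the destination size. */
static unsigned long score_vs_dst(unsigned long src_size,
				  unsigned long dst_size,
				  unsigned long src_copied)
{
	(void)src_size;	/* the old formula never looked at the source size */
	if (dst_size < src_copied)
		return MAX_SCORE;
	if (!dst_size)
		return 0;
	return src_copied * MAX_SCORE / dst_size;
}

/* Hypothetical helper: the new scaling, relative to the larger of the two. */
static unsigned long score_vs_max(unsigned long src_size,
				  unsigned long dst_size,
				  unsigned long src_copied)
{
	unsigned long max_size = (src_size > dst_size) ? src_size : dst_size;

	if (!dst_size)
		return 0;
	/* max_size >= dst_size > 0 here, so the division is safe */
	return src_copied * MAX_SCORE / max_size;
}
```

With a 1000-byte src, an 850-byte dst and 850 bytes copied, the old
scaling reports the full MAX_SCORE, while the new one reports 85% of it.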
diff --git a/diffcore-rename.c b/diffcore-rename.c
index ed99fe2..e992698 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -133,7 +133,7 @@ static int estimate_similarity(struct di
 	 * match than anything else; the destination does not even
 	 * call into this function in that case.
 	 */
-	unsigned long delta_size, base_size, src_copied, literal_added;
+	unsigned long max_size, delta_size, base_size, src_copied, literal_added;
 	unsigned long delta_limit;
 	int score;
 
@@ -144,9 +144,9 @@ static int estimate_similarity(struct di
 	if (!S_ISREG(src->mode) || !S_ISREG(dst->mode))
 		return 0;
 
-	delta_size = ((src->size < dst->size) ?
-		      (dst->size - src->size) : (src->size - dst->size));
+	max_size = ((src->size > dst->size) ? src->size : dst->size);
 	base_size = ((src->size < dst->size) ? src->size : dst->size);
+	delta_size = max_size - base_size;
 
 	/* We would not consider edits that change the file size so
 	 * drastically.  delta_size must be smaller than
@@ -174,12 +174,10 @@ static int estimate_similarity(struct di
 	/* How similar are they?
 	 * what percentage of material in dst are from source?
 	 */
-	if (dst->size < src_copied)
-		score = MAX_SCORE;
-	else if (!dst->size)
+	if (!dst->size)
 		score = 0; /* should not happen */
 	else
-		score = src_copied * MAX_SCORE / dst->size;
+		score = src_copied * MAX_SCORE / max_size;
 	return score;
 }
 