From: Linus Torvalds <torvalds@osdl.org>
To: Junio C Hamano <junkio@cox.net>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: Fix up diffcore-rename scoring
Date: Sun, 12 Mar 2006 23:09:11 -0800 (PST)
Message-ID: <Pine.LNX.4.64.0603122256550.3618@g5.osdl.org>
In-Reply-To: <7vmzfusuyq.fsf@assigned-by-dhcp.cox.net>



On Sun, 12 Mar 2006, Junio C Hamano wrote:
>
> Linus Torvalds <torvalds@osdl.org> writes:
> 
> > The "score" calculation for diffcore-rename was totally broken.
> >
> > It scaled "score" as
> >
> > 	score = src_copied * MAX_SCORE / dst->size;
> >
> > which means that you got a 100% similarity score even if src and dest were 
> > different, if just every byte of dst was copied from src, even if source 
> > was much larger than dst (eg we had copied 85% of the bytes, but _deleted_ 
> > the remaining 15%).
> 
> Your reading of the code is correct, but that is deliberate.
> 
> >  	/* How similar are they?
> >  	 * what percentage of material in dst are from source?
> >  	 */
> 
> I wanted to say in such a case that dst was _really_ derived
> from the source.  I think using max may make more sense, but I
> need to convince myself by looking at filepairs that this change
> stops detecting as renames, and this change starts detecting as
> renames.
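
To make the two scalings under discussion concrete, here is a minimal sketch, not the actual diffcore-rename code. The helper names score_by_dst() and score_by_max() are hypothetical, the MAX_SCORE value is assumed for illustration, and the max-based variant is simply one reading of "using max" from the quoted text above.

```c
/* Assumed scale for illustration only; see git's own headers for the real one. */
#define MAX_SCORE 60000.0

/*
 * Hypothetical helper: scale by the destination size only, as in
 *	score = src_copied * MAX_SCORE / dst->size;
 */
static double score_by_dst(unsigned long src_copied, unsigned long dst_size)
{
	if (!dst_size)
		return 0.0;	/* avoid dividing by zero on empty files */
	return src_copied * MAX_SCORE / dst_size;
}

/*
 * Hypothetical helper: scale by the larger of the two sizes, so that
 * material deleted from the source also lowers the score.
 */
static double score_by_max(unsigned long src_copied,
			   unsigned long src_size, unsigned long dst_size)
{
	unsigned long max_size = src_size > dst_size ? src_size : dst_size;

	if (!max_size)
		return 0.0;
	return src_copied * MAX_SCORE / max_size;
}
```

With a 100-byte source of which 85 bytes were copied into an 85-byte destination, score_by_dst(85, 85) returns the full MAX_SCORE, while score_by_max(85, 100, 85) returns 85% of it, which reflects the deleted 15%.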

Just compare the results. Eyeballing the difference between the rename 
data from 2.6.12 to 2.6.14, the fixed score actually gives better rename 
detection. It finds 133 renames (as opposed to 132 for the broken one), 
and the renames it finds are more sensible.

For example, the fixed version finds

	drivers/i2c/chips/lm75.h -> drivers/hwmon/lm75.h

which actually matches the other i2c/chips/ renames, while the broken one 
does

	drivers/i2c/chips/lm75.h -> drivers/media/video/rds.h

which just doesn't make any sense at all.

Now, that said, they _both_ find some pretty funky renames. I think there 
is probably some serious room for improvement, regardless (or at least 
changing the default similarity cut-off to something better ;)

		Linus
