From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: Elliot Wolk <elliot.wolk@gmail.com>,
	Robin Rosenberg <robin.rosenberg@dewire.com>,
	git@vger.kernel.org
Subject: Re: move detection doesnt take filename into account
Date: Wed, 09 Jul 2014 08:51:07 -0700
Message-ID: <xmqqegxu7cpg.fsf@gitster.dls.corp.google.com>
In-Reply-To: <20140709064521.GA14682@sigill.intra.peff.net> (Jeff King's message of "Wed, 9 Jul 2014 02:45:21 -0400")

Jeff King <peff@peff.net> writes:

> On Tue, Jul 01, 2014 at 10:08:15AM -0700, Junio C Hamano wrote:
>
>> I didn't think it through but my gut feeling is that we could change
>> the name similarity score to be the length of the tail part that
>> matches (e.g. 1.a to a/2.a that has the same two bytes at the tail
>> is a better match than to a/2.b that does not share any tail, and to
>> a/1.a that shares the three bytes at the tail is an even better
>> match).
>
> The delta heuristics in pack-objects use pack_name_hash, which claims:
>
>         /*
>          * This effectively just creates a sortable number from the
>          * last sixteen non-whitespace characters. Last characters
>          * count "most", so things that end in ".c" sort together.
>          */
>
> which might be another option (and seems like a superset of the basename
> check, short of basenames that are longer than 16 characters).
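
For illustration, the tail-length score proposed in the quoted 1 July message can be sketched in C. Both helpers are hypothetical (nothing here is claimed to be git's API, and this is not how diffcore-rename currently scores names); the example paths 1.a, a/2.a, a/2.b and a/1.a are the ones used in the quote.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count how many trailing bytes two paths
 * have in common ("1.a" vs "a/2.a" share ".a", i.e. 2 bytes).
 */
static size_t tail_match_len(const char *a, const char *b)
{
	size_t la = strlen(a), lb = strlen(b), n = 0;

	while (n < la && n < lb && a[la - 1 - n] == b[lb - 1 - n])
		n++;
	return n;
}

/*
 * Hypothetical tie-breaker: among rename candidates that are
 * otherwise equally similar, return the index of the one whose
 * path shares the longest tail with src. The first candidate wins
 * ties; n is assumed to be at least 1.
 */
static size_t best_by_tail(const char *src, const char *const *cands, size_t n)
{
	size_t i, best = 0, best_len = 0;

	for (i = 0; i < n; i++) {
		size_t len = tail_match_len(src, cands[i]);
		if (len > best_len) {
			best_len = len;
			best = i;
		}
	}
	return best;
}
```

With source 1.a and candidates a/2.b, a/2.a and a/1.a, the scores are 0, 2 and 3, so best_by_tail() picks a/1.a, which is the ordering the quoted message describes.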

Perhaps.

I am, however, not sure whether the code that computes the similarity
score is as tolerant of false positives (i.e. dissimilar names that
happen to hash together and get clumped into the same bin or into
nearby bins) as the existing callers of pack_name_hash() are.
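
To make the false-positive concern concrete, here is a sketch of a hash modeled on the pack_name_hash() comment quoted above. The update rule is an assumption written to match that comment, not a copy of the upstream code, and name_hash() and same_bin() are hypothetical names.

```c
#include <ctype.h>
#include <stdint.h>

/*
 * Sketch modeled on the pack_name_hash() comment (assumed update
 * rule): each non-whitespace byte enters the top byte while earlier
 * bytes move down two bits, so the last characters dominate. For
 * ASCII paths, bytes more than sixteen positions from the end can
 * change the result by at most one.
 */
static uint32_t name_hash(const char *name)
{
	uint32_t c, hash = 0;

	if (!name)
		return 0;
	while ((c = (unsigned char)*name++) != 0) {
		if (isspace((int)c))
			continue;
		hash = (hash >> 2) + (c << 24);
	}
	return hash;
}

/* Hypothetical: do two paths fall into the same hash bin? */
static int same_bin(const char *a, const char *b)
{
	return name_hash(a) == name_hash(b);
}
```

With this sketch, apple_unit_test_helper.c and zebra_unit_test_helper.c (different basenames sharing their last nineteen characters) hash to the same value, while helper.c and helper.h do not; a similarity scorer that trusted bin equality alone would treat the first pair as an exact name match.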


Thread overview: 11+ messages
2014-06-30  6:38 move detection doesnt take filename into account Elliot Wolk
2014-07-01  9:16 ` Robin Rosenberg
2014-07-01 14:40   ` Elliot Wolk
2014-07-01 14:57   ` Junio C Hamano
2014-07-01 15:05     ` Elliot Wolk
2014-07-01 17:08       ` Junio C Hamano
2014-07-09  6:45         ` Jeff King
2014-07-09 15:51           ` Junio C Hamano [this message]
2014-07-09 22:03             ` Jeff King
2014-07-09 22:18               ` Junio C Hamano
2014-07-10  3:53                 ` Jeff King
