Git development
From: Jeff King <peff@peff.net>
To: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Cc: Yann Dirson <ydirson@altern.org>,
	Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org
Subject: Re: [PATCH 1/2] diffcore-rename: support rename cache
Date: Sat, 8 Nov 2008 21:04:13 -0500
Message-ID: <20081109020413.GA31408@coredump.intra.peff.net>
In-Reply-To: <fcaeb9bf0811080400h7ea5377cvaa8d658335811c23@mail.gmail.com>

On Sat, Nov 08, 2008 at 07:00:10PM +0700, Nguyen Thai Ngoc Duy wrote:

> >  The downsides are:
> >
> >   - your cache is potentially bigger, since you are caching the score of
> >    every pair you look at, instead of just "good" pairs (OTOH, you are
> >    not doing a per-commit cache, which helps reduce the size)
> 
> It is huge if you accidentally add --find-copies-harder to your
> command, considering that every new file will be compared against
> every file in the tree (about 25k).

Hmm, yeah. I was thinking you might be able to do some kind of cut-off
on the caching (i.e., don't bother storing anything that didn't come
close). But you can't safely assume that because an entry isn't there,
it isn't worth seeing (since it might also just not have been computed
yet). You could still organize by commit, and then each commit is either
fully computed or not. But then you still have a pathspec problem.
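
To make the "fully computed or not" distinction concrete, here is a
minimal sketch, not code from the patch. The class name, the cutoff
parameter, and the commit-keyed layout are all assumptions made for
illustration. The point is that a missing pair only means "scored below
the cutoff" when the commit as a whole is known to be complete.

```python
from typing import Dict, Optional, Tuple

Pair = Tuple[str, str]  # (source path, destination path)


class CommitRenameCache:
    """Hypothetical per-commit cache: a commit is fully computed or absent."""

    def __init__(self, cutoff: float) -> None:
        self.cutoff = cutoff
        self.complete: Dict[str, Dict[Pair, float]] = {}

    def store(self, commit: str, scores: Dict[Pair, float]) -> None:
        # Call this only once every pair for the commit has been scored;
        # pairs below the cutoff are dropped to keep the cache small.
        self.complete[commit] = {
            pair: score for pair, score in scores.items() if score >= self.cutoff
        }

    def lookup(self, commit: str, pair: Pair) -> Tuple[bool, Optional[float]]:
        entries = self.complete.get(commit)
        if entries is None:
            return (False, None)  # commit never computed: nothing is known
        # Commit is complete, so an absent pair is known to be below the cutoff.
        return (True, entries.get(pair))
```

With a flat cache keyed only by pair, the first and last cases would be
indistinguishable, which is the problem described above.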

One thing you could do is compute the rename score between all pairs,
even if a pathspec is given, keep only the scores over some low cutoff
like "0.5" (low, but enough to eliminate the totally uninteresting
cases), and then store that as the complete cache for that commit (or
tree pair, if you want to support that).

Then you would have the full information and could do an arbitrary
pathspec limit on it. If you wanted to set the rename threshold below
0.5, then we would have to recompute without the cache (but in practice,
that should be rare).
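
The scheme can be sketched roughly as follows. This is an illustration
under stated assumptions, not git's implementation: renames_for,
score_all_pairs, and CACHE_FLOOR are hypothetical names, the cache is a
plain dict keyed by commit, and fnmatchcase on the destination path is
only a stand-in for real pathspec matching.

```python
from fnmatch import fnmatchcase
from typing import Callable, Dict, Tuple

Pair = Tuple[str, str]  # (source path, destination path)
Scores = Dict[Pair, float]

CACHE_FLOOR = 0.5  # assumed cutoff; scores below this are never cached


def renames_for(
    commit: str,
    pathspec: str,
    threshold: float,
    cache: Dict[str, Scores],
    score_all_pairs: Callable[[str], Scores],
) -> Scores:
    """Return rename pairs for one commit, limited by pathspec and threshold."""
    if threshold < CACHE_FLOOR:
        # The cache holds nothing below the floor, so it cannot answer this.
        scores = score_all_pairs(commit)
    else:
        if commit not in cache:
            # Pay for whole-tree detection once, ignoring the pathspec.
            full = score_all_pairs(commit)
            cache[commit] = {p: s for p, s in full.items() if s >= CACHE_FLOOR}
        scores = cache[commit]
    # Apply the pathspec and the requested threshold after the fact.
    return {
        pair: score
        for pair, score in scores.items()
        if score >= threshold and fnmatchcase(pair[1], pathspec)
    }
```

Because the cached entry is complete for the commit, a second call with
a different pathspec is answered from the cache without rescoring.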

The real downside is that you pay for the whole-tree detection when you
have asked for a pathspec (but only the first time, after which you can
always generate from cache).

Just thinking out loud...

-Peff
