git.vger.kernel.org archive mirror
From: Avery Pennarun <apenwarr@gmail.com>
To: Ron Garret <ron1@flownet.com>
Cc: git@vger.kernel.org
Subject: Re: git-mv redux: there must be something else going on
Date: Wed, 3 Feb 2010 14:47:33 -0500
Message-ID: <32541b131002031147r367ee08fxc64c4c54165953a3@mail.gmail.com>
In-Reply-To: <ron1-5F71CB.11234903022010@news.gmane.org>

On Wed, Feb 3, 2010 at 2:23 PM, Ron Garret <ron1@flownet.com> wrote:
> Ah.  That explains everything.  Thanks.  (I thought git mv was
> equivalent to git rm followed by git add.  But it's not.)

I suppose in this case it's not.  The only difference shows up when
your work tree differs from your index, though, and it's to be expected
that 'git rm', in removing things from the index, would lose your
ability to track those differences.

> So... how *does* git decide when two blobs are different blobs and when
> they are the same blob with mods?  I asked this question before and was
> pointed to the diffcore docs, but that didn't really clear things up.
> That just describes all the different ways git can do diffs, not the
> actual heuristics that git uses to track content.

If you really want to know the details, reading the code
(diffcore-rename.c in the git source) is probably the best approach;
it's not even that long.

The short version is that git chooses a set of candidate blobs, then
diffs them and figures out a percentage similarity between each pair.
(A simple way to think of the similarity index is "how long is the
diff compared to the file itself?"  If the diff is of length zero, the
similarity is 100%, and so on.)  If the similarity is greater than a
certain threshold (50% by default; -M<n> changes it), the pair is
considered to be the same file.
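To make the "diff length versus file length" intuition concrete, here is a minimal Python sketch. It illustrates that intuition only and is not git's scoring: git's real code (diffcore-delta.c) hashes chunks of the files rather than running a line diff. The function names and the line-based measure are assumptions made for this example. The 0.5 default mirrors git's 50% rename threshold.

```python
import difflib

# Mirrors git's default -M threshold of 50%; the name is hypothetical.
DEFAULT_THRESHOLD = 0.5


def similarity(old: str, new: str) -> float:
    """Rough similarity index: share of lines the diff leaves untouched.

    Follows the "how long is the diff compared to the file" intuition,
    not git's actual diffcore-delta scoring.
    """
    old_lines = old.splitlines()
    new_lines = new.splitlines()
    longest = max(len(old_lines), len(new_lines))
    if longest == 0:
        return 1.0  # two empty files: an empty diff, so 100% similar
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    # Sum the sizes of all runs of lines common to both versions.
    unchanged = sum(block.size for block in matcher.get_matching_blocks())
    return unchanged / longest


def same_file(old: str, new: str, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Treat the pair as one file if similarity is greater than the threshold."""
    return similarity(old, new) > threshold


# One line out of four changed: 3 of 4 lines survive, so 75% similar.
print(similarity("a\nb\nc\nd\n", "a\nb\nc\nx\n"))
```

An empty diff gives 1.0, a total rewrite gives 0.0, and everything in between scales with how much of the longer file the diff leaves alone.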

Choosing the set of candidates is actually the more interesting
problem, since detecting moves using the above algorithm is O(n^2)
in the number of candidates.  That's why 'git diff' and 'git log'
don't do it at all by default.
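The quadratic cost comes from comparing every candidate destination against every candidate source. Here is a rough Python sketch of that loop. It uses difflib's line-based ratio() as a stand-in similarity measure, which is not what git uses. find_moves() and line_similarity() are hypothetical helpers written for this illustration. The function also counts the comparisons it makes.

```python
import difflib


def line_similarity(old: str, new: str) -> float:
    """Stand-in similarity: difflib's ratio over lines (0.0 to 1.0)."""
    return difflib.SequenceMatcher(None, old.splitlines(), new.splitlines()).ratio()


def find_moves(sources: dict[str, str], destinations: dict[str, str],
               threshold: float = 0.5):
    """Compare every destination with every source: len(sources) * len(destinations) pairs.

    Returns ({destination: (best_source, score)}, number_of_comparisons).
    """
    comparisons = 0
    matches: dict[str, tuple[str, float]] = {}
    for dst_name, dst_text in destinations.items():
        best_src, best_score = None, 0.0
        for src_name, src_text in sources.items():
            comparisons += 1  # the expensive step, done for every pair
            score = line_similarity(src_text, dst_text)
            if score > best_score:
                best_src, best_score = src_name, score
        # Keep the pairing only if it is above the similarity threshold.
        if best_src is not None and best_score > threshold:
            matches[dst_name] = (best_src, best_score)
    return matches, comparisons
```

With n sources and m destinations this always makes n*m comparisons, which is why the size of the candidate lists matters so much. Real git adds refinements this sketch omits. For example, it pairs byte-identical files cheaply before the expensive pass.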

If you provide -M, the candidates are the files that were deleted
(possible sources) and the files that were added (possible
destinations); each added file is compared against each deleted file.
Normally that's a very short list.  With -C, files that were modified
are also considered as possible copy sources, which is slightly more
work.  With --find-copies-harder, even unmodified files become possible
sources, so it becomes potentially a *lot* of work.
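How the candidate lists grow with each option can be sketched in a few lines of Python. This sketch assumes the behaviour described in the git-diff documentation, ignoring -B:

- -M takes sources only from deleted files.
- -C also takes modified files.
- --find-copies-harder also takes unmodified ones.

The status letters, the Mode enum and candidate_sets() are a hypothetical encoding made up for this example.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical labels for the three detection levels."""
    RENAMES = "-M"
    COPIES = "-C"
    COPIES_HARDER = "--find-copies-harder"


def candidate_sets(changes: dict[str, str], mode: Mode):
    """Split a changeset into (sources, destinations) for move detection.

    `changes` maps path -> status: 'A' added, 'D' deleted,
    'M' modified, 'U' unmodified (an assumed encoding for this sketch).
    """
    # Added files are always the destinations to explain.
    destinations = sorted(p for p, s in changes.items() if s == "A")
    allowed = {"D"}  # -M: only deleted files can be sources
    if mode in (Mode.COPIES, Mode.COPIES_HARDER):
        allowed.add("M")  # -C: modified files may have been copied from
    if mode is Mode.COPIES_HARDER:
        allowed.add("U")  # harder: every file in the tree is a source
    sources = sorted(p for p, s in changes.items() if s in allowed)
    return sources, destinations


changes = {"new.c": "A", "old.c": "D", "main.c": "M", "util.c": "U", "lib.c": "U"}
for mode in Mode:
    src, dst = candidate_sets(changes, mode)
    print(mode.value, src, dst, "comparisons:", len(src) * len(dst))
```

In a real repository the unmodified files usually vastly outnumber the changed ones. That makes the last mode the expensive one.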

Have fun,

Avery
