From: Andy Parkins <andyparkins@gmail.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <junkio@cox.net>
Subject: Re: Rename detection at git log
Date: Mon, 20 Nov 2006 12:01:02 +0100
Message-ID: <200611201101.04456.andyparkins@gmail.com>
In-Reply-To: <7virha4cnm.fsf@assigned-by-dhcp.cox.net>

On Monday 2006 November 20 10:48, Junio C Hamano wrote:

> I wrote the code and you contradict me ;-)?

Sorry; I wasn't contradicting your description of how the filtering works (of 
course it works exactly as you say - I don't know anywhere near enough to 
assert otherwise).

However, I do think that the problem is not one of filtering.  I was saying 
that "-C" on its own, without --find-copies-harder, has little practical use.

> in your example, it would give you the creation of fileB, not
> copy.

I'm sure it would - but you had to use --find-copies-harder; -C would not find 
it as a copy.

>  - Renames are only picked up from files that were lost in the
>    same change (i.e. "mv fileA fileB" creates fileB and loses
>    fileA; fileB is checked if it is similar to fileA in the
>    original).

I've found rename detection to be flawless in all my uses.

>  - Copies are only picked up from files that were changed in the
>    same change (i.e. splitting major part of original file and
>    moving it to somewhere else, while leaving a skeleton in the
>    original file).  "harder" is needed if the copy original was
>    untouched, as you found out.

Yep; I understand that.  I also understand that it is done for performance 
reasons.  However, since the typical copy will be one where the source 
doesn't change at the same time, I am arguing that the non-hard copy 
detection isn't much use.
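
To make the distinction concrete, here is a toy model of which files in the 
parent are even considered as a possible origin under each option.  It is 
only a sketch of the rules quoted above: the function name, the mode strings 
and the use of plain path -> hash dictionaries in place of trees are all 
assumptions for illustration, not git's actual code.

```python
def candidate_sources(parent, child, mode):
    """Return the parent paths considered as possible origins of new files.

    parent, child: dict mapping path -> content hash (toy stand-in for trees).
    mode: "rename" (-M), "copy" (-C) or "harder" (--find-copies-harder).
    These mode names are hypothetical labels used only in this sketch.
    """
    # Files that existed in the parent but are gone in the child.
    deleted = {p for p in parent if p not in child}
    # Files present in both whose content changed in this commit.
    modified = {p for p in parent if p in child and parent[p] != child[p]}

    if mode == "rename":
        return deleted
    if mode == "copy":
        # -C also detects renames, so deleted files remain candidates.
        return deleted | modified
    if mode == "harder":
        # Every file in the parent is a candidate, touched or not.
        return set(parent)
    raise ValueError("unknown mode: %r" % mode)


# "cp fileA fileB" with fileA left untouched:
parent = {"fileA": "h1"}
child = {"fileA": "h1", "fileB": "h1"}
print(candidate_sources(parent, child, "copy"))    # no candidates at all
print(candidate_sources(parent, child, "harder"))  # fileA is considered
```

In the plain copy case the "copy" mode has nothing to compare fileB against, 
which is the situation I am describing.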

> The last one is a compromise between performance and thoroughness,
> and the "harder" is one knob to tweak its behaviour.

I've been poking in tree-diff.c to see if I can understand why it is such a 
performance hog.  I still haven't.  Each file is stored under its hash, right?  
So for copy detection why can't you just search for other files with the same 
hash, which I presume is very fast (as it is the basis of what makes git so 
fast)?
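
For reference, the name a file's content is stored under can be reproduced 
outside git.  The sketch below assumes the SHA-1 object format (a 
"blob <size>" header, a NUL byte, then the content); the function name is 
made up for this example.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Compute the SHA-1 object name git gives to a blob with this content."""
    # The object header is the type, a space, the size in bytes, then NUL.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


original = b"int main(void) { return 0; }\n"
copy = bytes(original)  # a byte-for-byte copy under some other file name

# The hash depends only on the content, never on the path.
print(git_blob_hash(original) == git_blob_hash(copy))
print(git_blob_hash(b""))  # the well-known name of the empty blob
```

Because the path plays no part in the hash, two identical files share a 
single object name no matter where they sit in the tree.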

I am probably misunderstanding git, but I guess that a copy isn't even needed 
in the database because two files with the same hash in the working copy only 
need storing once and then referencing twice.  So for a copy (again, with my 
simple understanding of git) we'd have:

 commit1 -> tree1 -> fileA = fileA_hash
    ^
    |
 commit2 -> tree2 -> fileA = fileA_hash
                     fileB = fileA_hash

Doesn't that mean that copy detection is just a matter of searching the parent 
commit trees for references to the same hash?
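
A minimal sketch of that search, again treating a tree as a plain 
path -> hash dictionary (an assumption for illustration; find_exact_copies 
is a hypothetical helper, not anything in git):

```python
def find_exact_copies(parent_tree, child_tree):
    """List (kind, source, destination) for new files whose hash already
    existed somewhere in the parent tree."""
    # Index the parent tree by hash so each lookup is a dictionary access.
    by_hash = {}
    for path, blob_hash in parent_tree.items():
        by_hash.setdefault(blob_hash, []).append(path)

    found = []
    for path, blob_hash in sorted(child_tree.items()):
        if path in parent_tree:
            continue  # not a newly created file
        for source in by_hash.get(blob_hash, []):
            # If the source path survives in the child it was copied,
            # otherwise it was moved.
            kind = "copy" if source in child_tree else "rename"
            found.append((kind, source, path))
    return found


tree1 = {"fileA": "fileA_hash"}
tree2 = {"fileA": "fileA_hash", "fileB": "fileA_hash"}
print(find_exact_copies(tree1, tree2))
```

This only catches byte-identical copies.  A copy that was edited before 
being committed has a different hash, so a lookup like this would miss it 
and the contents would have to be compared instead.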


Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE

