From: Jakub Narebski <jnareb@gmail.com>
To: git@vger.kernel.org
Subject: Re: Rename detection at git log
Date: Mon, 20 Nov 2006 12:15:30 +0100
Message-ID: <ejs2lp$2r4$1@sea.gmane.org>
In-Reply-To: 200611201101.04456.andyparkins@gmail.com

Andy Parkins wrote:

> On Monday 2006 November 20 10:48, Junio C Hamano wrote:
>
>>  - Copies are only picked up from files that were changed in the
>>    same change (i.e. splitting major part of original file and
>>    moving it to somewhere else, while leaving a skeleton in the
>>    original file).  "harder" is needed if the copy original was
>>    untouched, as you found out.
> 
> Yep; I understand that.  I also understand that it is done for performance 
> reasons.  However, since the typical copy will be one where the source 
> doesn't change at the same time, I am arguing that the non-hard copy 
> detection isn't much use.

I'm not sure about this. You commonly do both pure renames (to reorganize
files, or to give a file a better name) and renames with modification, but
I don't think a copy without modification is very common. Usually you copy
a file because you use one file as a template for another, or because you
split a file or join several files into one.
 
>> The last one is a compromise between performance and thoroughness,
>> and the "harder" is one knob to tweak its behaviour.
> 
> I've been poking in tree-diff.c to see if I can understand why it is such a 
> performance hog.  I still haven't.  Each file is stored under its hash, right?  
> So for copy detection why can't you just search for other files with the same 
> hash, which I presume is very fast (as it is the basis of what makes git so 
> fast)?

Copy and rename detection work by comparing file contents and computing
a similarity score. So to check whether files A and B are copies (not
necessarily pure copies), it is not enough to compare hashes: a copy with
even a one-line change has a different hash from its original.
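To illustrate why a hash comparison cannot find a modified copy, here is a
minimal Python sketch of similarity-based matching. The `similarity` and
`is_copy` helpers are hypothetical and deliberately simplified (they count
shared lines); git's own diffcore-rename scoring is computed differently.
The 0.5 threshold mirrors git's documented 50% default for -M/-C; everything
else is an assumption for illustration.

```python
import hashlib
from collections import Counter


def similarity(src: str, dst: str) -> float:
    """Shared lines divided by the line count of the larger file.

    Hypothetical, simplified score for illustration; not git's algorithm.
    """
    src_lines = Counter(src.splitlines())
    dst_lines = Counter(dst.splitlines())
    shared = sum((src_lines & dst_lines).values())  # multiset intersection
    larger = max(sum(src_lines.values()), sum(dst_lines.values()))
    return shared / larger if larger else 1.0


def is_copy(src: str, dst: str, threshold: float = 0.5) -> bool:
    """Treat dst as a copy of src if the similarity reaches the threshold."""
    return similarity(src, dst) >= threshold


original = "line 1\nline 2\nline 3\nline 4\n"
copy_changed = "line 1\nline 2\nline 3\nline 4 edited\n"

# The hashes differ, so an exact-hash lookup would miss this copy...
print(hashlib.sha1(original.encode()).hexdigest()
      == hashlib.sha1(copy_changed.encode()).hexdigest())  # False
# ...but 3 of 4 lines are shared, so a similarity check still finds it.
print(similarity(original, copy_changed))  # 0.75
print(is_copy(original, copy_changed))     # True
```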

That said, it should be fairly easy (if not that useful in real projects,
as I understand it and as stated above) to extend copy detection to find
pure copies by comparing hashes. Still, --find-copies-harder would remain
necessary when the copy source was untouched while the copy itself was
modified.
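A minimal Python sketch of exact-hash copy detection, showing how the choice
of candidate sources decides what can be found. Trees are modelled as flat
dictionaries from path to blob id, and the blob ids are placeholder strings.
The function name and the `harder` flag are hypothetical; they only mirror
plain copy detection (sources limited to files modified in the same change)
versus --find-copies-harder (every file in the parent is a source).

```python
from typing import Dict, List, Tuple

Tree = Dict[str, str]  # hypothetical flat tree: path -> blob id


def find_pure_copies(parent: Tree, child: Tree,
                     harder: bool = False) -> List[Tuple[str, str]]:
    """Return (source, destination) pairs whose blob ids match exactly."""
    new_paths = sorted(p for p in child if p not in parent)
    if harder:
        # Like --find-copies-harder: every file in the parent is a candidate.
        candidates = set(parent)
    else:
        # Default: only files modified in this change are candidates.
        candidates = {p for p in parent if p in child and parent[p] != child[p]}
    # Index candidate sources by their blob id in the parent tree.
    by_blob: Dict[str, str] = {}
    for path in sorted(candidates):
        by_blob.setdefault(parent[path], path)
    return [(by_blob[child[p]], p) for p in new_paths if child[p] in by_blob]


parent = {"a.c": "blob-a1", "b.c": "blob-b1"}
child = {"a.c": "blob-a1", "b.c": "blob-b1", "a-copy.c": "blob-a1"}

print(find_pure_copies(parent, child))               # [] (a.c was untouched)
print(find_pure_copies(parent, child, harder=True))  # [('a.c', 'a-copy.c')]

# Copy'n'change: the copy gets a new blob id, so even harder finds nothing.
child["a-copy.c"] = "blob-a2"
print(find_pure_copies(parent, child, harder=True))  # []
```

The last case is the one where only a similarity score, not a hash lookup,
can still recognize the copy.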

> I am probably misunderstanding git, but I guess that a copy isn't even needed 
> in the database because two files with the same hash in the working copy only 
> need storing once and then referencing twice.  So for a copy (again, with my 
> simple understanding of git) we'd have:
> 
>  commit1 -> tree1 -> fileA = fileA_hash
>     ^
>     |
>  commit2 -> tree2 -> fileA = fileA_hash
>                      fileB = fileB_hash
> 
> Doesn't that mean that copy detection is just a matter of searching the parent 
> commit trees for references to the same hash?

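Think copy'n'change: if fileB is a copy of fileA that was then modified,
fileB_hash differs from fileA_hash, so searching the parent commit trees
for the same hash finds nothing.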
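Andy's picture is right for pure copies: git names a blob by the SHA-1 of
a small header ("blob", the content size, a NUL byte) followed by the
content, so identical contents share one object. A short Python sketch of
that naming scheme shows why the lookup breaks down once the copy changes.
`blob_id` is an illustrative helper, not a git API (the command-line
equivalent is `git hash-object`), and the file contents are made up.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Compute the object id git gives a blob: SHA-1 of header + content."""
    header = b"blob %d\x00" % len(content)  # "blob <size>" followed by NUL
    return hashlib.sha1(header + content).hexdigest()


file_a = b"int main(void) { return 0; }\n"
pure_copy = file_a                                # fileB = fileA, unchanged
changed_copy = b"int main(void) { return 1; }\n"  # copy'n'change

print(blob_id(file_a) == blob_id(pure_copy))     # True: one object, two paths
print(blob_id(file_a) == blob_id(changed_copy))  # False: nothing to look up
```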
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

