From: Andy Parkins <andyparkins@gmail.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <junkio@cox.net>
Subject: Re: Rename detection at git log
Date: Mon, 20 Nov 2006 12:01:02 +0100
Message-ID: <200611201101.04456.andyparkins@gmail.com>
In-Reply-To: <7virha4cnm.fsf@assigned-by-dhcp.cox.net>
On Monday 2006 November 20 10:48, Junio C Hamano wrote:
> I wrote the code and you contradict me ;-)?
Sorry; I didn't mean to contradict you about how the filtering works. Of
course it works exactly as you say (I don't know anywhere near enough to
assert otherwise).
However, I do think that the problem is not one of filtering. I was saying
that "-C" has no practical use.
> in your example, it would give you the creation of fileB, not
> copy.
I'm sure it would - but you had to use --find-copies-harder; -C would not find
it as a copy.
> - Renames are only picked up from files that were lost in the
> same change (i.e. "mv fileA fileB" creates fileB and loses
> fileA; fileB is checked if it is similar to fileA in the
> original).
I've found rename detection to be flawless in all my uses.
> - Copies are only picked up from files that were changed in the
> same change (i.e. splitting major part of original file and
> moving it to somewhere else, while leaving a skeleton in the
> original file). "harder" is needed if the copy original was
> untouched, as you found out.
Yep; I understand that. I also understand that it is done for performance
reasons. However, since the typical copy will be one where the source
doesn't change at the same time, I am arguing that the non-hard copy
detection isn't much use.
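
As a rough illustration of the rules quoted above, here is a minimal Python
sketch of which files in the parent are considered as possible sources. It is
a simplified model, not git's code: a tree is assumed to be a plain dict of
path -> blob hash, and the function name and mode strings are hypothetical.

```python
from typing import Dict, Set

# Simplified stand-in for a git tree: path -> blob hash.
Tree = Dict[str, str]


def candidate_sources(parent: Tree, child: Tree, mode: str) -> Set[str]:
    """Return the parent paths that would be examined as rename/copy sources.

    mode is a hypothetical label:
      "renames" - only files that disappeared in this change
      "copies"  - files that disappeared or were modified in this change
      "harder"  - every file in the parent, touched or not
    """
    deleted = {p for p in parent if p not in child}
    modified = {p for p in parent if p in child and parent[p] != child[p]}
    if mode == "renames":
        return deleted
    if mode == "copies":
        return deleted | modified
    if mode == "harder":
        return set(parent)
    raise ValueError(f"unknown mode: {mode}")


# A plain copy: fileA is left untouched and fileB is new.
parent = {"fileA": "h1"}
child = {"fileA": "h1", "fileB": "h1"}
print(candidate_sources(parent, child, "copies"))
print(candidate_sources(parent, child, "harder"))
```

In this model the untouched fileA is not a candidate in "copies" mode, so the
plain copy is only seen in "harder" mode, which is the case being argued about
here.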
> The last one is a compromise between performance and thoroughness,
> and the "harder" is one knob to tweak its behaviour.
I've been poking in tree-diff.c to see if I can understand why it is such a
performance hog. I still haven't. Each file is stored under its hash, right?
So for copy detection why can't you just search for other files with the same
hash, which I presume is very fast (as it is the basis of what makes git so
fast)?
I am probably misunderstanding git, but I guess that a copy isn't even needed
in the database because two files with the same hash in the working copy only
need storing once and then referencing twice. So for a copy (again, with my
simple understanding of git) we'd have:
commit1 -> tree1 -> fileA = fileA_hash
   ^
   |
commit2 -> tree2 -> fileA = fileA_hash
                    fileB = fileB_hash
Doesn't that mean that copy detection is just a matter of searching the parent
commit trees for references to the same hash?
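
The hash lookup proposed above could be sketched roughly as follows. This is
only an illustration of the idea, not how git implements it: trees are assumed
to be flat dicts of path -> blob hash, and find_exact_copies is a hypothetical
helper name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Simplified stand-in for a git tree: path -> blob hash.
Tree = Dict[str, str]


def find_exact_copies(parent: Tree, child: Tree) -> List[Tuple[str, str]]:
    """Pair each new path in child with parent paths holding the same blob."""
    # Index the parent tree by blob hash so each lookup is a dict access.
    by_hash: Dict[str, List[str]] = defaultdict(list)
    for path, blob in parent.items():
        by_hash[blob].append(path)

    copies: List[Tuple[str, str]] = []
    for path, blob in sorted(child.items()):
        if path in parent:
            continue  # not a newly created path
        for source in sorted(by_hash.get(blob, [])):
            copies.append((source, path))
    return copies


parent = {"fileA": "aaa"}
child = {"fileA": "aaa", "fileB": "aaa"}
print(find_exact_copies(parent, child))  # [('fileA', 'fileB')]
```

Note that a lookup like this can only match byte-identical content: if the
copy is edited before it is committed, its hash differs from the source and
the dict lookup finds nothing.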
Andy
--
Dr Andy Parkins, M Eng (hons), MIEE
Thread overview: 17+ messages
2006-11-20 5:57 Rename detection at git log Alexander Litvinov
2006-11-20 9:50 ` Andy Parkins
2006-11-20 10:07 ` Junio C Hamano
2006-11-20 10:11 ` Jakub Narebski
2006-11-20 10:22 ` Andy Parkins
2006-11-20 10:48 ` Junio C Hamano
2006-11-20 11:01 ` Andy Parkins [this message]
2006-11-20 11:15 ` Jakub Narebski
2006-11-20 11:32 ` Junio C Hamano
2006-11-20 11:59 ` Andy Parkins
2006-11-20 11:28 ` Junio C Hamano
2006-11-20 12:16 ` Andy Parkins
2006-11-20 11:33 ` Alexander Litvinov
2006-11-20 10:06 ` Alex Riesen
2006-11-20 10:23 ` Andy Parkins
2006-11-20 10:51 ` Junio C Hamano
2006-11-20 11:17 ` Andy Parkins