From: Andy Parkins <andyparkins@gmail.com>
To: git@vger.kernel.org
Cc: Junio C Hamano <junkio@cox.net>
Subject: Re: Rename detection at git log
Date: Mon, 20 Nov 2006 12:01:02 +0100
Message-ID: <200611201101.04456.andyparkins@gmail.com>
In-Reply-To: <7virha4cnm.fsf@assigned-by-dhcp.cox.net>
On Monday 2006 November 20 10:48, Junio C Hamano wrote:
> I wrote the code and you contradict me ;-)?
Sorry; I wasn't disputing that the filtering works exactly as you say (of
course it must; I don't know anywhere near enough to make that sort of
assertion).
However, I do think that the problem is not one of filtering. I was saying
that "-C" has no practical use.
> in your example, it would give you the creation of fileB, not
> copy.
I'm sure it would - but you had to use --find-copies-harder; -C would not find
it as a copy.
> - Renames are only picked up from files that were lost in the
> same change (i.e. "mv fileA fileB" creates fileB and loses
> fileA; fileB is checked if it is similar to fileA in the
> original).
I've found rename detection to be flawless in all my uses.
> - Copies are only picked up from files that were changed in the
> same change (i.e. splitting major part of original file and
> moving it to somewhere else, while leaving a skeleton in the
> original file). "harder" is needed if the copy original was
> untouched, as you found out.
Yep; I understand that. I also understand that it is done for performance
reasons. However, since the typical copy will be one where the source
doesn't change at the same time, I am arguing that the non-hard copy
detection isn't much use.
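
The candidate rules quoted above can be sketched roughly as follows. This is only an illustration of the rules as described in this thread, not git's actual code: a tree is modelled as a plain dict from path to content hash, and the function name and the mode strings are hypothetical.

```python
# Illustrative sketch only; not git's implementation.
# A tree is modelled as a dict mapping path -> content hash.

def candidate_sources(parent, child, mode):
    """Return the parent paths considered as rename/copy sources.

    mode is a hypothetical label for the option in use:
      "rename" ~ -M, "copy" ~ -C, "harder" ~ --find-copies-harder
    """
    # Files that existed in the parent but are gone in the child.
    deleted = {p for p in parent if p not in child}
    # Files present on both sides whose content changed.
    modified = {p for p in parent if p in child and parent[p] != child[p]}

    if mode == "rename":
        return deleted
    if mode == "copy":
        # Copy detection also covers renames, so deleted files stay in.
        return deleted | modified
    if mode == "harder":
        # Every file in the parent tree, touched or not.
        return set(parent)
    raise ValueError("unknown mode: %r" % (mode,))


# The case discussed here: fileA is copied to fileB and left untouched.
parent = {"fileA": "hash_a"}
child = {"fileA": "hash_a", "fileB": "hash_a"}
plain_copy = candidate_sources(parent, child, "copy")
harder = candidate_sources(parent, child, "harder")
```

With these inputs the plain copy mode has no candidate sources at all, because fileA was neither deleted nor modified, while the harder mode considers fileA.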
> The last one is a compromise between performance and thoroughness,
> and the "harder" is one knob to tweak its behaviour.
I've been poking in tree-diff.c to see if I can understand why it is such a
performance hog. I still haven't. Each file is stored under its hash, right?
So for copy detection why can't you just search for other files with the same
hash, which I presume is very fast (as it is the basis of what makes git so
fast)?
I am probably misunderstanding git, but I guess that a copy isn't even needed
in the database because two files with the same hash in the working copy only
need storing once and then referencing twice. So for a copy (again, with my
simple understanding of git) we'd have:
 commit1 -> tree1 -> fileA = fileA_hash
    ^
    |
 commit2 -> tree2 -> fileA = fileA_hash
                     fileB = fileB_hash
Doesn't that mean that copy detection is just a matter of searching the parent
commit trees for references to the same hash?
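
A minimal sketch of that idea, assuming trees are flattened into dicts from path to blob hash. The function exact_copies is hypothetical and is not part of git. The blob_hash helper follows git's blob naming: SHA-1 over a "blob <size>\0" header followed by the content.

```python
import hashlib


def blob_hash(content: bytes) -> str:
    """Object name git gives a blob: SHA-1 of 'blob <size>\\0' + content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def exact_copies(parent_tree, child_tree):
    """Map each new path in child_tree to parent paths with the same hash.

    Hypothetical helper: both trees are dicts of path -> blob hash.
    """
    # Index the parent tree by hash so each lookup is a dict access.
    by_hash = {}
    for path, h in parent_tree.items():
        by_hash.setdefault(h, []).append(path)

    copies = {}
    for path, h in child_tree.items():
        if path not in parent_tree and h in by_hash:
            copies[path] = sorted(by_hash[h])
    return copies


content = b"some file content\n"
tree1 = {"fileA": blob_hash(content)}
tree2 = {"fileA": blob_hash(content), "fileB": blob_hash(content)}
found = exact_copies(tree1, tree2)

# If fileB is edited in the same commit, its hash no longer matches.
tree3 = {"fileA": blob_hash(content), "fileB": blob_hash(content + b"x\n")}
missed = exact_copies(tree1, tree3)
```

Note the limit of this approach: it only finds byte-identical copies. A copy that is modified in the same commit gets a different hash, so the lookup misses it and the file contents would have to be compared for similarity instead.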
Andy
--
Dr Andy Parkins, M Eng (hons), MIEE