From: Andy Parkins
Subject: Re: Rename detection at git log
Date: Mon, 20 Nov 2006 12:01:02 +0100
Message-ID: <200611201101.04456.andyparkins@gmail.com>
References: <200611201157.23680.litvinov2004@gmail.com> <200611201022.10656.andyparkins@gmail.com> <7virha4cnm.fsf@assigned-by-dhcp.cox.net>
In-Reply-To: <7virha4cnm.fsf@assigned-by-dhcp.cox.net>
To: git@vger.kernel.org
Cc: Junio C Hamano

On Monday 2006 November 20 10:48, Junio C Hamano wrote:

> I wrote the code and you contradict me ;-)?

Sorry; I wasn't so much contradicting that the filtering works exactly as you
say (of course it must - I don't know anywhere near enough to make that sort
of assertion).  However, I do think that the problem is not one of filtering.
I was saying that "-C" has no practical use.

> in your example, it would give you the creation of fileB, not
> copy.

I'm sure it would - but you had to use --find-copies-harder; -C would not find
it as a copy.

>  - Renames are only picked up from files that were lost in the
>    same change (i.e. "mv fileA fileB" creates fileB and loses
>    fileA; fileB is checked if it is similar to fileA in the
>    original).

I've found rename detection to be flawless in all my uses.

>  - Copies are only picked up from files that were changed in the
>    same change (i.e. splitting major part of original file and
>    moving it to somewhere else, while leaving a skelton in the
>    original file).  "harder" is needed if the copy original was
>    untouched, as you found out.

Yep; I understand that.  I also understand that it is done for performance
reasons.  However, since the typical copy will be one where the source doesn't
change at the same time, I am arguing that the non-hard copy detection isn't
much use.

> The last one is a compromise between performance and thoroughness,
> and the "harder" is one knob to tweak its behaviour.

I've been poking in tree-diff.c to see if I can understand why it is such a
performance hog.  I still haven't.  Each file is stored under its hash, right?
So for copy detection why can't you just search for other files with the same
hash, which I presume is very fast (as it is the basis of what makes git so
fast)?

I am probably misunderstanding git, but I guess that a copy isn't even needed
in the database because two files with the same hash in the working copy only
need storing once and then referencing twice.  So for a copy (again, with my
simple understanding of git) we'd have:

  commit1 -> tree1 -> fileA = fileA_hash
     ^
     |
  commit2 -> tree2 -> fileA = fileA_hash
                      fileB = fileB_hash

Doesn't that mean that copy detection is just a matter of searching the parent
commit trees for references to the same hash?


Andy
-- 
Dr Andy Parkins, M Eng (hons), MIEE
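
To make the hash-lookup idea above concrete, here is a minimal sketch.  It is
not git's code: each tree is modelled as a plain dict mapping path to blob
hash, and the function name find_exact_copies and the hash strings are
invented for the illustration.  It can only find byte-identical copies; a
copy that was edited in the same commit has a different hash, so it would
still need a content comparison.

```python
def find_exact_copies(parent_tree, child_tree):
    """Map each path added in child_tree to the parent paths with the same blob hash.

    Both trees are hypothetical stand-ins for git trees: dicts of
    {path: blob_hash}. Only byte-identical content is matched.
    """
    # Index the parent tree by blob hash so each lookup is one dict access.
    paths_by_hash = {}
    for path, blob_hash in parent_tree.items():
        paths_by_hash.setdefault(blob_hash, []).append(path)

    copies = {}
    for path, blob_hash in child_tree.items():
        # A path counts as "new" only if the parent tree did not contain it.
        if path not in parent_tree and blob_hash in paths_by_hash:
            copies[path] = sorted(paths_by_hash[blob_hash])
    return copies


# The example from the diagram: fileA is untouched, fileB is an exact copy.
tree1 = {"fileA": "hash_of_A"}
tree2 = {"fileA": "hash_of_A", "fileB": "hash_of_A"}
exact = find_exact_copies(tree1, tree2)
```

Here exact maps "fileB" to ["fileA"] even though fileA did not change in the
commit, which is the case that plain -C misses.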