From mboxrd@z Thu Jan 1 00:00:00 1970 From: Jakub Narebski Subject: Re: Following renames Date: Mon, 27 Mar 2006 08:55:03 +0200 Organization: At home Message-ID: References: <20060326014946.GB18185@pasky.or.cz> <7virq1sywj.fsf@assigned-by-dhcp.cox.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7Bit X-From: git-owner@vger.kernel.org Mon Mar 27 08:55:07 2006 Return-path: Envelope-to: gcvg-git@gmane.org Received: from vger.kernel.org ([209.132.176.167]) by ciao.gmane.org with esmtp (Exim 4.43) id 1FNld3-0002SJ-Tp for gcvg-git@gmane.org; Mon, 27 Mar 2006 08:55:06 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750753AbWC0Gyz (ORCPT ); Mon, 27 Mar 2006 01:54:55 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1750757AbWC0Gyz (ORCPT ); Mon, 27 Mar 2006 01:54:55 -0500 Received: from main.gmane.org ([80.91.229.2]:45468 "EHLO ciao.gmane.org") by vger.kernel.org with ESMTP id S1750753AbWC0Gyz (ORCPT ); Mon, 27 Mar 2006 01:54:55 -0500 Received: from list by ciao.gmane.org with local (Exim 4.43) id 1FNlcr-0002Qd-BU for git@vger.kernel.org; Mon, 27 Mar 2006 08:54:53 +0200 Received: from 193.0.122.19 ([193.0.122.19]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Mon, 27 Mar 2006 08:54:53 +0200 Received: from jnareb by 193.0.122.19 with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Mon, 27 Mar 2006 08:54:53 +0200 X-Injected-Via-Gmane: http://gmane.org/ To: git@vger.kernel.org X-Complaints-To: usenet@sea.gmane.org X-Gmane-NNTP-Posting-Host: 193.0.122.19 Mail-Copies-To: jnareb@gmail.com User-Agent: KNode/0.7.7 Sender: git-owner@vger.kernel.org Precedence: bulk X-Mailing-List: git@vger.kernel.org Archived-At: Linus Torvalds wrote: > On Sun, 26 Mar 2006, Jakub Narebski wrote: >> >> If (2) is common enough then discussed improvements to rename detection, >> namely comparing basenames as a base for candidate selection is a good >> idea. > > BK had this "renametool" which got started automatically when you applied > a patch that removed one or more files and added one or more files, so > that you could then pair up the files manually. [...] > The thing is, the fast rename detection that is in the "next" branch > really does a lot better, and it's fast enough. I was thinking about the fast ename detection algorithm in "next" branch. That is the question if recording additional (helper) information about contents copying and moving like the mentioned "renametool" did is worth the effort, both in coding it and from user's point of view. Or would better contents copying and moving detection ("renames detection") for whatchanged and similar suffice. I am of opinion that voluntary information about contents moving and copying in the commits would help. Purposes: 1.) Record contents moving and similarity information which cannot or cannot be easily calculated; see Paul Jakma response in this thread MessageID: for example copying fragment of code, small fragment of the whole file, creating documentation or header file from code, or code skeleton from template, or rewrite of code in different language (e.g. shell script to perl, script to compiled code e.g. Perl or Python to C). 2.) Caching the results of similarity algorithm/rename detection tool (also Paul Jakma post), including remembering false positives and undetected renames, for efficiency. Calculated automatically parts might be throw-away. Sources of information: 1.) Manually entered information *at commit*, including *-rm, *-mv, *-cp like commands (which nobody likes) and systematized (pseudolanguage?) for copying and moving contents in the log messages. 2.) Semi-manual tools like the mentioned "renametool" of BK. 3.) Support from editor (remebering where copied and pasted, or cut and pasted fragment came from, and providing prefilled command to record contents moving ("renames") or prefilled commit log containing this information. Hard to get, probably most useful. 4.) Information from resolved merges and results of diagnosis (pickaxe like) tools, especially recording "renames" which were not detected, and removing "renames" which were detected falsily. Is that the place where I should provide code (patch) for testing the idea :) ? >> I wonder how common is (2) compared to (1)+(2) i.e. move to other dir >> and rename, old-dir/old-file.c to new-dir/new-subdir/new-file.c > > For example, one common case was a directory structure like > > .. > type-file1.c > type-file2.c > otherfiles.c > yet-more.c > .. > > being split up into a subdirectory > > .. > type/file1.c > type/file2.c > otherfiles.c > yet-more.c > .. > > (eg drivers/scsi/aic7xx-* being given a subdirectory of it's own, as > drivers/scsi/aic7xx/*). So the basename wouldn't stay the same, because it > contained some piece of data that became redundant with the move. Perhaps fast rename detection algorithm needs some smart similarity estimate for names, which would put more weight in the parts closer to basename, and would detect */type-file1.c and */type/file1.c as similar. -- Jakub Narebski Warsaw, Poland