Re: Following renames - Jakub Narebski

From: Jakub Narebski <jnareb@gmail.com>
To: git@vger.kernel.org
Subject: Re: Following renames
Date: Mon, 27 Mar 2006 08:55:03 +0200
Message-ID: <e0827k$7tk$1@sea.gmane.org>
In-Reply-To: Pine.LNX.4.64.0603260947100.15714@g5.osdl.org

Linus Torvalds wrote:

> On Sun, 26 Mar 2006, Jakub Narebski wrote:
>> 
>> If (2) is common enough then discussed improvements to rename detection,
>> namely comparing basenames as a base for candidate selection is a good
>> idea.
> 
> BK had this "renametool" which got started automatically when you applied
> a patch that removed one or more files and added one or more files, so
> that you could then pair up the files manually.
[...]
> The thing is, the fast rename detection that is in the "next" branch
> really does a lot better, and it's fast enough.

I was thinking about the fast rename detection algorithm in the "next"
branch.

The question is whether recording additional (helper) information about
content copying and moving, as the mentioned "renametool" did, is worth
the effort, both in coding it and from the user's point of view. Or would
better detection of content copying and moving ("rename detection") in
whatchanged and similar tools suffice?

I am of the opinion that voluntary information about content moving and
copying in the commits would help.

Purposes:
1.) Record content moving and similarity information which cannot be
calculated, or cannot be calculated easily; see Paul Jakma's response in
this thread
  Message-ID: <Pine.LNX.4.64.0603270642090.5276@sheen.jakma.org>
Examples: copying a fragment of code (a small fragment of the whole file),
creating a documentation or header file from code, creating a code skeleton
from a template, or rewriting code in a different language (e.g. shell
script to Perl, or script to compiled code, e.g. Perl or Python to C).
2.) Caching the results of the similarity algorithm/rename detection tool
(also from Paul Jakma's post), including remembering false positives and
undetected renames, for efficiency. The automatically calculated parts
might be throw-away.

Sources of information:
1.) Manually entered information *at commit*, including *-rm, *-mv, *-cp
like commands (which nobody likes) and a systematized notation
(pseudolanguage?) for copying and moving contents in the log messages.
2.) Semi-manual tools like the mentioned "renametool" of BK.
3.) Support from the editor (remembering where a copied and pasted, or cut
and pasted, fragment came from, and providing a prefilled command to record
contents moving ("renames") or a prefilled commit log containing this
information). Hard to get, probably most useful.
4.) Information from resolved merges and results of diagnostic
(pickaxe-like) tools, especially recording "renames" which were not
detected, and removing "renames" which were detected falsely.

Is this the place where I should provide code (a patch) for testing the
idea :) ?
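
As a rough illustration only (not a patch against git), here is a minimal
Python sketch of how recorded hints could be combined with automatically
detected renames. The function name, the argument names and the
(old_path, new_path) tuple representation are all hypothetical; nothing here
is existing git code.

```python
def apply_rename_hints(detected, confirmed=(), rejected=()):
    """Combine automatically detected renames with recorded hints.

    All three arguments are iterables of (old_path, new_path) tuples:
      detected  - pairs found by the automatic rename detection
      confirmed - recorded pairs that detection may have missed
      rejected  - recorded false positives to be dropped
    The names and the data layout are assumptions made for this sketch.
    """
    rejected = set(rejected)
    # Drop detected pairs that were recorded as false positives.
    pairs = [pair for pair in detected if pair not in rejected]
    # Add recorded pairs that the detection did not find.
    for pair in confirmed:
        if pair not in pairs:
            pairs.append(pair)
    return pairs
```

Only the confirmed and rejected pairs would need to be stored; the detected
pairs can be recalculated at any time, which is why the automatically
calculated parts could be throw-away.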

>> I wonder how common is (2) compared to (1)+(2) i.e. move to other dir
>> and rename, old-dir/old-file.c to new-dir/new-subdir/new-file.c
>
> For example, one common case was a directory structure like
> 
> ..
> type-file1.c
> type-file2.c
> otherfiles.c
> yet-more.c
> ..
> 
> being split up into a subdirectory
> 
> ..
> type/file1.c
> type/file2.c
> otherfiles.c
> yet-more.c
> ..
> 
> (eg drivers/scsi/aic7xx-* being given a subdirectory of it's own, as
> drivers/scsi/aic7xx/*). So the basename wouldn't stay the same, because it
> contained some piece of data that became redundant with the move.

Perhaps the fast rename detection algorithm needs some smart similarity
estimate for names, which would put more weight on the parts closer to the
basename, and would detect */type-file1.c and */type/file1.c as similar.
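
One possible shape for such an estimate, as a minimal Python sketch: split
each path into tokens on '/', '-', '_' and '.', then compare the tokens from
the end of the path backwards, with the weight decaying at every step away
from the basename. The tokenization rule, the decay factor of 0.5 and the
function names are assumptions made for illustration; this is not the
algorithm in the "next" branch.

```python
import re
from itertools import zip_longest


def name_tokens(path):
    """Split a path into tokens on '/', '-', '_' and '.'."""
    return [t for t in re.split(r"[/\-_.]+", path) if t]


def name_similarity(old_path, new_path, decay=0.5):
    """Return a score in [0, 1]; tokens nearer the end of the path weigh more."""
    old = reversed(name_tokens(old_path))
    new = reversed(name_tokens(new_path))
    matched = total = 0.0
    weight = 1.0
    # Walk both token lists from the basename end towards the root.
    for a, b in zip_longest(old, new):
        total += weight
        if a == b:
            matched += weight
        weight *= decay
    return matched / total if total else 0.0
```

With this tokenization, drivers/scsi/type-file1.c and
drivers/scsi/type/file1.c produce the same token list, so they score 1.0,
while drivers/scsi/type-file1.c and drivers/scsi/otherfiles.c score lower.
Comparing tokens position by position is a simplification; a real estimate
would probably need to tolerate inserted or dropped tokens.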

-- 
Jakub Narebski
Warsaw, Poland
