git.vger.kernel.org archive mirror
From: Paul Jackson <pj@sgi.com>
To: Petr Baudis <pasky@ucw.cz>
Cc: mike.taht@timesys.com, git@vger.kernel.org
Subject: Re: [ANNOUNCE] Cogito-0.8 (former git-pasky, big changes!)
Date: Tue, 26 Apr 2005 10:40:22 -0700
Message-ID: <20050426104022.0c53167d.pj@sgi.com>
In-Reply-To: <20050426122304.GD18971@pasky.ji.cz>

> some way to deal with renames sensibly. 

For a Linux kernel merge tool that I did inside SGI, I came to the
conclusion that the following heuristics identified renames fairly
accurately, coming up with the same renames as a careful manual
examination.

The key heuristic was to consider one file A to be a rename of another B
if the size (number of lines) of the diff of A and B (diff -auw A B | wc
-l) is less than 50% of the combined size of A and B (cat A B | wc -l).
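
A minimal sketch of that check in shell, built only from the two
pipelines quoted above.  The function name is_rename is hypothetical,
not taken from the SGI tool, and the 50% test is done in integer
arithmetic as 2 * diff < combined.

```shell
# Hypothetical helper: exit status 0 if file A looks like a rename of file B.
is_rename() {
    a=$1
    b=$2
    # Size of the unified diff in lines (-a: treat as text, -w: ignore whitespace).
    diff_lines=$(diff -auw "$a" "$b" | wc -l)
    # Combined size of both files in lines.
    total_lines=$(cat "$a" "$b" | wc -l)
    # "diff is less than 50% of combined" is the same as 2 * diff < combined.
    [ $((2 * diff_lines < total_lines)) -eq 1 ]
}
```

Usage would look like: is_rename kernel/cpuset.c mm/cpuset.c && echo rename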

Only pathname pairs with the same basename were considered - I was
focused here on renames due to directory restructuring.  I'm not sure
now if this is a good assumption - but it sure saved some computation.

Any file with the string "Makefile" in its name had to be excluded from
consideration.

In case of multiple potential renames (say one file is copied to two
places, removing the original and modifying each copy a little), the
'best' rename was selected, where 'best' meant the lowest ratio of
'diff -auw' size to combined 'cat' size.
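
Putting the pieces together, the selection could be sketched roughly
as below.  This is an illustration under assumptions, not the SGI
tool: the function name best_rename and its calling convention (old
path first, then candidate new paths) are hypothetical, and the ratio
is kept as an integer in per-mille so that plain shell arithmetic can
compare it.

```shell
# Hypothetical helper: print the best rename target for OLD among the
# candidate paths given after it, or return non-zero if none qualifies.
best_rename() {
    old=$1
    shift
    old_base=$(basename "$old")
    # Files with "Makefile" in their name are excluded from consideration.
    case $old_base in
        *Makefile*) return 1 ;;
    esac
    best=
    best_score=500    # 50% threshold, in per-mille; a candidate must beat it
    for new in "$@"; do
        # Only pairs with the same basename are considered.
        [ "$old_base" = "$(basename "$new")" ] || continue
        diff_lines=$(diff -auw "$old" "$new" | wc -l)
        total_lines=$(cat "$old" "$new" | wc -l)
        # Skip pairs of empty files to avoid dividing by zero.
        [ $((total_lines > 0)) -eq 1 ] || continue
        # Ratio of diff size to combined size, scaled by 1000.
        score=$((diff_lines * 1000 / total_lines))
        if [ "$score" -lt "$best_score" ]; then
            best=$new
            best_score=$score
        fi
    done
    [ -n "$best" ] && printf '%s\n' "$best"
}
```

Because best_score starts at the threshold, a candidate has to be both
under 50% and lower than every earlier candidate to be chosen.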

The end result of the above was a fairly natural identification of
renames.  If say I moved kernel/cpuset.c to mm/cpuset.c and changed
it a little bit, the above heuristics would show a rename, plus
a few changes.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401
