From: Petr Baudis <pasky@suse.cz>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Ryan Anderson <ryan@michonline.com>, git@vger.kernel.org
Subject: Re: Following renames
Date: Mon, 27 Mar 2006 01:26:49 +0200
Message-ID: <20060326232649.GV18185@pasky.or.cz>
In-Reply-To: <20060326191445.GQ18185@pasky.or.cz>
Dear diary, on Sun, Mar 26, 2006 at 09:14:45PM CEST, I got a letter
where Petr Baudis <pasky@suse.cz> said that...
> Dear diary, on Sun, Mar 26, 2006 at 06:33:13PM CEST, I got a letter
> where Linus Torvalds <torvalds@osdl.org> said that...
> > If you do
> >
> > git-rev-list --parents --remove-empty $REV -- $filename
> >
> > then you'll get the whole history for that filename. When it ends, you
> > know the file went away, and then you do basically _one_ "where the hell
> > did it go" thing.
> >
> > And yes, it's not git-ls-tree (unless you only want to follow pure
> > renames), it's actually one "git-diff-tree -M $lastrev". Then you just
> > continue with the new filename (and do another "git-rev-list" until you
> > hit the next rename).
>
> I wrote a long rant but then it all suddenly fit together and I have now
> an idea how to implement it reasonably elegantly.

So, this is what I have. Testing (I've given it very little of that) and
thoughts welcome. It is probably pretty efficient, at least in terms of
fork()s: it does only 2*N of them, where N is the number of commits
containing interesting renames. Actually, it should even be possible to
reduce this to N+1 if you do a single git-diff-tree call and multiplex
the different git-rev-lists into it, but I'm too tired to do the trickery now.
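
The approach quoted above (walk the history of one filename, and when a
rename shows up, continue with the previous name) can be sketched on a toy
in-memory history. This is only an illustration under stated assumptions:
the @history layout, the follow() helper and the file names are made up,
and no git commands are run.

```perl
use strict;
use warnings;

# Hypothetical toy history, newest commit first. Each commit lists the
# files it touched and the renames it introduced as new_name => old_name.
my @history = (
    { id => 'c4', touched => ['log-new.sh'], renames => {} },
    { id => 'c3', touched => ['log-new.sh'],
      renames => { 'log-new.sh' => 'log-old.sh' } },
    { id => 'c2', touched => ['log-old.sh'],  renames => {} },
    { id => 'c1', touched => ['unrelated.c'], renames => {} },
    { id => 'c0', touched => ['log-old.sh'],  renames => {} },
);

# Walk backwards in time, reporting commits that touch the followed file
# and switching to the older name whenever a rename is seen.
sub follow {
    my ($name, @commits) = @_;
    my @out;
    for my $c (@commits) {
        next unless grep { $_ eq $name } @{ $c->{touched} };
        push @out, "$c->{id} $name";
        # The "where did it go" step: continue under the previous name.
        $name = $c->{renames}{$name} if exists $c->{renames}{$name};
    }
    return @out;
}

print "$_\n" for follow('log-new.sh', @history);
```

The real script does not see the history as a list; it starts a new
git-rev-list | git-diff-tree pipeline for the older name instead.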

It has 'cg' in the name but depends on no Cogito stuff; it should in
fact be trivial to put it into git-whatchanged in place of the
final pipeline (not that I'd suggest doing this universally,
but perhaps git-whatchanged -f ...?). There are three downsides in this
regard:

(i) No -c support. I need the separate deltas coming out of
git-diff-tree, but I think I can join them together pretty easily on my
own, except that I have problems with -c (see
<20060326102100.GF18185@pasky.or.cz>), so I'm not sure how exactly it is
supposed to behave.

(ii) Only --pretty=raw output. It shouldn't be hard to add the
reformatting code, but I'm personally not going to use it and I'm kind of
lazy, so I'll let someone else do that, I guess. :-)

(iii) Raw deltas required. -p parsing support would certainly be useful
and possible, but see (ii).
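
Regarding (iii), the raw records the script relies on are tab-separated:
a ':'-prefixed info field ending in a status letter (with an optional
similarity score), then the source path, then the destination path for
renames and copies. A minimal sketch of picking such a record apart
follows; parse_raw_line() and the sample lines (with shortened, made-up
object names) are hypothetical and not part of the patch.

```perl
use strict;
use warnings;

# Parse one raw diff record. Returns (status, source, destination);
# destination is undef for statuses that carry a single path.
sub parse_raw_line {
    my ($line) = @_;
    return unless $line =~ /^:/;
    my ($info, $src, $dst) = split /\t/, $line;
    # The status letter may be followed by a score, e.g. R090.
    my ($status) = $info =~ /([A-Z])\d*$/ or return;
    return ($status, $src, $dst);
}

# Made-up sample records; real output carries full object names.
my @sample = (
    ":100755 100755 1111111 2222222 R090\tlog-old.sh\tlog-new.sh",
    ":100644 000000 3333333 0000000 D\tgone.c",
    ":100644 100644 4444444 5555555 M\tMakefile",
);

for my $line (@sample) {
    my ($status, $src, $dst) = parse_raw_line($line);
    if ($status eq 'R' or $status eq 'C') {
        print "followed: $dst came from $src\n";
    } elsif ($status eq 'D') {
        print "deleted: $src\n";
    }
}
```

Since the script walks backwards in time, the destination path is the name
currently being watched and the source path is the name to continue with.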

To quickly see what it does, you can try it on, e.g., the git-log.sh file
in the Git repository.
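
Going by the usage line in the script header, such a trial run would
presumably look like "cg-Xfollowrenames -- -- HEAD -- git-log.sh" (HEAD as
the starting revision is an assumption here). The sketch below mirrors how
the script splits its command line into the four groups on '--';
split_args() is a hypothetical stand-alone helper, not a function from the
patch.

```perl
use strict;
use warnings;

# Split an argument list into four groups separated by '--':
# rev-list args, diff-tree args, revisions, files.
# Once the last group is reached, further '--' are kept as plain arguments.
sub split_args {
    my @groups = ([], [], [], []);
    my $i = 0;
    for my $arg (@_) {
        if ($arg eq '--' and $i < $#groups) {
            $i++;
            next;
        }
        push @{ $groups[$i] }, $arg;
    }
    return @groups;
}

my ($revlist_args, $difftree_args, $revs, $files) =
    split_args('--', '--', 'HEAD', '--', 'git-log.sh');

print "revs: @$revs; files: @$files\n";
```

With no extra options, both leading groups stay empty and the two leading
'--' are still needed to reach the revs and files groups.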

Thoughts? Opinions? Bugs? Patches?

Signed-off-by: Petr Baudis <pasky@suse.cz>
diff --git a/cg-Xfollowrenames b/cg-Xfollowrenames
new file mode 100755
index 0000000..fa5c552
--- /dev/null
+++ b/cg-Xfollowrenames
@@ -0,0 +1,246 @@
+#!/usr/bin/env perl
+#
+# git-rev-list | git-diff-tree --stdin following renames
+# Copyright (c) Petr Baudis, 2006
+# Uses bits of git-annotate.perl by Ryan Anderson.
+#
+# This script will efficiently show output as of the
+#
+# git-rev-list --remove-empty ARGS -- FILE... |
+# git-diff-tree -M -r -m --stdin --pretty=raw ARGS
+#
+# pipeline, except that it follows renames of individual files listed
+# in the FILE... set.
+#
+# Usage:
+#
+# cg-Xfollowrenames revlistargs -- difftreeargs -- revs -- files
+
+# TODO: Does not work on multiple files properly yet - most probably
+# (I didn't test it!). We want git-rev-list to stop traversing the history
+# when _any_ file disappears while now it probably stops traversing when
+# _all_ files disappear.
+
+use warnings;
+use strict;
+
+$| = 1;
+
+our (@revlist_args, @difftree_args, @revs, @files);
+
+{ # Load arguments
+ my @argp = (\@revlist_args, \@difftree_args, \@revs, \@files);
+ my $argi = 0;
+ for my $arg (@ARGV) {
+ if ($arg eq '--' and $argi < $#argp) {
+ $argi++;
+ next;
+ }
+ push(@{$argp[$argi]}, $arg);
+ }
+}
+
+
+# The heads we watch (sorted by commit time)
+our @heads;
+# Each head is: {
+# # Persistent for the whole line of development:
+# pipe => $pipe,
+# files => \@files, # to watch for
+#
+# id => $sha1, # useful actually only for debugging
+# time => $timestamp,
+# str => $prettyoutput,
+# parents => \@sha1s,
+#
+# # When the commit is processed, spawn these extra heads:
+# recurse => {$sha1id => \@files, ...},
+# }
+
+# To avoid printing duplicate commits
+# FIXME: Currently, we will not handle merge commits properly since
+# we hit them multiple times.
+our %commits;
+
+
+sub open_pipe($@) {
+ my ($stdin, @execlist) = @_;
+
+ my $pid = open my $kid, "-|";
+ defined $pid or die "Cannot fork: $!";
+
+ unless ($pid) {
+ if (defined $stdin) {
+ open STDIN, "<&", $stdin or die "Cannot dup(): $!";
+ }
+ exec @execlist;
+ die "Cannot exec @execlist: $!";
+ }
+
+ return $kid;
+}
+
+sub revlist($@) {
+ my ($rev, @files) = @_;
+ open_pipe(undef, "git-rev-list", "--remove-empty",
+ @revlist_args, $rev, "--", @files)
+ or die "Failed to exec git-rev-list: $!";
+}
+
+sub difftree($) {
+ my ($revlist) = @_;
+ open_pipe($revlist, "git-diff-tree", "-r", "-m", "--stdin", "-M",
+ "--pretty=raw", @difftree_args)
+ or die "Failed to exec git-diff-tree: $!";
+}
+
+sub revdiffpipe($@) {
+ my ($rev, @files) = @_;
+ my $pipe = difftree(revlist($rev, @files));
+}
+
+
+sub read_commit($$) {
+ my ($head, $tolerant) = @_;
+ my $pipe = $head->{'pipe'};
+ my $against;
+ my @oldset = @{$head->{'files'}};
+ my @newset;
+ my $rename;
+
+ # Load header
+ while (my $line = <$pipe>) {
+ $head->{'str'} .= $line;
+ chomp $line;
+ $line eq '' and goto header_loaded;
+
+ if ($line =~ /^diff-tree (\S+) \(from (root|\S+)\)/) {
+ $head->{'id'} = $1;
+ if (not $tolerant and $commits{$1}++) {
+ close $pipe;
+ return undef;
+ }
+ # The 'root' case is harmless since there'll be no renames.
+ $against = $2;
+ } elsif ($line =~ /^parent (\S+)/) {
+ push (@{$head->{'parents'}}, $1);
+ } elsif ($line =~ /^committer .*?> (\d+)/) {
+ $head->{'time'} = $1;
+ }
+ }
+ return undef;
+header_loaded:
+
+ # Load message
+ while (my $line = <$pipe>) {
+ $head->{'str'} .= $line;
+ chomp $line;
+ $line eq '' and goto message_loaded;
+ }
+ return undef;
+message_loaded:
+
+ # Load delta
+ while (my $line = <$pipe>) {
+ $head->{'str'} .= $line;
+ chomp $line;
+ $line eq '' and goto delta_loaded;
+
+ $line =~ /^:/ or return undef;
+ my ($info, $newfile, $oldfile) = split("\t", $line);
+ if ($info =~ /[RC]\d*$/) {
+ # Behold, a rename!
+ # (Or a copy, it's all the same for us.)
+ my $i;
+ for ($i = 0; $i <= $#oldset; $i++) {
+ $oldfile eq $oldset[$i] or next;
+ $rename = 1;
+ splice(@oldset, $i, 1);
+ push(@newset, $newfile);
+ last;
+ }
+ # In case of multiple candidates, follow
+ # all of them:
+ # (TODO: This might be a policy decision
+ # best left on the user.)
+ if ($i > $#oldset and grep { $oldfile eq $_ } @newset) {
+ $rename = 1;
+ push(@newset, $newfile);
+ }
+ } elsif ($info =~ /D$/) {
+ # Not weeding out deleted files might cause bizarre
+ # results when following multiple files since
+ # git-rev-list weeds them out too (probably?).
+ @oldset = grep { $newfile ne $_ } @oldset;
+ @{$head->{'files'}} = grep { $newfile ne $_ } @{$head->{'files'}};
+ }
+ }
+ $head->{'str'} .= "\n";
+delta_loaded:
+
+ if ($rename) {
+ $head->{'recurse'}->{$against} = [@newset, @oldset];
+ }
+ return 1;
+}
+
+sub load_commit($) {
+ my ($head) = @_;
+ $head->{'time'} = undef;
+ $head->{'str'} = '';
+ $head->{'parents'} = [];
+
+ read_commit($head, 0) or return undef;
+
+ # In case there was a merge, the commit will be multiple times
+ # here, each time with a different delta section. Read them all.
+ for (1 .. $#{$head->{'parents'}}) { # stupid vim syntax highlighting
+ read_commit($head, 1) or return undef;
+ }
+
+ return 1;
+}
+
+
+# Add head at the proper position
+sub add_head($) {
+ my ($head) = @_;
+ my $i;
+ for ($i = 0; $i <= $#heads; $i++) {
+ last if ($head->{'time'} > $heads[$i]->{'time'})
+ }
+ splice(@heads, $i, 0, $head);
+}
+
+# Create new head
+sub init_head($@) {
+ my ($rev, @files) = @_;
+ my $head = { files => \@files, 'pipe' => revdiffpipe($rev, @files) };
+ load_commit($head) or return;
+ add_head($head);
+}
+
+
+
+{ # Seed the heads list
+ for my $rev (@revs) {
+ init_head($rev, @files);
+ }
+}
+
+# Process the heads
+{
+ while (@heads) {
+ my $head = splice(@heads, 0, 1);
+
+ print $head->{'str'};
+
+ foreach my $parent (keys %{$head->{'recurse'}}) {
+ init_head($parent, @{$head->{'recurse'}->{$parent}});
+ }
+ $head->{'recurse'} = undef;
+
+ load_commit($head) or next;
+ add_head($head);
+ }
+}
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time. I think
I have forgotten this before.