From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH 5/5] contrib: update stats/mailmap script
Date: Wed, 12 Dec 2012 06:41:41 -0500
Message-ID: <20121212114141.GE18803@sigill.intra.peff.net>
In-Reply-To: <20121212113036.GB19625@sigill.intra.peff.net>

This version changes quite a few things:

  1. The original parsed the mailmap file itself, and it did
     it wrong (it did not understand entries with an extra
     email key).

     Instead, this version uses git's "%aE" and "%aN"
     formats to have git perform the mapping, meaning we do
     not have to read .mailmap at all, but still operate on
     the current state that git sees (and it also works
     properly from subdirs).

  2. The original would find multiple names for an email,
     but not the other way around.

     This version can do either or both. If we find multiple
     emails for a name, the resolution is less obvious than
     the other way around. However, it can still be a
     starting point for a human to investigate.

  3. The original would order only by count, not by recency.

     This version can do either. Combined with showing the
     counts, it can be easier to decide how to resolve.

  4. This version shows similar entries in a blank-delimited
     stanza, which makes it more clear which options you are
     picking from.

Signed-off-by: Jeff King <peff@peff.net>
---
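
To make the grouping described above concrete, here is a minimal
standalone sketch. It is not part of the patch. It assumes input lines
shaped like the output of git log --format='%at <%aE> %aN', and the
sample names, addresses, and timestamps are made up for illustration.
The stanzas() helper is hypothetical; the script in the patch prints
directly instead of building a string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input (timestamp, mapped email, mapped name).
my @lines = (
    '1355000000 <alice@example.com> Alice Example',
    '1355000100 <alice@example.com> Alice Example',
    '1355000200 <alice@example.com> A. Example',
    '1355000300 <bob@example.com> Bob Builder',
    '1355000400 <bob@work.example> Bob Builder',
);

# Two-level hashes: outer key => inner key => { count, stamp }.
my (%by_email, %by_name);

# Count one commit for the ($k, $v) pair and keep its newest timestamp.
sub mark {
    my ($h, $k, $v, $t) = @_;
    my $e = $h->{$k}{$v} ||= { count => 0, stamp => 0 };
    $e->{count}++;
    $e->{stamp} = $t if $t > $e->{stamp};
}

# Outer keys that map to more than one inner key (sorted for stable output).
sub dups {
    my ($h) = @_;
    my @found = grep { keys %{ $h->{$_} } > 1 } keys %$h;
    return sort @found;
}

# Inner keys, largest value of $field ('count' or 'stamp') first.
sub vals {
    my ($h, $field) = @_;
    return sort { $h->{$b}{$field} <=> $h->{$a}{$field} } keys %$h;
}

# Build blank-line-delimited stanzas as a single string.
# $outer_is_email tells whether the outer key is the email or the name.
sub stanzas {
    my ($h, $field, $outer_is_email) = @_;
    my $out = '';
    for my $k (dups($h)) {
        for my $v (vals($h->{$k}, $field)) {
            my ($n, $e) = $outer_is_email ? ($v, $k) : ($k, $v);
            $out .= "$n <$e> ($h->{$k}{$v}{$field})\n";
        }
        $out .= "\n";
    }
    return $out;
}

for my $line (@lines) {
    my ($t, $e, $n) = $line =~ /^(\S+) <(\S+)> (.*)$/
        or next;    # skip lines that do not match the expected shape
    mark(\%by_email, $e, $n, $t);
    mark(\%by_name,  $n, $e, $t);
}

print stanzas(\%by_email, 'count', 1);    # several names for one email
print stanzas(\%by_name,  'stamp', 0);    # several emails for one name
```

With this sample data, the first stanza lists the two spellings of
Alice's name under her single address, most frequent first, and the
second lists Bob's two addresses, most recent first.
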
contrib/stats/mailmap.pl | 108 ++++++++++++++++++++++++++++++-----------------
1 file changed, 70 insertions(+), 38 deletions(-)
rewrite contrib/stats/mailmap.pl (97%)
diff --git a/contrib/stats/mailmap.pl b/contrib/stats/mailmap.pl
dissimilarity index 97%
index 4b852e2..9513f5e 100755
--- a/contrib/stats/mailmap.pl
+++ b/contrib/stats/mailmap.pl
@@ -1,38 +1,70 @@
-#!/usr/bin/perl -w
-my %mailmap = ();
-open I, "<", ".mailmap";
-while (<I>) {
- chomp;
- next if /^#/;
- if (my ($author, $mail) = /^(.*?)\s+<(.+)>$/) {
- $mailmap{$mail} = $author;
- }
-}
-close I;
-
-my %mail2author = ();
-open I, "git log --pretty='format:%ae %an' |";
-while (<I>) {
- chomp;
- my ($mail, $author) = split(/\t/, $_);
- next if exists $mailmap{$mail};
- $mail2author{$mail} ||= {};
- $mail2author{$mail}{$author} ||= 0;
- $mail2author{$mail}{$author}++;
-}
-close I;
-
-while (my ($mail, $authorcount) = each %mail2author) {
- # %$authorcount is ($author => $count);
- # sort and show the names from the most frequent ones.
- my @names = (map { $_->[0] }
- sort { $b->[1] <=> $a->[1] }
- map { [$_, $authorcount->{$_}] }
- keys %$authorcount);
- if (1 < @names) {
- for (@names) {
- print "$_ <$mail>\n";
- }
- }
-}
-
+#!/usr/bin/perl
+
+use warnings 'all';
+use strict;
+use Getopt::Long;
+
+my $match_emails;
+my $match_names;
+my $order_by = 'count';
+Getopt::Long::Configure(qw(bundling));
+GetOptions(
+ 'emails|e!' => \$match_emails,
+ 'names|n!' => \$match_names,
+ 'count|c' => sub { $order_by = 'count' },
+ 'time|t' => sub { $order_by = 'stamp' },
+) or exit 1;
+$match_emails = 1 unless $match_names;
+
+my $email = {};
+my $name = {};
+
+open(my $fh, '-|', "git log --format='%at <%aE> %aN'") or die "git log: $!";
+while(<$fh>) {
+ my ($t, $e, $n) = /(\S+) <(\S+)> (.*)/ or next;
+ mark($email, $e, $n, $t);
+ mark($name, $n, $e, $t);
+}
+close($fh);
+
+if ($match_emails) {
+ foreach my $e (dups($email)) {
+ foreach my $n (vals($email->{$e})) {
+ show($n, $e, $email->{$e}->{$n});
+ }
+ print "\n";
+ }
+}
+if ($match_names) {
+ foreach my $n (dups($name)) {
+ foreach my $e (vals($name->{$n})) {
+ show($n, $e, $name->{$n}->{$e});
+ }
+ print "\n";
+ }
+}
+exit 0;
+
+sub mark {
+ my ($h, $k, $v, $t) = @_;
+ my $e = $h->{$k}->{$v} ||= { count => 0, stamp => 0 };
+ $e->{count}++;
+ $e->{stamp} = $t unless $t < $e->{stamp};
+}
+
+sub dups {
+ my $h = shift;
+ return grep { keys(%{$h->{$_}}) > 1 } keys(%$h);
+}
+
+sub vals {
+ my $h = shift;
+ return sort {
+ $h->{$b}->{$order_by} <=> $h->{$a}->{$order_by}
+ } keys(%$h);
+}
+
+sub show {
+ my ($n, $e, $h) = @_;
+ print "$n <$e> ($h->{$order_by})\n";
+}
--
1.8.0.2.4.g59402aa
Thread overview: 6+ messages
2012-12-12 11:30 [PATCH 0/5] git.git .mailmap cleanups Jeff King
2012-12-12 11:36 ` [PATCH 1/5] .mailmap: match up some obvious names/emails Jeff King
2012-12-12 11:38 ` [PATCH 2/5] .mailmap: fix broken entry for Martin Langhoff Jeff King
2012-12-12 11:38 ` [PATCH 3/5] .mailmap: normalize emails for Jeff King Jeff King
2012-12-12 11:41 ` [PATCH 4/5] .mailmap: normalize emails for Linus Torvalds Jeff King
2012-12-12 11:41 ` [PATCH 5/5] contrib: update stats/mailmap script Jeff King