From: "Sverre Rabbelier" <alturin@gmail.com>
To: "Git Mailinglist" <git@vger.kernel.org>
Cc: "David Symonds" <dsymonds@gmail.com>
Subject: [GitStats] Bling bling or some statistics on the git.git repository
Date: Fri, 11 Jul 2008 23:04:31 +0200
Message-ID: <bd6139dc0807111404y1d3dd48ao6d2903da4cd1aa56@mail.gmail.com>
In-Reply-To: <bd6139dc0807090621n308b0159n92d946c165d3a5dd@mail.gmail.com>

[I sent this mail earlier, but I think vger rejected it due to the
size of the attachments. I have uploaded them instead; they can be
found at:
http://alturin.googlepages.com/activity_per_author.txt
http://alturin.googlepages.com/full_activity.txt ]

Heya,

Today I sat down and finished the activity aggregation code. Now it is
possible to generate the attached files with the following commands:
$ stats.py author -e --id=an > activity_per_author.txt
$ stats.py author -a  > full_activity.txt

The first one calculates the activity of all developers on a per-file
basis and dumps it into the file. The "--id=an" switch sets the
grouping field to "%an" (see man git-log), since the default (%ae) is
not that helpful for git.git (I don't know people by their e-mail, I
know them by their name). This was already possible (with "author
-d"), but before one had to pick a specific developer; now it will
show the activity for _all_ developers. This is interesting stuff,
although for a huge project like git it's a bit much to take in. What
is probably more interesting is the second command, which shows how
much change a file has seen over its existence.
I temporarily modified the code to output %04d instead of %4d so that
I could do the following:
$ stats.py author -a  > full_activity_sortable.txt

A few highlights from the sorted file:

$ cat full_activity_sortable.txt | sort | tail -n 20
0170:  2721+  1060- = refs.c
0172:  4369+  2004- = builtin-pack-objects.c
0177:   345+   233- = GIT-VERSION-GEN
0178:  2855+  2121- = commit.c
0178:  4779+  2227- = fast-import.c
0179:  2677+  1400- = read-cache.c
0185:  5661+  2056- = builtin-apply.c
0186:  3269+  1255- = revision.c
0213:  1884+   460- = Documentation/config.txt
0232:  2257+  1621- = Documentation/git.txt
0236:  3990+  1991- = contrib/fast-import/git-p4
0281:  2753+  2220- = git.c
0333: 10259+  7150- = git-gui.sh
0338: 11337+  6187- = git-svn.perl
0338:  5755+  3159- = sha1_file.c
0397: 10230+  9599- = diff.c
0412: 23248+ 20257- = gitk
0432: 10580+  4502- = gitweb/gitweb.perl
0490:  1412+   619- = cache.h
0977:  4703+  2705- = Makefile

$ cat Makefile | wc -l
1482

For some reason you people can't seem to make up your mind about a
file that's not even 1500 lines in size ;). With almost a thousand
edits so far, it's been edited so many times it could've been written
from scratch three times (except that the number of lines deleted
doesn't match). Also interesting to note is that the "external" files
such as gitweb, gitk, git-gui and git-svn make up the bulk of all
changes. The two contenders from the native git camp are diff.c and
sha1_file.c which both have a lot of LOC. This information is
interesting for GitStats as it might help determine which files have
had a lot of change, and which files are not touched a lot.

A note is in order here: this data was mined with "git log --numstat",
so things like moving files and copying files are not accounted for. I
thought about using git-blame to gather this info before, but it is
not the right tool for the job. If anyone else has any ideas on what
would be better, please let me know and I'll happily dig into it :).
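
For anyone who wants to reproduce numbers like the ones above, here is
a minimal sketch of this kind of aggregation. It is not the actual
GitStats code; the function names aggregate_numstat and format_activity
are made up for illustration. It assumes the input is the text produced
by "git log --numstat --pretty=format:", where each row is
"added<TAB>deleted<TAB>path", and it skips binary files, which report
"-" instead of line counts.

```python
from collections import defaultdict


def aggregate_numstat(log_text):
    """Sum per-file activity from `git log --numstat --pretty=format:` text.

    Returns a dict mapping path -> [commits, lines_added, lines_deleted].
    Hypothetical helper, not part of GitStats.
    """
    totals = defaultdict(lambda: [0, 0, 0])
    for line in log_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # blank separators or anything that is not a numstat row
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            continue  # binary files have no line counts; skipped in this sketch
        entry = totals[path]
        entry[0] += 1             # one more commit touched this file
        entry[1] += int(added)
        entry[2] += int(deleted)
    return totals


def format_activity(totals):
    """Render rows as 'NNNN: added+ deleted- = path', least-edited first."""
    rows = sorted(totals.items(), key=lambda item: item[1][0])
    return ["%04d: %5d+ %5d- = %s" % (commits, added, deleted, path)
            for path, (commits, added, deleted) in rows]


if __name__ == "__main__":
    import subprocess

    # Run inside a git checkout; renames and copies are not followed.
    log = subprocess.run(
        ["git", "log", "--numstat", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    print("\n".join(format_activity(aggregate_numstat(log))))
```

Because each path is counted as it appears in the log, a renamed file
shows up as two unrelated entries, which is the limitation mentioned
above.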

--
Cheers,

Sverre Rabbelier
