From: Junio C Hamano <gitster@pobox.com>
To: "alturin marlinon" <alturin@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [SoC RFC] git statistics - information about commits
Date: Sat, 22 Mar 2008 12:40:16 -0700
Message-ID: <7v3aqik0nz.fsf@gitster.siamese.dyndns.org>
In-Reply-To: 7vmyospgz7.fsf@gitster.siamese.dyndns.org

Junio C Hamano <gitster@pobox.com> writes:

> "alturin marlinon" <alturin@gmail.com> writes:
>
>> My plan for this summer is to create a 'statistics' feature for git.
>>
>> It would provide the following functionality:
>> * Show how many commits a specific user made.
>> * Show the (average) size of their changes (in lines for example).
>> * Show a 'total diff', that is, take the difference between the source
>> with, and without their changes, including its size (with for example
>> a -c switch).
>> * Show which contributors have contributed to the part of the code
>> that a patch modifies.
>> * Show what part of the code a maintainer is working on the most.
>> * Define an output format for this information that can be used by
>> other tools (such as gitk and git-web)
>> * (Optional) Integrate all this information with gitk and git-web.
>
> * Within a reasonable amount of time, suitable for interactive use, if
>   you intend it to work with gitk.
>
> What's the ballpark performance goal for e.g. post 2.6.12 kernel history
> which is about 85k commits, 3800 authors, 24k files?
>
> * Who contributed the most code that needed the most fix-ups on top?
>
> * Which part of the codebase had the most commits that had "oops, screwed
>   up, I am fixing this but this is tricky code" fixes?

A few more items as food for thought.

* Figure out which blocks of lines (not necessarily whole files) relate
  to each other by noticing that they are often modified in the same
  commit.

  For example, if you find that the earlier part of a file A.c is often
  updated on its own, but many other commits modify the later part of
  A.c and another file B.c at the same time, it might suggest that a
  better organization of the code would be to split off the later part
  of A.c and move it to B.c.
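
To make the A.c/B.c example concrete, here is a rough sketch of the
counting step only. It assumes you have already extracted, per commit,
the line ranges each hunk touches; the fixed BLOCK size, the function
names and the sample history are all made up for illustration, and
nothing here tracks lines as they move between revisions.

```python
from collections import Counter
from itertools import combinations

BLOCK = 50  # hypothetical block size, in lines


def blocks_touched(hunks, block=BLOCK):
    """Map (path, first_line, last_line) hunks to a set of (path, block_index)."""
    touched = set()
    for path, first, last in hunks:
        for idx in range((first - 1) // block, (last - 1) // block + 1):
            touched.add((path, idx))
    return touched


def cochange_counts(commits, block=BLOCK):
    """Count how often each pair of blocks is modified in the same commit."""
    pairs = Counter()
    for hunks in commits:
        for a, b in combinations(sorted(blocks_touched(hunks, block)), 2):
            pairs[(a, b)] += 1
    return pairs


# Made-up history: one list of (path, first_line, last_line) per commit.
commits = [
    [("A.c", 10, 20)],                    # early part of A.c, alone
    [("A.c", 120, 130), ("B.c", 5, 9)],   # later part of A.c with B.c
    [("A.c", 140, 145), ("B.c", 30, 40)],
    [("A.c", 15, 18)],
]
pairs = cochange_counts(commits)
for (a, b), n in pairs.most_common():
    print(a, b, n)
```

With this sample the only pair that shows up is the later block of A.c
together with the first block of B.c, which is the kind of signal the
example above is about. A real tool would want something smarter than
fixed-size blocks.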

* Who are early birds and who are late night owls?  Who are day-job
  contributors and who are weekenders?
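
A rough sketch of how the bucketing could go, assuming each commit gives
you an author name, a timestamp in seconds since the epoch, and the
author's UTC offset in minutes. The hour boundaries, the bucket names
and the sample authors are arbitrary choices of mine, not anything git
defines.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def local_time(epoch, offset_minutes):
    """Convert an epoch timestamp to the author's local wall-clock time."""
    return datetime.fromtimestamp(epoch, timezone(timedelta(minutes=offset_minutes)))


def bucket(when):
    """Classify a local datetime as (weekday|weekend, early|day|evening|night)."""
    kind = "weekend" if when.weekday() >= 5 else "weekday"
    if 5 <= when.hour < 9:
        part = "early"
    elif 9 <= when.hour < 18:
        part = "day"
    elif 18 <= when.hour < 23:
        part = "evening"
    else:
        part = "night"
    return kind, part


def habits(commits):
    """Count commits per author in each (kind, part) bucket."""
    per_author = {}
    for author, epoch, offset in commits:
        key = bucket(local_time(epoch, offset))
        per_author.setdefault(author, Counter())[key] += 1
    return per_author


def epoch_of(y, mo, d, h, mi, offset_minutes):
    """Helper for the sample data: local wall-clock time to epoch seconds."""
    tz = timezone(timedelta(minutes=offset_minutes))
    return int(datetime(y, mo, d, h, mi, tzinfo=tz).timestamp())


# Made-up commits: (author, epoch seconds, UTC offset in minutes).
commits = [
    ("alice", epoch_of(2008, 3, 22, 12, 40, -420), -420),  # Saturday, midday
    ("bob", epoch_of(2008, 3, 21, 2, 15, 60), 60),         # Friday, 02:15
]
result = habits(commits)
for author, counts in result.items():
    print(author, dict(counts))
```

Using the author's own offset rather than UTC matters here, because a
"late night" commit is only late in the author's timezone.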

* Identify "buggy commits" from history, without testing.  A zeroth-order
  approximation is that the lines a commit introduced were later
  rewritten by other commits, but the later ones are not necessarily
  fixes and can be enhancements, so you would need a way to tell which
  ones are "fixing commits" and which ones are not.  You may want to use
  project-specific hints to help you do this:

  - a log that matches /This(?: commit)? fixes/ is likely to be a fix;

  - a commit that touches the same vicinity as another commit after a
    short interval is likely to be a fix;

  - a commit that is made on the 'maint' branch is by definition a fix;

  - a commit that changes test_expect_failure to test_expect_success has
    a high probability of being a fix itself, or of coming soon after a
    fix.

  Once you have "these are buggy commits, these are fixes" in place, the
  remaining commits would be "enhancements" and you can do interesting
  things.

  * For the integrator, can you spot a pattern like "what he accepts
    during weekdays tends to be buggier than what he applies during
    weekends"?

  * For each contributor, can you spot a pattern like "his late night
    commits are buggier than his early morning commits"?

  * Can you spot a pattern like "his changes to this area tend to be
    buggy but his changes to that area tend to be very good"?

  * Who tends to introduce more bugs, and who tends to do more fixes
    than enhancements?

  * Is there a correlation between being a day-job contributor and being
    more of a fixer than a bug-introducer?
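
As a starting point for the "fixing commit" hints, something along these
lines might do. It is only a sketch: the Commit record and its fields
are hypothetical, the log pattern treats " commit" as optional so that
"This fixes" also matches, and the "same vicinity after a short
interval" hint is left out because it needs blame-style line tracking.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Assumed reading of the log hint: " commit" is optional.
FIX_LOG = re.compile(r"This(?: commit)? fixes")
# A removed test_expect_failure line and an added test_expect_success line.
FLIP_OLD = re.compile(r"^-.*test_expect_failure", re.M)
FLIP_NEW = re.compile(r"^\+.*test_expect_success", re.M)


@dataclass
class Commit:  # hypothetical record, not a git data structure
    author: str
    branch: str
    message: str
    diff: str


def looks_like_fix(c):
    """Apply the project-specific hints; any single hit counts as a fix."""
    if c.branch == "maint":
        return True
    if FIX_LOG.search(c.message):
        return True
    return bool(FLIP_OLD.search(c.diff) and FLIP_NEW.search(c.diff))


def fix_share(commits):
    """Fraction of each author's commits that look like fixes."""
    totals, fixes = Counter(), Counter()
    for c in commits:
        totals[c.author] += 1
        if looks_like_fix(c):
            fixes[c.author] += 1
    return {author: fixes[author] / totals[author] for author in totals}


# Made-up commits for illustration.
commits = [
    Commit("alice", "master", "Add frobnicate option", ""),
    Commit("alice", "master", "This commit fixes a crash in frobnicate", ""),
    Commit("bob", "maint", "Update docs", ""),
    Commit("bob", "master", "t1234: passes now",
           "-test_expect_failure 'x' '\n+test_expect_success 'x' '\n"),
]
shares = fix_share(commits)
print(shares)
```

The same per-author tally could be keyed on weekday or hour of day
instead to look at the weekday/weekend and late-night questions. Whether
any of these heuristics is accurate enough on a real history is
something you would have to measure.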
