From: Jeff King <peff@peff.net>
To: Joseph Wakeling <joseph.wakeling@webdrake.net>
Cc: Sverre Rabbelier <srabbelier@gmail.com>, git@vger.kernel.org
Subject: Re: Effectively tracing project contributions with git
Date: Sat, 12 Sep 2009 14:59:40 -0400
Message-ID: <20090912185940.GA21277@coredump.intra.peff.net>
In-Reply-To: <4AAB9459.3070809@webdrake.net>
On Sat, Sep 12, 2009 at 02:30:17PM +0200, Joseph Wakeling wrote:
> I've recently begun contributing to a FOSS project that has a problem --
> although it has extensive git logs (some being CVS/SVN imports) dating
> back over many years, there has not been maintenance of contribution
> records on a file-by-file basis.
>
> I'm trying to rectify this and track down who contributed what.
> Unfortunately while I'm used to basic operations with git, I don't know
> it well enough to be confident in how to go about tracing contributions
> in this way.
We can probably help you with the git side of things, but defining "who
contributed what" is kind of a hairy problem. You will need to define
exactly how you want to count contributions.
For example:
> 'git annotate' of course is a nice starting point but of limited use
> because every time someone tweaks a line (and there have been many such
> tweaks in the history of the project) the responsibility of the original
> contributor is replaced by that of the tweaker.
But often the tweaking of the line _does_ make it their own. One of the
metrics often discussed in git is "of the surviving lines in the code,
how many were authored by each person". Which really is the output of
"git blame" (or annotate, which is more or less the same thing). So
people who contribute code that needs a lot of changes or cleanup don't
get as much credit for that code, because their lines got tweaked later.
It's an OK metric if you assume that lines are a good atom of
contribution. That is, if I replace your line, then I remove everything
of value that you added and I should get credit. That is arguably not
the case with something like a style cleanup. Changing:
  for(i = 0; i < n; i++)
to
  for (i = 0; i < n; i++)
to fix whitespace should probably leave authorship with the original
line. But I don't know if you can determine programmatically how
significant a change was. In the case of whitespace, "git blame" has an
option (-w) to ignore whitespace changes, which probably covers a large
portion of such "trivial change" cases.
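
As a rough sketch of that "surviving lines" metric (an illustration only, not a built-in git feature): the helper below tallies the author headers found in "git blame --line-porcelain" output read from stdin. The function name count_blame_authors and the example path in the usage line are made up for this example.

```shell
#!/bin/sh
# Tally surviving lines per author.
# Input: output of `git blame --line-porcelain` on stdin.
# Header lines start at column 0, while file content lines start with
# a TAB, so a source line that begins with "author " is not counted.
count_blame_authors() {
    sed -n 's/^author //p' | sort | uniq -c | sort -rn
}
```

It could be fed like this, with -w so that whitespace-only changes do not steal credit:

  git blame -w --line-porcelain HEAD -- path/to/file.c | count_blame_authors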
> An alternative is to use gitk to trace the history of individual files
> (or paths, as gitk has it). The problem here is that files have been
> renamed, content has been moved about between different files and so on.
You can use rename detection via "git log --follow" and simply count the
lines changed (and by whom) in each commit. That differs from the "git
blame" strategy by counting every change as of value, even if it is a line that
doesn't survive.
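
A minimal sketch of that "count every change" approach, assuming the log is printed with a custom header line of the form "author <name>" ahead of each commit's --numstat rows. The "author " prefix, the function name sum_numstat_by_author, and the example path are choices made for this example, not anything git mandates.

```shell
#!/bin/sh
# Sum added/deleted line counts per author.
# Input on stdin: "author <name>" lines, each followed by that commit's
# --numstat rows ("added<TAB>deleted<TAB>path").
# Binary files show "-" for both counts and are skipped.
sum_numstat_by_author() {
    awk -F'\t' '
        /^author / { author = substr($0, 8); next }
        NF == 3 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
            added[author] += $1
            deleted[author] += $2
        }
        END {
            for (a in added)
                printf "%s\t%d\t%d\n", a, added[a], deleted[a]
        }
    ' | sort
}
```

Usage would be something like the following. Note that --follow only works with a single path:

  git log --follow --numstat --format='author %an' -- path/to/file.c | sum_numstat_by_author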
But no, that won't handle the movement of some chunk of content from
one file to another. Only "git blame" (with -M and -C) really looks at code
movement on a smaller-than-file level.
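
To get a feel for how much of a file blame traces back to other files, one option (a sketch, with a made-up function name count_blame_origins and a made-up example path) is to tally the "filename" headers in "git blame --line-porcelain" output. When copy detection is on, lines that were moved in from elsewhere carry the name of the file they came from.

```shell
#!/bin/sh
# Tally blamed lines by the file they originated in.
# Input: output of `git blame --line-porcelain` on stdin.
# Each blamed line has a "filename <path>" header at column 0;
# file content lines start with a TAB and are never matched.
count_blame_origins() {
    sed -n 's/^filename //p' | sort | uniq -c | sort -rn
}
```

For example, with -M to detect lines moved within the file and -C given twice to also look for copies from other files in the commit that created the file:

  git blame -M -C -C -w --line-porcelain HEAD -- path/to/file.c | count_blame_origins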
> I'm just hoping that the git community can offer some good advice on
> this, to what extent the process of tracing contributions can be
> automated, and so on. I'm not expecting anyone to provide a solution
> for me, but suggestions and pointers in the possible right directions
> would be much appreciated.
I think it is less a git problem and more of a "how do you want to
define contribution" problem. The above is just my thinking about it for
a few minutes. Sverre Rabbelier (cc'd) did a "git stats" GSoC project
last year, but I don't think I ever looked closely at the results or
what metrics he came up with. But that is probably a good direction to
look in.
-Peff