git.vger.kernel.org archive mirror
From: Theodore Tso <tytso@mit.edu>
To: Joseph Wakeling <joseph.wakeling@webdrake.net>
Cc: Sverre Rabbelier <srabbelier@gmail.com>,
	Jeff King <peff@peff.net>,
	git@vger.kernel.org
Subject: Re: Effectively tracing project contributions with git
Date: Sat, 12 Sep 2009 22:28:43 -0400
Message-ID: <20090913022843.GB26588@mit.edu>
In-Reply-To: <4AAC3889.6030908@webdrake.net>

On Sun, Sep 13, 2009 at 02:10:49AM +0200, Joseph Wakeling wrote:
> 
> I don't see any solution that doesn't see me browsing diffs -- there's
> no metric that will solve the problem -- but if your stats work could
> help me get an output of the form 'here are all the diffs on file X by
> contributor Y in order of size, largest first' then I think it would
> help a LOT.

This will display all of the diffs on file (pathname) XXX by contributor
YYY (the "--" tells git that XXX is a pathname, which matters if the
file no longer exists in your working tree):

	git log -p --author=YYY -- XXX

You might also find the diffstats useful:

	git log --stat --author=YYY -- XXX

Or if you want *only* the diffstats for the file in question, you might try:

	git log --stat --pretty=format: --author=YYY -- XXX | grep XXX

So the bottom line is that git will let you extract quite a lot of
information.  You might need to do some Perl, shell, or Python
scripting to analyze or format it, but the harder part is deciding
exactly what question you want to ask.
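
None of the commands above sorts the diffs by size, which is the
ordering asked for in the quoted request.  The following is only a
minimal sketch, not a built-in git feature: it assumes a POSIX shell
with awk and sort, rank_commits_by_size is a hypothetical helper name,
and "size" is taken to mean lines added plus lines deleted in the file
per commit, as reported by git log --numstat (binary changes count as
zero).

```shell
#!/bin/sh
# Sketch: rank one author's commits to one file by size, largest first.
# "Size" here means lines added + lines deleted in that file (an assumed
# metric), taken from git log --numstat.
# Usage: rank_commits_by_size AUTHOR FILE   (hypothetical helper name)
rank_commits_by_size() {
    author=$1
    file=$2
    # Each commit starts with a "commit <sha>" line; --numstat then prints
    # "added<TAB>deleted<TAB>path" for the path-limited diff.
    git log --numstat --format='commit %H' --author="$author" -- "$file" |
    awk -F '\t' '
        /^commit / { sha = substr($0, 8); next }
        NF == 3 {
            # numstat prints "-" instead of counts for binary files.
            add = ($1 == "-") ? 0 : $1
            del = ($2 == "-") ? 0 : $2
            total[sha] += add + del
        }
        END { for (s in total) printf "%d %s\n", total[s], s }
    ' |
    sort -rn
}
```

To read the diffs themselves largest first, something like this
should work:

	rank_commits_by_size YYY XXX | while read -r n sha; do git show "$sha" -- XXX; done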

Eliminating whitespace changes isn't hard (add the -b flag, or -w to
ignore whitespace entirely).  If you want to eliminate variable
renaming, that's harder, since that requires actually parsing the
patch.  There are programs that will do that (normally used by
university professors to catch students cheating in Programming 101
courses :-), but you'd need to do some shell (or Perl or Python)
scripting to splice them into the git invocations that extract the
information.

Is there a particular reason why this is important to you?  Is it just
curiosity, or are you trying to build a case that you've contacted all
of the significant contributors for the purpose of changing the
license used on a file?  If it's the latter, what I'd probably do is
simply collect everyone who has ever changed the file
(git log --format="%aN <%aE>" pathname/to/a/file | sort -u) and try to
get as many people as possible to agree to the license change.  For
the ones who have _not_ agreed, or whom you cannot contact, you can go
back and analyze just their changes
(git log -p --author=YYY -- pathname/to/a/file) to decide whether or
not they are significant, whether you need to try extra hard to
contact them, or, in the worst case, whether you need to find someone
to rewrite the parts of the file which they had modified in the past.
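
The license-change procedure above can be scripted the same way.  This
is another minimal sketch under the same assumptions (POSIX shell,
awk, sort); contributors_of and holdouts are hypothetical helper
names, and the "agreed" file, one email address per line, is an
assumed format you would maintain by hand.  contributors_of extends
the collect-everyone command with per-author commit counts via
git shortlog; holdouts sums the lines each not-yet-agreed author added
or deleted in the file, largest first, so the contributions most worth
reading come to the top.

```shell
#!/bin/sh
# Sketch: helpers for a relicensing survey of one file (hypothetical names).

# Everyone who has ever changed FILE, most commits first.  HEAD is named
# explicitly because git shortlog reads a log from standard input when no
# revision is given and stdin is not a terminal (as in scripts).
contributors_of() {
    git shortlog -s -n -e HEAD -- "$1"
}

# Usage: holdouts FILE AGREED_FILE
# AGREED_FILE holds one author email per line (assumed format).  Prints
# "lines-changed email" for every other author of FILE, largest first.
holdouts() {
    file=$1
    agreed=$2
    git log --numstat --format='author %aE' -- "$file" |
    awk -F '\t' -v agreed="$agreed" '
        BEGIN {
            # Load the emails of people who have already agreed.
            while ((getline line < agreed) > 0) ok[line] = 1
        }
        /^author / { who = substr($0, 8); next }
        NF == 3 && !(who in ok) {
            # numstat prints "-" instead of counts for binary files.
            add = ($1 == "-") ? 0 : $1
            del = ($2 == "-") ? 0 : $2
            total[who] += add + del
        }
        END { for (w in total) printf "%d %s\n", total[w], w }
    ' |
    sort -rn
}
```

Line counts are only a rough proxy for significance; you still need
to read those authors' diffs before deciding anything.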

Or maybe you have some other reason for gathering this information.
Depending on the high-level goal you are trying to achieve, there may
be an easier or more elegant way to get the information you are
requesting.

						- Ted

Thread overview: 8+ messages
2009-09-12 12:30 Effectively tracing project contributions with git Joseph Wakeling
2009-09-12 18:59 ` Jeff King
2009-09-12 19:03   ` Sverre Rabbelier
2009-09-13  0:10     ` Joseph Wakeling
2009-09-13  2:28       ` Theodore Tso [this message]
2009-09-13  9:24         ` Jeff King
2009-09-13 14:30         ` Joseph Wakeling
2009-09-13  0:03   ` Joseph Wakeling