git.vger.kernel.org archive mirror
* A generalization of git blame
@ 2012-09-25 18:14 xmeng
From: xmeng @ 2012-09-25 18:14 UTC
  To: git

Hi,

I have been developing a git tool (built on git's internal API) that finds
all the commits that have changed a given line, so that authorship can be
attributed more accurately.
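Git releases that came after this thread added `git log -L <start>,<end>:<file>`, which lists every commit that touched a range of lines. A minimal Python sketch, assuming a `git` on the PATH that supports `-L` and `-C`; the function names `commits_touching_line` and `parse_line_log` are hypothetical, and only the commit header lines are kept from the patch text that `-L` prints.

```python
from __future__ import annotations

import subprocess

# Each commit header starts with a NUL byte (%x00), which does not occur in
# git's patch text, so header lines are easy to separate from diff lines.
HEADER_FORMAT = "%x00%H%x09%an"


def parse_line_log(output: str) -> list[tuple[str, str]]:
    """Keep only NUL-prefixed header lines and split them into (sha, author)."""
    commits = []
    for row in output.splitlines():
        if row.startswith("\x00"):
            sha, author = row[1:].split("\t", 1)
            commits.append((sha, author))
    return commits


def commits_touching_line(repo: str, path: str, line: int) -> list[tuple[str, str]]:
    """Return (sha, author) pairs, newest first, for commits that changed one line."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "-L", f"{line},{line}:{path}",
         f"--format={HEADER_FORMAT}"],
        check=True, capture_output=True, text=True,
    )
    return parse_line_log(result.stdout)
```

For example, `commits_touching_line(".", "src/main.c", 42)` (a hypothetical file and line) would return the full list of commits behind line 42 rather than only the last one that `git blame` reports.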

The motivation is my research on binary code authorship, where I use
machine learning to classify who wrote a piece of code. To produce training
data, I start with a source code repository that has well-known author
labels for each line and then compile the project into a binary. This gives
me authorship labels for the binary code, to which I then apply machine
learning techniques.

To get ground truth for the authorship of each line, I started with
git-blame. But I later found this insufficient, because the last commit may
only add a comment or change a small part of the line, in which case the
line shouldn't be attributed to the last author. Of course, there is room
for debate about who should count as the representative author of a line of
code. So what I would like to do is find all the commits that have ever
changed a line; then I can try different ways of summarizing over those
commits to produce a final authorship label (or even a tuple of authors).
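One way to experiment with the summarizing step is sketched below, assuming each commit's effect on the line is already available as a before/after pair. `LineChange`, `changed_chars`, `last_author` and `weighted_authors` are hypothetical names; comment stripping naively assumes C-style `//` comments, and `difflib.SequenceMatcher` is only a rough stand-in for "how much of the line an author changed".

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class LineChange:
    """One commit's change to a tracked line (hypothetical record type)."""
    author: str
    before: str  # line content before the commit ("" if the line was added)
    after: str   # line content after the commit


def strip_comment(line: str, marker: str = "//") -> str:
    """Drop a trailing line comment (naive: ignores markers inside strings)."""
    return line.split(marker, 1)[0].rstrip()


def changed_chars(change: LineChange) -> int:
    """Approximate how many characters of code (not comments) a change touched."""
    a, b = strip_comment(change.before), strip_comment(change.after)
    matched = sum(m.size for m in SequenceMatcher(None, a, b).get_matching_blocks())
    return max(len(a), len(b)) - matched


def last_author(history: list[LineChange]) -> str:
    """The blame-style label: whoever made the most recent change."""
    return history[-1].author


def weighted_authors(history: list[LineChange]) -> list[tuple[str, int]]:
    """Authors ranked by approximate characters of code they changed."""
    weights: Counter[str] = Counter()
    for change in history:
        n = changed_chars(change)
        if n > 0:  # edits that only touch a trailing comment count for nothing
            weights[change.author] += n
    return weights.most_common()


if __name__ == "__main__":
    history = [  # oldest first
        LineChange("alice", "", "int total = compute(a, b);"),
        LineChange("bob", "int total = compute(a, b);",
                   "int total = compute(a, b); // sum"),
        LineChange("carol", "int total = compute(a, b); // sum",
                   "int total = compute(a, c); // sum"),
    ]
    print(last_author(history))       # carol
    print(weighted_authors(history))  # [('alice', 26), ('carol', 1)]
```

The blame-style label picks carol for a one-character edit, while the weighted tuple credits alice with most of the line and ignores bob's comment-only change entirely.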

I was wondering whether this community has debated accurate authorship
attribution before, and whether other people might be interested in this
work.

Thanks

--Xiaozhu


Thread overview: 7+ messages
2012-09-25 18:14 A generalization of git blame xmeng
2012-09-25 22:19 ` Philip Oakley
2012-09-25 23:05   ` Junio C Hamano
2012-09-26 15:36     ` xmeng
2012-09-26 19:11       ` Junio C Hamano
2012-09-27  4:18         ` xmeng
2012-09-27  6:38           ` Junio C Hamano