[SoC RFC] git statistics - information about commits

* [SoC RFC] git statistics - information about commits
@ 2008-03-21  8:52 alturin marlinon
  2008-03-21  9:24 ` Junio C Hamano
  2008-03-21 14:49 ` Jakub Narebski
  0 siblings, 2 replies; 12+ messages in thread
From: alturin marlinon @ 2008-03-21  8:52 UTC (permalink / raw)
  To: git

Heya,

With Google Summer of Code's application deadline closing in
fast, I would really appreciate some feedback on my application so
far, especially on which parts of this idea would be appreciated the
most and which parts could be done without.

I have been using git on several projects so far and am very happy
with the way it does things.
When looking at TortoiseSVN I noticed that it comes with a
'statistics' button that allows you to see which users have done what.
Even though it is limited in that it can only show how many commits
were made, I think this is an important feature to any VCS. I became
aware of the importance of statistics during a project at my
University (we had to use Subversion). During the project I noticed I
used these statistics to talk about fair distribution of work, and it
really helped to get everybody's nose pointing in the right direction.
Keeping that in mind, I tried to get such statistics from git. Git
provides a 'commits per user' feature under 'git shortlog -s -n -c
master' (note the order of the switches).

Consider Ohloh, an external tool that provides commit information
about contributors to a project.
It provides a quick overview of all contributors to a project, and
what their contribution has been so far. At the moment git does not
have anything similar, even though all the data needed for such an
analysis is present. Integration with gitk and git-web would allow the
data to be presented in a clear and informative way.

Another bit of interesting information would be 'who is maintaining
this code?'. Such information is especially useful when trying to
decide whom to send a copy of a patch. Consider that git already
contains the e-mail address of each developer that maintains a certain
bit of code (this information is included in each commit). If we now
find out who maintains the code that was changed in a commit,
git-format-patch could automatically include them in the cc: field.
Similarly, one might be interested in what code a maintainer is
currently working on.

In a broader sense it might be interesting to determine what part
of the code is most actively worked on, and what part of the code is
most stable. This is most interesting when deciding whether an API is
ready to be published. (If the API is changing a lot it might be
better to wait till it has stabilized.) This information could even be
used to find 'edit wars'. (In which a part of the code is changed over
and over again.)
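
As a rough sketch of the "most actively worked on" idea, the snippet below
counts how many commits touched each path. It assumes input shaped like the
output of `git log --name-only --pretty=format:` (one path per line, commits
separated by blank lines). The function name `churn_per_path` and the sample
history are made up for illustration, and counting per file is only a coarse
stand-in for "part of the code".

```python
from collections import Counter

def churn_per_path(name_only_log):
    """Count how many commits touched each path.

    Assumes one path per line, with a blank line closing each commit.
    """
    counts = Counter()
    seen = set()  # paths of the commit currently being read
    # The extra "" closes the last commit even without a trailing blank line.
    for line in name_only_log.splitlines() + [""]:
        path = line.strip()
        if path:
            seen.add(path)
        else:
            counts.update(seen)  # each path counts once per commit
            seen = set()
    return counts

# Hypothetical three-commit history, for illustration only.
sample = "a.c\nb.c\n\na.c\n\na.c\nREADME\n"
for path, n in churn_per_path(sample).most_common():
    print(n, path)
```

Paths with a high count are candidates for "actively worked on"; paths that
have not appeared for a long time would be the "stable" ones.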

My plan for this summer is to create a 'statistics' feature for git.

It would provide the following functionality:
* Show how many commits a specific user made.
* Show the (average) size of their changes (in lines for example).
* Show a 'total diff', that is, take the difference between the source
with, and without their changes, including its size (with for example
a -c switch).
* Show which contributors have contributed to the part of the code
that a patch modifies.
* Show what part of the code a maintainer is working on the most.
* Define an output format for this information that can be used by
other tools (such as gitk and git-web)
* (Optional) Integrate all this information with gitk and git-web.
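
The "average size of their changes" item could be prototyped along these
lines. This is only a sketch: it assumes input shaped like the output of
`git log --numstat --pretty=format:'commit %an'`, where the `commit <author>`
marker line is a convention chosen here, not something from the proposal, and
`average_change_size` is a hypothetical name.

```python
from collections import defaultdict

def average_change_size(numstat_log):
    """Return {author: average lines added + deleted per commit}.

    Assumes a "commit <author>" line per commit, followed by
    "added<TAB>deleted<TAB>path" lines as printed by --numstat.
    """
    commits = defaultdict(int)
    lines_changed = defaultdict(int)
    author = None
    for line in numstat_log.splitlines():
        if line.startswith("commit "):
            author = line[len("commit "):]
            commits[author] += 1
            continue
        fields = line.split("\t")
        # Binary files show up as "-<TAB>-<TAB>path" and are skipped here.
        if (author is not None and len(fields) == 3
                and fields[0].isdigit() and fields[1].isdigit()):
            lines_changed[author] += int(fields[0]) + int(fields[1])
    return {a: lines_changed[a] / commits[a] for a in commits}

# Hypothetical log excerpt, for illustration only.
sample = (
    "commit Alice\n3\t1\ta.c\n\n"
    "commit Bob\n10\t0\tb.c\n-\t-\tlogo.png\n\n"
    "commit Alice\n2\t2\ta.c\n"
)
print(average_change_size(sample))
```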

Implementation would probably start out with python scripts since
those are easy to modify and combine with other scripts. As milestones
are reached in time, or ahead of time, attention could be shifted to
converting these to C and combining them with the rest of git. When
the other milestones are finished time could be spent on using the
newly added features in gitk and/or git-web.

To achieve all these milestones heavy use can be made of existing
git commands. For example, getting the total number of commits from a
maintainer can be achieved with the less-than-intuitive 'git shortlog
-s -n -c master'; providing an alias for this command would make this
functionality easier to use. Since other git commands will be used a
lot, performance may suffer as a result of piping and parsing results
from one command to another. When a feature is converted to C later
on, attention could be given to directly passing the result from one
function to another.
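
A minimal sketch of the "pipe and parse" approach, assuming the usual
`count<TAB>name` lines that `git shortlog -s -n` prints. The `-c` switch
mentioned above is left out because its meaning is not spelled out in this
message; `parse_shortlog` and `commits_per_author` are hypothetical names.

```python
import subprocess

def parse_shortlog(text):
    """Turn "  count<TAB>name" lines into a list of (name, count) pairs."""
    result = []
    for line in text.splitlines():
        if not line.strip():
            continue
        count, name = line.strip().split(None, 1)
        result.append((name, int(count)))
    return result

def commits_per_author(rev="HEAD"):
    """Run git and parse its output; must be called inside a git work tree."""
    out = subprocess.run(
        ["git", "shortlog", "-s", "-n", rev],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_shortlog(out)

# Parsing a hand-written sample, so no repository is needed here.
print(parse_shortlog("    12\tAlice Example\n     3\tBob\n"))
```

Keeping the parser separate from the subprocess call makes it possible to
test the parsing without a repository, and to swap the git invocation for a
direct function call later on.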

To determine which users have been active on a file, git's built-in
'blame' functionality can be used. Since git blame is very fast, it
would be no problem to make extensive use of it in determining
maintainer focus. In a similar way it can be used to determine who has
worked on a file recently.
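
One way this might look, as a sketch only: count the lines currently
attributed to each author in the output of `git blame --line-porcelain
<file>`. The sample text below is a heavily trimmed, made-up excerpt of that
format, and `lines_per_author` is a hypothetical name.

```python
from collections import Counter

def lines_per_author(line_porcelain):
    """Count surviving lines per author in --line-porcelain blame output."""
    counts = Counter()
    for line in line_porcelain.splitlines():
        # Header lines look like "author Jane Doe". Content lines start with
        # a TAB, and "author-mail"/"author-time" lack the space, so neither
        # can be mistaken for the author header.
        if line.startswith("author "):
            counts[line[len("author "):]] += 1
    return counts

# Trimmed, hypothetical blame output for a three-line file.
sample = (
    "abc123 1 1 1\nauthor Alice\nauthor-mail <alice@example.com>\n\tint main(void)\n"
    "def456 2 2 1\nauthor Bob\nauthor-mail <bob@example.com>\n\t{\n"
    "abc123 3 3 1\nauthor Alice\nauthor-mail <alice@example.com>\n\t}\n"
)
print(lines_per_author(sample).most_common())
```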

I am a Dutch student, doing my Bachelor at 'Delft University of
Technology'. I study 'Technische Informatica', Dutch for 'Computer
Science'. Even before starting fourth grade in high school I learned
C++ so that I could help out as a coder on a MUD (Multi User Dungeon).
In grade four through six I followed the optional "Informatica" (a
high school version of 'Computer Science') course. We learned Java and
SQL, nothing too difficult, but it got me wanting to learn more. I
learned to pick up other languages on my own, probably the most
valuable thing I learned.

I have used git on numerous projects so far, although I am not yet
familiar with some of its more elaborate features. My motivation for
this particular idea is described above. Enjoying working with git made
me want to work on it as my Google Summer of Code project. Knowing that
an original idea has more chance of being selected, I spent a lot of
time looking for ways to improve git that would be worth a summer of
coding. I'm really looking forward to coding for git, and I think GSoC
would be an awesome introduction not only to its codebase but also to
contributing to a large project.

Thank you for your time and attention,

Sverre Rabbelier
(SRabbelier on #git)

* Re: [SoC RFC] git statistics - information about commits
  2008-03-21  8:52 [SoC RFC] git statistics - information about commits alturin marlinon
@ 2008-03-21  9:24 ` Junio C Hamano
  2008-03-21 13:51   ` Martin Langhoff
  2008-03-22 19:40   ` Junio C Hamano
  2008-03-21 14:49 ` Jakub Narebski
  1 sibling, 2 replies; 12+ messages in thread
From: Junio C Hamano @ 2008-03-21  9:24 UTC (permalink / raw)
  To: alturin marlinon; +Cc: git

"alturin marlinon" <alturin@gmail.com> writes:

> My plan for this summer is to create a 'statistics' feature for git.
>
> It would provide the following functionality:
> * Show how many commits a specific user made.
> * Show the (average) size of their changes (in lines for example).
> * Show a 'total diff', that is, take the difference between the source
> with, and without their changes, including its size (with for example
> a -c switch).
> * Show which contributors have contributed to the part of the code
> that a patch modifies.
> * Show what part of the code a maintainer is working on the most.
> * Define an output format for this information that can be used by
> other tools (such as gitk and git-web)
> * (Optional) Integrate all this information with gitk and git-web.

* Within a reasonable amount of time suitable for interactive use, if you
  intend it to work with gitk.

What's the ballpark performance goal for e.g. post 2.6.12 kernel history
which is about 85k commits, 3800 authors, 24k files?

* Who contributed the most code that needed the many fix-ups on top?

* Which part of the codebase had the most commits that had "oops, screwed
  up, I am fixing this but this is a tricky code" fixes?

* Re: [SoC RFC] git statistics - information about commits
  2008-03-21  9:24 ` Junio C Hamano
@ 2008-03-21 13:51   ` Martin Langhoff
  2008-03-21 13:56     ` Johannes Schindelin
  2008-03-22 19:40   ` Junio C Hamano
  1 sibling, 1 reply; 12+ messages in thread
From: Martin Langhoff @ 2008-03-21 13:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: alturin marlinon, git

On Fri, Mar 21, 2008 at 5:24 AM, Junio C Hamano <gitster@pobox.com> wrote:
>  * Which part of the codebase had the most commits that had "oops, screwed
>   up, I am fixing this but this is a tricky code" fixes?

How the hell do we spot that one? ;-)


martin
-- 
 martin.langhoff@gmai.com
 martin@laptop.org -- School Server Architect
 - ask interesting questions
 - don't get distracted with shiny stuff - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff

* Re: [SoC RFC] git statistics - information about commits
  2008-03-21 13:51   ` Martin Langhoff
@ 2008-03-21 13:56     ` Johannes Schindelin
  0 siblings, 0 replies; 12+ messages in thread
From: Johannes Schindelin @ 2008-03-21 13:56 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Junio C Hamano, alturin marlinon, git

Hi,

On Fri, 21 Mar 2008, Martin Langhoff wrote:

> On Fri, Mar 21, 2008 at 5:24 AM, Junio C Hamano <gitster@pobox.com> wrote:
> >  * Which part of the codebase had the most commits that had "oops, screwed
> >   up, I am fixing this but this is a tricky code" fixes?
> 
> How the hell do we spot that one? ;-)

It would probably involve finding pieces of code that were changed 
multiple times, i.e. lines of code that did not survive for many commits.

Ciao,
Dscho

* Re: [SoC RFC] git statistics - information about commits
  2008-03-21  9:24 ` Junio C Hamano
  2008-03-21 13:51   ` Martin Langhoff
@ 2008-03-22 19:40   ` Junio C Hamano
  2008-03-23 14:07     ` alturin marlinon
  1 sibling, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2008-03-22 19:40 UTC (permalink / raw)
  To: alturin marlinon; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> "alturin marlinon" <alturin@gmail.com> writes:
>
>> My plan for this summer is to create a 'statistics' feature for git.
>>
>> It would provide the following functionality:
>> * Show how many commits a specific user made.
>> * Show the (average) size of their changes (in lines for example).
>> * Show a 'total diff', that is, take the difference between the source
>> with, and without their changes, including its size (with for example
>> a -c switch).
>> * Show which contributors have contributed to the part of the code
>> that a patch modifies.
>> * Show what part of the code a maintainer is working on the most.
>> * Define an output format for this information that can be used by
>> other tools (such as gitk and git-web)
>> * (Optional) Integrate all this information with gitk and git-web.
>
> * Within a reasonable amount of time suitable for interactive use, if you
>   intend it to work with gitk.
>
> What's the ballpark performance goal for e.g. post 2.6.12 kernel history
> which is about 85k commits, 3800 authors, 24k files?
>
> * Who contributed the most code that needed the many fix-ups on top?
>
> * Which part of the codebase had the most commits that had "oops, screwed
>   up, I am fixing this but this is a tricky code" fixes?

A couple more food-for-thought.

* Figure out which blocks of lines (not necessarily the whole files) relate
  to each other by noticing that they are often modified in the same
  commit.

  For example, if you find that the earlier part of a file A.c is updated
  often only by itself, but many other commits often modify the later part
  of A.c and another file B.c at the same time, it might suggest that a
  better reorganization of the code is to split the later part of A.c and
  move it to B.c.

* Who are early birds and who are late night owls?  Who are day-job
  contributors and who are weekenders?

* Identify "buggy commits" from history, without testing.  Zeroth order
  approximation is that the lines it introduced were later rewritten by
  other later commits, but the later ones are not necessarily fixes but
  can be enhancements, so you would need a way to tell which ones are
  "fixing commits" and which ones are not.  You may want to use project
  specific hints to help you doing this:

  - a log that matches /This(?: commit) fixes/ is likely to be a fix;

  - a commit that touches the same vicinity of another commit after a
    short interval is likely to be a fix;

  - a commit that is made on 'maint' branch by definition is a fix;

  - a commit that changes test_expect_failure to test_expect_success has
    a high probability that it itself is a fix, or it comes soon after a
    fix;

  Once you have "these are buggy commits, these are fixes" in place, the
  remaining would be "enhancements" and you can do interesting things.

  * For the integrator, can you spot a pattern like "what he accepts
    during weekdays tend to be buggier than what he applies during
    weekends"?

  * For each contributor, can you spot a pattern like "his late night
    commits are buggier than his early morning commits"?

  * Can you spot a pattern like "his changes to this area tend to be
    buggy but to that area tend to be very good"?

  * Who tends to introduce more bugs, who tends to do more fixes than
    enhancements?

  * Is there a correlation between being a day-job contributor and being
    more fixer than bug-introducer?

* Re: [SoC RFC] git statistics - information about commits
  2008-03-22 19:40   ` Junio C Hamano
@ 2008-03-23 14:07     ` alturin marlinon
  2008-03-23 14:28       ` Johannes Schindelin
  2008-03-23 17:31       ` Junio C Hamano
  0 siblings, 2 replies; 12+ messages in thread
From: alturin marlinon @ 2008-03-23 14:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Sat, Mar 22, 2008 at 8:40 PM, Junio C Hamano <gitster@pobox.com> wrote:
>  * Figure out which blocks of lines (not necessarily the whole files) relate
>   to each other by noticing that they are often modified in the same
>   commit.

I've worked with directed graphs before (including writing my own
implementation) and have written an algorithm to detect cycles in a
graph.
I think that this could be done by creating an undirected weighted
graph of all files in a commit.
If we create a graph that records how many times two files are edited
in the same commit, the connection with the highest value would
indicate that two files are strongly related.
I'm not sure how this could be extrapolated to a section-based
approach, but a solution to that problem will have to be written
anyway.
(As with the other features, I'll need to be able to keep track of
lines; the mechanism to be developed for that can be used here as well.)
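
The file-level version of this graph can be sketched in a few lines. The
edge weights are simply "number of commits that changed both files"; the
function name `co_change_graph` and the sample history are invented for
illustration, and the section-based refinement is not covered.

```python
from collections import Counter
from itertools import combinations

def co_change_graph(commits):
    """Build an undirected weighted graph as {(file_a, file_b): weight}.

    `commits` is an iterable of per-commit file lists; the weight of an edge
    is the number of commits in which both files were changed.
    """
    edges = Counter()
    for files in commits:
        # Sorting makes (a, b) and (b, a) the same edge.
        for pair in combinations(sorted(set(files)), 2):
            edges[pair] += 1
    return edges

# Hypothetical history: three commits and the files each one touched.
history = [["A.c", "B.c"], ["A.c", "B.c", "Makefile"], ["A.c"]]
graph = co_change_graph(history)
print(graph.most_common(1))  # the most strongly related pair
```

Note that a commit touching n files adds n*(n-1)/2 edges, so very large
commits (imports, mass renames) would probably need to be capped or ignored.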

>  * Who are early birds and who are late night owls?  Who are day-job
>   contributors and who are weekenders?

Sounds like a 'fun feature', but how about timezones?
I'm not sure how commit times are recorded. If they are stored in UTC,
is the timezone recorded as well?

>  * Identify "buggy commits" from history, without testing.  Zeroth order
>   approximation is that the lines it introduced were later rewritten by
>   other later commits, but the later ones are not necessarily fixes but
>   can be enhancements, so you would need a way to tell which ones are
>   "fixing commits" and which ones are not.  You may want to use project
>   specific hints to help you doing this:

A feature like this would fit well with the other "buggy
commit/maintainer detection" but would require a lot of customization.
However, considering git already comes with a good customization
system it should still be feasible.

>   - a log that matches /This(?: commit) fixes/ is likely to be a fix;

Perhaps a regexp could be configured that marks a commit message as being a fix?

>   - a commit that touches the same vicinity of another commit after a
>     short interval is likely to be a fix;

By "touches the same vicinity", do you mean something like "edits
code within 5 lines and within 5 commits of a commit x"?

>   - a commit that is made on 'maint' branch by definition is a fix;

Either a list of maintenance branches or, again, a regexp would be
appropriate here, I think.
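
A sketch of how such configurable hints might be combined. Everything here
is an assumption for illustration: the default patterns (including a loosened
variant of the quoted /This(?: commit) fixes/ with the "commit" part made
optional), the branch naming scheme, and the name `looks_like_fix`. A real
tool would presumably read the patterns from git's configuration.

```python
import re

# Hypothetical defaults, not taken from any existing git setting.
FIX_MESSAGE_PATTERNS = [r"This(?: commit)? fixes", r"\bfix(?:es|ed)?\b"]
MAINTENANCE_BRANCH_PATTERNS = [r"^maint(?:-.*)?$"]

def looks_like_fix(message, branch=None,
                   message_patterns=FIX_MESSAGE_PATTERNS,
                   branch_patterns=MAINTENANCE_BRANCH_PATTERNS):
    """Guess whether a commit is a fix from its log message and branch."""
    # A commit on a maintenance branch is treated as a fix by definition.
    if branch is not None and any(re.search(p, branch) for p in branch_patterns):
        return True
    return any(re.search(p, message, re.IGNORECASE) for p in message_patterns)

print(looks_like_fix("This commit fixes a segfault in blame"))
print(looks_like_fix("Add a new option", branch="maint"))
print(looks_like_fix("Teach the parser about prefixes"))
```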

>   - a commit that changes test_expect_failure to test_expect_success has
>     a high probability that it itself is a fix, or it comes soon after a fix;

I'm not sure I understand this but that's probably because I'm not yet
familiar with git's testing suite.
Do you think a general rule to identify changes like this can be made?

>   * For the integrator, can you spot a pattern like "what he accepts
>     during weekdays tend to be buggier than what he applies during
>     weekends"?

That would be interesting data. I think a nice graph could be made
easily, showing a column for weekdays (or one for each day) and a
column for weekends (or one for each day). Each column could then
represent the number of buggy commits per day, or perhaps the ratio
of buggy commits to enhancements. This histogram could then go back
several weeks to give a better picture.
Perhaps a line graph with two lines could be made, one for the
weekends and one for the weekdays, or seven lines, one for each day.
That way it would be easy to track whether the integrator is getting
better at his job, or is perhaps having a bad/good period lately.

>   * For each contributor, can you spot a pattern like "his late night
>     commits are buggier than his early morning commits"?

This would be a 'fun feature' again I think, although it could of
course be used to decide that 'late night commits' of this contributor
should be examined more carefully.

>   * Can you spot a pattern like "his changes to this area tend to be
>     buggy but to that area tend to be very good"?

This would require connecting commits to areas, that is, tracking what
areas the buggy commits apply to. Maybe instead of tracking this on a
per-commit basis, a per-file basis might be more interesting. That is,
don't just track whether a commit is buggy, but also whether a specific
change to a certain file is buggy. Doing so would allow for more careful
tracking of the areas a developer provides good work in.

>   * Who tends to introduce more bugs, who tends to do more fixes than
>     enhancements?

The former is a confronting yet interesting statistic, something that
could best be presented in a pie chart or such. The latter could be
shown as a bar chart in which each bar is divided into three parts
'buggy', 'fixes', and 'enhancements', with one bar per contributor.

>   * Is there a correlation between being a day-job contributor and being
>     more fixer than bug-introducer?

This would require information about whether a contributor has a day
job, although this might be inferred from the commit times feature
mentioned earlier. It might be nice to have this feature to help
decide what kind of work to assign a contributor to (in the case that
contributors are assigned a task).

The question now, though, is which of these features are feasible to do
in one GSoC project? That is, which one should be done first, as I
want to finish this feature even if I can't finish it all in three
months. Should this be something that is decided in the application
already, or should I list all the features and then later on decide
(with the aid of the community) which ones to implement first?

Thank you for your suggestions, this is starting to be very interesting indeed!

Sverre

* Re: [SoC RFC] git statistics - information about commits
  2008-03-23 14:07     ` alturin marlinon
@ 2008-03-23 14:28       ` Johannes Schindelin
  2008-03-23 15:41         ` alturin marlinon
  2008-03-23 17:31       ` Junio C Hamano
  1 sibling, 1 reply; 12+ messages in thread
From: Johannes Schindelin @ 2008-03-23 14:28 UTC (permalink / raw)
  To: alturin marlinon; +Cc: Junio C Hamano, git

Hi,

On Sun, 23 Mar 2008, alturin marlinon wrote:

> On Sat, Mar 22, 2008 at 8:40 PM, Junio C Hamano <gitster@pobox.com> wrote:
> >  * Figure out which blocks of lines (not necessarily the whole files) relate
> >   to each other by noticing that they are often modified in the same
> >   commit.
> 
> I've worked with directed graphs before (including writing my own
> implementation) and have written an algorithm to detect cycles in a
> graph.
> I think that this could be done by creating an undirected weighted
> graph of all files in a commit.

I think you will have to go to the line level to achieve what Junio 
suggested.

> >  * Who are early birds and who are late night owls?  Who are day-job
> >   contributors and who are weekenders?
> 
> Sounds like a 'fun feature', but how about timezones?
> I'm not sure how commit times are recorded. If they are stored in UTC,
> is the timezone recorded as well?

Timestamps are recorded as epoch (seconds since Jan 1, 1970) and timezone.  
So yes, you have that, _provided_ you trust the users to set up that thing 
correctly.

I, for one, do not change the timezone on my laptop, just because I happen 
to be travelling through the air at high altitude...
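
Given that pair (epoch seconds plus a "+HHMM"/"-HHMM" offset), the
"early bird / weekender" question reduces to converting to the local wall
clock. A minimal sketch, with made-up names (`local_commit_time`, `classify`)
and arbitrarily chosen hour boundaries; it trusts the recorded timezone,
with the caveat just mentioned.

```python
from datetime import datetime, timedelta, timezone

def local_commit_time(epoch, tz):
    """Convert (epoch seconds, "+HHMM" or "-HHMM") to an aware datetime."""
    sign = -1 if tz.startswith("-") else 1
    hours, minutes = int(tz[1:3]), int(tz[3:5])
    offset = timedelta(hours=hours, minutes=minutes) * sign
    return datetime.fromtimestamp(epoch, tz=timezone(offset))

def classify(epoch, tz):
    """Very rough buckets; the 9-to-18 window is an arbitrary assumption."""
    t = local_commit_time(epoch, tz)
    if t.weekday() >= 5:  # Saturday or Sunday
        return "weekend"
    return "office hours" if 9 <= t.hour < 18 else "off hours"

# 1206089520 is 2008-03-21 08:52 UTC, the time of the first message
# in this thread.
print(local_commit_time(1206089520, "+0100").isoformat())
print(classify(1206089520, "+0100"))
```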

> >  * Identify "buggy commits" from history, without testing.  Zeroth order
> >   approximation is that the lines it introduced were later rewritten by
> >   other later commits, but the later ones are not necessarily fixes but
> >   can be enhancements, so you would need a way to tell which ones are
> >   "fixing commits" and which ones are not.  You may want to use project
> >   specific hints to help you doing this:
> 
> A feature like this would fit well with the other "buggy
> commit/maintainer detection" but would require a lot of customization.
> However, considering git already comes with a good customization
> system it should still be feasible.

Yes.  And it would be really interesting for me.  Until it shows that I am 
the biggest offender, of course.

> >   * For the integrator, can you spot a pattern like "what he accepts
> >     during weekdays tend to be buggier than what he applies during
> >     weekends"?
> 
> That would be interesting data, I think a nice graph could be made
> easily, showing a column for weekdays (or one for each day) and a
> column for weekends (or one for each day). Each column could then
> represent the amount of buggy commits / day, or perhaps the ratio
> buggy/enhancements. This histogram could then go back several weeks to
> give a better picture.
> Perhaps a line style graph with two lines could be made, one for the
> weekends and one for the weekdays, or seven lines, one for each day.
> That way it would be easy to track if the integrator is getting better
> at his job, or that he is perhaps having a bad/good period lately.

I think the bigger problem is not visualising it, but finding what is 
buggy, and what not.

> The question now though, is which of these features are feasible to do 
> in one GSoC project? That is, which one should be done first, as I want 
> to finish this feature even if I can't finish it all in three months. 
> Should this be something that is decided in the application already, or 
> should I list all the features and then later on decide (with the aid of 
> the community) which ones to implement first.

I think it can be vague about the order in which things will be 
implemented.  And the features which you think might be too complicated 
should be marked as such: "possible extension (which might not be finished 
within this project): <blabla>".

Ciao,
Dscho

* Re: [SoC RFC] git statistics - information about commits
  2008-03-23 14:28       ` Johannes Schindelin
@ 2008-03-23 15:41         ` alturin marlinon
  2008-03-23 16:32           ` Johannes Schindelin
  0 siblings, 1 reply; 12+ messages in thread
From: alturin marlinon @ 2008-03-23 15:41 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git

On Sun, Mar 23, 2008 at 3:28 PM, Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>  I think you will have to go to the line level to achieve what Junio
>  suggested.

I'm not sure what you mean by "go to the line level".
Do you mean that using a graph is not possible?

>  Timestamps are recorded as epoch (seconds since Jan 1, 1970) and timezone.
>  So yes, you have that, _provided_ you trust the users to set up that thing
>  correctly.

Yeah, I'll trust the user on this ;).
If the timezone is stored as well this should be easy to do, sweet.


>  > >  * Identify "buggy commits" from history, without testing.  Zeroth order
>  >
>  > A feature like this would fit well with the other "buggy
>  > commit/maintainer detection" but would require a lot of customization.
>  > However, considering git already comes with a good customization
>  > system it should still be feasible.
>
>  Yes.  And it would be really interesting for me.  Until it shows that I am
>  the biggest offender, of course.

Maybe we can put in an if-check for user "Johannes Schindelin"? ;)

>  I think the bigger problem is not visualising it, but finding what is
>  buggy, and what not.

Yes, of course, I think I'll be busy mostly following lines across
commits and after that determining if a commit is buggy or not.

>  I think it can be vague about the order in which things will be
>  implemented.  And the features which you think might be too complicated
>  should be marked as such: "possible extension (which might not be finished
>  within this project): <blabla>".

Cool, I think I can start on a RC for my application then! (Maybe I
should've tracked it with git, then I could tag it...)

Thanks for the feedback, I really want to come up with a superb application!

Sverre

* Re: [SoC RFC] git statistics - information about commits
  2008-03-23 15:41         ` alturin marlinon
@ 2008-03-23 16:32           ` Johannes Schindelin
  0 siblings, 0 replies; 12+ messages in thread
From: Johannes Schindelin @ 2008-03-23 16:32 UTC (permalink / raw)
  To: alturin marlinon; +Cc: Junio C Hamano, git

Hi,

On Sun, 23 Mar 2008, alturin marlinon wrote:

> On Sun, Mar 23, 2008 at 3:28 PM, Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >  I think you will have to go to the line level to achieve what Junio 
> >  suggested.
> 
> I'm not sure what you mean with "go to the line level"? Do you mean that 
> using a Graph is not possible?

IIUC you suggested having a graph of the files.  But I think you have to 
have a graph of file _parts_, i.e.

	git.c:111-137

which you can split even further should the need arise.
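
As an illustration of such splittable graph nodes, here is a small sketch.
The class name `FilePart` and its methods are hypothetical, and it ignores
the hard part, namely that line numbers shift from commit to commit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen, so parts are hashable and can be graph nodes
class FilePart:
    path: str
    first: int  # first line, inclusive
    last: int   # last line, inclusive

    def __str__(self):
        return f"{self.path}:{self.first}-{self.last}"

    def overlaps(self, other):
        """True if both parts are in the same file and share a line."""
        return (self.path == other.path
                and self.first <= other.last and other.first <= self.last)

    def split(self, line):
        """Split into two parts, the second one starting at `line`."""
        if not self.first < line <= self.last:
            raise ValueError("split point must fall inside the part")
        return (FilePart(self.path, self.first, line - 1),
                FilePart(self.path, line, self.last))

part = FilePart("git.c", 111, 137)
head, tail = part.split(120)
print(head, tail)
```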

> >  > >  * Identify "buggy commits" from history, without testing.  
> >  > >    Zeroth order
> >  >
> >  > A feature like this would fit well with the other "buggy 
> >  > commit/maintainer detection" but would require a lot of 
> >  > customization. However, considering git already comes with a good 
> >  > customization system it should still be feasible.
> >
> >  Yes.  And it would be really interesting for me.  Until it shows that 
> >  I am the biggest offender, of course.
> 
> Maybe we can put in an if-check for user "Johannes Schindelin"? ;)

I thought about something like this, actually ;-)

> >  I think the bigger problem is not visualising it, but finding what is 
> >  buggy, and what not.
> 
> Yes, of course, I think I'll be busy mostly following lines across 
> commits and after that determining if a commit is buggy or not.

But as Junio said, there are improvements, and even in the same commit 
series, you can touch the same _line_ multiple times, to make the patch 
more obvious.

See for example Linus' nice commit series regarding core.ignorecase.  
Very nicely done, very easy to understand, no buggy code.

> >  I think it can be vague about the order in which things will be 
> >  implemented.  And the features which you think might be too 
> >  complicated should be marked as such: "possible extension (which 
> >  might not be finished within this project): <blabla>".
> 
> Cool, I think I can start on a RC for my application then! (Maybe I 
> should've tracked it with git, then I could tag it...)

Hehe.  You'll come around putting even your photo collection into git, 
like I do.

Ciao,
Dscho

* Re: [SoC RFC] git statistics - information about commits
  2008-03-23 14:07     ` alturin marlinon
  2008-03-23 14:28       ` Johannes Schindelin
@ 2008-03-23 17:31       ` Junio C Hamano
  2008-03-23 21:32         ` alturin marlinon
  1 sibling, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2008-03-23 17:31 UTC (permalink / raw)
  To: alturin marlinon; +Cc: git

"alturin marlinon" <alturin@gmail.com> writes:

> On Sat, Mar 22, 2008 at 8:40 PM, Junio C Hamano <gitster@pobox.com> wrote:
> ...
> The question now though, is which of these features are feasible to do
> in one GSoC project? That is, which one should be done first, as I
> want to finish this feature even if I can't finish it all in three
> months.

Hey, don't get me wrong.  Please do not start your thought with "which of
these features".  Proposed feature set should come from you.  It's your
project after all.

I was NOT giving you an instruction "You should do all of these" (I am not
your mentor), an opinion "These are all important" (I haven't thought
things through), nor criteria "Unless you do your feature this way, you
fail" (I am not GSoC admin to judge your application nor evaluate at the
end of project).  Nothing of that sort.  They are just random ideas, I
haven't even thought through the feasibility of, and/or possible approach
to solution for, some of them.

If you find any of them interesting, you are welcome to include them in
your target feature set.  Other uninteresting ones and unrealistic ones
you can discard without even commenting.

* Re: [SoC RFC] git statistics - information about commits
  2008-03-23 17:31       ` Junio C Hamano
@ 2008-03-23 21:32         ` alturin marlinon
  0 siblings, 0 replies; 12+ messages in thread
From: alturin marlinon @ 2008-03-23 21:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Sun, Mar 23, 2008 at 6:31 PM, Junio C Hamano <gitster@pobox.com> wrote:
>  Hey, don't get me wrong.  Please do not start your thought with "which of
>  these features".  Proposed feature set should come from you.  It's your
>  project after all.

Ah, perhaps I should have phrased it more like "which of these
features are of most interest to the community".

>  I was NOT giving you an instruction "You should do all of these" (I am not
>  your mentor), an opinion "These are all important" (I haven't thought
>  things through), nor criteria "Unless you do your feature this way, you
>  fail" (I am not GSoC admin to judge your application nor evaluate at the
>  end of project).  Nothing of that sort.  They are just random ideas, I
>  haven't even thought through the feasibility of, and/or possible approach
>  to solution for, some of them.

Even so, most of them are very interesting, although I agree that the
feasibility should perhaps be looked at more closely.

>  If you find any of them interesting, you are welcome to include them in
>  your target feature set.  Other uninteresting ones and unrealistic ones
>  you can discard without even commenting.

I think I will divide the features into subsets and list the
dependencies between them.
Then based upon 'popularity' an easy selection could be made.
Johannes said:

> I think it can be vague about the order in which things will be
> implemented.  And the features which you think might be too complicated
> should be marked as such: "possible extension (which might not be finished
> within this project): <blabla>".

Would the application be allowed to include such a list of grouped
features, from which a selection can then be made later on?

* Re: [SoC RFC] git statistics - information about commits
  2008-03-21  8:52 [SoC RFC] git statistics - information about commits alturin marlinon
  2008-03-21  9:24 ` Junio C Hamano
@ 2008-03-21 14:49 ` Jakub Narebski
  1 sibling, 0 replies; 12+ messages in thread
From: Jakub Narebski @ 2008-03-21 14:49 UTC (permalink / raw)
  To: Alturin Marlinon; +Cc: git

"alturin marlinon" <alturin@gmail.com> writes:

> Consider Ohloh, an external tool that provides commit information
> about contributors to a project.

Ohloh currently doesn't make the statistics part of its code
available: only ohcount[1] is open source.

But there is another similar project that is fully open source:
GitStat[2]. It is also geared towards use from a web browser;
nevertheless it is worth examining to avoid "reinventing the wheel".

References:
[1] http://labs.ohloh.net/ohcount
[2] http://tree.celinuxforum.org/gitstat/
-- 
Jakub Narebski
Poland
ShadeHawk on #git
