Re: Reporting bugs and bisection

* Re: Reporting bugs and bisection
       [not found]         ` <48028830.6020703@earthlink.net>
@ 2008-04-13 23:51           ` david
  2008-04-14  0:36             ` Jakub Narebski
  2008-04-14  4:39             ` Willy Tarreau
  0 siblings, 2 replies; 66+ messages in thread
From: david @ 2008-04-13 23:51 UTC
  To: Stephen Clark
  Cc: Evgeniy Polyakov, Rafael J. Wysocki, Andrew Morton, Willy Tarreau,
	Tilman Schmidt, Valdis.Kletnieks, Mark Lord, David Miller,
	jesper.juhl, yoshfuji, jeff, linux-kernel, git, netdev

cross-posted to git for the suggestion at the bottom

On Sun, 13 Apr 2008, Stephen Clark wrote:

> Evgeniy Polyakov wrote:
>> On Sun, Apr 13, 2008 at 10:33:49PM +0200, Rafael J. Wysocki (rjw@sisk.pl) 
>> wrote:
>>> Things like this are very disappointing and have a very negative impact on
>>> bug reporters.  We should do our best to avoid them.
>> 
>> Shit happens. This is a matter of either bug report or those who were in
>> the copy list. There are different people and different situations, in
>> which they do not reply.
>> 
> Well, less shit would happen if developers would take the time to at least
> test their patches before they were submitted. It's like we will just have the
> poor user do our testing for us. What kind of testing do developers do? I
> have been a Linux user and have followed the LKML for a number of years and
> have yet to see any test plans for any submitted patches.

I've been reading LKML for 11 years now, I've tested kernels and reported 
a few bugs along the way.

the expectation is that the submitter should have tested the patches 
before submitting them (where hardware allows). but that "where hardware 
allows" is a big problem. so many issues are dependent on hardware that
it's not possible to test everything.

there are people who download, compile and test the tree nightly (with 
farms of machines to test different configs), but they can't catch 
everything.

expecting the patches to be tested to the point where there are no bugs is 
unreasonable.

bisecting is a very powerful tool, but I do think that sometimes 
developers lean on it a bit much. taking the attitude (as some have) that 
'if the reporter can't be bothered to do a bisection I can't be bothered 
to deal with the bug' is going way too far.

if a bug can be reproduced reliably on a test system then bisecting it may 
reveal the patch that introduced or unmasked the bug (assuming that there 
aren't other problems along the way), but if the bug takes a long time to 
show up after a boot, or only happens under production loads, bisecting it 
may not be possible. that doesn't mean that the bug isn't real, it just 
means that the user is going to have to stick with an old version until 
there is a solution or work-around.
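The mechanical part of bisection is a binary search over history. The Python sketch below is a simplified model, not git's implementation: it assumes a linear list of commits where the first is known good, the last is known bad, and everything after the regression stays bad. `is_bad` is a hypothetical stand-in for the reporter building, booting and testing one kernel; real `git bisect` walks a commit graph and also lets the tester skip commits that cannot be tested.

```python
from typing import Callable, Sequence, Tuple

def first_bad(commits: Sequence[str],
              is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return (first bad commit, number of test cycles used).

    Assumes commits[0] is good, commits[-1] is bad, and a single
    regression point: once a commit is bad, all later ones are too.
    """
    good, bad = 0, len(commits) - 1
    cycles = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        cycles += 1                # one build/boot/test by the reporter
        if is_bad(commits[mid]):
            bad = mid              # regression is at mid or earlier
        else:
            good = mid             # regression is after mid
    return commits[bad], cycles

# Made-up example: 1001 commits, regression introduced at c637.
commits = [f"c{i}" for i in range(1001)]
print(first_bad(commits, lambda c: int(c[1:]) >= 637))  # ('c637', 10)
```

Every one of those cycles is a full build-and-test round on the reporter's side, and the model's precondition, a reliable good/bad verdict per commit, is exactly what a bug that only shows up under production load does not provide.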

even in the hard-to-test situations, the reporter is usually able to test 
a few fixes, but there's a big difference between going to management and 
saying "the kernel gurus think that this will help, can we test it this
weekend" 2-3 times and doing a bisection that will take 10-15 cycles to 
find the problem.
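The "10-15 cycles" estimate follows from bisection halving the candidate range on every test: isolating one commit among n candidates takes ceil(log2 n) rounds in the worst case, so 10 to 15 rounds corresponds to somewhere between roughly 500 and 33,000 candidate commits. A short Python check of that arithmetic (integer `bit_length` is used to avoid floating-point rounding):

```python
def bisect_steps(n_commits: int) -> int:
    """Worst-case build/test rounds to isolate 1 bad commit among n_commits.

    ceil(log2(n)) equals (n - 1).bit_length() for n >= 1.
    """
    return (n_commits - 1).bit_length() if n_commits > 1 else 0

for n in (1000, 10000, 30000):
    print(n, bisect_steps(n))
# 1000 10
# 10000 14
# 30000 15
```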

it's very reasonable to ask the reporter if they can bisect the problem, 
but if they say that they can't, declaring that they are out of luck is 
not reasonable, it just means that it's going to take more thinking to 
find the problem instead of being able to let the mechanical bisect 
process narrow things down for you. it may mean that the developer will 
need to make a patch to instrument an old (working) kernel, with
minimal impact on that kernel, so that the reporter can run it to gather
information about what the load is and the developer can try to
simulate it on a new (non-working) kernel.

in theory everyone has a test environment that lets them simulate 
everything in their production environment. in practice this is only true
at the very low end (where it's easy to do) and the very high end (where 
it's so critical that it's done no matter how much it costs). Everyone 
else has a test environment that can test most things, but not everything. 
As such when they run into a problem they may not be able to do lots of 
essentially random testing.

elsewhere in this thread someone said that the pre-git way was to do a 
manual bisect where the developer would send patches backing out specific 
changes to find the problem. one big difference between that and bisecting
the problem is that the manual process was focused on the changes in the 
area that is suspected of causing the problem, while the git bisect 
process goes after all changes. this makes it much more likely that the 
tester will run into unrelated problems along the way.

I wonder if it would be possible to make a variation of git bisect that 
only looked at a subset of the tree when picking bisect points (if you are 
looking for a e1000 bug, testing bisect points that haven't changed that 
driver won't help you for example). If this can be done it would speed up 
the reporter's efforts, but will require more assistance from the
developers (who would need to tell the reporters what subtrees to test) so 
it's a tradeoff of efficiency vs simplicity.
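A rough Python model of the proposed restriction, assuming each commit is known together with the files it modifies. Limiting the candidates to commits that touch the suspect subtree shrinks the search; the catch, as the paragraph says, is that someone has to pick the right subtree, and a regression introduced elsewhere cannot be found this way. The history and file names are made up for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence

@dataclass(frozen=True)
class Commit:
    sha: str
    paths: FrozenSet[str]  # files this commit modifies

def touches(commit: Commit, prefixes: Sequence[str]) -> bool:
    """True if the commit modifies any file under one of the prefixes."""
    return any(path.startswith(pre) for path in commit.paths for pre in prefixes)

def candidates(history: List[Commit], prefixes: Sequence[str]) -> List[Commit]:
    """Keep only the commits worth testing for a bug in these subtrees."""
    return [c for c in history if touches(c, prefixes)]

def worst_case_steps(n: int) -> int:
    """Build/test rounds a binary search needs to isolate 1 of n commits."""
    return (n - 1).bit_length() if n > 1 else 0

# Made-up history: 4096 commits, one in every 64 touches the driver.
history = [
    Commit(f"c{i}", frozenset({"drivers/net/e1000/hypothetical.c"}
                              if i % 64 == 0 else {"fs/hypothetical.c"}))
    for i in range(4096)
]
subset = candidates(history, ["drivers/net/e1000/"])
print(len(history), worst_case_steps(len(history)))  # 4096 12
print(len(subset), worst_case_steps(len(subset)))    # 64 6
```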

David Lang

* Re: Reporting bugs and bisection
  2008-04-13 23:51           ` Reporting bugs and bisection david
@ 2008-04-14  0:36             ` Jakub Narebski
  2008-04-14  4:39             ` Willy Tarreau
  1 sibling, 0 replies; 66+ messages in thread
From: Jakub Narebski @ 2008-04-14  0:36 UTC
  To: david
  Cc: Stephen Clark, Evgeniy Polyakov, Rafael J. Wysocki, Andrew Morton,
	Willy Tarreau, Tilman Schmidt, Valdis.Kletnieks, Mark Lord,
	David Miller, jesper.juhl, yoshfuji, jeff, linux-kernel, git,
	netdev

david@lang.hm writes:

> cross-posted to git for the suggestion at the bottom

[...]

> Elsewhere in this thread someone said that the pre-git way was to do a
> manual bisect where the developer would send patches backing out
> specific changes to find the problem. one big difference between that
> and bisecting the problem is that the manual process was focused on
> the changes in the area that is suspected of causing the problem,
> while the git bisect process goes after all changes. this makes it
> much more likely that the tester will run into unrelated problems
> along the way.
> 
> I wonder if it would be possible to make a variation of git bisect
> that only looked at a subset of the tree when picking bisect points
> (if you are looking for a e1000 bug, testing bisect points that
> haven't changed that driver won't help you for example). If this can
> be done it would speed up the reporter's efforts, but will require more
> assistance from the developers (who would need to tell the reporters
> what subtrees to test) so it's a tradeoff of efficiency vs simplicity.

Errr... the synopsis of git-bisect contains the following:

 git bisect start [<bad> [<good>...]] [--] [<paths>...]

so you can limit bisection to commits affecting specified subsystem.

P.S. Unfortunately git currently doesn't deal with directory renames,
so if there was some big code restructuring one has to provide all
historic pathspecs.
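In practice this looks like the following Python sketch, which only assembles and runs the command line. The helper names are hypothetical; the argument order follows the synopsis quoted above, with `--` separating revisions from paths and every historic location of the code passed as its own pathspec. The directory names in the example are made up.

```python
import subprocess
from typing import List, Sequence

def bisect_start_argv(bad: str, goods: Sequence[str],
                      pathspecs: Sequence[str] = ()) -> List[str]:
    """Build 'git bisect start <bad> <good>... -- <paths>...'."""
    argv = ["git", "bisect", "start", bad, *goods]
    if pathspecs:
        # '--' keeps git from mistaking a path for a revision name.
        argv += ["--", *pathspecs]
    return argv

def start_bisect(bad: str, goods: Sequence[str],
                 pathspecs: Sequence[str] = (), repo: str = ".") -> None:
    """Run the command inside an existing git checkout at `repo`."""
    subprocess.run(bisect_start_argv(bad, goods, pathspecs), cwd=repo, check=True)

# Current and pre-restructuring location of the same (hypothetical) driver.
print(bisect_start_argv("HEAD", ["v2.6.24"],
                        ["drivers/net/foo/", "drivers/net/old-foo/"]))
```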

-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: Reporting bugs and bisection
  2008-04-13 23:51           ` Reporting bugs and bisection david
  2008-04-14  0:36             ` Jakub Narebski
@ 2008-04-14  4:39             ` Willy Tarreau
  2008-04-14  5:39               ` Al Viro
  1 sibling, 1 reply; 66+ messages in thread
From: Willy Tarreau @ 2008-04-14  4:39 UTC
  To: david
  Cc: Stephen Clark, Evgeniy Polyakov, Rafael J. Wysocki, Andrew Morton,
	Tilman Schmidt, Valdis.Kletnieks, Mark Lord, David Miller,
	jesper.juhl, yoshfuji, jeff, linux-kernel, git, netdev

On Sun, Apr 13, 2008 at 04:51:34PM -0700, david@lang.hm wrote:
> cross-posted to git for the suggestion at the bottom
> 
> On Sun, 13 Apr 2008, Stephen Clark wrote:
> 
> >Evgeniy Polyakov wrote:
> >>On Sun, Apr 13, 2008 at 10:33:49PM +0200, Rafael J. Wysocki (rjw@sisk.pl) 
> >>wrote:
> >>>Things like this are very disappointing and have a very negative impact
> >>>on bug reporters.  We should do our best to avoid them.
> >>
> >>Shit happens. This is a matter of either bug report or those who were in
> >>the copy list. There are different people and different situations, in
> >>which they do not reply.
> >>
> >Well, less shit would happen if developers would take the time to at least
> >test their patches before they were submitted. It's like we will just have
> >the poor user do our testing for us. What kind of testing do developers
> >do? I have been a Linux user and have followed the LKML for a number of
> >years and have yet to see any test plans for any submitted patches.
> 
> I've been reading LKML for 11 years now, I've tested kernels and reported 
> a few bugs along the way.
> 
> the expectation is that the submitter should have tested the patches 
> before submitting them (where hardware allows). but that "where hardware 
> allows" is a big problem. so many issues are dependent on hardware that
> it's not possible to test everything.
> 
> there are people who download, compile and test the tree nightly (with 
> farms of machines to test different configs), but they can't catch 
> everything.
> 
> expecting the patches to be tested to the point where there are no bugs is 
> unreasonable.
[...]

Agreed. The difficulty is that only the developer knows how confident
he is in his code. Even the subsystem maintainer does not know, which
is the real issue since as long as the code is not identified, he does
not know whom to ping.

And I think that it might help if we could add a "Trust" rating to the
patches we submit, similarly to "Tested-By" or "Signed-off-by". We could
use 1 to 5. Basically, when the patch was completed at 3am and just builds,
it's more likely 1/5. When it has been stressed for 1 week, it would be
4/5. 5/5 would only be used in backports of known working code, for some
widely used external patches, or for trivial patches (eg: doc/whitespace
fixes). The goal would clearly not be to just trust patches with a high
rating (since they might break when associated with others), but for the
subsystem maintainer to quickly check if there are some of them the
author does not 100% trust, in which case he could ping the author to
check if his patch *may* cause the reported problem.

What makes this rating system delicate is that the rating cannot be changed
afterwards. But after all, that's not much of a problem. A bug may very
well reveal itself one year after the code was merged, so it's really the
developer's estimation which matters.

For this to be efficiently used, we would need git-commit to accept a
new "-T <rating>" argument with the following possible values:

   0: untested (default)
   1: builds
   2: seems to be working
   3: passed basic non-regression tests
   4: survived stress testing at the developer's
   5: known to be working for a long time somewhere else
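Nothing like `-T` exists in git-commit; the option is the proposal above. A minimal Python sketch of how a maintainer could use such ratings, assuming they were recorded as a hypothetical `Trust: N` line in the commit message, alongside Signed-off-by. The enum names and the default threshold of 3 are illustrative choices, not part of the proposal.

```python
import re
from enum import IntEnum
from typing import Dict, List

class Trust(IntEnum):
    UNTESTED = 0
    BUILDS = 1
    SEEMS_TO_WORK = 2
    PASSED_BASIC_TESTS = 3
    SURVIVED_STRESS = 4
    KNOWN_WORKING_ELSEWHERE = 5

# Hypothetical trailer line; git defines no such trailer.
TRAILER = re.compile(r"^Trust:\s*([0-5])\s*$", re.MULTILINE)

def trust_of(message: str) -> Trust:
    """Read the rating from a commit message; missing means 0 (untested)."""
    m = TRAILER.search(message)
    return Trust(int(m.group(1))) if m else Trust.UNTESTED

def suspects(messages: Dict[str, str],
             below: Trust = Trust.PASSED_BASIC_TESTS) -> List[str]:
    """Commit ids whose authors rated them below the threshold."""
    return sorted(sha for sha, msg in messages.items() if trust_of(msg) < below)

log = {
    "a1": "net: rework foo\n\nTrust: 1\nSigned-off-by: A Developer <dev@example.org>\n",
    "b2": "Documentation: fix typo\n\nTrust: 5\n",
    "c3": "e1000: change irq handling\n",
}
print(suspects(log))  # ['a1', 'c3']
```

A commit with no rating falls back to 0, matching the "untested (default)" entry in the list.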

I'm sure many people would find this useless (or in fact reject the
idea because it would show that most code will be rated 1 or 2),
but I really think it can help subsystem maintainers make the relation
between a reported bug and a possible submitter.

Willy

* Re: Reporting bugs and bisection
  2008-04-14  4:39             ` Willy Tarreau
@ 2008-04-14  5:39               ` Al Viro
  2008-04-14  6:24                 ` Andrew Morton
  0 siblings, 1 reply; 66+ messages in thread
From: Al Viro @ 2008-04-14  5:39 UTC
  To: Willy Tarreau
  Cc: david, Stephen Clark, Evgeniy Polyakov, Rafael J. Wysocki,
	Andrew Morton, Tilman Schmidt, Valdis.Kletnieks, Mark Lord,
	David Miller, jesper.juhl, yoshfuji, jeff, linux-kernel, git,
	netdev

On Mon, Apr 14, 2008 at 06:39:39AM +0200, Willy Tarreau wrote:

[snip]

> I'm sure many people would find this useless (or in fact reject the
> idea because it would show that most code will be rated 1 or 2),
> but I really think it can help subsystem maintainers make the relation
> between a reported bug and a possible submitter.

I have a related proposal: let us require all patches to be stamped
with Discordian *and* Eternal September dates.  In triplicate.  While
we are at it, why don't we introduce new mandatory headers like, say
it,

X-checkpatch: {Yes,No}
X-checkpatch-why-not: <string>
X-pointless: <number from 1 to 69, going from "1: does something useful" all
the way to "68: aligns right ends of lines in comments">
X-arbitrary-rules-added-to-CodingStyle: <number> (should be present if
and only if X-pointless: 69 is present).

Come to think of that, we clearly need a new file in Documentation/*,
documenting such headers.  Why don't we organize a subcommittee^Wnew maillist
devoted to that?  That would provide another entry route for contributors,
lowering the overall entry barriers even further...

Seriously, looks like Andi is right - we've got ourselves a developing
bureaucracy.  As in "more and more ways of generating activity without
doing anything even remotely useful".  Complete with tendency to operate in
the ways that make sense only to bureaucracy in question and an ever-growing
set of bylaws...

* Re: Reporting bugs and bisection
  2008-04-14  5:39               ` Al Viro
@ 2008-04-14  6:24                 ` Andrew Morton
  2008-04-14  6:39                   ` David Miller
                                     ` (2 more replies)
  0 siblings, 3 replies; 66+ messages in thread
From: Andrew Morton @ 2008-04-14  6:24 UTC
  To: Al Viro
  Cc: Willy Tarreau, david, Stephen Clark, Evgeniy Polyakov,
	Rafael J. Wysocki, Tilman Schmidt, Valdis.Kletnieks, Mark Lord,
	David Miller, jesper.juhl, yoshfuji, jeff, linux-kernel, git,
	netdev

On Mon, 14 Apr 2008 06:39:43 +0100 Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Mon, Apr 14, 2008 at 06:39:39AM +0200, Willy Tarreau wrote:
> 
> [snip]
> 
> > I'm sure many people would find this useless (or in fact reject the
> > idea because it would show that most code will be rated 1 or 2),
> > but I really think it can help subsystem maintainers make the relation
> > between a reported bug and a possible submitter.
> 
> I have a related proposal: let us require all patches to be stamped
> with Discordian *and* Eternal September dates.  In triplicate.  While
> we are at it, why don't we introduce new mandatory headers like, say
> it,
> 
> X-checkpatch: {Yes,No}
> X-checkpatch-why-not: <string>
> X-pointless: <number from 1 to 69, going from "1: does something useful" all
> the way to "68: aligns right ends of lines in comments">
> X-arbitrary-rules-added-to-CodingStyle: <number> (should be present if
> and only if X-pointless: 69 is present).
> 
> Come to think of that, we clearly need a new file in Documentation/*,
> documenting such headers.  Why don't we organize a subcommittee^Wnew maillist
> devoted to that?  That would provide another entry route for contributors,
> lowering the overall entry barriers even further...
> 

None of the above was particularly useful.

> 
> Seriously, looks like Andi is right - we've got ourselves a developing
> bureaucracy.  As in "more and more ways of generating activity without
> doing anything even remotely useful".  Complete with tendency to operate in
> the ways that make sense only to bureaucracy in question and an ever-growing
> set of bylaws...

No.  The problem we're discussing here is the apparently-large number of
bugs which are in the kernel, the apparently-large number of new bugs which
we're adding to the kernel, and our apparent tardiness in addressing them.

Do you agree with these impressions, or not?

If you do agree, what would you propose we do about it?

* Re: Reporting bugs and bisection
  2008-04-14  6:24                 ` Andrew Morton
@ 2008-04-14  6:39                   ` David Miller
  2008-04-14  6:43                     ` David Miller
  2008-04-14  7:23                   ` Al Viro
  2008-04-14 19:13                   ` Rene Herman
  2 siblings, 1 reply; 66+ messages in thread
From: David Miller @ 2008-04-14  6:39 UTC
  To: akpm
  Cc: viro, w, david, sclark46, johnpol, rjw, tilman, Valdis.Kletnieks,
	lkml, jesper.juhl, yoshfuji, jeff, linux-kernel, git, netdev

From: Andrew Morton <akpm@linux-foundation.org>
Date: Sun, 13 Apr 2008 23:24:41 -0700

> Do you agree with these impressions, or not?

I think things are improving.

I wrote or merged in ~10 bugs in the last hour, for example.

And I also agree with Al's point, which was embedded in his humorous
and obviously sarcastic suggestions, in that adding bureaucracy isn't
the answer.  We already have too much and it scares developers away.

Sure you don't want crap getting into the tree (for too long), but it
is important to be careful to define crap properly.  For example,
inundating patch submitters with more requirements, especially ones
involving automatons like checkpatch, is in the end bad.

We can improve the quality of stuff going in and be flexible at the
same time.

* Re: Reporting bugs and bisection
  2008-04-14  6:39                   ` David Miller
@ 2008-04-14  6:43                     ` David Miller
  0 siblings, 0 replies; 66+ messages in thread
From: David Miller @ 2008-04-14  6:43 UTC
  To: akpm
  Cc: viro, w, david, sclark46, johnpol, rjw, tilman, Valdis.Kletnieks,
	lkml, jesper.juhl, yoshfuji, jeff, linux-kernel, git, netdev

From: David Miller <davem@davemloft.net>
Date: Sun, 13 Apr 2008 23:39:59 -0700 (PDT)

> I wrote or merged in ~10 bugs in the last hour, for example.

Bug fixes!  I meant "fixes" I swear!

That's quite a Freudian slip if I ever saw one.

* Re: Reporting bugs and bisection
  2008-04-14  6:24                 ` Andrew Morton
  2008-04-14  6:39                   ` David Miller
@ 2008-04-14  7:23                   ` Al Viro
  2008-04-14  7:43                     ` Al Viro
                                       ` (2 more replies)
  2008-04-14 19:13                   ` Rene Herman
  2 siblings, 3 replies; 66+ messages in thread
From: Al Viro @ 2008-04-14  7:23 UTC
  To: Andrew Morton
  Cc: Willy Tarreau, david, Stephen Clark, Evgeniy Polyakov,
	Rafael J. Wysocki, Tilman Schmidt, Valdis.Kletnieks, Mark Lord,
	David Miller, jesper.juhl, yoshfuji, jeff, linux-kernel, git,
	netdev

On Sun, Apr 13, 2008 at 11:24:41PM -0700, Andrew Morton wrote:

> No.  The problem we're discussing here is the apparently-large number of
> bugs which are in the kernel, the apparently-large number of new bugs which
> we're adding to the kernel, and our apparent tardiness in addressing them.
> 
> Do you agree with these impressions, or not?
> 
> If you do agree, what would you propose we do about it?

In addition to obvious "we need testing and something better than bugzilla
to keep track of bugs"?  Real review of code in tree and patches getting into
the tree.

And the latter part _must_ be done on each entry point.  Any git tree
that acts as injection point really needs a working mechanism of some
sort that would do that; afterwards it's too late, since review of
the stuff getting into mainline on a massive merge is sadly impractical.

I don't know any formal mechanism that could take care of that; no more
than making sure that no backdoors are injected into the tree.  It really
has to be a matter of trust for tree maintainers and community around
the subsystem.

Git is damn good at killing the merge bottleneck.  Too good, since it
hides the review bottleneck.  And we get equivalents of self-selected
communities that had been problem for "here's our CVS, here's monthly
dump from it, apply" kind of setups.  It _is_ better, since one can
get to commit history (modulo interesting issues with merge nodes and
conflict resolution).  But in practice it's not good enough - the patches
going in during a merge (especially for a tree that collects from
secondaries) are not visible enough.  And it's too late at that point,
since one has to do something monumentally ugly to get Linus to revert
a large merge.  On the scale of Great IDE Mess in 2.5...

linux-next might help with the last part, but I don't think it really
deals with the first one.  It certainly helps to some extent, but...

We need higher S/N on l-k.  We need people looking into the subsystem
trees as those grow and causing a stench when bad things are found,
with design issues getting brought to l-k if nothing else helps.  We
need tree maintainers understanding that review, including out-of-community
one, is needed (the need of testing is generally better understood - I
_hope_).

We need more people reading the fscking source.  Subsystem by subsystem.
Without assumption that code is not broken.  With mechanism collating
the questions asked and answers given.  Ideally we need growing documentation
of core subsystems and data structures, with explicit goal of helping
reviewers new to an area to find their way around it.  And yes, I'm
guilty of procrastinating on that - several half-finished pieces on
VFS-related stuff are sitting locally ;-/

We need gregkh to get real and stop assuming that two Signed-off-by are
equivalent to "reviewed at least twice", while we are at it ;-)

We need people to realize that warnings are useful as triage tools -
not as "Ug see warning.  Warning bad.  Ug fix that line.  Warning go away.
Ug changeset count grow.  Ug happy.", but as machine-assisted part of
finding confused areas of code.  With human combining signals from
different warnings to get statistically useful triage strategies (note
that the aforementioned practice of making gcc/sparse/whatnot STFU with local changes
has a lovely potential of distorting those signals and actually _hiding_
crap code).
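One way to picture "combining signals from different warnings" is to rank files by how many independent tools flag them before looking at raw counts. The Python sketch below assumes warnings have already been parsed into (path, tool) pairs; the parsing and the ranking rule are illustrative assumptions, not an established method.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

def triage(warnings: Iterable[Tuple[str, str]]) -> List[Tuple[str, int, int]]:
    """Rank files, most suspicious first.

    warnings: (path, tool) pairs, e.g. parsed from gcc and sparse output.
    Returns (path, distinct_tools, total_warnings) tuples sorted by the
    number of different tools complaining, then by total volume, then path.
    """
    tools: Dict[str, Set[str]] = defaultdict(set)
    counts: Dict[str, int] = defaultdict(int)
    for path, tool in warnings:
        tools[path].add(tool)
        counts[path] += 1
    return sorted(((p, len(tools[p]), counts[p]) for p in counts),
                  key=lambda row: (-row[1], -row[2], row[0]))

sample = [("a.c", "gcc"), ("a.c", "sparse"),
          ("b.c", "gcc"), ("b.c", "gcc"), ("b.c", "gcc"),
          ("c.c", "sparse")]
print(triage(sample))  # [('a.c', 2, 2), ('b.c', 1, 3), ('c.c', 1, 1)]
```

Silencing one tool with a local change drops a file out of the "distinct tools" column without making the code any better, which is the distortion described above.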

Maybe we need a list a-la linux-arch for tree maintainers to coordinate
stuff - obviously open not only for those.

We really need to get around to doing triage of remaining stuff in -mm,
BTW - again, guilty for not getting through such on VFS-related stuff
in there.  Hopefully linux-next trees will eventually vacuum most of the
pile in...

As for the bug that got this thread started...  I'd say that asking to
bisect was reasonable in this particular case.  The following DSW mixed
into the thread very soon went the way of all DSW (OK, it hadn't godwinated
yet, at least in the parts I've seen, so there's still way to go, but...)

* Re: Reporting bugs and bisection
  2008-04-14  7:23                   ` Al Viro
@ 2008-04-14  7:43                     ` Al Viro
  2008-04-14  8:04                     ` Andrew Morton
  2008-04-14 15:54                     ` James Morris
  2 siblings, 0 replies; 66+ messages in thread
From: Al Viro @ 2008-04-14  7:43 UTC
  To: Andrew Morton
  Cc: Willy Tarreau, david, Stephen Clark, Evgeniy Polyakov,
	Rafael J. Wysocki, Tilman Schmidt, Valdis.Kletnieks, Mark Lord,
	David Miller, jesper.juhl, yoshfuji, jeff, linux-kernel, git,
	netdev

On Mon, Apr 14, 2008 at 08:23:28AM +0100, Al Viro wrote:

> And the latter part _must_ be done on each entry point.  Any git tree
> that acts as injection point really needs a working mechanism of some
> sort that would do that; afterwards it's too late, since review of
> the stuff getting into mainline on a massive merge is sadly impractical.

PS: net/* is actually pretty sane in that respect - the huge volume
being what it is, of course, but still, my impression is that it's
pretty far from the worst sources of crap.  OTOH, I might be missing
secondary tree problems - e.g. net/sctp is much worse off in that
respect, AFAICT; there might very well be more of such areas.

* Re: Reporting bugs and bisection
  2008-04-14  7:23                   ` Al Viro
  2008-04-14  7:43                     ` Al Viro
@ 2008-04-14  8:04                     ` Andrew Morton
  2008-04-14  8:30                       ` David Miller
                                         ` (2 more replies)
  2008-04-14 15:54                     ` James Morris
  2 siblings, 3 replies; 66+ messages in thread
From: Andrew Morton @ 2008-04-14  8:04 UTC
  To: Al Viro
  Cc: Willy Tarreau, david, Stephen Clark, Evgeniy Polyakov,
	Rafael J. Wysocki, Tilman Schmidt, Valdis.Kletnieks, Mark Lord,
	David Miller, jesper.juhl, yoshfuji, jeff, linux-kernel, git,
	netdev

On Mon, 14 Apr 2008 08:23:28 +0100 Al Viro <viro@ZenIV.linux.org.uk> wrote:

> On Sun, Apr 13, 2008 at 11:24:41PM -0700, Andrew Morton wrote:
> 
> > No.  The problem we're discussing here is the apparently-large number of
> > bugs which are in the kernel, the apparently-large number of new bugs which
> > we're adding to the kernel, and our apparent tardiness in addressing them.
> > 
> > Do you agree with these impressions, or not?
> > 
> > If you do agree, what would you propose we do about it?
> 
> In addition to obvious "we need testing and something better than bugzilla
> to keep track of bugs"?

Swapping out bugzilla for something else wouldn't help.  We'd end up with
lots of people ignoring a good bug tracking system just like they were
ignoring a bad one.

(And I don't think developers and maintainers _should_ spend time mucking
in bug-tracking systems.  They should have helpers who do all the
triaging/tracking/routing/closing work for them, and then provide other
developers with the results, letting them know what they should be spending
time on.  But there's a manpower problem).

>  Real review of code in tree and patches getting into
> the tree.
> 
> And the latter part _must_ be done on each entry point.  Any git tree
> that acts as injection point really needs a working mechanism of some
> sort that would do that; afterwards it's too late, since review of
> the stuff getting into mainline on a massive merge is sadly impractical.
> 
> I don't know any formal mechanism that could take care of that; no more
> than making sure that no backdoors are injected into the tree.  It really
> has to be a matter of trust for tree maintainers and community around
> the subsystem.
> 
> Git is damn good at killing the merge bottleneck.  Too good, since it
> hides the review bottleneck.  And we get equivalents of self-selected
> communities that had been problem for "here's our CVS, here's monthly
> dump from it, apply" kind of setups.  It _is_ better, since one can
> get to commit history (modulo interesting issues with merge nodes and
> conflict resolution).  But in practice it's not good enough - the patches
> going in during a merge (especially for a tree that collects from
> secondaries) are not visible enough.  And it's too late at that point,
> since one has to do something monumentally ugly to get Linus to revert
> a large merge.  On the scale of Great IDE Mess in 2.5...
> 
> linux-next might help with the last part, but I don't think it really
> deals with the first one.  It certainly helps to some extent, but...
> 
> We need higher S/N on l-k.  We need people looking into the subsystem
> trees as those grow and causing a stench when bad things are found,
> with design issues getting brought to l-k if nothing else helps.  We
> need tree maintainers understanding that review, including out-of-community
> one, is needed (the need of testing is generally better understood - I
> _hope_).
> 
> We need more people reading the fscking source.  Subsystem by subsystem.
> Without assumption that code is not broken.  With mechanism collating
> the questions asked and answers given.  Ideally we need growing documentation
> of core subsystems and data structures, with explicit goal of helping
> reviewers new to an area to find their way around it.  And yes, I'm
> guilty of procrastinating on that - several half-finished pieces on
> VFS-related stuff are sitting locally ;-/
> 
> We need gregkh to get real and stop assuming that two Signed-off-by are
> equivalent to "reviewed at least twice", while we are at it ;-)
> 
> We need people to realize that warnings are useful as triage tools -
> not as "Ug see warning.  Warning bad.  Ug fix that line.  Warning go away.
> Ug changeset count grow.  Ug happy.", but as machine-assisted part of
> finding confused areas of code.  With human combining signals from
> different warnings to get statistically useful triage strategies (note
> that the aforementioned practice of making gcc/sparse/whatnot STFU with local changes
> has a lovely potential of distorting those signals and actually _hiding_
> crap code).
> 
> Maybe we need a list a-la linux-arch for tree maintainers to coordinate
> stuff - obviously open not only for those.
> 
> We really need to get around to doing triage of remaining stuff in -mm,
> BTW - again, guilty for not getting through such on VFS-related stuff
> in there.  Hopefully linux-next trees will eventually vacuum most of the
> pile in...

That all sounds good and I expect few would disagree.  But if it is to
happen, it clearly won't happen by itself, automatically.  We will need to
force it upon ourselves and the means by which we will do that is process
changes.  The thing which is being disparaged as "bureaucracy".

The steps to be taken are:

a) agree that we have a problem

b) agree that we need to address it

c) identify the day-to-day work practices which will help address it (as
   you have done)

d) identify the process changes which will force us to adopt those practices

e) implement those process changes.

I have thus far failed to get us past step a).

* Re: Reporting bugs and bisection
  2008-04-14  8:04                     ` Andrew Morton
@ 2008-04-14  8:30                       ` David Miller
  2008-04-14  9:06                         ` Christoph Hellwig
                                           ` (2 more replies)
  2008-04-14 12:08                       ` Adrian Bunk
  2008-04-14 14:43                       ` Arjan van de Ven
  2 siblings, 3 replies; 66+ messages in thread
From: David Miller @ 2008-04-14  8:30 UTC
  To: akpm
  Cc: viro, w, david, sclark46, johnpol, rjw, tilman, Valdis.Kletnieks,
	lkml, jesper.juhl, yoshfuji, jeff, linux-kernel, git, netdev

From: Andrew Morton <akpm@linux-foundation.org>
Date: Mon, 14 Apr 2008 01:04:12 -0700

> That all sounds good and I expect few would disagree.  But if it is to
> happen, it clearly won't happen by itself, automatically.  We will need to
> force it upon ourselves and the means by which we will do that is process
> changes.  The thing which is being disparaged as "bureaucracy".
> 
> The steps to be taken are:
> 
> a) agree that we have a problem
 ...
> I have thus far failed to get us past step a).

A lot of people, myself included, subconsciously don't want to
get past step a) because the resulting "bureaucracy" or whatever
you want to call it is perceived to undercut the very thing
that makes the Linux kernel fun to work on.

It's still largely free form, loose, and flexible.  And that's
a notable accomplishment considering how much things have changed.
That feeling is why I got involved in the first place, and I know
it's what gets other new people in and addicted too.

Nobody is "forced" to do anything, and I notice you used the
word "force" in d) :-)

And I realize this relaxed attitude goes hand in hand with reduced
quality and occasionally more bugs.  In many ways, I'm happy with
that tradeoff at least wrt. how that works out for the subsystems
I'm responsible for.

We can ask more subsystem tree maintainers to run their trees more
strictly, review patches more closely, etc.  But, be honest, good luck
getting that from the guys who do subsystem maintenance in their
spare time on the weekends.  The remaining cases should know better,
or simply don't care.

* Re: Reporting bugs and bisection
  2008-04-14  8:30                       ` David Miller
@ 2008-04-14  9:06                         ` Christoph Hellwig
  2008-04-14  9:46                         ` Andi Kleen
  2008-04-14 10:15                         ` Andrew Morton
  2 siblings, 0 replies; 66+ messages in thread
From: Christoph Hellwig @ 2008-04-14  9:06 UTC (permalink / raw)
  To: David Miller
  Cc: akpm, viro, w, david, sclark46, johnpol, rjw, tilman,
	Valdis.Kletnieks, lkml, jesper.juhl, yoshfuji, jeff, linux-kernel,
	git, netdev

On Mon, Apr 14, 2008 at 01:30:58AM -0700, David Miller wrote:
> We can ask more subsystem tree maintainers to run their trees more
> strictly, review patches more closely, etc.  But, be honest, good luck
> getting that from the guys who do subsystem maintenance in their
> spare time on the weekends.  The remaining cases should know better,
> or simply don't care.

Actually my impression is that spare-time maintainers produce much better
code and subsystem trees than corporate drones.  But of course there's
a lot of shades between those two extremes.

* Re: Reporting bugs and bisection
  2008-04-14  8:30                       ` David Miller
  2008-04-14  9:06                         ` Christoph Hellwig
@ 2008-04-14  9:46                         ` Andi Kleen
  2008-04-15  5:25                           ` Bill Fink
  2008-04-14 10:15                         ` Andrew Morton
  2 siblings, 1 reply; 66+ messages in thread
From: Andi Kleen @ 2008-04-14  9:46 UTC (permalink / raw)
  To: David Miller
  Cc: akpm, viro, w, david, sclark46, johnpol, rjw, tilman,
	Valdis.Kletnieks, lkml, jesper.juhl, yoshfuji, jeff, linux-kernel,
	git, netdev

David Miller <davem@davemloft.net> writes:
>
> It's still largely free form, loose, and flexible. 

I think Al's point was that we need far more "free form, loose and
flexible" work for reviewing code. That is, people going over trees and
existing code, checking them for anything suspicious, and also going
over patches posted to the mailing lists. And also maintainers who
appreciate such review.

And checking it for anything suspicious does not mean running
only checkpatch.pl or even just sparse, but actually reading it
and trying to make sense of it.

I don't see that really as conflicting with your goals.

It would be some more work for the maintainers to handle more such
feedback because they would need to process comments from such "free
form reviewers".  Some of them will undoubtedly be wrong and that will
take some time away from processing features (and bugs) but I suspect
it would be still worth it.

On the other hand it would also take some work away from
processing bugs, but as Andrew mentions earlier it looks
like significant parts of the boring areas of bug reports 
(like getting basic information from reporter etc.) 
could be "out-sourced" to bug masters. 

And I think being a bug master is an excellent way for someone who isn't
a great coder to contribute significantly to Linux
(far more than someone e.g. running checkpatch.pl ever could).

The challenging thing is also to make sure that the quality of
comments stays high. That means more focus on logic and functionality
than on form. If the reviewer just goes over the coding style or
trivialities I don't think that will improve Linux really. I think the
problem is often that people think kernel code must be very
complicated and they don't even dare try to understand it.  But
frankly a lot of the kernel code is not really that complicated logic
wise and also doesn't need too specialized knowledge to understand.
So I am optimistic that there are a lot of people out there who would
be qualified to do some logic review.

Really, Linux needs a better "reviewing culture" and also
a better "bug processing culture".

> We can ask more subsystem tree maintainers to run their trees more
> strictly, review patches more closely, etc.  But, be honest, good luck
> getting that from the guys who do subsystem maintenance in their
> spare time on the weekends.  The remaining cases should know better,
> or simply don't care.

In my experience weekend maintainers tend to be better at sharing
out work. That is, they usually (ok, there are exceptions) share out more
work, including review work on the mailing lists, while my impression
is that paid-for maintainers tend to prefer more
centralized "cathedral" tree maintenance: they try to keep
everything under control, with effectively much more going on in the
background out of public view. But sharing out work and less
centralization is what we really want here, I think.

Anyways I'm not saying all paid-for maintainers are like this, but
there is certainly a trend I think.

I admit I personally went through both phases in several projects.

When you're really focussed on something it is tempting to do 
the "keep things under control" central model, but in the end
it is the wrong way to go.

-Andi

* Re: Reporting bugs and bisection
  2008-04-14  8:30                       ` David Miller
  2008-04-14  9:06                         ` Christoph Hellwig
  2008-04-14  9:46                         ` Andi Kleen
@ 2008-04-14 10:15                         ` Andrew Morton
  2008-04-14 10:41                           ` David Miller
  2 siblings, 1 reply; 66+ messages in thread
From: Andrew Morton @ 2008-04-14 10:15 UTC (permalink / raw)
  To: David Miller
  Cc: viro, w, david, sclark46, johnpol, rjw, tilman, Valdis.Kletnieks,
	lkml, jesper.juhl, yoshfuji, jeff, linux-kernel, git, netdev

On Mon, 14 Apr 2008 01:30:58 -0700 (PDT) David Miller <davem@davemloft.net> wrote:

> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Mon, 14 Apr 2008 01:04:12 -0700
> 
> > That all sounds good and I expect few would disagree.  But if it is to
> > happen, it clearly won't happen by itself, automatically.  We will need to
> > force it upon ourselves and the means by which we will do that is process
> > changes.  The thing which is being disparaged as "bureaucracy".
> > 
> > The steps to be taken are:
> > 
> > a) agree that we have a problem
>  ...
> > I have thus far failed to get us past step a).
> 
> A lot of people, myself included, subconsciously don't want to
> get past step a) because the resulting "bureaucracy" or whatever
> you want to call it is perceived to undercut the very thing
> that makes the Linux kernel fun to work on.
> 
> It's still largely free form, loose, and flexible.  And that's
> a notable accomplishment considering how much things have changed.
> That feeling is why I got involved in the first place, and I know
> it's what gets other new people in and addicted too.
> 
> Nobody is "forced" to do anything, and I notice you used the
> word "force" in d) :-)

OK, I was going to let this pass, but I changed my mind.

You carefully deleted my text so that you could misquote it, thereby
flagrantly misrepresenting everything I said.

Here it is again:

: The steps to be taken are:
: 
: a) agree that we have a problem
: 
: b) agree that we need to address it
: 
: c) identify the day-to-day work practices which will help address it (as
:    you have done)
: 
: d) identify the process changes which will force us to adopt those practices
: 
: e) implement those process changes.

Forcing a discipline upon oneself is totally different from having it
forced upon you by someone else.

Each step will need general agreement and buy-in, otherwise none of it will
(or should) work.


* Re: Reporting bugs and bisection
  2008-04-14 10:15                         ` Andrew Morton
@ 2008-04-14 10:41                           ` David Miller
  2008-04-14 17:35                             ` Roman Shaposhnik
  0 siblings, 1 reply; 66+ messages in thread
From: David Miller @ 2008-04-14 10:41 UTC (permalink / raw)
  To: akpm
  Cc: viro, w, david, sclark46, johnpol, rjw, tilman, Valdis.Kletnieks,
	lkml, jesper.juhl, yoshfuji, jeff, linux-kernel, git, netdev

From: Andrew Morton <akpm@linux-foundation.org>
Date: Mon, 14 Apr 2008 03:15:30 -0700

> You carefully deleted my text so that you could misquote it, thereby
> flagrantly misrepresenting everything I said.

Not the intention, but anyways:

> Here it is again:
> 
> : The steps to be taken are:
> : 
> : a) agree that we have a problem
> : 
> : b) agree that we need to address it
> : 
> : c) identify the day-to-day work practices which will help address it (as
> :    you have done)
> : 
> : d) identify the process changes which will force us to adopt those practices
> : 
> : e) implement those process changes.
> 
> Forcing a discipline upon oneself is totally different from having it
> forced upon you by someone else.
> 
> Each step will need general agreement and buy-in, otherwise none of it will
> (or should) work.

The "force" applies to "us", which is a group.

And I imagine that newcomers will be expected to adopt these
"practices".  So in effect, they will be "forced" into the process
changes as well.

I'm getting more and more sensitive to issues on this level over time,
because I realize that the fundamental issue in all human group issues
is getting people to "want" to do things.  And "force", in any form,
tends to be incompatible with "want".  And in particular, people will
often even shun things they "want" when it is "forced" on them.

* Re: Reporting bugs and bisection
  2008-04-14  8:04                     ` Andrew Morton
  2008-04-14  8:30                       ` David Miller
@ 2008-04-14 12:08                       ` Adrian Bunk
  2008-04-14 14:43                       ` Arjan van de Ven
  2 siblings, 0 replies; 66+ messages in thread
From: Adrian Bunk @ 2008-04-14 12:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Willy Tarreau, david, Stephen Clark, Evgeniy Polyakov,
	Rafael J. Wysocki, Tilman Schmidt, Valdis.Kletnieks, Mark Lord,
	David Miller, jesper.juhl, yoshfuji, jeff, linux-kernel, git,
	netdev

On Mon, Apr 14, 2008 at 01:04:12AM -0700, Andrew Morton wrote:
>...
> (And I don't think developers and maintainers _should_ spend time mucking
> in bug-tracking systems.  They should have helpers who do all the
> triaging/tracking/routing/closing work for them, and then provide other
> developers with the results, letting them know what they should be spending
> time on.  But there's a manpower problem).
>...

Speaking as the one who was for a few years going again and again 
through all open bugs in the kernel Bugzilla:

The manpower problem isn't in handling the bugs in Bugzilla.

I'd claim that even if all bugs in the kernel were reported in the
kernel Bugzilla I alone would be able to do all the handling of incoming 
bugs, bug forwarding and doing all the cleanup stuff like asking 
submitters whether a bug is still present in the latest kernel.

The manpower problem is at the developers and maintainers who could 
actually debug the problems.

One problem is unmaintained areas.
Do we have anyone who would debug e.g. APM bugs?
And if I want to be really nasty, I'll ask whether we have anyone who 
understands our floppy driver...  ;)

And who would debug problems with old and unmaintained drivers, e.g. 
some old net or SCSI driver?

Note that I do not blame James or Jeff or whoever else for the latter - 
they might simply not have the time to spend a day or two for debugging 
some obscure problem on some obscure hardware.

And it could happen everywhere that maintainers simply don't have 
the time to cope with all incoming bug reports.

We have many people who write new bugs^Wcode.
But too few people who review code.
And too few people willing to maintain the existing code.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

* Re: Reporting bugs and bisection
  2008-04-14  8:04                     ` Andrew Morton
  2008-04-14  8:30                       ` David Miller
  2008-04-14 12:08                       ` Adrian Bunk
@ 2008-04-14 14:43                       ` Arjan van de Ven
  2008-04-14 17:51                         ` Andrew Morton
  2 siblings, 1 reply; 66+ messages in thread
From: Arjan van de Ven @ 2008-04-14 14:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Willy Tarreau, david, Stephen Clark, Evgeniy Polyakov,
	Rafael J. Wysocki, Tilman Schmidt, Valdis.Kletnieks, Mark Lord,
	David Miller, jesper.juhl, yoshfuji, jeff, linux-kernel, git,
	netdev

On Mon, 14 Apr 2008 01:04:12 -0700
> 
> The steps to be taken are:
> 
> a) agree that we have a problem
> 

I for one do not agree that we have a problem.

Based on actual data on oopses (which very clearly excludes other kinds of bugs, so I know I only see part of the story)
we are doing reasonably well. Let's look at the 2.6.25 cycle.
We got a total of roughly 2700 reports of oopses/warn_ons from users. (This may sound high to those of you only reading
lkml, but this includes automatically collected oopses from Fedora 9 beta testers).
Out of these 2700, the top 20 issues account for 75% of the total reports.

Out of these 20 issues, 9 were from still out of tree drivers (wireless.git and drm.git included in F9). These were
caught before they even got close to mainline.
The remaining 11 issues can be split into
1) The ones we caught and fixed
2) TCP/IP warnings that DaveM and co are chasing down hard (but have trouble finding reproducers)
3) An EXT3 bug that in theory can cause data corruption, but in practice seems to happen after you yank out a USB stick
  with an EXT3 filesystem on it (so it can't corrupt the disk data). Ted is working on this
4) A bug (double free) that hits in the skb layer, probably caused by a bug in the ipv4 code
   (a first analysis + potential patch was mailed to netdev this weekend)
5) sysfs "existing file added" warning, mostly in the USB stack
   (gregkh claims he fixed this recently, I'm not entirely sure he got all cases)

And when I look beyond the first 20, the same pattern arises, we fixed the majority of the issues before -rc9.
At position 25 we have less than 20 reports per bug. At position 35 we have less than 10 reports per bug. 
At position 50 we have less than 5 reports per bug. Conclusion there: the bugs people actually hit fall off dramatically;
there's a core set of issues that gets hit a lot, the rest quickly gets reduced to noise levels.

To me this does not sound like we have a huge quality problem because
1) The distribution of the bugs is such that there is a relatively small set of core issues
   that are widely hit, and then there's a near exponential drop after that
2) We are fixing the important bugs by and large before they hit a release
   (important as defined by the number of people actually hitting the bug)

I'll be writing a report with more details about this soon with more analysis and statistics
(I'll be looking at more detail around the top 25 issues, when they got introduced, when they got fixed etc)

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

* Re: Reporting bugs and bisection
  2008-04-14  7:23                   ` Al Viro
  2008-04-14  7:43                     ` Al Viro
  2008-04-14  8:04                     ` Andrew Morton
@ 2008-04-14 15:54                     ` James Morris
  2008-04-14 22:01                       ` David Miller
  2008-04-15  9:33                       ` David Newall
  2 siblings, 2 replies; 66+ messages in thread
From: James Morris @ 2008-04-14 15:54 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, Willy Tarreau, david, Stephen Clark,
	Evgeniy Polyakov, Rafael J. Wysocki, Tilman Schmidt,
	Valdis.Kletnieks, Mark Lord, David Miller, jesper.juhl, yoshfuji,
	jeff, linux-kernel, git, netdev

On Mon, 14 Apr 2008, Al Viro wrote:

> Real review of code in tree and patches getting into the tree.

There is currently little incentive for developers to perform review.  

It's difficult work, and is generally not rewarded or recognized, except 
in often quite negative ways.  There is a small handful of people who do a 
lot of review, but they are exceptional in various ways.

OTOH, writing code is relatively simple, and is much more highly rewarded:

- People tend to get paid to write kernel code, but not so much to review 
  it.

- Things like "who made the kernel" statistics and related articles ignore 
  code review.

- Creating new features is perceived as the highest form of contribution 
  for general developers, and likely important as career currency 
  (similar to the publish or perish model in the academic world).

I don't know how to solve this, but suspect that encouraging the use of 
reviewed-by and also including it in things like analysis of who is 
contributing, selection for kernel summit invitations etc. would be a 
start.  At least, better than nothing.

- James 
-- 
James Morris
<jmorris@namei.org>

* Re: Reporting bugs and bisection
  2008-04-14 10:41                           ` David Miller
@ 2008-04-14 17:35                             ` Roman Shaposhnik
  0 siblings, 0 replies; 66+ messages in thread
From: Roman Shaposhnik @ 2008-04-14 17:35 UTC (permalink / raw)
  To: David Miller
  Cc: akpm, viro, w, david, sclark46, johnpol, rjw, tilman,
	Valdis.Kletnieks, lkml, jesper.juhl, yoshfuji, jeff, linux-kernel,
	git, netdev

On Mon, 2008-04-14 at 03:41 -0700, David Miller wrote:
> I'm getting more and more sensitive to issues on this level over time,
> because I realize that the fundamental issue in all human group issues
> is getting people to "want" to do things.  And "force", in any form,
> tends to be incompatible with "want".  And in particular, people will
> often even shun things they "want" when it is "forced" on them.

Just wanted to add my 2c by mentioning my favorite example of 
"virtual Tom Sawyering" as far as a tedious review process goes:
   http://en.wikipedia.org/wiki/Knuth_reward_check

Which is also quite cheap -- AFAIK very few of those have ever
been cashed.

Thanks,
Roman.

* Re: Reporting bugs and bisection
  2008-04-14 14:43                       ` Arjan van de Ven
@ 2008-04-14 17:51                         ` Andrew Morton
  2008-04-14 18:24                           ` Arjan van de Ven
  2008-04-14 19:30                           ` Ilpo Järvinen
  0 siblings, 2 replies; 66+ messages in thread
From: Andrew Morton @ 2008-04-14 17:51 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Al Viro, Willy Tarreau, david, Stephen Clark, Evgeniy Polyakov,
	Rafael J. Wysocki, Tilman Schmidt, Valdis.Kletnieks, Mark Lord,
	David Miller, jesper.juhl, yoshfuji, jeff, linux-kernel, git,
	netdev

On Mon, 14 Apr 2008 07:43:49 -0700 Arjan van de Ven <arjan@infradead.org> wrote:

> On Mon, 14 Apr 2008 01:04:12 -0700
> > 
> > The steps to be taken are:
> > 
> > a) agree that we have a problem
> > 
> 
> 
> I for one do not agree that we have a problem.
> 
> Based on actual data on oopses (which very clearly excludes other kinds of bugs, so I know I only see part of the story)
> we are doing reasonably well. Let's look at the 2.6.25 cycle.
> We got a total of roughly 2700 reports of oopses/warn_ons from users. (This may sound high to those of you only reading
> lkml, but this includes automatically collected oopses from Fedora 9 beta testers).
> Out of these 2700, the top 20 issues account for 75% of the total reports.
> 
> Out of these 20 issues, 9 were from still out of tree drivers (wireless.git and drm.git included in F9). These were
> caught before they even got close to mainline.
> The remaining 11 issues can be split into
> 1) The ones we caught and fixed
> 2) TCP/IP warnings that DaveM and co are chasing down hard (but have trouble finding reproducers)
> 3) An EXT3 bug that in theory can cause data corruption, but in practice seems to happen after you yank out a USB stick
>   with an EXT3 filesystem on it (so it can't corrupt the disk data). Ted is working on this
> 4) A bug (double free) that hits in the skb layer, probably caused by a bug in the ipv4 code
>    (a first analysis + potential patch was mailed to netdev this weekend)
> 5) sysfs "existing file added" warning, mostly in the USB stack
>    (gregkh claims he fixed this recently, I'm not entirely sure he got all cases)
> 
> And when I look beyond the first 20, the same pattern arises, we fixed the majority of the issues before -rc9.
> At position 25 we have less than 20 reports per bug. At position 35 we have less than 10 reports per bug. 
> At position 50 we have less than 5 reports per bug. Conclusion there: the bugs people actually hit fall off dramatically;
> there's a core set of issues that gets hit a lot, the rest quickly gets reduced to noise levels.
> 
> 
> To me this does not sound like we have a huge quality problem because
> 1) The distribution of the bugs is such that there is a relatively small set of core issues
>    that are widely hit, and then there's a near exponential drop after that
> 2) We are fixing the important bugs by and large before they hit a release
>    (important as defined by the number of people actually hitting the bug)
> 
> 
>  
> I'll be writing a report with more details about this soon with more analysis and statistics
> (I'll be looking at more detail around the top 25 issues, when they got introduced, when they got fixed etc)

Well OK.  But I don't think we can generalise from oops-causing bugs all
the way to all bugs.  Very few bugs actually cause oopses, and oopses tend
to be the thing which developers will zoom in on and pay attention to.

If we had metrics on "time goes backwards" or anything containing "ASUS",
things might be different.

* Re: Reporting bugs and bisection
  2008-04-14 17:51                         ` Andrew Morton
@ 2008-04-14 18:24                           ` Arjan van de Ven
  2008-04-14 19:30                           ` Ilpo Järvinen
  1 sibling, 0 replies; 66+ messages in thread
From: Arjan van de Ven @ 2008-04-14 18:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Willy Tarreau, david, Stephen Clark, Evgeniy Polyakov,
	Rafael J. Wysocki, Tilman Schmidt, Valdis.Kletnieks, Mark Lord,
	David Miller, jesper.juhl, yoshfuji, jeff, linux-kernel, git,
	netdev

On Mon, 14 Apr 2008 10:51:52 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> Well OK.  But I don't think we can generalise from oops-causing bugs

including all WARN_ON's and various other kernel backtrace-causing bugs.

> all the way to all bugs.  Very few bugs actually cause oopses, and
> oopses tend to be the thing which developers will zoom in on and pay
> attention to.

maybe.
> 
> If we had metrics on "time goes backwards" or anything containing
> "ASUS", things might be different.

It really sounds like we need to add more strategic WARN_ON's and other diagnostics in
the kernel to track these issues down.

Another thing I have found so far is that what hits LKML is far from representative
of what happens for users. The most obvious example was the whole input layer refcounting disaster
in 2.6.25-rc; this was about 1/3rd of TOTAL reports for a few weeks in a row, but there
was hardly an LKML posting for it (in fact there was only 1 half one).
We need diagnostics and stuff the kernel spits out so that automated tools can detect these,
otherwise we'll very likely not get good information on what is actually wrong with the kernel.

In case you want to see the 2.6.25-rc data, the top 100 list is at
http://www.kerneloops.org/twentyfive.html

(I'm still working on annotating the individual items, but since there's 100
that does take time)

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

* Re: Reporting bugs and bisection
  2008-04-14  6:24                 ` Andrew Morton
  2008-04-14  6:39                   ` David Miller
  2008-04-14  7:23                   ` Al Viro
@ 2008-04-14 19:13                   ` Rene Herman
  2008-04-14 20:38                     ` Andrew Morton
  2 siblings, 1 reply; 66+ messages in thread
From: Rene Herman @ 2008-04-14 19:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Al Viro, Willy Tarreau, david, Stephen Clark, Evgeniy Polyakov,
	Rafael J. Wysocki, Tilman Schmidt, Valdis.Kletnieks, Mark Lord,
	David Miller, jesper.juhl, yoshfuji, jeff, linux-kernel, git,
	netdev

On 14-04-08 08:24, Andrew Morton wrote:

> On Mon, 14 Apr 2008 06:39:43 +0100 Al Viro <viro@ZenIV.linux.org.uk> wrote:

>> I have a related proposal: let us require all patches to be stamped
>> with Discordian *and* Eternal September dates.  In triplicate.  While
>> we are at it, why don't we introduce new mandatory headers like, say
>> it,
>>
>> X-checkpatch: {Yes,No}
>> X-checkpatch-why-not: <string>
>> X-pointless: <number from 1 to 69, going from "1: does something useful" all
>> the way to "68: aligns right ends of lines in comments">
>> X-arbitrary-rules-added-to-CodingStyle: <number> (should be present if
>> and only if X-pointless: 69 is present).
>>
>> Come to think of that, we clearly need a new file in Documentation/*,
>> documenting such headers.  Why don't we organize a subcommittee^Wnew maillist
>> devoted to that?  That would provide another entry route for contributors,
>> lowering the overall entry barriers even further...
>>
> 
> None of the above was particularly useful.

Does that mean you're not going to take patches that align the right end of 
lines in comments? :-(

Rene.

* Re: Reporting bugs and bisection
  2008-04-14 17:51                         ` Andrew Morton
  2008-04-14 18:24                           ` Arjan van de Ven
@ 2008-04-14 19:30                           ` Ilpo Järvinen
  1 sibling, 0 replies; 66+ messages in thread
From: Ilpo Järvinen @ 2008-04-14 19:30 UTC (permalink / raw)
  To: Andrew Morton, Arjan van de Ven
  Cc: Al Viro, Willy Tarreau, david, Stephen Clark, Evgeniy Polyakov,
	Rafael J. Wysocki, Tilman Schmidt, Valdis.Kletnieks, Mark Lord,
	David Miller, jesper.juhl, yoshfuji, Jeff Garzik, linux-kernel,
	git, Netdev

On Mon, 14 Apr 2008, Andrew Morton wrote:

> On Mon, 14 Apr 2008 07:43:49 -0700 Arjan van de Ven <arjan@infradead.org> wrote:
> 
> > I'll be writing a report with more details about this soon with more analysis and statistics
> > (I'll be looking at more detail around the top 25 issues, when they got introduced, when they got fixed etc)
> 
> Well OK.  But I don't think we can generalise from oops-causing bugs all
> the way to all bugs.  Very few bugs actually cause oopses, and oopses tend
> to be the thing which developers will zoom in on and pay attention to.
> 
> If we had metrics on "time goes backwards" or anything containing "ASUS",
> things might be different.

Even oopses have pitfalls: in the 25-rcs, those WARN_ON TCP
backtraces were due to three different bugs (there might be a fourth one
still remaining). ...kerneloops.org didn't even distinguish between
different WARN_ONs in a function, though that would have helped only a little
in the case of the 25-rc TCP bugs, because the different bugs caused failures
in the same invariant.

-- 
 i.

* Re: Reporting bugs and bisection
  2008-04-14 19:13                   ` Rene Herman
@ 2008-04-14 20:38                     ` Andrew Morton
  2008-04-14 22:18                       ` Rene Herman
  0 siblings, 1 reply; 66+ messages in thread
From: Andrew Morton @ 2008-04-14 20:38 UTC (permalink / raw)
  To: Rene Herman
  Cc: viro, w, david, sclark46, johnpol, rjw, tilman, Valdis.Kletnieks,
	lkml, davem, jesper.juhl, yoshfuji, jeff, linux-kernel, git,
	netdev

On Mon, 14 Apr 2008 21:13:41 +0200
Rene Herman <rene.herman@keyaccess.nl> wrote:

> Does that mean you're not going to take patches that align the right end of 
> lines in comments? :-(

erm, was that ":-(" supposed to be a ":-)"?

I don't like to merge patches which fix typos and spellos and grammaros
in comments, simply because I'd be buried in the things.  I do take such
fixes for user-visible text (Documentation/, kerneldoc comments and
printks).

Right-justification of comments would fall rather a long way below spelling
fixes.

* Re: Reporting bugs and bisection
  2008-04-14 15:54                     ` James Morris
@ 2008-04-14 22:01                       ` David Miller
  2008-04-14 23:05                         ` Andrew Morton
  2008-04-15  9:33                       ` David Newall
  1 sibling, 1 reply; 66+ messages in thread
From: David Miller @ 2008-04-14 22:01 UTC (permalink / raw)
  To: jmorris
  Cc: viro, akpm, w, david, sclark46, johnpol, rjw, tilman,
	Valdis.Kletnieks, lkml, jesper.juhl, yoshfuji, jeff, linux-kernel,
	git, netdev

From: James Morris <jmorris@namei.org>
Date: Tue, 15 Apr 2008 01:54:00 +1000 (EST)

> - Things like "who made the kernel" statistics and related articles ignore 
>   code review.

Note the apparent irony that the person who often ends up at the
top of those lists, Al Viro, is also someone who does a
significant amount of code review.

I think this is no accident.

* Re: Reporting bugs and bisection
  2008-04-14 20:38                     ` Andrew Morton
@ 2008-04-14 22:18                       ` Rene Herman
  0 siblings, 0 replies; 66+ messages in thread
From: Rene Herman @ 2008-04-14 22:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: viro, w, david, sclark46, johnpol, rjw, tilman, Valdis.Kletnieks,
	lkml, davem, jesper.juhl, yoshfuji, jeff, linux-kernel, git,
	netdev

On 14-04-08 22:38, Andrew Morton wrote:

> On Mon, 14 Apr 2008 21:13:41 +0200
> Rene Herman <rene.herman@keyaccess.nl> wrote:
> 
>> Does that mean you're not going to take patches that align the right end of 
>> lines in comments? :-(
> 
> erm, was that ":-(" supposed to be a ":-)"?

The ":-(" was supposed to add to the implicitly obvious ":-)". That is, I was
indeed joking (Al mentioned them) but with a slightly serious undertone:

> I don't like to merge patches which fix typos and spellos and grammaros 
> in comments, simply because I'd be buried in the things. I do take such 
> fixes for user-visible text (Documentation/, kerneldoc comments and 
> printks).
> 
> Right-justification of comments would fall rather a long way below
> spelling fixes.

You, particularly, seem to be very good at picking up trivia. I've posted 
completely trivial patches from time to time for small things I encounter 
while looking at something else. Things at the "are people going to look 
funny at me for even bothering or..." level but you picking them up means 
it's still useful to post, so I sometimes do.

Now, in fact, Linux as a _whole_ doesn't seem bad at accepting that kind of 
small janitorial stuff but I have been noticing some backlash to it as well. 
I'm not sure it's worse or better than historically, but the "checkpatch 
syndrome" certainly triggers more of it.

Al specifically wanted more new eyes, but the way to reward those new eyes is 
to accept their small changes. Al also specifically doesn't like those small 
changes when they are at the automated, semi-brainless checkpatch level.

I believe the janitorial work has been over-organized, both through 
kernel-janitors and through checkpatch. While these are very useful in guiding 
a newbie on _what_ to do, they cause "automated", huge, tree-wide trivia storms 
which people then don't react overly favourably to, and the new eyes who did 
all the work of generating them dim again...

Frankly, the kernel really is fairly complex these days when starting at 0. 
Much more complex certainly than, say, back in the 2.0 or 2.2 days, and while 
Al's scenario of per-subsystem reviews might be good, I don't believe it's 
very realistic. Companies don't pay to have those done, and for newbies it's 
generally too complex, since understanding most parts of the kernel fully 
requires understanding most of the rest of the kernel rather well too.

So you get the really promising newbies? Yeah, that, or you don't get anyone, 
and if some promising newbies are building up 137-part checkpatch-inspired 
patchsets, that doesn't help any.

So, what am I saying (what _am_ I saying?!?) ...

I seemed to observe something of an internal contradiction in Al's message 
between wanting new eyes and disliking the trivial stuff, but the contradiction 
only exists if the dislike isn't limited to these kinds of huge trivia 
storms. I believe it is, and I furthermore believe that yes, it's 
over-organization that causes many new eyes to focus on the brainless aspects.

Now, do those new eyes have many other options when very few (to none) of 
the core crowd ever do things like answer questions on the kernelnewbies 
list? Of the established names, I only remember ever seeing Greg KH and 
Adrian Bunk there. And I'm _still_ pissed that no one would or could tell me 
what was wrong with the legacy CD-ROM driver Pekka Enberg and I were toying 
around with a while ago. Frankly, I care a whole lot less about a hundred 
sparse warning fixes.

In short -- the kernel in its current state is already quite complex, and if 
new eyes are wanted they'll need to be coached more. I'm seeing very little 
of that.

Rene.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Reporting bugs and bisection
  2008-04-14 22:01                       ` David Miller
@ 2008-04-14 23:05                         ` Andrew Morton
  2008-04-15  4:55                           ` Willy Tarreau
  0 siblings, 1 reply; 66+ messages in thread
From: Andrew Morton @ 2008-04-14 23:05 UTC (permalink / raw)
  To: David Miller
  Cc: jmorris, viro, w, david, sclark46, johnpol, rjw, tilman,
	Valdis.Kletnieks, lkml, jesper.juhl, yoshfuji, jeff, linux-kernel,
	git, netdev

On Mon, 14 Apr 2008 15:01:05 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: James Morris <jmorris@namei.org>
> Date: Tue, 15 Apr 2008 01:54:00 +1000 (EST)
> 
> > - Things like "who made the kernel" statistics and related articles ignore 
> >   code review.
> 
> Note the apparent irony in that the person who ends up often on the
> top of those lists, Al Viro, is also someone who does a
> significant amount of code review.
> 
> I think this is no accident.

"who made the kernel" was an interesting and useful exercise, but if you
like irony then...

- The way to boost your commit count is to submit buggy patches and to
  then fix your own bugs.

- The way to lower your commit count is to fix things in other people's
  patches, then fold your fix into the base patch.  I've lost over 1000
  commits that way.  Unless they are counting '^    [akpm' as a commit.
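The '^    [akpm' pattern works because git log, in its default format, indents every line of a commit message by four spaces, so a fold-in note such as "[akpm@...: fix ...]" appears as a line starting with four spaces and "[akpm". A minimal sketch of counting commits and such notes from log text, assuming default git log output; the function name count_commits_and_akpm_notes and the sample log are hypothetical:

```python
import re
from typing import Tuple

# Default `git log` output: "commit <sha>" starts each record at column 0,
# and every line of the commit message is indented by four spaces.
COMMIT_RE = re.compile(r"^commit [0-9a-f]+", re.MULTILINE)
AKPM_NOTE_RE = re.compile(r"^    \[akpm", re.MULTILINE)


def count_commits_and_akpm_notes(log_text: str) -> Tuple[int, int]:
    """Return (number of commits, number of '[akpm' fold-in notes)."""
    commits = len(COMMIT_RE.findall(log_text))
    notes = len(AKPM_NOTE_RE.findall(log_text))
    return commits, notes


# Hypothetical two-commit log; only the first carries a fold-in note.
SAMPLE_LOG = """\
commit 1111111
Author: A Developer <a@example.org>

    foo: add bar support

    [akpm@linux-foundation.org: fix off-by-one]
    Signed-off-by: A Developer <a@example.org>

commit 2222222
Author: B Developer <b@example.org>

    baz: cleanup
"""

print(count_commits_and_akpm_notes(SAMPLE_LOG))  # (2, 1)
```

Run over a real tree's log, this would only count fixes that were folded in with a note; fixes folded in silently leave no trace at all, which is exactly the point being made.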

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Reporting bugs and bisection
  2008-04-14 23:05                         ` Andrew Morton
@ 2008-04-15  4:55                           ` Willy Tarreau
  2008-04-15 13:18                             ` Work WAS(Re: " jamal
  0 siblings, 1 reply; 66+ messages in thread
From: Willy Tarreau @ 2008-04-15  4:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Miller, jmorris, viro, david, sclark46, johnpol, rjw,
	tilman, Valdis.Kletnieks, lkml, jesper.juhl, yoshfuji, jeff,
	linux-kernel, git, netdev

On Mon, Apr 14, 2008 at 04:05:13PM -0700, Andrew Morton wrote:
> On Mon, 14 Apr 2008 15:01:05 -0700 (PDT)
> David Miller <davem@davemloft.net> wrote:
> 
> > From: James Morris <jmorris@namei.org>
> > Date: Tue, 15 Apr 2008 01:54:00 +1000 (EST)
> > 
> > > - Things like "who made the kernel" statistics and related articles ignore 
> > >   code review.
> > 
> > Note the apparent irony in that the person who ends up often on the
> > top of those lists, Al Viro, is also someone who does a
> > significant amount of code review.
> > 
> > I think this is no accident.
> 
> "who made the kernel" was an interesting and useful exercise, but if you
> like irony then...
> 
> - The way to boost your commit count is to submit buggy patches and to
>   then fix your own bugs.
> 
> - The way to lower your commit count is to fix things in other people's
>   patches, then fold your fix into the base patch.  I've lost over 1000
>   commits that way.  Unless they are counting '^    [akpm' as a commit.

And if Dave is referring to these stats: http://lwn.net/Articles/237768/
then Al does not even appear in them, which proves your point.

Willy

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Reporting bugs and bisection
  2008-04-14  9:46                         ` Andi Kleen
@ 2008-04-15  5:25                           ` Bill Fink
  0 siblings, 0 replies; 66+ messages in thread
From: Bill Fink @ 2008-04-15  5:25 UTC (permalink / raw)
  To: Andi Kleen
  Cc: David Miller, akpm, viro, w, david, sclark46, johnpol, rjw,
	tilman, Valdis.Kletnieks, lkml, jesper.juhl, yoshfuji, jeff,
	linux-kernel, git, netdev

On Mon, 14 Apr 2008, Andi Kleen wrote:

> David Miller <davem@davemloft.net> writes:
> >
> > It's still largely free form, loose, and flexible. 
> 
> I think Al's point was that we need far more "free form, loose and
> flexible" work for reviewing code. That is, people going over trees and
> existing code and just checking them for anything suspicious, and also
> going over patches posted to mailing lists. And also maintainers who
> appreciate such review.
> 
> And checking it for anything suspicious does not mean running
> only checkpatch.pl or even just sparse, but actually reading it
> and trying to make sense of it.

If you really want to get more such review, then when someone asks about
some obtuse portion of kernel code or suggests an improvement, it would
be very useful if that person were not flamed as dense for not
understanding the code or some kernel coding concept.  It would be much
better to treat it as an opportunity to educate rather than belittle,
thus eventually enlarging the base of people who can assist with various
aspects of kernel development.
For what's supposed to be an open, engaging community, which it
generally is, there sometimes seems to be some level of dismissal
of newcomers (not sure it's intended that way, but nevertheless it
can tend to discourage newcomers from getting more involved).

						-Bill

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Reporting bugs and bisection
  2008-04-14 15:54                     ` James Morris
  2008-04-14 22:01                       ` David Miller
@ 2008-04-15  9:33                       ` David Newall
  2008-04-15  9:54                         ` Michael Kerrisk
  2008-04-16 12:15                         ` Sverre Rabbelier
  1 sibling, 2 replies; 66+ messages in thread
From: David Newall @ 2008-04-15  9:33 UTC (permalink / raw)
  To: James Morris
  Cc: Al Viro, Andrew Morton, Willy Tarreau, david, Stephen Clark,
	Evgeniy Polyakov, Rafael J. Wysocki, Tilman Schmidt,
	Valdis.Kletnieks, Mark Lord, David Miller, jesper.juhl, yoshfuji,
	jeff, linux-kernel, git, netdev

James Morris wrote:
> I don't know how to solve this, but suspect that encouraging the use of 
> reviewed-by and also including it in things like analysis of who is 
> contributing, selection for kernel summit invitations etc. would be a 
> start.  At least, better than nothing.


Would it be hard to keep count of the number of errors introduced by
author and reviewer?

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Reporting bugs and bisection
  2008-04-15  9:33                       ` David Newall
@ 2008-04-15  9:54                         ` Michael Kerrisk
  2008-04-15 14:04                           ` David Newall
  2008-04-16 12:15                         ` Sverre Rabbelier
  1 sibling, 1 reply; 66+ messages in thread
From: Michael Kerrisk @ 2008-04-15  9:54 UTC (permalink / raw)
  To: David Newall
  Cc: James Morris, Al Viro, Andrew Morton, Willy Tarreau, david,
	Stephen Clark, Evgeniy Polyakov, Rafael J. Wysocki,
	Tilman Schmidt, Valdis.Kletnieks, Mark Lord, David Miller,
	jesper.juhl, yoshfuji, jeff, linux-kernel, git, netdev

On 4/15/08, David Newall <davidn@davidnewall.com> wrote:
> James Morris wrote:
>  > I don't know how to solve this, but suspect that encouraging the use of
>  > reviewed-by and also including it in things like analysis of who is
>  > contributing, selection for kernel summit invitations etc. would be a
>  > start.  At least, better than nothing.
>
> Would it be hard to keep count of the number of errors introduced by
>  author and reviewer?

I've found quite a few errors in kernel-userland APIs, but I'm not
sure that this sort of negative statistic would be helpful -- e.g.,
more productive developers probably also introduce more errors.

-- 
I'll likely only see replies if they are CCed to mtk.manpages at gmail dot com

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Work WAS(Re: Reporting bugs and bisection
  2008-04-15  4:55                           ` Willy Tarreau
@ 2008-04-15 13:18                             ` jamal
  0 siblings, 0 replies; 66+ messages in thread
From: jamal @ 2008-04-15 13:18 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Andrew Morton, David Miller, jmorris, viro, david, sclark46,
	johnpol, rjw, tilman, Valdis.Kletnieks, lkml, jesper.juhl,
	yoshfuji, jeff, linux-kernel, git, netdev

On Tue, 2008-15-04 at 06:55 +0200, Willy Tarreau wrote:

> And if Dave is referring to these stats: http://lwn.net/Articles/237768/
> then Al does not even appear in them, which proves your point.

Stats such as those above, while useful, are flawed.
IMO James Morris has (probably more than anybody else) hit on the core
issue. To extend his view: there's more than just code review that
deserves respect. Testing is one. Commenting, not necessarily on code
but on architecture, is another. Documenting. Yes, even running sparse,
Lindent or checkpatch.
In the old/current Linux thinking (pun intended), work equates to
churning code. That thought process actually derives from Linus and then
propagates downstream to other folks.
I think the Linus approach is still excellent - but its definition of
"work" is no longer valid. Work must include all these other things,
and visible credit is important if the revolution is to continue.

If you look at it from a software engineering or production resource
management perspective, the Linux development model has gotta be one of
the most inefficient[1] - with a reward system geared mostly to developers.
If you look at it from an investment-of-time (ROI) perspective,
developers get way too much credit riding on everybody else's back.
Why should Mark Lord report another bug to us?
Put yourself in his shoes:
- he is a clever guy who has already worked around the bug. So a proper
fix is only a convenience for him.
- Blessed as he was - he got to do more and more work after reporting.
- he got slapped for claiming he had to go and get lunch and therefore
didn't have time to do more bisecting for a bug that wasn't unique to
his setup.
- he spent a gazillion electrons responding to people and justifying his
stance
- he got no credit for his time whatsoever when the bug was fixed (he
won't be showing up on the LWN list).

I think the perspective on, and credit for, people's time needs to change.

cheers,
jamal

[1] With current momentum, there are effectively infinite resources of
developers, testers and documenters in Linux, i.e.
resource management is only valid as a metric if you have finite
resources. So the point I am making is moot - but I do strongly believe
the momentum will dampen if the current trend of defining work continues.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Reporting bugs and bisection
  2008-04-15  9:54                         ` Michael Kerrisk
@ 2008-04-15 14:04                           ` David Newall
  2008-04-15 20:51                             ` Rafael J. Wysocki
  0 siblings, 1 reply; 66+ messages in thread
From: David Newall @ 2008-04-15 14:04 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: James Morris, Al Viro, Andrew Morton, Willy Tarreau, david,
	Stephen Clark, Evgeniy Polyakov, Rafael J. Wysocki,
	Tilman Schmidt, Valdis.Kletnieks, Mark Lord, David Miller,
	jesper.juhl, yoshfuji, jeff, linux-kernel, git, netdev

Michael Kerrisk wrote:
> On 4/15/08, David Newall <davidn@davidnewall.com> wrote:
>   
>> James Morris wrote:
>>  > I don't know how to solve this, but suspect that encouraging the use of
>>  > reviewed-by and also including it in things like analysis of who is
>>  > contributing, selection for kernel summit invitations etc. would be a
>>  > start.  At least, better than nothing.
>>
>> Would it be hard to keep count of the number of errors introduced by
>>  author and reviewer?
>>     
>
> I've found quite a few errors in kernel-userland APIs, but I'm not
> sure that this sort of negative statistic would be helpful -- e.g.,
> more productive developers probably also introduce more errors.

We can already see which developers are more active.  What we can't see
is who is careless, which would be useful to know.  It would also be
useful to know who is careless in approving changes, because they share
responsibility for those changes.  It would be a good thing if this
highlighted that some people are behind frequent buggy changes.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Reporting bugs and bisection
  2008-04-15 14:04                           ` David Newall
@ 2008-04-15 20:51                             ` Rafael J. Wysocki
  2008-04-16  2:34                               ` David Newall
  0 siblings, 1 reply; 66+ messages in thread
From: Rafael J. Wysocki @ 2008-04-15 20:51 UTC (permalink / raw)
  To: David Newall
  Cc: Michael Kerrisk, James Morris, Al Viro, Andrew Morton,
	Willy Tarreau, david, Stephen Clark, Evgeniy Polyakov,
	Tilman Schmidt, Valdis.Kletnieks, Mark Lord, David Miller,
	jesper.juhl, yoshfuji, jeff, linux-kernel, git, netdev

On Tuesday, 15 of April 2008, David Newall wrote:
> Michael Kerrisk wrote:
> > On 4/15/08, David Newall <davidn@davidnewall.com> wrote:
> >   
> >> James Morris wrote:
> >>  > I don't know how to solve this, but suspect that encouraging the use of
> >>  > reviewed-by and also including it in things like analysis of who is
> >>  > contributing, selection for kernel summit invitations etc. would be a
> >>  > start.  At least, better than nothing.
> >>
> >> Would it be hard to keep count of the number of errors introduced by
> >>  author and reviewer?
> >>     
> >
> > I've found quite a few errors in kernel-userland APIs, but I'm not
> > sure that this sort of negative statistic would be helpful -- e.g.,
> > more productive developers probably also introduce more errors.
> 
> We can already see which developers are more active.  What we can't see
> is who is careless, which would be useful to know.  It would also be
> useful to know who is careless in approving changes, because they share
> responsibility for those changes.  It would be a good thing if this
> highlighted that some people are behind frequent buggy changes.

Well, even if someone introduces bugs relatively frequently, but then also
works with the reporters and fixes the bugs timely, it's about okay IMO.

The real problem is when patch submitters don't care about their changes any
more once the patches have been merged.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Reporting bugs and bisection
  2008-04-15 20:51                             ` Rafael J. Wysocki
@ 2008-04-16  2:34                               ` David Newall
  2008-04-16  3:53                                 ` david
  2008-04-16  4:29                                 ` Willy Tarreau
  0 siblings, 2 replies; 66+ messages in thread
From: David Newall @ 2008-04-16  2:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Michael Kerrisk, James Morris, Al Viro, Andrew Morton,
	Willy Tarreau, david, Stephen Clark, Evgeniy Polyakov,
	Tilman Schmidt, Valdis.Kletnieks, Mark Lord, David Miller,
	jesper.juhl, yoshfuji, jeff, linux-kernel, git, netdev

Rafael J. Wysocki wrote:
> Well, even if someone introduces bugs relatively frequently, but then also
> works with the reporters and fixes the bugs timely, it's about okay IMO.
>   
This really is not okay.  Even if bugs are fixed a version or two later,
the impact those bugs have on users makes the system look bad and drives
them away.  We do not, I believe, want Linux to top the list for "most
bugs".  It's unprofessional, unreliable and quite undesirable.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Reporting bugs and bisection
  2008-04-16  2:34                               ` David Newall
@ 2008-04-16  3:53                                 ` david
  2008-04-16  9:06                                   ` David Newall
  2008-04-16 12:41                                   ` Stephen Clark
  2008-04-16  4:29                                 ` Willy Tarreau
  1 sibling, 2 replies; 66+ messages in thread
From: david @ 2008-04-16  3:53 UTC (permalink / raw)
  To: David Newall
  Cc: Rafael J. Wysocki, Michael Kerrisk, James Morris, Al Viro,
	Andrew Morton, Willy Tarreau, Stephen Clark, Evgeniy Polyakov,
	Tilman Schmidt, Valdis.Kletnieks, Mark Lord, David Miller,
	jesper.juhl, yoshfuji, jeff, linux-kernel, git, netdev

On Wed, 16 Apr 2008, David Newall wrote:

> Rafael J. Wysocki wrote:
>> Well, even if someone introduces bugs relatively frequently, but then also
>> works with the reporters and fixes the bugs timely, it's about okay IMO.
>>
> This really is not okay.  Even if bugs are fixed a version or two later,
> the impact those bugs have on users makes the system look bad and drives
> them away.  We do not, I believe, want Linux to top the list for "most
> bugs".  It's unprofessional, unreliable and quite undesirable.

"timely" frequently means the code was merged in -rc1/2 and was fixed before 
the final release of the same version.

given the huge variety of hardware and workloads, it's just too easy for 
there to be cases where any trade-off you make (code size, performance, 
memory usage, common case definitions) can turn around and bite you. In 
addition, hardware frequently doesn't work quite the way the design specs 
say it should (even before considering that many drivers are 
reverse engineered). what's most important is that when such a case shows 
up it gets addressed promptly.

I'd rather have a developer/maintainer who introduces 100 bugs 
but fixes them promptly, as opposed to one who only introduces one bug, 
but refuses to consider fixing the code 'because they don't make mistakes 
like that' (sadly a common attitude from people who produce very 
good code much of the time)

best of all is a developer/maintainer who writes very good code and is 
willing to accept the fact that they make mistakes and fixes the code 
promptly, but those people are extremely rare, and usually they emerge 
from the pool of people who make more mistakes and fix them promptly, 
which is an added reason I'm more tolerant of that group.

David Lang

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Reporting bugs and bisection
  2008-04-16  2:34                               ` David Newall
  2008-04-16  3:53                                 ` david
@ 2008-04-16  4:29                                 ` Willy Tarreau
  2008-04-16 12:13                                   ` Rafael J. Wysocki
  1 sibling, 1 reply; 66+ messages in thread
From: Willy Tarreau @ 2008-04-16  4:29 UTC (permalink / raw)
  To: David Newall
  Cc: Rafael J. Wysocki, Michael Kerrisk, James Morris, Al Viro,
	Andrew Morton, david, Stephen Clark, Evgeniy Polyakov,
	Tilman Schmidt, Valdis.Kletnieks, Mark Lord, David Miller,
	jesper.juhl, yoshfuji, jeff, linux-kernel, git, netdev

On Wed, Apr 16, 2008 at 12:04:59PM +0930, David Newall wrote:
> Rafael J. Wysocki wrote:
> > Well, even if someone introduces bugs relatively frequently, but then also
> > works with the reporters and fixes the bugs timely, it's about okay IMO.
> >   
> This really is not okay.  Even if bugs are fixed a version or two later,
> the impact those bugs have on users makes the system look bad and drives
> them away.  We do not, I believe, want Linux to top the list for "most
> bugs".  It's unprofessional, unreliable and quite undesirable.

that's what -rc are for, and it's unprofessional to use them in production :-)


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Reporting bugs and bisection
  2008-04-16  3:53                                 ` david
@ 2008-04-16  9:06                                   ` David Newall
  2008-04-16 11:02                                     ` Andi Kleen
  2008-04-16 12:41                                   ` Stephen Clark
  1 sibling, 1 reply; 66+ messages in thread
From: David Newall @ 2008-04-16  9:06 UTC (permalink / raw)
  To: david
  Cc: Rafael J. Wysocki, Michael Kerrisk, James Morris, Al Viro,
	Andrew Morton, Willy Tarreau, Stephen Clark, Evgeniy Polyakov,
	Tilman Schmidt, Valdis.Kletnieks, Mark Lord, David Miller,
	jesper.juhl, yoshfuji, jeff, linux-kernel, git, netdev

david@lang.hm wrote:
> I'd rather have a developer/maintainer who introduces 100
> bugs but fixes them promptly,

And I'd rather be able to see that that person introduced 100 bugs than
to have no idea.  As has been said before, the current situation rewards
people for sloppy work.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Reporting bugs and bisection
  2008-04-16  9:06                                   ` David Newall
@ 2008-04-16 11:02                                     ` Andi Kleen
  0 siblings, 0 replies; 66+ messages in thread
From: Andi Kleen @ 2008-04-16 11:02 UTC (permalink / raw)
  To: David Newall
  Cc: david, Rafael J. Wysocki, Michael Kerrisk, James Morris, Al Viro,
	Andrew Morton, Willy Tarreau, Stephen Clark, Evgeniy Polyakov,
	Tilman Schmidt, Valdis.Kletnieks, Mark Lord, David Miller,
	jesper.juhl, yoshfuji, jeff, linux-kernel, git, netdev

David Newall <davidn@davidnewall.com> writes:
>
> And I'd rather be able to see that that person introduced 100 bugs than
> to have no idea.   As has been said before, the current situation rewards
> people for sloppy work.

A common issue in the kernel is code that works with a wide 
range of hardware and firmware of varying quality. The original
code is written to spec, but then in the real world the hardware
and firmware have all kinds of interesting quirks not quite
matching the spec that need additional updates to handle. I don't think
it's fair to say in this case the original developer was sloppy.

Then there is also code which is just hard to tune. Examples of this
are the CPU scheduler and the VM, but also other areas. They have to
handle a lot of different workloads with often subtle side effects.
Lots of people have put a lot of excellent work into tuning these
subsystems as users report issues with their workloads. Would you say
the original developers were sloppy? I don't think that would be a fair
description. Some problems are just hard and need many 
iterations to get right. And then often also the requirements change over 
time and need further updates.

There are more such examples in the kernel.

Grading programmers is a hard problem and I don't think the software
industry has really solved it so far, even though there was a lot of
effort trying to do it over several decades. I doubt it will be solved
for the Linux kernel either.

-Andi

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Reporting bugs and bisection
  2008-04-16  4:29                                 ` Willy Tarreau
@ 2008-04-16 12:13                                   ` Rafael J. Wysocki
  0 siblings, 0 replies; 66+ messages in thread
From: Rafael J. Wysocki @ 2008-04-16 12:13 UTC (permalink / raw)
  To: Willy Tarreau, David Newall
  Cc: Michael Kerrisk, James Morris, Al Viro, Andrew Morton, david,
	Stephen Clark, Evgeniy Polyakov, Tilman Schmidt, Valdis.Kletnieks,
	Mark Lord, David Miller, jesper.juhl, yoshfuji, jeff,
	linux-kernel, git, netdev, Andi Kleen

On Wednesday, 16 of April 2008, Willy Tarreau wrote:
> On Wed, Apr 16, 2008 at 12:04:59PM +0930, David Newall wrote:
> > Rafael J. Wysocki wrote:
> > > Well, even if someone introduces bugs relatively frequently, but then also
> > > works with the reporters and fixes the bugs timely, it's about okay IMO.
> > >   
> > This really is not okay.  Even if bugs are fixed a version or two later,
> > the impact those bugs have on users makes the system look bad and drives
> > them away.  We do not, I believe, want Linux to top the list for "most
> > bugs".  It's unprofessional, unreliable and quite undesirable.
> 
> that's what -rc are for, and it's unprofessional to use them in production :-)

Exactly.

And BTW, by saying "timely" I meant "in -rc" or "before the next major release".

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Reporting bugs and bisection
  2008-04-15  9:33                       ` David Newall
  2008-04-15  9:54                         ` Michael Kerrisk
@ 2008-04-16 12:15                         ` Sverre Rabbelier
  2008-04-16 13:26                           ` Adrian Bunk
  2008-04-16 21:17                           ` Jesper Juhl
  1 sibling, 2 replies; 66+ messages in thread
From: Sverre Rabbelier @ 2008-04-16 12:15 UTC (permalink / raw)
  To: git, linux-kernel
  Cc: James Morris, Al Viro, Andrew Morton, Willy Tarreau, david,
	Stephen Clark, Evgeniy Polyakov, Rafael J. Wysocki,
	Tilman Schmidt, Valdis.Kletnieks, Mark Lord, David Miller,
	jesper.juhl, yoshfuji, jeff, netdev, David Newall

I'm not subscribed to the kernel mailing list, so please include me in
the cc if you don't reply to the git list (which I am subscribed to).

Git is participating in Google Summer of Code this year and I've
proposed to write a 'git statistics' command. This command would allow
the user to gather data about a repository, ranging from "how active
is dev x" to "what did x work on in the last 3 weeks". Its main
feature, however, would be an algorithm that ranks commits as being
either 'buggy', 'bugfix' or 'enhancement'. (There are several clues
that can aid in determining this, a commit msg along the lines of
"fixes ..." being the most obvious.)
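A deliberately naive sketch of the message-based clue mentioned here, assuming only the commit message is available; the keyword list and the classify function are hypothetical and not part of any existing git command. It can only separate 'bugfix' from 'enhancement': labelling a commit 'buggy' needs a link to a later commit known to fix it, which the message alone does not record.

```python
import re

# Hypothetical keyword list; whole words only, so "prefix" does not match "fix".
BUGFIX_RE = re.compile(r"\b(fix|fixes|fixed|bug|regression|oops|revert)\b",
                       re.IGNORECASE)


def classify(message: str) -> str:
    """Label a commit message 'bugfix' or 'enhancement' by keyword match."""
    return "bugfix" if BUGFIX_RE.search(message) else "enhancement"


for msg in ["net: fix NULL pointer dereference in foo_rx()",
            "ide/legacy/q40ide.c: add MODULE_LICENSE"]:
    print(classify(msg), "|", msg)
```

Adrian's reply in this thread shows the weakness: "ide/legacy/q40ide.c: add MODULE_LICENSE" fixes a bug, yet a keyword heuristic like this one labels it an enhancement.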
In the light of this recent discussion, especially the part on
"keeping count of the number of errors introduced by
author and reviewer", I thought it might be useful for the kernel mailing
list to be aware of this. Also mentioned in this thread was that reviewers
don't get enough credit. As long as patches are signed with, say,
'reviewed-by:', 'acked-by:' or 'signed-off-by:', the command I suggest
implementing would be able to give more accurate statistics on who
"works on the kernel". This way reviewers get the credit they deserve.
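Counting such tags is the easy part. A minimal sketch, assuming commit messages are available as plain strings and that each tag sits on its own line in the usual "Tag: Name <email>" form; the set of tags counted (including Tested-by, to credit testing too) and the function name credit_by_trailer are assumptions for illustration:

```python
import re
from collections import Counter

# Assumed tag format: "Tag: Name <email>" on a line of its own.
TRAILER_RE = re.compile(
    r"^[ \t]*(Signed-off-by|Acked-by|Reviewed-by|Tested-by):[ \t]*(.+?)[ \t]*$",
    re.MULTILINE | re.IGNORECASE,
)


def credit_by_trailer(messages):
    """Count (tag, person) pairs across an iterable of commit messages."""
    counts = Counter()
    for msg in messages:
        for tag, person in TRAILER_RE.findall(msg):
            counts[(tag.lower(), person)] += 1
    return counts


messages = [
    "foo: fix bar\n\nSigned-off-by: Alice <a@example.org>\n"
    "Reviewed-by: Bob <b@example.org>\n",
    "baz: add qux\n\nSigned-off-by: Bob <b@example.org>\n"
    "Acked-by: Alice <a@example.org>\n",
]
for (tag, person), n in sorted(credit_by_trailer(messages).items()):
    print(f"{tag:15} {person:22} {n}")
```

Keying on (tag, person) rather than person alone keeps review credit separate from authorship credit, so the two can be reported side by side.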
The knife cuts both ways, of course: if someone reviews a patch
that is later determined to introduce a bug, they can be recorded as
having acked a buggy commit. This is especially interesting in
determining who the good reviewers are, but also in determining who
the good contributors are. A distinction could be made between parts
of the source: say, a maintainer might excel in patches related to
driver foo, but when they submit a patch for driver bar it usually
contains bugs. Armed with these statistics, reviewers might decide to
be more careful before acking a patch from that maintainer if it's for
driver bar, but when that same maintainer sends in a patch for driver
foo it is probably ok and needs less attention.
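A sketch of the per-area breakdown described above, assuming each commit has already been reduced to an (author, path, is_buggy) record; producing a trustworthy is_buggy flag is the hard, unsolved part. Dividing by the number of commits, rather than reporting raw bug counts, keeps more active developers from looking worse simply for committing more. Treating the first two path components as the "area" is an arbitrary, hypothetical choice:

```python
from collections import defaultdict


def buggy_rate_by_area(commits):
    """Tally buggy commits per (author, area).

    commits: iterable of (author, path, is_buggy) tuples.
    Returns {(author, area): (buggy, total, rate)}, where 'area' is the
    first two path components (e.g. 'drivers/foo'), a hypothetical choice.
    """
    tally = defaultdict(lambda: [0, 0])  # [buggy, total]
    for author, path, is_buggy in commits:
        area = "/".join(path.split("/")[:2])
        tally[(author, area)][0] += int(is_buggy)
        tally[(author, area)][1] += 1
    return {key: (b, t, b / t) for key, (b, t) in tally.items()}


# Hypothetical data mirroring the driver foo / driver bar example.
history = [
    ("maint", "drivers/foo/core.c", False),
    ("maint", "drivers/foo/io.c", False),
    ("maint", "drivers/bar/main.c", True),
    ("maint", "drivers/bar/main.c", False),
]
for (author, area), (buggy, total, rate) in sorted(
        buggy_rate_by_area(history).items()):
    print(f"{author} {area}: {buggy}/{total} buggy ({rate:.0%})")
```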
My application, and a more extended description, can be found here:
http://alturin.googlepages.com/gsoc2008

I'm interested to know if the community is indeed as interested in my
proposal as I hope, and if I overlooked any obvious features that would
make it an even better command.

Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: Reporting bugs and bisection
  2008-04-16  3:53                                 ` david
  2008-04-16  9:06                                   ` David Newall
@ 2008-04-16 12:41                                   ` Stephen Clark
  1 sibling, 0 replies; 66+ messages in thread
From: Stephen Clark @ 2008-04-16 12:41 UTC (permalink / raw)
  To: david
  Cc: David Newall, Rafael J. Wysocki, Michael Kerrisk, James Morris,
	Al Viro, Andrew Morton, Willy Tarreau, Evgeniy Polyakov,
	Tilman Schmidt, Valdis.Kletnieks, Mark Lord, David Miller,
	jesper.juhl, yoshfuji, jeff, linux-kernel, git, netdev

david@lang.hm wrote:
> On Wed, 16 Apr 2008, David Newall wrote:
> 
>> Rafael J. Wysocki wrote:
>>> Well, even if someone introduces bugs relatively frequently, but then 
>>> also
>>> works with the reporters and fixes the bugs timely, it's about okay IMO.
>>>
>> This really is not okay.  Even if bugs are fixed a version or two later,
>> the impact those bugs have on users makes the system look bad and drives
>> them away.  We do not, I believe, want Linux to top the list for "most
>> bugs".  It's unprofessional, unreliable and quite undesirable.
> 
> "timely" frequently means the code was merged in -rc1/2 and was fixed 
> before the final release of the same version.
> 
> given the huge variety of hardware and workloads, it's just too easy for 
> there to be cases where any trade-off you make (code size, performance, 
> memory usage, common case definitions) can turn around and bite you. In 
> addition, hardware frequently doesn't work quite the way the design specs 
> say it should (even before considering that many drivers are 
> reverse engineered). what's most important is that when such a case shows 
> up it gets addressed promptly.
> 
> I'd rather have a developer/maintainer who introduces 100 bugs 
> but fixes them promptly, as opposed to one who only introduces one bug, 
> but refuses to consider fixing the code 'because they don't make 
> mistakes like that' (sadly a common attitude from people who produce 
> very good code much of the time)
> 
> best of all is a developer/maintainer who writes very good code and is 
> willing to accept the fact that they make mistakes and fixes the code 
> promptly, but those people are extremely rare, and usually they emerge 
> from the pool of people who make more mistakes and fix them promptly, 
> which is an added reason I'm more tolerant of that group.
> 
> David Lang
> 
Having been a Linux user since the late '90s, the problem I see is that
developers decide to redesign stuff that is already working, and then things
that used to work don't work anymore.

Libata is a good example. I had an older laptop that eventually got working
again - but the old IDE code wasn't studied enough to find out what had to be
brought forward and supported in libata.

Regards,
Steve
-- 

"They that give up essential liberty to obtain temporary safety,
deserve neither liberty nor safety."  (Ben Franklin)

"The course of history shows that as a government grows, liberty
decreases."  (Thomas Jefferson)

* Re: Reporting bugs and bisection
  2008-04-16 12:15                         ` Sverre Rabbelier
@ 2008-04-16 13:26                           ` Adrian Bunk
  2008-04-16 19:02                             ` Andrew Morton
                                               ` (2 more replies)
  2008-04-16 21:17                           ` Jesper Juhl
  1 sibling, 3 replies; 66+ messages in thread
From: Adrian Bunk @ 2008-04-16 13:26 UTC (permalink / raw)
  To: sverre
  Cc: git, linux-kernel, James Morris, Al Viro, Andrew Morton,
	Willy Tarreau, david, Stephen Clark, Evgeniy Polyakov,
	Rafael J. Wysocki, Tilman Schmidt, Valdis.Kletnieks, Mark Lord,
	David Miller, jesper.juhl, yoshfuji, jeff, netdev, David Newall

On Wed, Apr 16, 2008 at 02:15:22PM +0200, Sverre Rabbelier wrote:
> I'm not subscribed to the kernel mailing list, so please include me in
> the cc if you don't reply to the git list (which I am subscribed to).
> 
> Git is participating in Google Summer of Code this year and I've
> proposed to write a 'git statistics' command. This command would allow
> the user to gather data about a repository, ranging from "how active
> is dev x" to "what did x work on in the last 3 weeks". Its main
> feature however, would be an algorithm that ranks commits as being
> either 'buggy', 'bugfix' or 'enhancement'. (There are several clues
> that can aid in determining this, a commit msg along the lines of
> "fixes ..." being the most obvious.)
>...

At least with the data we have currently in git it's impossible to 
figure that out automatically.

E.g. if you look at commit f743d04dcfbeda7439b78802d35305781999aa11 
(ide/legacy/q40ide.c: add MODULE_LICENSE), how could you determine 
automatically that it is a bugfix, and the commit that introduced
the bug?

You can always get some data, but if you want to get usable statistics 
you need explicit tags in the commits, not some algorithm that tries 
to guess.

> Cheers,
> 
> Sverre Rabbelier

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

* Re: Reporting bugs and bisection
  2008-04-16 13:26                           ` Adrian Bunk
@ 2008-04-16 19:02                             ` Andrew Morton
  2008-04-16 19:43                               ` Sverre Rabbelier
                                                 ` (3 more replies)
  2008-04-16 19:39                             ` Sverre Rabbelier
  2008-04-16 20:04                             ` Willy Tarreau
  2 siblings, 4 replies; 66+ messages in thread
From: Andrew Morton @ 2008-04-16 19:02 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: sverre, git, linux-kernel, jmorris, viro, w, david, sclark46,
	johnpol, rjw, tilman, Valdis.Kletnieks, lkml, davem, jesper.juhl,
	yoshfuji, jeff, netdev, davidn

On Wed, 16 Apr 2008 16:26:34 +0300
Adrian Bunk <bunk@kernel.org> wrote:

> On Wed, Apr 16, 2008 at 02:15:22PM +0200, Sverre Rabbelier wrote:
> > I'm not subscribed to the kernel mailing list, so please include me in
> > the cc if you don't reply to the git list (which I am subscribed to).
> > 
> > Git is participating in Google Summer of Code this year and I've
> > proposed to write a 'git statistics' command. This command would allow
> > the user to gather data about a repository, ranging from "how active
> > is dev x" to "what did x work on in the last 3 weeks". Its main
> > feature however, would be an algorithm that ranks commits as being
> > either 'buggy', 'bugfix' or 'enhancement'. (There are several clues
> > that can aid in determining this, a commit msg along the lines of
> > "fixes ..." being the most obvious.)
> >...

Sounds like an interesting project.

> At least with the data we have currently in git it's impossible to 
> figure that out automatically.
> 
> E.g. if you look at commit f743d04dcfbeda7439b78802d35305781999aa11 
> (ide/legacy/q40ide.c: add MODULE_LICENSE), how could you determine 
> automatically that it is a bugfix, and the commit that introduced
> the bug?
> 
> You can always get some data, but if you want to get usable statistics 
> you need explicit tags in the commits, not some algorithm that tries 
> to guess.

Well yes.  One outcome of the project would be to tell us what changes we'd
need to make to our processes to make such data gathering more effective.

Of course, we may not actually implement such changes.  That would depend
upon how useful the output is to us.

* Re: Reporting bugs and bisection
  2008-04-16 13:26                           ` Adrian Bunk
  2008-04-16 19:02                             ` Andrew Morton
@ 2008-04-16 19:39                             ` Sverre Rabbelier
  2008-04-16 20:16                               ` Adrian Bunk
  2008-04-16 20:04                             ` Willy Tarreau
  2 siblings, 1 reply; 66+ messages in thread
From: Sverre Rabbelier @ 2008-04-16 19:39 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: git, linux-kernel, James Morris, Al Viro, Andrew Morton,
	Willy Tarreau, david, Stephen Clark, Evgeniy Polyakov,
	Rafael J. Wysocki, Tilman Schmidt, Valdis.Kletnieks, Mark Lord,
	David Miller, jesper.juhl, yoshfuji, jeff, netdev, David Newall

On Wed, Apr 16, 2008 at 3:26 PM, Adrian Bunk <bunk@kernel.org> wrote:
> On Wed, Apr 16, 2008 at 02:15:22PM +0200, Sverre Rabbelier wrote:
>  At least with the data we have currently in git it's impossible to
>  figure that out automatically.

I don't quite agree; as I explained in my proposal, there are several
ways to detect that a commit was a bugfix. From there you can deduce
that, if it was a bugfix, the commit that introduced the fixed
change was a bug! From there you can start sifting and get more
confirmations. Junio has made several suggestions as to how this could
be implemented and I'm confident that an algorithm can be devised
that is at least capable of 'guessing' what type a commit is. Aside
from the guessing part I think a lot of information can be gathered
from commit msgs.

Of course, some commits might not be classifiable (as there might
not be any 'follow up' information on them). Those commits can be
marked as 'unknown' and be ignored. Agreed, should all commits be
'unknown' then the command wouldn't be very useful, but large repos
in particular provide a very large dataset. As the size of the dataset
increases, I estimate that the correlation between commits increases
(fewer commits that add new code which is never changed
afterwards). The higher the degree of correlation between individual
commits, the more we can determine about the nature of a commit.

>  E.g. if you look at commit f743d04dcfbeda7439b78802d35305781999aa11
>  (ide/legacy/q40ide.c: add MODULE_LICENSE), how could you determine
>  automatically that it is a bugfix, and the commit that introduced
>  the bug?

Well, a dead giveaway would be:
"http://bugzilla.kernel.org/show_bug.cgi?id=10124"
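Spotting such a link is easy to automate; a sketch, assuming links always have exactly the form quoted above (the pattern name and `bugzilla_ids` are illustrative). The link only shows that a bugzilla entry is referenced; whether that entry is a bug report or a feature request still has to be looked up.

```python
import re

# Kernel bugzilla links in the form quoted above (assumed to be the only form).
BUGZILLA_LINK = re.compile(r"https?://bugzilla\.kernel\.org/show_bug\.cgi\?id=(\d+)")

def bugzilla_ids(message: str) -> list:
    """Return the bug numbers linked from a commit message, in order."""
    return [int(num) for num in BUGZILLA_LINK.findall(message)]
```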

>  You can always get some data, but if you want to get usable statistics
>  you need explicit tags in the commits, not some algorithm that tries
>  to guess.

As said above, I don't agree, you can 'guess' very reliably on a large
dataset. Also, most commits are already 'tagged' in some way or
another. The trick is to find the pattern in this tagging and use it.

I hope this clears things up a bit,

Cheers,

Sverre Rabbelier

* Re: Reporting bugs and bisection
  2008-04-16 19:02                             ` Andrew Morton
@ 2008-04-16 19:43                               ` Sverre Rabbelier
  2008-04-16 19:55                               ` Adrian Bunk
                                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 66+ messages in thread
From: Sverre Rabbelier @ 2008-04-16 19:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Adrian Bunk, git, linux-kernel, jmorris, viro, w, david, sclark46,
	johnpol, rjw, tilman, Valdis.Kletnieks, lkml, davem, jesper.juhl,
	yoshfuji, jeff, netdev, davidn

On Wed, Apr 16, 2008 at 9:02 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
>  Sounds like an interesting project.

Thank you :).

>  Well yes.  One outcome of the project would be to tell us what changes we'd
>  need to make to our processes to make such data gathering more effective.

I definitely agree here; the command's reliability could be increased
by always marking bugfixes in the same way. 'fixed-bug:', for
example, should be very recognizable.
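Extracting a marker line like the 'fixed-bug:' suggested here would be straightforward. A minimal sketch, assuming the marker sits on a line of its own in the form `Fixed-bug: <id>`; the exact spelling, the case-insensitive match and the name `fixed_bugs` are assumptions.

```python
import re

# 'Fixed-bug:' is only the marker proposed here, not an established convention.
FIXED_BUG = re.compile(r"^fixed-bug:[ \t]*(\S+)[ \t]*$", re.IGNORECASE | re.MULTILINE)

def fixed_bugs(message: str) -> list:
    """Return the values of all 'Fixed-bug:' lines in a commit message."""
    return FIXED_BUG.findall(message)
```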

>  Of course, we may not actually implement such changes.  That would depend
>  upon how useful the output is to us.

Ah yes, free will and whatnot. Then again, everybody already adds
'signed-off-by:'; if there were an easy command in git to mark a bugfix,
people would be more likely to use it. Perhaps with something like
'git commit -b 10256', which would automagically append a predefined
message to the commit, users would feel more inclined?
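`git commit -b` is only a hypothetical option in this message. The message-side half of it, appending a predefined line, might look like the sketch below; `add_fixed_bug` and the 'Fixed-bug:' wording are assumptions, and the trailer detection is deliberately rough.

```python
import re

# A line such as "Signed-off-by: Name <mail>" (a rough trailer test).
TRAILER = re.compile(r"^[A-Za-z][A-Za-z-]*: ")

def add_fixed_bug(message: str, bug: int) -> str:
    """Append 'Fixed-bug: <bug>' to a commit message.

    If the last paragraph already consists of trailer lines (and is not the
    subject paragraph), the new line joins it; otherwise it starts a new
    paragraph.
    """
    body = message.rstrip("\n")
    paragraphs = body.split("\n\n")
    last = paragraphs[-1].split("\n")
    in_trailers = len(paragraphs) > 1 and all(TRAILER.match(line) for line in last)
    separator = "\n" if in_trailers else "\n\n"
    return f"{body}{separator}Fixed-bug: {bug}\n"
```

The subject paragraph is excluded from the trailer check because subjects like "ide: fix oops" look like trailers too.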

Cheers,

Sverre Rabbelier

* Re: Reporting bugs and bisection
  2008-04-16 19:02                             ` Andrew Morton
  2008-04-16 19:43                               ` Sverre Rabbelier
@ 2008-04-16 19:55                               ` Adrian Bunk
  2008-04-17 13:50                                 ` J. Bruce Fields
  2008-04-16 19:58                               ` Alexey Dobriyan
  2008-04-16 20:01                               ` Arjan van de Ven
  3 siblings, 1 reply; 66+ messages in thread
From: Adrian Bunk @ 2008-04-16 19:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: sverre, git, linux-kernel, jmorris, viro, w, david, sclark46,
	johnpol, rjw, tilman, Valdis.Kletnieks, lkml, davem, jesper.juhl,
	yoshfuji, jeff, netdev, davidn

On Wed, Apr 16, 2008 at 12:02:47PM -0700, Andrew Morton wrote:
> On Wed, 16 Apr 2008 16:26:34 +0300
> Adrian Bunk <bunk@kernel.org> wrote:
> 
> > On Wed, Apr 16, 2008 at 02:15:22PM +0200, Sverre Rabbelier wrote:
> > > I'm not subscribed to the kernel mailing list, so please include me in
> > > the cc if you don't reply to the git list (which I am subscribed to).
> > > 
> > > Git is participating in Google Summer of Code this year and I've
> > > proposed to write a 'git statistics' command. This command would allow
> > > the user to gather data about a repository, ranging from "how active
> > > is dev x" to "what did x work on in the last 3 weeks". Its main
> > > feature however, would be an algorithm that ranks commits as being
> > > either 'buggy', 'bugfix' or 'enhancement'. (There are several clues
> > > that can aid in determining this, a commit msg along the lines of
> > > "fixes ..." being the most obvious.)
> > >...
> 
> Sounds like an interesting project.
> 
> > At least with the data we have currently in git it's impossible to 
> > figure that out automatically.
> > 
> > E.g. if you look at commit f743d04dcfbeda7439b78802d35305781999aa11 
> > (ide/legacy/q40ide.c: add MODULE_LICENSE), how could you determine 
> > automatically that it is a bugfix, and the commit that introduced
> > the bug?
> > 
> > You can always get some data, but if you want to get usable statistics 
> > you need explicit tags in the commits, not some algorithm that tries 
> > to guess.
> 
> Well yes.  One outcome of the project would be to tell us what changes we'd
> need to make to our processes to make such data gathering more effective.
> 
> Of course, we may not actually implement such changes.  That would depend
> upon how useful the output is to us.

That you can add this information through tags is clear, but according
to his SoC application that's not what he wants to do.

According to his application he wants to determine automatically whether 
a commit was a fix or whether a commit introduced a bug by doing stuff 
like tracking whether a changed line was modified again shortly after a 
commit.
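For illustration, the heuristic described above might be sketched as follows. It is an assumption-laden toy: `quickly_revisited` is a hypothetical name, the 14-day window is arbitrary, and lines are identified by (path, line number), which real history does not keep stable.

```python
from datetime import datetime, timedelta

def quickly_revisited(edits, window=timedelta(days=14)):
    """Return the shas whose changed lines were changed again within `window`.

    edits: iterable of (sha, timestamp, set of (path, line_number)) tuples.
    """
    last_touch = {}  # (path, line_number) -> (sha, timestamp) of latest change
    flagged = set()
    for sha, when, lines in sorted(edits, key=lambda e: e[1]):
        for key in lines:
            if key in last_touch:
                prev_sha, prev_when = last_touch[key]
                if when - prev_when <= window:
                    flagged.add(prev_sha)
            last_touch[key] = (sha, when)
    return flagged

if __name__ == "__main__":
    t0 = datetime(2008, 4, 1)
    history = [
        ("a1", t0, {("q40ide.c", 10)}),
        ("b2", t0 + timedelta(days=3), {("q40ide.c", 10)}),
        ("c3", t0 + timedelta(days=60), {("q40ide.c", 10)}),
    ]
    print(quickly_revisited(history))  # {'a1'}
```

Any follow-up change to the same lines, such as a cleanup or an enhancement, flags the earlier commit exactly as a bug fix would.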

This plan of his will simply not result in accurate numbers.

Sure, you will get some numbers, but if anyone were, e.g., to wrongly 
claim that 2% of my commits last year introduced bugs, I would get 
***really*** angry.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


* Re: Reporting bugs and bisection
  2008-04-16 19:02                             ` Andrew Morton
  2008-04-16 19:43                               ` Sverre Rabbelier
  2008-04-16 19:55                               ` Adrian Bunk
@ 2008-04-16 19:58                               ` Alexey Dobriyan
  2008-04-16 20:01                               ` Arjan van de Ven
  3 siblings, 0 replies; 66+ messages in thread
From: Alexey Dobriyan @ 2008-04-16 19:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Adrian Bunk, sverre, git, linux-kernel, jmorris, viro, w, david,
	sclark46, johnpol, rjw, tilman, Valdis.Kletnieks, lkml, davem,
	jesper.juhl, yoshfuji, jeff, netdev, davidn

On Wed, Apr 16, 2008 at 12:02:47PM -0700, Andrew Morton wrote:
> On Wed, 16 Apr 2008 16:26:34 +0300
> Adrian Bunk <bunk@kernel.org> wrote:
> 
> > On Wed, Apr 16, 2008 at 02:15:22PM +0200, Sverre Rabbelier wrote:
> > > I'm not subscribed to the kernel mailing list, so please include me in
> > > the cc if you don't reply to the git list (which I am subscribed to).
> > > 
> > > Git is participating in Google Summer of Code this year and I've
> > > proposed to write a 'git statistics' command. This command would allow
> > > the user to gather data about a repository, ranging from "how active
> > > is dev x" to "what did x work on in the last 3 weeks".

These are pointy-hairy questions.

> > > Its main
> > > feature however, would be an algorithm that ranks commits as being
> > > either 'buggy', 'bugfix' or 'enhancement'. (There are several clues
> > > that can aid in determining this, a commit msg along the lines of
> > > "fixes ..." being the most obvious.)
> > >...
> 
> Sounds like an interesting project.

The interesting (and answerable) questions are:

1) How many bugs a non-merge commit introduces on average
2) What the average time is between a buggy commit entering Linus's tree
   and its fix entering the same tree.
3) Graphs of #1 and #2 over time.
4) Rough division of bugs into classes a la refcounting, locking, hw, hw workaround.
5) If other OSes have such statistics, a comparison with them
   (little finger for this)

#1 alone can shred OSDL and LWN induced PDFs into innumerable pieces!
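Once bugs have been paired with the commits that introduced and fixed them, questions 1) and 2) reduce to simple arithmetic. A sketch, assuming those pairs already exist (obtaining them reliably is exactly what is disputed in this thread); `bug_stats` is a hypothetical helper.

```python
from datetime import datetime
from statistics import mean

def bug_stats(non_merge_commits, bugs):
    """Questions 1) and 2) above, given already-identified bugs.

    non_merge_commits: number of non-merge commits in the period examined.
    bugs: list of (introduced_at, fixed_at) datetimes, i.e. when the buggy
          commit and its fix each entered the tree.
    Returns (bugs per non-merge commit, mean days from bug to fix or None).
    """
    per_commit = len(bugs) / non_merge_commits
    delays = [(fixed - introduced).total_seconds() / 86400 for introduced, fixed in bugs]
    mean_days = mean(delays) if delays else None
    return per_commit, mean_days

if __name__ == "__main__":
    sample = [
        (datetime(2008, 1, 1), datetime(2008, 1, 11)),
        (datetime(2008, 2, 1), datetime(2008, 2, 21)),
    ]
    print(bug_stats(100, sample))  # (0.02, 15.0)
```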

* Re: Reporting bugs and bisection
  2008-04-16 19:02                             ` Andrew Morton
                                                 ` (2 preceding siblings ...)
  2008-04-16 19:58                               ` Alexey Dobriyan
@ 2008-04-16 20:01                               ` Arjan van de Ven
  3 siblings, 0 replies; 66+ messages in thread
From: Arjan van de Ven @ 2008-04-16 20:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Adrian Bunk, sverre, git, linux-kernel, jmorris, viro, w, david,
	sclark46, johnpol, rjw, tilman, Valdis.Kletnieks, lkml, davem,
	jesper.juhl, yoshfuji, jeff, netdev, davidn

On Wed, 16 Apr 2008 12:02:47 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> 
> > At least with the data we have currently in git it's impossible to 
> > figure that out automatically.
> > 
> > E.g. if you look at commit f743d04dcfbeda7439b78802d35305781999aa11 
> > (ide/legacy/q40ide.c: add MODULE_LICENSE), how could you determine 
> > automatically that it is a bugfix, and the commit that introduced
> > the bug?
> > 
> > You can always get some data, but if you want to get usable
> > statistics you need explicit tags in the commits, not some
> > algorithm that tries to guess.
> 
> Well yes.  One outcome of the project would be to tell us what
> changes we'd need to make to our processes to make such data
> gathering more effective.

also.. "what is a bugfix" is an interesting thing... for some things it's very easy.
For others.. it's really hard to draw a solid line where bugs stop and features start.
(for example, is a missing cpu id in oprofile a bugfix ("oprofile doesn't work") or 
a feature ("new cpu support"). This one is one of the more simple ones even...)

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

* Re: Reporting bugs and bisection
  2008-04-16 13:26                           ` Adrian Bunk
  2008-04-16 19:02                             ` Andrew Morton
  2008-04-16 19:39                             ` Sverre Rabbelier
@ 2008-04-16 20:04                             ` Willy Tarreau
  2008-04-16 20:55                               ` Jakub Narebski
  2 siblings, 1 reply; 66+ messages in thread
From: Willy Tarreau @ 2008-04-16 20:04 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: sverre, git, linux-kernel, James Morris, Al Viro, Andrew Morton,
	david, Stephen Clark, Evgeniy Polyakov, Rafael J. Wysocki,
	Tilman Schmidt, Valdis.Kletnieks, Mark Lord, David Miller,
	jesper.juhl, yoshfuji, jeff, netdev, David Newall

On Wed, Apr 16, 2008 at 04:26:34PM +0300, Adrian Bunk wrote:
> On Wed, Apr 16, 2008 at 02:15:22PM +0200, Sverre Rabbelier wrote:
> > I'm not subscribed to the kernel mailing list, so please include me in
> > the cc if you don't reply to the git list (which I am subscribed to).
> > 
> > Git is participating in Google Summer of Code this year and I've
> > proposed to write a 'git statistics' command. This command would allow
> > the user to gather data about a repository, ranging from "how active
> > is dev x" to "what did x work on in the last 3 weeks". Its main
> > feature however, would be an algorithm that ranks commits as being
> > either 'buggy', 'bugfix' or 'enhancement'. (There are several clues
> > that can aid in determining this, a commit msg along the lines of
> > "fixes ..." being the most obvious.)
> >...
> 
> At least with the data we have currently in git it's impossible to 
> figure that out automatically.
> 
> E.g. if you look at commit f743d04dcfbeda7439b78802d35305781999aa11 
> (ide/legacy/q40ide.c: add MODULE_LICENSE), how could you determine 
> automatically that it is a bugfix, and the commit that introduced
> the bug?
> 
> You can always get some data, but if you want to get usable statistics 
> you need explicit tags in the commits, not some algorithm that tries 
> to guess.

yes, and doing that would get back to the bureaucracy some people are
trying to reduce in order to save time to do the real work.

However, in another project of mine, I've got used to systematically
indicating the type of change in the subject line. It does not slow the
author down, and it appears in shortlogs. And quite amazingly, the
principle was immediately adopted by several contributors:

-----
Note to contributors: it's very handy when patches come with a properly
formatted subject. Try to put one of the following words between brackets
to indicate the importance of the patch followed by a short description:

[MINOR]    minor fix, very low risk of impact
[MEDIUM]   medium risk, may cause unexpected regressions of low importance or
           which may quickly be discovered
[MAJOR]    major risk of hidden regression. This happens when I rearrange large
           parts of code, when I play with timeouts, with variable
           initializations, etc...
[BUG]      fix for a minor or medium-level bug.
[CRITICAL] medium-term reliability or security is at risk, an upgrade is
           absolutely required.
[RELEASE]  release a new version
[BUILD]    fix build issues. If you could build, no upgrade required.
[CLEANUP]  code cleanup, silencing of warnings, etc... theoretically no impact
[TESTS]    added regression testing configuration files or scripts
[DOC]      documentation updates, no need to upgrade
[LICENSE]  licensing updates (may impact distro packagers)

Example: "[DOC] document options forwardfor to logasap"
-----
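Because the tag is the first bracketed word of the subject, a shortlog can be tallied by type mechanically. A minimal sketch, assuming the subject starts with exactly one tag from the list above; `parse_subject` and `tally` are illustrative names.

```python
import re
from collections import Counter

# The tags from the note above; anything else counts as untagged (None).
TAGS = {"MINOR", "MEDIUM", "MAJOR", "BUG", "CRITICAL", "RELEASE",
        "BUILD", "CLEANUP", "TESTS", "DOC", "LICENSE"}
SUBJECT = re.compile(r"^\[([A-Z]+)\]\s+(.*)$")

def parse_subject(subject):
    """Split '[TAG] description' into (TAG, description); (None, subject) otherwise."""
    m = SUBJECT.match(subject)
    if m and m.group(1) in TAGS:
        return m.group(1), m.group(2)
    return None, subject

def tally(subjects):
    """Count subjects per tag."""
    return Counter(parse_subject(s)[0] for s in subjects)
```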

Nothing is mandatory, and I (as the maintainer) can still choose to
adjust the prefix if I want. But in fact, I only had to do it when
contributors did not classify their patch themselves. Several other
tags may be added for LKML, such as "RFC" which is already used,
etc...

The advantages of this usage are multiple. Nothing needs to be changed
in the tools, no header needs to be added, it's still very compatible
with the mailing-list usages (and helps focusing on specific patches),
it's absolutely not mandatory and easily tweakable.

I'd like people in this thread not to forget that what we need is not
a fantastic tool to work around some developers' weaknesses, but cheap
(if any) help from the developers to help reviewers. I think that such
a proposal falls exactly in this category.

I'm quite ready to use it already (though I do not post often), and
think that it would still feel natural to many developers since most
of them are already used to such a format. I think it just requires
a few starters to get most of us to progressively use such a scheme
by default.

Regards,
Willy

* Re: Reporting bugs and bisection
  2008-04-16 19:39                             ` Sverre Rabbelier
@ 2008-04-16 20:16                               ` Adrian Bunk
  2008-04-16 20:53                                 ` Adrian Bunk
  0 siblings, 1 reply; 66+ messages in thread
From: Adrian Bunk @ 2008-04-16 20:16 UTC (permalink / raw)
  To: sverre
  Cc: git, linux-kernel, James Morris, Al Viro, Andrew Morton,
	Willy Tarreau, david, Stephen Clark, Evgeniy Polyakov,
	Rafael J. Wysocki, Tilman Schmidt, Valdis.Kletnieks, Mark Lord,
	David Miller, jesper.juhl, yoshfuji, jeff, netdev, David Newall

On Wed, Apr 16, 2008 at 09:39:41PM +0200, Sverre Rabbelier wrote:
> On Wed, Apr 16, 2008 at 3:26 PM, Adrian Bunk <bunk@kernel.org> wrote:
>...
> >  E.g. if you look at commit f743d04dcfbeda7439b78802d35305781999aa11
> >  (ide/legacy/q40ide.c: add MODULE_LICENSE), how could you determine
> >  automatically that it is a bugfix, and the commit that introduced
> >  the bug?
> 
> Well, a dead giveaway would be:
> "http://bugzilla.kernel.org/show_bug.cgi?id=10124"

Which could be "There is no driver for my TV card in the kernel."

> >  You can always get some data, but if you want to get usable statistics
> >  you need explicit tags in the commits, not some algorithm that tries
> >  to guess.
> 
> As said above, I don't agree, you can 'guess' very reliably on a large
> dataset. Also, most commits are already 'tagged' in some way or
> another. The trick is to find the pattern in this tagging and use it.
> 
> I hope this clears things up a bit,

I hope you are aware of the non-technical implications if the results 
don't match reality?

E.g. I am proud that my commits virtually never introduce bugs, so 
any results someone publishes about what I do had better be right
or my first thoughts are somewhere between "fist" and "lawyer". [1]

> Cheers,
> 
> Sverre Rabbelier

cu
Adrian

[1] my actual reaction might only be an angry email, but I hope you
    get the point that wrong results can really piss off people

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

* Re: Reporting bugs and bisection
  2008-04-16 20:16                               ` Adrian Bunk
@ 2008-04-16 20:53                                 ` Adrian Bunk
  2008-04-16 21:05                                   ` Sverre Rabbelier
  0 siblings, 1 reply; 66+ messages in thread
From: Adrian Bunk @ 2008-04-16 20:53 UTC (permalink / raw)
  To: sverre
  Cc: git, linux-kernel, James Morris, Al Viro, Andrew Morton,
	Willy Tarreau, david, Stephen Clark, Evgeniy Polyakov,
	Rafael J. Wysocki, Tilman Schmidt, Valdis.Kletnieks, Mark Lord,
	David Miller, jesper.juhl, yoshfuji, jeff, netdev, David Newall

On Wed, Apr 16, 2008 at 11:16:06PM +0300, Adrian Bunk wrote:
>...
> E.g. I am proud that my commits virtually never introduce bugs, so 
> any results someone publishes about what I do had better be right
> or my first thoughts are somewhere between "fist" and "lawyer". [1]
>...

To avoid any misunderstandings:

This is not in any way meant against you personally.

But saying things like "X% of your commits introduced bugs" is not a
friendly thing, and wrong data could be quite hurtful.

Especially in the open source world where much motivation comes from
people being proud of their work.

Even correct data can do harm.

And bad data can have really bad effects.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

* Re: Reporting bugs and bisection
  2008-04-16 20:04                             ` Willy Tarreau
@ 2008-04-16 20:55                               ` Jakub Narebski
  0 siblings, 0 replies; 66+ messages in thread
From: Jakub Narebski @ 2008-04-16 20:55 UTC (permalink / raw)
  To: git; +Cc: linux-kernel, netdev

Willy Tarreau wrote:

> Note to contributors: it's very handy when patches come with a properly
> formatted subject. Try to put one of the following words between brackets
> to indicate the importance of the patch followed by a short description:
> 
> [MINOR]    minor fix, very low risk of impact
> [MEDIUM]   medium risk, may cause unexpected regressions of low importance or
>            which may quickly be discovered

[...]

And git-am strips such prefixes, because it strips leading bracketed text
in order to remove [PATCH] and [PATCH n/m], which should be stripped.
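The conflict can be illustrated with a simplified emulation (not git's actual code): stripping every leading bracketed group also removes a tag like [DOC], while stripping only the groups that mention PATCH would keep it. Both functions are hypothetical illustrations.

```python
import re

# One or more leading "[...]" groups, plus the whitespace after them.
LEADING_BRACKETS = re.compile(r"^(\s*\[[^\]]*\])+\s*")
ONE_BRACKET = re.compile(r"\s*\[([^\]]*)\]\s*")

def strip_all_brackets(subject):
    """Drop every leading bracketed group, roughly what happens by default."""
    return LEADING_BRACKETS.sub("", subject)

def strip_patch_brackets_only(subject):
    """Drop leading bracketed groups only while they mention PATCH."""
    m = ONE_BRACKET.match(subject)
    while m and "PATCH" in m.group(1):
        subject = subject[m.end():]
        m = ONE_BRACKET.match(subject)
    return subject
```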

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

* Re: Reporting bugs and bisection
  2008-04-16 20:53                                 ` Adrian Bunk
@ 2008-04-16 21:05                                   ` Sverre Rabbelier
  2008-04-16 21:25                                     ` Adrian Bunk
  0 siblings, 1 reply; 66+ messages in thread
From: Sverre Rabbelier @ 2008-04-16 21:05 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: git, linux-kernel, James Morris, Al Viro, Andrew Morton,
	Willy Tarreau, david, Stephen Clark, Evgeniy Polyakov,
	Rafael J. Wysocki, Tilman Schmidt, Valdis.Kletnieks, Mark Lord,
	David Miller, jesper.juhl, yoshfuji, jeff, netdev, David Newall

On Wed, Apr 16, 2008 at 10:53 PM, Adrian Bunk <bunk@kernel.org> wrote:
>  To avoid any misunderstandings:
>
>  This is not in any way meant against you personally.

Thanks for pointing it out, I wasn't quite sure, but assumed that :).

>  But saying things like "X% of your commits introduced bugs" is not a
>  friendly thing, and wrong data could be quite hurtful.

Yes, it could be, and I agree that conclusions shouldn't be based on
the details, but on the bigger picture. Also, I think it should (at
first) be used mainly as an indicator of where attention might be
required. I mean, if it points out that one contributor almost always
commits buggy code, you don't have to present them with those
statistics right away. Instead you can ask the program what it bases
its conclusions on, and research them yourself. If it does indeed
turn out that they are slacking that much, you have good grounds to have
a talk with them.

>  Especially in the open source world where much motivation comes from
>  people being proud of their work.

Yes, that is very true, I very much agree with that, but on the other
hand it might also point out contributors whose particular skill in a
certain area had previously gone unnoticed. As with all statistics,
it's up to interpretation; misinterpreting statistics can -always-
have bad effects.

>  Even correct data can do harm.
>
>  And bad data can have really bad effects.

True, both, but as said, if properly interpreted it could be very useful.

Cheers,

Sverre Rabbelier

* Re: Reporting bugs and bisection
  2008-04-16 12:15                         ` Sverre Rabbelier
  2008-04-16 13:26                           ` Adrian Bunk
@ 2008-04-16 21:17                           ` Jesper Juhl
  2008-04-17 17:04                             ` David Newall
  1 sibling, 1 reply; 66+ messages in thread
From: Jesper Juhl @ 2008-04-16 21:17 UTC (permalink / raw)
  To: sverre
  Cc: git, linux-kernel, James Morris, Al Viro, Andrew Morton,
	Willy Tarreau, david, Stephen Clark, Evgeniy Polyakov,
	Rafael J. Wysocki, Tilman Schmidt, Valdis.Kletnieks, Mark Lord,
	David Miller, yoshfuji, jeff, netdev, David Newall

On 16/04/2008, Sverre Rabbelier <alturin@gmail.com> wrote:
...
>  Git is participating in Google Summer of Code this year and I've
>  proposed to write a 'git statistics' command. This command would allow
>  the user to gather data about a repository, ranging from "how active
>  is dev x" to "what did x work on in the last 3 weeks". Its main
>  feature however, would be an algorithm that ranks commits as being
>  either 'buggy', 'bugfix' or 'enhancement'.

Interesting. Just be careful that results are produced for the big picture
and not used to point fingers at individuals.

>(There are several clues
>  that can aid in determining this, a commit msg along the lines of
>  "fixes ..." being the most obvious.)

One thing I thought of is that the more "Acked-by", "Reviewed-by" and
"Signed-off-by" lines a patch has, the better reviewed we can probably
assume it to be and thus the probability of it having introduced a bug
probably drops slightly compared to other less-reviewed patches... or
maybe not, but at least it's something to think about :-)
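Counting those lines is trivial; a sketch, assuming the tags appear at the start of their own lines in the usual `Tag: Name <mail>` form. Whether a low count actually predicts bugs is the open question raised here, so `review_count` and `least_reviewed_first` (hypothetical names) only produce an ordering to look at, not a verdict.

```python
import re

# The three tags named above, each at the start of its own line.
REVIEW_LINE = re.compile(r"^(?:Acked-by|Reviewed-by|Signed-off-by):", re.MULTILINE)

def review_count(message: str) -> int:
    """Number of Acked-by/Reviewed-by/Signed-off-by lines in a message."""
    return len(REVIEW_LINE.findall(message))

def least_reviewed_first(messages):
    """Order commit messages so the ones with the fewest such lines come first."""
    return sorted(messages, key=review_count)
```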


-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

* Re: Reporting bugs and bisection
  2008-04-16 21:05                                   ` Sverre Rabbelier
@ 2008-04-16 21:25                                     ` Adrian Bunk
  0 siblings, 0 replies; 66+ messages in thread
From: Adrian Bunk @ 2008-04-16 21:25 UTC (permalink / raw)
  To: sverre
  Cc: git, linux-kernel, James Morris, Al Viro, Andrew Morton,
	Willy Tarreau, david, Stephen Clark, Evgeniy Polyakov,
	Rafael J. Wysocki, Tilman Schmidt, Valdis.Kletnieks, Mark Lord,
	David Miller, jesper.juhl, yoshfuji, jeff, netdev, David Newall

On Wed, Apr 16, 2008 at 11:05:17PM +0200, Sverre Rabbelier wrote:
> On Wed, Apr 16, 2008 at 10:53 PM, Adrian Bunk <bunk@kernel.org> wrote:
> >  To avoid any misunderstandings:
> >
> >  This is not in any way meant against you personally.
> 
> Thanks for pointing it out, I wasn't quite sure, but assumed that :).

Sorry, I was overreacting a bit, since I too often see people putting 
some data into some statistics or graph and drawing conclusions without 
paying attention to whether their data allows these conclusions at all.

> >  But saying things like "X% of your commits introduced bugs" is not a
> >  friendly thing, and wrong data could be quite hurtful.
> 
> Yes, it could be, and I agree that conclusions shouldn't be based on
> the details, but on the bigger picture. Also, I think it should (at
> first) be used mainly as an indicator, of where attention might be
> required. I mean, if it points out that one contributor almost always
> commits buggy code,

I would assume that in all projects the main maintainers already have an 
impression of how good the quality of the patches of each main 
contributor is.

In much more complex ways than a number could express.

> you don't have to present them with those
> statistics right away. Instead you can ask the program where it bases
> its conclusions on, and research them yourself.

Sooner or later someone will run the program for the Linux kernel, 
write a paper about the results, and publish his research somewhere.

>...
> Cheers,
> 
> Sverre Rabbelier

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


* Re: Reporting bugs and bisection
  2008-04-16 19:55                               ` Adrian Bunk
@ 2008-04-17 13:50                                 ` J. Bruce Fields
  2008-04-17 15:26                                   ` Adrian Bunk
  0 siblings, 1 reply; 66+ messages in thread
From: J. Bruce Fields @ 2008-04-17 13:50 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Andrew Morton, sverre, git, linux-kernel, jmorris, viro, w, david,
	sclark46, johnpol, rjw, tilman, Valdis.Kletnieks, lkml, davem,
	jesper.juhl, yoshfuji, jeff, netdev, davidn

On Wed, Apr 16, 2008 at 10:55:03PM +0300, Adrian Bunk wrote:
> On Wed, Apr 16, 2008 at 12:02:47PM -0700, Andrew Morton wrote:
> > On Wed, 16 Apr 2008 16:26:34 +0300
> > Adrian Bunk <bunk@kernel.org> wrote:
> > 
> > > On Wed, Apr 16, 2008 at 02:15:22PM +0200, Sverre Rabbelier wrote:
> > > > I'm not subscribed to the kernel mailing list, so please include me in
> > > > the cc if you don't reply to the git list (which I am subscribed to).
> > > > 
> > > > Git is participating in Google Summer of Code this year and I've
> > > > proposed to write a 'git statistics' command. This command would allow
> > > > the user to gather data about a repository, ranging from "how active
> > > > is dev x" to "what did x work on in the last 3 weeks". Its main
> > > > feature however, would be an algorithm that ranks commits as being
> > > > either 'buggy', 'bugfix' or 'enhancement'. (There are several clues
> > > > that can aid in determining this, a commit msg along the lines of
> > > > "fixes ..." being the most obvious.)
> > > >...
> > 
> > Sounds like an interesting project.
> > 
> > > At least with the data we have currently in git it's impossible to 
> > > figure that out automatically.
> > > 
> > > E.g. if you look at commit f743d04dcfbeda7439b78802d35305781999aa11 
> > > (ide/legacy/q40ide.c: add MODULE_LICENSE), how could you determine 
> > > automatically that it is a bugfix, and the commit that introduced
> > > the bug?
> > > 
> > > You can always get some data, but if you want to get usable statistics 
> > > you need explicit tags in the commits, not some algorithm that tries 
> > > to guess.
> > 
> > Well yes.  One outcome of the project would be to tell us what changes we'd
> > need to make to our processes to make such data gathering more effective.
> > 
> > Of course, we may not actually implement such changes.  That would depend
> > upon how useful the output is to us.
> 
> That you can add this information through tags is clear, but according
> to his SoC application that's not what he wants to do.
> 
> According to his application he wants to determine automatically whether 
> a commit was a fix or whether a commit introduced a bug by doing stuff 
> like tracking whether a changed line was modified again shortly after a 
> commit.
> 
> This plan of his will simply not result in accurate numbers.

They won't be completely accurate, but who knows, maybe they'd turn out
to have a higher rate of accuracy than we'd expect.  (I assume you could
do a closer manual study of a small random sample of the results to
estimate the accuracy.)  Seems worth a try.
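
One way to do the manual check suggested above, sketched under assumptions: the classifier output is a plain list, a person marks each sampled item right or wrong by hand, and a Wilson score interval (a standard binomial confidence interval, swapped in here as one reasonable choice) turns that hand count into an accuracy range. The function names are hypothetical.

```python
import math
import random


def sample_for_review(results, k, seed=0):
    """Pick k classified commits uniformly at random for manual checking.

    A fixed seed keeps the sample reproducible, so others can re-check it.
    """
    rng = random.Random(seed)
    return rng.sample(results, k)


def wilson_interval(correct, n, z=1.96):
    """Approximate 95% Wilson score interval for the true accuracy.

    correct: how many of the n hand-checked labels turned out right.
    """
    if n <= 0 or not 0 <= correct <= n:
        raise ValueError("need 0 <= correct <= n and n > 0")
    p = correct / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half
```

For example, if a reviewer finds 42 of 50 sampled labels correct, wilson_interval(42, 50) gives roughly 0.71 to 0.92, which says more about how far to trust the numbers than the raw 84% does.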

> Sure, you will get some numbers, but if anyone would e.g. wrongly accuse 
> me that 2% of my commits last year introduced bugs I would get 
> ***really*** angry.

It's just an experiment; reasonable people won't take it as the final
word.

--b.

* Re: Reporting bugs and bisection
  2008-04-17 13:50                                 ` J. Bruce Fields
@ 2008-04-17 15:26                                   ` Adrian Bunk
  0 siblings, 0 replies; 66+ messages in thread
From: Adrian Bunk @ 2008-04-17 15:26 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Andrew Morton, sverre, git, linux-kernel, jmorris, viro, w, david,
	sclark46, johnpol, rjw, tilman, Valdis.Kletnieks, lkml, davem,
	jesper.juhl, yoshfuji, jeff, netdev, davidn

On Thu, Apr 17, 2008 at 09:50:13AM -0400, J. Bruce Fields wrote:
> On Wed, Apr 16, 2008 at 10:55:03PM +0300, Adrian Bunk wrote:
>...
> > Sure, you will get some numbers, but if anyone would e.g. wrongly accuse 
> > me that 2% of my commits last year introduced bugs I would get 
> > ***really*** angry.
> 
> It's just an experiment; reasonable people won't take it as the final
> word.

Take e.g. [1] as an example how git statistics about the Linux kernel 
are already used to "prove" things that aren't true.

> --b.

cu
Adrian

[1] http://digitalvampire.org/blog/index.php/2008/04/11/lies-d-oh-forget-it/

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

* Re: Reporting bugs and bisection
  2008-04-16 21:17                           ` Jesper Juhl
@ 2008-04-17 17:04                             ` David Newall
  2008-04-17 19:09                               ` Rafael J. Wysocki
  0 siblings, 1 reply; 66+ messages in thread
From: David Newall @ 2008-04-17 17:04 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: sverre, git, linux-kernel, James Morris, Al Viro, Andrew Morton,
	Willy Tarreau, david, Stephen Clark, Evgeniy Polyakov,
	Rafael J. Wysocki, Tilman Schmidt, Valdis.Kletnieks, Mark Lord,
	David Miller, yoshfuji, jeff, netdev

Jesper Juhl wrote:
> Interesting. Just be careful results are produced for the big picture
> and not used to point fingers at individuals.
>   

If there are individuals at whom a finger needs to be pointed, this
system will highlight them, and fingers will (and should) be pointed. 
Contributors of poor-quality code need to be weeded-out. 
Finger-pointing, in these extreme cases, gives incentive to improve
quality.  It's a positive thing.

* Re: Reporting bugs and bisection
  2008-04-17 17:04                             ` David Newall
@ 2008-04-17 19:09                               ` Rafael J. Wysocki
  2008-04-17 19:35                                 ` Ray Lee
  0 siblings, 1 reply; 66+ messages in thread
From: Rafael J. Wysocki @ 2008-04-17 19:09 UTC (permalink / raw)
  To: David Newall
  Cc: Jesper Juhl, sverre, git, linux-kernel, James Morris, Al Viro,
	Andrew Morton, Willy Tarreau, david, Stephen Clark,
	Evgeniy Polyakov, Tilman Schmidt, Valdis.Kletnieks, Mark Lord,
	David Miller, yoshfuji, jeff, netdev

On Thursday, 17 of April 2008, David Newall wrote:
> Jesper Juhl wrote:
> > Interesting. Just be careful results are produced for the big picture
> > and not used to point fingers at individuals.
> >   
> 
> If there are individuals at whom a finger needs to be pointed, this
> system will highlight them, and fingers will (and should) be pointed. 
> Contributors of poor-quality code need to be weeded-out.

Define poor quality.
 
> Finger-pointing, in these extreme cases, gives incentive to improve
> quality.  It's a positive thing.

Sorry, but I have to disagree.  Negative finger-pointing is never a good thing.
Also, it doesn't give any incentive to anyone.  It only makes people feel bad
and finally discourages them from contributing anything.

If you want to give people incentives, reward them for doing things you'd like
them to do.

Thanks,
Rafael

* Re: Reporting bugs and bisection
  2008-04-17 19:09                               ` Rafael J. Wysocki
@ 2008-04-17 19:35                                 ` Ray Lee
  2008-04-17 19:57                                   ` Sverre Rabbelier
  2008-04-17 20:16                                   ` Al Viro
  0 siblings, 2 replies; 66+ messages in thread
From: Ray Lee @ 2008-04-17 19:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: David Newall, Jesper Juhl, sverre, git, linux-kernel,
	James Morris, Al Viro, Andrew Morton, Willy Tarreau, david,
	Stephen Clark, Evgeniy Polyakov, Tilman Schmidt, Valdis.Kletnieks,
	Mark Lord, David Miller, yoshfuji, jeff, netdev

On Thu, Apr 17, 2008 at 12:09 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>  > Finger-pointing, in these extreme cases, gives incentive to improve
>  > quality.  It's a positive thing.
>
>  Sorry, but I have to disagree.  Negative finger-pointing is never a good thing.

Correct, but let's be careful here. The original suggestion was,
effectively, to get better metrics on the quality of contributions.
Those metrics *could* be used for finger pointing, or (my preference)
they could be used to direct and allocate our scarce resources: code
reviews and mentoring.

There's no way to know what the metrics will tell us until we have
them. Arguing against metrics because they *may* be used to point
fingers at people is a silly argument; anything can be subverted to do
that.

Let's get some measurements and see what they say. In the meantime,
try to believe that they could be put to good purposes, such as
identifying code areas that are tricky for contributors to get right
(independent of contributor), or contributors that could benefit from
code reviews, etc.

* Re: Reporting bugs and bisection
  2008-04-17 19:35                                 ` Ray Lee
@ 2008-04-17 19:57                                   ` Sverre Rabbelier
  2008-04-17 20:16                                   ` Al Viro
  1 sibling, 0 replies; 66+ messages in thread
From: Sverre Rabbelier @ 2008-04-17 19:57 UTC (permalink / raw)
  To: Ray Lee
  Cc: Rafael J. Wysocki, David Newall, Jesper Juhl, git, linux-kernel,
	James Morris, Al Viro, Andrew Morton, Willy Tarreau, david,
	Stephen Clark, Evgeniy Polyakov, Tilman Schmidt, Valdis.Kletnieks,
	Mark Lord, David Miller, yoshfuji, jeff, netdev

On Thu, Apr 17, 2008 at 9:35 PM, Ray Lee <ray-lk@madrabbit.org> wrote:
> On Thu, Apr 17, 2008 at 12:09 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>  >  > Finger-pointing, in these extreme cases, gives incentive to improve
>  >  > quality.  It's a positive thing.
>  >
>  >  Sorry, but I have to disagree.  Negative finger-pointing is never a good thing.
>
>  Correct, but let's be careful here. The original suggestion was,
>  effectively, to get better metrics on the quality of contributions.
>  Those metrics *could* be used for finger pointing, or (my preference)
>  they could be used to direct and allocate our scarce resources: code
>  reviews and mentoring.

Exactly!

>  There's no way to know what the metrics will tell us until we have
>  them. Arguing against metrics because they *may* be used to point
>  fingers at people is a silly argument; anything can be subverted to do
>  that.

Thank you, that should have been said before, you worded it perfectly.

>  Let's get some measurements and see what they say. In the meantime,
>  try to believe that they could be put to good purposes, such as
>  identifying code areas that are tricky for contributors to get right
>  (independent of contributor), or contributors that could benefit from
>  code reviews, etc.

This in particular is an area I plan to focus on, and that part should
be very reliable when finished. As described in my application, I plan
to look at how often a piece of code is changed, over what timespan,
and by how many different authors.
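
A rough sketch of those three per-file measures (number of changes, the timespan they cover, and the number of distinct authors), assuming the history has already been flattened into (path, author, commit time) records. How those records are pulled out of git is not shown, and the names below are hypothetical, not part of any existing git command.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Optional, Set, Tuple

# One record per (commit, touched path): (path, author, commit time).
Change = Tuple[str, str, datetime]


@dataclass
class Churn:
    changes: int = 0
    authors: Set[str] = field(default_factory=set)
    first: Optional[datetime] = None
    last: Optional[datetime] = None

    @property
    def span(self) -> timedelta:
        """Time between the earliest and latest change seen for this path."""
        if self.first is None or self.last is None:
            return timedelta(0)
        return self.last - self.first


def churn_by_path(changes: Iterable[Change]) -> Dict[str, Churn]:
    """Aggregate change count, author set and time range per path."""
    stats: Dict[str, Churn] = defaultdict(Churn)
    for path, author, when in changes:
        s = stats[path]
        s.changes += 1
        s.authors.add(author)
        s.first = when if s.first is None else min(s.first, when)
        s.last = when if s.last is None else max(s.last, when)
    return dict(stats)


def hot_spots(stats: Dict[str, Churn], top: int = 10) -> List[str]:
    """Paths ordered by change count, then by number of distinct authors."""
    ranked = sorted(
        stats,
        key=lambda p: (stats[p].changes, len(stats[p].authors)),
        reverse=True,
    )
    return ranked[:top]
```

Ranking by raw change count is only one possible choice; dividing by the timespan, or looking only at changes that follow closely after one another, would be equally plausible variants to compare against what maintainers already know.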

Thanks for the reply!

Cheers,

Sverre

* Re: Reporting bugs and bisection
  2008-04-17 19:35                                 ` Ray Lee
  2008-04-17 19:57                                   ` Sverre Rabbelier
@ 2008-04-17 20:16                                   ` Al Viro
  2008-04-17 20:38                                     ` Ray Lee
  1 sibling, 1 reply; 66+ messages in thread
From: Al Viro @ 2008-04-17 20:16 UTC (permalink / raw)
  To: Ray Lee
  Cc: Rafael J. Wysocki, David Newall, Jesper Juhl, sverre, git,
	linux-kernel, James Morris, Andrew Morton, Willy Tarreau, david,
	Stephen Clark, Evgeniy Polyakov, Tilman Schmidt, Valdis.Kletnieks,
	Mark Lord, David Miller, yoshfuji, jeff, netdev

On Thu, Apr 17, 2008 at 12:35:12PM -0700, Ray Lee wrote:
> On Thu, Apr 17, 2008 at 12:09 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >  > Finger-pointing, in these extreme cases, gives incentive to improve
> >  > quality.  It's a positive thing.
> >
> >  Sorry, but I have to disagree.  Negative finger-pointing is never a good thing.
> 
> Correct, but let's be careful here. The original suggestion was,
> effectively, to get better metrics on the quality of contributions.

	There already is one: reputation with people working on the tree,
be it actively modifying/reviewing/bug hunting/etc.  _We_ _already_ _know_;
generally one gets a decent idea of what to expect pretty soon.

	And frankly, that's the only thing that matters anyway; I suspect
I'd do rather well by proposed criteria, but you know what?  I don't give
a flying f*ck through the rolling doughnut for self-appointed PHBs and
their idea of performance reviews.

	Think of it as a modified Turing test: convince me that you are
not a script piped through an Eng.Lit. wanker or an MBA, then I might care
for your opinion.

	Al, who never had problems with pointing fingers and laughing, but
likes an informed human brain to be the source of it...

* Re: Reporting bugs and bisection
  2008-04-17 20:16                                   ` Al Viro
@ 2008-04-17 20:38                                     ` Ray Lee
  2008-04-17 20:53                                       ` Al Viro
  0 siblings, 1 reply; 66+ messages in thread
From: Ray Lee @ 2008-04-17 20:38 UTC (permalink / raw)
  To: Al Viro
  Cc: Rafael J. Wysocki, David Newall, Jesper Juhl, sverre, git,
	linux-kernel, James Morris, Andrew Morton, Willy Tarreau, david,
	Stephen Clark, Evgeniy Polyakov, Tilman Schmidt, Valdis.Kletnieks,
	Mark Lord, David Miller, yoshfuji, jeff, netdev

On Thu, Apr 17, 2008 at 1:16 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Thu, Apr 17, 2008 at 12:35:12PM -0700, Ray Lee wrote:
>  > On Thu, Apr 17, 2008 at 12:09 PM, Rafael J. Wysocki <rjw@sisk.pl> wrote:
>  > >  > Finger-pointing, in these extreme cases, gives incentive to improve
>  > >  > quality.  It's a positive thing.
>  > >
>  > >  Sorry, but I have to disagree.  Negative finger-pointing is never a good thing.
>  >
>  > Correct, but let's be careful here. The original suggestion was,
>  > effectively, to get better metrics on the quality of contributions.
>
>         There already is one: reputation with people working on the tree,
>  be it actively modifying/reviewing/bug hunting/etc.  _We_ _already_ _know_;

Sigh. No, you already know. I don't. This is not a rhetorical point.
I've just bid out another project that'd involve getting linux running
on another embedded hardware platform. If that happens, I get to spend
paid time to work on the kernel, and as a by-product spend more time
looking at patches and code coming across the list.

So, where would it be best to spend my time? Or anyone else's?

>  generally one gets a decent idea of what to expect pretty soon.
>
>         And frankly, that's the only thing that matters anyway; I suspect
>  I'd do rather well by proposed criteria, but you know what?  I don't give
>  a flying f*ck through the rolling doughnut for self-appointed PHBs and
>  their idea of performance reviews.

(Geez, conflate the issue much?) No one is saying you should. But
also, I haven't seen anyone saying it'd be used for performance
reviews other than you.

>         Think of it as a modified Turing test: convince me that you are
>  not a script piped through an Eng.Lit. wanker or an MBA, then I might care
>  for your opinion.

<shrug> Shockingly enough, I actually don't care. I'm just trying to
scratch my own itch, which is to figure out where in the kernel (if
anywhere!) it'd be best to donate my time.

And your point is likely about the metrics, and yes, they'll be
computer-generated. So? Perhaps they'll be crap. Who knows until we
look at them and match them up with what everyone already knows? If,
by some one-in-a-thousand chance, they turn out to be good and useful,
then they'll either be a one-off eye-opener, or perhaps something useful
more than once.

Who knows? And to the larger point, why put effort into stopping
someone else from finding out?

>         Al, who never had problems with pointing fingers and laughing, but
>  likes an informed human brain to be the source of it...

<shrug> Shame and Guilt, two major motivators of human behavior, it's
true. But, one last time, *you're* the one saying the stats would be
used for finger pointing at people. Perhaps, instead, the stats will
show that we should all collectively point our fingers at some random
area in the tree, where everyone, despite their track record, ends up
making mistakes.

Let the kid find out, that's all I'm saying.

* Re: Reporting bugs and bisection
  2008-04-17 20:38                                     ` Ray Lee
@ 2008-04-17 20:53                                       ` Al Viro
  2008-04-17 21:01                                         ` Ray Lee
  0 siblings, 1 reply; 66+ messages in thread
From: Al Viro @ 2008-04-17 20:53 UTC (permalink / raw)
  To: Ray Lee
  Cc: Rafael J. Wysocki, David Newall, Jesper Juhl, sverre, git,
	linux-kernel, James Morris, Andrew Morton, Willy Tarreau, david,
	Stephen Clark, Evgeniy Polyakov, Tilman Schmidt, Valdis.Kletnieks,
	Mark Lord, David Miller, yoshfuji, jeff, netdev

On Thu, Apr 17, 2008 at 01:38:18PM -0700, Ray Lee wrote:
> >         And frankly, that's the only thing that matters anyway; I suspect
> >  I'd do rather well by proposed criteria, but you know what?  I don't give
> >  a flying f*ck through the rolling doughnut for self-appointed PHBs and
> >  their idea of performance reviews.
> 
> (Geez, conflate the issue much?) No one is saying you should. But
> also, I haven't seen anyone saying it'd be used for performance
> reviews other than you.

|| If there are individuals at whom a finger needs to be pointed, this  
|| system will highlight them, and fingers will (and should) be pointed.
|| Contributors of poor-quality code need to be weeded-out.

in this thread (From: David Newall).

> <shrug> Shame and Guilt, two major motivators of human behavior, it's
> true. But, one last time, *you're* the one saying the stats would be
> used for finger pointing at people.

Not really.  Unless you are trying to imply that David is my sock puppet, that
is...

* Re: Reporting bugs and bisection
  2008-04-17 20:53                                       ` Al Viro
@ 2008-04-17 21:01                                         ` Ray Lee
  0 siblings, 0 replies; 66+ messages in thread
From: Ray Lee @ 2008-04-17 21:01 UTC (permalink / raw)
  To: Al Viro
  Cc: Rafael J. Wysocki, David Newall, Jesper Juhl, sverre, git,
	linux-kernel, James Morris, Andrew Morton, Willy Tarreau, david,
	Stephen Clark, Evgeniy Polyakov, Tilman Schmidt, Valdis.Kletnieks,
	Mark Lord, David Miller, yoshfuji, jeff, netdev

On Thu, Apr 17, 2008 at 1:53 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Thu, Apr 17, 2008 at 01:38:18PM -0700, Ray Lee wrote:
>  > >         And frankly, that's the only thing that matters anyway; I suspect
>  > >  I'd do rather well by proposed criteria, but you know what?  I don't give
>  > >  a flying f*ck through the rolling doughnut for self-appointed PHBs and
>  > >  their idea of performance reviews.
>  >
>  > (Geez, conflate the issue much?) No one is saying you should. But
>  > also, I haven't seen anyone saying it'd be used for performance
>  > reviews other than you.
>
>
> || If there are individuals at whom a finger needs to be pointed, this
>  || system will highlight them, and fingers will (and should) be pointed.
>  || Contributors of poor-quality code need to be weeded-out.
>
>  in this thread (From: David Newall).

Ah, I failed reading comprehension, yet again. Well, sounds like you
have a beef to take up with David, then. That's still not an argument
against trying to gather statistics and to see if they're worth
anything.

>  > <shrug> Shame and Guilt, two major motivators of human behavior, it's
>  > true. But, one last time, *you're* the one saying the stats would be
>  > used for finger pointing at people.
>
>  Not really.  Unless you are trying to imply that David is my sock puppet, that
>  is...

Momentarily amusing to think so, but no :-).

end of thread, other threads:[~2008-04-17 21:02 UTC | newest]

Thread overview: 66+ messages
     [not found] <47FEADCB.7070104@rtr.ca>
     [not found] ` <20080413121831.d89dd424.akpm@linux-foundation.org>
     [not found]   ` <20080413202118.GA29658@2ka.mipt.ru>
     [not found]     ` <200804132233.50491.rjw@sisk.pl>
     [not found]       ` <20080413205406.GA9190@2ka.mipt.ru>
     [not found]         ` <48028830.6020703@earthlink.net>
2008-04-13 23:51           ` Reporting bugs and bisection david
2008-04-14  0:36             ` Jakub Narebski
2008-04-14  4:39             ` Willy Tarreau
2008-04-14  5:39               ` Al Viro
2008-04-14  6:24                 ` Andrew Morton
2008-04-14  6:39                   ` David Miller
2008-04-14  6:43                     ` David Miller
2008-04-14  7:23                   ` Al Viro
2008-04-14  7:43                     ` Al Viro
2008-04-14  8:04                     ` Andrew Morton
2008-04-14  8:30                       ` David Miller
2008-04-14  9:06                         ` Christoph Hellwig
2008-04-14  9:46                         ` Andi Kleen
2008-04-15  5:25                           ` Bill Fink
2008-04-14 10:15                         ` Andrew Morton
2008-04-14 10:41                           ` David Miller
2008-04-14 17:35                             ` Roman Shaposhnik
2008-04-14 12:08                       ` Adrian Bunk
2008-04-14 14:43                       ` Arjan van de Ven
2008-04-14 17:51                         ` Andrew Morton
2008-04-14 18:24                           ` Arjan van de Ven
2008-04-14 19:30                           ` Ilpo Järvinen
2008-04-14 15:54                     ` James Morris
2008-04-14 22:01                       ` David Miller
2008-04-14 23:05                         ` Andrew Morton
2008-04-15  4:55                           ` Willy Tarreau
2008-04-15 13:18                             ` Work WAS(Re: " jamal
2008-04-15  9:33                       ` David Newall
2008-04-15  9:54                         ` Michael Kerrisk
2008-04-15 14:04                           ` David Newall
2008-04-15 20:51                             ` Rafael J. Wysocki
2008-04-16  2:34                               ` David Newall
2008-04-16  3:53                                 ` david
2008-04-16  9:06                                   ` David Newall
2008-04-16 11:02                                     ` Andi Kleen
2008-04-16 12:41                                   ` Stephen Clark
2008-04-16  4:29                                 ` Willy Tarreau
2008-04-16 12:13                                   ` Rafael J. Wysocki
2008-04-16 12:15                         ` Sverre Rabbelier
2008-04-16 13:26                           ` Adrian Bunk
2008-04-16 19:02                             ` Andrew Morton
2008-04-16 19:43                               ` Sverre Rabbelier
2008-04-16 19:55                               ` Adrian Bunk
2008-04-17 13:50                                 ` J. Bruce Fields
2008-04-17 15:26                                   ` Adrian Bunk
2008-04-16 19:58                               ` Alexey Dobriyan
2008-04-16 20:01                               ` Arjan van de Ven
2008-04-16 19:39                             ` Sverre Rabbelier
2008-04-16 20:16                               ` Adrian Bunk
2008-04-16 20:53                                 ` Adrian Bunk
2008-04-16 21:05                                   ` Sverre Rabbelier
2008-04-16 21:25                                     ` Adrian Bunk
2008-04-16 20:04                             ` Willy Tarreau
2008-04-16 20:55                               ` Jakub Narebski
2008-04-16 21:17                           ` Jesper Juhl
2008-04-17 17:04                             ` David Newall
2008-04-17 19:09                               ` Rafael J. Wysocki
2008-04-17 19:35                                 ` Ray Lee
2008-04-17 19:57                                   ` Sverre Rabbelier
2008-04-17 20:16                                   ` Al Viro
2008-04-17 20:38                                     ` Ray Lee
2008-04-17 20:53                                       ` Al Viro
2008-04-17 21:01                                         ` Ray Lee
2008-04-14 19:13                   ` Rene Herman
2008-04-14 20:38                     ` Andrew Morton
2008-04-14 22:18                       ` Rene Herman
