* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 2:30 ` Linus Torvalds
@ 2008-03-28 3:24 ` Christoph Lameter
2008-03-28 4:00 ` Linus Torvalds
2008-03-28 3:31 ` Yinghai Lu
` (2 subsequent siblings)
3 siblings, 1 reply; 55+ messages in thread
From: Christoph Lameter @ 2008-03-28 3:24 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rafael J. Wysocki, Pawel Staszewski, LKML, Adrian Bunk,
Andrew Morton, Natalie Protasevich
On Thu, 27 Mar 2008, Linus Torvalds wrote:
>
>
> On Thu, 27 Mar 2008, Rafael J. Wysocki wrote:
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9412
> > Subject : commit a878539ef994787c447a98c2e3ba0fe3dad984ec breaks boot on SB600 AHCI
> > Submitter : Srihari Vijayaraghavan <sriharivijayaraghavan@yahoo.com.au>
> > Date : 2008-03-12 17:15 (16 days old)
> > Handled-By : Jeff Garzik <jgarzik@pobox.com>
> > Richard Zhao <richard.zhao@amd.com>
>
> Fixed by 4cde32fc4b32e96a99063af3183acdfd54c563f0, methinks.
>
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9962
> > Subject : mount: could not find filesystem
> > Submitter : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
> > Date : 2008-02-12 14:34 (45 days old)
> > References : http://lkml.org/lkml/2008/2/12/91
> > Handled-By : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> > Yinghai Lu <yhlu.kernel@gmail.com>
>
> Needs more info. The original oops that opened it is fixed, but..
>
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9976
> > Subject : BUG: 2.6.25-rc1: iptables postrouting setup causes oops
> > Submitter : Ben Nizette <bn@niasdigital.com>
> > Date : 2008-02-12 12:46 (45 days old)
> > References : http://lkml.org/lkml/2008/2/12/148
> > Handled-By : Haavard Skinnemoen <hskinnemoen@atmel.com>
>
> This one seems gone (and was apparently AVR-only):
>
> http://lkml.org/lkml/2008/2/13/607:
> "What ever the problem is it isn't immediately apparent in latest git so
> I guess we'll just have to keep our eyes peeled."
>
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9978
> > Subject : 2.6.25-rc1: volanoMark regression
> > Submitter : Zhang, Yanmin <yanmin_zhang@linux.intel.com>
> > Date : 2008-02-13 10:30 (44 days old)
> > References : http://lkml.org/lkml/2008/2/13/128
> > http://lkml.org/lkml/2008/3/12/52
> > http://lkml.org/lkml/2008/3/18/81
> > Handled-By : Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
> > Balbir Singh <balbir@linux.vnet.ibm.com>
>
> Hmm. It is a regression on one machine (2x quad-core stoakley), but not
> another (4x quad-core tigerton).
>
> Interestingly, the stoakley box numbers have apparently been all over the
> map.
>
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10318
> > Subject : WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184()
> > Submitter : Pawel Staszewski <pstaszewski@artcom.pl>
> > Date : 2008-03-25 02:50 (3 days old)
>
> Andrew and seems to have debugged this down to a kzalloc(GFP_ATOMIC) or
> similar.
Slab allocations can never use GFP_HIGHMEM. Slab allocators BUG
if either of these bits are set (checks on the slowpaths):
#define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)
GFP flags are not masked/checked if either inline fallback to the page
allocator occurs (SLUB for >4k allocs) or if an allocation is forwarded
to the page allocator (SLOB, SLUB). They are also not checked on the
fastpaths.
AFAICT the check in kmap_atomic_prot is simply too strict.
void *kmap_atomic_prot(struct page *page, enum km_type type, pgprot_t
prot)
{
enum fixed_addresses idx;
unsigned long vaddr;
/* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
debug_kmap_atomic_prot(type);
pagefault_disable();
if (!PageHighMem(page))
return page_address(page);
The check for PageHighMem(page) needs to either come before the
debug_kmap_atomic_prot() or kmap_atomic_prot should only be called for
HIGHMEM allocations. Otherwise any get_zeroed_page() alloc from an
interrupt context may cause a false positive here.
Seems to be a reoccurrence of something that I discussed with Andrew a
while back.
http://marc.info/?t=118790336700011&r=1&w=2
^ permalink raw reply [flat|nested] 55+ messages in thread* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 3:24 ` Christoph Lameter
@ 2008-03-28 4:00 ` Linus Torvalds
2008-03-28 10:48 ` Paweł Staszewski
` (2 more replies)
0 siblings, 3 replies; 55+ messages in thread
From: Linus Torvalds @ 2008-03-28 4:00 UTC (permalink / raw)
To: Christoph Lameter
Cc: Rafael J. Wysocki, Pawel Staszewski, LKML, Adrian Bunk,
Andrew Morton, Natalie Protasevich
On Thu, 27 Mar 2008, Christoph Lameter wrote:
>
> Slab allocations can never use GFP_HIGHMEM.
Totally irrelevant.
The page allocation path does
if (gfp_flags & __GFP_ZERO)
prep_zero_page(page, order, gfp_flags);
and that will cause a warning REGARDLESS of whether the page is a HIGHMEM
page or not.
And the fact is, passing in GFP_ZERO from the SLUB code is a bug
regardless, because it unnecessarily does the dual memset().
So here's a damn big clue:
- SLUB does its own GFP_ZERO handling
- so passing GFP_ZERO down to the page allocator is a f*cking bug
- and this has NOTHING what-so-ever to do with GFP_HIGHMEM or even
whether the warning is "valid" or not - it's a bug even if the warning
had never happened.
So stop blathering, and just admit that this was buggy. It was also
fundamentally fragile to leave GFP_ZERO around when it was known to not be
valid at that point (exactly because GFP_ZERO was handled by the caller).
Linus
^ permalink raw reply [flat|nested] 55+ messages in thread* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 4:00 ` Linus Torvalds
@ 2008-03-28 10:48 ` Paweł Staszewski
2008-03-28 17:46 ` Andrew Morton
2008-03-28 17:15 ` Pekka Enberg
2008-03-28 18:33 ` Christoph Lameter
2 siblings, 1 reply; 55+ messages in thread
From: Paweł Staszewski @ 2008-03-28 10:48 UTC (permalink / raw)
To: Linus Torvalds
Cc: Christoph Lameter, Rafael J. Wysocki, LKML, Adrian Bunk,
Andrew Morton, Natalie Protasevich
Linus Torvalds pisze:
> On Thu, 27 Mar 2008, Christoph Lameter wrote:
>
>> Slab allocations can never use GFP_HIGHMEM.
>>
>
> Totally irrelevant.
>
> The page allocation path does
>
> if (gfp_flags & __GFP_ZERO)
> prep_zero_page(page, order, gfp_flags);
>
> and that will cause a warning REGARDLESS of whether the page is a HIGHMEM
> page or not.
>
> And the fact is, passing in GFP_ZERO from the SLUB code is a bug
> regardless, because it unnecessarily does the dual memset().
>
> So here's a damn big clue:
> - SLUB does its own GFP_ZERO handling
> - so passing GFP_ZERO down to the page allocator is a f*cking bug
> - and this has NOTHING what-so-ever to do with GFP_HIGHMEM or even
> whether the warning is "valid" or not - it's a bug even if the warning
> had never happened.
>
> So stop blathering, and just admit that this was buggy. It was also
> fundamentally fragile to leave GFP_ZERO around when it was known to not be
> valid at that point (exactly because GFP_ZERO was handled by the caller).
>
> Linus
>
>
>
Sorry for offtopic but i have the same problem with kernels 2.6.25-*
like:
http://kerneltrap.org/mailarchive/linux-netdev/2008/3/27/1274804
http://kerneltrap.org/mailarchive/linux-netdev/2008/3/27/1270334
I search linux-netdev and found this links.
I only sugest that
Denys Fedoryshchenko
can have the same problem that i have with this kernels.
I must revert my all kernels to 2.6.23.11 to get stable work on high (ip
traffic) loads.
And there is no documentation for LRO... and Stephen Hemminger write me
that LRO is not compatible with bridgeing and routing.
see this link:
http://bugzilla.kernel.org/show_bug.cgi?id=10335
So there must be some documentation for this ... because people can have
many problems with this.
Sorry for offtopic but this can resolve problems like my and Denys .
Pawel
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 10:48 ` Paweł Staszewski
@ 2008-03-28 17:46 ` Andrew Morton
2008-03-28 21:57 ` Rafael J. Wysocki
0 siblings, 1 reply; 55+ messages in thread
From: Andrew Morton @ 2008-03-28 17:46 UTC (permalink / raw)
To: Paweł Staszewski
Cc: Linus Torvalds, Christoph Lameter, Rafael J. Wysocki, LKML,
Adrian Bunk, Natalie Protasevich, netdev
On Fri, 28 Mar 2008 11:48:55 +0100 Pawe__ Staszewski <pstaszewski@artcom.pl> wrote:
> Linus Torvalds pisze:
> > On Thu, 27 Mar 2008, Christoph Lameter wrote:
> >
> >> Slab allocations can never use GFP_HIGHMEM.
> >>
> >
> > Totally irrelevant.
> >
> > The page allocation path does
> >
> > if (gfp_flags & __GFP_ZERO)
> > prep_zero_page(page, order, gfp_flags);
> >
> > and that will cause a warning REGARDLESS of whether the page is a HIGHMEM
> > page or not.
> >
> > And the fact is, passing in GFP_ZERO from the SLUB code is a bug
> > regardless, because it unnecessarily does the dual memset().
> >
> > So here's a damn big clue:
> > - SLUB does its own GFP_ZERO handling
> > - so passing GFP_ZERO down to the page allocator is a f*cking bug
> > - and this has NOTHING what-so-ever to do with GFP_HIGHMEM or even
> > whether the warning is "valid" or not - it's a bug even if the warning
> > had never happened.
> >
> > So stop blathering, and just admit that this was buggy. It was also
> > fundamentally fragile to leave GFP_ZERO around when it was known to not be
> > valid at that point (exactly because GFP_ZERO was handled by the caller).
> >
> > Linus
> >
> >
> >
> Sorry for offtopic but i have the same problem with kernels 2.6.25-*
> like:
> http://kerneltrap.org/mailarchive/linux-netdev/2008/3/27/1274804
> http://kerneltrap.org/mailarchive/linux-netdev/2008/3/27/1270334
>
> I search linux-netdev and found this links.
> I only sugest that
>
> Denys Fedoryshchenko
>
> can have the same problem that i have with this kernels.
> I must revert my all kernels to 2.6.23.11 to get stable work on high (ip
> traffic) loads.
>
> And there is no documentation for LRO... and Stephen Hemminger write me
> that LRO is not compatible with bridgeing and routing.
> see this link:
> http://bugzilla.kernel.org/show_bug.cgi?id=10335
>
>
> So there must be some documentation for this ... because people can have
> many problems with this.
>
These are all networking things, so let's cc that list.
>
> Sorry for offtopic but this can resolve problems like my and Denys .
It's very on-topic - thanks for the reminder.
Rafael, are these things actually on the list?
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 17:46 ` Andrew Morton
@ 2008-03-28 21:57 ` Rafael J. Wysocki
0 siblings, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2008-03-28 21:57 UTC (permalink / raw)
To: Andrew Morton
Cc: Paweł Staszewski, Linus Torvalds, Christoph Lameter, LKML,
Adrian Bunk, Natalie Protasevich, netdev
On Friday, 28 of March 2008, Andrew Morton wrote:
> On Fri, 28 Mar 2008 11:48:55 +0100 Pawe__ Staszewski <pstaszewski@artcom.pl> wrote:
>
> > Linus Torvalds pisze:
> > > On Thu, 27 Mar 2008, Christoph Lameter wrote:
> > >
> > >> Slab allocations can never use GFP_HIGHMEM.
> > >>
> > >
> > > Totally irrelevant.
> > >
> > > The page allocation path does
> > >
> > > if (gfp_flags & __GFP_ZERO)
> > > prep_zero_page(page, order, gfp_flags);
> > >
> > > and that will cause a warning REGARDLESS of whether the page is a HIGHMEM
> > > page or not.
> > >
> > > And the fact is, passing in GFP_ZERO from the SLUB code is a bug
> > > regardless, because it unnecessarily does the dual memset().
> > >
> > > So here's a damn big clue:
> > > - SLUB does its own GFP_ZERO handling
> > > - so passing GFP_ZERO down to the page allocator is a f*cking bug
> > > - and this has NOTHING what-so-ever to do with GFP_HIGHMEM or even
> > > whether the warning is "valid" or not - it's a bug even if the warning
> > > had never happened.
> > >
> > > So stop blathering, and just admit that this was buggy. It was also
> > > fundamentally fragile to leave GFP_ZERO around when it was known to not be
> > > valid at that point (exactly because GFP_ZERO was handled by the caller).
> > >
> > > Linus
> > >
> > >
> > >
> > Sorry for offtopic but i have the same problem with kernels 2.6.25-*
> > like:
> > http://kerneltrap.org/mailarchive/linux-netdev/2008/3/27/1274804
> > http://kerneltrap.org/mailarchive/linux-netdev/2008/3/27/1270334
> >
> > I search linux-netdev and found this links.
> > I only sugest that
> >
> > Denys Fedoryshchenko
> >
> > can have the same problem that i have with this kernels.
> > I must revert my all kernels to 2.6.23.11 to get stable work on high (ip
> > traffic) loads.
> >
> > And there is no documentation for LRO... and Stephen Hemminger write me
> > that LRO is not compatible with bridgeing and routing.
> > see this link:
> > http://bugzilla.kernel.org/show_bug.cgi?id=10335
> >
> >
> > So there must be some documentation for this ... because people can have
> > many problems with this.
> >
>
> These are all networking things, so let's cc that list.
>
> >
> > Sorry for offtopic but this can resolve problems like my and Denys .
>
> It's very on-topic - thanks for the reminder.
>
> Rafael, are these things actually on the list?
Well, this isn't a recent regression, at least not from 2.6.24, so I don't
list it.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 4:00 ` Linus Torvalds
2008-03-28 10:48 ` Paweł Staszewski
@ 2008-03-28 17:15 ` Pekka Enberg
2008-03-28 17:27 ` Linus Torvalds
2008-03-28 18:33 ` Christoph Lameter
2 siblings, 1 reply; 55+ messages in thread
From: Pekka Enberg @ 2008-03-28 17:15 UTC (permalink / raw)
To: Linus Torvalds
Cc: Christoph Lameter, Rafael J. Wysocki, Pawel Staszewski, LKML,
Adrian Bunk, Andrew Morton, Natalie Protasevich
Hi Christoph,
On Fri, Mar 28, 2008 at 6:00 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> And the fact is, passing in GFP_ZERO from the SLUB code is a bug
> regardless, because it unnecessarily does the dual memset().
We clear GFP_ZERO in new_slab() so the normal kmalloc()/kzalloc() path
should be fine but don't do it for kmalloc_large() nor
kmalloc_large_node(). Is that the bug here?
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 17:15 ` Pekka Enberg
@ 2008-03-28 17:27 ` Linus Torvalds
2008-03-28 18:08 ` Pekka Enberg
2008-03-28 18:37 ` Christoph Lameter
0 siblings, 2 replies; 55+ messages in thread
From: Linus Torvalds @ 2008-03-28 17:27 UTC (permalink / raw)
To: Pekka Enberg
Cc: Christoph Lameter, Rafael J. Wysocki, Pawel Staszewski, LKML,
Adrian Bunk, Andrew Morton, Natalie Protasevich
On Fri, 28 Mar 2008, Pekka Enberg wrote:
>
> On Fri, Mar 28, 2008 at 6:00 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > And the fact is, passing in GFP_ZERO from the SLUB code is a bug
> > regardless, because it unnecessarily does the dual memset().
>
> We clear GFP_ZERO in new_slab() so the normal kmalloc()/kzalloc() path
> should be fine but don't do it for kmalloc_large() nor
> kmalloc_large_node(). Is that the bug here?
Dammit, NO.
The bug was that the commit I made (which was correct and robust) was then
partially reverted by Christoph for no good reason. At that point,
kmalloc_large() didn't even exist, so at that point the change was
"technically correct" (since the only user of gfpflags really did end up
clearing it somewhere deep in its callchain).
So when that original 3811dbf67162bd08412f1b0e02e554f353e93bdb happened,
it wasn't an outright bug - but that doesn't make it right. That commit
was always just a bug waiting to happen, because it just set things up for
later problems by retaining that bit when it really shouldn't have been
retained, and forcing all future callers to be careful. Which they
obviously were not!
Yes, you can clear GFP_ZERO in multiple illogical places, and it will fix
the bug. Or you can clear it in *one* place, that is in on the direct
callchain from the person who actually does the memset(0), and even add a
comment that says exactly what is going on.
So the fact is, commit 3811dbf67162bd08412f1b0e02e554f353e93bdb is and was
total and utter crap. I've reverted it in my tree. It's crap not because
it was buggy when it was put in, but because it was *fragile* when it was
put in. And that fragility ended up causing a bug later.
I'm getting really tired of slub. It was supposed to be simpler code than
slab, and yes, it's simpler, but it has been buggy as hell, and part of it
has been that people just haven't been careful enough, and haven't written
code to be defensive and easy-to-follow.
So the *last* thing we want to do is to clear GFP_ZERO in multiple subtle
places based on new random code being added. We want to clear it at the
top level, so that no other code never ever even has to _think_ about it!
Linus
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 17:27 ` Linus Torvalds
@ 2008-03-28 18:08 ` Pekka Enberg
2008-03-28 18:20 ` Linus Torvalds
2008-03-28 18:37 ` Christoph Lameter
1 sibling, 1 reply; 55+ messages in thread
From: Pekka Enberg @ 2008-03-28 18:08 UTC (permalink / raw)
To: Linus Torvalds
Cc: Christoph Lameter, Rafael J. Wysocki, Pawel Staszewski, LKML,
Adrian Bunk, Andrew Morton, Natalie Protasevich
Hi Linus,
On Fri, 28 Mar 2008, Pekka Enberg wrote:
> > We clear GFP_ZERO in new_slab() so the normal kmalloc()/kzalloc() path
> > should be fine but don't do it for kmalloc_large() nor
> > kmalloc_large_node(). Is that the bug here?
On Fri, Mar 28, 2008 at 7:27 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> Dammit, NO.
>
> The bug was that the commit I made (which was correct and robust) was then
> partially reverted by Christoph for no good reason. At that point,
> kmalloc_large() didn't even exist, so at that point the change was
> "technically correct" (since the only user of gfpflags really did end up
> clearing it somewhere deep in its callchain).
I was not implying that we should clear GFP_ZERO in kmalloc_large()
but that we can hit the page allocator with GFP_ZERO via kmalloc() and
kzalloc() for size > PAGE_SIZE allocations. And asking Christoph if
that's the bug we're seeing here.
On Fri, Mar 28, 2008 at 7:27 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> So the *last* thing we want to do is to clear GFP_ZERO in multiple subtle
> places based on new random code being added. We want to clear it at the
> top level, so that no other code never ever even has to _think_ about it!
We are clearing it in one place, just before calling alloc_pages.
[Yes, it's hard to spot, it's in new_slab() where we call
allocate_slab().] I'm okay with moving it to top level but I don't see
how that fixes any of the bugs mentioned here.
Pekka
^ permalink raw reply [flat|nested] 55+ messages in thread* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 18:08 ` Pekka Enberg
@ 2008-03-28 18:20 ` Linus Torvalds
2008-03-28 18:38 ` Christoph Lameter
` (2 more replies)
0 siblings, 3 replies; 55+ messages in thread
From: Linus Torvalds @ 2008-03-28 18:20 UTC (permalink / raw)
To: Pekka Enberg
Cc: Christoph Lameter, Rafael J. Wysocki, Pawel Staszewski, LKML,
Adrian Bunk, Andrew Morton, Natalie Protasevich
On Fri, 28 Mar 2008, Pekka Enberg wrote:
>
> On Fri, Mar 28, 2008 at 7:27 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > So the *last* thing we want to do is to clear GFP_ZERO in multiple subtle
> > places based on new random code being added. We want to clear it at the
> > top level, so that no other code never ever even has to _think_ about it!
>
> We are clearing it in one place, just before calling alloc_pages.
BUT THAT IS TOTALLY IRRELEVANT.
We're not clearing it in kmalloc_large()!
What's so hard to understand?
> [Yes, it's hard to spot, it's in new_slab() where we call
> allocate_slab().] I'm okay with moving it to top level but I don't see
> how that fixes any of the bugs mentioned here.
That stupid clearing in new_slab() is totally and utterly irrelevant (in
addition to the fact that it's hard to spot).
The point was never new_slab(). So why do you even mention it?
The code in question is __slab_alloc(). It did *not* clear it correctly
before its uses (__slab_alloc -> kmalloc_large -> __get_free_pages).
Linus
^ permalink raw reply [flat|nested] 55+ messages in thread* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 18:20 ` Linus Torvalds
@ 2008-03-28 18:38 ` Christoph Lameter
2008-03-28 18:47 ` Andrew Morton
2008-03-28 19:59 ` Pekka Enberg
2 siblings, 0 replies; 55+ messages in thread
From: Christoph Lameter @ 2008-03-28 18:38 UTC (permalink / raw)
To: Linus Torvalds
Cc: Pekka Enberg, Rafael J. Wysocki, Pawel Staszewski, LKML,
Adrian Bunk, Andrew Morton, Natalie Protasevich
On Fri, 28 Mar 2008, Linus Torvalds wrote:
> The code in question is __slab_alloc(). It did *not* clear it correctly
> before its uses (__slab_alloc -> kmalloc_large -> __get_free_pages).
But why could have __GFP_ZERO there be harmful? Its a non highmem
allocation. No kmap needed.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 18:20 ` Linus Torvalds
2008-03-28 18:38 ` Christoph Lameter
@ 2008-03-28 18:47 ` Andrew Morton
2008-03-28 18:53 ` Christoph Lameter
2008-03-28 19:37 ` Linus Torvalds
2008-03-28 19:59 ` Pekka Enberg
2 siblings, 2 replies; 55+ messages in thread
From: Andrew Morton @ 2008-03-28 18:47 UTC (permalink / raw)
To: Linus Torvalds
Cc: penberg, clameter, rjw, pstaszewski, linux-kernel, bunk, protasnb
On Fri, 28 Mar 2008 11:20:52 -0700 (PDT)
Linus Torvalds <torvalds@linux-foundation.org> wrote:
> The code in question is __slab_alloc().
The bug we're discussing is http://bugzilla.kernel.org/show_bug.cgi?id=10318
It's a driver call stright into the page allocator - slab is not involved.
I was planning on plugging it this way:
From: Andrew Morton <akpm@linux-foundation.org>
I believe http://bugzilla.kernel.org/show_bug.cgi?id=10318 is a false
positive. There's no way in which networking will be using highmem pages
here, so it won't be taking the KM_USER0 kmap slot, so there's no point in
performing these checks.
Cc: Pawel Staszewski <pstaszewski@artcom.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
arch/x86/mm/highmem_32.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff -puN arch/x86/mm/highmem_32.c~x86-kmap_atomic-debugging-only-run-debug_kmap_atomic_prot-for-highmem-pages arch/x86/mm/highmem_32.c
--- a/arch/x86/mm/highmem_32.c~x86-kmap_atomic-debugging-only-run-debug_kmap_atomic_prot-for-highmem-pages
+++ a/arch/x86/mm/highmem_32.c
@@ -73,15 +73,15 @@ void *kmap_atomic_prot(struct page *page
{
enum fixed_addresses idx;
unsigned long vaddr;
- /* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
-
- debug_kmap_atomic_prot(type);
+ /* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
pagefault_disable();
if (!PageHighMem(page))
return page_address(page);
+ debug_kmap_atomic_prot(type);
+
idx = type + KM_TYPE_NR*smp_processor_id();
vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
BUG_ON(!pte_none(*(kmap_pte-idx)));
_
^ permalink raw reply [flat|nested] 55+ messages in thread* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 18:47 ` Andrew Morton
@ 2008-03-28 18:53 ` Christoph Lameter
2008-03-28 19:37 ` Linus Torvalds
1 sibling, 0 replies; 55+ messages in thread
From: Christoph Lameter @ 2008-03-28 18:53 UTC (permalink / raw)
To: Andrew Morton
Cc: Linus Torvalds, penberg, rjw, pstaszewski, linux-kernel, bunk,
protasnb
On Fri, 28 Mar 2008, Andrew Morton wrote:
> I believe http://bugzilla.kernel.org/show_bug.cgi?id=10318 is a false
> positive. There's no way in which networking will be using highmem pages
> here, so it won't be taking the KM_USER0 kmap slot, so there's no point in
> performing these checks.
Reviewed-by: Christoph Lameter <clameter@sgi.com>
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 18:47 ` Andrew Morton
2008-03-28 18:53 ` Christoph Lameter
@ 2008-03-28 19:37 ` Linus Torvalds
2008-03-28 19:59 ` Linus Torvalds
1 sibling, 1 reply; 55+ messages in thread
From: Linus Torvalds @ 2008-03-28 19:37 UTC (permalink / raw)
To: Andrew Morton
Cc: penberg, clameter, rjw, pstaszewski, linux-kernel, bunk, protasnb
On Fri, 28 Mar 2008, Andrew Morton wrote:
>
> I was planning on plugging it this way:
.. and thereby losing the point of doing it the old way.
The old way caused more warnigns, but more importantly, it caused warnings
on machines that didn't actually _have_ highmem pages. Which is actually
the big majority of them.
That was the whole (and only) point of the debugging! If you only test the
page address, you lose all the coverage!
It would be better if we actually passed in the gfp_flags, and then we
could test the __GFP_HIGH bit rather than the page address. But for now,
the rule is that GFP_ATOMIC and __GFP_ZERO do not work together, because
this sanity test currently cannot work for that case.
Linus
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 19:37 ` Linus Torvalds
@ 2008-03-28 19:59 ` Linus Torvalds
0 siblings, 0 replies; 55+ messages in thread
From: Linus Torvalds @ 2008-03-28 19:59 UTC (permalink / raw)
To: Andrew Morton
Cc: penberg, clameter, rjw, pstaszewski, linux-kernel, bunk, protasnb
On Fri, 28 Mar 2008, Linus Torvalds wrote:
>
> It would be better if we actually passed in the gfp_flags, and then we
> could test the __GFP_HIGH bit rather than the page address. But for now,
> the rule is that GFP_ATOMIC and __GFP_ZERO do not work together, because
> this sanity test currently cannot work for that case.
Gaah. Sadly, we don't actually have any reasonably sane way of doing that.
So I guess we'll have to lose the coverage of kmap_atomic() in other
places, even though the only case where this was worth losing it for was
that one clear_highpage() call site :/
Linus
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 18:20 ` Linus Torvalds
2008-03-28 18:38 ` Christoph Lameter
2008-03-28 18:47 ` Andrew Morton
@ 2008-03-28 19:59 ` Pekka Enberg
2008-03-28 20:24 ` Linus Torvalds
2 siblings, 1 reply; 55+ messages in thread
From: Pekka Enberg @ 2008-03-28 19:59 UTC (permalink / raw)
To: Linus Torvalds
Cc: Christoph Lameter, Rafael J. Wysocki, Pawel Staszewski, LKML,
Adrian Bunk, Andrew Morton, Natalie Protasevich
Linus Torvalds wrote:
> That stupid clearing in new_slab() is totally and utterly irrelevant (in
> addition to the fact that it's hard to spot).
>
> The point was never new_slab(). So why do you even mention it?
Because I didn't spot that we're calling kmalloc_large() in the fallback
path too. I was referring to the page allocator pass-through
kmalloc_large() call that happens before we hit __slab_alloc. My bad, sorry.
Pekka
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 19:59 ` Pekka Enberg
@ 2008-03-28 20:24 ` Linus Torvalds
0 siblings, 0 replies; 55+ messages in thread
From: Linus Torvalds @ 2008-03-28 20:24 UTC (permalink / raw)
To: Pekka Enberg
Cc: Christoph Lameter, Rafael J. Wysocki, Pawel Staszewski, LKML,
Adrian Bunk, Andrew Morton, Natalie Protasevich
On Fri, 28 Mar 2008, Pekka Enberg wrote:
>
> Because I didn't spot that we're calling kmalloc_large() in the fallback path
> too. I was referring to the page allocator pass-through kmalloc_large() call
> that happens before we hit __slab_alloc. My bad, sorry.
So this is kind of my point of the whole thing. It's really hard to even
realize that there are multiple points - unless you do it at the top
level.
And yeah, the double clearing is just a performance issue in a rare and
odd case, and if it hadn't been for the very anal debug checks we would
never have even noticed. But code that is hard to follow and has really
subtle non-local implications is fundamentally more likely to have these
kinds of problems. Which is why I'm arguing against code like that.
Linus
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 17:27 ` Linus Torvalds
2008-03-28 18:08 ` Pekka Enberg
@ 2008-03-28 18:37 ` Christoph Lameter
2008-03-28 19:32 ` Linus Torvalds
1 sibling, 1 reply; 55+ messages in thread
From: Christoph Lameter @ 2008-03-28 18:37 UTC (permalink / raw)
To: Linus Torvalds
Cc: Pekka Enberg, Rafael J. Wysocki, Pawel Staszewski, LKML,
Adrian Bunk, Andrew Morton, Natalie Protasevich
On Fri, 28 Mar 2008, Linus Torvalds wrote:
> The bug was that the commit I made (which was correct and robust) was then
> partially reverted by Christoph for no good reason. At that point,
> kmalloc_large() didn't even exist, so at that point the change was
> "technically correct" (since the only user of gfpflags really did end up
> clearing it somewhere deep in its callchain).
That commit cleared a flag that was cleared later anyways....
> So the fact is, commit 3811dbf67162bd08412f1b0e02e554f353e93bdb is and was
> total and utter crap. I've reverted it in my tree. It's crap not because
> it was buggy when it was put in, but because it was *fragile* when it was
> put in. And that fragility ended up causing a bug later.
Do we have confirmation that the revert actually does something? It
certainly does not address the issues with inline page forwarding. And why
are we forbidding page allocator allocs with __GFP_ZERO in interrupt
context? Only __GFP_HIGHMEM needs kmap and only __GFP_HIGHMEM can cause
trouble.
> I'm getting really tired of slub. It was supposed to be simpler code than
> slab, and yes, it's simpler, but it has been buggy as hell, and part of it
> has been that people just haven't been careful enough, and haven't written
> code to be defensive and easy-to-follow.
You may want to check SLAB recent bug list. There is still crap over there
as well and AFAICT this is more severe than SLUB and its inherent in the
complex scheme that we had to put in ther.
> So the *last* thing we want to do is to clear GFP_ZERO in multiple subtle
> places based on new random code being added. We want to clear it at the
> top level, so that no other code never ever even has to _think_ about it!
__GFP_ZERO is cleared in one place when new_slab() calls allocate_slab().
The fallback code in 2.6.25 will go away in 2.6.26.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 18:37 ` Christoph Lameter
@ 2008-03-28 19:32 ` Linus Torvalds
0 siblings, 0 replies; 55+ messages in thread
From: Linus Torvalds @ 2008-03-28 19:32 UTC (permalink / raw)
To: Christoph Lameter
Cc: Pekka Enberg, Rafael J. Wysocki, Pawel Staszewski, LKML,
Adrian Bunk, Andrew Morton, Natalie Protasevich
On Fri, 28 Mar 2008, Christoph Lameter wrote:
>
> __GFP_ZERO is cleared in one place when new_slab() calls allocate_slab().
You're so full of crap that it's not even funny.
The fact that one path does it correct makes it ok to do it wrong in
another path?
> The fallback code in 2.6.25 will go away in 2.6.26.
And the fact that 2.5.26 may fix it differently makes it ok to be buggy in
2.6.25?
You are not making any sense what-so-ever.
Linus
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 4:00 ` Linus Torvalds
2008-03-28 10:48 ` Paweł Staszewski
2008-03-28 17:15 ` Pekka Enberg
@ 2008-03-28 18:33 ` Christoph Lameter
2008-03-28 19:25 ` Linus Torvalds
2 siblings, 1 reply; 55+ messages in thread
From: Christoph Lameter @ 2008-03-28 18:33 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rafael J. Wysocki, Pawel Staszewski, LKML, Adrian Bunk,
Andrew Morton, Natalie Protasevich
On Thu, 27 Mar 2008, Linus Torvalds wrote:
> Totally irrelevant.
>
> The page allocation path does
>
> if (gfp_flags & __GFP_ZERO)
> prep_zero_page(page, order, gfp_flags);
>
> and that will cause a warning REGARDLESS of whether the page is a HIGHMEM
> page or not.
prep_zero_page does:
static inline void prep_zero_page(struct page *page, int order, gfp_t
gfp_flags)
{
int i;
/*
* clear_highpage() will use KM_USER0, so it's a bug to use __GFP_ZERO
* and __GFP_HIGHMEM from hard or soft interrupt context.
*/
VM_BUG_ON((gfp_flags & __GFP_HIGHMEM) && in_interrupt());
for (i = 0; i < (1 << order); i++)
clear_highpage(page + i);
}
So we forbit __GFP_HIGHMEM and in_interrupt which makes sense. The simple
forwarding of large kmallocs to the page allocator as done by SLUB / SLOB
is fine.
Then clear_highpage calls additional checking functions that have
the effect of generally forbiding zeroing in interrupt context if
CONFIG_HIGHMEM is set. This is wrong and needs to be fixed.
> And the fact is, passing in GFP_ZERO from the SLUB code is a bug
> regardless, because it unnecessarily does the dual memset().
Well that is only the fallback path of __slab_alloc which is not triggered
here and not performance sensitive. We could clear the flag there but
that is irrevelant for this issue.
^ permalink raw reply [flat|nested] 55+ messages in thread* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 18:33 ` Christoph Lameter
@ 2008-03-28 19:25 ` Linus Torvalds
2008-03-29 20:42 ` Christoph Lameter
0 siblings, 1 reply; 55+ messages in thread
From: Linus Torvalds @ 2008-03-28 19:25 UTC (permalink / raw)
To: Christoph Lameter
Cc: Rafael J. Wysocki, Pawel Staszewski, LKML, Adrian Bunk,
Andrew Morton, Natalie Protasevich
On Fri, 28 Mar 2008, Christoph Lameter wrote:
>
> prep_zero_page does:
>
> static inline void prep_zero_page(struct page *page, int order, gfp_t gfp_flags)
> {
> int i;
>
> /*
> * clear_highpage() will use KM_USER0, so it's a bug to use __GFP_ZERO
> * and __GFP_HIGHMEM from hard or soft interrupt context.
> */
> VM_BUG_ON((gfp_flags & __GFP_HIGHMEM) && in_interrupt());
> for (i = 0; i < (1 << order); i++)
> clear_highpage(page + i);
.. and clear_highpage() does:
void *kaddr = kmap_atomic(page, KM_USER0);
clear_page(kaddr);
.. where kmap_atomic() on x86 does:
kmap_atomic() ->
kmap_atomic_prot() ->
debug_kmap_atomic_prot() ->
if (in_irq())
WARN_ON_ONCE()
none of which are at all conditional on __GFP_HIGHMEM.
But none of this is relevant. The warning possibly didn't even come from
slub, it just made me look at it - because *something* is doing GFP_ATOMIC
together with __GFP_ZERO, and it became obvious that SLUB is one potential
cause of that.
And the SLUB case simply isn't valid!
> Then clear_highpage calls additional checking functions that have
> the effect of generally forbiding zeroing in interrupt context if
> CONFIG_HIGHMEM is set. This is wrong and needs to be fixed.
No. Dammit, the bug is in SLUB.
If SLUB *ever* calls the page allocator with __GFP_ZERO set, it's a
bug, and that has nothing to do with GFP_ATOMIC or anything else. Because
SLUB uses its own logic for clearing the result.
Why cannot you just admit it?
Now, _outside_ of SLUB there appear to be other users too, and those users
need to either be fixed or we need to allow __GFP_ZERO togethe with
GFP_ATOMIC. But the fact is, SLUB had a really stupid bug that it
shouldn't have had.
Linus
^ permalink raw reply [flat|nested] 55+ messages in thread* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 19:25 ` Linus Torvalds
@ 2008-03-29 20:42 ` Christoph Lameter
2008-03-29 21:29 ` Linus Torvalds
0 siblings, 1 reply; 55+ messages in thread
From: Christoph Lameter @ 2008-03-29 20:42 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rafael J. Wysocki, Pawel Staszewski, LKML, Adrian Bunk,
Andrew Morton, Natalie Protasevich
On Fri, 28 Mar 2008, Linus Torvalds wrote:
> .. where kmap_atomic() on x86 does:
>
> kmap_atomic() ->
> kmap_atomic_prot() ->
> debug_kmap_atomic_prot() ->
> if (in_irq())
> WARN_ON_ONCE()
>
> none of which are at all conditional on __GFP_HIGHMEM.
kmap check for PageHighmem and does not do a kmap for regular pages.
So this is actually okay. If the allocation that was performed does not
allow GFP_HIGHMEM then the kmap will never use a real mapping. The check
should not trigger.
> > Then clear_highpage calls additional checking functions that have
> > the effect of generally forbiding zeroing in interrupt context if
> > CONFIG_HIGHMEM is set. This is wrong and needs to be fixed.
>
> No. Dammit, the bug is in SLUB.
>
> If SLUB *ever* calls the page allocator with __GFP_ZERO set, it's a
> bug, and that has nothing to do with GFP_ATOMIC or anything else. Because
> SLUB uses its own logic for clearing the result.
Yes it uses its own logic if the object is managed by SLUB but not if the
object is too big and/or the allocation forwarded to the page allocator
or for other internal allocations of buffers etc.
> Why cannot you just admit it?
Admitting something that is not true is rather difficult.
> Now, _outside_ of SLUB there appear to be other users too, and those users
> need to either be fixed or we need to allow __GFP_ZERO togethe with
> GFP_ATOMIC. But the fact is, SLUB had a really stupid bug that it
> shouldn't have had.
So what you want is to forbid any use of
alloc_pages(__GFP_ZERO|...)
from an interrupt context? That works fine on most platforms and used to
work fine on x86 as well until the check was added on January 30th.
If we really want this then the check in prep_zero_page should be changed
too:
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c 2008-03-29 13:40:42.166669333 -0700
+++ linux-2.6/mm/page_alloc.c 2008-03-29 13:41:21.039168276 -0700
@@ -317,7 +317,7 @@ static inline void prep_zero_page(struct
* clear_highpage() will use KM_USER0, so it's a bug to use __GFP_ZERO
* and __GFP_HIGHMEM from hard or soft interrupt context.
*/
- VM_BUG_ON((gfp_flags & __GFP_HIGHMEM) && in_interrupt());
+ VM_BUG_ON(in_interrupt());
for (i = 0; i < (1 << order); i++)
clear_highpage(page + i);
}
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-29 20:42 ` Christoph Lameter
@ 2008-03-29 21:29 ` Linus Torvalds
2008-03-29 23:52 ` Pekka Enberg
2008-03-31 18:45 ` Christoph Lameter
0 siblings, 2 replies; 55+ messages in thread
From: Linus Torvalds @ 2008-03-29 21:29 UTC (permalink / raw)
To: Christoph Lameter
Cc: Rafael J. Wysocki, Pawel Staszewski, LKML, Adrian Bunk,
Andrew Morton, Natalie Protasevich
On Sat, 29 Mar 2008, Christoph Lameter wrote:
>
> > If SLUB *ever* calls the page allocator with __GFP_ZERO set, it's a
> > bug, and that has nothing to do with GFP_ATOMIC or anything else. Because
> > SLUB uses its own logic for clearing the result.
>
> Yes it uses its own logic if the object is managed by SLUB but not if the
> object is too big and/or the allocation forwarded to the page allocator
> or for other internal allocations of buffers etc.
Wrong.
It uses it's own logic for __GFP_ZERO *regardless* of size.
> > Why cannot you just admit it?
>
> Admitting something that is not true is rather difficult.
You don't have a f*cking clue about this cocde that you're supposed to be
maintaining, do you?
See "slab_alloc()". See the code:
if (unlikely((gfpflags & __GFP_ZERO) && object))
memset(object, 0, c->objsize);
and see how it does it regardless of anything else.
In short, if *any* code-path calls down to any allocator from that routine
with GFP_ZERO set, it's a bug. No ifs, buts or maybes about it. It
shouldn't do that, because the actual memset() is done by slab_alloc(),
and should not be done ANYWHERE ELSE.
It has *nothing* to do with "object is too big" or anything else.
> So what you want is to forbid any use of
>
> alloc_pages(__GFP_ZERO|...)
No. I want you to admit the bugs in code you maintain. I want you to admit
that slab_alloc() does the memset(), and should NEVER EVER use __GFP_ZERO
for the page allocations.
I have told you about a million times now that THIS HAS NOTHING TO DO WITH
interrupts or HIGHMEM or *anything* else. This is purely a SLUB issue.
But don't worry. I already fixed it by reverting your broken commit. I
just wish you could follow code that you are supposed to be maintaining.
Linus
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-29 21:29 ` Linus Torvalds
@ 2008-03-29 23:52 ` Pekka Enberg
2008-03-31 18:56 ` Christoph Lameter
2008-03-31 18:45 ` Christoph Lameter
1 sibling, 1 reply; 55+ messages in thread
From: Pekka Enberg @ 2008-03-29 23:52 UTC (permalink / raw)
To: Linus Torvalds
Cc: Christoph Lameter, Rafael J. Wysocki, Pawel Staszewski, LKML,
Adrian Bunk, Andrew Morton, Natalie Protasevich
Hi,
On Sat, 29 Mar 2008, Christoph Lameter wrote:
> > Yes it uses its own logic if the object is managed by SLUB but not if the
> > object is too big and/or the allocation forwarded to the page allocator
> > or for other internal allocations of buffers etc.
On Sat, Mar 29, 2008 at 11:29 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> Wrong.
>
> It uses it's own logic for __GFP_ZERO *regardless* of size.
Christoph, I think you're overlooking the same thing I was until Linus
straightened me out. We're calling kmalloc_large() from __slab_alloc()
for the fall-back case which causes a bug fixed by Linus' revert.
On Sat, Mar 29, 2008 at 11:29 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> You don't have a f*cking clue about this cocde that you're supposed to be
> maintaining, do you?
Yeah, me too. Fortunately we have you as our upstream maintainer :-).
Pekka
^ permalink raw reply [flat|nested] 55+ messages in thread* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-29 23:52 ` Pekka Enberg
@ 2008-03-31 18:56 ` Christoph Lameter
0 siblings, 0 replies; 55+ messages in thread
From: Christoph Lameter @ 2008-03-31 18:56 UTC (permalink / raw)
To: Pekka Enberg
Cc: Linus Torvalds, Rafael J. Wysocki, Pawel Staszewski, LKML,
Adrian Bunk, Andrew Morton, Natalie Protasevich
On Sun, 30 Mar 2008, Pekka Enberg wrote:
> Christoph, I think you're overlooking the same thing I was until Linus
> straightened me out. We're calling kmalloc_large() from __slab_alloc()
> for the fall-back case which causes a bug fixed by Linus' revert.
Yes, that means that __GFP_ZERO can be passed to the page allocator there.
Same thing as happens when kmalloc() calls kmalloc_large() (as you pointed
out before). The harm is zeroing an allocated object twice in rare cases
that trigger the fallback path and that is avoided by Linus' patch.
For the fix to get rid of the backtraces that started this discussion see
commit 9c312058b2e530722c7bd30c1b6f26eea35dc5fe
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-29 21:29 ` Linus Torvalds
2008-03-29 23:52 ` Pekka Enberg
@ 2008-03-31 18:45 ` Christoph Lameter
1 sibling, 0 replies; 55+ messages in thread
From: Christoph Lameter @ 2008-03-31 18:45 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rafael J. Wysocki, Pawel Staszewski, LKML, Adrian Bunk,
Andrew Morton, Natalie Protasevich
On Sat, 29 Mar 2008, Linus Torvalds wrote:
>
> You don't have a f*cking clue about this cocde that you're supposed to be
> maintaining, do you?
>
> See "slab_alloc()". See the code:
>
> if (unlikely((gfpflags & __GFP_ZERO) && object))
> memset(object, 0, c->objsize);
>
> and see how it does it regardless of anything else.
Yes I am very aware of that.
> In short, if *any* code-path calls down to any allocator from that routine
> with GFP_ZERO set, it's a bug. No ifs, buts or maybes about it. It
> shouldn't do that, because the actual memset() is done by slab_alloc(),
> and should not be done ANYWHERE ELSE.
>
> It has *nothing* to do with "object is too big" or anything else.
It has to do how large objects are allocated through kmalloc_large().
kmalloc_large() is elsewhere called with unfiltered gfpflags and relies
on zeroing being handled by the page allocator. It can take unfiltered gfp
flags.
The filtering of __GFP_ZERO that you added avoids the double zeroing for
the fallback path (which is only called if all the partial lists are empty
and after the page allocator went through reclaim and did not get the
large sized memory we wanted). So okay the patch could be a performance
enhancement. But then it adds the filtering to the hot path instead of the
code path that containts the kmalloc_large that is executed once in a blue
moon. The hot path should only filter when we actually decide that we need
to allocate a new slab from the page allocator.
It seemed to me that the reason for inserting the filtering of __GFP_ZERO
there was the belief that the page allocator cannot take __GFP_ZERO
through kmalloc_large() if we are in an interrupt.
The use of kmalloc_large() in __slab_alloc() is a bit strange at this
point. The cleanup work in 2.6.26 will make this all nice again.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 2:30 ` Linus Torvalds
2008-03-28 3:24 ` Christoph Lameter
@ 2008-03-28 3:31 ` Yinghai Lu
2008-03-31 10:14 ` Kamalesh Babulal
2008-03-28 11:29 ` Haavard Skinnemoen
2008-03-28 16:10 ` Rafael J. Wysocki
3 siblings, 1 reply; 55+ messages in thread
From: Yinghai Lu @ 2008-03-28 3:31 UTC (permalink / raw)
To: Linus Torvalds, Kamalesh Babulal, Ingo Molnar
Cc: Rafael J. Wysocki, Pawel Staszewski, Christoph Lameter, LKML,
Adrian Bunk, Andrew Morton, Natalie Protasevich
On Thu, Mar 27, 2008 at 7:30 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>
> On Thu, 27 Mar 2008, Rafael J. Wysocki wrote:
> >
...
>
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9962
> > Subject : mount: could not find filesystem
> > Submitter : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
> > Date : 2008-02-12 14:34 (45 days old)
> > References : http://lkml.org/lkml/2008/2/12/91
> > Handled-By : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> > Yinghai Lu <yhlu.kernel@gmail.com>
>
> Needs more info. The original oops that opened it is fixed, but..
>
Kamalesh,
can you check that?
YH
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 3:31 ` Yinghai Lu
@ 2008-03-31 10:14 ` Kamalesh Babulal
2008-03-31 12:10 ` Rafael J. Wysocki
0 siblings, 1 reply; 55+ messages in thread
From: Kamalesh Babulal @ 2008-03-31 10:14 UTC (permalink / raw)
To: Yinghai Lu
Cc: Linus Torvalds, Ingo Molnar, Rafael J. Wysocki, Pawel Staszewski,
Christoph Lameter, LKML, Adrian Bunk, Andrew Morton,
Natalie Protasevich
Yinghai Lu wrote:
> On Thu, Mar 27, 2008 at 7:30 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> On Thu, 27 Mar 2008, Rafael J. Wysocki wrote:
>> >
> ...
>> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9962
>> > Subject : mount: could not find filesystem
>> > Submitter : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
>> > Date : 2008-02-12 14:34 (45 days old)
>> > References : http://lkml.org/lkml/2008/2/12/91
>> > Handled-By : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
>> > Yinghai Lu <yhlu.kernel@gmail.com>
>>
>> Needs more info. The original oops that opened it is fixed, but..
>>
> Kamalesh,
> can you check that?
>
> YH
Hi,
I have tested the 2.6.25-rc7 kernel and the kernel boots up without the panic.
--
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-31 10:14 ` Kamalesh Babulal
@ 2008-03-31 12:10 ` Rafael J. Wysocki
0 siblings, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2008-03-31 12:10 UTC (permalink / raw)
To: Kamalesh Babulal
Cc: Yinghai Lu, Linus Torvalds, Ingo Molnar, Pawel Staszewski,
Christoph Lameter, LKML, Adrian Bunk, Andrew Morton,
Natalie Protasevich
On Monday, 31 of March 2008, Kamalesh Babulal wrote:
> Yinghai Lu wrote:
> > On Thu, Mar 27, 2008 at 7:30 PM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> >>
> >> On Thu, 27 Mar 2008, Rafael J. Wysocki wrote:
> >> >
> > ...
> >> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9962
> >> > Subject : mount: could not find filesystem
> >> > Submitter : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
> >> > Date : 2008-02-12 14:34 (45 days old)
> >> > References : http://lkml.org/lkml/2008/2/12/91
> >> > Handled-By : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> >> > Yinghai Lu <yhlu.kernel@gmail.com>
> >>
> >> Needs more info. The original oops that opened it is fixed, but..
> >>
> > Kamalesh,
> > can you check that?
> >
> > YH
> Hi,
>
> I have tested the 2.6.25-rc7 kernel and the kernel boots up without the panic.
Thanks for the update, I closed the bug.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 2:30 ` Linus Torvalds
2008-03-28 3:24 ` Christoph Lameter
2008-03-28 3:31 ` Yinghai Lu
@ 2008-03-28 11:29 ` Haavard Skinnemoen
2008-03-28 16:11 ` Rafael J. Wysocki
2008-03-28 16:10 ` Rafael J. Wysocki
3 siblings, 1 reply; 55+ messages in thread
From: Haavard Skinnemoen @ 2008-03-28 11:29 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rafael J. Wysocki, Pawel Staszewski, Christoph Lameter, LKML,
Adrian Bunk, Andrew Morton, Natalie Protasevich, Ben Nizette
Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9976
> > Subject : BUG: 2.6.25-rc1: iptables postrouting setup causes oops
> > Submitter : Ben Nizette <bn@niasdigital.com>
> > Date : 2008-02-12 12:46 (45 days old)
> > References : http://lkml.org/lkml/2008/2/12/148
> > Handled-By : Haavard Skinnemoen <hskinnemoen@atmel.com>
>
> This one seems gone (and was apparently AVR-only):
>
> http://lkml.org/lkml/2008/2/13/607:
> "What ever the problem is it isn't immediately apparent in latest git so
> I guess we'll just have to keep our eyes peeled."
Yes, I haven't been able to reproduce it, so I was going to ask for it
to be removed. I suspect it must be some sort of -rc1 weirdness that
has been fixed by a later commit.
I don't feel good about closing it without knowing what it was, but
since neither Ben nor I can reproduce it anymore, I don't have any
better suggestions.
Haavard
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 11:29 ` Haavard Skinnemoen
@ 2008-03-28 16:11 ` Rafael J. Wysocki
0 siblings, 0 replies; 55+ messages in thread
From: Rafael J. Wysocki @ 2008-03-28 16:11 UTC (permalink / raw)
To: Haavard Skinnemoen
Cc: Linus Torvalds, Pawel Staszewski, Christoph Lameter, LKML,
Adrian Bunk, Andrew Morton, Natalie Protasevich, Ben Nizette
On Friday, 28 of March 2008, Haavard Skinnemoen wrote:
> Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9976
> > > Subject : BUG: 2.6.25-rc1: iptables postrouting setup causes oops
> > > Submitter : Ben Nizette <bn@niasdigital.com>
> > > Date : 2008-02-12 12:46 (45 days old)
> > > References : http://lkml.org/lkml/2008/2/12/148
> > > Handled-By : Haavard Skinnemoen <hskinnemoen@atmel.com>
> >
> > This one seems gone (and was apparently AVR-only):
> >
> > http://lkml.org/lkml/2008/2/13/607:
> > "What ever the problem is it isn't immediately apparent in latest git so
> > I guess we'll just have to keep our eyes peeled."
>
> Yes, I haven't been able to reproduce it, so I was going to ask for it
> to be removed. I suspect it must be some sort of -rc1 weirdness that
> has been fixed by a later commit.
>
> I don't feel good about closing it without knowing what it was, but
> since neither Ben nor I can reproduce it anymore, I don't have any
> better suggestions.
I closed it as "unreproducible".
Thanks,
Rafael
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 2:30 ` Linus Torvalds
` (2 preceding siblings ...)
2008-03-28 11:29 ` Haavard Skinnemoen
@ 2008-03-28 16:10 ` Rafael J. Wysocki
2008-03-28 16:47 ` Linus Torvalds
3 siblings, 1 reply; 55+ messages in thread
From: Rafael J. Wysocki @ 2008-03-28 16:10 UTC (permalink / raw)
To: Linus Torvalds
Cc: Pawel Staszewski, Christoph Lameter, LKML, Adrian Bunk,
Andrew Morton, Natalie Protasevich
On Friday, 28 of March 2008, Linus Torvalds wrote:
>
> On Thu, 27 Mar 2008, Rafael J. Wysocki wrote:
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9412
> > Subject : commit a878539ef994787c447a98c2e3ba0fe3dad984ec breaks boot on SB600 AHCI
> > Submitter : Srihari Vijayaraghavan <sriharivijayaraghavan@yahoo.com.au>
> > Date : 2008-03-12 17:15 (16 days old)
> > Handled-By : Jeff Garzik <jgarzik@pobox.com>
> > Richard Zhao <richard.zhao@amd.com>
>
> Fixed by 4cde32fc4b32e96a99063af3183acdfd54c563f0, methinks.
Closed.
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9962
> > Subject : mount: could not find filesystem
> > Submitter : Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
> > Date : 2008-02-12 14:34 (45 days old)
> > References : http://lkml.org/lkml/2008/2/12/91
> > Handled-By : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> > Yinghai Lu <yhlu.kernel@gmail.com>
>
> Needs more info. The original oops that opened it is fixed, but..
>
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9976
> > Subject : BUG: 2.6.25-rc1: iptables postrouting setup causes oops
> > Submitter : Ben Nizette <bn@niasdigital.com>
> > Date : 2008-02-12 12:46 (45 days old)
> > References : http://lkml.org/lkml/2008/2/12/148
> > Handled-By : Haavard Skinnemoen <hskinnemoen@atmel.com>
>
> This one seems gone (and was apparently AVR-only):
>
> http://lkml.org/lkml/2008/2/13/607:
> "What ever the problem is it isn't immediately apparent in latest git so
> I guess we'll just have to keep our eyes peeled."
Closed, too.
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=9978
> > Subject : 2.6.25-rc1: volanoMark regression
> > Submitter : Zhang, Yanmin <yanmin_zhang@linux.intel.com>
> > Date : 2008-02-13 10:30 (44 days old)
> > References : http://lkml.org/lkml/2008/2/13/128
> > http://lkml.org/lkml/2008/3/12/52
> > http://lkml.org/lkml/2008/3/18/81
> > Handled-By : Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
> > Balbir Singh <balbir@linux.vnet.ibm.com>
>
> Hmm. It is a regression on one machine (2x quad-core stoakley), but not
> another (4x quad-core tigerton).
>
> Interestingly, the stoakley box numbers have apparently been all over the
> map.
>
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=10318
> > Subject : WARNING: at arch/x86/mm/highmem_32.c:43 kmap_atomic_prot+0x87/0x184()
> > Submitter : Pawel Staszewski <pstaszewski@artcom.pl>
> > Date : 2008-03-25 02:50 (3 days old)
>
> Andrew and seems to have debugged this down to a kzalloc(GFP_ATOMIC) or
> similar.
>
> I wonder if the bug is in that commit
> 3811dbf67162bd08412f1b0e02e554f353e93bdb ("SLUB: remove useless masking
> of GFP_ZERO"), because I don't think that masking was at all useless and I
> think my original 7fd272550bd43cc1d7289ef0ab2fa50de137e767 was correct.
>
> That apparently bogus commit says "GFP_ZERO is already masked out in
> new_slab()", but gfpflags is not just used for new_slab(), but for
> kmalloc_large() too. Which does *not* clear GFP_ZERO.
>
> Pawel, does reverting 3811dbf67162bd08412f1b0e02e554f353e93bdb fix it for
> you?
>
> Also, Rafael - do these reminder emails also go to the people who are
> mentioned in the regressions (especially people who are set up as being
> "hanled-by" or having patches for the problem)?
No, they don't. I need to do some scriptwork to make that happen.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 55+ messages in thread* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 16:10 ` Rafael J. Wysocki
@ 2008-03-28 16:47 ` Linus Torvalds
2008-03-28 17:36 ` Adrian Bunk
2008-03-28 22:28 ` Rafael J. Wysocki
0 siblings, 2 replies; 55+ messages in thread
From: Linus Torvalds @ 2008-03-28 16:47 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Pawel Staszewski, Christoph Lameter, LKML, Adrian Bunk,
Andrew Morton, Natalie Protasevich
On Fri, 28 Mar 2008, Rafael J. Wysocki wrote:
> >
> > Also, Rafael - do these reminder emails also go to the people who are
> > mentioned in the regressions (especially people who are set up as being
> > "handled-by" or having patches for the problem)?
>
> No, they don't. I need to do some scriptwork to make that happen.
It would be good. Right now I know for a fact that a lot of people read
LKML with various filters in place (or just very spottily), so I have a
feeling that while people try to track regressions, many people are
probably less aware of these things than they should be. And sometimes the
"handled-by" ends up being inaccurate (maybe somebody replied to the
original problem, but it became obvious that it was somewhere else, and
they remain "handled-by" even though the person doesn't actually handle
it).
It would probably also make sense to add some of the bigger subsystem
maintainers to the Cc (and/or with a mailing list for regressions?)
I think the regression list is _extremely_ valuable, but the problem I
see is that it's not necessarily reaching all the involved people.
The other problem is that I think the old reports (especially the ones
that haven't had reporter feedback in the last two weeks) end up being not
just stale, but they sort at the top, so when the right people _do_ look
at the list, the natural way to do so with email is to look at the first
ones first, and they *all* tend to be stale.
So the reaction is often "need more info" or "I think this was fixed
already two weeks ago, but there hasn't been any reply". Which is
psychologically really bad, because after you've seen three or four of
those, you just dismiss the rest (even if the later ones then may be much
more relevant!)
This is why I have always been advocating so aggressive culling of
regressions and bug-reports - stale bug-reports are worse than useless,
they actually _hurt_.
Linus
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 16:47 ` Linus Torvalds
@ 2008-03-28 17:36 ` Adrian Bunk
2008-03-28 20:33 ` Ingo Molnar
2008-03-28 22:28 ` Rafael J. Wysocki
1 sibling, 1 reply; 55+ messages in thread
From: Adrian Bunk @ 2008-03-28 17:36 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rafael J. Wysocki, Pawel Staszewski, Christoph Lameter, LKML,
Andrew Morton, Natalie Protasevich
On Fri, Mar 28, 2008 at 09:47:47AM -0700, Linus Torvalds wrote:
>
>
> On Fri, 28 Mar 2008, Rafael J. Wysocki wrote:
> > >
> > > Also, Rafael - do these reminder emails also go to the people who are
> > > mentioned in the regressions (especially people who are set up as being
> > > "handled-by" or having patches for the problem)?
> >
> > No, they don't. I need to do some scriptwork to make that happen.
>
> It would be good. Right now I know for a fact that a lot of people read
> LKML with various filters in place (or just very spottily), so I have a
> feeling that while people try to track regressions, many people are
> probably less aware of these things than they should be. And sometimes the
> "handled-by" ends up being inaccurate (maybe somebody replied to the
> original problem, but it became obvious that it was somewhere else, and
> they remain "handled-by" even though the person doesn't actually handle
> it).
>
> It would probably also make sense to add some of the bigger subsystem
> maintainers to the Cc (and/or with a mailing list for regressions?)
>...
When I tracked regressions I also Cc'ed all submitters, maintainers of
all affected subsystems, and the people who said they'd handle the
reports.
The problem I ran into was vger dropping emails with >= 30 recipients,
which meant that I had to split the regression list into half a dozen
emails (splitted by the area of the regressions) when I tracked at about
30 regressions.
But that was one year ago when we had only half as many regressions per
release as they do now, with 2.6.25 we had a peak of 66 pending
regressions...
> Linus
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
^ permalink raw reply [flat|nested] 55+ messages in thread* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 17:36 ` Adrian Bunk
@ 2008-03-28 20:33 ` Ingo Molnar
0 siblings, 0 replies; 55+ messages in thread
From: Ingo Molnar @ 2008-03-28 20:33 UTC (permalink / raw)
To: Adrian Bunk
Cc: Linus Torvalds, Rafael J. Wysocki, Pawel Staszewski,
Christoph Lameter, LKML, Andrew Morton, Natalie Protasevich
* Adrian Bunk <bunk@kernel.org> wrote:
> But that was one year ago when we had only half as many regressions
> per release as they do now, with 2.6.25 we had a peak of 66 pending
> regressions...
we have twice as many commits, and we have better test coverage. I get
the impression that user trust is coming back as well: regressions are
being reported sooner and more persistently - because we are handling
them in a more structured and more dependable way. We also seem to have
more users of latest -git.
so it _appears_ to be an increase in bugginess but i believe it's an
increase of activity and it's all good IMO, we close 90% of the
regressions within a week or two, and most of the regressions are for
obscure cases.
I had 2.6.25 running on most of my boxes from -rc1 on without any
unprovoked crash. (provoked bugs were another matter) Bisection became
more practical and more widespread as well. And i periodically find bugs
that came from ancient kernels so we are fixing bugs faster than we put
them in i think. I didnt have that feeling in the .18-.19 kernels.
also, now that the kerneloops.org client is in Fedora 9 by default,
we'll start to have really objective long-term statistics about how our
users react to the bugs we put into the kernel.
Ingo
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 16:47 ` Linus Torvalds
2008-03-28 17:36 ` Adrian Bunk
@ 2008-03-28 22:28 ` Rafael J. Wysocki
2008-03-31 13:34 ` Ingo Molnar
1 sibling, 1 reply; 55+ messages in thread
From: Rafael J. Wysocki @ 2008-03-28 22:28 UTC (permalink / raw)
To: Linus Torvalds
Cc: Pawel Staszewski, Christoph Lameter, LKML, Adrian Bunk,
Andrew Morton, Natalie Protasevich
On Friday, 28 of March 2008, Linus Torvalds wrote:
>
> On Fri, 28 Mar 2008, Rafael J. Wysocki wrote:
> > >
> > > Also, Rafael - do these reminder emails also go to the people who are
> > > mentioned in the regressions (especially people who are set up as being
> > > "handled-by" or having patches for the problem)?
> >
> > No, they don't. I need to do some scriptwork to make that happen.
>
> It would be good. Right now I know for a fact that a lot of people read
> LKML with various filters in place (or just very spottily), so I have a
> feeling that while people try to track regressions, many people are
> probably less aware of these things than they should be. And sometimes the
> "handled-by" ends up being inaccurate (maybe somebody replied to the
> original problem, but it became obvious that it was somewhere else, and
> they remain "handled-by" even though the person doesn't actually handle
> it).
I generally agree, but OTOH I don't think that sending one message with
50+ addresses on the CC list will actually help a lot. I personally don't like
to receive messages like that and tend to filter them out.
I'd prefer people to receive information regarding those regression reports in
which they are directly involved, but it will take some time to arrange that in
a reliable way.
> It would probably also make sense to add some of the bigger subsystem
> maintainers to the Cc (and/or with a mailing list for regressions?)
Tell that to davem. ;-)
Seriously, it often is difficult to say who's the right maintainer and what
list is appropriate.
> I think the regression list is _extremely_ valuable, but the problem I
> see is that it's not necessarily reaching all the involved people.
>
> The other problem is that I think the old reports (especially the ones
> that haven't had reporter feedback in the last two weeks) end up being not
> just stale, but they sort at the top, so when the right people _do_ look
> at the list, the natural way to do so with email is to look at the first
> ones first, and they *all* tend to be stale.
Well, I can sort them in the reverse order just fine, if you prefer.
As far as closing the older ones is concerned, I'm a bit reluctant to do
that, because (1) some reporters reappear after a long period of silence (like
more than two weeks) and (2) it would create an impression that we _fixed_
more regressions than we actually did.
> So the reaction is often "need more info" or "I think this was fixed
> already two weeks ago, but there hasn't been any reply". Which is
> psychologically really bad, because after you've seen three or four of
> those, you just dismiss the rest (even if the later ones then may be much
> more relevant!)
Well, you shouldn't. :-)
> This is why I have always been advocating so aggressive culling of
> regressions and bug-reports - stale bug-reports are worse than useless,
> they actually _hurt_.
I don't quite agree here. At least, they indicate that we may have an unfixed
problem and the fact that no one has taken care of it doesn't really mean we
should generally ignore it.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: 2.6.25-rc7-git2: Reported regressions from 2.6.24
2008-03-28 22:28 ` Rafael J. Wysocki
@ 2008-03-31 13:34 ` Ingo Molnar
0 siblings, 0 replies; 55+ messages in thread
From: Ingo Molnar @ 2008-03-31 13:34 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linus Torvalds, Pawel Staszewski, Christoph Lameter, LKML,
Adrian Bunk, Andrew Morton, Natalie Protasevich
* Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > This is why I have always been advocating so aggressive culling of
> > regressions and bug-reports - stale bug-reports are worse than
> > useless, they actually _hurt_.
>
> I don't quite agree here. At least, they indicate that we may have an
> unfixed problem and the fact that no one has taken care of it doesn't
> really mean we should generally ignore it.
culling doesnt mean ignoring - it just means de-prioritizing. There's
four basic bug categories:
1- bugs where there's inactivity on the reporter side. This we should
de-prioritize - and reactivate them once their activity changes.
2- bugs where there's inactivity on the _maintainer_ side show bad bugs
in our process.
3- bugs that are old but have lots of activity are usually the most
difficult bugs where both side try their best to get it resolved.
4- bugs that are relatively new can be in any of the above 3 categories,
we dont know yet.
so i think we should list bugs in category #3 first: the hardest bugs,
which need the most eyes. Then should we list #2 - the embarrasing bugs
where our pocess failed. Then should we list #4 - new, not yet resolved
bugs which need more eyes - especially in late -rc's. Then comes #1 -
inactive bugs.
the problem for your scripting is to efficiently parse lkml activity for
these bugs: which replies are from the "maintainer", and how "active" is
a thread. So i guess a good heuristic is what you did in your latest
mail: to reverse sort by age of the bug - but i'd also suggest to list
too old entries where the bugzilla is not in NEEDINFO state - those
indicate inactive (or unaware) maintainers.
Ingo
^ permalink raw reply [flat|nested] 55+ messages in thread