Re: 2.6.23-rc1: BUG_ON in kmap_atomic

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
       [not found]   ` <20070723190152.GA5755@martell.zuzino.mipt.ru>
@ 2007-07-23 20:24     ` Andrew Morton
  2007-07-23 20:40       ` Alexey Dobriyan
  2007-07-24 10:01       ` Mike Galbraith
  0 siblings, 2 replies; 31+ messages in thread
From: Andrew Morton @ 2007-07-23 20:24 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: Linus Torvalds, linux-kernel, netdev

On Mon, 23 Jul 2007 23:01:52 +0400
Alexey Dobriyan <adobriyan@gmail.com> wrote:

> On Mon, Jul 23, 2007 at 10:38:39PM +0400, Alexey Dobriyan wrote:
> > Managed to hit BUG_ON() in kmap_atomic_prot() three times while doing
> > nothing unusual for this box (two times it was under X, so I can't
> > guarantee, one time while trying to reproduce via ./configure in gdb
> > tarball)

Yeah, I hit this several times a few days ago.  Same story: it just
randomly went splat in response to no obvious stimulus.  Reported it to
netdev, was greeted with stunned silence.


> > Box has 2.5G of RAM. 2.6.22 was OK.
> > 
> > [dives into framebuffer console setup for complete oops]
> 
> kernel BUG at arch/i386/mm/highmem.c:38
> PREEMPT DEBUG_PAGEALLOC SLAB
> EIP at kmap_atomic_prot+0x32/0x93
> 	get_page_from_freelist
> 	__alloc_pages
> 	cache_alloc_refill
> 	cache_alloc_refill
> 	kmem_cache_alloc
> 	dst_alloc
> 	dst_alloc
> 	__ip_route_output_key
> 	[some junk I don't trust]
> 
> eax: 0000000c
> ebx: 00000003
> ecx: c065efe0
> edx: 00000003
> edi: 00000163
> 
> 
> c010cc9b <kmap_atomic_prot>:
> c010cc9b:	57                   	push   %edi
> c010cc9c:	56                   	push   %esi
> c010cc9d:	53                   	push   %ebx
> c010cc9e:	89 c6                	mov    %eax,%esi
> c010cca0:	89 d3                	mov    %edx,%ebx
> c010cca2:	89 cf                	mov    %ecx,%edi
> c010cca4:	b8 01 00 00 00       	mov    $0x1,%eax
> c010cca9:	e8 dd 1b 00 00       	call   c010e88b <add_preempt_count>
> c010ccae:	e8 b1 ac 0e 00       	call   c01f7964 <debug_smp_processor_id>
> c010ccb3:	6b c0 0d             	imul   $0xd,%eax,%eax
> c010ccb6:	8d 14 03             	lea    (%ebx,%eax,1),%edx
> c010ccb9:	8d 04 95 00 00 00 00 	lea    0x0(,%edx,4),%eax
> c010ccc0:	8b 0d 30 a1 3e c0    	mov    0xc03ea130,%ecx
> c010ccc6:	29 c1                	sub    %eax,%ecx
> c010ccc8:	83 39 00             	cmpl   $0x0,(%ecx)
> c010cccb:	74 04                	je     c010ccd1 <kmap_atomic_prot+0x36>
> c010cccd:	0f 0b                	ud2a   

I had more complete info: http://article.gmane.org/gmane.linux.network/66966

You're using DEBUG_PAGEALLOC, but I was not, so I think we can rule that out.

I haven't worked out where that kmap_atomic() call is coming from yet. 
Both traces point up into the page allocator, but I _think_ that's stack
gunk.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-23 20:24     ` 2.6.23-rc1: BUG_ON in kmap_atomic_prot() Andrew Morton
@ 2007-07-23 20:40       ` Alexey Dobriyan
  2007-07-23 21:01         ` Alexey Dobriyan
  2007-07-24 10:01       ` Mike Galbraith
  1 sibling, 1 reply; 31+ messages in thread
From: Alexey Dobriyan @ 2007-07-23 20:40 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel, netdev

On Mon, Jul 23, 2007 at 01:24:31PM -0700, Andrew Morton wrote:
> On Mon, 23 Jul 2007 23:01:52 +0400
> Alexey Dobriyan <adobriyan@gmail.com> wrote:
> 
> > On Mon, Jul 23, 2007 at 10:38:39PM +0400, Alexey Dobriyan wrote:
> > > Managed to hit BUG_ON() in kmap_atomic_prot() three times while doing
> > > nothing unusual for this box (two times it was under X, so I can't
> > > guarantee, one time while trying to reproduce via ./configure in gdb
> > > tarball)
> 
> Yeah, I hit this several times a few days ago.  Same story: it just
> randomly went splat in response to no obvious stimulus.  Reported it to
> netdev, was greeted with stunned silence.
> 
> 
> > > Box has 2.5G of RAM. 2.6.22 was OK.
> > > 
> > > [dives into framebuffer console setup for complete oops]
> > 
> > kernel BUG at arch/i386/mm/highmem.c:38
> > PREEMPT DEBUG_PAGEALLOC SLAB
> > EIP at kmap_atomic_prot+0x32/0x93
> > 	get_page_from_freelist
> > 	__alloc_pages
> > 	cache_alloc_refill
> > 	cache_alloc_refill
> > 	kmem_cache_alloc
> > 	dst_alloc
> > 	dst_alloc
> > 	__ip_route_output_key
> > 	[some junk I don't trust]
> > 
> > eax: 0000000c
> > ebx: 00000003
> > ecx: c065efe0
> > edx: 00000003
> > edi: 00000163
> > 
> > 
> > c010cc9b <kmap_atomic_prot>:
> > c010cc9b:	57                   	push   %edi
> > c010cc9c:	56                   	push   %esi
> > c010cc9d:	53                   	push   %ebx
> > c010cc9e:	89 c6                	mov    %eax,%esi
> > c010cca0:	89 d3                	mov    %edx,%ebx
> > c010cca2:	89 cf                	mov    %ecx,%edi
> > c010cca4:	b8 01 00 00 00       	mov    $0x1,%eax
> > c010cca9:	e8 dd 1b 00 00       	call   c010e88b <add_preempt_count>
> > c010ccae:	e8 b1 ac 0e 00       	call   c01f7964 <debug_smp_processor_id>
> > c010ccb3:	6b c0 0d             	imul   $0xd,%eax,%eax
> > c010ccb6:	8d 14 03             	lea    (%ebx,%eax,1),%edx
> > c010ccb9:	8d 04 95 00 00 00 00 	lea    0x0(,%edx,4),%eax
> > c010ccc0:	8b 0d 30 a1 3e c0    	mov    0xc03ea130,%ecx
> > c010ccc6:	29 c1                	sub    %eax,%ecx
> > c010ccc8:	83 39 00             	cmpl   $0x0,(%ecx)
> > c010cccb:	74 04                	je     c010ccd1 <kmap_atomic_prot+0x36>
> > c010cccd:	0f 0b                	ud2a   
> 
> I had more complete info: http://article.gmane.org/gmane.linux.network/66966
> 
> You're using DEBUG_PAGEALLOC, but I was not, so I think we can rule that out.
> 
> I haven't worked out where that kmap_atomic() call is coming from yet. 
> Both traces point up into the page allocator, but I _think_ that's stack
> gunk.

Ahh, you suspect networking.

Here, setup is 2 cheap-ass 100Mb realtek 8139 NICs, one to campus network
receiving ~20 junk packets per second, one gathering netconsole output
and ssh to it, no conntracks and fancy stuff.

[reboots with cables physically unplugged]


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-23 20:40       ` Alexey Dobriyan
@ 2007-07-23 21:01         ` Alexey Dobriyan
  2007-07-23 21:11           ` Andrew Morton
  0 siblings, 1 reply; 31+ messages in thread
From: Alexey Dobriyan @ 2007-07-23 21:01 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel, netdev

On Tue, Jul 24, 2007 at 12:40:45AM +0400, Alexey Dobriyan wrote:
> > I had more complete info: http://article.gmane.org/gmane.linux.network/66966
> > 
> > You're using DEBUG_PAGEALLOC, but I was not, so I think we can rule that out.
> > 
> > I haven't worked out where that kmap_atomic() call is coming from yet. 
> > Both traces point up into the page allocator, but I _think_ that's stack
> > gunk.
> 
> Ahh, you suspect networking.
> 
> Here, setup is 2 cheap-ass 100Mb realtek 8139 NICs, one to campus network
> receiving ~20 junk packets per second, one gathering netconsole output
> and ssh to it, no conntracks and fancy stuff.
> 
> [reboots with cables physically unplugged]

OK, I run gdb recompile, cat(1) every file in /usr/portage (shitload of
small files) with both cables unplugged. It all went fine for ~5 minutes
after that it crashed exactly same way after 10 secs after plugging one
of them.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-23 21:01         ` Alexey Dobriyan
@ 2007-07-23 21:11           ` Andrew Morton
  2007-07-23 21:28             ` Linus Torvalds
  2007-07-23 22:04             ` 2.6.23-rc1: BUG_ON in kmap_atomic_prot() Alexey Dobriyan
  0 siblings, 2 replies; 31+ messages in thread
From: Andrew Morton @ 2007-07-23 21:11 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: Linus Torvalds, linux-kernel, netdev

On Tue, 24 Jul 2007 01:01:53 +0400
Alexey Dobriyan <adobriyan@gmail.com> wrote:

> On Tue, Jul 24, 2007 at 12:40:45AM +0400, Alexey Dobriyan wrote:
> > > I had more complete info: http://article.gmane.org/gmane.linux.network/66966
> > > 
> > > You're using DEBUG_PAGEALLOC, but I was not, so I think we can rule that out.
> > > 
> > > I haven't worked out where that kmap_atomic() call is coming from yet. 
> > > Both traces point up into the page allocator, but I _think_ that's stack
> > > gunk.
> > 
> > Ahh, you suspect networking.
> > 
> > Here, setup is 2 cheap-ass 100Mb realtek 8139 NICs, one to campus network
> > receiving ~20 junk packets per second, one gathering netconsole output
> > and ssh to it, no conntracks and fancy stuff.
> > 
> > [reboots with cables physically unplugged]
> 
> OK, I run gdb recompile, cat(1) every file in /usr/portage (shitload of
> small files) with both cables unplugged. It all went fine for ~5 minutes
> after that it crashed exactly same way after 10 secs after plugging one
> of them.

It'd be nice to get a clean trace.  Are you able to obtain the full
trace with CONFIG_FRAME_POINTER=y?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-23 21:11           ` Andrew Morton
@ 2007-07-23 21:28             ` Linus Torvalds
  2007-07-23 21:37               ` Sam Ravnborg
  2007-07-24 17:59               ` Adrian Bunk
  2007-07-23 22:04             ` 2.6.23-rc1: BUG_ON in kmap_atomic_prot() Alexey Dobriyan
  1 sibling, 2 replies; 31+ messages in thread
From: Linus Torvalds @ 2007-07-23 21:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Alexey Dobriyan, linux-kernel, netdev



On Mon, 23 Jul 2007, Andrew Morton wrote:
> 
> It'd be nice to get a clean trace.  Are you able to obtain the full
> trace with CONFIG_FRAME_POINTER=y?

If you are talking about

	http://userweb.kernel.org/~akpm/dsc03659.jpg

then I think that _is_ a full trace. It's certainly not very messy, and it 
seems accurate. It's just that inlining makes it much harder to see the 
call-graphs, but that's what inlining does..

For example, missing from the call graph is

	get_page_from_freelist ->
	  buffered_rmqueue ->		[ missing - inlined ]
	    prep_new_page ->		[ missing - inlined ]
	      prep_zero_page ->		[ missing - inlined ]
		clear_highpage ->	[ missing - inlined ]
		  kmap_atomic ->	[ missing - tailcall ]
		    kmap_atomic_prot

but I'm pretty sure the call trace is good (and I'm also pretty sure gcc 
is overly aggressive at inlining, and that it causes us pain for 
debugging, but whatever)

The earlier part of the trace looks fine too.

The only odd part I see is the existence of "dput()" there, so maybe it's 
not *quite* clean and enabling frame pointers might get rid of a few bogus 
entries, but it looks pretty close to clean.

			Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-23 21:28             ` Linus Torvalds
@ 2007-07-23 21:37               ` Sam Ravnborg
  2007-07-24 17:59               ` Adrian Bunk
  1 sibling, 0 replies; 31+ messages in thread
From: Sam Ravnborg @ 2007-07-23 21:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, Alexey Dobriyan, linux-kernel, netdev

> 
> For example, missing from the call graph is
> 
> 	get_page_from_freelist ->
> 	  buffered_rmqueue ->		[ missing - inlined ]
> 	    prep_new_page ->		[ missing - inlined ]
> 	      prep_zero_page ->		[ missing - inlined ]
> 		clear_highpage ->	[ missing - inlined ]
> 		  kmap_atomic ->	[ missing - tailcall ]
> 		    kmap_atomic_prot
> 
> (and I'm also pretty sure gcc 
> is overly aggressive at inlining, and that it causes us pain for 
> debugging, but whatever)

mm/page_alloc.c:static inline void prep_zero_page(struct page *page, int order, gfp_t gfp_flags)
include/linux/highmem.h:static inline void clear_highpage(struct page *page)

So at least two was explicit marked inline.
Now if that made I change i dunno.

	Sam

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-23 21:28             ` Linus Torvalds
  2007-07-23 21:37               ` Sam Ravnborg
@ 2007-07-24 17:59               ` Adrian Bunk
  2007-07-24 18:14                 ` Linus Torvalds
  1 sibling, 1 reply; 31+ messages in thread
From: Adrian Bunk @ 2007-07-24 17:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, Alexey Dobriyan, linux-kernel, netdev

On Mon, Jul 23, 2007 at 02:28:11PM -0700, Linus Torvalds wrote:
> 
> 
> On Mon, 23 Jul 2007, Andrew Morton wrote:
> > 
> > It'd be nice to get a clean trace.  Are you able to obtain the full
> > trace with CONFIG_FRAME_POINTER=y?
> 
> If you are talking about
> 
> 	http://userweb.kernel.org/~akpm/dsc03659.jpg
> 
> then I think that _is_ a full trace. It's certainly not very messy, and it 
> seems accurate. It's just that inlining makes it much harder to see the 
> call-graphs, but that's what inlining does..
> 
> For example, missing from the call graph is
> 
> 	get_page_from_freelist ->
> 	  buffered_rmqueue ->		[ missing - inlined ]
> 	    prep_new_page ->		[ missing - inlined ]
> 	      prep_zero_page ->		[ missing - inlined ]
> 		clear_highpage ->	[ missing - inlined ]
> 		  kmap_atomic ->	[ missing - tailcall ]
> 		    kmap_atomic_prot
> 
> but I'm pretty sure the call trace is good (and I'm also pretty sure gcc 
> is overly aggressive at inlining, and that it causes us pain for 
> debugging, but whatever)
>...

For prep_zero_page() and clear_highpage() we can't blame gcc since we 
force gcc to always inline them.

buffered_rmqueue() and prep_new_page() are static functions with only 
one caller each, and for the normal non-debug case it's a really nice 
optimization to have them inlined automatically. But it might make sense 
to add -fno-inline-functions-called-once to the CFLAGS depending on some 
debug option?

> 			Linus

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-24 17:59               ` Adrian Bunk
@ 2007-07-24 18:14                 ` Linus Torvalds
  2007-07-24 18:28                   ` Andrew Morton
  2007-07-26  6:09                   ` commit 7e92b4fc34 - x86, serial: convert legacy COM ports to platform devices - broke my serial console H. Peter Anvin
  0 siblings, 2 replies; 31+ messages in thread
From: Linus Torvalds @ 2007-07-24 18:14 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: Andrew Morton, Alexey Dobriyan, linux-kernel, netdev



On Tue, 24 Jul 2007, Adrian Bunk wrote:
> 
> buffered_rmqueue() and prep_new_page() are static functions with only 
> one caller each, and for the normal non-debug case it's a really nice 
> optimization to have them inlined automatically.

I'm not at all sure I agree.

Inlining big functions doesn't actually tend to generally generate any 
better code, so if gcc's logic is "single callsite - always inline", then 
that logic is likely not right.

		Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-24 18:14                 ` Linus Torvalds
@ 2007-07-24 18:28                   ` Andrew Morton
  2007-07-24 19:15                     ` Linus Torvalds
  2007-07-26  6:09                   ` commit 7e92b4fc34 - x86, serial: convert legacy COM ports to platform devices - broke my serial console H. Peter Anvin
  1 sibling, 1 reply; 31+ messages in thread
From: Andrew Morton @ 2007-07-24 18:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Adrian Bunk, Alexey Dobriyan, linux-kernel, netdev

On Tue, 24 Jul 2007 11:14:21 -0700 (PDT)
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> 
> 
> On Tue, 24 Jul 2007, Adrian Bunk wrote:
> > 
> > buffered_rmqueue() and prep_new_page() are static functions with only 
> > one caller each, and for the normal non-debug case it's a really nice 
> > optimization to have them inlined automatically.
> 
> I'm not at all sure I agree.
> 
> Inlining big functions doesn't actually tend to generally generate any 
> better code, so if gcc's logic is "single callsite - always inline", then 
> that logic is likely not right.
> 

fwiw, -fno-inline-functions-called-once (who knew?) takes i386 allnoconfig
vmlinux .text from 928360 up to 955362 bytes (27k larger).

A surprisingly large increase - I wonder if it did something dumb.  It
appears to still correctly inline those things which we've manually marked
inline.  hm.

It would be nice to defeat the autoinlining for debug purposes though.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-24 18:28                   ` Andrew Morton
@ 2007-07-24 19:15                     ` Linus Torvalds
  2007-07-24 19:40                       ` Adrian Bunk
  2007-07-24 20:27                       ` Andi Kleen
  0 siblings, 2 replies; 31+ messages in thread
From: Linus Torvalds @ 2007-07-24 19:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Adrian Bunk, Alexey Dobriyan, linux-kernel, netdev

On Tue, 24 Jul 2007, Andrew Morton wrote:
> 
> fwiw, -fno-inline-functions-called-once (who knew?) takes i386 allnoconfig
> vmlinux .text from 928360 up to 955362 bytes (27k larger).
> 
> A surprisingly large increase - I wonder if it did something dumb.  It
> appears to still correctly inline those things which we've manually marked
> inline.  hm.

I think inlining small enough functions is worth it, and the thing is, the 
kernel is actually pretty damn good at having lots of small functions. 
It's one of the few things I really care about from a coding style 
standpoint.

So I'm not surprised that "-fno-inline-functions-called-once" makes things 
larger, because I think it's generally a good idea to inline things that 
are just called once. But it does make things harder to debug, and the 
performance advantages become increasingly small for bigger functions.

And that's a balancing act. Do we care about performance? Yes. But do we 
care so much that it's worth inlining something like buffered_rmqueue()? 

So I would not be surprised if "-fno-inline-functions-called-once" will 
disable *all* the inlining heuristics, and say "oh, it's not an inline 
function, and it's only called once, so we won't inline it at all".

So "called once" should probably make the inlining weight bigger (ie 
inline *larger* functions than you would otherwise), it just shouldn't 
make it "infinite". It's not worth it.

		Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-24 19:15                     ` Linus Torvalds
@ 2007-07-24 19:40                       ` Adrian Bunk
  2007-07-24 19:48                         ` Linus Torvalds
  2007-07-24 20:27                       ` Andi Kleen
  1 sibling, 1 reply; 31+ messages in thread
From: Adrian Bunk @ 2007-07-24 19:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, Alexey Dobriyan, linux-kernel, netdev

On Tue, Jul 24, 2007 at 12:15:48PM -0700, Linus Torvalds wrote:
> 
> 
> On Tue, 24 Jul 2007, Andrew Morton wrote:
> > 
> > fwiw, -fno-inline-functions-called-once (who knew?) takes i386 allnoconfig
> > vmlinux .text from 928360 up to 955362 bytes (27k larger).
> > 
> > A surprisingly large increase - I wonder if it did something dumb.  It
> > appears to still correctly inline those things which we've manually marked
> > inline.  hm.
> 
> I think inlining small enough functions is worth it, and the thing is, the 
> kernel is actually pretty damn good at having lots of small functions. 
> It's one of the few things I really care about from a coding style 
> standpoint.
> 
> So I'm not surprised that "-fno-inline-functions-called-once" makes things 
> larger, because I think it's generally a good idea to inline things that 
> are just called once. But it does make things harder to debug, and the 
> performance advantages become increasingly small for bigger functions.
> 
> And that's a balancing act. Do we care about performance? Yes.

When using CONFIG_CC_OPTIMIZE_FOR_SIZE=y we even actively tell gcc that 
we only care about size and do not care about performance...

> But do we 
> care so much that it's worth inlining something like buffered_rmqueue()? 
>...

Where is the problem with having buffered_rmqueue() inlined?

> 		Linus

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-24 19:40                       ` Adrian Bunk
@ 2007-07-24 19:48                         ` Linus Torvalds
  2007-07-26 18:07                           ` Adrian Bunk
  0 siblings, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2007-07-24 19:48 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: Andrew Morton, Alexey Dobriyan, linux-kernel, netdev

On Tue, 24 Jul 2007, Adrian Bunk wrote:
> 
> > But do we 
> > care so much that it's worth inlining something like buffered_rmqueue()? 
> >...
> 
> Where is the problem with having buffered_rmqueue() inlined?

In this case, it was a pain to just even try to find the call chain, or 
read the asm.

I would encourage lots of kernel hackers to read the assembler code gcc 
generates. I suspect people being aware of code generation issues (and 
writing their code with that in mind) is a *much* bigger performance 
impact than gcc inlining random functions.

So maybe I'm old-fashioned and crazy, but "readability of the asm result" 
actually is a worthwhile goal. Not because we care directly, but because 
I'd like to encourage people to do it, due to the *indirect* benefits.

		Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-24 19:48                         ` Linus Torvalds
@ 2007-07-26 18:07                           ` Adrian Bunk
  2007-07-26 18:19                             ` Linus Torvalds
  0 siblings, 1 reply; 31+ messages in thread
From: Adrian Bunk @ 2007-07-26 18:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, Alexey Dobriyan, linux-kernel, netdev

On Tue, Jul 24, 2007 at 12:48:13PM -0700, Linus Torvalds wrote:
> 
> 
> On Tue, 24 Jul 2007, Adrian Bunk wrote:
> > 
> > > But do we 
> > > care so much that it's worth inlining something like buffered_rmqueue()? 
> > >...
> > 
> > Where is the problem with having buffered_rmqueue() inlined?
> 
> In this case, it was a pain to just even try to find the call chain, or 
> read the asm.

Optimization versus debugging is a common issue...

As I said, it might make sense to disable this optimization depending on 
some debugging option.

> I would encourage lots of kernel hackers to read the assembler code gcc 
> generates. I suspect people being aware of code generation issues (and 
> writing their code with that in mind) is a *much* bigger performance 
> impact than gcc inlining random functions.
> 
> So maybe I'm old-fashioned and crazy, but "readability of the asm result" 
> actually is a worthwhile goal. Not because we care directly, but because 
> I'd like to encourage people to do it, due to the *indirect* benefits.

This would lead to people trying to optimize code for one gcc version - 
and the code might stay this way for 10 years.

People should write readable C code. This also has the best chances of 
resulting in good performance with the next gcc version on the next 
generation hardware.

> 		Linus

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-26 18:07                           ` Adrian Bunk
@ 2007-07-26 18:19                             ` Linus Torvalds
  0 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2007-07-26 18:19 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: Andrew Morton, Alexey Dobriyan, linux-kernel, netdev



On Thu, 26 Jul 2007, Adrian Bunk wrote:
> > 
> > So maybe I'm old-fashioned and crazy, but "readability of the asm result" 
> > actually is a worthwhile goal. Not because we care directly, but because 
> > I'd like to encourage people to do it, due to the *indirect* benefits.
> 
> This would lead to people trying to optimize code for one gcc version - 
> and the code might stay this way for 10 years.

No. The fact is, code that is easy to optimize is easy to optimize.

It has _nothing_ to do with gcc versions. 

		Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-24 19:15                     ` Linus Torvalds
  2007-07-24 19:40                       ` Adrian Bunk
@ 2007-07-24 20:27                       ` Andi Kleen
  2007-07-24 19:45                         ` Linus Torvalds
  1 sibling, 1 reply; 31+ messages in thread
From: Andi Kleen @ 2007-07-24 20:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Adrian Bunk, Alexey Dobriyan, linux-kernel, netdev

Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
> So "called once" should probably make the inlining weight bigger (ie 
> inline *larger* functions than you would otherwise), it just shouldn't 
> make it "infinite". It's not worth it.

There's probably a --param where it can be tweaked exactly. The
problem is that --params tend to be very gcc version specific
and might do something completely different on a newer or 
older version. So it's better not to use them.

-Andi

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-24 20:27                       ` Andi Kleen
@ 2007-07-24 19:45                         ` Linus Torvalds
  0 siblings, 0 replies; 31+ messages in thread
From: Linus Torvalds @ 2007-07-24 19:45 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Adrian Bunk, Alexey Dobriyan, linux-kernel, netdev



On Tue, 24 Jul 2007, Andi Kleen wrote:
> 
> There's probably a --param where it can be tweaked exactly. The
> problem is that --params tend to be very gcc version specific
> and might do something completely different on a newer or 
> older version. So it's better not to use them.

I agree wholeheartedly with that sentiment. We've tried at times (because 
some gcc snapshots made some *truly* insane choices for a while), and 
maybe we still have some around. Not worth the pain.

		Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: commit 7e92b4fc34 - x86, serial: convert legacy COM ports to platform devices - broke my serial console
  2007-07-24 18:14                 ` Linus Torvalds
  2007-07-24 18:28                   ` Andrew Morton
@ 2007-07-26  6:09                   ` H. Peter Anvin
  1 sibling, 0 replies; 31+ messages in thread
From: H. Peter Anvin @ 2007-07-26  6:09 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Adrian Bunk, Andrew Morton, Alexey Dobriyan, linux-kernel, netdev

Jeff Garzik wrote:
> 
> On Tue, 24 Jul 2007, Adrian Bunk wrote:
>> buffered_rmqueue() and prep_new_page() are static functions with only 
>> one caller each, and for the normal non-debug case it's a really nice 
>> optimization to have them inlined automatically.
> 
> I'm not at all sure I agree.
> 
> Inlining big functions doesn't actually tend to generally generate any 
> better code, so if gcc's logic is "single callsite - always inline", then 
> that logic is likely not right.
> 

Only up to a threshold, as far as I know.

	-hpa

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-23 21:11           ` Andrew Morton
  2007-07-23 21:28             ` Linus Torvalds
@ 2007-07-23 22:04             ` Alexey Dobriyan
  2007-07-23 22:27               ` Andrew Morton
  1 sibling, 1 reply; 31+ messages in thread
From: Alexey Dobriyan @ 2007-07-23 22:04 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel, netdev

On Mon, Jul 23, 2007 at 02:11:37PM -0700, Andrew Morton wrote:
> On Tue, 24 Jul 2007 01:01:53 +0400
> Alexey Dobriyan <adobriyan@gmail.com> wrote:
> 
> > On Tue, Jul 24, 2007 at 12:40:45AM +0400, Alexey Dobriyan wrote:
> > > > I had more complete info: http://article.gmane.org/gmane.linux.network/66966
> > > > 
> > > > You're using DEBUG_PAGEALLOC, but I was not, so I think we can rule that out.
> > > > 
> > > > I haven't worked out where that kmap_atomic() call is coming from yet. 
> > > > Both traces point up into the page allocator, but I _think_ that's stack
> > > > gunk.
> > > 
> > > Ahh, you suspect networking.
> > > 
> > > Here, setup is 2 cheap-ass 100Mb realtek 8139 NICs, one to campus network
> > > receiving ~20 junk packets per second, one gathering netconsole output
> > > and ssh to it, no conntracks and fancy stuff.
> > > 
> > > [reboots with cables physically unplugged]
> > 
> > OK, I run gdb recompile, cat(1) every file in /usr/portage (shitload of
> > small files) with both cables unplugged. It all went fine for ~5 minutes
> > after that it crashed exactly same way after 10 secs after plugging one
> > of them.
> 
> It'd be nice to get a clean trace.  Are you able to obtain the full
> trace with CONFIG_FRAME_POINTER=y?

Sorry, no camera shot, finding camera requires wakening up M. :)

It took longer that usual, but here it is

	kmap_atomic
	get_page_from_freelist
	__alloc_pages
	cache_alloc_refill
	__alloc_pages
	cache_alloc_refill
	kmem_cache_alloc
	dst_alloc
	ip_route_input
	ip_rcv
	netif_receive_skb
	rtl8139_poll
	net_rx_action
	__do_softirq
	do_softirq
	irq_exit
	do_IRQ
	common_interrupt
	handle_mm_fault
	do_page_fault
	error_core

much more loaded x86_64 box near also running 2.6.23-rc1 with debugging
turned on, using atl1 driver doesn't experience any crashes.

And I found 2.6.22-b91cba52e9b7b3f1c0037908a192d93a869ca9e5-x entry on
top of grub config which means b91cba52e9b7b3f1c0037908a192d93a869ca9e5
_without_ any debugging was OK.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-23 22:04             ` 2.6.23-rc1: BUG_ON in kmap_atomic_prot() Alexey Dobriyan
@ 2007-07-23 22:27               ` Andrew Morton
  2007-07-24  5:20                 ` Alexey Dobriyan
  2007-07-24  8:17                 ` Jens Axboe
  0 siblings, 2 replies; 31+ messages in thread
From: Andrew Morton @ 2007-07-23 22:27 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: Linus Torvalds, linux-kernel, netdev

On Tue, 24 Jul 2007 02:04:46 +0400
Alexey Dobriyan <adobriyan@gmail.com> wrote:

> On Mon, Jul 23, 2007 at 02:11:37PM -0700, Andrew Morton wrote:
> > On Tue, 24 Jul 2007 01:01:53 +0400
> > Alexey Dobriyan <adobriyan@gmail.com> wrote:
> > 
> > > On Tue, Jul 24, 2007 at 12:40:45AM +0400, Alexey Dobriyan wrote:
> > > > > I had more complete info: http://article.gmane.org/gmane.linux.network/66966
> > > > > 
> > > > > You're using DEBUG_PAGEALLOC, but I was not, so I think we can rule that out.
> > > > > 
> > > > > I haven't worked out where that kmap_atomic() call is coming from yet. 
> > > > > Both traces point up into the page allocator, but I _think_ that's stack
> > > > > gunk.
> > > > 
> > > > Ahh, you suspect networking.
> > > > 
> > > > Here, setup is 2 cheap-ass 100Mb realtek 8139 NICs, one to campus network
> > > > receiving ~20 junk packets per second, one gathering netconsole output
> > > > and ssh to it, no conntracks and fancy stuff.
> > > > 
> > > > [reboots with cables physically unplugged]
> > > 
> > > OK, I run gdb recompile, cat(1) every file in /usr/portage (shitload of
> > > small files) with both cables unplugged. It all went fine for ~5 minutes
> > > after that it crashed exactly same way after 10 secs after plugging one
> > > of them.
> > 
> > It'd be nice to get a clean trace.  Are you able to obtain the full
> > trace with CONFIG_FRAME_POINTER=y?
> 
> Sorry, no camera shot, finding camera requires wakening up M. :)
> 
> It took longer that usual, but here it is
> 
> 	kmap_atomic
> 	get_page_from_freelist
> 	__alloc_pages
> 	cache_alloc_refill
> 	__alloc_pages
> 	cache_alloc_refill
> 	kmem_cache_alloc
> 	dst_alloc
> 	ip_route_input
> 	ip_rcv
> 	netif_receive_skb
> 	rtl8139_poll
> 	net_rx_action
> 	__do_softirq
> 	do_softirq
> 	irq_exit
> 	do_IRQ
> 	common_interrupt
> 	handle_mm_fault
> 	do_page_fault
> 	error_core
> 
> much more loaded x86_64 box near also running 2.6.23-rc1 with debugging
> turned on, using atl1 driver doesn't experience any crashes.
> 
> And I found 2.6.22-b91cba52e9b7b3f1c0037908a192d93a869ca9e5-x entry on
> top of grub config which means b91cba52e9b7b3f1c0037908a192d93a869ca9e5
> _without_ any debugging was OK.

I worked out that the crash I saw was in

        BUG_ON(!pte_none(*(kmap_pte-idx)));

in the read of kmap_pte[idx].  Which would be weird as the caller is using 
a literal KM_USER0.

So maybe I goofed, and that BUG_ON is triggering (it scrolled off, and I am
unable to reproduce it now).

If that BUG_ON _is_ triggering then it might indicate that someone is doing
a __GFP_HIGHMEM|__GFP_ZERO allocation while holding KM_USER0.

If they're holding an atomic kmap then they'll be running in_atomic so it
is unlikely that they accidentally added __GFP_WAIT because lots of people
would be getting lots of might_sleep() warnings.

Hence that first VM_BUG_ON in prep_zero_page() _should_ be triggering.

Do you have CONFIG_DEBUG_VM enabled?



Also, it might be useful to apply -mm's kmap_atomic-debugging.patch.  it
will detect lots of abuse.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-23 22:27               ` Andrew Morton
@ 2007-07-24  5:20                 ` Alexey Dobriyan
  2007-07-24  8:17                 ` Jens Axboe
  1 sibling, 0 replies; 31+ messages in thread
From: Alexey Dobriyan @ 2007-07-24  5:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linus Torvalds, linux-kernel, netdev

On Mon, Jul 23, 2007 at 03:27:12PM -0700, Andrew Morton wrote:
> On Tue, 24 Jul 2007 02:04:46 +0400
> Alexey Dobriyan <adobriyan@gmail.com> wrote:
> 
> > On Mon, Jul 23, 2007 at 02:11:37PM -0700, Andrew Morton wrote:
> > > On Tue, 24 Jul 2007 01:01:53 +0400
> > > Alexey Dobriyan <adobriyan@gmail.com> wrote:
> > > 
> > > > On Tue, Jul 24, 2007 at 12:40:45AM +0400, Alexey Dobriyan wrote:
> > > > > > I had more complete info: http://article.gmane.org/gmane.linux.network/66966
> > > > > > 
> > > > > > You're using DEBUG_PAGEALLOC, but I was not, so I think we can rule that out.
> > > > > > 
> > > > > > I haven't worked out where that kmap_atomic() call is coming from yet. 
> > > > > > Both traces point up into the page allocator, but I _think_ that's stack
> > > > > > gunk.
> > > > > 
> > > > > Ahh, you suspect networking.
> > > > > 
> > > > > Here, setup is 2 cheap-ass 100Mb realtek 8139 NICs, one to campus network
> > > > > receiving ~20 junk packets per second, one gathering netconsole output
> > > > > and ssh to it, no conntracks and fancy stuff.
> > > > > 
> > > > > [reboots with cables physically unplugged]
> > > > 
> > > > OK, I run gdb recompile, cat(1) every file in /usr/portage (shitload of
> > > > small files) with both cables unplugged. It all went fine for ~5 minutes
> > > > after that it crashed exactly same way after 10 secs after plugging one
> > > > of them.
> > > 
> > > It'd be nice to get a clean trace.  Are you able to obtain the full
> > > trace with CONFIG_FRAME_POINTER=y?
> > 
> > Sorry, no camera shot, finding camera requires wakening up M. :)
> > 
> > It took longer that usual, but here it is
> > 
> > 	kmap_atomic
> > 	get_page_from_freelist
> > 	__alloc_pages
> > 	cache_alloc_refill
> > 	__alloc_pages
> > 	cache_alloc_refill
> > 	kmem_cache_alloc
> > 	dst_alloc
> > 	ip_route_input
> > 	ip_rcv
> > 	netif_receive_skb
> > 	rtl8139_poll
> > 	net_rx_action
> > 	__do_softirq
> > 	do_softirq
> > 	irq_exit
> > 	do_IRQ
> > 	common_interrupt
> > 	handle_mm_fault
> > 	do_page_fault
> > 	error_core
> > 
> > much more loaded x86_64 box near also running 2.6.23-rc1 with debugging
> > turned on, using atl1 driver doesn't experience any crashes.
> > 
> > And I found 2.6.22-b91cba52e9b7b3f1c0037908a192d93a869ca9e5-x entry on
> > top of grub config which means b91cba52e9b7b3f1c0037908a192d93a869ca9e5
> > _without_ any debugging was OK.
> 
> I worked out that the crash I saw was in
> 
>         BUG_ON(!pte_none(*(kmap_pte-idx)));
> 
> in the read of kmap_pte[idx].  Which would be weird as the caller is using 
> a literal KM_USER0.
> 
> So maybe I goofed, and that BUG_ON is triggering (it scrolled off, and I am
> unable to reproduce it now).
> 
> If that BUG_ON _is_ triggering then it might indicate that someone is doing
> a __GFP_HIGHMEM|__GFP_ZERO allocation while holding KM_USER0.
> 
> If they're holding an atomic kmap then they'll be running in_atomic so it
> is unlikely that they accidentally added __GFP_WAIT because lots of people
> would be getting lots of might_sleep() warnings.
> 
> Hence that first VM_BUG_ON in prep_zero_page() _should_ be triggering.
> 
> Do you have CONFIG_DEBUG_VM enabled?

Yes.

> Also, it might be useful to apply -mm's kmap_atomic-debugging.patch.  it
> will detect lots of abuse.

I hit it only once with this patch applied, but there were no additional
warnings.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-23 22:27               ` Andrew Morton
  2007-07-24  5:20                 ` Alexey Dobriyan
@ 2007-07-24  8:17                 ` Jens Axboe
  2007-07-24  8:22                   ` Jens Axboe
  1 sibling, 1 reply; 31+ messages in thread
From: Jens Axboe @ 2007-07-24  8:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexey Dobriyan, Linus Torvalds, linux-kernel, netdev,
	mark.fasheh

On Mon, Jul 23 2007, Andrew Morton wrote:
> I worked out that the crash I saw was in
> 
>         BUG_ON(!pte_none(*(kmap_pte-idx)));
> 
> in the read of kmap_pte[idx].  Which would be weird as the caller is using 
> a literal KM_USER0.
> 
> So maybe I goofed, and that BUG_ON is triggering (it scrolled off, and I am
> unable to reproduce it now).
> 
> If that BUG_ON _is_ triggering then it might indicate that someone is doing
> a __GFP_HIGHMEM|__GFP_ZERO allocation while holding KM_USER0.

Or doing double kunmaps, or doing a kunmap_atomic() on the page, not the
address. I've seen both of those end up triggering that BUG_ON() in a
later kmap.

Looking over the 2.6.22..2.6.23-rc1 diff, I found one such error in
ocfs2 at least. But you are probably not using that, so I'll keep
looking...

---

[PATCH] ocfs2: bad kunmap_atomic()

kunmap_atomic() takes the virtual address, not the mapped page as
argument.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 5727cd1..c4034f6 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -2153,7 +2153,7 @@ static int ocfs2_splice_write_actor(struct pipe_inode_info *pipe,
 	src = buf->ops->map(pipe, buf, 1);
 	dst = kmap_atomic(page, KM_USER1);
 	memcpy(dst + offset, src + buf->offset, count);
-	kunmap_atomic(page, KM_USER1);
+	kunmap_atomic(dst, KM_USER1);
 	buf->ops->unmap(pipe, buf, src);
 
 	copied = ocfs2_write_end(file, file->f_mapping, sd->pos, count, count,

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-24  8:17                 ` Jens Axboe
@ 2007-07-24  8:22                   ` Jens Axboe
  2007-07-24  8:34                     ` Andrew Morton
  2007-07-24 13:55                     ` Dan Williams
  0 siblings, 2 replies; 31+ messages in thread
From: Jens Axboe @ 2007-07-24  8:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexey Dobriyan, Linus Torvalds, linux-kernel, netdev,
	mark.fasheh, dan.j.williams

On Tue, Jul 24 2007, Jens Axboe wrote:
> On Mon, Jul 23 2007, Andrew Morton wrote:
> > I worked out that the crash I saw was in
> > 
> >         BUG_ON(!pte_none(*(kmap_pte-idx)));
> > 
> > in the read of kmap_pte[idx].  Which would be weird as the caller is using 
> > a literal KM_USER0.
> > 
> > So maybe I goofed, and that BUG_ON is triggering (it scrolled off, and I am
> > unable to reproduce it now).
> > 
> > If that BUG_ON _is_ triggering then it might indicate that someone is doing
> > a __GFP_HIGHMEM|__GFP_ZERO allocation while holding KM_USER0.
> 
> Or doing double kunmaps, or doing a kunmap_atomic() on the page, not the
> address. I've seen both of those end up triggering that BUG_ON() in a
> later kmap.
> 
> Looking over the 2.6.22..2.6.23-rc1 diff, I found one such error in
> ocfs2 at least. But you are probably not using that, so I'll keep
> looking...

What about the new async crypto stuff? I've been looking, but is it
guarenteed that async_memcpy() runs in process context with interrupts
enabled always? If not, there's a km type bug there.

In general, I think the highmem stuff could do with more safety checks:

- People ALWAYS get the atomic unmaps wrong, passing in the page instead
  of the address. I've seen tons of these. And since kunmap_atomic()
  takes a void pointer, nobody notices until it goes boom.
- People easily get the km type wrong - they use KM_USERx in interrupt
  context, or one of the irq variants without disabling interrupts.

If we could just catch these two types of bugs, we've got a lot of these
problems covered.


-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-24  8:22                   ` Jens Axboe
@ 2007-07-24  8:34                     ` Andrew Morton
  2007-07-24 14:00                       ` Dan Williams
  2007-07-24 13:55                     ` Dan Williams
  1 sibling, 1 reply; 31+ messages in thread
From: Andrew Morton @ 2007-07-24  8:34 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Alexey Dobriyan, Linus Torvalds, linux-kernel, netdev,
	mark.fasheh, dan.j.williams, Nelson, Shannon

On Tue, 24 Jul 2007 10:22:07 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:

> On Tue, Jul 24 2007, Jens Axboe wrote:
> > On Mon, Jul 23 2007, Andrew Morton wrote:
> > > I worked out that the crash I saw was in
> > > 
> > >         BUG_ON(!pte_none(*(kmap_pte-idx)));
> > > 
> > > in the read of kmap_pte[idx].  Which would be weird as the caller is using 
> > > a literal KM_USER0.
> > > 
> > > So maybe I goofed, and that BUG_ON is triggering (it scrolled off, and I am
> > > unable to reproduce it now).
> > > 
> > > If that BUG_ON _is_ triggering then it might indicate that someone is doing
> > > a __GFP_HIGHMEM|__GFP_ZERO allocation while holding KM_USER0.
> > 
> > Or doing double kunmaps, or doing a kunmap_atomic() on the page, not the
> > address. I've seen both of those end up triggering that BUG_ON() in a
> > later kmap.
> > 
> > Looking over the 2.6.22..2.6.23-rc1 diff, I found one such error in
> > ocfs2 at least. But you are probably not using that, so I'll keep
> > looking...
> 
> What about the new async crypto stuff? I've been looking, but is it
> guarenteed that async_memcpy() runs in process context with interrupts
> enabled always? If not, there's a km type bug there.

I think Shannon maintains that now.

> In general, I think the highmem stuff could do with more safety checks:
> 
> - People ALWAYS get the atomic unmaps wrong, passing in the page instead
>   of the address. I've seen tons of these. And since kunmap_atomic()
>   takes a void pointer, nobody notices until it goes boom.

yeah, it's a real trap.  For a while I had a patch which converted
kmap_atomic() to return a char*, and kunmap_atomic() to take a char*, so
misuse got compile warnings.  But it was a pig to maintain so I tossed it. 
It'd be somewhat easier to do now we've converted a lot of callers to
clear_user_highpage() and similar.

> - People easily get the km type wrong - they use KM_USERx in interrupt
>   context, or one of the irq variants without disabling interrupts.
> 
> If we could just catch these two types of bugs, we've got a lot of these
> problems covered.

Here's the -mm debug patch:


diff -puN arch/i386/mm/highmem.c~kmap_atomic-debugging arch/i386/mm/highmem.c
--- a/arch/i386/mm/highmem.c~kmap_atomic-debugging
+++ a/arch/i386/mm/highmem.c
@@ -30,7 +30,44 @@ void *kmap_atomic(struct page *page, enu
 {
 	enum fixed_addresses idx;
 	unsigned long vaddr;
+	static unsigned warn_count = 10;
 
+	if (unlikely(warn_count == 0))
+		goto skip;
+
+	if (unlikely(in_interrupt())) {
+		if (in_irq()) {
+			if (type != KM_IRQ0 && type != KM_IRQ1 &&
+			    type != KM_BIO_SRC_IRQ && type != KM_BIO_DST_IRQ &&
+			    type != KM_BOUNCE_READ) {
+				WARN_ON(1);
+				warn_count--;
+			}
+		} else if (!irqs_disabled()) {	/* softirq */
+			if (type != KM_IRQ0 && type != KM_IRQ1 &&
+			    type != KM_SOFTIRQ0 && type != KM_SOFTIRQ1 &&
+			    type != KM_SKB_SUNRPC_DATA &&
+			    type != KM_SKB_DATA_SOFTIRQ &&
+			    type != KM_BOUNCE_READ) {
+				WARN_ON(1);
+				warn_count--;
+			}
+		}
+	}
+
+	if (type == KM_IRQ0 || type == KM_IRQ1 || type == KM_BOUNCE_READ ||
+			type == KM_BIO_SRC_IRQ || type == KM_BIO_DST_IRQ) {
+		if (!irqs_disabled()) {
+			WARN_ON(1);
+			warn_count--;
+		}
+	} else if (type == KM_SOFTIRQ0 || type == KM_SOFTIRQ1) {
+		if (irq_count() == 0 && !irqs_disabled()) {
+			WARN_ON(1);
+			warn_count--;
+		}
+	}
+skip:
 	/* even !CONFIG_PREEMPT needs this, for in_atomic in do_page_fault */
 	pagefault_disable();
 
_


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-24  8:34                     ` Andrew Morton
@ 2007-07-24 14:00                       ` Dan Williams
  0 siblings, 0 replies; 31+ messages in thread
From: Dan Williams @ 2007-07-24 14:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jens Axboe, Alexey Dobriyan, Linus Torvalds, linux-kernel, netdev,
	mark.fasheh, Nelson, Shannon

On 7/24/07, Andrew Morton <akpm@linux-foundation.org> wrote:
[...]
> > What about the new async crypto stuff? I've been looking, but is it
> > guarenteed that async_memcpy() runs in process context with interrupts
> > enabled always? If not, there's a km type bug there.
>
> I think Shannon maintains that now.
>

I am looking after the async_tx API, I will send a patch to update
MAINTAINERS shortly.

--
Dan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-24  8:22                   ` Jens Axboe
  2007-07-24  8:34                     ` Andrew Morton
@ 2007-07-24 13:55                     ` Dan Williams
  1 sibling, 0 replies; 31+ messages in thread
From: Dan Williams @ 2007-07-24 13:55 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Andrew Morton, Alexey Dobriyan, Linus Torvalds, linux-kernel,
	netdev, mark.fasheh

On 7/24/07, Jens Axboe <jens.axboe@oracle.com> wrote:
> What about the new async crypto stuff? I've been looking, but is it
> guarenteed that async_memcpy() runs in process context with interrupts
> enabled always? If not, there's a km type bug there.
>

Currently the only user is the MD raid456 driver, and yes, it only
performs copies from the handle_stripe routine which is always run in
process context with interrupts enabled.  However this is not
documented.  Would it be advisable to add a WARN_ON for this
condition?

> In general, I think the highmem stuff could do with more safety checks:
>
> - People ALWAYS get the atomic unmaps wrong, passing in the page instead
>   of the address. I've seen tons of these. And since kunmap_atomic()
>   takes a void pointer, nobody notices until it goes boom.
> - People easily get the km type wrong - they use KM_USERx in interrupt
>   context, or one of the irq variants without disabling interrupts.
>
> If we could just catch these two types of bugs, we've got a lot of these
> problems covered.
>
>
> --
> Jens Axboe
>
--
Dan

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-23 20:24     ` 2.6.23-rc1: BUG_ON in kmap_atomic_prot() Andrew Morton
  2007-07-23 20:40       ` Alexey Dobriyan
@ 2007-07-24 10:01       ` Mike Galbraith
  2007-07-24 10:37         ` Mike Galbraith
  2007-07-24 16:28         ` Andrew Morton
  1 sibling, 2 replies; 31+ messages in thread
From: Mike Galbraith @ 2007-07-24 10:01 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Alexey Dobriyan, Linus Torvalds, linux-kernel, netdev

On Mon, 2007-07-23 at 13:24 -0700, Andrew Morton wrote:

> You're using DEBUG_PAGEALLOC, but I was not, so I think we can rule that out.

My box bugged during boot the first time I booted 23-rc1, but nothing
made it to the console, and I didn't have a serial console running.  I
didn't have DEBUG_PAGEALLOC or friends set.

> I haven't worked out where that kmap_atomic() call is coming from yet. 
> Both traces point up into the page allocator, but I _think_ that's stack
> gunk.

I just enabled all debug options, and was just rewarded with the below.

[  119.079531] eth1: link up, 100Mbps, full-duplex, lpa 0x45E1
[  119.558867] ------------[ cut here ]------------
[  119.572197] kernel BUG at arch/i386/mm/highmem.c:38!
[  119.585804] invalid opcode: 0000 [#1]
[  119.598013] PREEMPT SMP DEBUG_PAGEALLOC
[  119.610103] Modules linked in: edd button battery ac ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter ip6table_mangle nf_conntrack_ipv4 nf_conntrack nfnetlink ip_tables ip6table_filter ip6_tables x_tables nls_iso8859_1 nls_cp437 nls_utf8 snd_intel8x0 snd_ac97_codec ac97_bus snd_mpu401 snd_pcm prism54 snd_timer snd_mpu401_uart snd_rawmidi snd_seq_device snd intel_agp agpgart soundcore snd_page_alloc i2c_i801 fan thermal processor
[  119.698063] CPU:    1
[  119.698065] EIP:    0060:[<c011cd2d>]    Not tainted VLI
[  119.698067] EFLAGS: 00010006   (2.6.23-rc1-smp #75)
[  119.736358] EIP is at kmap_atomic_prot+0xa7/0xab
[  119.749647] eax: 3d07f163   ebx: c166db80   ecx: c0750e60   edx: 00000007
[  119.765417] esi: 00000022   edi: 00000163   ebp: c069dcd4   esp: c069dcc8
[  119.781273] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
[  119.796378] Process udevd (pid: 4775, ti=c069d000 task=f31aea60 task.ti=f477d000)
[  119.804068] Stack: c166db80 00000000 c166db80 c069dcdc c011cd3f c069dd40 c015b6e0 00000001 
[  119.822272]        00000044 00000163 00000000 00000001 c165f4e0 00000001 c165f4e0 00000001 
[  119.840762]        00000000 00028020 c061e71c c166db80 00000046 00000080 00000001 c011e4de 
[  119.859389] Call Trace:
[  119.881302]  [<c0105144>] show_trace_log_lvl+0x1a/0x30
[  119.896319]  [<c01051ff>] show_stack_log_lvl+0xa5/0xca
[  119.911171]  [<c0105420>] show_registers+0x1fc/0x343
[  119.925756]  [<c0105689>] die+0x122/0x249
[  119.939241]  [<c0105834>] do_trap+0x84/0xad
[  119.952897]  [<c0105b1c>] do_invalid_op+0x88/0x92
[  119.967118]  [<c04cf3c2>] error_code+0x72/0x78
[  119.980948]  [<c011cd3f>] kmap_atomic+0xe/0x10
[  119.994642]  [<c015b6e0>] get_page_from_freelist+0x39e/0x45e
[  120.009485]  [<c015b7fb>] __alloc_pages+0x5b/0x2db
[  120.023342]  [<c0172872>] cache_alloc_refill+0x380/0x6f2
[  120.037623]  [<c0172e7a>] kmem_cache_alloc+0xa1/0xa5
[  120.051426]  [<c03fb397>] neigh_create+0x5f/0x506
[  120.064894]  [<c046e25d>] ndisc_dst_alloc+0x122/0x151
[  120.078769]  [<c0471b0b>] __ndisc_send+0x8d/0x4fa
[  120.092340]  [<c0472915>] ndisc_send_ns+0x5f/0x7d
[  120.105848]  [<c0469ff5>] addrconf_dad_timer+0xdb/0xe0
[  120.119758]  [<c012f8a0>] run_timer_softirq+0x130/0x191
[  120.133717]  [<c012c06d>] __do_softirq+0x76/0xe4
[  120.147475]  [<c0106b48>] do_softirq+0x63/0xac
[  120.147488]  [<c012bff5>] 

(gdb) list *neigh_create+0x5f
0xc03fb397 is in neigh_create (include/linux/slab.h:259).
254     /*
255      * Shortcuts
256      */
257     static inline void *kmem_cache_zalloc(struct kmem_cache *k, gfp_t flags)
258     {
259             return kmem_cache_alloc(k, flags | __GFP_ZERO);
260     }
261
262     /**
263      * kzalloc - allocate memory. The memory is set to zero.
(gdb) list *kmem_cache_alloc+0xa1
0xc0172e7a is in kmem_cache_alloc (mm/slab.c:3176).
3171                    STATS_INC_ALLOCHIT(cachep);
3172                    ac->touched = 1;
3173                    objp = ac->entry[--ac->avail];
3174            } else {
3175                    STATS_INC_ALLOCMISS(cachep);
3176                    objp = cache_alloc_refill(cachep, flags);
3177            }
3178            return objp;
3179    }
3180
(gdb) list *cache_alloc_refill+0x380
0xc0172872 is in cache_alloc_refill (include/linux/gfp.h:154).
149
150             /* Unknown node is current node */
151             if (nid < 0)
152                     nid = numa_node_id();
153
154             return __alloc_pages(gfp_mask, order,
155                     NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask));
156     }
157
158     #ifdef CONFIG_NUMA
(gdb) list *__alloc_pages+0x5b
0xc015b7fb is in __alloc_pages (mm/page_alloc.c:1248).
1243            if (unlikely(*z == NULL)) {
1244                    /* Should this ever happen?? */
1245                    return NULL;
1246            }
1247
1248            page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, order,
1249                                    zonelist, ALLOC_WMARK_LOW|ALLOC_CPUSET);
1250            if (page)
1251                    goto got_pg;
1252
(gdb) list *get_page_from_freelist+0x39e
0xc015b6e0 is in get_page_from_freelist (include/linux/highmem.h:122).
117             return __alloc_zeroed_user_highpage(__GFP_MOVABLE, vma, vaddr);
118     }
119
120     static inline void clear_highpage(struct page *page)
121     {
122             void *kaddr = kmap_atomic(page, KM_USER0);
123             clear_page(kaddr);
124             kunmap_atomic(kaddr, KM_USER0);
125     }
126




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-24 10:01       ` Mike Galbraith
@ 2007-07-24 10:37         ` Mike Galbraith
  2007-07-24 16:28         ` Andrew Morton
  1 sibling, 0 replies; 31+ messages in thread
From: Mike Galbraith @ 2007-07-24 10:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Alexey Dobriyan, Linus Torvalds, linux-kernel, netdev

On Tue, 2007-07-24 at 12:01 +0200, Mike Galbraith wrote:
> On Mon, 2007-07-23 at 13:24 -0700, Andrew Morton wrote:
> 
> > You're using DEBUG_PAGEALLOC, but I was not, so I think we can rule that out.
> 
> My box bugged during boot the first time I booted 23-rc1, but nothing
> made it to the console, and I didn't have a serial console running.  I
> didn't have DEBUG_PAGEALLOC or friends set.
> 
> > I haven't worked out where that kmap_atomic() call is coming from yet. 
> > Both traces point up into the page allocator, but I _think_ that's stack
> > gunk.
> 
> I just enabled all debug options, and was just rewarded with the below.

Hm.  I just also experienced filesystem corruption when I tried to send
from that kernel, and it bugged in the process.  My mount table ended up
in /etc/resolv.conf along with some binary goop, making nscd rather
unhappy after reboot.  fsck time.

	.Mike


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-24 10:01       ` Mike Galbraith
  2007-07-24 10:37         ` Mike Galbraith
@ 2007-07-24 16:28         ` Andrew Morton
  2007-07-24 18:25           ` Linus Torvalds
  1 sibling, 1 reply; 31+ messages in thread
From: Andrew Morton @ 2007-07-24 16:28 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Alexey Dobriyan, Linus Torvalds, linux-kernel, netdev,
	Christoph Lameter

On Tue, 24 Jul 2007 12:01:09 +0200 Mike Galbraith <efault@gmx.de> wrote:

> On Mon, 2007-07-23 at 13:24 -0700, Andrew Morton wrote:
> 
> > You're using DEBUG_PAGEALLOC, but I was not, so I think we can rule that out.
> 
> My box bugged during boot the first time I booted 23-rc1, but nothing
> made it to the console, and I didn't have a serial console running.  I
> didn't have DEBUG_PAGEALLOC or friends set.
> 
> > I haven't worked out where that kmap_atomic() call is coming from yet. 
> > Both traces point up into the page allocator, but I _think_ that's stack
> > gunk.
> 
> I just enabled all debug options, and was just rewarded with the below.

doh.  It's a slab bug.

> [  119.079531] eth1: link up, 100Mbps, full-duplex, lpa 0x45E1
> [  119.558867] ------------[ cut here ]------------
> [  119.572197] kernel BUG at arch/i386/mm/highmem.c:38!
> [  119.585804] invalid opcode: 0000 [#1]
> [  119.598013] PREEMPT SMP DEBUG_PAGEALLOC
> [  119.610103] Modules linked in: edd button battery ac ip6t_REJECT xt_tcpudp ipt_REJECT xt_state iptable_mangle iptable_nat nf_nat iptable_filter ip6table_mangle nf_conntrack_ipv4 nf_conntrack nfnetlink ip_tables ip6table_filter ip6_tables x_tables nls_iso8859_1 nls_cp437 nls_utf8 snd_intel8x0 snd_ac97_codec ac97_bus snd_mpu401 snd_pcm prism54 snd_timer snd_mpu401_uart snd_rawmidi snd_seq_device snd intel_agp agpgart soundcore snd_page_alloc i2c_i801 fan thermal processor
> [  119.698063] CPU:    1
> [  119.698065] EIP:    0060:[<c011cd2d>]    Not tainted VLI
> [  119.698067] EFLAGS: 00010006   (2.6.23-rc1-smp #75)
> [  119.736358] EIP is at kmap_atomic_prot+0xa7/0xab
> [  119.749647] eax: 3d07f163   ebx: c166db80   ecx: c0750e60   edx: 00000007
> [  119.765417] esi: 00000022   edi: 00000163   ebp: c069dcd4   esp: c069dcc8
> [  119.781273] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
> [  119.796378] Process udevd (pid: 4775, ti=c069d000 task=f31aea60 task.ti=f477d000)
> [  119.804068] Stack: c166db80 00000000 c166db80 c069dcdc c011cd3f c069dd40 c015b6e0 00000001 
> [  119.822272]        00000044 00000163 00000000 00000001 c165f4e0 00000001 c165f4e0 00000001 
> [  119.840762]        00000000 00028020 c061e71c c166db80 00000046 00000080 00000001 c011e4de 
> [  119.859389] Call Trace:
> [  119.881302]  [<c0105144>] show_trace_log_lvl+0x1a/0x30
> [  119.896319]  [<c01051ff>] show_stack_log_lvl+0xa5/0xca
> [  119.911171]  [<c0105420>] show_registers+0x1fc/0x343
> [  119.925756]  [<c0105689>] die+0x122/0x249
> [  119.939241]  [<c0105834>] do_trap+0x84/0xad
> [  119.952897]  [<c0105b1c>] do_invalid_op+0x88/0x92
> [  119.967118]  [<c04cf3c2>] error_code+0x72/0x78
> [  119.980948]  [<c011cd3f>] kmap_atomic+0xe/0x10
> [  119.994642]  [<c015b6e0>] get_page_from_freelist+0x39e/0x45e
> [  120.009485]  [<c015b7fb>] __alloc_pages+0x5b/0x2db
> [  120.023342]  [<c0172872>] cache_alloc_refill+0x380/0x6f2
> [  120.037623]  [<c0172e7a>] kmem_cache_alloc+0xa1/0xa5
> [  120.051426]  [<c03fb397>] neigh_create+0x5f/0x506
> [  120.064894]  [<c046e25d>] ndisc_dst_alloc+0x122/0x151
> [  120.078769]  [<c0471b0b>] __ndisc_send+0x8d/0x4fa
> [  120.092340]  [<c0472915>] ndisc_send_ns+0x5f/0x7d
> [  120.105848]  [<c0469ff5>] addrconf_dad_timer+0xdb/0xe0
> [  120.119758]  [<c012f8a0>] run_timer_softirq+0x130/0x191
> [  120.133717]  [<c012c06d>] __do_softirq+0x76/0xe4
> [  120.147475]  [<c0106b48>] do_softirq+0x63/0xac
> [  120.147488]  [<c012bff5>] 


> (gdb) list *neigh_create+0x5f
> 0xc03fb397 is in neigh_create (include/linux/slab.h:259).
> 254     /*
> 255      * Shortcuts
> 256      */
> 257     static inline void *kmem_cache_zalloc(struct kmem_cache *k, gfp_t flags)
> 258     {
> 259             return kmem_cache_alloc(k, flags | __GFP_ZERO);
> 260     }

See, networking's kmem_cache_alloc(..., __GFP_ZERO) ended up calling into
the page allocator with __GFP_ZERO.  This is the bug - slab isn't supposed
to do that: the __GFP_ZERO is supposed to be removed.

Now, it's not a highmem page, so prep_zero_page() won't actually establish
a kmap, but it will check that the kmap slot is presently unused on this
CPU.

But networking calls in here from softirq context (illegal for KM_USER0)
and sometimes that KM_USER0 slot *will* be in use, so kmap_atomic_prot()
will go BUG.

I must say it's really really scary that such a low-level function as
prep_zero_page() is using KM_USER0.  I don't think it has enough debugging
checks in there to prevent Bad Stuff from going undetected.

I guess this was the bug:

--- a/mm/slab.c~a
+++ a/mm/slab.c
@@ -2776,7 +2776,7 @@ static int cache_grow(struct kmem_cache 
 	 * 'nodeid'.
 	 */
 	if (!objp)
-		objp = kmem_getpages(cachep, flags, nodeid);
+		objp = kmem_getpages(cachep, local_flags, nodeid);
 	if (!objp)
 		goto failed;
 
_


I don't see why you later got fs corruption - afacit we won't actually
modify the KM_USER0 slot in this scenario.


> 262     /**
> 263      * kzalloc - allocate memory. The memory is set to zero.
> (gdb) list *kmem_cache_alloc+0xa1
> 0xc0172e7a is in kmem_cache_alloc (mm/slab.c:3176).
> 3171                    STATS_INC_ALLOCHIT(cachep);
> 3172                    ac->touched = 1;
> 3173                    objp = ac->entry[--ac->avail];
> 3174            } else {
> 3175                    STATS_INC_ALLOCMISS(cachep);
> 3176                    objp = cache_alloc_refill(cachep, flags);
> 3177            }
> 3178            return objp;
> 3179    }
> 3180
> (gdb) list *cache_alloc_refill+0x380
> 0xc0172872 is in cache_alloc_refill (include/linux/gfp.h:154).
> 149
> 150             /* Unknown node is current node */
> 151             if (nid < 0)
> 152                     nid = numa_node_id();
> 153
> 154             return __alloc_pages(gfp_mask, order,
> 155                     NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask));
> 156     }
> 157
> 158     #ifdef CONFIG_NUMA
> (gdb) list *__alloc_pages+0x5b
> 0xc015b7fb is in __alloc_pages (mm/page_alloc.c:1248).
> 1243            if (unlikely(*z == NULL)) {
> 1244                    /* Should this ever happen?? */
> 1245                    return NULL;
> 1246            }
> 1247
> 1248            page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, order,
> 1249                                    zonelist, ALLOC_WMARK_LOW|ALLOC_CPUSET);
> 1250            if (page)
> 1251                    goto got_pg;
> 1252
> (gdb) list *get_page_from_freelist+0x39e
> 0xc015b6e0 is in get_page_from_freelist (include/linux/highmem.h:122).
> 117             return __alloc_zeroed_user_highpage(__GFP_MOVABLE, vma, vaddr);
> 118     }
> 119
> 120     static inline void clear_highpage(struct page *page)
> 121     {
> 122             void *kaddr = kmap_atomic(page, KM_USER0);
> 123             clear_page(kaddr);
> 124             kunmap_atomic(kaddr, KM_USER0);
> 125     }
> 126
> 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-24 16:28         ` Andrew Morton
@ 2007-07-24 18:25           ` Linus Torvalds
  2007-07-24 20:05             ` Alexey Dobriyan
  2007-07-25  5:09             ` Mike Galbraith
  0 siblings, 2 replies; 31+ messages in thread
From: Linus Torvalds @ 2007-07-24 18:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mike Galbraith, Alexey Dobriyan, linux-kernel, netdev,
	Christoph Lameter



On Tue, 24 Jul 2007, Andrew Morton wrote:
> 
> I guess this was the bug:

Looks very likely to me. Mike, Alexey, does this fix things for you?

		Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-24 18:25           ` Linus Torvalds
@ 2007-07-24 20:05             ` Alexey Dobriyan
  2007-07-25  5:09             ` Mike Galbraith
  1 sibling, 0 replies; 31+ messages in thread
From: Alexey Dobriyan @ 2007-07-24 20:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Mike Galbraith, linux-kernel, netdev,
	Christoph Lameter

On Tue, Jul 24, 2007 at 11:25:22AM -0700, Linus Torvalds wrote:
> On Tue, 24 Jul 2007, Andrew Morton wrote:
> > I guess this was the bug:
>
> Looks very likely to me. Mike, Alexey, does this fix things for you?

Yeah, box is running for more than hour, survived LTP, gdb testsuite,
portage sync and so on.

What new to me is that someone decided TSC is unstable but that's
probably OK on full debug kernel:

	Clocksource tsc unstable (delta = 91724418 ns)
	Time: pit clocksource has been installed.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: 2.6.23-rc1: BUG_ON in kmap_atomic_prot()
  2007-07-24 18:25           ` Linus Torvalds
  2007-07-24 20:05             ` Alexey Dobriyan
@ 2007-07-25  5:09             ` Mike Galbraith
  1 sibling, 0 replies; 31+ messages in thread
From: Mike Galbraith @ 2007-07-25  5:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Alexey Dobriyan, linux-kernel, netdev,
	Christoph Lameter

On Tue, 2007-07-24 at 11:25 -0700, Linus Torvalds wrote:
> 
> On Tue, 24 Jul 2007, Andrew Morton wrote:
> > 
> > I guess this was the bug:
> 
> Looks very likely to me. Mike, Alexey, does this fix things for you?

I don't have very much runtime on it yet, but yes, it seems to have.

	-Mike


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2007-07-26 18:19 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <alpine.LFD.0.999.0707221351030.3607@woody.linux-foundation.org>
     [not found] ` <20070723183839.GA5874@martell.zuzino.mipt.ru>
     [not found]   ` <20070723190152.GA5755@martell.zuzino.mipt.ru>
2007-07-23 20:24     ` 2.6.23-rc1: BUG_ON in kmap_atomic_prot() Andrew Morton
2007-07-23 20:40       ` Alexey Dobriyan
2007-07-23 21:01         ` Alexey Dobriyan
2007-07-23 21:11           ` Andrew Morton
2007-07-23 21:28             ` Linus Torvalds
2007-07-23 21:37               ` Sam Ravnborg
2007-07-24 17:59               ` Adrian Bunk
2007-07-24 18:14                 ` Linus Torvalds
2007-07-24 18:28                   ` Andrew Morton
2007-07-24 19:15                     ` Linus Torvalds
2007-07-24 19:40                       ` Adrian Bunk
2007-07-24 19:48                         ` Linus Torvalds
2007-07-26 18:07                           ` Adrian Bunk
2007-07-26 18:19                             ` Linus Torvalds
2007-07-24 20:27                       ` Andi Kleen
2007-07-24 19:45                         ` Linus Torvalds
2007-07-26  6:09                   ` commit 7e92b4fc34 - x86, serial: convert legacy COM ports to platform devices - broke my serial console H. Peter Anvin
2007-07-23 22:04             ` 2.6.23-rc1: BUG_ON in kmap_atomic_prot() Alexey Dobriyan
2007-07-23 22:27               ` Andrew Morton
2007-07-24  5:20                 ` Alexey Dobriyan
2007-07-24  8:17                 ` Jens Axboe
2007-07-24  8:22                   ` Jens Axboe
2007-07-24  8:34                     ` Andrew Morton
2007-07-24 14:00                       ` Dan Williams
2007-07-24 13:55                     ` Dan Williams
2007-07-24 10:01       ` Mike Galbraith
2007-07-24 10:37         ` Mike Galbraith
2007-07-24 16:28         ` Andrew Morton
2007-07-24 18:25           ` Linus Torvalds
2007-07-24 20:05             ` Alexey Dobriyan
2007-07-25  5:09             ` Mike Galbraith

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).