Badness with the kernel version 2.6.35-rc1-git1 running on P6 box

linuxppc-dev.lists.ozlabs.org archive mirror

* Badness with the kernel version 2.6.35-rc1-git1 running on P6 box
@ 2010-07-16  8:50 divya
  2010-07-16  9:56 ` Eric Dumazet
  2010-07-17  5:52 ` Maciej Rutecki
  0 siblings, 2 replies; 8+ messages in thread
From: divya @ 2010-07-16  8:50 UTC (permalink / raw)
  To: LKML, linuxppc-dev; +Cc: sachinp

Hi ,

With the latest kernel, 2.6.35-rc5-git1 (2f7989efd4398), running on a Power (P6) box, I came across the following
call trace:

Call Trace:
[c000000006a0e800] [c000000000011c30] .show_stack+0x6c/0x16c (unreliable)
[c000000006a0e8b0] [c00000000012129c] .__alloc_pages_nodemask+0x6a0/0x75c
[c000000006a0ea30] [c0000000001527cc] .alloc_pages_current+0xc4/0x104
[c000000006a0ead0] [c00000000015b1a0] .new_slab+0xe0/0x314
[c000000006a0eb70] [c00000000015b6fc] .__slab_alloc+0x328/0x644
[c000000006a0ec50] [c00000000015cc34] .__kmalloc_node_track_caller+0x114/0x194
[c000000006a0ed00] [c000000000599f6c] .__alloc_skb+0x94/0x180
[c000000006a0edb0] [c00000000059af5c] .__netdev_alloc_skb+0x3c/0x74
[c000000006a0ee30] [c0000000004f9480] .ehea_refill_rq_def+0xf8/0x2d0
[c000000006a0ef30] [c0000000004fab8c] .ehea_up+0x5b8/0x69c
[c000000006a0f040] [c0000000004facd4] .ehea_open+0x64/0x118
[c000000006a0f0e0] [c0000000005a6e9c] .__dev_open+0x100/0x168
[c000000006a0f170] [c0000000005a3ac0] .__dev_change_flags+0x10c/0x1ac
[c000000006a0f210] [c0000000005a6d44] .dev_change_flags+0x24/0x7c
[c000000006a0f2a0] [c0000000005b50b4] .do_setlink+0x31c/0x750
[c000000006a0f3b0] [c0000000005b6724] .rtnl_newlink+0x388/0x618
[c000000006a0f5f0] [c0000000005b6350] .rtnetlink_rcv_msg+0x268/0x2b4
[c000000006a0f6a0] [c0000000005cfdc0] .netlink_rcv_skb+0x74/0x108
[c000000006a0f730] [c0000000005b60c4] .rtnetlink_rcv+0x38/0x5c
[c000000006a0f7c0] [c0000000005cf8c8] .netlink_unicast+0x318/0x3f4
[c000000006a0f890] [c0000000005d05b4] .netlink_sendmsg+0x2d0/0x310
[c000000006a0f970] [c00000000058e1e8] .sock_sendmsg+0xd4/0x110
[c000000006a0fb50] [c00000000058e514] .SyS_sendmsg+0x1f4/0x288
[c000000006a0fd70] [c00000000058c2b8] .SyS_socketcall+0x214/0x280
[c000000006a0fe30] [c0000000000085b4] syscall_exit+0x0/0x40
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
CPU    2: hi:    0, btch:   1 usd:   0
CPU    3: hi:    0, btch:   1 usd:   0
active_anon:50 inactive_anon:260 isolated_anon:0
  active_file:159 inactive_file:139 isolated_file:0
  unevictable:0 dirty:2 writeback:1 unstable:0
  free:16 slab_reclaimable:66 slab_unreclaimable:502
  mapped:120 shmem:2 pagetables:37 bounce:0
Node 0 DMA free:1024kB min:1408kB low:1728kB high:2112kB active_anon:3200kB inactive_anon:16640kB active_file:10176kB inactive_file:8896kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:130944kB mlocked:0kB dirty:128kB writeback:64kB mapped:7680kB shmem:128kB slab_reclaimable:4224kB slab_unreclaimable:32128kB kernel_stack:2528kB pagetables:2368kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
Node 0 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
496 total pagecache pages
178 pages in swap cache
Swap cache stats: add 780, delete 602, find 467/551
Free swap  = 1027904kB
Total swap = 1044160kB
2048 pages RAM
683 pages reserved
582 pages shared
1075 pages non-shared
SLUB: Unable to allocate memory on node -1 (gfp=0x20)
   cache: kmalloc-16384, object size: 16384, buffer size: 16384, default order: 2, min order: 0
   node 0: slabs: 28, objs: 292, free: 0
ip: page allocation failure. order:0, mode:0x8020
Call Trace:
[c000000006a0eb40] [c000000000011c30] .show_stack+0x6c/0x16c (unreliable)
[c000000006a0ebf0] [c00000000012129c] .__alloc_pages_nodemask+0x6a0/0x75c
[c000000006a0ed70] [c0000000001527cc] .alloc_pages_current+0xc4/0x104
[c000000006a0ee10] [c00000000011fca4] .__get_free_pages+0x18/0x90
[c000000006a0ee90] [c0000000004f7058] .ehea_get_stats+0x4c/0x1bc
[c000000006a0ef30] [c0000000005a0a04] .dev_get_stats+0x38/0x64
[c000000006a0efc0] [c0000000005b456c] .rtnl_fill_ifinfo+0x35c/0x85c
[c000000006a0f150] [c0000000005b5920] .rtmsg_ifinfo+0x164/0x204
[c000000006a0f210] [c0000000005a6d6c] .dev_change_flags+0x4c/0x7c
[c000000006a0f2a0] [c0000000005b50b4] .do_setlink+0x31c/0x750
[c000000006a0f3b0] [c0000000005b6724] .rtnl_newlink+0x388/0x618
[c000000006a0f5f0] [c0000000005b6350] .rtnetlink_rcv_msg+0x268/0x2b4
[c000000006a0f6a0] [c0000000005cfdc0] .netlink_rcv_skb+0x74/0x108
[c000000006a0f730] [c0000000005b60c4] .rtnetlink_rcv+0x38/0x5c
[c000000006a0f7c0] [c0000000005cf8c8] .netlink_unicast+0x318/0x3f4
[c000000006a0f890] [c0000000005d05b4] .netlink_sendmsg+0x2d0/0x310
[c000000006a0f970] [c00000000058e1e8] .sock_sendmsg+0xd4/0x110
[c000000006a0fb50] [c00000000058e514] .SyS_sendmsg+0x1f4/0x288
[c000000006a0fd70] [c00000000058c2b8] .SyS_socketcall+0x214/0x280
[c000000006a0fe30] [c0000000000085b4] syscall_exit+0x0/0x40
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
CPU    2: hi:    0, btch:   1 usd:   0
CPU    3: hi:    0, btch:   1 usd:   0

The mainline 2.6.35-rc5 worked fine.

Thanks
Divya


* Re: Badness with the kernel version 2.6.35-rc1-git1 running on P6 box
  2010-07-16  8:50 Badness with the kernel version 2.6.35-rc1-git1 running on P6 box divya
@ 2010-07-16  9:56 ` Eric Dumazet
  2010-07-16 12:20   ` Eric Dumazet
                     ` (2 more replies)
  2010-07-17  5:52 ` Maciej Rutecki
  1 sibling, 3 replies; 8+ messages in thread
From: Eric Dumazet @ 2010-07-16  9:56 UTC (permalink / raw)
  To: divya; +Cc: sachinp, netdev, LKML, linuxppc-dev, Jan-Bernd Themann,
	David Miller

On Friday, 16 July 2010 at 14:20 +0530, divya wrote:
> Hi ,
> 
> With the latest kernel version 2.6.35-rc5-git1(2f7989efd4398) running on power(p6) box came across the following
> call trace
> 
> Call Trace:
> [c000000006a0e800] [c000000000011c30] .show_stack+0x6c/0x16c (unreliable)
> [c000000006a0e8b0] [c00000000012129c] .__alloc_pages_nodemask+0x6a0/0x75c
> [c000000006a0ea30] [c0000000001527cc] .alloc_pages_current+0xc4/0x104
> [c000000006a0ead0] [c00000000015b1a0] .new_slab+0xe0/0x314
> [c000000006a0eb70] [c00000000015b6fc] .__slab_alloc+0x328/0x644
> [c000000006a0ec50] [c00000000015cc34] .__kmalloc_node_track_caller+0x114/0x194
> [c000000006a0ed00] [c000000000599f6c] .__alloc_skb+0x94/0x180
> [c000000006a0edb0] [c00000000059af5c] .__netdev_alloc_skb+0x3c/0x74
> [c000000006a0ee30] [c0000000004f9480] .ehea_refill_rq_def+0xf8/0x2d0
> [c000000006a0ef30] [c0000000004fab8c] .ehea_up+0x5b8/0x69c
> [c000000006a0f040] [c0000000004facd4] .ehea_open+0x64/0x118
> [c000000006a0f0e0] [c0000000005a6e9c] .__dev_open+0x100/0x168
> [c000000006a0f170] [c0000000005a3ac0] .__dev_change_flags+0x10c/0x1ac
> [c000000006a0f210] [c0000000005a6d44] .dev_change_flags+0x24/0x7c
> [c000000006a0f2a0] [c0000000005b50b4] .do_setlink+0x31c/0x750
> [c000000006a0f3b0] [c0000000005b6724] .rtnl_newlink+0x388/0x618
> [c000000006a0f5f0] [c0000000005b6350] .rtnetlink_rcv_msg+0x268/0x2b4
> [c000000006a0f6a0] [c0000000005cfdc0] .netlink_rcv_skb+0x74/0x108
> [c000000006a0f730] [c0000000005b60c4] .rtnetlink_rcv+0x38/0x5c
> [c000000006a0f7c0] [c0000000005cf8c8] .netlink_unicast+0x318/0x3f4
> [c000000006a0f890] [c0000000005d05b4] .netlink_sendmsg+0x2d0/0x310
> [c000000006a0f970] [c00000000058e1e8] .sock_sendmsg+0xd4/0x110
> [c000000006a0fb50] [c00000000058e514] .SyS_sendmsg+0x1f4/0x288
> [c000000006a0fd70] [c00000000058c2b8] .SyS_socketcall+0x214/0x280
> [c000000006a0fe30] [c0000000000085b4] syscall_exit+0x0/0x40
> Mem-Info:
> Node 0 DMA per-cpu:
> CPU    0: hi:    0, btch:   1 usd:   0
> CPU    1: hi:    0, btch:   1 usd:   0
> CPU    2: hi:    0, btch:   1 usd:   0
> CPU    3: hi:    0, btch:   1 usd:   0
> active_anon:50 inactive_anon:260 isolated_anon:0
>   active_file:159 inactive_file:139 isolated_file:0
>   unevictable:0 dirty:2 writeback:1 unstable:0
>   free:16 slab_reclaimable:66 slab_unreclaimable:502
>   mapped:120 shmem:2 pagetables:37 bounce:0
> Node 0 DMA free:1024kB min:1408kB low:1728kB high:2112kB active_anon:3200kB inactive_anon:16640kB active_file:10176kB inactive_file:8896kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:130944kB mlocked:0kB dirty:128kB writeback:64kB mapped:7680kB shmem:128kB slab_reclaimable:4224kB slab_unreclaimable:32128kB kernel_stack:2528kB pagetables:2368kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0
> Node 0 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
> 496 total pagecache pages
> 178 pages in swap cache
> Swap cache stats: add 780, delete 602, find 467/551
> Free swap  = 1027904kB
> Total swap = 1044160kB
> 2048 pages RAM
> 683 pages reserved
> 582 pages shared
> 1075 pages non-shared
> SLUB: Unable to allocate memory on node -1 (gfp=0x20)
>    cache: kmalloc-16384, object size: 16384, buffer size: 16384, default order: 2, min order: 0
>    node 0: slabs: 28, objs: 292, free: 0
> ip: page allocation failure. order:0, mode:0x8020
> Call Trace:
> [c000000006a0eb40] [c000000000011c30] .show_stack+0x6c/0x16c (unreliable)
> [c000000006a0ebf0] [c00000000012129c] .__alloc_pages_nodemask+0x6a0/0x75c
> [c000000006a0ed70] [c0000000001527cc] .alloc_pages_current+0xc4/0x104
> [c000000006a0ee10] [c00000000011fca4] .__get_free_pages+0x18/0x90
> [c000000006a0ee90] [c0000000004f7058] .ehea_get_stats+0x4c/0x1bc
> [c000000006a0ef30] [c0000000005a0a04] .dev_get_stats+0x38/0x64
> [c000000006a0efc0] [c0000000005b456c] .rtnl_fill_ifinfo+0x35c/0x85c
> [c000000006a0f150] [c0000000005b5920] .rtmsg_ifinfo+0x164/0x204
> [c000000006a0f210] [c0000000005a6d6c] .dev_change_flags+0x4c/0x7c
> [c000000006a0f2a0] [c0000000005b50b4] .do_setlink+0x31c/0x750
> [c000000006a0f3b0] [c0000000005b6724] .rtnl_newlink+0x388/0x618
> [c000000006a0f5f0] [c0000000005b6350] .rtnetlink_rcv_msg+0x268/0x2b4
> [c000000006a0f6a0] [c0000000005cfdc0] .netlink_rcv_skb+0x74/0x108
> [c000000006a0f730] [c0000000005b60c4] .rtnetlink_rcv+0x38/0x5c
> [c000000006a0f7c0] [c0000000005cf8c8] .netlink_unicast+0x318/0x3f4
> [c000000006a0f890] [c0000000005d05b4] .netlink_sendmsg+0x2d0/0x310
> [c000000006a0f970] [c00000000058e1e8] .sock_sendmsg+0xd4/0x110
> [c000000006a0fb50] [c00000000058e514] .SyS_sendmsg+0x1f4/0x288
> [c000000006a0fd70] [c00000000058c2b8] .SyS_socketcall+0x214/0x280
> [c000000006a0fe30] [c0000000000085b4] syscall_exit+0x0/0x40
> Mem-Info:
> Node 0 DMA per-cpu:
> CPU    0: hi:    0, btch:   1 usd:   0
> CPU    1: hi:    0, btch:   1 usd:   0
> CPU    2: hi:    0, btch:   1 usd:   0
> CPU    3: hi:    0, btch:   1 usd:   0
> 
> The mainline 2.6.35-rc5 worked fine.

Maybe you were lucky with 2.6.35-rc5

Anyway ehea should not use GFP_ATOMIC in its ehea_get_stats() method,
called in process context, but GFP_KERNEL.

Another patch is needed for ehea_refill_rq_def() as well.



[PATCH] ehea: ehea_get_stats() should use GFP_KERNEL

ehea_get_stats() is called in process context and should use GFP_KERNEL
allocation instead of GFP_ATOMIC.

Clearing stats at beginning of ehea_get_stats() is racy in case of
concurrent stat readers.

get_stats() can also use netdev net_device_stats, instead of a private
copy.

Reported-by: divya <dipraksh@linux.vnet.ibm.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 drivers/net/ehea/ehea.h      |    1 -
 drivers/net/ehea/ehea_main.c |    6 ++----
 2 files changed, 2 insertions(+), 5 deletions(-)


* Re: Badness with the kernel version 2.6.35-rc1-git1 running on P6 box
  2010-07-16  9:56 ` Eric Dumazet
@ 2010-07-16 12:20   ` Eric Dumazet
  2010-07-18 21:51     ` David Miller
  2010-07-16 17:35   ` Dave Hansen
  2010-07-20  9:05   ` divya
  2 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2010-07-16 12:20 UTC (permalink / raw)
  To: divya; +Cc: sachinp, netdev, LKML, linuxppc-dev, Jan-Bernd Themann,
	David Miller

On Friday, 16 July 2010 at 11:56 +0200, Eric Dumazet wrote:

> [PATCH] ehea: ehea_get_stats() should use GFP_KERNEL
> 
> ehea_get_stats() is called in process context and should use GFP_KERNEL
> allocation instead of GFP_ATOMIC.
> 
> Clearing stats at beginning of ehea_get_stats() is racy in case of
> concurrent stat readers.
> 
> get_stats() can also use netdev net_device_stats, instead of a private
> copy.
> 
> Reported-by: divya <dipraksh@linux.vnet.ibm.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  drivers/net/ehea/ehea.h      |    1 -
>  drivers/net/ehea/ehea_main.c |    6 ++----
>  2 files changed, 2 insertions(+), 5 deletions(-)
> 
> 

Hmm, net-next-2.6 contains the following patch:

commit 3d8009c780ee90fccb5c171caf30aff839f13547
Author: Brian King <brking@linux.vnet.ibm.com>
Date:   Wed Jun 30 11:59:12 2010 +0000

    ehea: Allocate stats buffer with GFP_KERNEL
    
    Since ehea_get_stats calls ehea_h_query_ehea_port, which
    can sleep, we can also sleep when allocating a page in
    this function. This fixes some memory allocation failure
    warnings seen under low memory conditions.
    
    Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 8b92acb..3beba70 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -335,7 +335,7 @@ static struct net_device_stats *ehea_get_stats(struct net_device *dev)
 
        memset(stats, 0, sizeof(*stats));
 
-       cb2 = (void *)get_zeroed_page(GFP_ATOMIC);
+       cb2 = (void *)get_zeroed_page(GFP_KERNEL);
        if (!cb2) {
                ehea_error("no mem for cb2");
                goto out;


* Re: Badness with the kernel version 2.6.35-rc1-git1 running on P6 box
  2010-07-16  9:56 ` Eric Dumazet
  2010-07-16 12:20   ` Eric Dumazet
@ 2010-07-16 17:35   ` Dave Hansen
  2010-07-16 19:19     ` David Rientjes
  2010-07-20  9:05   ` divya
  2 siblings, 1 reply; 8+ messages in thread
From: Dave Hansen @ 2010-07-16 17:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: sachinp, netdev, LKML, linuxppc-dev, Jan-Bernd Themann,
	David Miller, divya

On Fri, 2010-07-16 at 11:56 +0200, Eric Dumazet wrote:
> 
> > SLUB: Unable to allocate memory on node -1 (gfp=0x20)
> >    cache: kmalloc-16384, object size: 16384, buffer size: 16384, default order: 2, min order: 0
> >    node 0: slabs: 28, objs: 292, free: 0
> > ip: page allocation failure. order:0, mode:0x8020
> > Call Trace:
> > [c000000006a0eb40] [c000000000011c30] .show_stack+0x6c/0x16c (unreliable)
> > [c000000006a0ebf0] [c00000000012129c] .__alloc_pages_nodemask+0x6a0/0x75c
> > [c000000006a0ed70] [c0000000001527cc] .alloc_pages_current+0xc4/0x104
> > [c000000006a0ee10] [c00000000011fca4] .__get_free_pages+0x18/0x90
> > [c000000006a0ee90] [c0000000004f7058] .ehea_get_stats+0x4c/0x1bc
> > [c000000006a0ef30] [c0000000005a0a04] .dev_get_stats+0x38/0x64
> > [c000000006a0efc0] [c0000000005b456c] .rtnl_fill_ifinfo+0x35c/0x85c
> > [c000000006a0f150] [c0000000005b5920] .rtmsg_ifinfo+0x164/0x204
> > [c000000006a0f210] [c0000000005a6d6c] .dev_change_flags+0x4c/0x7c
> > [c000000006a0f2a0] [c0000000005b50b4] .do_setlink+0x31c/0x750
> > [c000000006a0f3b0] [c0000000005b6724] .rtnl_newlink+0x388/0x618
> > [c000000006a0f5f0] [c0000000005b6350] .rtnetlink_rcv_msg+0x268/0x2b4
> > [c000000006a0f6a0] [c0000000005cfdc0] .netlink_rcv_skb+0x74/0x108
> > [c000000006a0f730] [c0000000005b60c4] .rtnetlink_rcv+0x38/0x5c
> > [c000000006a0f7c0] [c0000000005cf8c8] .netlink_unicast+0x318/0x3f4
> > [c000000006a0f890] [c0000000005d05b4] .netlink_sendmsg+0x2d0/0x310
> > [c000000006a0f970] [c00000000058e1e8] .sock_sendmsg+0xd4/0x110
> > [c000000006a0fb50] [c00000000058e514] .SyS_sendmsg+0x1f4/0x288
> > [c000000006a0fd70] [c00000000058c2b8] .SyS_socketcall+0x214/0x280
> > [c000000006a0fe30] [c0000000000085b4] syscall_exit+0x0/0x40
> > Mem-Info:
> > Node 0 DMA per-cpu:
> > CPU    0: hi:    0, btch:   1 usd:   0
> > CPU    1: hi:    0, btch:   1 usd:   0
> > CPU    2: hi:    0, btch:   1 usd:   0
> > CPU    3: hi:    0, btch:   1 usd:   0
> > 
> > The mainline 2.6.35-rc5 worked fine.
> 
> Maybe you were lucky with 2.6.35-rc5
> 
> Anyway ehea should not use GFP_ATOMIC in its ehea_get_stats() method,
> called in process context, but GFP_KERNEL.
> 
> Another patch is needed for ehea_refill_rq_def() as well.

You're right that this is abusing GFP_ATOMIC.

But is this just a normal "GFP_ATOMIC" allocation failure?  "SLUB:
Unable to allocate memory on node -1" seems like a somewhat
inappropriate error message for that.

It isn't immediately obvious where the -1 is coming from.  Does it truly
mean "allocate from any node" here, or is that a buglet in and of
itself?

-- Dave


* Re: Badness with the kernel version 2.6.35-rc1-git1 running on P6 box
  2010-07-16 17:35   ` Dave Hansen
@ 2010-07-16 19:19     ` David Rientjes
  0 siblings, 0 replies; 8+ messages in thread
From: David Rientjes @ 2010-07-16 19:19 UTC (permalink / raw)
  To: Dave Hansen
  Cc: sachinp, Eric Dumazet, LKML, linuxppc-dev, Jan-Bernd Themann,
	netdev, David Miller, divya

On Fri, 16 Jul 2010, Dave Hansen wrote:

> > > SLUB: Unable to allocate memory on node -1 (gfp=0x20)
> > >    cache: kmalloc-16384, object size: 16384, buffer size: 16384, default order: 2, min order: 0
> > >    node 0: slabs: 28, objs: 292, free: 0
> > > ip: page allocation failure. order:0, mode:0x8020
> > > Call Trace:
> > > [c000000006a0eb40] [c000000000011c30] .show_stack+0x6c/0x16c (unreliable)
> > > [c000000006a0ebf0] [c00000000012129c] .__alloc_pages_nodemask+0x6a0/0x75c
> > > [c000000006a0ed70] [c0000000001527cc] .alloc_pages_current+0xc4/0x104
> > > [c000000006a0ee10] [c00000000011fca4] .__get_free_pages+0x18/0x90
> > > [c000000006a0ee90] [c0000000004f7058] .ehea_get_stats+0x4c/0x1bc
> > > [c000000006a0ef30] [c0000000005a0a04] .dev_get_stats+0x38/0x64
> > > [c000000006a0efc0] [c0000000005b456c] .rtnl_fill_ifinfo+0x35c/0x85c
> > > [c000000006a0f150] [c0000000005b5920] .rtmsg_ifinfo+0x164/0x204
> > > [c000000006a0f210] [c0000000005a6d6c] .dev_change_flags+0x4c/0x7c
> > > [c000000006a0f2a0] [c0000000005b50b4] .do_setlink+0x31c/0x750
> > > [c000000006a0f3b0] [c0000000005b6724] .rtnl_newlink+0x388/0x618
> > > [c000000006a0f5f0] [c0000000005b6350] .rtnetlink_rcv_msg+0x268/0x2b4
> > > [c000000006a0f6a0] [c0000000005cfdc0] .netlink_rcv_skb+0x74/0x108
> > > [c000000006a0f730] [c0000000005b60c4] .rtnetlink_rcv+0x38/0x5c
> > > [c000000006a0f7c0] [c0000000005cf8c8] .netlink_unicast+0x318/0x3f4
> > > [c000000006a0f890] [c0000000005d05b4] .netlink_sendmsg+0x2d0/0x310
> > > [c000000006a0f970] [c00000000058e1e8] .sock_sendmsg+0xd4/0x110
> > > [c000000006a0fb50] [c00000000058e514] .SyS_sendmsg+0x1f4/0x288
> > > [c000000006a0fd70] [c00000000058c2b8] .SyS_socketcall+0x214/0x280
> > > [c000000006a0fe30] [c0000000000085b4] syscall_exit+0x0/0x40
> > > Mem-Info:
> > > Node 0 DMA per-cpu:
> > > CPU    0: hi:    0, btch:   1 usd:   0
> > > CPU    1: hi:    0, btch:   1 usd:   0
> > > CPU    2: hi:    0, btch:   1 usd:   0
> > > CPU    3: hi:    0, btch:   1 usd:   0
> > > 
> > > The mainline 2.6.35-rc5 worked fine.
> > 
> > Maybe you were lucky with 2.6.35-rc5
> > 
> > Anyway ehea should not use GFP_ATOMIC in its ehea_get_stats() method,
> > called in process context, but GFP_KERNEL.
> > 
> > Another patch is needed for ehea_refill_rq_def() as well.
> 
> You're right that this is abusing GFP_ATOMIC.
> 
> But is this just a normal "GFP_ATOMIC" allocation failure?  "SLUB:
> Unable to allocate memory on node -1" seems like a somewhat
> inappropriate error message for that.
> 

The slub message is separate and doesn't generate a call trace, even 
though it is a (minimum) order-0 GFP_ATOMIC allocation as well.  The page 
allocation failure is a separate instance that calls the page 
allocator, not the slab allocator.
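
To make the two masks in the report easier to compare, here is a small
userspace sketch that decodes gfp=0x20 (the SLUB message) and mode:0x8020
(the page allocation failure). The bit values are an assumption based on the
2.6.35-era include/linux/gfp.h, and the SK_* and sk_* names are hypothetical
helpers for illustration, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed bit values, as in 2.6.35-era include/linux/gfp.h. */
#define SK_GFP_WAIT 0x10u   /* allocator may sleep and reclaim */
#define SK_GFP_HIGH 0x20u   /* may dip into emergency reserves */
#define SK_GFP_IO   0x40u   /* may start physical I/O */
#define SK_GFP_FS   0x80u   /* may call into the filesystem */
#define SK_GFP_ZERO 0x8000u /* return zeroed memory */

#define SK_GFP_ATOMIC (SK_GFP_HIGH)
#define SK_GFP_KERNEL (SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS)

struct sk_flag {
	unsigned int bit;
	const char *name;
};

static const struct sk_flag sk_flags[] = {
	{ SK_GFP_WAIT, "__GFP_WAIT" },
	{ SK_GFP_HIGH, "__GFP_HIGH" },
	{ SK_GFP_IO,   "__GFP_IO"   },
	{ SK_GFP_FS,   "__GFP_FS"   },
	{ SK_GFP_ZERO, "__GFP_ZERO" },
};

/*
 * Write a '|'-separated list of the known flags set in mask into buf.
 * Returns the bits this sketch does not know about (0 if fully decoded).
 */
unsigned int sk_decode_gfp(unsigned int mask, char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(sk_flags) / sizeof(sk_flags[0]); i++) {
		if (!(mask & sk_flags[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, sk_flags[i].name, len - strlen(buf) - 1);
		mask &= ~sk_flags[i].bit;
	}
	return mask;
}

/* Non-zero if an allocation with this mask is allowed to sleep. */
int sk_may_sleep(unsigned int mask)
{
	return (mask & SK_GFP_WAIT) != 0;
}
```

Under these assumed values, 0x20 is plain GFP_ATOMIC and 0x8020 is
GFP_ATOMIC plus __GFP_ZERO, which is what get_zeroed_page(GFP_ATOMIC) in
ehea_get_stats() would pass down. Neither mask has __GFP_WAIT set, so neither
allocation can sleep to reclaim memory; GFP_KERNEL can.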

> It isn't immediately obvious where the -1 is coming from.  Does it truly
> mean "allocate from any node" here, or is that a buglet in and of
> itself?
> 

Yes, slub uses -1 to indicate that the allocation need not come from a 
specific node.
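
As a rough illustration of that convention, the sketch below treats -1 as
"no node preference" and falls back to any node that has free objects. It is
a deliberately simplified userspace model: sk_pick_node(), SK_NO_NODE and
SK_NO_MEMORY are hypothetical names, and treating an explicit node as a
strict requirement is an assumption made for brevity, not a description of
what slub actually does.

```c
#include <assert.h>
#include <stddef.h>

#define SK_NO_NODE   (-1) /* caller has no node preference */
#define SK_NO_MEMORY (-2) /* hypothetical result: nothing free to hand out */

/*
 * free_objs[n] is the number of free objects on node n.
 * Returns the node to allocate from, or SK_NO_MEMORY.
 */
int sk_pick_node(const unsigned int *free_objs, size_t nr_nodes,
		 int requested, int local_node)
{
	size_t n;

	assert(free_objs != NULL && nr_nodes > 0);
	assert(local_node >= 0 && (size_t)local_node < nr_nodes);

	/* An explicit node is treated as strict in this simplified model. */
	if (requested != SK_NO_NODE) {
		assert(requested >= 0 && (size_t)requested < nr_nodes);
		return free_objs[requested] > 0 ? requested : SK_NO_MEMORY;
	}

	/* No preference: try the local node first, then any other node. */
	if (free_objs[local_node] > 0)
		return local_node;
	for (n = 0; n < nr_nodes; n++)
		if (free_objs[n] > 0)
			return (int)n;
	return SK_NO_MEMORY;
}
```

With the single node from the report ("node 0: slabs: 28, objs: 292,
free: 0"), a request for node -1 has nowhere to fall back to, so the model
returns SK_NO_MEMORY even though no specific node was asked for.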


* Re: Badness with the kernel version 2.6.35-rc1-git1 running on P6 box
  2010-07-16  8:50 Badness with the kernel version 2.6.35-rc1-git1 running on P6 box divya
  2010-07-16  9:56 ` Eric Dumazet
@ 2010-07-17  5:52 ` Maciej Rutecki
  1 sibling, 0 replies; 8+ messages in thread
From: Maciej Rutecki @ 2010-07-17  5:52 UTC (permalink / raw)
  To: divya; +Cc: sachinp, linuxppc-dev, LKML

On Friday, 16 July 2010 at 10:50:30 divya wrote:
> Hi ,
> 
> With the latest kernel version 2.6.35-rc5-git1(2f7989efd4398) running on
> power(p6) box came across the following call trace
> 
I created a Bugzilla entry at
https://bugzilla.kernel.org/show_bug.cgi?id=16406
for your bug report, please add your address to the CC list in there, thanks!

-- 
Maciej Rutecki
http://www.maciek.unixy.pl


* Re: Badness with the kernel version 2.6.35-rc1-git1 running on P6 box
  2010-07-16 12:20   ` Eric Dumazet
@ 2010-07-18 21:51     ` David Miller
  0 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2010-07-18 21:51 UTC (permalink / raw)
  To: eric.dumazet
  Cc: sachinp, netdev, linux-kernel, linuxppc-dev, ossthema, dipraksh

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 16 Jul 2010 14:20:42 +0200

> On Friday, 16 July 2010 at 11:56 +0200, Eric Dumazet wrote:
> 
>> [PATCH] ehea: ehea_get_stats() should use GFP_KERNEL
>> 
>> ehea_get_stats() is called in process context and should use GFP_KERNEL
>> allocation instead of GFP_ATOMIC.
>> 
>> Clearing stats at beginning of ehea_get_stats() is racy in case of
>> concurrent stat readers.
>> 
>> get_stats() can also use netdev net_device_stats, instead of a private
>> copy.
>> 
>> Reported-by: divya <dipraksh@linux.vnet.ibm.com>
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>> ---
>>  drivers/net/ehea/ehea.h      |    1 -
>>  drivers/net/ehea/ehea_main.c |    6 ++----
>>  2 files changed, 2 insertions(+), 5 deletions(-)
>> 
>> 
> 
> Hmm, net-next-2.6 contains the following patch:

If people think ehea usage is ubiquitous enough to deserve a backport
of this to net-2.6, fine.  But personally I don't think it's worth it.

Can someone close the kernel bugzilla 16406 created for this bug?  The
patch we already have would obviously fix this issue.


* Re: Badness with the kernel version 2.6.35-rc1-git1 running on P6 box
  2010-07-16  9:56 ` Eric Dumazet
  2010-07-16 12:20   ` Eric Dumazet
  2010-07-16 17:35   ` Dave Hansen
@ 2010-07-20  9:05   ` divya
  2 siblings, 0 replies; 8+ messages in thread
From: divya @ 2010-07-20  9:05 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: sachinp, netdev, LKML, linuxppc-dev, Jan-Bernd Themann,
	David Miller

On Friday 16 July 2010 03:26 PM, Eric Dumazet wrote:
> On Friday, 16 July 2010 at 14:20 +0530, divya wrote:
>    
>> Hi ,
>>
>> With the latest kernel version 2.6.35-rc5-git1(2f7989efd4398) running on power(p6) box came across the following
>> call trace
>>
>> Call Trace:
>> [c000000006a0e800] [c000000000011c30] .show_stack+0x6c/0x16c (unreliable)
>> [c000000006a0e8b0] [c00000000012129c] .__alloc_pages_nodemask+0x6a0/0x75c
>> [c000000006a0ea30] [c0000000001527cc] .alloc_pages_current+0xc4/0x104
>> [c000000006a0ead0] [c00000000015b1a0] .new_slab+0xe0/0x314
>> [c000000006a0eb70] [c00000000015b6fc] .__slab_alloc+0x328/0x644
>> [c000000006a0ec50] [c00000000015cc34] .__kmalloc_node_track_caller+0x114/0x194
>> [c000000006a0ed00] [c000000000599f6c] .__alloc_skb+0x94/0x180
>> [c000000006a0edb0] [c00000000059af5c] .__netdev_alloc_skb+0x3c/0x74
>> [c000000006a0ee30] [c0000000004f9480] .ehea_refill_rq_def+0xf8/0x2d0
>> [c000000006a0ef30] [c0000000004fab8c] .ehea_up+0x5b8/0x69c
>> [c000000006a0f040] [c0000000004facd4] .ehea_open+0x64/0x118
>> [c000000006a0f0e0] [c0000000005a6e9c] .__dev_open+0x100/0x168
>> [c000000006a0f170] [c0000000005a3ac0] .__dev_change_flags+0x10c/0x1ac
>> [c000000006a0f210] [c0000000005a6d44] .dev_change_flags+0x24/0x7c
>> [c000000006a0f2a0] [c0000000005b50b4] .do_setlink+0x31c/0x750
>> [c000000006a0f3b0] [c0000000005b6724] .rtnl_newlink+0x388/0x618
>> [c000000006a0f5f0] [c0000000005b6350] .rtnetlink_rcv_msg+0x268/0x2b4
>> [c000000006a0f6a0] [c0000000005cfdc0] .netlink_rcv_skb+0x74/0x108
>> [c000000006a0f730] [c0000000005b60c4] .rtnetlink_rcv+0x38/0x5c
>> [c000000006a0f7c0] [c0000000005cf8c8] .netlink_unicast+0x318/0x3f4
>> [c000000006a0f890] [c0000000005d05b4] .netlink_sendmsg+0x2d0/0x310
>> [c000000006a0f970] [c00000000058e1e8] .sock_sendmsg+0xd4/0x110
>> [c000000006a0fb50] [c00000000058e514] .SyS_sendmsg+0x1f4/0x288
>> [c000000006a0fd70] [c00000000058c2b8] .SyS_socketcall+0x214/0x280
>> [c000000006a0fe30] [c0000000000085b4] syscall_exit+0x0/0x40
>> Mem-Info:
>> Node 0 DMA per-cpu:
>> CPU    0: hi:    0, btch:   1 usd:   0
>> CPU    1: hi:    0, btch:   1 usd:   0
>> CPU    2: hi:    0, btch:   1 usd:   0
>> CPU    3: hi:    0, btch:   1 usd:   0
>> active_anon:50 inactive_anon:260 isolated_anon:0
>>    active_file:159 inactive_file:139 isolated_file:0
>>    unevictable:0 dirty:2 writeback:1 unstable:0
>>    free:16 slab_reclaimable:66 slab_unreclaimable:502
>>    mapped:120 shmem:2 pagetables:37 bounce:0
>> Node 0 DMA free:1024kB min:1408kB low:1728kB high:2112kB active_anon:3200kB inactive_anon:16640kB active_file:10176kB inactive_file:8896kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:130944kB mlocked:0kB dirty:128kB writeback:64kB mapped:7680kB shmem:128kB slab_reclaimable:4224kB slab_unreclaimable:32128kB kernel_stack:2528kB pagetables:2368kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>> lowmem_reserve[]: 0 0 0
>> Node 0 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
>> 496 total pagecache pages
>> 178 pages in swap cache
>> Swap cache stats: add 780, delete 602, find 467/551
>> Free swap  = 1027904kB
>> Total swap = 1044160kB
>> 2048 pages RAM
>> 683 pages reserved
>> 582 pages shared
>> 1075 pages non-shared
>> SLUB: Unable to allocate memory on node -1 (gfp=0x20)
>>     cache: kmalloc-16384, object size: 16384, buffer size: 16384, default order: 2, min order: 0
>>     node 0: slabs: 28, objs: 292, free: 0
>> ip: page allocation failure. order:0, mode:0x8020
>> Call Trace:
>> [c000000006a0eb40] [c000000000011c30] .show_stack+0x6c/0x16c (unreliable)
>> [c000000006a0ebf0] [c00000000012129c] .__alloc_pages_nodemask+0x6a0/0x75c
>> [c000000006a0ed70] [c0000000001527cc] .alloc_pages_current+0xc4/0x104
>> [c000000006a0ee10] [c00000000011fca4] .__get_free_pages+0x18/0x90
>> [c000000006a0ee90] [c0000000004f7058] .ehea_get_stats+0x4c/0x1bc
>> [c000000006a0ef30] [c0000000005a0a04] .dev_get_stats+0x38/0x64
>> [c000000006a0efc0] [c0000000005b456c] .rtnl_fill_ifinfo+0x35c/0x85c
>> [c000000006a0f150] [c0000000005b5920] .rtmsg_ifinfo+0x164/0x204
>> [c000000006a0f210] [c0000000005a6d6c] .dev_change_flags+0x4c/0x7c
>> [c000000006a0f2a0] [c0000000005b50b4] .do_setlink+0x31c/0x750
>> [c000000006a0f3b0] [c0000000005b6724] .rtnl_newlink+0x388/0x618
>> [c000000006a0f5f0] [c0000000005b6350] .rtnetlink_rcv_msg+0x268/0x2b4
>> [c000000006a0f6a0] [c0000000005cfdc0] .netlink_rcv_skb+0x74/0x108
>> [c000000006a0f730] [c0000000005b60c4] .rtnetlink_rcv+0x38/0x5c
>> [c000000006a0f7c0] [c0000000005cf8c8] .netlink_unicast+0x318/0x3f4
>> [c000000006a0f890] [c0000000005d05b4] .netlink_sendmsg+0x2d0/0x310
>> [c000000006a0f970] [c00000000058e1e8] .sock_sendmsg+0xd4/0x110
>> [c000000006a0fb50] [c00000000058e514] .SyS_sendmsg+0x1f4/0x288
>> [c000000006a0fd70] [c00000000058c2b8] .SyS_socketcall+0x214/0x280
>> [c000000006a0fe30] [c0000000000085b4] syscall_exit+0x0/0x40
>> Mem-Info:
>> Node 0 DMA per-cpu:
>> CPU    0: hi:    0, btch:   1 usd:   0
>> CPU    1: hi:    0, btch:   1 usd:   0
>> CPU    2: hi:    0, btch:   1 usd:   0
>> CPU    3: hi:    0, btch:   1 usd:   0
>>
>> The mainline 2.6.35-rc5 worked fine.
>>      
> Maybe you were lucky with 2.6.35-rc5
>
> Anyway ehea should not use GFP_ATOMIC in its ehea_get_stats() method,
> called in process context, but GFP_KERNEL.
>
> Another patch is needed for ehea_refill_rq_def() as well.
>
>
>
> [PATCH] ehea: ehea_get_stats() should use GFP_KERNEL
>
> ehea_get_stats() is called in process context and should use GFP_KERNEL
> allocation instead of GFP_ATOMIC.
>
> Clearing stats at beginning of ehea_get_stats() is racy in case of
> concurrent stat readers.
>
> get_stats() can also use netdev net_device_stats, instead of a private
> copy.
>
> Reported-by: divya<dipraksh@linux.vnet.ibm.com>
> Signed-off-by: Eric Dumazet<eric.dumazet@gmail.com>
> ---
>   drivers/net/ehea/ehea.h      |    1 -
>   drivers/net/ehea/ehea_main.c |    6 ++----
>   2 files changed, 2 insertions(+), 5 deletions(-)
>    
Hi,

The call traces mentioned above still appear on the upstream kernel and on the linux-next tree.
The patch mentioned above has not been merged upstream yet, so upstream still produces call traces for both
ehea_get_stats() and ehea_refill_rq_def().
On linux-next, however, only the ehea_refill_rq_def() call trace appears.

Thanks
Divya


end of thread, other threads:[~2010-07-20  9:05 UTC | newest]

Thread overview: 8+ messages
2010-07-16  8:50 Badness with the kernel version 2.6.35-rc1-git1 running on P6 box divya
2010-07-16  9:56 ` Eric Dumazet
2010-07-16 12:20   ` Eric Dumazet
2010-07-18 21:51     ` David Miller
2010-07-16 17:35   ` Dave Hansen
2010-07-16 19:19     ` David Rientjes
2010-07-20  9:05   ` divya
2010-07-17  5:52 ` Maciej Rutecki
