* Re: kernel 2.6.37 : oops in cleanup_once
       [not found] <4D491B8D.1000107@univ-nantes.fr>
From: Eric Dumazet @ 2011-02-02 10:52 UTC
To: Yann Dupont; +Cc: linux-kernel, netdev

On Wednesday, 02 February 2011 at 09:53 +0100, Yann Dupont wrote:
> Hello.
> We recently upgraded one machine with vanilla 2.6.37, and experienced 2
> kernel oops since. Each oops is after ~1 week of uptime.
> The last oops was last night but we didn't have any trace.
>
> Here is the previous oops :
>
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.316042]
> BUG: unable to handle kernel NULL pointer dereference at 000000000000000d
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.316096]
> IP: [<ffffffff8130e6bf>] cleanup_once+0x3f/0xa0
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.316135] PGD 0
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.316157]
> Oops: 0002 [#1] SMP
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.316188]
> last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.316234] CPU 1
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.316240]
> Modules linked in: xt_physdev ip6t_LOG nf_conntrack_ipv6 nf_defrag_ipv6
> ipt_LOG xt_multiport xt_limit nf_conntrack_tftp nf_conntrack_ftp tun
> ip6table_filter ip6_tables ipt_MASQUERADE iptable_nat nf_nat
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT
> xt_tcpudp iptable_filter ip_tables x_tables kvm_intel kvm ipv6 8021q
> bridge stp ext2 mbcache fuse snd_pcm snd_timer snd soundcore
> snd_page_alloc i5000_edac edac_core psmouse evdev i5k_amb tpm_tis tpm
> joydev dcdbas tpm_bios pcspkr rng_core ghes shpchp serio_raw pci_hotplug
> processor hed button thermal_sys xfs exportfs dm_mod sg sr_mod sd_mod
> cdrom usbhid hid usb_storage qla2xxx scsi_transport_fc scsi_tgt uhci_hcd
> mptsas mptscsih ehci_hcd mptbase bnx2 scsi_transport_sas scsi_mod [last
> unloaded: scsi_wait_scan]
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.316694]
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.316715]
> Pid: 0, comm: kworker/0:0 Not tainted 2.6.37-dsiun-110105 #17
> 0MY736/PowerEdge M600
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.316761]
> RIP: 0010:[<ffffffff8130e6bf>] [<ffffffff8130e6bf>] cleanup_once+0x3f/0xa0
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.316808]
> RSP: 0018:ffff8800cfc43e20 EFLAGS: 00010202
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.316834]
> RAX: ffff8803d3158018 RBX: ffff8803d3158000 RCX: 0000000000000005
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.316878]
> RDX: 0b000209f1beadde RSI: 00000000000000ac RDI: ffffffff8152a970
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.318512]
> RBP: 00000000000248f6 R08: 00000000003d0900 R09: 0000000000000000
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.318560]
> R10: dead000000200200 R11: 0000000000000000 R12: ffff8800cfc43ea0
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.318604]
> R13: 0000000000000100 R14: ffff88040fc99fd8 R15: 0000000000000000
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.318652]
> FS: 0000000000000000(0000) GS:ffff8800cfc40000(0000) knlGS:0000000000000000
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.318698]
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.318725]
> CR2: 000000000000000d CR3: 00000000014f1000 CR4: 00000000000026e0
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.318768]
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.318812]
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.318855]
> Process kworker/0:0 (pid: 0, threadinfo ffff88040fc98000, task
> ffff88040fc6c2e0)
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.318901]
> Stack:
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.318921]
> 0000000000000082 00000001029221c1 00000000000248f6 ffffffff8130e988
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.318971]
> ffff88040fc90000 ffff88040fc90000 ffffffff8152a9a0 ffffffff8105e95f
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.319021]
> ffff8800cfc43e58 ffff88040fc91020 ffffffff8130e950 ffff88040fc99fd8
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.319072]
> Call Trace:
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.319093] <IRQ>
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.319116]
> [<ffffffff8130e988>] ? peer_check_expire+0x38/0x110
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.319146]
> [<ffffffff8105e95f>] ? run_timer_softirq+0x16f/0x350
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.319175]
> [<ffffffff8130e950>] ? peer_check_expire+0x0/0x110
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.319204]
> [<ffffffff81079c6b>] ? ktime_get+0x5b/0xe0
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.319232]
> [<ffffffff8105685a>] ? __do_softirq+0xaa/0x1e0
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.319260]
> [<ffffffff81003ddc>] ? call_softirq+0x1c/0x30
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.319288]
> [<ffffffff81005f75>] ? do_softirq+0x65/0xa0
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.319315]
> [<ffffffff81056745>] ? irq_exit+0x85/0x90
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.319343]
> [<ffffffff8102137a>] ? smp_apic_timer_interrupt+0x6a/0xa0
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.319373]
> [<ffffffff81003893>] ? apic_timer_interrupt+0x13/0x20
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.319401] <EOI>
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.319427]
> [<ffffffffa032218c>] ? acpi_idle_enter_bm+0x243/0x27b [processor]
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.319473]
> [<ffffffffa0322185>] ? acpi_idle_enter_bm+0x23c/0x27b [processor]
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.319519]
> [<ffffffff812c0deb>] ? cpuidle_idle_call+0x8b/0x140
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.319547]
> [<ffffffff8100208a>] ? cpu_idle+0x6a/0xf0
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.319573]
> Code: 00 48 8b 05 c4 c2 21 00 48 3d 60 a9 52 81 74 5c 48 8d 58 e8 48 8b
> 15 11 02 24 00 2b 53 28 48 39 ea 72 49 48 8b 4b 18 48 8b 53 20 <48> 89
> 51 08 48 89 0a 48 89 43 18 48 89 43 20 f0 ff 40 14 48 c7
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.319768]
> RIP [<ffffffff8130e6bf>] cleanup_once+0x3f/0xa0
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.319797]
> RSP <ffff8800cfc43e20>
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.319820]
> CR2: 000000000000000d
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.320187]
> ---[ end trace eaf3ed2d46c78768 ]---
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.320257]
> Kernel panic - not syncing: Fatal exception in interrupt
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.320329]
> Pid: 0, comm: kworker/0:0 Tainted: G D 2.6.37-dsiun-110105 #17
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.320418]
> Call Trace:
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.320481]
> <IRQ> [<ffffffff8137c75e>] ? panic+0x92/0x1a2
> Jan 21 13:15:41 linkwood.u11.univ-nantes.prive kernel: [172825.320601]
> [<ffffffff81007357>] ? oops_end+0xe7/0xf0
>
>
> Any ideas ??

Hi Yann

Yes, this is a known problem.

Please try commit 3408404a4c2a4eead9d73b0bbbfe3f225b65f492
(inetpeer: Use correct AVL tree base pointer in inet_getpeer())

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3408404a4c2a4eead9d73b0bbbfe3f225b65f492

I believe David will send it to the stable team shortly, if not already
done :)

Thanks
* Re: kernel 2.6.37 : oops in cleanup_once
From: Eric Dumazet @ 2011-02-02 11:24 UTC
To: Yann Dupont; +Cc: linux-kernel, netdev

On Wednesday, 02 February 2011 at 11:52 +0100, Eric Dumazet wrote:
> On Wednesday, 02 February 2011 at 09:53 +0100, Yann Dupont wrote:
> > Hello.
> > We recently upgraded one machine with vanilla 2.6.37, and experienced 2
> > kernel oops since. Each oops is after ~1 week of uptime.
> > The last oops was last night but we didn't have any trace.

oops, 2.6.37 "only"

> Yes this is a known problem.
>
> Please try commit 3408404a4c2a4eead9d73b0bbbfe3f225b65f492
> (inetpeer: Use correct AVL tree base pointer in inet_getpeer())
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3408404a4c2a4eead9d73b0bbbfe3f225b65f492
>
> I believe David will send it to stable team shortly, if not already
> done :)

Please ignore: this patch was for the linux-2.6 tree, and 2.6.37 was not
affected by that problem.

So it's another problem... Is there anything particular you do on this
machine?
* Re: kernel 2.6.37 : oops in cleanup_once
From: Yann Dupont @ 2011-02-02 13:08 UTC
To: Eric Dumazet; +Cc: linux-kernel, netdev

On 02/02/2011 12:24, Eric Dumazet wrote:
> On Wednesday, 02 February 2011 at 11:52 +0100, Eric Dumazet wrote:
>> On Wednesday, 02 February 2011 at 09:53 +0100, Yann Dupont wrote:
>>> Hello.
>>> We recently upgraded one machine with vanilla 2.6.37, and experienced 2
>>> kernel oops since. Each oops is after ~1 week of uptime.
>>> The last oops was last night but we didn't have any trace.
> oops, 2.6.37 "only"
>
>> Yes this is a known problem.
>>
>> Please try commit 3408404a4c2a4eead9d73b0bbbfe3f225b65f492
>> (inetpeer: Use correct AVL tree base pointer in inet_getpeer())
>>
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3408404a4c2a4eead9d73b0bbbfe3f225b65f492
>>
>> I believe David will send it to stable team shortly, if not already
>> done :)
> Please ignore, this patch was for linux-2.6 tree, 2.6.37 was not
> affected by the problem.
>
> So it's another problem... Is there anything particular you do on this
> machine ?
>

Nothing really special there: we run a lot (20) of KVM guests (mainly
Linux firewalls for lots of different VLANs), so we have a lot of
bridges, VLANs and tun/tap devices.
Oh, and CONFIG_BRIDGE_IGMP_SNOOPING is set to n (because of the other
bug already sent to netdev - more to come in the next mail)

Hard to say if this BUG is new in 2.6.37. This host had been running fine
with 2.6.34.2 since August 2010.
Bisecting will be hard due to the time needed to trigger the bug (and the
fact that this is a production machine)

Anyway, I can test with a specific kernel version if you suspect something.

Regards,

--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
* Re: kernel 2.6.37 : oops in cleanup_once
From: Eric Dumazet @ 2011-02-02 14:53 UTC
To: Yann Dupont; +Cc: linux-kernel, netdev

On Wednesday, 02 February 2011 at 14:08 +0100, Yann Dupont wrote:
> On 02/02/2011 12:24, Eric Dumazet wrote:
> > On Wednesday, 02 February 2011 at 11:52 +0100, Eric Dumazet wrote:
> >> On Wednesday, 02 February 2011 at 09:53 +0100, Yann Dupont wrote:
> >>> Hello.
> >>> We recently upgraded one machine with vanilla 2.6.37, and experienced 2
> >>> kernel oops since. Each oops is after ~1 week of uptime.
> >>> The last oops was last night but we didn't have any trace.
> > oops, 2.6.37 "only"
> >
> >> Yes this is a known problem.
> >>
> >> Please try commit 3408404a4c2a4eead9d73b0bbbfe3f225b65f492
> >> (inetpeer: Use correct AVL tree base pointer in inet_getpeer())
> >>
> >> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3408404a4c2a4eead9d73b0bbbfe3f225b65f492
> >>
> >> I believe David will send it to stable team shortly, if not already
> >> done :)
> > Please ignore, this patch was for linux-2.6 tree, 2.6.37 was not
> > affected by the problem.
> >
> > So it's another problem... Is there anything particular you do on this
> > machine ?
> >
> Nothing really special there, we run a lot (20) of KVM guests (mainly
> linux firewalls for lots of different vlans), so we have a lot of
> bridges vlan & tun/tap.
> Oh, and CONFIG_BRIDGE_IGMP_SNOOPING is set to n (because of the other
> bug already sent to netdev - more to come on next mail)
>
> Hard to say if this BUG is new in 2.6.37. This host was running fine
> with 2.6.34.2 since August 2010.
> Bisecting will be hard due to the time to trigger the bug (and the fact
> that this machine is a production machine)
>
> Anyway, I can test with a specific kernel version if you suspect something.
>

I suspect a memory corruption from another layer (not inetpeer).

Unfortunately, many kmem caches share the "64 bytes" cache.

Could you please add "slub_nomerge" to your boot command line?

This way, we can separate corruptions on each cache.

In your crash, one inetpeer contains garbage in its unused_list next/prev
pointers:

RCX: 0000000000000005
RDX: 0b000209f1beadde

Definitely, something overwrote these values with non-pointer values.
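The observation above can be checked mechanically. The following is a minimal Python sketch, not something posted in the thread: it tests which of the register values quoted in the first oops could plausibly be kernel pointers. It assumes x86_64 with 48-bit virtual addresses, where kernel addresses have bits 47..63 all set; the helper name `is_canonical_kernel_address` is made up for illustration. As a side note, the bytes `<48> 89 51 08` marked in the Code: line decode to `mov %rdx,0x8(%rcx)`, so with RCX = 5 the write targets 0x5 + 0x8 = 0xd, which matches CR2.

```python
# Register values copied from the first oops in this thread.
REGS = {
    "RAX": 0xFFFF8803D3158018,
    "RBX": 0xFFFF8803D3158000,
    "RCX": 0x0000000000000005,
    "RDX": 0x0B000209F1BEADDE,
}


def is_canonical_kernel_address(value: int) -> bool:
    """Return True if value lies in the upper canonical half on x86_64.

    Assumption: 48-bit virtual addresses, so a kernel address has
    bits 47..63 (17 bits) all set. Hypothetical helper, for illustration.
    """
    return (value >> 47) == 0x1FFFF


if __name__ == "__main__":
    for name, value in REGS.items():
        verdict = "plausible pointer" if is_canonical_kernel_address(value) else "not a pointer"
        print(f"{name}: {value:016x} -> {verdict}")
    # The faulting store writes to 8 bytes past RCX.
    print(f"fault address: {REGS['RCX'] + 8:#x}")
```

RAX and RBX pass the check, while RCX and RDX (the values loaded from the list node) do not, which is consistent with the diagnosis that the node was overwritten.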
* Re: kernel 2.6.37 : oops in cleanup_once
From: Yann Dupont @ 2011-02-02 15:04 UTC
To: Eric Dumazet; +Cc: linux-kernel, netdev

On 02/02/2011 15:53, Eric Dumazet wrote:
> On Wednesday, 02 February 2011 at 14:08 +0100, Yann Dupont wrote:
>> On 02/02/2011 12:24, Eric Dumazet wrote:
>>> On Wednesday, 02 February 2011 at 11:52 +0100, Eric Dumazet wrote:
>>>> On Wednesday, 02 February 2011 at 09:53 +0100, Yann Dupont wrote:
>>>>> Hello.
>>>>> We recently upgraded one machine with vanilla 2.6.37, and experienced 2
>>>>> kernel oops since. Each oops is after ~1 week of uptime.
>>>>> The last oops was last night but we didn't have any trace.
>>> oops, 2.6.37 "only"
>>>
>>>> Yes this is a known problem.
>>>>
>>>> Please try commit 3408404a4c2a4eead9d73b0bbbfe3f225b65f492
>>>> (inetpeer: Use correct AVL tree base pointer in inet_getpeer())
>>>>
>>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3408404a4c2a4eead9d73b0bbbfe3f225b65f492
>>>>
>>>> I believe David will send it to stable team shortly, if not already
>>>> done :)
>>> Please ignore, this patch was for linux-2.6 tree, 2.6.37 was not
>>> affected by the problem.
>>>
>>> So it's another problem... Is there anything particular you do on this
>>> machine ?
>>>
>> Nothing really special there, we run a lot (20) of KVM guests (mainly
>> linux firewalls for lots of different vlans), so we have a lot of
>> bridges vlan & tun/tap.
>> Oh, and CONFIG_BRIDGE_IGMP_SNOOPING is set to n (because of the other
>> bug already sent to netdev - more to come on next mail)
>>
>> Hard to say if this BUG is new in 2.6.37. This host was running fine
>> with 2.6.34.2 since August 2010.
>> Bisecting will be hard due to the time to trigger the bug (and the fact
>> that this machine is a production machine)
>>
>> Anyway, I can test with a specific kernel version if you suspect something.
>>
> I suspect a mem corruption from another layer (not inetpeer)
>
> Unfortunately many kmem caches share the "64 bytes" cache.
>
> Could you please add "slub_nomerge" on your boot command ?
>

OK, will do it at 18:30 CET (to minimize impact).
Is the suspected bug SLUB related?

The 2.6.34.2 kernel previously used on that server used SLAB.

2 questions:
- How can I be sure slub_nomerge is active? Boot message?
- Is there a very severe impact on performance?

Regards,

--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
* Re: kernel 2.6.37 : oops in cleanup_once
From: Eric Dumazet @ 2011-02-02 15:08 UTC
To: Yann Dupont; +Cc: linux-kernel, netdev

On Wednesday, 02 February 2011 at 16:04 +0100, Yann Dupont wrote:
> >
> Ok, will do it at 18:30 CET (to minimize impact)
> Is the suspected bug SLUB related ?
>

No: it can be a corruption from another part of the kernel.

> The 2.6.34.2 kernel previously used on that server used SLAB.
>
>
> 2 questions :
> -How can I be sure slub_nomerge is active ? Boot message ?

# ls -l /sys/kernel/slab/

If you have symlinks: merge is on (default)

If you don't have symlinks: nomerge is in action

> -Is there a very severe impact on performance ?
>

Not at all.

> Regards,
>
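The symlink check described above can be scripted. This is a minimal Python sketch, not part of the thread, assuming a SLUB kernel that exposes /sys/kernel/slab and a readable /proc/cmdline; the function names `slab_caches_merged` and `nomerge_requested` are hypothetical.

```python
import os


def slab_caches_merged(slab_dir="/sys/kernel/slab"):
    """Apply the rule from the message above to slab_dir.

    Returns True if at least one entry is a symlink (merging is on),
    False if no entry is a symlink (nomerge is in action), and None
    if the directory cannot be read (for example, not a SLUB kernel).
    """
    try:
        with os.scandir(slab_dir) as entries:
            return any(entry.is_symlink() for entry in entries)
    except OSError:
        return None


def nomerge_requested(cmdline):
    """Return True if slub_nomerge appears as a boot parameter in cmdline."""
    return "slub_nomerge" in cmdline.split()


if __name__ == "__main__":
    print("merged caches present:", slab_caches_merged())
    try:
        with open("/proc/cmdline") as f:
            print("slub_nomerge on command line:", nomerge_requested(f.read()))
    except OSError:
        print("could not read /proc/cmdline")
```

Checking both the command line and the sysfs directory gives two independent confirmations: the first shows the option was requested, the second shows the effect described in the message.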
* Re: kernel 2.6.37 : oops in cleanup_once
From: Yann Dupont @ 2011-02-02 17:59 UTC
To: Eric Dumazet; +Cc: linux-kernel, netdev

On 02/02/2011 16:08, Eric Dumazet wrote:
> On Wednesday, 02 February 2011 at 16:04 +0100, Yann Dupont wrote:
>> Ok, will do it at 18:30 CET (to minimize impact)
>> Is the suspected bug SLUB related ?
>>
> no : It can be a corruption from another part of kernel.
>
>> The 2.6.34.2 kernel previously used on that server used SLAB.
>>
>>
>> 2 questions :
>> -How can I be sure slub_nomerge is active ? Boot message ?
>
> # ls -l /sys/kernel/slab/
>
> If you have symlinks : merge is on (default)
>
> If you don't have symlinks : nomerge is in action
>
>> -Is there a very severe impact on performance ?
>>
> not at all
>
>> Regards,
>>

Well. The server had the good taste to oops at 18:05, 25 minutes before
the planned reboot :)

Here is the oops (I think it's much the same):

Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.128042] BUG: unable to handle kernel NULL pointer dereference at 000000000000000d
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.128097] IP: [<ffffffff8130e6bf>] cleanup_once+0x3f/0xa0
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.128146] PGD 0
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.128173] Oops: 0002 [#1] SMP
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.128200] last sysfs file: /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.128250] CPU 7
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.128260] Modules linked in: dell_rbu acpi_cpufreq freq_table mperf nls_utf8 nls_cp437 btrfs zlib_deflate crc32c libcrc32c ufs qnx4 hfsplus hfs minix ntfs vfat msdos fat jfs reiserfs ext4 jbd2 crc16 ext3 jbd tun ipt_MASQUERADE iptable_nat nf_nat ipt_REJECT kvm_intel kvm xt_physdev ip6t_LOG nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_LOG xt_multiport xt_limit xt_tcpudp xt_state iptable_filter ip_tables x_tables nf_conntrack_tftp nf_conntrack_ftp nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ipv6 8021q bridge stp ext2 mbcache fuse snd_pcm snd_timer ghes hed button snd soundcore i5000_edac edac_core processor shpchp tpm_tis pci_hotplug tpm rng_core snd_page_alloc i5k_amb dcdbas tpm_bios joydev evdev psmouse pcspkr serio_raw thermal_sys xfs exportfs dm_mod sg sr_mod cdrom sd_mod usbhid hid usb_storage qla2xxx scsi_transport_fc scsi_tgt uhci_hcd mptsas mptscsih mptbase bnx2 scsi_transport_sas scsi_mod ehci_hcd [last unloaded: scsi_wait_scan]
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.128834]
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.128855] Pid: 0, comm: kworker/0:1 Not tainted 2.6.37-dsiun-110105 #17 0MY736/PowerEdge M600
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.128901] RIP: 0010:[<ffffffff8130e6bf>] [<ffffffff8130e6bf>] cleanup_once+0x3f/0xa0
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.128948] RSP: 0018:ffff8800cfdc3e20 EFLAGS: 00010206
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.128974] RAX: ffff8803a7e0ea18 RBX: ffff8803a7e0ea00 RCX: 0000000000000005
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129003] RDX: adde806c0d860b00 RSI: 0000000000000096 RDI: ffffffff8152a970
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129032] RBP: 00000000000248f6 R08: 00000000003d0900 R09: 0000000000000000
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129062] R10: dead000000200200 R11: 0000000000000000 R12: ffff8800cfdc3ea0
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129091] R13: 0000000000000100 R14: ffff88040fd29fd8 R15: 0000000000000000
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129121] FS: 0000000000000000(0000) GS:ffff8800cfdc0000(0000) knlGS:0000000000000000
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129166] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129193] CR2: 000000000000000d CR3: 00000000014f1000 CR4: 00000000000026e0
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129223] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129252] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129282] Process kworker/0:1 (pid: 0, threadinfo ffff88040fd28000, task ffff88040fce6450)
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129327] Stack:
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129347] 0000000000000082 00000001008d3b66 00000000000248f6 ffffffff8130e988
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129397] ffff88040fd24000 ffff88040fd24000 ffffffff8152a9a0 ffffffff8105e95f
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129446] ffff8800cfdc3e58 ffff88040fd25020 ffffffff8130e950 ffff88040fd29fd8
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129496] Call Trace:
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129523] <IRQ>
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129551] [<ffffffff8130e988>] ? peer_check_expire+0x38/0x110
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129581] [<ffffffff8105e95f>] ? run_timer_softirq+0x16f/0x350
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129609] [<ffffffff8130e950>] ? peer_check_expire+0x0/0x110
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129638] [<ffffffff81079c6b>] ? ktime_get+0x5b/0xe0
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129666] [<ffffffff8105685a>] ? __do_softirq+0xaa/0x1e0
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129694] [<ffffffff81003ddc>] ? call_softirq+0x1c/0x30
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129722] [<ffffffff81005f75>] ? do_softirq+0x65/0xa0
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129748] [<ffffffff81056745>] ? irq_exit+0x85/0x90
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129776] [<ffffffff8102137a>] ? smp_apic_timer_interrupt+0x6a/0xa0
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129806] [<ffffffff81003893>] ? apic_timer_interrupt+0x13/0x20
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129833] <EOI>
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129857] [<ffffffff8123f5ce>] ? acpi_hw_register_read+0x54/0xe2
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129890] [<ffffffffa01c52b8>] ? acpi_idle_enter_simple+0xf4/0x126 [processor]
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.129936] [<ffffffffa01c52b1>] ? acpi_idle_enter_simple+0xed/0x126 [processor]
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.131555] [<ffffffffa01c5034>] ? acpi_idle_enter_bm+0xeb/0x27b [processor]
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.131591] [<ffffffff812c0deb>] ? cpuidle_idle_call+0x8b/0x140
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.131619] [<ffffffff8100208a>] ? cpu_idle+0x6a/0xf0
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.131645] Code: 00 48 8b 05 c4 c2 21 00 48 3d 60 a9 52 81 74 5c 48 8d 58 e8 48 8b 15 11 02 24 00 2b 53 28 48 39 ea 72 49 48 8b 4b 18 48 8b 53 20 <48> 89 51 08 48 89 0a 48 89 43 18 48 89 43 20 f0 ff 40 14 48 c7
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.131847] RIP [<ffffffff8130e6bf>] cleanup_once+0x3f/0xa0
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.131876] RSP <ffff8800cfdc3e20>
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.131898] CR2: 000000000000000d
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.132280] ---[ end trace a9f45436c3b7c143 ]---
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.132350] Kernel panic - not syncing: Fatal exception in interrupt
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.132422] Pid: 0, comm: kworker/0:1 Tainted: G D 2.6.37-dsiun-110105 #17
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.132510] Call Trace:
Feb 2 18:05:33 linkwood.u11.univ-nantes.prive kernel: [37323.132574] <IRQ> [<ffffffff8137c75e>] ? panic+0x92/0x1a2

I also have a screenshot with more details. I'll send it in a private
message.

Since 18:30, the server has been running with slub_nomerge.

--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
* Re: kernel 2.6.37 : oops in cleanup_once
From: Yann Dupont @ 2011-03-14 10:44 UTC
To: Eric Dumazet; +Cc: linux-kernel, netdev

On 02/02/2011 16:08, Eric Dumazet wrote:
> I suspect a mem corruption from another layer (not inetpeer)
>
> Unfortunately many kmem caches share the "64 bytes" cache.
>
> Could you please add "slub_nomerge" on your boot command ?
>
...
>
>> -Is there a very severe impact on performance ?
>>
> not at all
>

Maybe there is an impact after all: since then, we haven't had any
problems!

linkwood:~# uptime
 11:42:03 up 39 days, 17:08, 3 users, load average: 0.01, 0.03, 0.05

So... could slub_nomerge hide or simply avoid the problem?
Or are we just lucky this time?

--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@univ-nantes.fr
* Re: kernel 2.6.37 : oops in cleanup_once
From: Eric Dumazet @ 2011-03-14 13:14 UTC
To: Yann Dupont; +Cc: linux-kernel, netdev

On Monday, 14 March 2011 at 11:44 +0100, Yann Dupont wrote:
> On 02/02/2011 16:08, Eric Dumazet wrote:
>
> > I suspect a mem corruption from another layer (not inetpeer)
> >
> > Unfortunately many kmem caches share the "64 bytes" cache.
> >
> > Could you please add "slub_nomerge" on your boot command ?
> >
> ...
> >
> >> -Is there a very severe impact on performance ?
> >>
> > not at all
> >
> Maybe there is an impact after all : since then, we don't have problems
> anymore !
>
> linkwood:~# uptime
> 11:42:03 up 39 days, 17:08, 3 users, load average: 0.01, 0.03, 0.05
>
> So... could slub_nomerge hide or simply avoid the problem ?
> Or are we just lucky this time ?
>

I would say you are lucky ;)

Not all memory corruptions are noticed. Sometimes they touch unused parts
of memory, or parts with no critical content.
Thread overview: 9+ messages
[not found] <4D491B8D.1000107@univ-nantes.fr>
2011-02-02 10:52 ` kernel 2.6.37 : oops in cleanup_once Eric Dumazet
2011-02-02 11:24 ` Eric Dumazet
2011-02-02 13:08 ` Yann Dupont
2011-02-02 14:53 ` Eric Dumazet
2011-02-02 15:04 ` Yann Dupont
2011-02-02 15:08 ` Eric Dumazet
2011-02-02 17:59 ` Yann Dupont
2011-03-14 10:44 ` Yann Dupont
2011-03-14 13:14 ` Eric Dumazet