From: Josh Durgin
Subject: Re: Ceph kernel client - kernel craches
Date: Thu, 17 May 2012 15:49:52 -0700
Message-ID: <4FB58090.8070106@inktank.com>
To: Giorgos Kappes
Cc: ceph-devel@vger.kernel.org

Sorry, your mail fell through the cracks earlier. I filed
http://tracker.newdream.net/issues/2445 to track the Ceph-related
crashes.

Alex, do you think the first crash is related to Ceph at all?

Josh

On 05/10/2012 11:00 AM, Giorgos Kappes wrote:
> Sorry for my late response. I reproduced the bug above with Linux
> kernel 3.3.4, without Xen:
>
> uname -a
> Linux node33 3.3.4 #1 SMP Wed May 9 13:00:07 EEST 2012 x86_64 GNU/Linux
>
> The trace is shown below:
>
> ----------------------------------------------------
> [ 763.984023] kernel tried to execute NX-protected page - exploit
> attempt? (uid: 0)
> [ 763.984177] BUG: unable to handle kernel paging request at ffff880037bd0800
> [ 763.984402] IP: [] 0xffff880037bd07ff
> [ 763.984568] PGD 1806063 PUD 180a063 PMD 8000000037a001e3
> [ 763.984845] Oops: 0011 [#1] SMP
> [ 763.985058] CPU 3
> [ 763.985124] Modules linked in: cbc netconsole loop snd_pcm
> snd_timer snd soundcore snd_page_alloc processor tpm_tis i5400_edac
> tpm edac_core tpm_bios evdev pcspkr i5k_amb rng_core thermal_sys
> button shpchp pci_hotplug sd_mod crc_t10dif usbhid hid ide_cd_mod
> cdrom ata_generic uhci_hcd ehci_hcd ata_piix libata piix ide_core
> usbcore usb_common tg3 libphy mptsas mptscsih mptbase
> scsi_transport_sas scsi_mod [last unloaded: scsi_wait_scan]
> [ 763.988002]
> [ 763.988002] Pid: 0, comm: swapper/3 Not tainted 3.3.4 #1 HP ProLiant DL160 G5
> [ 763.988002] RIP: 0010:[] []
> 0xffff880037bd07ff
> [ 763.988002] RSP: 0018:ffff8800bfcc3e78 EFLAGS: 00010292
> [ 763.988002] RAX: ffff8800b97745b0 RBX: ffff8800bfcce770 RCX: ffff880037bd0800
> [ 763.988002] RDX: ffff880037bd1600 RSI: 00000000b9b6a040 RDI: ffff880037bd1600
> [ 763.988002] RBP: ffffffff81820080 R08: ffff8800b9dd0b00 R09: 000000018020001c
> [ 763.988002] R10: 000000008020001c R11: ffffffff816075c0 R12: ffff8800bfcce7a0
> [ 763.988002] R13: ffff8800b97745b0 R14: 0000000000000003 R15: 000000000000000a
> [ 763.988002] FS: 0000000000000000(0000) GS:ffff8800bfcc0000(0000)
> knlGS:0000000000000000
> [ 763.988002] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 763.988002] CR2: ffff880037bd0800 CR3: 00000000b895b000 CR4: 00000000000006e0
> [ 763.988002] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 763.988002] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 763.988002] Process swapper/3 (pid: 0, threadinfo ffff8800bbae0000,
> task ffff8800bbad8000)
> [ 763.988002] Stack:
> [ 763.988002] ffffffff8109b44d ffff8800bbacd820 ffff8800b97745b0
> ffff8800bbae0010
> [ 763.988002] ffff8800bbad8000 ffff8800bfcc3ea0 0000000000000048
> ffff8800bbae1fd8
> [ 763.988002] 0000000000000100 0000000000000001 0000000000000009
> ffff8800bbae1fd8
> [ 763.988002] Call Trace:
> [ 763.988002]
> [ 763.988002] [] ? __rcu_process_callbacks+0x1e9/0x335
> [ 763.988002] [] ? rcu_process_callbacks+0x2c/0x56
> [ 763.988002] [] ? __do_softirq+0xc4/0x1a0
> [ 763.988002] [] ? lapic_next_event+0x18/0x1d
> [ 763.988002] [] ? call_softirq+0x1c/0x30
> [ 763.988002] [] ? do_softirq+0x3f/0x79
> [ 763.988002] [] ? irq_exit+0x44/0xb1
> [ 763.988002] [] ? smp_apic_timer_interrupt+0x85/0x93
> [ 763.988002] [] ? apic_timer_interrupt+0x6e/0x80
> [ 763.988002]
> [ 763.988002] [] ? native_sched_clock+0x28/0x33
> [ 763.988002] [] ? mwait_idle+0x8c/0xbc
> [ 763.988002] [] ? mwait_idle+0x44/0xbc
> [ 763.988002] [] ? cpu_idle+0xb9/0xf7
> [ 763.988002] [] ? start_secondary+0x270/0x275
> [ 763.988002] Code: 00 00 00 00 04 8a b8 00 88 ff ff 00 04 8a b8 00
> 88 ff ff 00 03 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00<00> 16 bd 37 00 88 ff ff 40 ab cd bf 00 88 ff ff 20 15 42
> b9 00
> [ 763.988002] RIP [] 0xffff880037bd07ff
> [ 763.988002] RSP
> [ 763.988002] CR2: ffff880037bd0800
> [ 763.988002] ---[ end trace 614049dc850267ac ]---
> [ 763.988002] Kernel panic - not syncing: Fatal exception in interrupt
> [ 763.997833] ------------[ cut here ]------------
> [ 763.997936] WARNING: at arch/x86/kernel/smp.c:120
> update_process_times+0x57/0x63()
> [ 763.998072] Hardware name: ProLiant DL160 G5
> [ 763.998171] Modules linked in: cbc netconsole loop snd_pcm
> snd_timer snd soundcore snd_page_alloc processor tpm_tis i5400_edac
> tpm edac_core tpm_bios evdev pcspkr i5k_amb rng_core thermal_sys
> button shpchp pci_hotplug sd_mod crc_t10dif usbhid hid ide_cd_mod
> cdrom ata_generic uhci_hcd ehci_hcd ata_piix libata piix ide_core
> usbcore usb_common tg3 libphy mptsas mptscsih mptbase
> scsi_transport_sas scsi_mod [last unloaded: scsi_wait_scan]
> [ 764.001205] Pid: 0, comm: swapper/3 Tainted: G D 3.3.4 #1
> [ 764.001311] Call Trace:
> [ 764.001404] [] ? warn_slowpath_common+0x78/0x8c
> [ 764.001573] [] ? update_process_times+0x57/0x63
> [ 764.001681] [] ? tick_sched_timer+0x65/0x8b
> [ 764.001788] [] ? __run_hrtimer+0xb2/0x13d
> [ 764.001832] [] ? read_tsc+0x5/0x16
> [ 764.001832] [] ? hrtimer_interrupt+0xd8/0x1a7
> [ 764.001832] [] ? smp_apic_timer_interrupt+0x80/0x93
> [ 764.001832] [] ? native_safe_apic_wait_icr_idle+0x1a/0x49
> [ 764.001832] [] ? apic_timer_interrupt+0x6e/0x80
> [ 764.001832] [] ? up+0xe/0x36
> [ 764.001832] [] ? panic+0x189/0x1c9
> [ 764.001832] [] ? panic+0xf0/0x1c9
> [ 764.001832] [] ? kmsg_dump+0x53/0xef
> [ 764.001832] [] ? oops_end+0xaa/0xb7
> [ 764.001832] [] ? no_context+0x254/0x263
> [ 764.001832] [] ? do_page_fault+0x1ad/0x34c
> [ 764.001832] [] ? __netif_receive_skb+0x44d/0x491
> [ 764.001832] [] ? read_tsc+0x5/0x16
> [ 764.001832] [] ? netif_receive_skb+0x71/0x77
> [ 764.001832] [] ? napi_gro_receive+0x1f/0x2c
> [ 764.001832] [] ? napi_skb_finish+0x1c/0x31
> [ 764.001832] [] ? tg3_poll_work+0x8f9/0xb66 [tg3]
> [ 764.001832] [] ? page_fault+0x25/0x30
> [ 764.001832] [] ? __rcu_process_callbacks+0x1e9/0x335
> [ 764.001832] [] ? rcu_process_callbacks+0x2c/0x56
> [ 764.001832] [] ? __do_softirq+0xc4/0x1a0
> [ 764.001832] [] ? lapic_next_event+0x18/0x1d
> [ 764.001832] [] ? call_softirq+0x1c/0x30
> [ 764.001832] [] ? do_softirq+0x3f/0x79
> [ 764.001832] [] ? irq_exit+0x44/0xb1
> [ 764.001832] [] ? smp_apic_timer_interrupt+0x85/0x93
> [ 764.001832] [] ? apic_timer_interrupt+0x6e/0x80
> [ 764.001832] [] ? native_sched_clock+0x28/0x33
> [ 764.001832] [] ? mwait_idle+0x8c/0xbc
> [ 764.001832] [] ? mwait_idle+0x44/0xbc
> [ 764.001832] [] ? cpu_idle+0xb9/0xf7
> [ 764.001832] [] ? start_secondary+0x270/0x275
> [ 764.001832] ---[ end trace 614049dc850267ad ]---
>
> ----------------------------------------------------
>
> Also, as you suggested, I disabled the NX bit by passing "noexec=off" to
> the kernel.
> Unfortunately, the bug is still happening:
>
> ----------------------------------------------------
> [ 703.168022] BUG: unable to handle kernel paging request at ffff87ffbfa0e22b
> [ 703.168293] IP: [] 0xffff8800b97671ff
> [ 703.168457] PGD 0
> [ 703.168613] Oops: 0002 [#1] SMP
> [ 703.168831] CPU 0
> [ 703.168896] Modules linked in: cbc netconsole loop tpm_tis snd_pcm
> snd_timer snd soundcore shpchp pci_hotplug snd_page_alloc tpm
> i5400_edac rng_core tpm_bios edac_core i5k_amb processor pcspkr
> thermal_sys evdev button sd_mod crc_t10dif usbhid hid sg sr_mod cdrom
> ata_generic uhci_hcd piix ide_core ehci_hcd ata_piix tg3 libphy
> usbcore usb_common libata mptsas mptscsih mptbase scsi_transport_sas
> scsi_mod [last unloaded: scsi_wait_scan]
> [ 703.172001]
> [ 703.172001] Pid: 0, comm: swapper/0 Not tainted 3.3.4 #1 HP ProLiant DL160 G5
> [ 703.172001] RIP: 0010:[] []
> 0xffff8800b97671ff
> [ 703.172001] RSP: 0018:ffff8800bfc03e78 EFLAGS: 00010292
> [ 703.172001] RAX: ffff880037a02900 RBX: ffff8800bfc0e770 RCX: ffff8800b9767200
> [ 703.172001] RDX: ffff8800b92b9000 RSI: ffff8800b8901800 RDI: ffff8800b92b9000
> [ 703.172001] RBP: ffffffff81820080 R08: ffff8800b8f77f00 R09: 000000018020000a
> [ 703.172001] R10: 000000008020000a R11: ffff8800bfc0e600 R12: ffff8800bfc0e7a0
> [ 703.172001] R13: ffff8800ba177370 R14: 0000000000000005 R15: 000000000000000a
> [ 703.172001] FS: 0000000000000000(0000) GS:ffff8800bfc00000(0000)
> knlGS:0000000000000000
> [ 703.172001] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 703.172001] CR2: ffff87ffbfa0e22b CR3: 0000000001805000 CR4: 00000000000006f0
> [ 703.172001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 703.172001] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 703.172001] Process swapper/0 (pid: 0, threadinfo ffffffff81800000,
> task ffffffff8180d020)
> [ 703.172001] Stack:
> [ 703.172001] ffffffff8109b44d 0000000000000000 ffff880037a02900
> ffffffff81800010
> [ 703.172001] ffffffff8180d020 ffffffff81801fd8 0000000000000048
> ffffffff81801fd8
> [ 703.172001] 0000000000000100 0000000000000001 0000000000000009
> ffffffff81801fd8
> [ 703.172001] Call Trace:
> [ 703.172001]
> [ 703.172001] [] ? __rcu_process_callbacks+0x1e9/0x335
> [ 703.172001] [] ? rcu_process_callbacks+0x2c/0x56
> [ 703.172001] [] ? __do_softirq+0xc4/0x1a0
> [ 703.172001] [] ? lapic_next_event+0x18/0x1d
> [ 703.172001] [] ? call_softirq+0x1c/0x30
> [ 703.172001] [] ? do_softirq+0x3f/0x79
> [ 703.172001] [] ? irq_exit+0x44/0xb1
> [ 703.172001] [] ? smp_apic_timer_interrupt+0x85/0x93
> [ 703.172001] [] ? apic_timer_interrupt+0x6e/0x80
> [ 703.172001]
> [ 703.172001] [] ? native_sched_clock+0x28/0x33
> [ 703.172001] [] ? mwait_idle+0x8c/0xbc
> [ 703.172001] [] ? mwait_idle+0x44/0xbc
> [ 703.172001] [] ? cpu_idle+0xb9/0xf7
> [ 703.172001] [] ? start_kernel+0x395/0x3a0
> [ 703.172001] [] ? x86_64_start_kernel+0x102/0x10f
> [ 703.172001] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 00 00 00 00<00> 90 2b b9 00 88 ff ff 20 ac c1 bf 00 88 ff ff 20 98 a7
> b8 00
> [ 703.172001] RIP [] 0xffff8800b97671ff
> [ 703.172001] RSP
> [ 703.172001] CR2: ffff87ffbfa0e22b
> [ 703.172001] ---[ end trace 15e08c2db2033830 ]---
> [ 703.172001] Kernel panic - not syncing: Fatal exception in interrupt
> ----------------------------------------------------
>
> The strange thing is that the crash traces do not contain any calls
> related to Ceph. However, this bug only happens when running
> debootstrap to install a base Debian system into a Ceph directory.
> Debootstrap completes successfully when the target directory is
> under NFS or on a local file system.
>
> Furthermore, a different crash occurs when trying to remove a
> non-empty Ceph directory:
> ******************************************************
> root@node33:/mnt# rm debian -r
> rm: cannot remove `debian/etc': Directory not empty
> Write failed: Broken pipe
> ******************************************************
> The crash trace is shown below:
>
> ----------------------------------------------------
>
> [74576.543412] libceph: client0 fsid 9b3222ac-fce2-44eb-8599-d39da02d2393
> [74576.651197] libceph: mon0 192.168.2.254:6789 session established
> [75143.963663] BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000030
> [75143.963771] IP: [] path_init+0x218/0x2cc
> [75143.963827] PGD 37a63067 PUD 37b4b067 PMD 0
> [75143.963880] Oops: 0000 [#1] SMP
> [75143.963928] CPU 3
> [75143.963935] Modules linked in: cbc netconsole loop i5400_edac
> snd_pcm edac_core i5k_amb snd_timer snd soundcore evdev snd_page_alloc
> tpm_tis tpm processor rng_core thermal_sys pcspkr tpm_bios shpchp
> pci_hotplug button sd_mod crc_t10dif usbhid hid ide_cd_mod cdrom
> ata_generic uhci_hcd ata_piix libata piix ide_core ehci_hcd usbcore
> usb_common tg3 libphy mptsas mptscsih mptbase scsi_transport_sas
> scsi_mod [last unloaded: netconsole]
> [75143.964390]
> [75143.964426] Pid: 3861, comm: rm Not tainted 3.3.4 #1 HP ProLiant DL160 G5
> [75143.964485] RIP: 0010:[] []
> path_init+0x218/0x2cc
> [75143.964570] RSP: 0018:ffff880037b45d58 EFLAGS: 00010202
> [75143.964618] RAX: 0000000000000000 RBX: ffff8800b8975000 RCX: ffff880037b45ea8
> [75143.964672] RDX: ffff8800b929a900 RSI: ffff880037b45d74 RDI: ffff8800b9d0e830
> [75143.964727] RBP: 0000000000000050 R08: ffff880037b45de0 R09: 0000000000000000
> [75143.964781] R10: 0000006e7265746c R11: 0000000001406d90 R12: ffff880037b45ea8
> [75143.964835] R13: ffff8800b929a900 R14: ffff880037b45de0 R15: 0000000000000003
> [75143.964890] FS: 00007fb7a9b47700(0000) GS:ffff8800bfcc0000(0000)
> knlGS:0000000000000000
> [75143.964974] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [75143.965023] CR2: 0000000000000030 CR3: 00000000379d5000 CR4: 00000000000006e0
> [75143.965078] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [75143.965132] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [75143.965187] Process rm (pid: 3861, threadinfo ffff880037b44000,
> task ffff8800b91fed00)
> [75143.965269] Stack:
> [75143.965306] 00000000b9f76000 000000005088218e ffff8800b9f76280
> 00000000b9f76280
> [75143.965397] ffff880037b45ea8 0000000000000050 ffff8800b8975000
> 0000000000000010
> [75143.965489] 00000000013f2030 ffffffff811072b4 ffffffff81060362
> dead000000100100
> [75143.965581] Call Trace:
> [75143.965622] [] ? path_lookupat+0x2c/0x30b
> [75143.965674] [] ? try_to_wake_up+0x1a5/0x1a5
> [75143.965725] [] ? do_path_lookup+0x1e/0x9a
> [75143.965775] [] ? user_path_parent+0x3a/0x5f
> [75143.965826] [] ? virt_to_head_page+0x9/0x2c
> [75143.965877] [] ? do_unlinkat+0x1d/0x15e
> [75143.965927] [] ? vfs_readdir+0x91/0xa7
> [75143.965977] [] ? fsnotify_find_inode_mark+0x23/0x2f
> [75143.966031] [] ? filp_close+0x64/0x6c
> [75143.966082] [] ? system_call_fastpath+0x16/0x1b
> [75143.966133] Code: 04 01 e9 a1 00 00 00 48 8d 74 24 1c e8 ae 6f ff
> ff 49 89 c5 b8 f7 ff ff ff 4d 85 ed 0f 84 b0 00 00 00 49 8b 45 18 80
> 3b 00 74 28<48> 8b 78 30 b8 ec ff ff ff 0f b7 17 81 e2 00 f0 00 00 81
> fa 00
> [75143.966507] RIP [] path_init+0x218/0x2cc
> [75143.966558] RSP
> [75143.966600] CR2: 0000000000000030
> [75143.967124] ---[ end trace 18e2f523c5af9a38 ]---
> [75143.967322] general protection fault: 0000 [#2] SMP
> [75143.967542] CPU 3
> [75143.967607] Modules linked in: cbc netconsole loop i5400_edac
> snd_pcm edac_core i5k_amb snd_timer snd soundcore evdev snd_page_alloc
> tpm_tis tpm processor rng_core thermal_sys pcspkr tpm_bios shpchp
> pci_hotplug button sd_mod crc_t10dif usbhid hid ide_cd_mod cdrom
> ata_generic uhci_hcd ata_piix libata piix ide_core ehci_hcd usbcore
> usb_common tg3 libphy mptsas mptscsih mptbase scsi_transport_sas
> scsi_mod [last unloaded: netconsole]
> [75143.970715]
> [75143.970805] Pid: 3861, comm: rm Tainted: G D 3.3.4 #1 HP
> ProLiant DL160 G5
> [75143.971058] RIP: 0010:[] []
> filp_close+0x2d/0x6c
> [75143.971085] RSP: 0018:ffff880037b45a48 EFLAGS: 00010206
> [75143.971085] RAX: 0012080800000000 RBX: ffff8800b910d300 RCX: 0000000000000000
> [75143.971085] RDX: 0000000000000000 RSI: ffff8800b9712c00 RDI: ffff8800b910d300
> [75143.971085] RBP: ffff8800b9712c00 R08: 0000000000016870 R09: 00007fff71600000
> [75143.971085] R10: 0000000000000001 R11: ffff880037b459a8 R12: 0000000000000000
> [75143.971085] R13: ffff8800bb51f6c0 R14: 0000000000000004 R15: 0000000000000000
> [75143.971085] FS: 0000000000000000(0000) GS:ffff8800bfcc0000(0000)
> knlGS:0000000000000000
> [75143.971085] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [75143.971085] CR2: 0000000000000030 CR3: 0000000001805000 CR4: 00000000000006e0
> [75143.971085] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [75143.971085] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [75143.971085] Process rm (pid: 3861, threadinfo ffff880037b44000,
> task ffff8800b91fed00)
> [75143.971085] Stack:
> [75143.971085] ffff8800b9712c00 0000000000000007 0000000000000000
> ffffffff8103aaad
> [75143.971085] 0000000000000009 ffff8800b91fed00 ffff8800b91ff218
> 0000000000000009
> [75143.971085] ffff8800b8ca0700 ffff8800b92c0880 0000000000000001
> ffffffff8103bfe4
> [75143.971085] Call Trace:
> [75143.971085] [] ? put_files_struct+0x67/0xbf
> [75143.971085] [] ? do_exit+0x2aa/0x7e1
> [75143.971085] [] ? kmsg_dump+0x53/0xef
> [75143.971085] [] ? oops_end+0x66/0xb7
> [75143.971085] [] ? oops_end+0xb2/0xb7
> [75143.971085] [] ? no_context+0x254/0x263
> [75143.971085] [] ? ceph_writepages_start+0xbb4/0xbee
> [75143.971085] [] ? do_page_fault+0x215/0x34c
> [75143.971085] [] ? __cap_is_valid+0x19/0x9a
> [75143.971085] [] ? ceph_encode_inode_release+0xed/0x2b2
> [75143.971085] [] ? update_curr+0xfb/0x130
> [75143.971085] [] ? __switch_to+0x20b/0x35f
> [75143.971085] [] ? update_curr+0xfb/0x130
> [75143.971085] [] ? page_fault+0x25/0x30
> [75143.971085] [] ? path_init+0x218/0x2cc
> [75143.971085] [] ? path_init+0x1fe/0x2cc
> [75143.971085] [] ? path_lookupat+0x2c/0x30b
> [75143.971085] [] ? try_to_wake_up+0x1a5/0x1a5
> [75143.971085] [] ? do_path_lookup+0x1e/0x9a
> [75143.971085] [] ? user_path_parent+0x3a/0x5f
> [75143.971085] [] ? virt_to_head_page+0x9/0x2c
> [75143.971085] [] ? do_unlinkat+0x1d/0x15e
> [75143.971085] [] ? vfs_readdir+0x91/0xa7
> [75143.971085] [] ? fsnotify_find_inode_mark+0x23/0x2f
> [75143.971085] [] ? filp_close+0x64/0x6c
> [75143.971085] [] ? system_call_fastpath+0x16/0x1b
> [75143.971085] Code: 55 48 89 f5 53 48 89 fb 48 8b 47 30 48 85 c0 75
> 11 48 c7 c7 6e d5 72 81 45 31 e4 e8 f0 fa 4c 00 eb 40 48 8b 47 20 48
> 85 c0 74 10<48> 8b 40 60 48 85 c0 74 07 ff d0 41 89 c4 eb 03 45 31 e4
> f6 43
> [75143.971085] RIP [] filp_close+0x2d/0x6c
> [75143.971085] RSP
> [75143.988721] ---[ end trace 18e2f523c5af9a39 ]---
> [75143.988826] Fixing recursive fault but reboot is needed!
> [75146.018276] ------------[ cut here ]------------
> [75146.018399] kernel BUG at mm/slub.c:3442!
> [75146.018498] invalid opcode: 0000 [#3] SMP
> [75146.018718] CPU 1
> [75146.018789] Modules linked in: cbc netconsole loop i5400_edac
> snd_pcm edac_core i5k_amb snd_timer snd soundcore evdev snd_page_alloc
> tpm_tis tpm processor rng_core thermal_sys pcspkr tpm_bios shpchp
> pci_hotplug button sd_mod crc_t10dif usbhid hid ide_cd_mod cdrom
> ata_generic uhci_hcd ata_piix libata piix ide_core ehci_hcd usbcore
> usb_common tg3 libphy mptsas mptscsih mptbase scsi_transport_sas
> scsi_mod [last unloaded: netconsole]
> [75146.021908]
> [75146.021999] Pid: 1137, comm: kworker/1:0 Tainted: G D
> 3.3.4 #1 HP ProLiant DL160 G5
> [75146.022236] RIP: 0010:[] []
> kfree+0x59/0xc2
> [75146.022236] RSP: 0018:ffff8800bbb35b90 EFLAGS: 00010246
> [75146.022236] RAX: 0100000000000400 RBX: ffff8800bfcdab70 RCX: ffff8800b9f762c8
> [75146.022236] RDX: ffff8800bfc4e550 RSI: 0000000000000000 RDI: ffffea0002ff3680
> [75146.022236] RBP: ffffffff815a2a50 R08: 0000000000000000 R09: ffffffff814f9620
> [75146.022236] R10: 000000000000000d R11: ffff8800b9d0d000 R12: ffffffff815a2a47
> [75146.022236] R13: ffff8800ba15d400 R14: ffff8800b9d0d271 R15: ffff8800b9d0d000
> [75146.022236] FS: 0000000000000000(0000) GS:ffff8800bfc40000(0000)
> knlGS:0000000000000000
> [75146.022236] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [75146.022236] CR2: ffffffffff600400 CR3: 0000000037b13000 CR4: 00000000000006e0
> [75146.022236] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [75146.022236] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [75146.022236] Process kworker/1:0 (pid: 1137, threadinfo
> ffff8800bbb34000, task ffff8800b90b0000)
> [75146.022236] Stack:
> [75146.022236] ffff8800b8fde500 ffffffff815a2a50 ffff8800b9f72800
> ffffffff815a2a47
> [75146.022236] ffff8800b8fde588 ffffffff81372b8e 0000001b00004040
> ffff8800b9f72a68
> [75146.022236] ffff8800b9f72800 ffffffff8137513d ffff8800ba15d400
> ffff8800b9f72a68
> [75146.022236] Call Trace:
> [75146.022236] [] ? ceph_msg_kfree+0x47/0x47
> [75146.022236] [] ? ceph_msg_kfree+0x3e/0x47
> [75146.022236] [] ? kref_put+0x34/0x3e
> [75146.022236] [] ? ceph_mdsc_release_request+0x2f/0x145
> [75146.022236] [] ? encode_caps_cb+0x2f9/0x2f9
> [75146.022236] [] ? kref_put+0x34/0x3e
> [75146.022236] [] ? dispatch+0xe05/0x132c
> [75146.022236] [] ? kernel_recvmsg+0x34/0x3f
> [75146.022236] [] ? crc32c+0x56/0x7c
> [75146.022236] [] ? ceph_tcp_recvmsg+0x43/0x4f
> [75146.022236] [] ? con_work+0x15ac/0x17a8
> [75146.022236] [] ? lock_timer_base+0x25/0x49
> [75146.022236] [] ? ceph_fault+0x2b4/0x2b4
> [75146.022236] [] ? process_one_work+0x1cd/0x2eb
> [75146.022236] [] ? worker_thread+0x12e/0x249
> [75146.022236] [] ? process_one_work+0x2eb/0x2eb
> [75146.022236] [] ? process_one_work+0x2eb/0x2eb
> [75146.022236] [] ? kthread+0x81/0x89
> [75146.022236] [] ? kernel_thread_helper+0x4/0x10
> [75146.022236] [] ? kthread_freezable_should_stop+0x53/0x53
> [75146.022236] [] ? gs_change+0x13/0x13
> [75146.022236] Code: 00 48 83 c5 10 48 83 7d 00 00 eb e6 48 83 fb 10
> 76 7d 48 89 df e8 ad de ff ff 48 89 c7 48 8b 00 84 c0 78 14 66 f7 07
> 00 c0 75 04<0f> 0b eb fe 5b 5d 41 5c e9 e8 03 fd ff 4c 8b 54 24 18 4c
> 8b 4f
> [75146.022236] RIP [] kfree+0x59/0xc2
> [75146.022236] RSP
> [75146.031675] ---[ end trace 18e2f523c5af9a3a ]---
> [75146.031809] BUG: unable to handle kernel paging request at fffffffffffffff8
> [75146.032058] IP: [] kthread_data+0x7/0xc
> [75146.032221] PGD 1807067 PUD 1808067 PMD 0
> [75146.032494] Oops: 0000 [#4] SMP
> [75146.032706] CPU 1
> [75146.032771] Modules linked in: cbc netconsole loop i5400_edac
> snd_pcm edac_core i5k_amb snd_timer snd soundcore evdev snd_page_alloc
> tpm_tis tpm processor rng_core thermal_sys pcspkr tpm_bios shpchp
> pci_hotplug button sd_mod crc_t10dif usbhid hid ide_cd_mod cdrom
> ata_generic uhci_hcd ata_piix libata piix ide_core ehci_hcd usbcore
> usb_common tg3 libphy mptsas mptscsih mptbase scsi_transport_sas
> scsi_mod [last unloaded: netconsole]
> [75146.035616]
> [75146.035616] Pid: 1137, comm: kworker/1:0 Tainted: G D
> 3.3.4 #1 HP ProLiant DL160 G5
> [75146.035616] RIP: 0010:[] []
> kthread_data+0x7/0xc
> [75146.035616] RSP: 0018:ffff8800bbb35900 EFLAGS: 00010002
> [75146.035616] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
> [75146.035616] RDX: ffffffff819a87b0 RSI: 0000000000000001 RDI: ffff8800b90b0000
> [75146.035616] RBP: ffff8800b90b0000 R08: 0000000000000400 R09: ffffffff81013c7c
> [75146.035616] R10: ffff8800b90b0000 R11: ffff8800b90b0518 R12: ffff8800b90b02f8
> [75146.035616] R13: ffff8800bbb359c8 R14: 0000000000000001 R15: 0000000000000001
> [75146.035616] FS: 0000000000000000(0000) GS:ffff8800bfc40000(0000)
> knlGS:0000000000000000
> [75146.035616] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [75146.035616] CR2: fffffffffffffff8 CR3: 0000000037b13000 CR4: 00000000000006e0
> [75146.035616] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [75146.035616] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [75146.035616] Process kworker/1:0 (pid: 1137, threadinfo
> ffff8800bbb34000, task ffff8800b90b0000)
> [75146.035616] Stack:
> [75146.035616] ffffffff8104e2a2 ffff8800bfc53340 ffffffff815cb1d1
> 0000000000000001
> [75146.035616] 0000000000000296 0000000000013340 ffff8800bbb35fd8
> 0000000000013340
> [75146.035616] ffff8800bbb35fd8 0000000000013340 ffff8800b90b0000
> 0000000000013340
> [75146.035616] Call Trace:
> [75146.035616] [] ? wq_worker_sleeping+0x8/0x82
> [75146.035616] [] ? __schedule+0x166/0x4fc
> [75146.035616] [] ? do_exit+0x7dd/0x7e1
> [75146.035616] [] ? printk+0x40/0x4c
> [75146.035616] [] ? oops_end+0x66/0xb7
> [75146.035616] [] ? oops_end+0xb2/0xb7
> [75146.035616] [] ? ceph_msg_kfree+0x3e/0x47
> [75146.035616] [] ? do_invalid_op+0x8b/0x95
> [75146.035616] [] ? kfree+0x59/0xc2
> [75146.035616] [] ? inet_recvmsg+0x64/0x75
> [75146.035616] [] ? ceph_msg_kfree+0x47/0x47
> [75146.035616] [] ? invalid_op+0x1b/0x20
> [75146.035616] [] ? ceph_msg_kfree+0x3e/0x47
> [75146.035616] [] ? ceph_msg_kfree+0x47/0x47
> [75146.035616] [] ? tcp_recvmsg+0x773/0x95e
> [75146.035616] [] ? kfree+0x59/0xc2
> [75146.035616] [] ? ceph_msg_kfree+0x47/0x47
> [75146.035616] [] ? ceph_msg_kfree+0x3e/0x47
> [75146.035616] [] ? kref_put+0x34/0x3e
> [75146.035616] [] ? ceph_mdsc_release_request+0x2f/0x145
> [75146.035616] [] ? encode_caps_cb+0x2f9/0x2f9
> [75146.035616] [] ? kref_put+0x34/0x3e
> [75146.035616] [] ? dispatch+0xe05/0x132c
> [75146.035616] [] ? kernel_recvmsg+0x34/0x3f
> [75146.035616] [] ? crc32c+0x56/0x7c
> [75146.035616] [] ? ceph_tcp_recvmsg+0x43/0x4f
> [75146.035616] [] ? con_work+0x15ac/0x17a8
> [75146.035616] [] ? lock_timer_base+0x25/0x49
> [75146.035616] [] ? ceph_fault+0x2b4/0x2b4
> [75146.035616] [] ? process_one_work+0x1cd/0x2eb
> [75146.035616] [] ? worker_thread+0x12e/0x249
> [75146.035616] [] ? process_one_work+0x2eb/0x2eb
> [75146.035616] [] ? process_one_work+0x2eb/0x2eb
> [75146.035616] [] ? kthread+0x81/0x89
> [75146.035616] [] ? kernel_thread_helper+0x4/0x10
> [75146.035616] [] ? kthread_freezable_should_stop+0x53/0x53
> [75146.035616] [] ? gs_change+0x13/0x13
> [75146.035616] Code: 41 5e 41 5f c3 41 bf ea ff ff ff eb 97 90 90 90
> 65 48 8b 04 25 c0 c6 00 00 48 8b 80 a0 02 00 00 8b 40 f0 c3 48 8b 87
> a0 02 00 00<48> 8b 40 f8 c3 48 3b 3d 51 5f 95 00 75 08 0f bf 87 6a 06
> 00 00
> [75146.035616] RIP [] kthread_data+0x7/0xc
> [75146.035616] RSP
> [75146.035616] CR2: fffffffffffffff8
> [75146.035616] ---[ end trace 18e2f523c5af9a3b ]---
> [75146.035616] Fixing recursive fault but reboot is needed!
> [75206.036002] INFO: rcu_sched detected stalls on CPUs/tasks: { 1}
> (detected by 3, t=15002 jiffies)
> [75206.036265] Pid: 0, comm: swapper/3 Tainted: G D 3.3.4 #1
> [75206.036371] Call Trace:
> [75206.036464] [] ? __rcu_pending+0x21a/0x336
> [75206.036635] [] ? tick_nohz_handler+0xcb/0xcb
> [75206.036740] [] ? rcu_check_callbacks+0xa7/0xe7
> [75206.036846] [] ? tick_nohz_handler+0xcb/0xcb
> [75206.036951] [] ? update_process_times+0x31/0x63
>
> ----------------------------------------------------
>
> Thanks a lot,
> Giorgos Kappes
>
> On Tue, May 8, 2012 at 10:18 PM, Tommi Virtanen wrote:
>> On Tue, May 8, 2012 at 8:43 AM, Giorgos Kappes wrote:
>>> When I run debootstrap to install a base Debian Squeeze system
>>> on a Ceph directory, the client's kernel crashes with the following
>>> message:
>>>
>>> I: Extracting zlib1g...
>>> W: Failure trying to run: chroot /mnt/debian mount -t proc proc /proc
>>> [ 759.776151] kernel tried to execute NX-protected page - exploit
>>> attempt? (uid: 0)
>>> [ 759.776169] BUG: unable to handle kernel paging request at ffffe8fffffe4ab0
>> ...
>>> [ 759.776438] [] ? __rcu_process_callbacks+0x1c7/0x2f8
>>> [ 759.776447] [] ? rcu_process_callbacks+0x2c/0x56
>>> [ 759.776457] [] ? __do_softirq+0xc4/0x1a0
>>> [ 759.776465] [] ? handle_percpu_irq+0x3d/0x54
>>> [ 759.776475] [] ? __xen_evtchn_do_upcall+0x1c7/0x205
>>> [ 759.776484] [] ? call_softirq+0x1c/0x30
>>> [ 759.776493] [] ? do_softirq+0x3f/0x79
>>> [ 759.776501] [] ? irq_exit+0x44/0xb5
>>> [ 759.776508] [] ? xen_evtchn_do_upcall+0x27/0x32
>>> [ 759.776516] [] ? xen_do_hypervisor_callback+0x1e/0x30
>> ...
>>> My simple cluster consists of 3 nodes in total. Each node is a Xen
>>> domU guest running Linux kernel 3.2.6 and Ceph 0.43. For
>>> reference, here is my configuration:
>> ...
>>> My Ceph kernel client is another Xen domU node running Linux
>>> kernel 3.2.11. I have also tried a native client, with the same result.
>>> Please note that this bug happens only on the client side.
>>> Your help would be greatly appreciated.
>>
>> Your backtrace includes Xen code -- can you reproduce this bug
>> with a mainline kernel, without Xen at all?
>>
>> Also, the error encountered is from the NX security subsystem. It
>> would be nice to know what would happen without NX.
>
>
>
> -----------------------------------------------------------
> Giorgos Kappes
> Website: http://www.cs.uoi.gr/~gkappes
> email: geokapp@gmail.com