* Re: [Drbd-dev] [DRBD-user] drbd 8.3.1 OOPS
  2009-05-19 10:30 [Drbd-dev] [DRBD-user] drbd 8.3.1 OOPS Mickael Marchand
@ 2009-05-19 10:27 ` Lars Ellenberg
  0 siblings, 0 replies; 2+ messages in thread
From: Lars Ellenberg @ 2009-05-19 10:27 UTC
  To: Mickael Marchand; +Cc: drbd-dev

On Tue, May 19, 2009 at 11:33:13AM +0200, Mickael Marchand wrote:
> Hi,
> 
> I have a dual-node Xen/drbd cluster that hit a problem last weekend,
> running 2.6.24-23-xen from Ubuntu with self-compiled drbd 8.3.1.
> 
> For some reason a SAS disk got kicked by its controller;
> drbd properly detected it:
> May 17 07:44:20 ifvm1 kernel: [9678143.435599] scsi 0:0:3:0: [sdd] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
> May 17 07:44:20 ifvm1 kernel: [9678143.435604] end_request: I/O error, dev sdd, sector 255302231
> May 17 07:44:20 ifvm1 kernel: [9678143.435609] drbd1: Method to ensure write ordering: flush
> May 17 07:44:20 ifvm1 kernel: [9678143.435622] drbd1: disk( UpToDate -> Failed )
> May 17 07:44:20 ifvm1 kernel: [9678143.435625] drbd1: Local IO failed.  Detaching...
> May 17 07:44:20 ifvm1 kernel: [9692200.309651] drbd1: disk( Failed -> Diskless )
> May 17 07:44:20 ifvm1 kernel: [9692200.309675] drbd1: Notified peer that my disk is broken.
> 
> That drbd was secondary on that host, and the primary was still running fine on
> the other node, so I left it untouched until Monday, when I removed the
> failing drive from the server.
> After adding a working drive to the server, I wanted to attach this drive to
> drbd, but this failed.
> 
> I am not sure which exact command I used at the time, probably "drbdadm
> attach r1", which gave:
> May 18 10:59:00 ifvm1 kernel: [9791064.037185] drbd1: drbd_nl_disk_conf: mdev->bc not NULL.
> 
> So I tried to down this drbd, and it oopsed:
> 
> May 18 10:59:37 ifvm1 kernel: [9791101.108093] drbd1: drbd_nl_disk_conf: mdev->bc not NULL.
> May 18 10:59:37 ifvm1 kernel: [9791101.108127] Unable to handle kernel paging request at 000000010000002c RIP:
> May 18 10:59:37 ifvm1 kernel: [9791101.108138]  [<ffffffff8029f130>] fput+0x0/0x20
> May 18 10:59:37 ifvm1 kernel: [9791101.108150] PGD 9b30067 PUD 0
> May 18 10:59:37 ifvm1 kernel: [9791101.108154] Oops: 0002 [1] SMP
> May 18 10:59:37 ifvm1 kernel: [9791101.108157] CPU 1
> May 18 10:59:37 ifvm1 kernel: [9791101.108159] Modules linked in: drbd af_packet xt_physdev ipt_LOG xt_state xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack bridge 8021q sbs dock sbshc ac battery container video output iptable_filter ip_tables x_tables cn coretemp ipmi_devintf ipmi_si ipmi_watchdog ipmi_poweroff ipmi_msghandler parport_pc lp parport loop ipv6 megaraid_sas psmouse serio_raw iTCO_wdt evdev dcdbas iTCO_vendor_support 8250_pnp button pcspkr i5000_edac edac_core shpchp pci_hotplug 8250 serial_core ext3 jbd mbcache sr_mod cdrom pata_acpi ata_piix sg sd_mod ata_generic libata bnx2 ehci_hcd uhci_hcd usbcore mptsas mptscsih mptbase scsi_transport_sas scsi_mod raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod dm_mirror dm_snapshot dm_mod thermal processor fan fuse
> May 18 10:59:37 ifvm1 kernel: [9791101.108249] Pid: 6693, comm: cqueue/1 Not tainted 2.6.24-23-xen #1
> May 18 10:59:37 ifvm1 kernel: [9791101.108253] RIP: e030:[<ffffffff8029f130>]  [<ffffffff8029f130>] fput+0x0/0x20
> May 18 10:59:37 ifvm1 kernel: [9791101.108258] RSP: e02b:ffff88001d7c9dc8  EFLAGS: 00010202
> May 18 10:59:37 ifvm1 kernel: [9791101.108261] RAX: 0000000000000041 RBX: 0000000000000005 RCX: 0000000000000001
> May 18 10:59:37 ifvm1 kernel: [9791101.108264] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000100000004
> May 18 10:59:37 ifvm1 kernel: [9791101.108268] RBP: ffff88001f672800 R08: 000015d2cd1350db R09: 0000000000000000
> May 18 10:59:37 ifvm1 kernel: [9791101.108271] R10: ffff880001ce6fe0 R11: ffffffff80217eb0 R12: ffff880011a93000
> May 18 10:59:37 ifvm1 kernel: [9791101.108274] R13: 000000000000007c R14: ffff880011a93000 R15: ffff88000aefb354
> May 18 10:59:37 ifvm1 kernel: [9791101.108280] FS: 00007fc3f762c6e0(0000) GS:ffffffff805c7080(0000) knlGS:0000000000000000
> May 18 10:59:37 ifvm1 kernel: [9791101.108283] CS:  e033 DS: 0000 ES: 0000
> May 18 10:59:37 ifvm1 kernel: [9791101.108287] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> May 18 10:59:37 ifvm1 kernel: [9791101.108290] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> May 18 10:59:38 ifvm1 kernel: [9791101.108293] Process cqueue/1 (pid: 6693, threadinfo ffff88001d7c8000, task ffff8800016cc800)
> May 18 10:59:38 ifvm1 kernel: [9791101.108297] Stack:  ffffffff88419168 ffff88001d7c9e60 0000000000000000 ffffffff8061cf80
> May 18 10:59:38 ifvm1 kernel: [9791101.108303]  ffffffff8061cf80 ffffffff8061c0c0 ffffffff8061cf80 0000000000000000
> May 18 10:59:38 ifvm1 kernel: [9791101.108309]  000000011f672800 00000000000000d0 0000000000000030 ffff88001c213410
> May 18 10:59:38 ifvm1 kernel: [9791101.108313] Call Trace:
> May 18 10:59:38 ifvm1 kernel: [9791101.108327]  [<ffffffff88419168>] :drbd:drbd_nl_disk_conf+0xb8/0xf30
> May 18 10:59:38 ifvm1 kernel: [9791101.108346]  [<ffffffff88418bcc>] :drbd:drbd_connector_callback+0x11c/0x210
> May 18 10:59:38 ifvm1 kernel: [9791101.108354] [cn:cn_queue_wrapper+0x0/0x30] :cn:cn_queue_wrapper+0x0/0x30
> May 18 10:59:38 ifvm1 kernel: [9791101.108360] [cn:cn_queue_wrapper+0xf/0x30] :cn:cn_queue_wrapper+0xf/0x30
> May 18 10:59:38 ifvm1 kernel: [9791101.108367] [run_workqueue+0xb2/0x190] run_workqueue+0xb2/0x190
> May 18 10:59:38 ifvm1 kernel: [9791101.108373] [worker_thread+0x0/0x110] worker_thread+0x0/0x110
> May 18 10:59:38 ifvm1 kernel: [9791101.108378] [worker_thread+0xa3/0x110] worker_thread+0xa3/0x110
> May 18 10:59:38 ifvm1 kernel: [9791101.108384]  [<ffffffff8024cc80>] autoremove_wake_function+0x0/0x30
> May 18 10:59:38 ifvm1 kernel: [9791101.108390] [worker_thread+0x0/0x110] worker_thread+0x0/0x110
> May 18 10:59:38 ifvm1 kernel: [9791101.108395] [worker_thread+0x0/0x110] worker_thread+0x0/0x110
> May 18 10:59:38 ifvm1 kernel: [9791101.108399]  [kthread+0x4b/0x80] kthread+0x4b/0x80
> May 18 10:59:38 ifvm1 kernel: [9791101.108405]  [child_rip+0xa/0x12] child_rip+0xa/0x12
> May 18 10:59:38 ifvm1 kernel: [9791101.108413] [xen_send_IPI_mask+0x0/0x110] xen_send_IPI_mask+0x0/0x110
> May 18 10:59:38 ifvm1 kernel: [9791101.108420]  [kthread+0x0/0x80] kthread+0x0/0x80
> May 18 10:59:38 ifvm1 kernel: [9791101.108424]  [child_rip+0x0/0x12] child_rip+0x0/0x12
> May 18 10:59:38 ifvm1 kernel: [9791101.108429]
> May 18 10:59:38 ifvm1 kernel: [9791101.108430]
> May 18 10:59:38 ifvm1 kernel: [9791101.108431] Code: f0 ff 4f 28 0f 94 c0 84 c0 75 05 f3 c3 0f 1f 00 e9 cb fb ff
> May 18 10:59:38 ifvm1 kernel: [9791101.108445] RIP  [<ffffffff8029f130>] fput+0x0/0x20
> May 18 10:59:38 ifvm1 kernel: [9791101.108449]  RSP <ffff88001d7c9dc8>
> May 18 10:59:38 ifvm1 kernel: [9791101.108451] CR2: 000000010000002c
> May 18 10:59:38 ifvm1 kernel: [9791101.109019] ---[ end trace e8e64f3da06e3ef3 ]---
> 
> After that, drbd was dead. The other drbds were still running (their
> kernel threads were running OK), but I could not change any
> configuration with drbdadm; I had to forcibly reboot this node to get it
> back into working order.

Thanks for the report.
I think this has been fixed in current git already,
though I'm not sure it is the exact same thing.
We now do more failure injection tests
to catch more of these things.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.


* [Drbd-dev] [DRBD-user] drbd 8.3.1 OOPS
@ 2009-05-19 10:30 Mickael Marchand
  2009-05-19 10:27 ` Lars Ellenberg
  0 siblings, 1 reply; 2+ messages in thread
From: Mickael Marchand @ 2009-05-19 10:30 UTC
  To: drbd-dev

Hi,

I have a dual-node Xen/drbd cluster that hit a problem last weekend,
running 2.6.24-23-xen from Ubuntu with self-compiled drbd 8.3.1.

For some reason a SAS disk got kicked by its controller;
drbd properly detected it:
May 17 07:44:20 ifvm1 kernel: [9678143.435599] scsi 0:0:3:0: [sdd] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
May 17 07:44:20 ifvm1 kernel: [9678143.435604] end_request: I/O error, dev sdd, sector 255302231
May 17 07:44:20 ifvm1 kernel: [9678143.435609] drbd1: Method to ensure write ordering: flush
May 17 07:44:20 ifvm1 kernel: [9678143.435622] drbd1: disk( UpToDate -> Failed )
May 17 07:44:20 ifvm1 kernel: [9678143.435625] drbd1: Local IO failed.  Detaching...
May 17 07:44:20 ifvm1 kernel: [9692200.309651] drbd1: disk( Failed -> Diskless )
May 17 07:44:20 ifvm1 kernel: [9692200.309675] drbd1: Notified peer that my disk is broken.
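
The disk state changes in the log above can be pulled out mechanically. The following is only a minimal sketch: the regular expression and the function name `state_transitions` are illustrative assumptions, written against the three log lines copied from this message, not against any documented DRBD log format.

```python
import re

# Matches DRBD state-change messages such as
#   "[9678143.435622] drbd1: disk( UpToDate -> Failed )"
# The pattern is an assumption derived from the log lines in this message.
STATE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<dev>drbd\d+): "
    r"(?P<field>\w+)\(\s*(?P<old>\w+)\s*->\s*(?P<new>\w+)\s*\)"
)

def state_transitions(lines):
    """Return (timestamp, device, field, old_state, new_state) tuples."""
    found = []
    for line in lines:
        match = STATE_RE.search(line)
        if match:
            found.append((match["ts"], match["dev"], match["field"],
                          match["old"], match["new"]))
    return found

# Log lines copied verbatim from the report.
LOG = [
    "May 17 07:44:20 ifvm1 kernel: [9678143.435622] drbd1: disk( UpToDate -> Failed )",
    "May 17 07:44:20 ifvm1 kernel: [9678143.435625] drbd1: Local IO failed.  Detaching...",
    "May 17 07:44:20 ifvm1 kernel: [9692200.309651] drbd1: disk( Failed -> Diskless )",
]

for transition in state_transitions(LOG):
    print(transition)
```

Timestamps are kept as strings so that nothing is lost to floating-point rounding when comparing entries.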

That drbd was secondary on that host, and the primary was still running fine on
the other node, so I left it untouched until Monday, when I removed the
failing drive from the server.
After adding a working drive to the server, I wanted to attach this drive to
drbd, but this failed.

I am not sure which exact command I used at the time, probably "drbdadm
attach r1", which gave:
May 18 10:59:00 ifvm1 kernel: [9791064.037185] drbd1: drbd_nl_disk_conf: mdev->bc not NULL.

So I tried to down this drbd, and it oopsed:

May 18 10:59:37 ifvm1 kernel: [9791101.108093] drbd1: drbd_nl_disk_conf: mdev->bc not NULL.
May 18 10:59:37 ifvm1 kernel: [9791101.108127] Unable to handle kernel paging request at 000000010000002c RIP:
May 18 10:59:37 ifvm1 kernel: [9791101.108138]  [<ffffffff8029f130>] fput+0x0/0x20
May 18 10:59:37 ifvm1 kernel: [9791101.108150] PGD 9b30067 PUD 0
May 18 10:59:37 ifvm1 kernel: [9791101.108154] Oops: 0002 [1] SMP
May 18 10:59:37 ifvm1 kernel: [9791101.108157] CPU 1
May 18 10:59:37 ifvm1 kernel: [9791101.108159] Modules linked in: drbd af_packet xt_physdev ipt_LOG xt_state xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack bridge 8021q sbs dock sbshc ac battery container video output iptable_filter ip_tables x_tables cn coretemp ipmi_devintf ipmi_si ipmi_watchdog ipmi_poweroff ipmi_msghandler parport_pc lp parport loop ipv6 megaraid_sas psmouse serio_raw iTCO_wdt evdev dcdbas iTCO_vendor_support 8250_pnp button pcspkr i5000_edac edac_core shpchp pci_hotplug 8250 serial_core ext3 jbd mbcache sr_mod cdrom pata_acpi ata_piix sg sd_mod ata_generic libata bnx2 ehci_hcd uhci_hcd usbcore mptsas mptscsih mptbase scsi_transport_sas scsi_mod raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod dm_mirror dm_snapshot dm_mod thermal processor fan fuse
May 18 10:59:37 ifvm1 kernel: [9791101.108249] Pid: 6693, comm: cqueue/1 Not tainted 2.6.24-23-xen #1
May 18 10:59:37 ifvm1 kernel: [9791101.108253] RIP: e030:[<ffffffff8029f130>]  [<ffffffff8029f130>] fput+0x0/0x20
May 18 10:59:37 ifvm1 kernel: [9791101.108258] RSP: e02b:ffff88001d7c9dc8  EFLAGS: 00010202
May 18 10:59:37 ifvm1 kernel: [9791101.108261] RAX: 0000000000000041 RBX: 0000000000000005 RCX: 0000000000000001
May 18 10:59:37 ifvm1 kernel: [9791101.108264] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000100000004
May 18 10:59:37 ifvm1 kernel: [9791101.108268] RBP: ffff88001f672800 R08: 000015d2cd1350db R09: 0000000000000000
May 18 10:59:37 ifvm1 kernel: [9791101.108271] R10: ffff880001ce6fe0 R11: ffffffff80217eb0 R12: ffff880011a93000
May 18 10:59:37 ifvm1 kernel: [9791101.108274] R13: 000000000000007c R14: ffff880011a93000 R15: ffff88000aefb354
May 18 10:59:37 ifvm1 kernel: [9791101.108280] FS: 00007fc3f762c6e0(0000) GS:ffffffff805c7080(0000) knlGS:0000000000000000
May 18 10:59:37 ifvm1 kernel: [9791101.108283] CS:  e033 DS: 0000 ES: 0000
May 18 10:59:37 ifvm1 kernel: [9791101.108287] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 18 10:59:37 ifvm1 kernel: [9791101.108290] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 18 10:59:38 ifvm1 kernel: [9791101.108293] Process cqueue/1 (pid: 6693, threadinfo ffff88001d7c8000, task ffff8800016cc800)
May 18 10:59:38 ifvm1 kernel: [9791101.108297] Stack:  ffffffff88419168 ffff88001d7c9e60 0000000000000000 ffffffff8061cf80
May 18 10:59:38 ifvm1 kernel: [9791101.108303]  ffffffff8061cf80 ffffffff8061c0c0 ffffffff8061cf80 0000000000000000
May 18 10:59:38 ifvm1 kernel: [9791101.108309]  000000011f672800 00000000000000d0 0000000000000030 ffff88001c213410
May 18 10:59:38 ifvm1 kernel: [9791101.108313] Call Trace:
May 18 10:59:38 ifvm1 kernel: [9791101.108327]  [<ffffffff88419168>] :drbd:drbd_nl_disk_conf+0xb8/0xf30
May 18 10:59:38 ifvm1 kernel: [9791101.108346]  [<ffffffff88418bcc>] :drbd:drbd_connector_callback+0x11c/0x210
May 18 10:59:38 ifvm1 kernel: [9791101.108354] [cn:cn_queue_wrapper+0x0/0x30] :cn:cn_queue_wrapper+0x0/0x30
May 18 10:59:38 ifvm1 kernel: [9791101.108360] [cn:cn_queue_wrapper+0xf/0x30] :cn:cn_queue_wrapper+0xf/0x30
May 18 10:59:38 ifvm1 kernel: [9791101.108367] [run_workqueue+0xb2/0x190] run_workqueue+0xb2/0x190
May 18 10:59:38 ifvm1 kernel: [9791101.108373] [worker_thread+0x0/0x110] worker_thread+0x0/0x110
May 18 10:59:38 ifvm1 kernel: [9791101.108378] [worker_thread+0xa3/0x110] worker_thread+0xa3/0x110
May 18 10:59:38 ifvm1 kernel: [9791101.108384]  [<ffffffff8024cc80>] autoremove_wake_function+0x0/0x30
May 18 10:59:38 ifvm1 kernel: [9791101.108390] [worker_thread+0x0/0x110] worker_thread+0x0/0x110
May 18 10:59:38 ifvm1 kernel: [9791101.108395] [worker_thread+0x0/0x110] worker_thread+0x0/0x110
May 18 10:59:38 ifvm1 kernel: [9791101.108399]  [kthread+0x4b/0x80] kthread+0x4b/0x80
May 18 10:59:38 ifvm1 kernel: [9791101.108405]  [child_rip+0xa/0x12] child_rip+0xa/0x12
May 18 10:59:38 ifvm1 kernel: [9791101.108413] [xen_send_IPI_mask+0x0/0x110] xen_send_IPI_mask+0x0/0x110
May 18 10:59:38 ifvm1 kernel: [9791101.108420]  [kthread+0x0/0x80] kthread+0x0/0x80
May 18 10:59:38 ifvm1 kernel: [9791101.108424]  [child_rip+0x0/0x12] child_rip+0x0/0x12
May 18 10:59:38 ifvm1 kernel: [9791101.108429]
May 18 10:59:38 ifvm1 kernel: [9791101.108430]
May 18 10:59:38 ifvm1 kernel: [9791101.108431] Code: f0 ff 4f 28 0f 94 c0 84 c0 75 05 f3 c3 0f 1f 00 e9 cb fb ff
May 18 10:59:38 ifvm1 kernel: [9791101.108445] RIP  [<ffffffff8029f130>] fput+0x0/0x20
May 18 10:59:38 ifvm1 kernel: [9791101.108449]  RSP <ffff88001d7c9dc8>
May 18 10:59:38 ifvm1 kernel: [9791101.108451] CR2: 000000010000002c
May 18 10:59:38 ifvm1 kernel: [9791101.109019] ---[ end trace e8e64f3da06e3ef3 ]---

After that, drbd was dead. The other drbds were still running (their
kernel threads were running OK), but I could not change any
configuration with drbdadm; I had to forcibly reboot this node to get it
back into working order.
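
The register dump in the oops can be cross-checked with a little arithmetic. This is a sketch under stated assumptions, not a diagnosis. It assumes the first bytes of the "Code:" line are the instruction at the faulting RIP (fput+0x0), and the helper names `decode_lock_dec_disp8` and `describe_fault` are made up for illustration. Under that assumption, `f0 ff 4f 28` decodes as a locked decrement of the 32-bit value at 0x28(%rdi), and RDI + 0x28 equals the faulting address in CR2. RDI (0x100000004) does not look like the other kernel pointers in the dump (ffff88...), which would be consistent with fput() having been handed an invalid pointer, but the trace alone does not prove where that value came from.

```python
# Values copied from the oops in this message.
CR2 = 0x000000010000002C          # faulting address
RDI = 0x0000000100000004          # first argument register at the fault
ERROR_CODE = 0x0002               # from "Oops: 0002"
CODE = bytes.fromhex("f0 ff 4f 28 0f 94 c0")  # start of the "Code:" line

REG_NAMES = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def decode_lock_dec_disp8(code):
    """Decode 'lock dec disp8(%reg)' (F0 FF /1 with an 8-bit displacement).

    Returns (base register name, displacement). Handles only this one
    encoding: no REX prefix, no SIB byte.
    """
    if code[0] != 0xF0 or code[1] != 0xFF:
        raise ValueError("not a LOCK-prefixed FF-group instruction")
    modrm = code[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or reg != 1 or rm == 0b100:
        raise ValueError("not 'dec disp8(%reg)'")
    disp = int.from_bytes(code[3:4], "little", signed=True)
    return REG_NAMES[rm], disp

def describe_fault(error_code):
    """Interpret the low three bits of an x86 page-fault error code."""
    return {
        "page_present": bool(error_code & 1),
        "write_access": bool(error_code & 2),
        "user_mode": bool(error_code & 4),
    }

base, disp = decode_lock_dec_disp8(CODE)
print(f"lock dec {disp:#x}(%{base})")
print(f"RDI + disp = {RDI + disp:#x}, CR2 = {CR2:#x}")
print(describe_fault(ERROR_CODE))
```

Error code 0002 has only the write bit set, so the fault was a kernel-mode write to a page that was not present, which matches a decrement through a bad pointer.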

Cheers,
Mik


