linux-raid.vger.kernel.org archive mirror
* Kernel OOPs after RAID10 assemble
       [not found] <C8B40B93E2024230AB92E0B360028997@MoshePC>
@ 2011-09-11 13:07 ` Moshe Melnikov
  2011-09-12  4:05   ` NeilBrown
  0 siblings, 1 reply; 4+ messages in thread
From: Moshe Melnikov @ 2011-09-11 13:07 UTC (permalink / raw)
  To: linux-raid

Hi,

I created a RAID10 array from 4 disks:
"mdadm --create /dev/md1 --raid-devices=4 --chunk=64 --level=raid10 --layout=n2 --bitmap=internal --name=1 --run --auto=md --metadata=1.2 --homehost=zadara_vc --verbose /dev/dm-0 /dev/dm-1 /dev/dm-2 /dev/dm-3"
Then I failed all 4 disks by injecting I/O errors. MD marked all of them
except /dev/dm-2 as "faulty".
I removed 3 disks and re-added them:
"mdadm /dev/md1 --remove /dev/dm-0 /dev/dm-1 /dev/dm-2"
"mdadm /dev/md1 --re-add /dev/dm-0 /dev/dm-1 /dev/dm-2"
The 3 disks were still marked as missing.
I stopped the array with "mdadm --stop /dev/md1" and assembled it again:
"mdadm --assemble /dev/md1 --name=1 --config=none --homehost=zadara_vc --run --auto=md --verbose /dev/dm-0 /dev/dm-1 /dev/dm-2 /dev/dm-3"
After that I got a kernel oops. The syslog is below.

Sep 11 14:31:42 vc-0-0-6-01 kernel: [ 4024.417773] Buffer I/O error on 
device md1, logical block 0
Sep 11 14:32:29 vc-0-0-6-01 mdadm[884]: DeviceDisappeared event detected on 
md device /dev/md1
Sep 11 14:32:29 vc-0-0-6-01 kernel: [ 4071.613012] md1: detected capacity 
change from 2147352576 to 0
Sep 11 14:32:29 vc-0-0-6-01 kernel: [ 4071.613019] md: md1 stopped.
Sep 11 14:32:29 vc-0-0-6-01 kernel: [ 4071.613027] md: unbind<dm-3>
Sep 11 14:32:29 vc-0-0-6-01 kernel: [ 4071.613032] md: export_rdev(dm-3)
Sep 11 14:32:29 vc-0-0-6-01 kernel: [ 4071.613038] md: unbind<dm-1>
Sep 11 14:32:29 vc-0-0-6-01 kernel: [ 4071.613041] md: export_rdev(dm-1)
Sep 11 14:32:29 vc-0-0-6-01 kernel: [ 4071.613046] md: unbind<dm-0>
Sep 11 14:32:29 vc-0-0-6-01 kernel: [ 4071.613049] md: export_rdev(dm-0)
Sep 11 14:32:29 vc-0-0-6-01 kernel: [ 4071.613053] md: unbind<dm-2>
Sep 11 14:32:29 vc-0-0-6-01 kernel: [ 4071.613056] md: export_rdev(dm-2)
Sep 11 14:33:07 vc-0-0-6-01 kernel: [ 4109.583968] md: md1 stopped.
Sep 11 14:33:07 vc-0-0-6-01 kernel: [ 4109.591469] md: bind<dm-0>
Sep 11 14:33:07 vc-0-0-6-01 kernel: [ 4109.591822] md: bind<dm-1>
Sep 11 14:33:07 vc-0-0-6-01 kernel: [ 4109.592109] md: bind<dm-3>
Sep 11 14:33:07 vc-0-0-6-01 kernel: [ 4109.592355] md: bind<dm-2>
Sep 11 14:33:07 vc-0-0-6-01 kernel: [ 4109.600692] md/raid10:md1: not enough 
operational mirrors.
Sep 11 14:33:07 vc-0-0-6-01 kernel: [ 4109.601459] md: pers->run() failed 
...
Sep 11 14:34:05 vc-0-0-6-01 kernel: [ 4167.452226] md: md1 stopped.
Sep 11 14:34:05 vc-0-0-6-01 kernel: [ 4167.452235] md: unbind<dm-2>
Sep 11 14:34:05 vc-0-0-6-01 kernel: [ 4167.452242] md: export_rdev(dm-2)
Sep 11 14:34:05 vc-0-0-6-01 kernel: [ 4167.452274] md: unbind<dm-3>
Sep 11 14:34:05 vc-0-0-6-01 kernel: [ 4167.452278] md: export_rdev(dm-3)
Sep 11 14:34:05 vc-0-0-6-01 kernel: [ 4167.452297] md: unbind<dm-1>
Sep 11 14:34:05 vc-0-0-6-01 kernel: [ 4167.452301] md: export_rdev(dm-1)
Sep 11 14:34:05 vc-0-0-6-01 kernel: [ 4167.452319] md: unbind<dm-0>
Sep 11 14:34:05 vc-0-0-6-01 kernel: [ 4167.452323] md: export_rdev(dm-0)
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.073655] md: md1 stopped.
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.081092] md: bind<dm-0>
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.081412] md: bind<dm-1>
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.081739] md: bind<dm-3>
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.081991] md: bind<dm-2>
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.090382] md/raid10:md1: not enough 
operational mirrors.
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.091194] md: pers->run() failed 
...
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.276215] BUG: unable to handle 
kernel NULL pointer dereference at           (null)
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.276982] IP: [<          (null)>] 
(null)
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.277728] PGD b7433067 PUD b75e2067 
PMD 0
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.278464] Oops: 0010 [#1] SMP
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.279202] last sysfs file: 
/sys/module/raid10/initstate
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.279966] CPU 0
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.279987] Modules linked in: 
dm_iostat iscsi_scst scst_vdisk libcrc32c scst ppdev ib_iser rdma_cm ib_cm 
iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi parport_pc nfsd psmouse exportfs nfs lockd fscache 
nfs_acl serio_raw auth_rpcgss sunrpc i2c_piix4 lp parport floppy raid10 
raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq 
async_tx raid1 raid0 multipath linear
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285078]
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629] Pid: 4576, comm: md_stat 
Not tainted 2.6.38-8-server #42-Ubuntu Bochs Bochs
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629] RIP: 
0010:[<0000000000000000>]  [<          (null)>]           (null)
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629] RSP: 
0018:ffff8800b630fd00  EFLAGS: 00010096
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629] RAX: ffff880037383de8 
RBX: ffff8800b8e1f8e8 RCX: 0000000000000000
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629] RDX: 0000000000000000 
RSI: 0000000000000003 RDI: ffff880037383de8
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629] RBP: ffff8800b630fd48 
R08: 0000000000000000 R09: 0000000000000000
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629] R10: 0000000000000004 
R11: 0000000000000000 R12: 0000000000000000
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629] R13: ffff8800b7b7b298 
R14: 0000000000000000 R15: 0000000000000000
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629] FS: 
00007f8af77ef720(0000) GS:ffff8800bfc00000(0000) knlGS:0000000000000000
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629] CS:  0010 DS: 0000 ES: 
0000 CR0: 0000000080050033
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629] CR2: 0000000000000000 
CR3: 00000000b75cc000 CR4: 00000000000006f0
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629] DR0: 0000000000000000 
DR1: 0000000000000000 DR2: 0000000000000000
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629] DR3: 0000000000000000 
DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629] Process md_stat (pid: 
4576, threadinfo ffff8800b630e000, task ffff8800b55e44a0)
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629] Stack:
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629]  ffffffff8104bb39 
ffffea000280ee88 0000000300000001 ffff8800b630fd28
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629]  ffff8800b7b7b290 
0000000000000282 0000000000000003 0000000000000001
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629]  0000000000000000 
ffff8800b630fd88 ffffffff8104e4b8 0000000200000001
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629] Call Trace:
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629]  [<ffffffff8104bb39>] ? 
__wake_up_common+0x59/0x90
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629]  [<ffffffff8104e4b8>] 
__wake_up+0x48/0x70
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629]  [<ffffffff81489478>] 
md_wakeup_thread+0x28/0x30
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629]  [<ffffffff8148a96f>] 
mddev_unlock+0x7f/0xd0
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629]  [<ffffffff81495068>] 
md_ioctl+0x2b8/0x720
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629]  [<ffffffff8113135d>] ? 
handle_mm_fault+0x16d/0x250
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629]  [<ffffffff812c8cb0>] 
blkdev_ioctl+0x230/0x720
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629]  [<ffffffff81198261>] 
block_ioctl+0x41/0x50
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629]  [<ffffffff8117680f>] 
do_vfs_ioctl+0x8f/0x320
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629]  [<ffffffff8116fd85>] ? 
putname+0x35/0x50
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629]  [<ffffffff81176b31>] 
sys_ioctl+0x91/0xa0
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629]  [<ffffffff8100bfc2>] 
system_call_fastpath+0x16/0x1b
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629] Code:  Bad RIP value.
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629] RIP  [<          (null)>] 
(null)
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629]  RSP <ffff8800b630fd00>
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629] CR2: 0000000000000000
Sep 11 14:34:14 vc-0-0-6-01 kernel: [ 4176.285629] ---[ end trace 
66d7ffb11044dd44 ]---

Thanks,
Moshe Melnikov 

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Kernel OOPs after RAID10 assemble
  2011-09-11 13:07 ` Kernel OOPs after RAID10 assemble Moshe Melnikov
@ 2011-09-12  4:05   ` NeilBrown
  2011-09-12  5:33     ` Moshe Melnikov
  0 siblings, 1 reply; 4+ messages in thread
From: NeilBrown @ 2011-09-12  4:05 UTC (permalink / raw)
  To: Moshe Melnikov; +Cc: linux-raid

On Sun, 11 Sep 2011 16:07:55 +0300 "Moshe Melnikov" <moshe@zadarastorage.com>
wrote:

> Hi,
> 
> I created a RAID10 array from 4 disks:
> "mdadm --create /dev/md1 --raid-devices=4 --chunk=64 --level=raid10 --layout=n2 --bitmap=internal --name=1 --run --auto=md --metadata=1.2 --homehost=zadara_vc --verbose /dev/dm-0 /dev/dm-1 /dev/dm-2 /dev/dm-3"
> Then I failed all 4 disks by injecting I/O errors. MD marked all of them
> except /dev/dm-2 as "faulty".
> I removed 3 disks and re-added them:
> "mdadm /dev/md1 --remove /dev/dm-0 /dev/dm-1 /dev/dm-2"
> "mdadm /dev/md1 --re-add /dev/dm-0 /dev/dm-1 /dev/dm-2"
> The 3 disks were still marked as missing.
> I stopped the array with "mdadm --stop /dev/md1" and assembled it again:
> "mdadm --assemble /dev/md1 --name=1 --config=none --homehost=zadara_vc --run --auto=md --verbose /dev/dm-0 /dev/dm-1 /dev/dm-2 /dev/dm-3"
> After that I got a kernel oops. The syslog is below.

Thanks for the report.

Pity your mailer wrapped all the long lines, but I'm getting used to that :-(

Is this reproducible?  Would you be able to test a patch?

I think it would be almost enough to do

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 747d061..ec35b64 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2413,12 +2413,13 @@ out:
 static int stop(mddev_t *mddev)
 {
 	conf_t *conf = mddev->private;
+	mdk_thread_t *th = mddev->thread;
 
 	raise_barrier(conf, 0);
 	lower_barrier(conf);
 
-	md_unregister_thread(mddev->thread);
 	mddev->thread = NULL;
+	md_unregister_thread(th);
 	blk_sync_queue(mddev->queue); /* the unplug fn references 'conf'*/
 	if (conf->r10bio_pool)
 		mempool_destroy(conf->r10bio_pool);

though it really needs some locking around the calls to md_wakeup_thread too,
but there is currently no lock that would be easy to use, so I would have to
add a lock and export it.  In particular, the call to md_wakeup_thread in
mddev_unlock is racing with this, I think.

If it happens reliably with your current kernel, and you can test the above
patch and it happens significantly less, that would be useful to know.
However, I'm fairly sure this is the problem, so I will create a proper fix
for mainline.

Thanks,
NeilBrown




^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: Kernel OOPs after RAID10 assemble
  2011-09-12  4:05   ` NeilBrown
@ 2011-09-12  5:33     ` Moshe Melnikov
  2011-09-21  5:32       ` NeilBrown
  0 siblings, 1 reply; 4+ messages in thread
From: Moshe Melnikov @ 2011-09-12  5:33 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

I can reproduce it very easily.

I don't know how to apply this patch. I have very limited knowledge of the
Linux kernel.

Thanks,
Moshe

-----Original Message----- 
From: NeilBrown
Sent: Monday, September 12, 2011 7:05 AM
To: Moshe Melnikov
Cc: linux-raid@vger.kernel.org
Subject: Re: Kernel OOPs after RAID10 assemble

On Sun, 11 Sep 2011 16:07:55 +0300 "Moshe Melnikov" 
<moshe@zadarastorage.com>
wrote:

> Hi,
>
> I created a RAID10 array from 4 disks:
> "mdadm --create /dev/md1 --raid-devices=4 --chunk=64 --level=raid10 --layout=n2 --bitmap=internal --name=1 --run --auto=md --metadata=1.2 --homehost=zadara_vc --verbose /dev/dm-0 /dev/dm-1 /dev/dm-2 /dev/dm-3"
> Then I failed all 4 disks by injecting I/O errors. MD marked all of them
> except /dev/dm-2 as "faulty".
> I removed 3 disks and re-added them:
> "mdadm /dev/md1 --remove /dev/dm-0 /dev/dm-1 /dev/dm-2"
> "mdadm /dev/md1 --re-add /dev/dm-0 /dev/dm-1 /dev/dm-2"
> The 3 disks were still marked as missing.
> I stopped the array with "mdadm --stop /dev/md1" and assembled it again:
> "mdadm --assemble /dev/md1 --name=1 --config=none --homehost=zadara_vc --run --auto=md --verbose /dev/dm-0 /dev/dm-1 /dev/dm-2 /dev/dm-3"
> After that I got a kernel oops. The syslog is below.

Thanks for the report.

Pity your mailer wrapped all the long lines, but I'm getting used to that :-(

Is this reproducible?  Would you be able to test a patch?

I think it would be almost enough to do

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 747d061..ec35b64 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2413,12 +2413,13 @@ out:
 static int stop(mddev_t *mddev)
 {
 	conf_t *conf = mddev->private;
+	mdk_thread_t *th = mddev->thread;

 	raise_barrier(conf, 0);
 	lower_barrier(conf);

-	md_unregister_thread(mddev->thread);
 	mddev->thread = NULL;
+	md_unregister_thread(th);
 	blk_sync_queue(mddev->queue); /* the unplug fn references 'conf'*/
 	if (conf->r10bio_pool)
 		mempool_destroy(conf->r10bio_pool);

though it really needs some locking around the calls to md_wakeup_thread too,
but there is currently no lock that would be easy to use, so I would have to
add a lock and export it.  In particular, the call to md_wakeup_thread in
mddev_unlock is racing with this, I think.

If it happens reliably with your current kernel, and you can test the above
patch and it happens significantly less, that would be useful to know.
However, I'm fairly sure this is the problem, so I will create a proper fix
for mainline.

Thanks,
NeilBrown




^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: Kernel OOPs after RAID10 assemble
  2011-09-12  5:33     ` Moshe Melnikov
@ 2011-09-21  5:32       ` NeilBrown
  0 siblings, 0 replies; 4+ messages in thread
From: NeilBrown @ 2011-09-21  5:32 UTC (permalink / raw)
  To: Moshe Melnikov; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 6826 bytes --]

On Mon, 12 Sep 2011 08:33:31 +0300 "Moshe Melnikov" <moshe@zadarastorage.com>
wrote:

> I can reproduce it very easily.
> 
> I don't know how to apply this patch. I have very limited knowledge of the
> Linux kernel.

If you can't build a kernel, there isn't much I can do to help.
If you are using a distro kernel, maybe report the bug and the fix to the
distro and they might release a new kernel in due course.

Thanks for confirming that it is easily reproduced.  That prompted me to
examine the problem again; I see that I was missing something, and can now
see exactly how the sequence of actions you described would cause that crash.

This patch fixes it properly and will be going upstream shortly.

thanks for your help,

NeilBrown

From 01f96c0a9922cd9919baf9d16febdf7016177a12 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Wed, 21 Sep 2011 15:30:20 +1000
Subject: [PATCH] md: Avoid waking up a thread after it has been freed.

Two related problems:

1/ some error paths call "md_unregister_thread(mddev->thread)"
   without subsequently clearing ->thread.  A subsequent call
   to mddev_unlock will try to wake the thread, and crash.

2/ Most calls to md_wakeup_thread are protected against the thread
   disappearing either by:
      - holding the ->mutex
      - having an active request, so something else must be keeping
        the array active.
   However mddev_unlock calls md_wakeup_thread after dropping the
   mutex and without any certainty of an active request, so the
   ->thread could theoretically disappear.
   So we need a spinlock to provide some protection.

So change md_unregister_thread to take a pointer to the thread
pointer, and ensure that it always does the required locking and
clears the pointer properly.

Reported-by: "Moshe Melnikov" <moshe@zadarastorage.com>
Signed-off-by: NeilBrown <neilb@suse.de>
cc: stable@kernel.org

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5404b22..5c95ccb 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -61,6 +61,11 @@
 static void autostart_arrays(int part);
 #endif
 
+/* pers_list is a list of registered personalities protected
+ * by pers_lock.
+ * pers_lock does extra service to protect accesses to
+ * mddev->thread when the mutex cannot be held.
+ */
 static LIST_HEAD(pers_list);
 static DEFINE_SPINLOCK(pers_lock);
 
@@ -739,7 +744,12 @@ static void mddev_unlock(mddev_t * mddev)
 	} else
 		mutex_unlock(&mddev->reconfig_mutex);
 
+	/* As we've dropped the mutex we need a spinlock to
+	 * make sure the thread doesn't disappear
+	 */
+	spin_lock(&pers_lock);
 	md_wakeup_thread(mddev->thread);
+	spin_unlock(&pers_lock);
 }
 
 static mdk_rdev_t * find_rdev_nr(mddev_t *mddev, int nr)
@@ -6429,11 +6439,18 @@ mdk_thread_t *md_register_thread(void (*run) (mddev_t *), mddev_t *mddev,
 	return thread;
 }
 
-void md_unregister_thread(mdk_thread_t *thread)
+void md_unregister_thread(mdk_thread_t **threadp)
 {
+	mdk_thread_t *thread = *threadp;
 	if (!thread)
 		return;
 	dprintk("interrupting MD-thread pid %d\n", task_pid_nr(thread->tsk));
+	/* Locking ensures that mddev_unlock does not wake_up a
+	 * non-existent thread
+	 */
+	spin_lock(&pers_lock);
+	*threadp = NULL;
+	spin_unlock(&pers_lock);
 
 	kthread_stop(thread->tsk);
 	kfree(thread);
@@ -7340,8 +7357,7 @@ static void reap_sync_thread(mddev_t *mddev)
 	mdk_rdev_t *rdev;
 
 	/* resync has finished, collect result */
-	md_unregister_thread(mddev->sync_thread);
-	mddev->sync_thread = NULL;
+	md_unregister_thread(&mddev->sync_thread);
 	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
 	    !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
 		/* success...*/
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 1e586bb..0a309dc 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -560,7 +560,7 @@ extern int register_md_personality(struct mdk_personality *p);
 extern int unregister_md_personality(struct mdk_personality *p);
 extern mdk_thread_t * md_register_thread(void (*run) (mddev_t *mddev),
 				mddev_t *mddev, const char *name);
-extern void md_unregister_thread(mdk_thread_t *thread);
+extern void md_unregister_thread(mdk_thread_t **threadp);
 extern void md_wakeup_thread(mdk_thread_t *thread);
 extern void md_check_recovery(mddev_t *mddev);
 extern void md_write_start(mddev_t *mddev, struct bio *bi);
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index 3535c23..d5b5fb3 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -514,8 +514,7 @@ static int multipath_stop (mddev_t *mddev)
 {
 	multipath_conf_t *conf = mddev->private;
 
-	md_unregister_thread(mddev->thread);
-	mddev->thread = NULL;
+	md_unregister_thread(&mddev->thread);
 	blk_sync_queue(mddev->queue); /* the unplug fn references 'conf'*/
 	mempool_destroy(conf->pool);
 	kfree(conf->multipaths);
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index f4622dd..d9587df 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2562,8 +2562,7 @@ static int stop(mddev_t *mddev)
 	raise_barrier(conf);
 	lower_barrier(conf);
 
-	md_unregister_thread(mddev->thread);
-	mddev->thread = NULL;
+	md_unregister_thread(&mddev->thread);
 	if (conf->r1bio_pool)
 		mempool_destroy(conf->r1bio_pool);
 	kfree(conf->mirrors);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index d7a8468..0cd9672 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2955,7 +2955,7 @@ static int run(mddev_t *mddev)
 	return 0;
 
 out_free_conf:
-	md_unregister_thread(mddev->thread);
+	md_unregister_thread(&mddev->thread);
 	if (conf->r10bio_pool)
 		mempool_destroy(conf->r10bio_pool);
 	safe_put_page(conf->tmppage);
@@ -2973,8 +2973,7 @@ static int stop(mddev_t *mddev)
 	raise_barrier(conf, 0);
 	lower_barrier(conf);
 
-	md_unregister_thread(mddev->thread);
-	mddev->thread = NULL;
+	md_unregister_thread(&mddev->thread);
 	blk_sync_queue(mddev->queue); /* the unplug fn references 'conf'*/
 	if (conf->r10bio_pool)
 		mempool_destroy(conf->r10bio_pool);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 43709fa..ac5e8b5 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4941,8 +4941,7 @@ static int run(mddev_t *mddev)
 
 	return 0;
 abort:
-	md_unregister_thread(mddev->thread);
-	mddev->thread = NULL;
+	md_unregister_thread(&mddev->thread);
 	if (conf) {
 		print_raid5_conf(conf);
 		free_conf(conf);
@@ -4956,8 +4955,7 @@ static int stop(mddev_t *mddev)
 {
 	raid5_conf_t *conf = mddev->private;
 
-	md_unregister_thread(mddev->thread);
-	mddev->thread = NULL;
+	md_unregister_thread(&mddev->thread);
 	if (mddev->queue)
 		mddev->queue->backing_dev_info.congested_fn = NULL;
 	free_conf(conf);



Thread overview: 4+ messages
     [not found] <C8B40B93E2024230AB92E0B360028997@MoshePC>
2011-09-11 13:07 ` Kernel OOPs after RAID10 assemble Moshe Melnikov
2011-09-12  4:05   ` NeilBrown
2011-09-12  5:33     ` Moshe Melnikov
2011-09-21  5:32       ` NeilBrown
