sw raid array completely hungs during verify in 2.6.32

linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* sw raid array completely hungs during verify in 2.6.32
@ 2010-08-01 10:57 Michael Tokarev
  2010-08-02  3:01 ` Neil Brown
  0 siblings, 1 reply; 2+ messages in thread
From: Michael Tokarev @ 2010-08-01 10:57 UTC (permalink / raw)
  To: linux-raid

Hello.

It is the second time we come across this issue
after switching from 2.6.27 to 2.6.32 about 3
months ago.

At some point, an md-raid10 array hungs - that
is, all the processes that tries to access it,
either read or write, hungs forever.

Here's a typical set of messages found in kern.log:

 INFO: task oracle:7602 blocked for more than 120 seconds.
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 oracle        D ffff8801a8837148     0  7602      1 0x00000000
  ffffffff813bc480 0000000000000082 0000000000000000 0000000000000001
  ffff8801a8b7fdd8 000000000000e1c8 ffff88003b397fd8 ffff88003f47d840
  ffff88003f47dbe0 000000012416219a ffff88002820e1c8 ffff88003f47dbe0
 Call Trace:
  [<ffffffffa018e8ae>] ? wait_barrier+0xee/0x130 [raid10]
  [<ffffffff8104f570>] ? default_wake_function+0x0/0x10
  [<ffffffffa0191852>] ? make_request+0x82/0x5f0 [raid10]
  [<ffffffffa007cb2c>] ? md_make_request+0xbc/0x130 [md_mod]
  [<ffffffff810c4722>] ? mempool_alloc+0x62/0x140
  [<ffffffff8117d26f>] ? generic_make_request+0x30f/0x410
  [<ffffffff8112eee4>] ? bio_alloc_bioset+0x54/0xf0
  [<ffffffff8112e28b>] ? __bio_add_page+0x12b/0x240
  [<ffffffff8117d3cc>] ? submit_bio+0x5c/0xe0
  [<ffffffff811313da>] ? dio_bio_submit+0x5a/0x90
  [<ffffffff81131d63>] ? __blockdev_direct_IO+0x5a3/0xcd0
  [<ffffffffa01f66ed>] ? xfs_vm_direct_IO+0x11d/0x140 [xfs]
  [<ffffffffa01f6af0>] ? xfs_get_blocks_direct+0x0/0x20 [xfs]
  [<ffffffffa01f6470>] ? xfs_end_io_direct+0x0/0x70 [xfs]
  [<ffffffff810c3738>] ? generic_file_direct_write+0xc8/0x1b0
  [<ffffffffa01fef18>] ? xfs_write+0x458/0x950 [xfs]
  [<ffffffff8106317b>] ? try_to_del_timer_sync+0x9b/0xd0
  [<ffffffff810f9251>] ? cache_alloc_refill+0x221/0x5e0
  [<ffffffffa01fafe0>] ? xfs_file_aio_write+0x0/0x60 [xfs]
  [<ffffffff8113a6ac>] ? aio_rw_vect_retry+0x7c/0x210
  [<ffffffff8113be02>] ? aio_run_iocb+0x82/0x150
  [<ffffffff8113c747>] ? sys_io_submit+0x2b7/0x6b0
  [<ffffffff8100b542>] ? system_call_fastpath+0x16/0x1b

 INFO: task oracle:7654 blocked for more than 120 seconds.
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 oracle        D ffff8801a8837148     0  7654      1 0x00000000
  ffff8800614ac7c0 0000000000000086 0000000000000000 0000000000000206
  0000000000000000 000000000000e1c8 ffff88018c175fd8 ffff88005c9ba040
  ffff88005c9ba3e0 ffffffff810c4722 000000038c175810 ffff88005c9ba3e0
 Call Trace:
  [<ffffffff810c4722>] ? mempool_alloc+0x62/0x140
  [<ffffffffa018e8ae>] ? wait_barrier+0xee/0x130 [raid10]
  [<ffffffff8104f570>] ? default_wake_function+0x0/0x10
  [<ffffffff8112ddd1>] ? __bio_clone+0x21/0x70
  [<ffffffffa0191852>] ? make_request+0x82/0x5f0 [raid10]
  [<ffffffff8112d765>] ? bio_split+0x25/0x2a0
  [<ffffffffa0191ce1>] ? make_request+0x511/0x5f0 [raid10]
  [<ffffffffa007cb2c>] ? md_make_request+0xbc/0x130 [md_mod]
  [<ffffffff8117d26f>] ? generic_make_request+0x30f/0x410
  [<ffffffff8112da4a>] ? bvec_alloc_bs+0x6a/0x120
  [<ffffffff8117d3cc>] ? submit_bio+0x5c/0xe0
  [<ffffffff811313da>] ? dio_bio_submit+0x5a/0x90
  [<ffffffff81131480>] ? dio_send_cur_page+0x70/0xc0
  [<ffffffff8113151e>] ? submit_page_section+0x4e/0x140
  [<ffffffff8113215a>] ? __blockdev_direct_IO+0x99a/0xcd0
  [<ffffffffa01f666e>] ? xfs_vm_direct_IO+0x9e/0x140 [xfs]
  [<ffffffffa01f6af0>] ? xfs_get_blocks_direct+0x0/0x20 [xfs]
  [<ffffffffa01f6470>] ? xfs_end_io_direct+0x0/0x70 [xfs]
  [<ffffffff810c4357>] ? generic_file_aio_read+0x607/0x620
  [<ffffffffa023fae8>] ? rpc_run_task+0x38/0x80 [sunrpc]
  [<ffffffffa01ff83b>] ? xfs_read+0x11b/0x270 [xfs]
  [<ffffffff81103453>] ? do_sync_read+0xe3/0x130
  [<ffffffff8113c32c>] ? sys_io_getevents+0x39c/0x420
  [<ffffffff810706b0>] ? autoremove_wake_function+0x0/0x30
  [<ffffffff8113adc0>] ? timeout_func+0x0/0x10
  [<ffffffff81104138>] ? vfs_read+0xc8/0x180
  [<ffffffff81104291>] ? sys_pread64+0xa1/0xb0
  [<ffffffff8100c2db>] ? device_not_available+0x1b/0x20
  [<ffffffff8100b542>] ? system_call_fastpath+0x16/0x1b

 INFO: task md11_resync:11976 blocked for more than 120 seconds.
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 md11_resync   D ffff88017964d140     0 11976      2 0x00000000
  ffff8801af879880 0000000000000046 0000000000000000 0000000000000001
  ffff8801a8b7fdd8 000000000000e1c8 ffff8800577d1fd8 ffff88017964d140
  ffff88017964d4e0 000000012416219a ffff88002828e1c8 ffff88017964d4e0
 Call Trace:
  [<ffffffffa018e696>] ? raise_barrier+0xb6/0x1e0 [raid10]
  [<ffffffff8104f570>] ? default_wake_function+0x0/0x10
  [<ffffffff8103b263>] ? enqueue_task+0x53/0x60
  [<ffffffffa018f525>] ? sync_request+0x715/0xae0 [raid10]
  [<ffffffffa007dc76>] ? md_do_sync+0x606/0xc70 [md_mod]
  [<ffffffff8104ca4a>] ? finish_task_switch+0x3a/0xc0
  [<ffffffffa007ec47>] ? md_thread+0x67/0x140 [md_mod]
  [<ffffffffa007ebe0>] ? md_thread+0x0/0x140 [md_mod]
  [<ffffffff81070376>] ? kthread+0x96/0xb0
  [<ffffffff8100c52a>] ? child_rip+0xa/0x20
  [<ffffffff810702e0>] ? kthread+0x0/0xb0
  [<ffffffff8100c520>] ? child_rip+0x0/0x20

(All 3 processes shown are reported at the same time).
A few more processes are waiting in wait_barrier like the
first mentioned above does.  Note the 3 different places
it is waiting:

 o raise_barrier
 o wait_barrier
 o mempool_alloc called from wait_barrier

the whole thing look suspicious - smells like a deadlock
somewhere.

From this point on, the array is completely dead, with many
processes (like the above) blocked, with no way to umount the
filesystem in question.  Only forced reboot of the system
helps.

This is 2.6.32.15.  I see there were a few patches for md
after that, but it looks like they aren't relevant for this
issue.

Note that this is not a trivially-triggerable problem.  The
array survived several verify rounds (even during current
uptime) without problems.  But today the array had quite some
load during verify.

Thanks!

/mjt

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: sw raid array completely hungs during verify in 2.6.32
  2010-08-01 10:57 sw raid array completely hungs during verify in 2.6.32 Michael Tokarev
@ 2010-08-02  3:01 ` Neil Brown
  0 siblings, 0 replies; 2+ messages in thread
From: Neil Brown @ 2010-08-02  3:01 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: linux-raid

On Sun, 01 Aug 2010 14:57:56 +0400
Michael Tokarev <mjt@tls.msk.ru> wrote:

> Hello.
> 
> It is the second time we come across this issue
> after switching from 2.6.27 to 2.6.32 about 3
> months ago.
> 
> At some point, an md-raid10 array hungs - that
> is, all the processes that tries to access it,
> either read or write, hungs forever.

Thanks for the report.
This is the same problem that has been reported recently in a thread with
subject "Raid10 device hangs during resync and heavy I/O."

I have just posted a patch which should address it - I will include it here
was well.

Note that you need to be careful when reading the stack traces.  A "?" means
that the function named make not be in the actual call trace - it may just be
an old address that happens to still be on the stack.
In this case, the "mempool_alloc" was stray - nothing was actually blocking
on that.

This is the patch that I have proposed.

Thanks,
NeilBrown



diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 42e64e4..d1d6891 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -825,11 +825,29 @@ static int make_request(mddev_t *mddev, struct bio * bio)
 		 */
 		bp = bio_split(bio,
 			       chunk_sects - (bio->bi_sector & (chunk_sects - 1)) );
+
+		/* Each of these 'make_request' calls will call 'wait_barrier'.
+		 * If the first succeeds but the second blocks due to the resync
+		 * thread raising the barrier, we will deadlock because the
+		 * IO to the underlying device will be queued in generic_make_request
+		 * and will never complete, so will never reduce nr_pending.
+		 * So increment nr_waiting here so no new raise_barriers will
+		 * succeed, and so the second wait_barrier cannot block.
+		 */
+		spin_lock_irq(&conf->resync_lock);
+		conf->nr_waiting++;
+		spin_unlock_irq(&conf->resync_lock);
+
 		if (make_request(mddev, &bp->bio1))
 			generic_make_request(&bp->bio1);
 		if (make_request(mddev, &bp->bio2))
 			generic_make_request(&bp->bio2);
 
+		spin_lock_irq(&conf->resync_lock);
+		conf->nr_waiting--;
+		wake_up(&conf->wait_barrier);
+		spin_unlock_irq(&conf->resync_lock);
+
 		bio_pair_release(bp);
 		return 0;
 	bad_map:


> 
> Here's a typical set of messages found in kern.log:
> 
>  INFO: task oracle:7602 blocked for more than 120 seconds.
>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>  oracle        D ffff8801a8837148     0  7602      1 0x00000000
>   ffffffff813bc480 0000000000000082 0000000000000000 0000000000000001
>   ffff8801a8b7fdd8 000000000000e1c8 ffff88003b397fd8 ffff88003f47d840
>   ffff88003f47dbe0 000000012416219a ffff88002820e1c8 ffff88003f47dbe0
>  Call Trace:
>   [<ffffffffa018e8ae>] ? wait_barrier+0xee/0x130 [raid10]
>   [<ffffffff8104f570>] ? default_wake_function+0x0/0x10
>   [<ffffffffa0191852>] ? make_request+0x82/0x5f0 [raid10]
>   [<ffffffffa007cb2c>] ? md_make_request+0xbc/0x130 [md_mod]
>   [<ffffffff810c4722>] ? mempool_alloc+0x62/0x140
>   [<ffffffff8117d26f>] ? generic_make_request+0x30f/0x410
>   [<ffffffff8112eee4>] ? bio_alloc_bioset+0x54/0xf0
>   [<ffffffff8112e28b>] ? __bio_add_page+0x12b/0x240
>   [<ffffffff8117d3cc>] ? submit_bio+0x5c/0xe0
>   [<ffffffff811313da>] ? dio_bio_submit+0x5a/0x90
>   [<ffffffff81131d63>] ? __blockdev_direct_IO+0x5a3/0xcd0
>   [<ffffffffa01f66ed>] ? xfs_vm_direct_IO+0x11d/0x140 [xfs]
>   [<ffffffffa01f6af0>] ? xfs_get_blocks_direct+0x0/0x20 [xfs]
>   [<ffffffffa01f6470>] ? xfs_end_io_direct+0x0/0x70 [xfs]
>   [<ffffffff810c3738>] ? generic_file_direct_write+0xc8/0x1b0
>   [<ffffffffa01fef18>] ? xfs_write+0x458/0x950 [xfs]
>   [<ffffffff8106317b>] ? try_to_del_timer_sync+0x9b/0xd0
>   [<ffffffff810f9251>] ? cache_alloc_refill+0x221/0x5e0
>   [<ffffffffa01fafe0>] ? xfs_file_aio_write+0x0/0x60 [xfs]
>   [<ffffffff8113a6ac>] ? aio_rw_vect_retry+0x7c/0x210
>   [<ffffffff8113be02>] ? aio_run_iocb+0x82/0x150
>   [<ffffffff8113c747>] ? sys_io_submit+0x2b7/0x6b0
>   [<ffffffff8100b542>] ? system_call_fastpath+0x16/0x1b
> 
>  INFO: task oracle:7654 blocked for more than 120 seconds.
>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>  oracle        D ffff8801a8837148     0  7654      1 0x00000000
>   ffff8800614ac7c0 0000000000000086 0000000000000000 0000000000000206
>   0000000000000000 000000000000e1c8 ffff88018c175fd8 ffff88005c9ba040
>   ffff88005c9ba3e0 ffffffff810c4722 000000038c175810 ffff88005c9ba3e0
>  Call Trace:
>   [<ffffffff810c4722>] ? mempool_alloc+0x62/0x140
>   [<ffffffffa018e8ae>] ? wait_barrier+0xee/0x130 [raid10]
>   [<ffffffff8104f570>] ? default_wake_function+0x0/0x10
>   [<ffffffff8112ddd1>] ? __bio_clone+0x21/0x70
>   [<ffffffffa0191852>] ? make_request+0x82/0x5f0 [raid10]
>   [<ffffffff8112d765>] ? bio_split+0x25/0x2a0
>   [<ffffffffa0191ce1>] ? make_request+0x511/0x5f0 [raid10]
>   [<ffffffffa007cb2c>] ? md_make_request+0xbc/0x130 [md_mod]
>   [<ffffffff8117d26f>] ? generic_make_request+0x30f/0x410
>   [<ffffffff8112da4a>] ? bvec_alloc_bs+0x6a/0x120
>   [<ffffffff8117d3cc>] ? submit_bio+0x5c/0xe0
>   [<ffffffff811313da>] ? dio_bio_submit+0x5a/0x90
>   [<ffffffff81131480>] ? dio_send_cur_page+0x70/0xc0
>   [<ffffffff8113151e>] ? submit_page_section+0x4e/0x140
>   [<ffffffff8113215a>] ? __blockdev_direct_IO+0x99a/0xcd0
>   [<ffffffffa01f666e>] ? xfs_vm_direct_IO+0x9e/0x140 [xfs]
>   [<ffffffffa01f6af0>] ? xfs_get_blocks_direct+0x0/0x20 [xfs]
>   [<ffffffffa01f6470>] ? xfs_end_io_direct+0x0/0x70 [xfs]
>   [<ffffffff810c4357>] ? generic_file_aio_read+0x607/0x620
>   [<ffffffffa023fae8>] ? rpc_run_task+0x38/0x80 [sunrpc]
>   [<ffffffffa01ff83b>] ? xfs_read+0x11b/0x270 [xfs]
>   [<ffffffff81103453>] ? do_sync_read+0xe3/0x130
>   [<ffffffff8113c32c>] ? sys_io_getevents+0x39c/0x420
>   [<ffffffff810706b0>] ? autoremove_wake_function+0x0/0x30
>   [<ffffffff8113adc0>] ? timeout_func+0x0/0x10
>   [<ffffffff81104138>] ? vfs_read+0xc8/0x180
>   [<ffffffff81104291>] ? sys_pread64+0xa1/0xb0
>   [<ffffffff8100c2db>] ? device_not_available+0x1b/0x20
>   [<ffffffff8100b542>] ? system_call_fastpath+0x16/0x1b
> 
>  INFO: task md11_resync:11976 blocked for more than 120 seconds.
>  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>  md11_resync   D ffff88017964d140     0 11976      2 0x00000000
>   ffff8801af879880 0000000000000046 0000000000000000 0000000000000001
>   ffff8801a8b7fdd8 000000000000e1c8 ffff8800577d1fd8 ffff88017964d140
>   ffff88017964d4e0 000000012416219a ffff88002828e1c8 ffff88017964d4e0
>  Call Trace:
>   [<ffffffffa018e696>] ? raise_barrier+0xb6/0x1e0 [raid10]
>   [<ffffffff8104f570>] ? default_wake_function+0x0/0x10
>   [<ffffffff8103b263>] ? enqueue_task+0x53/0x60
>   [<ffffffffa018f525>] ? sync_request+0x715/0xae0 [raid10]
>   [<ffffffffa007dc76>] ? md_do_sync+0x606/0xc70 [md_mod]
>   [<ffffffff8104ca4a>] ? finish_task_switch+0x3a/0xc0
>   [<ffffffffa007ec47>] ? md_thread+0x67/0x140 [md_mod]
>   [<ffffffffa007ebe0>] ? md_thread+0x0/0x140 [md_mod]
>   [<ffffffff81070376>] ? kthread+0x96/0xb0
>   [<ffffffff8100c52a>] ? child_rip+0xa/0x20
>   [<ffffffff810702e0>] ? kthread+0x0/0xb0
>   [<ffffffff8100c520>] ? child_rip+0x0/0x20
> 
> (All 3 processes shown are reported at the same time).
> A few more processes are waiting in wait_barrier like the
> first mentioned above does.  Note the 3 different places
> it is waiting:
> 
>  o raise_barrier
>  o wait_barrier
>  o mempool_alloc called from wait_barrier
> 
> the whole thing look suspicious - smells like a deadlock
> somewhere.
> 
> >From this point on, the array is completely dead, with many
> processes (like the above) blocked, with no way to umount the
> filesystem in question.  Only forced reboot of the system
> helps.
> 
> This is 2.6.32.15.  I see there were a few patches for md
> after that, but it looks like they aren't relevant for this
> issue.
> 
> Note that this is not a trivially-triggerable problem.  The
> array survived several verify rounds (even during current
> uptime) without problems.  But today the array had quite some
> load during verify.
> 
> Thanks!
> 
> /mjt
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply related	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2010-08-02  3:01 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-08-01 10:57 sw raid array completely hungs during verify in 2.6.32 Michael Tokarev
2010-08-02  3:01 ` Neil Brown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).