From: Neil Brown <neilb@suse.de>
To: Michael Tokarev <mjt@tls.msk.ru>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: sw raid array completely hungs during verify in 2.6.32
Date: Mon, 2 Aug 2010 13:01:54 +1000
Message-ID: <20100802130154.217b5cc5@notabene>
In-Reply-To: <4C555334.5080202@msgid.tls.msk.ru>
On Sun, 01 Aug 2010 14:57:56 +0400
Michael Tokarev <mjt@tls.msk.ru> wrote:
> Hello.
>
> This is the second time we have come across this issue
> since switching from 2.6.27 to 2.6.32 about 3
> months ago.
>
> At some point, an md-raid10 array hangs - that
> is, all the processes that try to access it,
> either read or write, hang forever.
Thanks for the report.
This is the same problem that has been reported recently in a thread with
subject "Raid10 device hangs during resync and heavy I/O."
I have just posted a patch which should address it - I will include it here
as well.
Note that you need to be careful when reading the stack traces. A "?" means
that the function named may not be in the actual call trace - it may just be
an old address that happens to still be on the stack.
In this case, the "mempool_alloc" was stray - nothing was actually blocking
on that.
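As a rough illustration of the "?" rule above, here is a minimal C sketch that classifies one line of a "Call Trace:" dump. The function name frame_is_unreliable and its return convention are made up for this example; it only looks for the "[<address>]" prefix and the "?" that may follow it, and it assumes the line format seen in the traces quoted in this message.

```c
#include <assert.h>
#include <string.h>

/*
 * Classify one line of a kernel "Call Trace:" dump.
 * (Hypothetical helper, not part of any kernel or tool.)
 *
 * Returns  1 if the frame is marked "?" (the address was found on the
 *            stack but is not confirmed to be in the live call chain),
 *          0 if the frame is printed without "?",
 *         -1 if the line does not look like a frame ("[<addr>] ...").
 */
static int frame_is_unreliable(const char *line)
{
	const char *p;

	assert(line != NULL);

	p = strstr(line, "[<");
	if (!p)
		return -1;
	p = strstr(p, ">]");
	if (!p)
		return -1;

	p += 2;			/* skip past ">]" */
	while (*p == ' ')
		p++;

	return *p == '?' ? 1 : 0;
}
```

Note that in the traces quoted in this message every frame carries a "?", so a filter like this cannot separate live frames from stale ones here; each entry still has to be judged by whether the call chain is plausible.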
This is the patch that I have proposed.
Thanks,
NeilBrown
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 42e64e4..d1d6891 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -825,11 +825,29 @@ static int make_request(mddev_t *mddev, struct bio * bio)
 		 */
 		bp = bio_split(bio,
 			       chunk_sects - (bio->bi_sector & (chunk_sects - 1)) );
+
+		/* Each of these 'make_request' calls will call 'wait_barrier'.
+		 * If the first succeeds but the second blocks due to the resync
+		 * thread raising the barrier, we will deadlock because the
+		 * IO to the underlying device will be queued in generic_make_request
+		 * and will never complete, so will never reduce nr_pending.
+		 * So increment nr_waiting here so no new raise_barriers will
+		 * succeed, and so the second wait_barrier cannot block.
+		 */
+		spin_lock_irq(&conf->resync_lock);
+		conf->nr_waiting++;
+		spin_unlock_irq(&conf->resync_lock);
+
 		if (make_request(mddev, &bp->bio1))
 			generic_make_request(&bp->bio1);
 		if (make_request(mddev, &bp->bio2))
 			generic_make_request(&bp->bio2);
+		spin_lock_irq(&conf->resync_lock);
+		conf->nr_waiting--;
+		wake_up(&conf->wait_barrier);
+		spin_unlock_irq(&conf->resync_lock);
+
 		bio_pair_release(bp);
 		return 0;
 	bad_map:
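
The deadlock described in the patch comment can be walked through with a small userspace model. This is only a sketch: struct barrier_model and split_request_deadlocks are hypothetical names, there is no locking or threading, and the two rules it encodes (the resync thread raises the barrier only while nr_waiting is zero; normal IO blocks while the barrier is raised) are taken from the comment above rather than copied from raid10.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the counters named in the patch comment. */
struct barrier_model {
	int barrier;	/* raised by the resync thread */
	int nr_pending;	/* normal IO started but not yet completed */
	int nr_waiting;	/* normal IO waiting for the barrier to drop */
};

/*
 * Step through one split request (two halves) with the resync thread
 * trying to raise the barrier between the two halves.
 * Returns true if the model ends up deadlocked.
 */
static bool split_request_deadlocks(bool hold_nr_waiting)
{
	struct barrier_model m = { 0, 0, 0 };

	if (hold_nr_waiting)
		m.nr_waiting++;	/* what the patch does before the two calls */

	/* First half: no barrier yet, so it passes and becomes pending.
	 * Its IO is only queued, so nr_pending cannot drop until the
	 * submitting thread returns from the outer make_request. */
	assert(m.barrier == 0);
	m.nr_pending++;

	/* Resync thread: assumed to raise the barrier only if nobody waits. */
	if (m.nr_waiting == 0)
		m.barrier++;

	/* Second half: blocks if the barrier is up. The submitter is then
	 * stuck, the first half never completes, nr_pending stays above
	 * zero, and the resync thread waits for it forever. */
	if (m.barrier > 0)
		return m.nr_pending > 0;

	m.nr_pending++;
	if (hold_nr_waiting)
		m.nr_waiting--;	/* the patch then wakes any waiting resync */
	return false;
}
```

Calling split_request_deadlocks(false) models the unpatched ordering and reports a deadlock; split_request_deadlocks(true) models the patched ordering, where the held nr_waiting count keeps the barrier from being raised between the two halves.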
>
> Here's a typical set of messages found in kern.log:
>
> INFO: task oracle:7602 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> oracle D ffff8801a8837148 0 7602 1 0x00000000
> ffffffff813bc480 0000000000000082 0000000000000000 0000000000000001
> ffff8801a8b7fdd8 000000000000e1c8 ffff88003b397fd8 ffff88003f47d840
> ffff88003f47dbe0 000000012416219a ffff88002820e1c8 ffff88003f47dbe0
> Call Trace:
> [<ffffffffa018e8ae>] ? wait_barrier+0xee/0x130 [raid10]
> [<ffffffff8104f570>] ? default_wake_function+0x0/0x10
> [<ffffffffa0191852>] ? make_request+0x82/0x5f0 [raid10]
> [<ffffffffa007cb2c>] ? md_make_request+0xbc/0x130 [md_mod]
> [<ffffffff810c4722>] ? mempool_alloc+0x62/0x140
> [<ffffffff8117d26f>] ? generic_make_request+0x30f/0x410
> [<ffffffff8112eee4>] ? bio_alloc_bioset+0x54/0xf0
> [<ffffffff8112e28b>] ? __bio_add_page+0x12b/0x240
> [<ffffffff8117d3cc>] ? submit_bio+0x5c/0xe0
> [<ffffffff811313da>] ? dio_bio_submit+0x5a/0x90
> [<ffffffff81131d63>] ? __blockdev_direct_IO+0x5a3/0xcd0
> [<ffffffffa01f66ed>] ? xfs_vm_direct_IO+0x11d/0x140 [xfs]
> [<ffffffffa01f6af0>] ? xfs_get_blocks_direct+0x0/0x20 [xfs]
> [<ffffffffa01f6470>] ? xfs_end_io_direct+0x0/0x70 [xfs]
> [<ffffffff810c3738>] ? generic_file_direct_write+0xc8/0x1b0
> [<ffffffffa01fef18>] ? xfs_write+0x458/0x950 [xfs]
> [<ffffffff8106317b>] ? try_to_del_timer_sync+0x9b/0xd0
> [<ffffffff810f9251>] ? cache_alloc_refill+0x221/0x5e0
> [<ffffffffa01fafe0>] ? xfs_file_aio_write+0x0/0x60 [xfs]
> [<ffffffff8113a6ac>] ? aio_rw_vect_retry+0x7c/0x210
> [<ffffffff8113be02>] ? aio_run_iocb+0x82/0x150
> [<ffffffff8113c747>] ? sys_io_submit+0x2b7/0x6b0
> [<ffffffff8100b542>] ? system_call_fastpath+0x16/0x1b
>
> INFO: task oracle:7654 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> oracle D ffff8801a8837148 0 7654 1 0x00000000
> ffff8800614ac7c0 0000000000000086 0000000000000000 0000000000000206
> 0000000000000000 000000000000e1c8 ffff88018c175fd8 ffff88005c9ba040
> ffff88005c9ba3e0 ffffffff810c4722 000000038c175810 ffff88005c9ba3e0
> Call Trace:
> [<ffffffff810c4722>] ? mempool_alloc+0x62/0x140
> [<ffffffffa018e8ae>] ? wait_barrier+0xee/0x130 [raid10]
> [<ffffffff8104f570>] ? default_wake_function+0x0/0x10
> [<ffffffff8112ddd1>] ? __bio_clone+0x21/0x70
> [<ffffffffa0191852>] ? make_request+0x82/0x5f0 [raid10]
> [<ffffffff8112d765>] ? bio_split+0x25/0x2a0
> [<ffffffffa0191ce1>] ? make_request+0x511/0x5f0 [raid10]
> [<ffffffffa007cb2c>] ? md_make_request+0xbc/0x130 [md_mod]
> [<ffffffff8117d26f>] ? generic_make_request+0x30f/0x410
> [<ffffffff8112da4a>] ? bvec_alloc_bs+0x6a/0x120
> [<ffffffff8117d3cc>] ? submit_bio+0x5c/0xe0
> [<ffffffff811313da>] ? dio_bio_submit+0x5a/0x90
> [<ffffffff81131480>] ? dio_send_cur_page+0x70/0xc0
> [<ffffffff8113151e>] ? submit_page_section+0x4e/0x140
> [<ffffffff8113215a>] ? __blockdev_direct_IO+0x99a/0xcd0
> [<ffffffffa01f666e>] ? xfs_vm_direct_IO+0x9e/0x140 [xfs]
> [<ffffffffa01f6af0>] ? xfs_get_blocks_direct+0x0/0x20 [xfs]
> [<ffffffffa01f6470>] ? xfs_end_io_direct+0x0/0x70 [xfs]
> [<ffffffff810c4357>] ? generic_file_aio_read+0x607/0x620
> [<ffffffffa023fae8>] ? rpc_run_task+0x38/0x80 [sunrpc]
> [<ffffffffa01ff83b>] ? xfs_read+0x11b/0x270 [xfs]
> [<ffffffff81103453>] ? do_sync_read+0xe3/0x130
> [<ffffffff8113c32c>] ? sys_io_getevents+0x39c/0x420
> [<ffffffff810706b0>] ? autoremove_wake_function+0x0/0x30
> [<ffffffff8113adc0>] ? timeout_func+0x0/0x10
> [<ffffffff81104138>] ? vfs_read+0xc8/0x180
> [<ffffffff81104291>] ? sys_pread64+0xa1/0xb0
> [<ffffffff8100c2db>] ? device_not_available+0x1b/0x20
> [<ffffffff8100b542>] ? system_call_fastpath+0x16/0x1b
>
> INFO: task md11_resync:11976 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> md11_resync D ffff88017964d140 0 11976 2 0x00000000
> ffff8801af879880 0000000000000046 0000000000000000 0000000000000001
> ffff8801a8b7fdd8 000000000000e1c8 ffff8800577d1fd8 ffff88017964d140
> ffff88017964d4e0 000000012416219a ffff88002828e1c8 ffff88017964d4e0
> Call Trace:
> [<ffffffffa018e696>] ? raise_barrier+0xb6/0x1e0 [raid10]
> [<ffffffff8104f570>] ? default_wake_function+0x0/0x10
> [<ffffffff8103b263>] ? enqueue_task+0x53/0x60
> [<ffffffffa018f525>] ? sync_request+0x715/0xae0 [raid10]
> [<ffffffffa007dc76>] ? md_do_sync+0x606/0xc70 [md_mod]
> [<ffffffff8104ca4a>] ? finish_task_switch+0x3a/0xc0
> [<ffffffffa007ec47>] ? md_thread+0x67/0x140 [md_mod]
> [<ffffffffa007ebe0>] ? md_thread+0x0/0x140 [md_mod]
> [<ffffffff81070376>] ? kthread+0x96/0xb0
> [<ffffffff8100c52a>] ? child_rip+0xa/0x20
> [<ffffffff810702e0>] ? kthread+0x0/0xb0
> [<ffffffff8100c520>] ? child_rip+0x0/0x20
>
> (All 3 processes shown are reported at the same time).
> A few more processes are waiting in wait_barrier like the
> first mentioned above does. Note the 3 different places
> it is waiting:
>
> o raise_barrier
> o wait_barrier
> o mempool_alloc called from wait_barrier
>
> The whole thing looks suspicious - smells like a deadlock
> somewhere.
>
> From this point on, the array is completely dead, with many
> processes (like the above) blocked, with no way to umount the
> filesystem in question. Only a forced reboot of the system
> helps.
>
> This is 2.6.32.15. I see there were a few patches for md
> after that, but it looks like they aren't relevant for this
> issue.
>
> Note that this is not a trivially-triggerable problem. The
> array survived several verify rounds (even during current
> uptime) without problems. But today the array had quite some
> load during verify.
>
> Thanks!
>
> /mjt
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html