* debugging md2_resync hang at raise_barrier
@ 2012-03-01  0:44 Ray Morris
  2012-03-01  1:34 ` NeilBrown
  0 siblings, 1 reply; 4+ messages in thread

From: Ray Morris @ 2012-03-01  0:44 UTC (permalink / raw)
  To: linux-raid

I am attempting to debug a hang in raid1 and possibly one in raid5.
I have experienced the same problem with many kernel versions over a
couple of years, and with disparate hardware.

My current plan, if no one more experienced suggests otherwise, is to
compile a kernel with printk() calls in strategic locations and try to
narrow down the problem. I know very little about kernel work, so I am
seeking suggestions from those who know better than I do.

In the case logged below, md2_resync appears to hang at raise_barrier,
after which further access to the device hangs. I'm just a Perl
programmer who dabbles in C, but my intuition said to check whether
lower_barrier had perhaps been called with conf->barrier already at
zero, so that's the one printk I've added so far. It may take a week
or more before it hangs again, so is there any more debugging I should
add before waiting for it to hang?

Also below is some older logging of similar symptoms with raid5,
hanging at raid5_quiesce. I got rid of the raid5 array in hopes of
getting rid of the problem, but if anyone has suggestions on how to
debug that further, I may be able to add a raid5 array back.

The load when I've noticed it is rsync to LVM volumes with snapshots.
After some discussion, lvm-devel suggested linux-raid would be the
next logical step. Tested kernels include 2.6.32-220.4.1.el6.x86_64,
2.6.32.26-175.fc12.x86_64, vmlinuz-2.6.32.9-70.fc12.x86_64, and
others. Since I have already updated the kernel several times in the
last couple of years, I figured I'd try some debugging with the
current EL 6 kernel.

Anyway, any thoughts on how to debug, where to stick some printk
calls, or other debugging functions?

Feb 28 23:16:52 clonebox3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 28 23:16:52 clonebox3 kernel: md2_resync    D 0000000000000002     0  9681      2 0x00000080
Feb 28 23:16:52 clonebox3 kernel: ffff88006b961bb0 0000000000000046 0000000000000000 0000000300000001
Feb 28 23:16:52 clonebox3 kernel: ffff88006b961b30 ffff88003c9ea7d0 0000000000000286 0000000000000003
Feb 28 23:16:52 clonebox3 kernel: ffff880037c8d038 ffff88006b961fd8 000000000000f4e8 ffff880037c8d038
Feb 28 23:16:52 clonebox3 kernel: Call Trace:
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa021ae0d>] raise_barrier+0x9d/0x1a0 [raid1]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff8105e7f0>] ? default_wake_function+0x0/0x20
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa021c319>] sync_request+0x129/0x6a0 [raid1]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa021ad65>] ? md_raid1_unplug_device+0x25/0x30 [raid1]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff813eaa6d>] md_do_sync+0x6ad/0xbe0
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff813eb236>] md_thread+0x116/0x150
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff813eb120>] ? md_thread+0x0/0x150
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff81090726>] kthread+0x96/0xa0
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff81090690>] ? kthread+0x0/0xa0
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
Feb 28 23:16:52 clonebox3 kernel: INFO: task kcopyd:10519 blocked for more than 120 seconds.
Feb 28 23:16:52 clonebox3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 28 23:16:52 clonebox3 kernel: kcopyd        D 0000000000000001     0 10519      2 0x00000080
Feb 28 23:16:52 clonebox3 kernel: ffff88004a76b920 0000000000000046 0000000000000000 0000000300000001
Feb 28 23:16:52 clonebox3 kernel: ffff880079eb0d80 ffff88003c9ea7d0 0000000000000286 0000000000000003
Feb 28 23:16:52 clonebox3 kernel: ffff880036a630b8 ffff88004a76bfd8 000000000000f4e8 ffff880036a630b8
Feb 28 23:16:52 clonebox3 kernel: Call Trace:
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa021b05d>] wait_barrier+0xed/0x130 [raid1]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff8105e7f0>] ? default_wake_function+0x0/0x20
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa021d096>] make_request+0x76/0x820 [raid1]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa00016fd>] ? __map_bio+0xad/0x130 [dm_mod]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa0002f07>] ? __split_and_process_bio+0x167/0x5d0 [dm_mod]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff8125e106>] ? throtl_find_tg+0x46/0x60
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff813e9973>] md_make_request+0xd3/0x210
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff8124fe32>] generic_make_request+0x2b2/0x5c0
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa0002d22>] ? dm_merge_bvec+0xc2/0x140 [dm_mod]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff812501cf>] submit_bio+0x8f/0x120
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa0009e2f>] dispatch_io+0x1ff/0x260 [dm_mod]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa0009970>] ? list_get_page+0x0/0x30 [dm_mod]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa00099a0>] ? list_next_page+0x0/0x20 [dm_mod]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa000a960>] ? complete_io+0x0/0xa0 [dm_mod]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa000a0d5>] dm_io+0xc5/0x1c0 [dm_mod]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa0009970>] ? list_get_page+0x0/0x30 [dm_mod]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa00099a0>] ? list_next_page+0x0/0x20 [dm_mod]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa000a8f1>] run_io_job+0xa1/0x110 [dm_mod]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa000a960>] ? complete_io+0x0/0xa0 [dm_mod]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa000a70b>] process_jobs+0x5b/0x100 [dm_mod]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa000a850>] ? run_io_job+0x0/0x110 [dm_mod]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa000a7b0>] ? do_work+0x0/0xa0 [dm_mod]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa000a7fc>] do_work+0x4c/0xa0 [dm_mod]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff8108b150>] worker_thread+0x170/0x2a0
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff8108afe0>] ? worker_thread+0x0/0x2a0
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff81090726>] kthread+0x96/0xa0
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff81090690>] ? kthread+0x0/0xa0
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
Feb 28 23:16:52 clonebox3 kernel: INFO: task kjournald:10618 blocked for more than 120 seconds.
Feb 28 23:16:52 clonebox3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 28 23:16:52 clonebox3 kernel: kjournald     D 0000000000000002     0 10618      2 0x00000080
Feb 28 23:16:52 clonebox3 kernel: ffff88002557da20 0000000000000046 0000000000000000 0000000300000001
Feb 28 23:16:52 clonebox3 kernel: 0000000000011210 ffff88003c9ea7d0 0000000000000286 0000000000000003
Feb 28 23:16:52 clonebox3 kernel: ffff88006be8e5f8 ffff88002557dfd8 000000000000f4e8 ffff88006be8e5f8
Feb 28 23:16:52 clonebox3 kernel: Call Trace:
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa021b05d>] wait_barrier+0xed/0x130 [raid1]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff8105e7f0>] ? default_wake_function+0x0/0x20
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa021d096>] make_request+0x76/0x820 [raid1]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa00016fd>] ? __map_bio+0xad/0x130 [dm_mod]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa0002f07>] ? __split_and_process_bio+0x167/0x5d0 [dm_mod]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff8125e106>] ? throtl_find_tg+0x46/0x60
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff813e9973>] md_make_request+0xd3/0x210
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff8124fe32>] generic_make_request+0x2b2/0x5c0
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff812501cf>] submit_bio+0x8f/0x120
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff811a8046>] submit_bh+0xf6/0x150
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa01b5644>] journal_commit_transaction+0x1054/0x1310 [jbd]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff8107ca1b>] ? try_to_del_timer_sync+0x7b/0xe0
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa01bab88>] kjournald+0xe8/0x250 [jbd]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa01baaa0>] ? kjournald+0x0/0x250 [jbd]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff81090726>] kthread+0x96/0xa0
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff81090690>] ? kthread+0x0/0xa0
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
Feb 28 23:16:52 clonebox3 kernel: INFO: task flush-253:27:10629 blocked for more than 120 seconds.
Feb 28 23:16:52 clonebox3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 28 23:16:52 clonebox3 kernel: flush-253:27  D 0000000000000000     0 10629      2 0x00000080
Feb 28 23:16:52 clonebox3 kernel: ffff88006c619640 0000000000000046 0000000000000000 0000000300000001
Feb 28 23:16:52 clonebox3 kernel: ffff88006c619660 ffff88003c9ea7d0 0000000000000282 0000000000000003
Feb 28 23:16:52 clonebox3 kernel: ffff8800744dbaf8 ffff88006c619fd8 000000000000f4e8 ffff8800744dbaf8
Feb 28 23:16:52 clonebox3 kernel: Call Trace:
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa021b05d>] wait_barrier+0xed/0x130 [raid1]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff8105e7f0>] ? default_wake_function+0x0/0x20
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa021d096>] make_request+0x76/0x820 [raid1]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa00016fd>] ? __map_bio+0xad/0x130 [dm_mod]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffffa0002f07>] ? __split_and_process_bio+0x167/0x5d0 [dm_mod]
Feb 28 23:16:52 clonebox3 kernel: [<ffffffff8125e106>] ? throtl_find_tg+0x46/0x60
...
Dec 13 09:15:52 clonebox3 kernel: INFO: task kdmflush:31627 blocked for more than 120 seconds.
Dec 13 09:15:52 clonebox3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 13 09:15:52 clonebox3 kernel: kdmflush      D ffff88007b824300     0 31627      2 0x00000080
Dec 13 09:15:52 clonebox3 kernel: ffff8800372af9f0 0000000000000046 ffff8800372af9b8 ffff8800372af9b4
Dec 13 09:15:52 clonebox3 kernel: ffff8800372af9e0 ffff88007b824300 ffff880001e96980 00000001083f7318
Dec 13 09:15:52 clonebox3 kernel: ffff880076f27ad8 ffff8800372affd8 0000000000010518 ffff880076f27ad8
Dec 13 09:15:52 clonebox3 kernel: Call Trace:
Dec 13 09:15:52 clonebox3 kernel: [<ffffffffa01feca5>] raid5_quiesce+0x125/0x1a0 [raid456]
Dec 13 09:15:52 clonebox3 kernel: [<ffffffff8105c580>] ? default_wake_function+0x0/0x20
Dec 13 09:15:52 clonebox3 kernel: [<ffffffff810563f3>] ? __wake_up+0x53/0x70
Dec 13 09:15:52 clonebox3 kernel: [<ffffffffa02070c1>] make_request+0x501/0x520 [raid456]
Dec 13 09:15:52 clonebox3 kernel: [<ffffffff8102ea69>] ? native_smp_send_reschedule+0x49/0x60
Dec 13 09:15:52 clonebox3 kernel: [<ffffffff810508e8>] ? resched_task+0x68/0x80
Dec 13 09:15:52 clonebox3 kernel: [<ffffffff813d09fb>] md_make_request+0xcb/0x230
Dec 13 09:15:52 clonebox3 kernel: [<ffffffff8105c484>] ? try_to_wake_up+0x284/0x380
Dec 13 09:15:52 clonebox3 kernel: [<ffffffff81241982>] generic_make_request+0x1b2/0x4f0
Dec 13 09:15:52 clonebox3 kernel: [<ffffffff8110e925>] ? mempool_alloc_slab+0x15/0x20
Dec 13 09:15:52 clonebox3 kernel: [<ffffffff8110ea33>] ? mempool_alloc+0x63/0x140
Dec 13 09:15:52 clonebox3 kernel: [<ffffffffa00016bd>] __map_bio+0xad/0x130 [dm_mod]
Dec 13 09:15:52 clonebox3 kernel: [<ffffffffa00017ef>] __issue_target_requests+0xaf/0xd0 [dm_mod]
Dec 13 09:15:52 clonebox3 kernel: [<ffffffffa000351f>] __split_and_process_bio+0x59f/0x630 [dm_mod]
Dec 13 09:15:52 clonebox3 kernel: [<ffffffff8109225c>] ? remove_wait_queue+0x3c/0x50
Dec 13 09:15:52 clonebox3 kernel: [<ffffffffa00029c4>] ? dm_wait_for_completion+0xd4/0x100 [dm_mod]
Dec 13 09:15:52 clonebox3 kernel: [<ffffffffa0003836>] dm_flush+0x56/0x70 [dm_mod]
Dec 13 09:15:52 clonebox3 kernel: [<ffffffffa00038a4>] dm_wq_work+0x54/0x200 [dm_mod]
Dec 13 09:15:52 clonebox3 kernel: [<ffffffffa0003850>] ? dm_wq_work+0x0/0x200 [dm_mod]
Dec 13 09:15:52 clonebox3 kernel: [<ffffffff8108c7d0>] worker_thread+0x170/0x2a0
Dec 13 09:15:52 clonebox3 kernel: [<ffffffff81091ea0>] ? autoremove_wake_function+0x0/0x40
Dec 13 09:15:52 clonebox3 kernel: [<ffffffff8108c660>] ? worker_thread+0x0/0x2a0
Dec 13 09:15:52 clonebox3 kernel: [<ffffffff81091b36>] kthread+0x96/0xa0
Dec 13 09:15:52 clonebox3 kernel: [<ffffffff810141ca>] child_rip+0xa/0x20
Dec 13 09:15:52 clonebox3 kernel: [<ffffffff81091aa0>] ? kthread+0x0/0xa0
Dec 13 09:15:52 clonebox3 kernel: [<ffffffff810141c0>] ? child_rip+0x0/0x20
Dec 13 09:15:52 clonebox3 kernel: INFO: task kcopyd:31629 blocked for more than 120 seconds.
Dec 13 09:15:52 clonebox3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 13 09:15:52 clonebox3 kernel: kcopyd        D ffff88007b824700     0 31629      2 0x00000080
Dec 13 09:15:52 clonebox3 kernel: ffff880044aa7ac0 0000000000000046 ffff880044aa7a88 ffff880044aa7a84
Dec 13 09:15:52 clonebox3 kernel: ffff880044aa7ae0 ffff88007b824700 ffff880001e16980 00000001083f7280

--
Ray Morris
support@bettercgi.com

Strongbox - The next generation in site security:
http://www.bettercgi.com/strongbox/

Throttlebox - Intelligent Bandwidth Control
http://www.bettercgi.com/throttlebox/

Strongbox / Throttlebox affiliate program:
http://www.bettercgi.com/affiliates/user/register.php

^ permalink raw reply	[flat|nested] 4+ messages in thread
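The check Ray describes above -- catching a lower_barrier() that runs
with conf->barrier already at zero -- might look roughly like this
against a 2.6.32-era drivers/md/raid1.c. This is a sketch, not a patch
from the thread; the surrounding function body is paraphrased and may
differ slightly between distribution kernels.

	static void lower_barrier(conf_t *conf)
	{
		unsigned long flags;

		spin_lock_irqsave(&conf->resync_lock, flags);
		/* Sketch: flag a suspected underflow before decrementing. */
		if (conf->barrier <= 0)
			printk(KERN_WARNING
			       "md/raid1: lower_barrier called with barrier=%d\n",
			       conf->barrier);
		conf->barrier--;
		spin_unlock_irqrestore(&conf->resync_lock, flags);
		wake_up(&conf->wait_barrier);
	}

As Neil's reply below shows, the actual bug turned out to be elsewhere,
so this probe would have stayed silent.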
* Re: debugging md2_resync hang at raise_barrier
  2012-03-01  0:44 debugging md2_resync hang at raise_barrier Ray Morris
@ 2012-03-01  1:34 ` NeilBrown
  2012-03-12 18:46   ` Ray Morris
  0 siblings, 1 reply; 4+ messages in thread

From: NeilBrown @ 2012-03-01  1:34 UTC (permalink / raw)
  To: Ray Morris; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 3725 bytes --]

On Wed, 29 Feb 2012 18:44:13 -0600 Ray Morris <support@bettercgi.com>
wrote:

> I am attempting to debug a hang in raid1 and possibly one in raid5.
> I have experienced the same problem with many kernel versions over a
> couple of years, and with disparate hardware.
>
> My current plan, if no one more experienced suggests otherwise, is to
> compile a kernel with printk() calls in strategic locations and try
> to narrow down the problem. I know very little about kernel work, so
> I am seeking suggestions from those who know better than I do.
>
> In the case logged below, md2_resync appears to hang at
> raise_barrier, after which further access to the device hangs. I'm
> just a Perl programmer who dabbles in C, but my intuition said to
> check whether lower_barrier had perhaps been called with
> conf->barrier already at zero, so that's the one printk I've added
> so far. It may take a week or more before it hangs again, so is
> there any more debugging I should add before waiting for it to hang?
>
> Also below is some older logging of similar symptoms with raid5,
> hanging at raid5_quiesce. I got rid of the raid5 array in hopes of
> getting rid of the problem, but if anyone has suggestions on how to
> debug that further, I may be able to add a raid5 array back.
>
> The load when I've noticed it is rsync to LVM volumes with
> snapshots. After some discussion, lvm-devel suggested linux-raid
> would be the next logical step. Tested kernels include
> 2.6.32-220.4.1.el6.x86_64, 2.6.32.26-175.fc12.x86_64,
> vmlinuz-2.6.32.9-70.fc12.x86_64, and others. Since I have already
> updated the kernel several times in the last couple of years, I
> figured I'd try some debugging with the current EL 6 kernel.
>
> Anyway, any thoughts on how to debug, where to stick some printk
> calls, or other debugging functions?

I might know what is happening.

It is kind of complicated and involves the magic code in
block/blk-core.c:generic_make_request which turns recursive calls into
tail recursion.

The fs sends a request to dm.
dm splits it in two for some reason and sends both to md.
This involves them getting queued in generic_make_request.
The first gets actioned by md/raid1 and converted into a request to the
underlying device (it must be a read request for this to happen - so
just one device). This gets added to the queue and is counted in
nr_pending.

At this point sync_request is called by another thread and it tries to
raise_barrier. It gets past the first hurdle, increments ->barrier, and
waits for nr_pending to hit zero.

Now the second request from dm to md is passed to raid1.c:make_request
where it tries to wait_barrier. This blocks because ->barrier is up,
and we have a deadlock - the request to the underlying device will not
progress until this md request progresses, and it is stuck.

This patch might fix it.  Maybe.  If it compiles.
NeilBrown

Index: linux-2.6.32-SLE11-SP1/drivers/md/raid1.c
===================================================================
--- linux-2.6.32-SLE11-SP1.orig/drivers/md/raid1.c	2012-03-01 12:28:05.000000000 +1100
+++ linux-2.6.32-SLE11-SP1/drivers/md/raid1.c	2012-03-01 12:28:22.427992913 +1100
@@ -695,7 +695,11 @@ static void wait_barrier(conf_t *conf)
 	spin_lock_irq(&conf->resync_lock);
 	if (conf->barrier) {
 		conf->nr_waiting++;
-		wait_event_lock_irq(conf->wait_barrier, !conf->barrier,
+		wait_event_lock_irq(conf->wait_barrier,
+				    !conf->barrier ||
+				    (current->bio_tail &&
+				     current->bio_list &&
+				     conf->nr_pending),
 				    conf->resync_lock,
 				    raid1_unplug(conf->mddev->queue));
 		conf->nr_waiting--;

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread
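To make the sequence Neil describes concrete, here is the patched
wait_barrier() with editorial comments added -- the code is the hunk
above plus its 2.6.32-era surroundings, paraphrased rather than quoted
from any particular tree. In that era, current->bio_list and
current->bio_tail are the per-task queue generic_make_request uses to
flatten recursive submissions.

	static void wait_barrier(conf_t *conf)
	{
		spin_lock_irq(&conf->resync_lock);
		if (conf->barrier) {
			conf->nr_waiting++;
			/* Normally: sleep until resync lowers the barrier.
			 *
			 * Escape hatch: if this task is inside
			 * generic_make_request (its bio_list/bio_tail are
			 * non-NULL) and the array already counts pending
			 * I/O, the request we would wait behind may be
			 * sitting unsubmitted on our own bio_list --
			 * sleeping here would deadlock, so let this
			 * request through despite the raised barrier. */
			wait_event_lock_irq(conf->wait_barrier,
					    !conf->barrier ||
					    (current->bio_tail &&
					     current->bio_list &&
					     conf->nr_pending),
					    conf->resync_lock,
					    raid1_unplug(conf->mddev->queue));
			conf->nr_waiting--;
		}
		conf->nr_pending++;
		spin_unlock_irq(&conf->resync_lock);
	}

A request allowed through this way bumps nr_pending and lets the task
unwind back to generic_make_request, which then submits the bios queued
on its bio_list; once those complete, nr_pending can reach zero and the
resync thread blocked in raise_barrier can finally proceed.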
* Re: debugging md2_resync hang at raise_barrier
  2012-03-01  1:34 ` NeilBrown
@ 2012-03-12 18:46   ` Ray Morris
  2012-03-13  0:16     ` NeilBrown
  0 siblings, 1 reply; 4+ messages in thread

From: Ray Morris @ 2012-03-12 18:46 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

> On Wed, 29 Feb 2012 18:44:13 -0600 Ray Morris <support@bettercgi.com>
> wrote:
>
> > I am attempting to debug a hang in raid1 and possibly one in raid5.

On Thu, 1 Mar 2012 12:34:18 +1100
NeilBrown <neilb@suse.de> wrote:

> I might know what is happening.
>
> It is kind of complicated and involves the magic code in
> block/blk-core.c:generic_make_request which turns recursive calls
> into tail recursion.
...
> This patch might fix it.  Maybe.  If it compiles.

That appears to have solved it. Thanks so much. The delay in my
response is because I wanted to give it some time to recur, as it
didn't always manifest right away. Is there anything I should do
regarding this patch in future versions? Is this patch already in, or
planned to be in, future versions, or should users possibly keep the
patch on hand in case it's still needed later? Is there any testing or
anything that I need to do to get it included in future versions?

--
Ray Morris
support@bettercgi.com

Strongbox - The next generation in site security:
http://www.bettercgi.com/strongbox/

Throttlebox - Intelligent Bandwidth Control
http://www.bettercgi.com/throttlebox/

Strongbox / Throttlebox affiliate program:
http://www.bettercgi.com/affiliates/user/register.php
^ permalink raw reply	[flat|nested] 4+ messages in thread
* Re: debugging md2_resync hang at raise_barrier
  2012-03-12 18:46   ` Ray Morris
@ 2012-03-13  0:16     ` NeilBrown
  0 siblings, 0 replies; 4+ messages in thread

From: NeilBrown @ 2012-03-13  0:16 UTC (permalink / raw)
  To: Ray Morris; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1402 bytes --]

On Mon, 12 Mar 2012 13:46:03 -0500 Ray Morris <support@bettercgi.com>
wrote:

> > On Wed, 29 Feb 2012 18:44:13 -0600 Ray Morris <support@bettercgi.com>
> > wrote:
> >
> > > I am attempting to debug a hang in raid1 and possibly one in raid5.
>
> On Thu, 1 Mar 2012 12:34:18 +1100
> NeilBrown <neilb@suse.de> wrote:
>
> > I might know what is happening.
> >
> > It is kind of complicated and involves the magic code in
> > block/blk-core.c:generic_make_request which turns recursive calls
> > into tail recursion.
> ...
> > This patch might fix it.  Maybe.  If it compiles.
>
> That appears to have solved it. Thanks so much. The delay in my
> response is because I wanted to give it some time to recur, as it
> didn't always manifest right away. Is there anything I should do
> regarding this patch in future versions? Is this patch already in, or
> planned to be in, future versions, or should users possibly keep the
> patch on hand in case it's still needed later? Is there any testing
> or anything that I need to do to get it included in future versions?

Thanks a lot. I will probably submit this fix to Linus shortly after
3.3 is out, with a request for it to be included in other -stable
releases.

You seem to be using kernels from Red Hat. If you want them to include
the patch, you should probably raise it as an issue with them.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread
end of thread, other threads:[~2012-03-13  0:16 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-01  0:44 debugging md2_resync hang at raise_barrier Ray Morris
2012-03-01  1:34 ` NeilBrown
2012-03-12 18:46   ` Ray Morris
2012-03-13  0:16     ` NeilBrown
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).