From: Tim Small <tim@seoss.co.uk>
To: Neil Brown <neilb@suse.de>
Cc: "linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>
Subject: Re: Deadlock in md barrier code? / RAID1 / LVM CoW snapshot + ext3 / Debian 5.0 - lenny 2.6.26 kernel
Date: Tue, 19 Oct 2010 17:24:33 +0100 [thread overview]
Message-ID: <4CBDC641.9090009@seoss.co.uk> (raw)
In-Reply-To: <20101019171650.20c83e7e@notabene>
On 19/10/10 07:16, Neil Brown wrote:
>
>> ... any use?
>>
> Only in as much as that it excludes some possible avenues of investigation,
> which has some value.
>
> But I'm out of ideas. I cannot think of anything that could keep
> ->nr_pending or ->barrier elevated such that raise_barrier would
> block for a long time.
>
> Maybe you could try adding printks to print
> conf->{barrier,nr_pending,nr_waiting,nr_queued}
> just before and after the second wait_event_lock_irq in raise_barrier().
>
>
run1:
[ 130.168389] Pre wait barrier: 1, pend: 30, wait: 0, queued: 0
[ 130.209456] Post wait barrier: 1, pend: 0, wait: 4, queued: 0
[ 130.228814] Pre wait barrier: 1, pend: 82, wait: 0, queued: 0
[ 130.267475] Post wait barrier: 1, pend: 0, wait: 5, queued: 0
[ 130.290556] Pre wait barrier: 1, pend: 57, wait: 0, queued: 0
[ 130.348259] Post wait barrier: 1, pend: 0, wait: 4, queued: 0
[ 130.376040] Pre wait barrier: 1, pend: 42, wait: 0, queued: 0
[ 130.414694] Post wait barrier: 1, pend: 0, wait: 5, queued: 0
[ 130.438693] Pre wait barrier: 1, pend: 46, wait: 0, queued: 0
[ 130.490211] Post wait barrier: 1, pend: 0, wait: 4, queued: 0
[ 130.510371] Pre wait barrier: 1, pend: 57, wait: 0, queued: 0
[ 130.551321] Post wait barrier: 1, pend: 0, wait: 4, queued: 0
[ 130.564969] Pre wait barrier: 1, pend: 16, wait: 0, queued: 0
<hang>
run2:
[ 155.947067] Pre wait barrier: 1, pend: 8, wait: 0, queued: 0
[ 156.000027] Post wait barrier: 1, pend: 0, wait: 5, queued: 0
[ 156.011959] Pre wait barrier: 1, pend: 13, wait: 0, queued: 0
[ 156.059050] Post wait barrier: 1, pend: 0, wait: 5, queued: 0
[ 156.076866] Pre wait barrier: 1, pend: 13, wait: 0, queued: 0
[ 156.108234] Post wait barrier: 1, pend: 0, wait: 4, queued: 0
[ 156.118437] Pre wait barrier: 1, pend: 93, wait: 0, queued: 0
[ 156.140136] Post wait barrier: 1, pend: 0, wait: 4, queued: 0
[ 156.140506] Pre wait barrier: 1, pend: 4, wait: 0, queued: 0
[ 156.157644] Post wait barrier: 1, pend: 0, wait: 3, queued: 0
[ 156.260883] Pre wait barrier: 1, pend: 51, wait: 0, queued: 0
<hang>
run3:
[ 113.862479] Pre wait barrier: 1, pend: 6, wait: 0, queued: 0
[ 113.892445] Post wait barrier: 1, pend: 0, wait: 2, queued: 0
[ 113.912622] Pre wait barrier: 1, pend: 6, wait: 0, queued: 0
[ 113.921417] Post wait barrier: 1, pend: 0, wait: 2, queued: 0
[ 113.947774] Pre wait barrier: 1, pend: 22, wait: 0, queued: 0
[ 113.970322] Post wait barrier: 1, pend: 0, wait: 2, queued: 0
[ 114.024782] Pre wait barrier: 1, pend: 2, wait: 0, queued: 0
[ 114.039825] Post wait barrier: 1, pend: 0, wait: 2, queued: 0
[ 114.055788] Pre wait barrier: 1, pend: 33, wait: 0, queued: 0
[ 114.071059] Post wait barrier: 1, pend: 0, wait: 2, queued: 0
[ 114.076240] Pre wait barrier: 1, pend: 5, wait: 0, queued: 0
[ 114.103378] Post wait barrier: 1, pend: 0, wait: 2, queued: 0
[ 114.118969] Pre wait barrier: 1, pend: 27, wait: 0, queued: 0
<hang>
md2 : active raid1 sda6[0] sdb6[1]
404600128 blocks [2/2] [UU]
[>....................] resync = 0.5% (2246720/404600128)
finish=256573.1min speed=26K/sec
I then stuck a few more printks in both wait_event_lock_irq and
raise_barrier,
static void raise_barrier(conf_t *conf)
{
spin_lock_irq(&conf->resync_lock);
/* Wait until no block IO is waiting */
printk(KERN_DEBUG "%i: Pre wait until no IO waiting: %d, pend:
%d, wait: %d, queued: %d\n", current->pid, conf->barrier,
conf->nr_pending, conf->nr_waiting, conf->nr_queued);
wait_event_lock_irq(conf->wait_barrier, !conf->nr_waiting,
conf->resync_lock,
raid1_unplug(conf->mddev->queue));
printk(KERN_DEBUG "%i: Post wait until no IO waiting: %d, pend:
%d, wait: %d, queued: %d\n", current->pid, conf->barrier,
conf->nr_pending, conf->nr_waiting, conf->nr_queued);
/* block any new IO from starting */
conf->barrier++;
/* No wait for all pending IO to complete */
printk(KERN_DEBUG "%i: Pre wait pending to complete: %d, pend:
%d, wait: %d, queued: %d\n", current->pid, conf->barrier,
conf->nr_pending, conf->nr_waiting, conf->nr_queued);
wait_event_lock_irq(conf->wait_barrier,
!conf->nr_pending && conf->barrier <
RESYNC_DEPTH,
conf->resync_lock,
raid1_unplug(conf->mddev->queue));
printk(KERN_DEBUG "%i: Post wait pending to complete: %d, pend:
%d, wait: %d, queued: %d\n", current->pid, conf->barrier,
conf->nr_pending, conf->nr_waiting, conf->nr_queued);
spin_unlock_irq(&conf->resync_lock);
}
#define __wait_event_lock_irq(wq, condition, lock, cmd) \
do { \
wait_queue_t __wait; \
int ourpid; \
ourpid = current->pid; \
printk(KERN_DEBUG "%i is pid in wait_event_lock_irq.\n",
ourpid); \
init_waitqueue_entry(&__wait, current); \
\
add_wait_queue(&wq, &__wait); \
for (;;) { \
set_current_state(TASK_UNINTERRUPTIBLE); \
if (condition) \
break; \
spin_unlock_irq(&lock); \
printk(KERN_DEBUG "%i Calling cmd\n",
ourpid); \
cmd; \
printk(KERN_DEBUG "%i Returned from cmd\n",
ourpid); \
schedule(); \
spin_lock_irq(&lock); \
} \
if (ourpid != (current->pid)) \
printk(KERN_DEBUG "PID changed! Old: %i, new: %i\n",
ourpid, current->pid); \
current->state = TASK_RUNNING; \
remove_wait_queue(&wq, &__wait); \
} while (0)
and got this:
[ 112.240838] 2410 Returned from cmd
[ 112.240838] 2410 Calling cmd
[ 112.240838] 2410 Returned from cmd
[ 112.240838] 2410 Calling cmd
[ 112.240838] 2410 Returned from cmd
[ 112.240838] 2410 Calling cmd
[ 112.240838] 2410 Returned from cmd
[ 112.240838] 2410 Calling cmd
[ 112.240838] 2410 Returned from cmd
[ 112.240838] 2410 Calling cmd
[ 112.240838] 2410 Returned from cmd
[ 112.240838] 2410 Calling cmd
[ 112.240838] 2410 Returned from cmd
[ 112.240838] 2410 Calling cmd
[ 112.240838] 2410 Returned from cmd
[ 112.240838] 2410 Calling cmd
[ 112.240838] 2410 Returned from cmd
[ 112.240838] 2410 Calling cmd
[ 112.240838] 2410 Returned from cmd
[ 112.240838] 2410 Calling cmd
[ 112.240838] 2410 Returned from cmd
[ 112.240838] 2410: Post wait until no IO waiting: 0, pend: 448, wait:
0, queued: 0
[ 112.240838] 2410: Pre wait pending to complete: 1, pend: 448, wait:
0, queued: 0
[ 112.240838] 2410 is pid in wait_event_lock_irq.
[ 112.240838] 2410 Calling cmd
[ 112.243066] 5950 is pid in wait_event_lock_irq.
[ 112.243066] 5950 Calling cmd
[ 112.243066] 5950 Returned from cmd
[ 112.240838] 2410 Returned from cmd
[ 112.243066] 2430 is pid in wait_event_lock_irq.
[ 112.243066] 2430 Calling cmd
[ 112.243066] 2430 Returned from cmd
[ 112.257942] 2430 Calling cmd
[ 112.258447] 2410 Calling cmd
[ 112.258447] 5950 Calling cmd
[ 112.258453] 2410 Returned from cmd
[ 112.258447] 5950 Returned from cmd
[ 112.257942] 2430 Returned from cmd
[ 112.258447] 5745 is pid in wait_event_lock_irq.
[ 112.258451] 5745 Calling cmd
[ 112.258454] 5745 Returned from cmd
[ 112.262495] 5745 Calling cmd
[ 112.262495] 2430 Calling cmd
[ 112.262495] 2410 Calling cmd
[ 112.262495] 5745 Returned from cmd
[ 112.262495] 2430 Returned from cmd
[ 112.262495] 2410 Returned from cmd
[ 112.262495] 5950 Calling cmd
[ 112.262495] 5950 Returned from cmd
[ 112.263422] 2430 Calling cmd
[ 112.263422] 2430 Returned from cmd
[ 112.264770] 2410 Calling cmd
[ 112.264772] 2410 Returned from cmd
--
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53 http://seoss.co.uk/ +44-(0)1273-808309
next prev parent reply other threads:[~2010-10-19 16:24 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-09-17 14:53 Deadlock in md barrier code? / RAID1 / LVM CoW snapshot + ext3 / Debian 5.0 - lenny 2.6.26 kernel Tim Small
2010-09-17 22:59 ` Neil Brown
2010-09-20 19:59 ` Tim Small
2010-09-21 21:02 ` Tim Small
2010-09-21 22:30 ` Neil Brown
2010-10-12 13:59 ` Tim Small
2010-10-12 14:06 ` Tim Small
2010-10-12 16:48 ` CoolCold
2010-10-13 8:51 ` Tim Small
2010-10-13 13:00 ` CoolCold
2010-10-18 18:52 ` Tim Small
2010-10-19 6:16 ` Neil Brown
2010-10-19 16:24 ` Tim Small [this message]
2010-10-19 16:29 ` Tim Small
2010-10-19 19:29 ` Tim Small
2010-10-20 20:34 ` Tim Small
2010-10-20 23:04 ` Neil Brown
2010-11-18 18:04 ` Tim Small
2010-11-21 23:05 ` Neil Brown
2010-12-06 15:42 ` Tim Small
2010-09-21 22:21 ` Neil Brown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4CBDC641.9090009@seoss.co.uk \
--to=tim@seoss.co.uk \
--cc=linux-raid@vger.kernel.org \
--cc=neilb@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).