From: Tim Small <tim@seoss.co.uk>
To: Neil Brown <neilb@suse.de>
Cc: "linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>
Subject: Re: Deadlock in md barrier code? / RAID1 / LVM CoW snapshot + ext3 / Debian 5.0 - lenny 2.6.26 kernel
Date: Tue, 19 Oct 2010 17:24:33 +0100 [thread overview]
Message-ID: <4CBDC641.9090009@seoss.co.uk> (raw)
In-Reply-To: <20101019171650.20c83e7e@notabene>
On 19/10/10 07:16, Neil Brown wrote:
>
>> ... any use?
>>
> Only in as much as that it excludes some possible avenues of investigation,
> which has some value.
>
> But I'm out of ideas. I cannot think of anything that could keep
> ->nr_pending or ->barrier elevated such that raise_barrier would
> block for a long time.
>
> Maybe you could try adding printks to print
> conf->{barrier,nr_pending,nr_waiting,nr_queued}
> just before and after the second wait_event_lock_irq in raise_barrier().
>
>
run1:
[ 130.168389] Pre wait barrier: 1, pend: 30, wait: 0, queued: 0
[ 130.209456] Post wait barrier: 1, pend: 0, wait: 4, queued: 0
[ 130.228814] Pre wait barrier: 1, pend: 82, wait: 0, queued: 0
[ 130.267475] Post wait barrier: 1, pend: 0, wait: 5, queued: 0
[ 130.290556] Pre wait barrier: 1, pend: 57, wait: 0, queued: 0
[ 130.348259] Post wait barrier: 1, pend: 0, wait: 4, queued: 0
[ 130.376040] Pre wait barrier: 1, pend: 42, wait: 0, queued: 0
[ 130.414694] Post wait barrier: 1, pend: 0, wait: 5, queued: 0
[ 130.438693] Pre wait barrier: 1, pend: 46, wait: 0, queued: 0
[ 130.490211] Post wait barrier: 1, pend: 0, wait: 4, queued: 0
[ 130.510371] Pre wait barrier: 1, pend: 57, wait: 0, queued: 0
[ 130.551321] Post wait barrier: 1, pend: 0, wait: 4, queued: 0
[ 130.564969] Pre wait barrier: 1, pend: 16, wait: 0, queued: 0
<hang>
run2:
[ 155.947067] Pre wait barrier: 1, pend: 8, wait: 0, queued: 0
[ 156.000027] Post wait barrier: 1, pend: 0, wait: 5, queued: 0
[ 156.011959] Pre wait barrier: 1, pend: 13, wait: 0, queued: 0
[ 156.059050] Post wait barrier: 1, pend: 0, wait: 5, queued: 0
[ 156.076866] Pre wait barrier: 1, pend: 13, wait: 0, queued: 0
[ 156.108234] Post wait barrier: 1, pend: 0, wait: 4, queued: 0
[ 156.118437] Pre wait barrier: 1, pend: 93, wait: 0, queued: 0
[ 156.140136] Post wait barrier: 1, pend: 0, wait: 4, queued: 0
[ 156.140506] Pre wait barrier: 1, pend: 4, wait: 0, queued: 0
[ 156.157644] Post wait barrier: 1, pend: 0, wait: 3, queued: 0
[ 156.260883] Pre wait barrier: 1, pend: 51, wait: 0, queued: 0
<hang>
run3:
[ 113.862479] Pre wait barrier: 1, pend: 6, wait: 0, queued: 0
[ 113.892445] Post wait barrier: 1, pend: 0, wait: 2, queued: 0
[ 113.912622] Pre wait barrier: 1, pend: 6, wait: 0, queued: 0
[ 113.921417] Post wait barrier: 1, pend: 0, wait: 2, queued: 0
[ 113.947774] Pre wait barrier: 1, pend: 22, wait: 0, queued: 0
[ 113.970322] Post wait barrier: 1, pend: 0, wait: 2, queued: 0
[ 114.024782] Pre wait barrier: 1, pend: 2, wait: 0, queued: 0
[ 114.039825] Post wait barrier: 1, pend: 0, wait: 2, queued: 0
[ 114.055788] Pre wait barrier: 1, pend: 33, wait: 0, queued: 0
[ 114.071059] Post wait barrier: 1, pend: 0, wait: 2, queued: 0
[ 114.076240] Pre wait barrier: 1, pend: 5, wait: 0, queued: 0
[ 114.103378] Post wait barrier: 1, pend: 0, wait: 2, queued: 0
[ 114.118969] Pre wait barrier: 1, pend: 27, wait: 0, queued: 0
<hang>
md2 : active raid1 sda6[0] sdb6[1]
404600128 blocks [2/2] [UU]
[>....................] resync = 0.5% (2246720/404600128)
finish=256573.1min speed=26K/sec
I then stuck a few more printks in both wait_event_lock_irq and
raise_barrier,
static void raise_barrier(conf_t *conf)
{
spin_lock_irq(&conf->resync_lock);
/* Wait until no block IO is waiting */
printk(KERN_DEBUG "%i: Pre wait until no IO waiting: %d, pend:
%d, wait: %d, queued: %d\n", current->pid, conf->barrier,
conf->nr_pending, conf->nr_waiting, conf->nr_queued);
wait_event_lock_irq(conf->wait_barrier, !conf->nr_waiting,
conf->resync_lock,
raid1_unplug(conf->mddev->queue));
printk(KERN_DEBUG "%i: Post wait until no IO waiting: %d, pend:
%d, wait: %d, queued: %d\n", current->pid, conf->barrier,
conf->nr_pending, conf->nr_waiting, conf->nr_queued);
/* block any new IO from starting */
conf->barrier++;
/* No wait for all pending IO to complete */
printk(KERN_DEBUG "%i: Pre wait pending to complete: %d, pend:
%d, wait: %d, queued: %d\n", current->pid, conf->barrier,
conf->nr_pending, conf->nr_waiting, conf->nr_queued);
wait_event_lock_irq(conf->wait_barrier,
!conf->nr_pending && conf->barrier <
RESYNC_DEPTH,
conf->resync_lock,
raid1_unplug(conf->mddev->queue));
printk(KERN_DEBUG "%i: Post wait pending to complete: %d, pend:
%d, wait: %d, queued: %d\n", current->pid, conf->barrier,
conf->nr_pending, conf->nr_waiting, conf->nr_queued);
spin_unlock_irq(&conf->resync_lock);
}
#define __wait_event_lock_irq(wq, condition, lock, cmd) \
do { \
wait_queue_t __wait; \
int ourpid; \
ourpid = current->pid; \
printk(KERN_DEBUG "%i is pid in wait_event_lock_irq.\n",
ourpid); \
init_waitqueue_entry(&__wait, current); \
\
add_wait_queue(&wq, &__wait); \
for (;;) { \
set_current_state(TASK_UNINTERRUPTIBLE); \
if (condition) \
break; \
spin_unlock_irq(&lock); \
printk(KERN_DEBUG "%i Calling cmd\n",
ourpid); \
cmd; \
printk(KERN_DEBUG "%i Returned from cmd\n",
ourpid); \
schedule(); \
spin_lock_irq(&lock); \
} \
if (ourpid != (current->pid)) \
printk(KERN_DEBUG "PID changed! Old: %i, new: %i\n",
ourpid, current->pid); \
current->state = TASK_RUNNING; \
remove_wait_queue(&wq, &__wait); \
} while (0)
and got this:
[ 112.240838] 2410 Returned from cmd
[ 112.240838] 2410 Calling cmd
[ 112.240838] 2410 Returned from cmd
[ 112.240838] 2410 Calling cmd
[ 112.240838] 2410 Returned from cmd
[ 112.240838] 2410 Calling cmd
[ 112.240838] 2410 Returned from cmd
[ 112.240838] 2410 Calling cmd
[ 112.240838] 2410 Returned from cmd
[ 112.240838] 2410 Calling cmd
[ 112.240838] 2410 Returned from cmd
[ 112.240838] 2410 Calling cmd
[ 112.240838] 2410 Returned from cmd
[ 112.240838] 2410 Calling cmd
[ 112.240838] 2410 Returned from cmd
[ 112.240838] 2410 Calling cmd
[ 112.240838] 2410 Returned from cmd
[ 112.240838] 2410 Calling cmd
[ 112.240838] 2410 Returned from cmd
[ 112.240838] 2410 Calling cmd
[ 112.240838] 2410 Returned from cmd
[ 112.240838] 2410: Post wait until no IO waiting: 0, pend: 448, wait:
0, queued: 0
[ 112.240838] 2410: Pre wait pending to complete: 1, pend: 448, wait:
0, queued: 0
[ 112.240838] 2410 is pid in wait_event_lock_irq.
[ 112.240838] 2410 Calling cmd
[ 112.243066] 5950 is pid in wait_event_lock_irq.
[ 112.243066] 5950 Calling cmd
[ 112.243066] 5950 Returned from cmd
[ 112.240838] 2410 Returned from cmd
[ 112.243066] 2430 is pid in wait_event_lock_irq.
[ 112.243066] 2430 Calling cmd
[ 112.243066] 2430 Returned from cmd
[ 112.257942] 2430 Calling cmd
[ 112.258447] 2410 Calling cmd
[ 112.258447] 5950 Calling cmd
[ 112.258453] 2410 Returned from cmd
[ 112.258447] 5950 Returned from cmd
[ 112.257942] 2430 Returned from cmd
[ 112.258447] 5745 is pid in wait_event_lock_irq.
[ 112.258451] 5745 Calling cmd
[ 112.258454] 5745 Returned from cmd
[ 112.262495] 5745 Calling cmd
[ 112.262495] 2430 Calling cmd
[ 112.262495] 2410 Calling cmd
[ 112.262495] 5745 Returned from cmd
[ 112.262495] 2430 Returned from cmd
[ 112.262495] 2410 Returned from cmd
[ 112.262495] 5950 Calling cmd
[ 112.262495] 5950 Returned from cmd
[ 112.263422] 2430 Calling cmd
[ 112.263422] 2430 Returned from cmd
[ 112.264770] 2410 Calling cmd
[ 112.264772] 2410 Returned from cmd
--
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53 http://seoss.co.uk/ +44-(0)1273-808309
next prev parent reply other threads:[~2010-10-19 16:24 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-09-17 14:53 Deadlock in md barrier code? / RAID1 / LVM CoW snapshot + ext3 / Debian 5.0 - lenny 2.6.26 kernel Tim Small
2010-09-17 22:59 ` Neil Brown
2010-09-20 19:59 ` Tim Small
2010-09-21 21:02 ` Tim Small
2010-09-21 22:30 ` Neil Brown
2010-10-12 13:59 ` Tim Small
2010-10-12 14:06 ` Tim Small
2010-10-12 16:48 ` CoolCold
2010-10-13 8:51 ` Tim Small
2010-10-13 13:00 ` CoolCold
2010-10-18 18:52 ` Tim Small
2010-10-19 6:16 ` Neil Brown
2010-10-19 16:24 ` Tim Small [this message]
2010-10-19 16:29 ` Tim Small
2010-10-19 19:29 ` Tim Small
2010-10-20 20:34 ` Tim Small
2010-10-20 23:04 ` Neil Brown
2010-11-18 18:04 ` Tim Small
2010-11-21 23:05 ` Neil Brown
2010-12-06 15:42 ` Tim Small
2010-09-21 22:21 ` Neil Brown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4CBDC641.9090009@seoss.co.uk \
--to=tim@seoss.co.uk \
--cc=linux-raid@vger.kernel.org \
--cc=neilb@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.