linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tim Small <tim@seoss.co.uk>
To: Neil Brown <neilb@suse.de>
Cc: "linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>
Subject: Re: Deadlock in md barrier code? / RAID1 / LVM CoW snapshot + ext3 / Debian 5.0 - lenny 2.6.26 kernel
Date: Tue, 19 Oct 2010 17:24:33 +0100	[thread overview]
Message-ID: <4CBDC641.9090009@seoss.co.uk> (raw)
In-Reply-To: <20101019171650.20c83e7e@notabene>

On 19/10/10 07:16, Neil Brown wrote:
>
>> ... any use?
>>      
> Only in as much as that it excludes some possible avenues of investigation,
> which has some value.
>
> But I'm out of ideas.  I cannot think of anything that could keep
> ->nr_pending or ->barrier elevated such that raise_barrier would
> block for a long time.
>
> Maybe you could try adding printks to print
>    conf->{barrier,nr_pending,nr_waiting,nr_queued}
> just before and after the second wait_event_lock_irq in raise_barrier().
>
>    

run1:

[  130.168389] Pre wait barrier: 1, pend: 30, wait: 0, queued: 0
[  130.209456] Post wait barrier: 1, pend: 0, wait: 4, queued: 0
[  130.228814] Pre wait barrier: 1, pend: 82, wait: 0, queued: 0
[  130.267475] Post wait barrier: 1, pend: 0, wait: 5, queued: 0
[  130.290556] Pre wait barrier: 1, pend: 57, wait: 0, queued: 0
[  130.348259] Post wait barrier: 1, pend: 0, wait: 4, queued: 0
[  130.376040] Pre wait barrier: 1, pend: 42, wait: 0, queued: 0
[  130.414694] Post wait barrier: 1, pend: 0, wait: 5, queued: 0
[  130.438693] Pre wait barrier: 1, pend: 46, wait: 0, queued: 0
[  130.490211] Post wait barrier: 1, pend: 0, wait: 4, queued: 0
[  130.510371] Pre wait barrier: 1, pend: 57, wait: 0, queued: 0
[  130.551321] Post wait barrier: 1, pend: 0, wait: 4, queued: 0
[  130.564969] Pre wait barrier: 1, pend: 16, wait: 0, queued: 0
<hang>

run2:

[  155.947067] Pre wait barrier: 1, pend: 8, wait: 0, queued: 0
[  156.000027] Post wait barrier: 1, pend: 0, wait: 5, queued: 0
[  156.011959] Pre wait barrier: 1, pend: 13, wait: 0, queued: 0
[  156.059050] Post wait barrier: 1, pend: 0, wait: 5, queued: 0
[  156.076866] Pre wait barrier: 1, pend: 13, wait: 0, queued: 0
[  156.108234] Post wait barrier: 1, pend: 0, wait: 4, queued: 0
[  156.118437] Pre wait barrier: 1, pend: 93, wait: 0, queued: 0
[  156.140136] Post wait barrier: 1, pend: 0, wait: 4, queued: 0
[  156.140506] Pre wait barrier: 1, pend: 4, wait: 0, queued: 0
[  156.157644] Post wait barrier: 1, pend: 0, wait: 3, queued: 0
[  156.260883] Pre wait barrier: 1, pend: 51, wait: 0, queued: 0
<hang>

run3:

[  113.862479] Pre wait barrier: 1, pend: 6, wait: 0, queued: 0
[  113.892445] Post wait barrier: 1, pend: 0, wait: 2, queued: 0
[  113.912622] Pre wait barrier: 1, pend: 6, wait: 0, queued: 0
[  113.921417] Post wait barrier: 1, pend: 0, wait: 2, queued: 0
[  113.947774] Pre wait barrier: 1, pend: 22, wait: 0, queued: 0
[  113.970322] Post wait barrier: 1, pend: 0, wait: 2, queued: 0
[  114.024782] Pre wait barrier: 1, pend: 2, wait: 0, queued: 0
[  114.039825] Post wait barrier: 1, pend: 0, wait: 2, queued: 0
[  114.055788] Pre wait barrier: 1, pend: 33, wait: 0, queued: 0
[  114.071059] Post wait barrier: 1, pend: 0, wait: 2, queued: 0
[  114.076240] Pre wait barrier: 1, pend: 5, wait: 0, queued: 0
[  114.103378] Post wait barrier: 1, pend: 0, wait: 2, queued: 0
[  114.118969] Pre wait barrier: 1, pend: 27, wait: 0, queued: 0
<hang>


md2 : active raid1 sda6[0] sdb6[1]
       404600128 blocks [2/2] [UU]
       [>....................]  resync =  0.5% (2246720/404600128) 
finish=256573.1min speed=26K/sec




I then stuck a few more printks in both wait_event_lock_irq and 
raise_barrier,

static void raise_barrier(conf_t *conf)
{
         spin_lock_irq(&conf->resync_lock);

         /* Wait until no block IO is waiting */
         printk(KERN_DEBUG "%i: Pre wait until no IO waiting: %d, pend: 
%d, wait: %d, queued: %d\n", current->pid, conf->barrier, 
conf->nr_pending, conf->nr_waiting, conf->nr_queued);
         wait_event_lock_irq(conf->wait_barrier, !conf->nr_waiting,
                             conf->resync_lock,
                             raid1_unplug(conf->mddev->queue));
         printk(KERN_DEBUG "%i: Post wait until no IO waiting: %d, pend: 
%d, wait: %d, queued: %d\n", current->pid, conf->barrier, 
conf->nr_pending, conf->nr_waiting, conf->nr_queued);

         /* block any new IO from starting */
         conf->barrier++;

         /* No wait for all pending IO to complete */
         printk(KERN_DEBUG "%i: Pre wait pending to complete: %d, pend: 
%d, wait: %d, queued: %d\n", current->pid, conf->barrier, 
conf->nr_pending, conf->nr_waiting, conf->nr_queued);
         wait_event_lock_irq(conf->wait_barrier,
                             !conf->nr_pending && conf->barrier < 
RESYNC_DEPTH,
                             conf->resync_lock,
                             raid1_unplug(conf->mddev->queue));
         printk(KERN_DEBUG "%i: Post wait pending to complete: %d, pend: 
%d, wait: %d, queued: %d\n", current->pid, conf->barrier, 
conf->nr_pending, conf->nr_waiting, conf->nr_queued);

         spin_unlock_irq(&conf->resync_lock);
}

#define __wait_event_lock_irq(wq, condition, lock, cmd)                 \
do {                                                                    \
         wait_queue_t __wait;                                            \
         int ourpid;                                                     \
         ourpid = current->pid;                                          \
         printk(KERN_DEBUG "%i is pid in wait_event_lock_irq.\n", 
ourpid);      \
         init_waitqueue_entry(&__wait, current);                         \
                                                                         \
         add_wait_queue(&wq, &__wait);                                   \
         for (;;) {                                                      \
                 set_current_state(TASK_UNINTERRUPTIBLE);                \
                 if (condition)                                          \
                         break;                                          \
                 spin_unlock_irq(&lock);                                 \
                 printk(KERN_DEBUG "%i Calling cmd\n", 
ourpid);                  \
                 cmd;                                                    \
                 printk(KERN_DEBUG "%i Returned from cmd\n", 
ourpid);                    \
                 schedule();                                             \
                 spin_lock_irq(&lock);                                   \
         }                                                               \
         if (ourpid != (current->pid))                                   \
                 printk(KERN_DEBUG "PID changed!  Old: %i, new: %i\n", 
ourpid, current->pid); \
         current->state = TASK_RUNNING;                                  \
         remove_wait_queue(&wq, &__wait);                                \
} while (0)




  and got this:



[  112.240838] 2410 Returned from cmd
[  112.240838] 2410 Calling cmd
[  112.240838] 2410 Returned from cmd
[  112.240838] 2410 Calling cmd
[  112.240838] 2410 Returned from cmd
[  112.240838] 2410 Calling cmd
[  112.240838] 2410 Returned from cmd
[  112.240838] 2410 Calling cmd
[  112.240838] 2410 Returned from cmd
[  112.240838] 2410 Calling cmd
[  112.240838] 2410 Returned from cmd
[  112.240838] 2410 Calling cmd
[  112.240838] 2410 Returned from cmd
[  112.240838] 2410 Calling cmd
[  112.240838] 2410 Returned from cmd
[  112.240838] 2410 Calling cmd
[  112.240838] 2410 Returned from cmd
[  112.240838] 2410 Calling cmd
[  112.240838] 2410 Returned from cmd
[  112.240838] 2410 Calling cmd
[  112.240838] 2410 Returned from cmd
[  112.240838] 2410: Post wait until no IO waiting: 0, pend: 448, wait: 
0, queued: 0
[  112.240838] 2410: Pre wait pending to complete: 1, pend: 448, wait: 
0, queued: 0
[  112.240838] 2410 is pid in wait_event_lock_irq.
[  112.240838] 2410 Calling cmd
[  112.243066] 5950 is pid in wait_event_lock_irq.
[  112.243066] 5950 Calling cmd
[  112.243066] 5950 Returned from cmd
[  112.240838] 2410 Returned from cmd
[  112.243066] 2430 is pid in wait_event_lock_irq.
[  112.243066] 2430 Calling cmd
[  112.243066] 2430 Returned from cmd
[  112.257942] 2430 Calling cmd
[  112.258447] 2410 Calling cmd
[  112.258447] 5950 Calling cmd
[  112.258453] 2410 Returned from cmd
[  112.258447] 5950 Returned from cmd
[  112.257942] 2430 Returned from cmd
[  112.258447] 5745 is pid in wait_event_lock_irq.
[  112.258451] 5745 Calling cmd
[  112.258454] 5745 Returned from cmd
[  112.262495] 5745 Calling cmd
[  112.262495] 2430 Calling cmd
[  112.262495] 2410 Calling cmd
[  112.262495] 5745 Returned from cmd
[  112.262495] 2430 Returned from cmd
[  112.262495] 2410 Returned from cmd
[  112.262495] 5950 Calling cmd
[  112.262495] 5950 Returned from cmd
[  112.263422] 2430 Calling cmd
[  112.263422] 2430 Returned from cmd
[  112.264770] 2410 Calling cmd
[  112.264772] 2410 Returned from cmd






-- 
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53  http://seoss.co.uk/ +44-(0)1273-808309


  reply	other threads:[~2010-10-19 16:24 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-17 14:53 Deadlock in md barrier code? / RAID1 / LVM CoW snapshot + ext3 / Debian 5.0 - lenny 2.6.26 kernel Tim Small
2010-09-17 22:59 ` Neil Brown
2010-09-20 19:59   ` Tim Small
2010-09-21 21:02     ` Tim Small
2010-09-21 22:30       ` Neil Brown
2010-10-12 13:59         ` Tim Small
2010-10-12 14:06         ` Tim Small
2010-10-12 16:48           ` CoolCold
2010-10-13  8:51             ` Tim Small
2010-10-13 13:00               ` CoolCold
2010-10-18 18:52         ` Tim Small
2010-10-19  6:16           ` Neil Brown
2010-10-19 16:24             ` Tim Small [this message]
2010-10-19 16:29               ` Tim Small
2010-10-19 19:29                 ` Tim Small
2010-10-20 20:34                   ` Tim Small
2010-10-20 23:04                     ` Neil Brown
2010-11-18 18:04                       ` Tim Small
2010-11-21 23:05                         ` Neil Brown
2010-12-06 15:42                           ` Tim Small
2010-09-21 22:21     ` Neil Brown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4CBDC641.9090009@seoss.co.uk \
    --to=tim@seoss.co.uk \
    --cc=linux-raid@vger.kernel.org \
    --cc=neilb@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).