linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Neil Brown <neilb@suse.de>
To: Tim Small <tim@seoss.co.uk>
Cc: "linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>
Subject: Re: Deadlock in md barrier code? / RAID1 / LVM CoW snapshot + ext3 / Debian 5.0 - lenny 2.6.26 kernel
Date: Mon, 22 Nov 2010 10:05:09 +1100	[thread overview]
Message-ID: <20101122100509.76592b59@notabene.brown> (raw)
In-Reply-To: <4CE56AA9.2010905@seoss.co.uk>

On Thu, 18 Nov 2010 18:04:25 +0000
Tim Small <tim@seoss.co.uk> wrote:

> On 10/21/10 00:04, Neil Brown wrote:
> >
> > Maybe you could add a could of global atomic variables, one for reads and one
> > for writes.
> > Then on each call to generic_make_request in:
> >   flush_pending_writes, make_request, raid1d
> > increment one or the other depending on whether it is a read or a write.
> > Then in raid1_end_read_request and raid1_end_write_request decrement them
> > appropriately.
> >   
> 
> 
> Ended up with runs like this:
> 
> [  464.244109] 0 Obtaining resync_lock and disabling interrupts for
> allow_barrier
> [  464.244109] conf nr_pending is 5
> [  464.244109] 0 Released resync_lock and enabling interrupts for
> allow_barrier
> [  464.244109] 2216: Post wait until no IO waiting - barrier: 0, pend:
> 5, wait: 0, queued: 0
> [  464.244113] 2216: Pre wait pending to complete - barrier: 1, pend: 5,
> wait: 0, queued: 0
> [  464.244116] In raid1_unplug debug read count: 5 write count: 0
> conf->nr_queued: 0
> [  464.244469] 5113 Obtaining resync_lock and disabling interrupts for
> wait_barrier
> [  464.244469] In raid1_unplug debug read count: 5 write count: 0
> conf->nr_queued: 0
> [  464.244469] 5127 Obtaining resync_lock and disabling interrupts for
> allow_barrier
> [  464.244469] conf nr_pending is 4
> [  464.244469] 5127 Released resync_lock and enabling interrupts for
> allow_barrier
> [  464.244469] 5127 Obtaining resync_lock and disabling interrupts for
> allow_barrier
> [  464.246176] In raid1_unplug debug read count: 4 write count: 0
> conf->nr_queued: 0
> [  464.244469] conf nr_pending is 3
> [  464.244469] 5127 Released resync_lock and enabling interrupts for
> allow_barrier
> [  464.244469] 5127 Obtaining resync_lock and disabling interrupts for
> allow_barrier
> [  464.244469] conf nr_pending is 2
> [  464.244469] 5127 Released resync_lock and enabling interrupts for
> allow_barrier
> [  464.246176] In raid1_unplug debug read count: 2 write count: 0
> conf->nr_queued: 0
> [  464.244469] 5127 Obtaining resync_lock and disabling interrupts for
> allow_barrier
> [  464.244469] conf nr_pending is 1
> [  464.244469] 5127 Released resync_lock and enabling interrupts for
> allow_barrier
> [  464.246176] In raid1_unplug debug read count: 1 write count: 0
> conf->nr_queued: 0
> [  464.244469] 5127 Obtaining resync_lock and disabling interrupts for
> allow_barrier
> [  464.244469] conf nr_pending is 0
> [  464.244469] 5127 Released resync_lock and enabling interrupts for
> allow_barrier
> [  464.244469] In raid1_unplug debug read count: 0 write count: 0
> conf->nr_queued: 0
> [  464.246176] 2216: Post wait pending to complete - barrier: 1, pend:
> 0, wait: 1, queued: 0
> [  464.246176] In raid1_unplug debug read count: 0 write count: 0
> conf->nr_queued: 0
> [  464.246176] 2216: Pre wait until no IO waiting - barrier: 1, pend: 0,
> wait: 1, queued: 0
> [  464.246176] In raid1_unplug debug read count: 0 write count: 0
> conf->nr_queued: 0
> [  464.246176] 5118 Obtaining resync_lock and disabling interrupts for
> wait_barrier
> [  464.246176] In raid1_unplug debug read count: 0 write count: 0
> conf->nr_queued: 0
> [  464.246633] In raid1_unplug debug read count: 0 write count: 0
> conf->nr_queued: 1
> [  464.244469] In raid1_unplug debug read count: 0 write count: 0
> conf->nr_queued: 0
> [  464.246176] In raid1_unplug debug read count: 0 write count: 0
> conf->nr_queued: 0
> [  464.246990] 5118 Released resync_lock and enabling interrupts for
> wait_barrier
> [  464.244469] 5113 Released resync_lock and enabling interrupts for
> wait_barrier
> [  464.246990] 5118 Obtaining resync_lock and disabling interrupts for
> wait_barrier
> [  464.246990] 5118 Released resync_lock and enabling interrupts for
> wait_barrier
> [  464.246990] 5118 Obtaining resync_lock and disabling interrupts for
> wait_barrier
> [  464.246990] 5118 Released resync_lock and enabling interrupts for
> wait_barrier
> [  464.244469] In raid1_unplug debug read count: 3 write count: 0
> conf->nr_queued: 0
> [  464.246990] 5118 Obtaining resync_lock and disabling interrupts for
> wait_barrier
> [  464.246990] 5118 Released resync_lock and enabling interrupts for
> wait_barrier
> [  464.246990] 5118 Obtaining resync_lock and disabling interrupts for
> wait_barrier
> [  464.246990] 5118 Released resync_lock and enabling interrupts for
> wait_barrier
> [  464.246176] 2216: Post wait until no IO waiting - barrier: 0, pend:
> 6, wait: 0, queued: 0
> [  464.246176] 2216: Pre wait pending to complete - barrier: 1, pend: 6,
> wait: 0, queued: 0
> [  464.246176] In raid1_unplug debug read count: 5 write count: 0
> conf->nr_queued: 0
> [  464.246990] 5118 Obtaining resync_lock and disabling interrupts for
> wait_barrier
> [  464.246990] In raid1_unplug debug read count: 6 write count: 0
> conf->nr_queued: 0
> [  464.247091] 0 Obtaining resync_lock and disabling interrupts for
> allow_barrier
> [  464.247091] conf nr_pending is 5
> [  464.247091] 0 Released resync_lock and enabling interrupts for
> allow_barrier
> [  464.247091] In raid1_unplug debug read count: 5 write count: 0
> conf->nr_queued: 0
> [  464.244469] 5113 Obtaining resync_lock and disabling interrupts for
> wait_barrier
> [  464.246176] In raid1_unplug debug read count: 5 write count: 0
> conf->nr_queued: 0
> [  464.244469] In raid1_unplug debug read count: 5 write count: 0
> conf->nr_queued: 0
> [  464.328828] 3639 Obtaining resync_lock and disabling interrupts for
> wait_barrier
> [  464.328834] In raid1_unplug debug read count: 5 write count: 0
> conf->nr_queued: 0
> [  466.065137] 2479 Obtaining resync_lock and disabling interrupts for
> wait_barrier
> [  466.065137] In raid1_unplug debug read count: 5 write count: 0
> conf->nr_queued: 0
> [  467.051970] 2478 Obtaining resync_lock and disabling interrupts for
> wait_barrier
> [  467.051975] In raid1_unplug debug read count: 5 write count: 0
> conf->nr_queued: 0
> [  471.293667] In raid1_unplug debug read count: 5 write count: 0
> conf->nr_queued: 0
> [  471.525081] 220 Obtaining resync_lock and disabling interrupts for
> wait_barrier
> [  471.525081] In raid1_unplug debug read count: 5 write count: 0
> conf->nr_queued: 0
> [  505.308962] 2216 WARNING, schedule_timeout timed out for raise_barrier
> [  505.308076] 5118 WARNING, schedule_timeout timed out for wait_barrier
> 
> 
> ... so does that mean that there are are read requests going awol in the
> block layer?
> 
> If so, then the circumstantial evidence from when the lockups occur make
> it look to me like this is probably an OpenVZ bug, but I'll try and do
> some more digging tomorrow...
> 
> Does that make sense?

Yes.  Superficially, it appears that there are still 5 outstanding read
requests that are not being completed.   I cannot guess how OpenVZ would
cause that, but I don't really know much about OpenVZ.

NeilBrown


  reply	other threads:[~2010-11-21 23:05 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-17 14:53 Deadlock in md barrier code? / RAID1 / LVM CoW snapshot + ext3 / Debian 5.0 - lenny 2.6.26 kernel Tim Small
2010-09-17 22:59 ` Neil Brown
2010-09-20 19:59   ` Tim Small
2010-09-21 21:02     ` Tim Small
2010-09-21 22:30       ` Neil Brown
2010-10-12 13:59         ` Tim Small
2010-10-12 14:06         ` Tim Small
2010-10-12 16:48           ` CoolCold
2010-10-13  8:51             ` Tim Small
2010-10-13 13:00               ` CoolCold
2010-10-18 18:52         ` Tim Small
2010-10-19  6:16           ` Neil Brown
2010-10-19 16:24             ` Tim Small
2010-10-19 16:29               ` Tim Small
2010-10-19 19:29                 ` Tim Small
2010-10-20 20:34                   ` Tim Small
2010-10-20 23:04                     ` Neil Brown
2010-11-18 18:04                       ` Tim Small
2010-11-21 23:05                         ` Neil Brown [this message]
2010-12-06 15:42                           ` Tim Small
2010-09-21 22:21     ` Neil Brown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101122100509.76592b59@notabene.brown \
    --to=neilb@suse.de \
    --cc=linux-raid@vger.kernel.org \
    --cc=tim@seoss.co.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).