Re: Raid10 device hangs during resync and heavy I/O.

From: Neil Brown <neilb@suse.de>
To: Justin Bronder <jsbronder@gentoo.org>
Cc: linux-raid@vger.kernel.org
Subject: Re: Raid10 device hangs during resync and heavy I/O.
Date: Mon, 2 Aug 2010 12:29:49 +1000
Message-ID: <20100802122949.7bea3e7c@notabene>
In-Reply-To: <20100723154701.GA2090@gmail.com>

On Fri, 23 Jul 2010 11:47:01 -0400
Justin Bronder <jsbronder@gentoo.org> wrote:

> On 23/07/10 13:19 +1000, Neil Brown wrote:
> > On Thu, 22 Jul 2010 14:49:33 -0400

> > 
> > So the 'dd' process successfully waited for the barrier to be gone at
> > 189.021179, and thus set pending to '1'.  It then submitted the IO request.
> > We should then see swapper (or possibly some other thread) calling
> > allow_barrier when the request completes.  But we don't.
> > A request could possibly take many milliseconds to complete, but it shouldn't
> > take seconds and certainly not minutes.
> > 
> > It might be helpful if you could run this again, and in make_request(), after
> > the call to "wait_barrier()" print out:
> >   bio->bi_sector, bio->bi_size, bio->bi_rw
> > 
> > I'm guessing that the last request that doesn't seem to complete will be
> > different from the other in some important way.
> 
> Nothing stood out to me, but here's the tail end of a couple of different
> traces.

Thanks a lot!  Something does stand out for me!....

>            <...>-5047  [002]   207.051215: wait_barrier: out: dd - w:0 p:1 b:0
>            <...>-5047  [002]   207.051216: make_request: dd - sector:7472081 sz:20480 rw:0
>            <...>-4958  [003]   207.051218: raise_barrier: mid: md99_resync - w:0 p:1 b:1
>            <...>-5047  [002]   207.051227: wait_barrier: in:  dd - w:0 p:1 b:1
>           <idle>-0     [002]   207.058929: allow_barrier:     swapper - w:1 p:0 b:1
>            <...>-4958  [003]   207.058938: raise_barrier: out: md99_resync - w:1 p:0 b:1
>            <...>-4958  [003]   207.059044: raise_barrier: in:  md99_resync - w:1 p:0 b:1
>            <...>-4957  [003]   207.067171: lower_barrier:     md99_raid10 - w:1 p:0 b:0
>            <...>-5047  [002]   207.067179: wait_barrier: out: dd - w:0 p:1 b:0
>            <...>-5047  [002]   207.067180: make_request: dd - sector:7472121 sz:3584 rw:0
>            <...>-4958  [003]   207.067182: raise_barrier: mid: md99_resync - w:0 p:1 b:1
>            <...>-5047  [002]   207.067184: wait_barrier: in:  dd - w:0 p:1 b:1

The last successful IO is only 3584 bytes (7 sectors).  All the others are
much larger.
It is almost certain that the read needed to cross a chunk boundary, so some
of it goes to one device and some to the next.  It was probably a 64K read or
similar.
The first 5 pages all fit on one device, and so go through as a 20K read.
The next page doesn't, so it comes down to md/raid10 as a 1-page read.
raid10 splits it into a 7-sector read and a 1-sector read.  We see the
7-sector read being initiated, but it doesn't complete for some reason, and
the resync barrier gets in the way, so the 1-sector read blocks in
wait_barrier.

In the next trace....
> 
> 
> 
>           <idle>-0     [000]   463.231730: allow_barrier:     swapper - w:2 p:4 b:1
>           <idle>-0     [000]   463.231739: allow_barrier:     swapper - w:2 p:3 b:1
>           <idle>-0     [000]   463.231746: allow_barrier:     swapper - w:2 p:2 b:1
>           <idle>-0     [000]   463.231765: allow_barrier:     swapper - w:2 p:1 b:1
>           <idle>-0     [000]   463.231774: allow_barrier:     swapper - w:2 p:0 b:1
>            <...>-5004  [000]   463.231792: raise_barrier: out: md99_resync - w:2 p:0 b:1
>            <...>-5004  [000]   463.232005: raise_barrier: in:  md99_resync - w:2 p:0 b:1
>            <...>-5003  [001]   463.232453: lower_barrier:     md99_raid10 - w:2 p:0 b:0
>            <...>-5009  [000]   463.232463: wait_barrier: out: flush-9:99 - w:1 p:1 b:0
>            <...>-5009  [000]   463.232464: make_request: flush-9:99 - sector:13931137 sz:61440 rw:1
>            <...>-5105  [001]   463.232466: wait_barrier: out: dd - w:0 p:2 b:0
>            <...>-5105  [001]   463.232467: make_request: dd - sector:7204393 sz:40960 rw:0
>            <...>-5009  [000]   463.232476: wait_barrier: in:  flush-9:99 - w:0 p:2 b:0
>            <...>-5009  [000]   463.232477: wait_barrier: out: flush-9:99 - w:0 p:3 b:0
>            <...>-5009  [000]   463.232477: make_request: flush-9:99 - sector:13931257 sz:3584 rw:1
>            <...>-5009  [000]   463.232481: wait_barrier: in:  flush-9:99 - w:0 p:3 b:0
>            <...>-5009  [000]   463.232482: wait_barrier: out: flush-9:99 - w:0 p:4 b:0
>            <...>-5009  [000]   463.232483: make_request: flush-9:99 - sector:13931264 sz:512 rw:1
>            <...>-5105  [001]   463.232492: wait_barrier: in:  dd - w:0 p:4 b:0
>            <...>-5105  [001]   463.232493: wait_barrier: out: dd - w:0 p:5 b:0

We see a similar thing with a write being broken into a 15-page write, then
a 7-sector write, then a 1-sector write; that all works.

>            <...>-5105  [001]   463.232494: make_request: dd - sector:7204473 sz:3584 rw:0
>            <...>-5004  [000]   463.232495: raise_barrier: mid: md99_resync - w:0 p:5 b:1
>            <...>-5105  [001]   463.232496: wait_barrier: in:  dd - w:0 p:5 b:1
>            <...>-5009  [000]   463.232522: wait_barrier: in:  flush-9:99 - w:1 p:5 b:1
>           <idle>-0     [000]   463.232726: allow_barrier:     swapper - w:2 p:4 b:1
>           <idle>-0     [001]   463.240520: allow_barrier:     swapper - w:2 p:3 b:1
>           <idle>-0     [000]   463.240946: allow_barrier:     swapper - w:2 p:2 b:1
>           <idle>-0     [000]   463.240955: allow_barrier:     swapper - w:2 p:1 b:1
> 

But again we see a 7-sector read following a larger read, and the 1-sector
read that should follow gets blocked.

So it is somehow related to the need to split one-page requests across
multiple devices, and it could be specific to read requests.

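Ahhhh.... I see the problem.  Because a 'generic_make_request' is already
active, the one called by raid10::make_request just queues the request until
the top-level one completes.  This results in a deadlock: the queued 7-sector
read never reaches the member device, so 'pending' never drops to zero, so
the resync thread never lowers the barrier, so the 1-sector read waits in
wait_barrier forever.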

I'll have to ponder a bit to figure out the best way to fix this.

Thanks again for the report and the help tracking down the problem.

NeilBrown


> Thanks,
>

Thread overview: 9+ messages
2010-07-16 18:46 Raid10 device hangs during resync and heavy I/O Justin Bronder
2010-07-16 18:49 ` Justin Bronder
2010-07-22 18:49 ` Justin Bronder
2010-07-23  3:19   ` Neil Brown
2010-07-23 15:47     ` Justin Bronder
2010-08-02  2:29       ` Neil Brown [this message]
2010-08-02  2:58         ` Neil Brown
2010-08-02 20:37           ` Justin Bronder
2010-08-07 11:22             ` Neil Brown