From: Neil Brown <neilb@suse.de>
To: Justin Bronder <jsbronder@gentoo.org>
Cc: linux-raid@vger.kernel.org
Subject: Re: Raid10 device hangs during resync and heavy I/O.
Date: Mon, 2 Aug 2010 12:29:49 +1000 [thread overview]
Message-ID: <20100802122949.7bea3e7c@notabene> (raw)
In-Reply-To: <20100723154701.GA2090@gmail.com>
On Fri, 23 Jul 2010 11:47:01 -0400
Justin Bronder <jsbronder@gentoo.org> wrote:
> On 23/07/10 13:19 +1000, Neil Brown wrote:
> > On Thu, 22 Jul 2010 14:49:33 -0400
> >
> > So the 'dd' process successfully waited for the barrier to be gone at
> > 189.021179, and thus set pending to '1'. It then submitted the IO request.
> > We should then see swapper (or possibly some other thread) calling
> > allow_barrier when the request completes. But we don't.
> > A request could possibly take many milliseconds to complete, but it shouldn't
> > take seconds and certainly not minutes.
> >
> > It might be helpful if you could run this again, and in make_request(), after
> > the call to "wait_barrier()" print out:
> > bio->bi_sector, bio->bi_size, bio->bi_rw
> >
> > I'm guessing that the last request that doesn't seem to complete will be
> > different from the other in some important way.
>
> Nothing stood out to me, but here's the tail end of a couple of different
> traces.
Thanks a lot! Something does stand out for me!....
> <...>-5047 [002] 207.051215: wait_barrier: out: dd - w:0 p:1 b:0
> <...>-5047 [002] 207.051216: make_request: dd - sector:7472081 sz:20480 rw:0
> <...>-4958 [003] 207.051218: raise_barrier: mid: md99_resync - w:0 p:1 b:1
> <...>-5047 [002] 207.051227: wait_barrier: in: dd - w:0 p:1 b:1
> <idle>-0 [002] 207.058929: allow_barrier: swapper - w:1 p:0 b:1
> <...>-4958 [003] 207.058938: raise_barrier: out: md99_resync - w:1 p:0 b:1
> <...>-4958 [003] 207.059044: raise_barrier: in: md99_resync - w:1 p:0 b:1
> <...>-4957 [003] 207.067171: lower_barrier: md99_raid10 - w:1 p:0 b:0
> <...>-5047 [002] 207.067179: wait_barrier: out: dd - w:0 p:1 b:0
> <...>-5047 [002] 207.067180: make_request: dd - sector:7472121 sz:3584 rw:0
> <...>-4958 [003] 207.067182: raise_barrier: mid: md99_resync - w:0 p:1 b:1
> <...>-5047 [002] 207.067184: wait_barrier: in: dd - w:0 p:1 b:1
The last successful IO is only 3584 bytes - 7 sectors. All the others are
much larger.
It is almost certain that the read needed to cross a chunk boundary, so some
goes to one device, some to the next. It was probably a 64K read or similar.
The first 5 pages all fit in one device, and so goes through as a 20K
read. The next page doesn't so it comes down to md/raid10 as a 1 page read.
raid10 splits it into a 7 sector read and a 1 sector read. We see the 7
sector read being initiated, but it doesn't complete for some reason and the
resync barrier gets in the way so the 1 sector read blocks in wait_barrier.
In the next trace....
>
>
>
> <idle>-0 [000] 463.231730: allow_barrier: swapper - w:2 p:4 b:1
> <idle>-0 [000] 463.231739: allow_barrier: swapper - w:2 p:3 b:1
> <idle>-0 [000] 463.231746: allow_barrier: swapper - w:2 p:2 b:1
> <idle>-0 [000] 463.231765: allow_barrier: swapper - w:2 p:1 b:1
> <idle>-0 [000] 463.231774: allow_barrier: swapper - w:2 p:0 b:1
> <...>-5004 [000] 463.231792: raise_barrier: out: md99_resync - w:2 p:0 b:1
> <...>-5004 [000] 463.232005: raise_barrier: in: md99_resync - w:2 p:0 b:1
> <...>-5003 [001] 463.232453: lower_barrier: md99_raid10 - w:2 p:0 b:0
> <...>-5009 [000] 463.232463: wait_barrier: out: flush-9:99 - w:1 p:1 b:0
> <...>-5009 [000] 463.232464: make_request: flush-9:99 - sector:13931137 sz:61440 rw:1
> <...>-5105 [001] 463.232466: wait_barrier: out: dd - w:0 p:2 b:0
> <...>-5105 [001] 463.232467: make_request: dd - sector:7204393 sz:40960 rw:0
> <...>-5009 [000] 463.232476: wait_barrier: in: flush-9:99 - w:0 p:2 b:0
> <...>-5009 [000] 463.232477: wait_barrier: out: flush-9:99 - w:0 p:3 b:0
> <...>-5009 [000] 463.232477: make_request: flush-9:99 - sector:13931257 sz:3584 rw:1
> <...>-5009 [000] 463.232481: wait_barrier: in: flush-9:99 - w:0 p:3 b:0
> <...>-5009 [000] 463.232482: wait_barrier: out: flush-9:99 - w:0 p:4 b:0
> <...>-5009 [000] 463.232483: make_request: flush-9:99 - sector:13931264 sz:512 rw:1
> <...>-5105 [001] 463.232492: wait_barrier: in: dd - w:0 p:4 b:0
> <...>-5105 [001] 463.232493: wait_barrier: out: dd - w:0 p:5 b:0
We see a similar thing with a write being broken into a 15 page writes, then
a 7 sector write, then a 1 sector write - that all works.
> <...>-5105 [001] 463.232494: make_request: dd - sector:7204473 sz:3584 rw:0
> <...>-5004 [000] 463.232495: raise_barrier: mid: md99_resync - w:0 p:5 b:1
> <...>-5105 [001] 463.232496: wait_barrier: in: dd - w:0 p:5 b:1
> <...>-5009 [000] 463.232522: wait_barrier: in: flush-9:99 - w:1 p:5 b:1
> <idle>-0 [000] 463.232726: allow_barrier: swapper - w:2 p:4 b:1
> <idle>-0 [001] 463.240520: allow_barrier: swapper - w:2 p:3 b:1
> <idle>-0 [000] 463.240946: allow_barrier: swapper - w:2 p:2 b:1
> <idle>-0 [000] 463.240955: allow_barrier: swapper - w:2 p:1 b:1
>
But again we see a 7 sector read following a larger read, and the 1 sector
read that should follow gets blocked.
So it is somehow related to the need to split one-page requests across
multiple devices, and it could be specific to read requests.
Ahhhh.... I see the problem. Because a 'generic_make_request' is already
active, the once called by raid10::make_request just queues the request until
the top level one completes. This results in a deadlock.
I'll have to ponder a bit to figure out the best way to fix this.
Thanks again for the report and the help tracking down the problem.
NeilBrown
> Thanks,
>
next prev parent reply other threads:[~2010-08-02 2:29 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-07-16 18:46 Raid10 device hangs during resync and heavy I/O Justin Bronder
2010-07-16 18:49 ` Justin Bronder
2010-07-22 18:49 ` Justin Bronder
2010-07-23 3:19 ` Neil Brown
2010-07-23 15:47 ` Justin Bronder
2010-08-02 2:29 ` Neil Brown [this message]
2010-08-02 2:58 ` Neil Brown
2010-08-02 20:37 ` Justin Bronder
2010-08-07 11:22 ` Neil Brown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100802122949.7bea3e7c@notabene \
--to=neilb@suse.de \
--cc=jsbronder@gentoo.org \
--cc=linux-raid@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).