From: "Mike Snitzer" <snitzer@gmail.com>
To: "Linus Torvalds" <torvalds@linux-foundation.org>
Cc: "Mel Gorman" <mel@csn.ul.ie>,
"Martin Knoblauch" <spamtrap@knobisoft.de>,
"Fengguang Wu" <wfg@mail.ustc.edu.cn>,
"Peter Zijlstra" <peterz@infradead.org>,
jplatte@naasa.net, "Ingo Molnar" <mingo@elte.hu>,
linux-kernel@vger.kernel.org,
"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
James.Bottomley@steeleye.com
Subject: Re: regression: 100% io-wait with 2.6.24-rcX
Date: Fri, 18 Jan 2008 17:47:02 -0500
Message-ID: <170fa0d20801181447h42308f40t73731ceb7d5e67@mail.gmail.com>
In-Reply-To: <170fa0d20801181200p50556132v3a9bafc9ad9e8c91@mail.gmail.com>
On Jan 18, 2008 3:00 PM, Mike Snitzer <snitzer@gmail.com> wrote:
>
> On Jan 18, 2008 12:46 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >
> >
> > On Fri, 18 Jan 2008, Mel Gorman wrote:
> > >
> > > Right, and this is consistent with other complaints about the PFN of the
> > > page mattering to some hardware.
> >
> > I don't think it's actually the PFN per se.
> >
> > I think it's simply that some controllers (quite probably affected by both
> > driver and hardware limits) have some subtle interactions with the size of
> > the IO commands.
> >
> > For example, let's say that you have a controller that has some limit X on
> > the size of IO in flight (whether due to hardware or driver issues doesn't
> > really matter) in addition to a limit on the size of the scatter-gather
> > size. They all tend to have limits, and they differ.
> >
> > Now, the PFN doesn't matter per se, but the allocation pattern definitely
> > matters for whether the IO's are physically contiguous, and thus matters
> > for the size of the scatter-gather thing.
> >
> > Now, generally the rule-of-thumb is that you want big commands, so
> > physical merging is good for you, but I could well imagine that the IO
> > limits interact, and end up hurting each other. Let's say that a better
> > allocation order allows for bigger contiguous physical areas, and thus
> > fewer scatter-gather entries.
> >
> > What does that result in? The obvious answer is
> >
> > "Better performance obviously, because the controller needs to do fewer
> > scatter-gather lookups, and the requests are bigger, because there are
> > fewer IO's that hit scatter-gather limits!"
> >
> > Agreed?
> >
> > Except maybe the *real* answer for some controllers ends up being
> >
> > "Worse performance, because individual commands grow because they don't
> > hit the per-command limits, but now we hit the global size-in-flight
> > limits and have many fewer of these good commands in flight. And while
> > the commands are larger, it means that there are fewer outstanding
> > commands, which can mean that the disk cannot schedule things
> > as well, or makes high latency of command generation by the controller
> > much more visible because there aren't enough concurrent requests
> > queued up to hide it"
> >
> > Is this the reason? I have no idea. But somebody who knows the AACRAID
> > hardware and driver limits might think about interactions like that.
> > Sometimes you actually might want to have smaller individual commands if
> > there is some other limit that means that it can be more advantageous to
> > have many small requests over a few big ones.
> >
> > RAID might well make it worse. Maybe small requests work better because
> > they are simpler to schedule because they only hit one disk (eg if you
> > have simple striping)! So that's another reason why one *large* request
> > may actually be slower than two requests half the size, even if it's
> > against the "normal rule".
> >
> > And it may be that that AACRAID box takes a big hit on DIO exactly because
> > DIO has been optimized almost purely for making one command as big as
> > possible.
> >
> > Just a theory.
>
> Oddly enough, I'm seeing the opposite here with 2.6.22.16 w/ AACRAID
> configured with 5 LUNS (each 2disk HW RAID0, 1024k stripesz). That
> is, with dd the avgrqsiz (from iostat) shows DIO to be ~130k whereas
> non-DIO is a mere ~13k! (NOTE: with aacraid, max_hw_sectors_kb=192)
...
> I can fire up 2.6.24-rc8 in short order to see if things are vastly
> improved (as Martin seems to indicate that he is happy with AACRAID on
> 2.6.24-rc8). Although even Martin's AACRAID numbers from 2.6.19.2 are
> still quite good (relative to mine). Martin can you share any tuning
> you may have done to get AACRAID to where it is for you right now?
I can confirm that 2.6.24-rc8 behaves as Martin posted for the
AACRAID: slower DIO with a smaller average request size, and much
faster buffered IO (for my config anyway) with a much larger average
request size (~180K).
I have no idea why 2.6.22.16's request size on non-DIO is _so_ small...
Mike