From: Dave Chinner <david@fromorbit.com>
To: Spelic <spelic@shiftmail.org>
Cc: xfs@oss.sgi.com
Subject: Re: Xfs delaylog hanged up
Date: Thu, 25 Nov 2010 08:50:04 +1100
Message-ID: <20101124215004.GF22876@dastard>
In-Reply-To: <4CED0F40.606@shiftmail.org>
On Wed, Nov 24, 2010 at 02:12:32PM +0100, Spelic wrote:
> On 11/24/2010 01:20 AM, Dave Chinner wrote:
> >
> >512MB of BBWC backing the disks. The BBWC does a much better job of
> >reordering out-of-order writes than the Linux elevators because
> >512MB is a much bigger window than a couple of thousand 4k IOs.
>
> Hmmm very interesting...
> so you are using a MD or DM raid-0 above a SATA controller with a BBWC?
> That would probably be a RAID controller used as SATA because I have
> never seen SATA controllers with a BBWC. I'd be interested in the
> brand if you don't mind.
Actually, it's a SAS RAID controller:
03:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04)
Each disk is exported as its own RAID0 LUN because the RAID controller
does not do JBOD.
> Also I wanted to know... the requests to the drives are really only
> 4K in size for linux?
No, Linux issues IOs much larger than that. However, for small file
workloads the IO size is determined mostly by the file size.
> Then what purpose do the elevators' merges
> have? When the elevator merges two 4k requests doesn't it create an
> 8k request for the drive?
Yes. But when the two 4k blocks are not adjacent, they can't be
merged and hence are two IOs. And if the block that separated them
is then written 5ms after the other two completed, it's three IOs
that get combined in the BBWC into one...
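To make the arithmetic concrete, here is a toy model of that case in
Python. It is only a sketch, not kernel or controller code: the block
numbers and the merge_adjacent() helper are made up for illustration,
and it assumes the cache still holds all three blocks when it flushes.

```python
def merge_adjacent(blocks):
    """Coalesce block numbers into (start, length) extents.

    Hypothetical helper: models merging within one window of
    outstanding writes, nothing more.
    """
    extents = []
    for b in sorted(set(blocks)):
        if extents and extents[-1][0] + extents[-1][1] == b:
            # b directly follows the previous extent, so extend it
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            # gap before b, so it has to be a separate IO
            extents.append((b, 1))
    return extents


# Elevator view: blocks 100 and 102 are queued together, and block 101
# only shows up after those two have already completed.
first_batch = merge_adjacent([100, 102])
late_batch = merge_adjacent([101])
ios_from_elevator = first_batch + late_batch

# Cache view (assumption: all three writes are still buffered).
ios_from_cache = merge_adjacent([100, 102, 101])

print(len(ios_from_elevator), "IOs leave the elevator")
print(len(ios_from_cache), "IO reaches the disk from the cache")
```

The only difference between the two views is how long the writes stay
visible to whatever is doing the merging, which is the point of the
bigger window.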
> Also look at this competitor's link:
> http://thunk.org/tytso/blog/2010/11/01/i-have-the-money-shot-for-my-lca-presentation/
> post #9
> these scalability patches submit larger i/o than 4k. I can confirm
> that from within iostat -x 1 (I can't understand what he means with
> "bypasses the buffer cache layer" though, does it mean it's only for
> DIRECTIO? it does not seem to me).
It means he's calling submit_bio() rather than submit_bh(). Most of
that "new" code in ext4 was copied directly from XFS - XFS has been
using submit_bio() for large IO submission since roughly 2.6.15.
> When such large requests go into
> the elevator, are they broken up into 4K requests again?
No, the opposite used to happen - ext4 would break large contiguous
regions into 4k IOs (because submit_bh() could only handle one block at
a time), and then the elevator would re-merge them into a large IO.
The issue here is the CPU overhead of merging thousands of blocks
unnecessarily.
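A rough way to see that overhead is to count merge operations in a toy
model. Again this is a Python sketch under stated assumptions, not the
kernel's merge logic: submit_per_block() and elevator_merge() are
hypothetical names, only back-merges are modelled, and real request
size limits are ignored.

```python
BLOCK_SIZE = 4096  # 4k filesystem blocks, as in the discussion above


def submit_per_block(start_block, nr_blocks):
    """One (start, length) request per block, like a per-block submission path."""
    return [(start_block + i, 1) for i in range(nr_blocks)]


def elevator_merge(requests):
    """Back-merge contiguous requests and count how many merges were needed."""
    merged = []
    merge_ops = 0
    for start, length in requests:
        if merged and merged[-1][0] + merged[-1][1] == start:
            # request starts where the previous one ends, so merge it
            prev_start, prev_len = merged[-1]
            merged[-1] = (prev_start, prev_len + length)
            merge_ops += 1
        else:
            merged.append((start, length))
    return merged, merge_ops


nr_blocks = (1024 * 1024) // BLOCK_SIZE  # a 1MB contiguous region

# Path 1: split into 4k requests, then let the elevator glue them back.
per_block, per_block_merges = elevator_merge(submit_per_block(0, nr_blocks))

# Path 2: submit the whole contiguous region as a single large request.
single, single_merges = elevator_merge([(0, nr_blocks)])

print("per-block path:", per_block, "after", per_block_merges, "merges")
print("single large IO:", single, "after", single_merges, "merges")
```

Both paths end with the same single large IO; the per-block path just
pays for every merge on the way there.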
--
Dave Chinner
david@fromorbit.com