From: Dave Chinner <david@fromorbit.com>
To: Stan Hoeppner <stan@hardwarefreak.com>
Cc: Eric Sandeen <sandeen@sandeen.net>, xfs@oss.sgi.com
Subject: Re: makefs alignment issue
Date: Tue, 28 Oct 2014 11:32:19 +1100
Message-ID: <20141028003219.GC16186@dastard>
In-Reply-To: <544ECF65.8090806@hardwarefreak.com>
On Mon, Oct 27, 2014 at 06:04:05PM -0500, Stan Hoeppner wrote:
> On 10/26/2014 06:43 PM, Dave Chinner wrote:
> > On Sat, Oct 25, 2014 at 12:35:17PM -0500, Stan Hoeppner wrote:
> >> If the same interface is used for Linux logical block devices (md, dm,
> >> lvm, etc) and hardware RAID, I have a hunch it may be better to
> >> determine that, if possible, before doing anything with these values.
> >> As you said previously, and I agree 100%, a lot of RAID vendors don't
> >> export meaningful information here. In this specific case, I think the
> >> RAID engineers are exporting a value, 1 MB, that works best for their
> >> cache management, or some other path in their firmware. They're
> >> concerned with host interface xfer into the controller, not the IOs on
> >> the back end to the disks. They don't see this as an end-to-end deal.
> >> In fact, I'd guess most of these folks see their device as performing
> >> magic, and it doesn't matter what comes in or goes out either end.
> >> "We'll take care of it."
> >
> > Deja vu. This is an isochronous RAID array you are having trouble
> > with, isn't it?
>
> I don't believe so. I'm pretty sure the parity rotates; i.e. standard
> RAID5/6.
The location of parity doesn't determine whether it is isochronous in
behaviour or not. Often "RAID5/6" is marketing speak for "single/dual
parity", not the type of redundancy that is implemented in the
hardware ;)
> > FWIW, do your problems go away when you make your hardware LUN width
> > a multiple of the cache segment size?
>
> Hadn't tried it. And I don't have the opportunity now as my contract
> has ended. However the problems we were having weren't related to
> controller issues but excessive seeking. I mentioned this in that
> (rather lengthy) previous reply.
Right, but if you had a 768k stripe width and a 1MB cache segment
size, a cache segment operation would require two stripe widths to
be operated on, and only one of them would be a whole stripe width.
Hence the possibility of doing more IOs than are necessary to
populate or write back cache segments, i.e. it's a potential reason
why the back-end disks didn't have anywhere near the seek capability
they were supposed to have....
> >> optimal_io_size. I'm guessing this has a different meaning for different
> >> folks. You say optimal_io_size is the same as RAID width. Apply that
> >> to this case:
> >>
> >> hardware RAID 60 LUN, 4 arrays
> >> 16+2 RAID6, 256 KB stripe unit, 4096 KB stripe width
> >> 16 MB LUN stripe width
> >> optimal_io_size = 16 MB
> >>
> >> Is that an appropriate value for optimal_io_size even if this is the
> >> RAID width? I'm not saying it isn't. I don't know. I don't know what
> >> other layers of the Linux and RAID firmware stacks are affected by this,
> >> nor how they're affected.
> >
> > Yup, I'd expect minimum = 4MB (i.e. stripe unit 4MB, so we align to
> > the underlying RAID6 LUNs) and optimal = 16MB for the stripe width
> > (and so with swalloc we align to the first LUN in the RAID0).
>
> With a 4MB minimum, how does that affect journal writes, which will be much
> smaller, especially with a large file streaming workload, for which this
> setup is appropriate? Isn't the minimum a hard setting? I.e. we can
> never do an IO smaller than 4MB? Do other layers of the stack use this
> variable? Are they expecting values this large?
No, "minimum_io_size" is for "minimum *efficient* IO size" not the
smallest supported IO size. The smallest supported IO sizes and
atomic IO sizes are defined by hw_sector_size,
physical_block_size and logical_block_size.
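A minimal Python sketch of the geometry discussed above, assuming a
RAID0 of identical RAID6 arrays that advertises minimum_io_size as one
RAID6 full stripe and optimal_io_size as the full RAID0 width, as
suggested earlier in the thread. The helper names are hypothetical. The
sysfs attribute names under /sys/block/<dev>/queue/ are the standard
Linux ones, and the mkfs.xfs string simply maps the same values onto
its su/sw data options; treat it as an illustration, not a recommended
command line.

```python
from pathlib import Path

KIB = 1024
MIB = 1024 * KIB


def raid60_hints(data_disks: int, stripe_unit: int, arrays: int) -> dict:
    """Expected I/O hints (bytes) for a RAID0 of identical RAID6 arrays."""
    raid6_width = data_disks * stripe_unit  # full stripe of one RAID6 array
    lun_width = raid6_width * arrays        # RAID0 stripe across all arrays
    return {
        # minimum *efficient* IO size, not the smallest IO the device accepts
        "minimum_io_size": raid6_width,
        "optimal_io_size": lun_width,
    }


def mkfs_xfs_data_opts(hints: dict) -> str:
    """Map the hints onto mkfs.xfs -d su/sw (stripe unit in KiB, width in units)."""
    su = hints["minimum_io_size"]
    sw = hints["optimal_io_size"] // su
    return f"-d su={su // KIB}k,sw={sw}"


# Smallest supported / atomic sizes come from the first three; the last two
# are efficiency hints only.
QUEUE_ATTRS = ("logical_block_size", "physical_block_size", "hw_sector_size",
               "minimum_io_size", "optimal_io_size")


def read_queue_limits(dev: str) -> dict:
    """Read what a Linux block device actually advertises via sysfs."""
    queue = Path("/sys/block") / dev / "queue"
    return {name: int((queue / name).read_text()) for name in QUEUE_ATTRS}


if __name__ == "__main__":
    # 4 x (16+2 RAID6, 256 KB stripe unit) striped together as RAID0.
    hints = raid60_hints(data_disks=16, stripe_unit=256 * KIB, arrays=4)
    print(hints)
    print(mkfs_xfs_data_opts(hints))
```

For the RAID60 LUN above this gives a 4MB minimum and a 16MB optimal
size; comparing those against read_queue_limits() for the real device
shows whether the controller is exporting the array geometry or some
unrelated value such as its cache segment size.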
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs