From: Stan Hoeppner <stan@hardwarefreak.com>
To: Eric Sandeen <sandeen@sandeen.net>, xfs@oss.sgi.com
Subject: Re: makefs alignment issue
Date: Sat, 25 Oct 2014 12:35:17 -0500
Message-ID: <544BDF55.9040804@hardwarefreak.com>
In-Reply-To: <544BC6FA.8090101@sandeen.net>
On 10/25/2014 10:51 AM, Eric Sandeen wrote:
> On 10/24/14 10:08 PM, Stan Hoeppner wrote:
>> On 10/24/2014 05:27 PM, Eric Sandeen wrote:
>>> On 10/24/14 5:19 PM, Eric Sandeen wrote:
>>>> On 10/24/14 5:08 PM, Stan Hoeppner wrote:
>>>>>
>>>>> On 10/24/2014 03:14 PM, Eric Sandeen wrote:
>>>>
>>>> ...
>>>>
>>>>>>> Any ideas how to verify what's going on here and fix it?
>>>>>>
>>>>>> # blockdev --getiomin --getioopt /dev/s2d_a1l003
>>>
>>> Also, what does it show for the underlying non-multipath device(s)?
>>
>> # blockdev --getiomin --getioopt /dev/sdj
>> 512
>> 1048576
>> # blockdev --getiomin --getioopt /dev/sdf
>> 512
>> 1048576
>
> Ok, so dm multipath is just bubbling up what the device itself
> is claiming; not dm's doing.
>
> I forgot to ask (and you forgot to report...!) what version
> of xfsprogs you're using....

Sorry Eric, my bad.  I should know better after all these years. :(
It's old Debian 6.0 IIRC, let's see...

# xfs_repair -V
xfs_repair version 3.1.4

> Currently, blkid_get_topology() in xfsprogs does:
>
>         /*
>          * Blkid reports the information in terms of bytes, but we want it in
>          * terms of 512 bytes blocks (just to convert it to bytes later..)
>          *
>          * If the reported values are the same as the physical sector size
>          * do not bother to report anything.  It will just cause warnings
>          * if people specify larger stripe units or widths manually.
>          */
>         val = blkid_topology_get_minimum_io_size(tp);
>         if (val > *psectorsize)
>                 *sunit = val >> 9;
>         val = blkid_topology_get_optimal_io_size(tp);
>         if (val > *psectorsize)
>                 *swidth = val >> 9;
>
> so in your case sunit probably wouldn't get set (can you confirm with
> # blockdev --getpbsz that the physical sector size is also 512?)
# blockdev --getpbsz /dev/dm-0
512
> But the optimal size is > physical sector so swidth gets set.
>
> Bleah... can you just collect all of:
>
> # blockdev --getpbsz --getss --getiomin --getioopt
# blockdev --getpbsz --getss --getiomin --getioopt /dev/sdj
512
512
512
1048576
# blockdev --getpbsz --getss --getiomin --getioopt /dev/sdh
512
512
512
1048576
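
To make sure I'm reading the snippet right, here is a minimal sketch of
what that comparison would do with the numbers above.  This is not the
xfsprogs code itself: struct topo, derive_stripe() and stripe_example()
are names I made up, and I'm assuming sunit/swidth start out as 0 (unset).
It only covers the two comparisons quoted, nothing else mkfs does.

```c
#include <assert.h>

/* Hypothetical container for the values blockdev reports, in bytes. */
struct topo {
    unsigned long psectorsize;  /* blockdev --getpbsz  */
    unsigned long io_min;       /* blockdev --getiomin */
    unsigned long io_opt;       /* blockdev --getioopt */
};

/*
 * Same comparisons as the quoted snippet: a value is used only when it
 * is larger than the physical sector size.  Results are in 512-byte
 * units; 0 means "not set" (an assumption of this sketch).
 */
void derive_stripe(const struct topo *t, int *sunit, int *swidth)
{
    *sunit = 0;
    *swidth = 0;
    if (t->io_min > t->psectorsize)
        *sunit = (int)(t->io_min >> 9);
    if (t->io_opt > t->psectorsize)
        *swidth = (int)(t->io_opt >> 9);
}

/* The values reported for /dev/sdj above. */
void stripe_example(void)
{
    struct topo sdj = { 512, 512, 1048576 };
    int sunit, swidth;

    derive_stripe(&sdj, &sunit, &swidth);
    assert(sunit == 0);      /* io_min == sector size, so left unset */
    assert(swidth == 2048);  /* 1048576 >> 9 */
}
```

If that reading is correct, we end up with a width of 2048 sectors and
no unit to go with it.
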
> for your underlying devices, and I'll dig into how xfsprogs is behaving for
> those values. I have a hunch that we should be ignoring stripe units of 512
> even if the "width" claims to be something larger.
Just a hunch? :)
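
For what it's worth, here is how I picture that hunch, as a rough sketch
only.  derive_stripe_guarded() is a hypothetical name, not anything in
xfsprogs, and the "0 means unset" convention is my assumption: if the
minimum I/O size is no larger than the physical sector size, the width
is dropped as well instead of being reported on its own.

```c
#include <assert.h>

/*
 * Hypothetical variant of the topology check: a stripe width is only
 * trusted when a usable stripe unit comes with it.  Inputs are bytes,
 * outputs are 512-byte units, 0 means "not set".
 */
void derive_stripe_guarded(unsigned long psectorsize,
                           unsigned long io_min,
                           unsigned long io_opt,
                           int *sunit, int *swidth)
{
    *sunit = 0;
    *swidth = 0;

    /* No stripe unit larger than a sector: ignore the width too. */
    if (io_min <= psectorsize)
        return;

    *sunit = (int)(io_min >> 9);
    if (io_opt > psectorsize)
        *swidth = (int)(io_opt >> 9);
}

void guarded_example(void)
{
    int sunit, swidth;

    /* This LUN: 512 / 512 / 1 MB, so nothing is set. */
    derive_stripe_guarded(512, 512, 1048576, &sunit, &swidth);
    assert(sunit == 0 && swidth == 0);

    /* A device reporting 256 KB min and 4 MB optimal keeps both. */
    derive_stripe_guarded(512, 262144, 4194304, &sunit, &swidth);
    assert(sunit == 512 && swidth == 8192);
}
```
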

If the same interface is used for Linux logical block devices (md, dm,
LVM, etc.) and hardware RAID, I have a hunch it may be better to
determine which kind of device we're dealing with, if possible, before
doing anything with these values.  As you said previously, and I agree
100%, a lot of RAID vendors don't export meaningful information here.
In this specific case, I think the RAID engineers are exporting a value,
1 MB, that works best for their cache management, or some other path in
their firmware.  They're concerned with host interface transfers into
the controller, not the IOs on the back end to the disks.  They don't
see this as an end-to-end deal.  In fact, I'd guess most of these folks
see their device as performing magic, and it doesn't matter what comes
in or goes out either end.  "We'll take care of it."

I don't know what underlying SCSI command is used for populating
optimal_io_size.  I'm guessing this has a different meaning for
different folks.  You say optimal_io_size is the same as RAID width.
Apply that to this case:

  hardware RAID 60 LUN, 4 arrays
  16+2 RAID6, 256 KB stripe unit, 4096 KB stripe width
  16 MB LUN stripe width
  optimal_io_size = 16 MB

Is that an appropriate value for optimal_io_size even if this is the
RAID width? I'm not saying it isn't. I don't know. I don't know what
other layers of the Linux and RAID firmware stacks are affected by this,
nor how they're affected.
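
In case the arithmetic for the RAID 60 example isn't obvious, here it is
spelled out as a small sketch.  struct raid60 and the helper names are
made up for illustration, and the last step assumes the 16 MB value
would go through the same ">> 9" conversion as in the snippet you
quoted.

```c
#include <assert.h>

/* Hypothetical description of a RAID 60 LUN (RAID6 arrays, striped). */
struct raid60 {
    unsigned int  arrays;       /* number of RAID6 arrays in the LUN */
    unsigned int  data_disks;   /* data disks per array (16 in 16+2) */
    unsigned long stripe_unit;  /* bytes written per data disk       */
};

/* Stripe width of one RAID6 array, in bytes. */
unsigned long array_width(const struct raid60 *r)
{
    return r->data_disks * r->stripe_unit;
}

/* Stripe width of the whole LUN, in bytes. */
unsigned long lun_width(const struct raid60 *r)
{
    return r->arrays * array_width(r);
}

void raid60_example(void)
{
    struct raid60 lun = { 4, 16, 256UL * 1024 };

    assert(array_width(&lun) == 4096UL * 1024);      /* 4096 KB */
    assert(lun_width(&lun) == 16UL * 1024 * 1024);   /* 16 MB   */
    assert((lun_width(&lun) >> 9) == 32768);         /* 512-byte units */
}
```

So a 16 MB optimal_io_size here would come out as 32768 sectors of
swidth, against a 256 KB (512 sector) stripe unit on each disk.
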
Thanks,
Stan