From: Eric Sandeen <sandeen@sandeen.net>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>,
Eric Sandeen <sandeen@redhat.com>, xfs-oss <xfs@oss.sgi.com>
Subject: Re: [PATCH RFC] xfs: set block device logical sector size on xfs_buftarg
Date: Thu, 14 Nov 2013 09:18:49 -0600
Message-ID: <5284E9D9.8090706@sandeen.net>
In-Reply-To: <20131114064932.GO6188@dastard>
On 11/14/13, 12:49 AM, Dave Chinner wrote:
> On Wed, Nov 13, 2013 at 06:35:05PM -0600, Eric Sandeen wrote:
>> On 11/13/13, 12:25 PM, Eric Sandeen wrote:
>>> Pure RFC; this might be crazy. Here's the problem I'm trying to solve:
>>>
>>> Today, mkfs.xfs will select a 4k sector size for a 4k physical / 512 logical
>>> drive. (that change was done by me). The thought was that it'd be an
>>> efficiency gain to not make the drive do the (possible) RMW cycles on
>>> 512-byte log IO, primarily.
>>>
>>> However, now this restricts all DIO to 4k alignment, not the otherwise-
>>> possible 512.
>>
>> So, backing up... ;)
>>
>> XFS isn't doing anything wrong here. It can make sector sizes as it pleases,
>> and apps had darned well better accommodate its whims if they do direct IO.
>>
>> But some apps don't. And users are sad and confused, and grow to dislike
>> XFS, because it all worked just fine on that other filesystem, so screw you
>> XFS, and your flux capacitor drives with your power-fail interrupts!
>
> Funny how it's always XFS that is at fault, when the same problem with 4k
> sectors will occur on ext4, for example....

Yep, on a non-existent hard 4k disk, ext4 would have the same problem.
Meanwhile, in the world of actual hardware, ext4 is fine. (There's no
similar sector-size switch for ext4.)

Again: I'm NOT saying xfs is doing anything wrong, or is at fault.
We can be right all the way to the grave, if apps never get fixed,
and users have a choice of fs.

...

>> We could even ensure that XFS_IOC_DIOINFO offers up "4k" as the answer
>> to miniosz, so that apps which bother to ask get the optimal answer.
>
> Funnily enough, it does:
>
> da.d_mem = da.d_miniosz = 1 << target->bt_sshift;
...
Of course it does today; I was talking about whether we could report this
but still allow 512 under the covers.
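
For anyone following along, the alignment rule being argued about can be
sketched in a few lines of C. This is only an illustration, not XFS code:
struct dio_limits and both function names are hypothetical, and the one
detail taken from the thread is that d_mem and d_miniosz are both
1 << bt_sshift. The sketch assumes a direct IO request needs its buffer
address aligned to the memory limit and its offset and length aligned to
the minimum IO size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the limits an app would learn from
 * XFS_IOC_DIOINFO (d_mem and d_miniosz in the quoted line). */
struct dio_limits {
    uint32_t mem_align; /* required buffer address alignment, in bytes */
    uint32_t min_io;    /* required offset/length granularity, in bytes */
};

/* Mirrors the quoted assignment: both limits are 1 << sector shift. */
struct dio_limits dio_limits_from_shift(unsigned int sector_shift)
{
    assert(sector_shift < 32);
    struct dio_limits l;
    l.mem_align = l.min_io = UINT32_C(1) << sector_shift;
    return l;
}

/* Assumed rule: buffer, offset and length must all be aligned. */
bool dio_request_ok(const struct dio_limits *l, uintptr_t buf,
                    uint64_t offset, uint64_t len)
{
    return len != 0 &&
           (buf & (l->mem_align - 1)) == 0 &&
           (offset & (l->min_io - 1)) == 0 &&
           (len & (l->min_io - 1)) == 0;
}
```

With a sector shift of 9 (512 bytes), a 512-byte request at offset 512
passes. With a shift of 12 (4k sectors) the same request is rejected,
which is the case that trips up the apps mentioned earlier.
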
>> Or, we could stop setting 4k sectors for AF drives.
>
> And just take the RMW penalty?
That, and the bonus of existing applications continuing to function.
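
The RMW penalty itself is simple arithmetic: a write that starts or ends
partway through a physical sector leaves the drive to fill in the rest,
possibly by reading the sector first. A rough sketch, with a hypothetical
helper name, ignoring anything the drive may do to hide the cost (caching,
coalescing adjacent writes):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a write of len bytes at byte offset off only partially
 * covers some physical sector, so the drive may have to read-modify-write.
 * phys_sector must be a nonzero power of two (e.g. 512 or 4096). */
bool write_needs_rmw(uint64_t off, uint64_t len, uint32_t phys_sector)
{
    assert(phys_sector != 0 && (phys_sector & (phys_sector - 1)) == 0);
    uint64_t mask = (uint64_t)phys_sector - 1;
    /* Partial coverage happens when either end falls inside a sector. */
    return len != 0 && ((off & mask) != 0 || ((off + len) & mask) != 0);
}
```

So a 512-byte log write to a 4k physical / 512 logical drive is a
candidate for RMW, while the same write issued as a full 4k-aligned 4k
block is not.
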
>> Or we could just carry on, and keep telling users that it's their fault,
>> their app's fault, etc...
>
> ... and getting the problems fixed so they go away forever.
... or not. *cough*64 bit inodes*cough*
>> (I'm sympathetic to pushing the envelope and dragging apps into the 21st
>> century, but it's a double-edged sword).
>
> Yes, it is, but if we don't take a stand and say "we, as an
> ecosystem, need to support 4k sectors *everywhere*", then we are
> going to have such problems *forever*. This isn't purely an XFS
> problem - this is something that the entire storage stack needs to
> support, from the hardware at the very bottom to the applications at
> the very top.
>
> XFS is stuck in the middle, where we cop it from both
> the hardware side ("why don't you support our hardware efficiently
> yet?") and from the application side when we do ("4k sectors break
> our assumptions!"). It's a no win situation for us no matter what we
> do, and history has shown that when we don't take a strong
> leadership position the problems don't get solved.
>
> So, let's take the initiative and make sure that everyone knows how
> to deal with these problems and get them fixed in the right places.
> I don't want to be spending the next 10 years complaining about a
> lack of 4k sector support in qemu. It's too much like the inode64
> saga all over again.
Which, TBH, has still never been fully addressed.
> Let's face it, it wouldn't be right if XFS wasn't fighting some
> battle to drag Linux kicking and screaming into the present...
Well. With my distro hat on I might have to be pragmatic, and keep
things working that are required to work.
Upstream, sure, we can keep beating users with a stick until they
force their app writers to make things work for them again. ;)
(Again, though, as middle ground - if there were a way for XFS to do
all internal IO efficiently as 4k-aligned, but allow applications
to do 512 emulation, that would be, IMHO, a great thing. I'm not
yet sure what it would take.)
-Eric