From: Eric Sandeen <sandeen@sandeen.net>
To: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>,
Eric Sandeen <sandeen@redhat.com>, xfs-oss <xfs@oss.sgi.com>
Subject: Re: [PATCH RFC] xfs: set block device logical sector size on xfs_buftarg
Date: Thu, 14 Nov 2013 09:18:49 -0600
Message-ID: <5284E9D9.8090706@sandeen.net>
In-Reply-To: <20131114064932.GO6188@dastard>
On 11/14/13, 12:49 AM, Dave Chinner wrote:
> On Wed, Nov 13, 2013 at 06:35:05PM -0600, Eric Sandeen wrote:
>> On 11/13/13, 12:25 PM, Eric Sandeen wrote:
>>> Pure RFC; this might be crazy. Here's the problem I'm trying to solve:
>>>
>>> Today, mkfs.xfs will select a 4k sector size for a 4k physical / 512 logical
>>> drive. (that change was done by me). The thought was that it'd be an
>>> efficiency gain to not make the drive do the (possible) RMW cycles on
>>> 512-byte log IO, primarily.
>>>
>>> However, now this restricts all DIO to 4k alignment, not the otherwise-
>>> possible 512.
>>
>> So, backing up... ;)
>>
>> XFS isn't doing anything wrong here. It can make sector sizes as it pleases,
>> and apps had darned well better accommodate its whims if they do direct IO.
>>
>> But some apps don't. And users are sad and confused, and grow to dislike
>> XFS, because it all worked just fine on that other filesystem, so screw you
>> XFS, and your flux capacitor drives with your power-fail interrupts!
>
> Funny how it's always XFS is at fault, when the same problem with 4k
> sectors will occur on ext4, for example....
Yep, on a non-existent hard-4k disk, ext4 would have the same problem.
Meanwhile, in the world of actual hardware, ext4 is fine (there's no
similar sector-size switch for ext4).
Again; I'm NOT saying xfs is doing anything wrong, or is at fault.
We can be right all the way to the grave, if apps never get fixed,
and users have a choice of fs.
...
>> We could even ensure that XFS_IOC_DIOINFO offers up "4k" as the answer
>> to miniosz, so that apps which bother to ask get the optimal answer.
>
> Funnily enough, it does:
>
> da.d_mem = da.d_miniosz = 1 << target->bt_sshift;
...
Of course it does today; I was talking about whether we could report this
but still allow 512 under the covers.
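For reference, here is a minimal sketch of what "apps which bother to ask" would do today. Assumptions: the struct and ioctl number below are a local mirror of struct dioattr and XFS_IOC_DIOINFO from xfs_fs.h (real code should just include <xfs/xfs.h>), and dio_limits() / dio_aligned() are hypothetical helper names, not anything in the tree. It asks the filesystem for its DIO geometry, falls back to a caller-supplied guess if the ioctl fails, and checks a request against the result:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/ioctl.h>

/*
 * Local mirror of struct dioattr from xfs_fs.h (assumption: layout
 * unchanged). Real code should include <xfs/xfs.h> instead.
 */
struct dio_attr {
	uint32_t d_mem;      /* required buffer memory alignment */
	uint32_t d_miniosz;  /* minimum direct IO transfer size */
	uint32_t d_maxiosz;  /* maximum direct IO transfer size */
};
#define DIOINFO_IOCTL _IOR('X', 30, struct dio_attr)

/*
 * Hypothetical helper: ask the filesystem for its DIO limits.
 * Returns 0 if the ioctl answered, -1 if we fell back to the
 * caller-supplied guess (e.g. not XFS, or a bad fd).
 */
static int dio_limits(int fd, uint32_t fallback, struct dio_attr *da)
{
	if (ioctl(fd, DIOINFO_IOCTL, da) == 0)
		return 0;
	da->d_mem = da->d_miniosz = fallback;
	da->d_maxiosz = 0;   /* unknown */
	return -1;
}

/*
 * Hypothetical helper: true if buffer, offset and length all meet
 * the reported alignment. Assumes power-of-two alignments.
 */
static bool dio_aligned(const struct dio_attr *da, const void *buf,
			uint64_t off, uint64_t len)
{
	uint64_t io = da->d_miniosz;
	uint64_t mem = da->d_mem;

	assert(io != 0 && (io & (io - 1)) == 0);
	assert(mem != 0 && (mem & (mem - 1)) == 0);

	return ((uintptr_t)buf & (mem - 1)) == 0 &&
	       (off & (io - 1)) == 0 &&
	       (len & (io - 1)) == 0;
}
```

With d_miniosz reported as 4096, a 512-byte request at offset 512 fails this check, which is the case that trips up apps hard-coding 512.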
>> Or, we could stop setting 4k sectors for AF drives.
>
> And just take the RMW penalty?
that, and the bonus of existing applications continuing to function.
>> Or we could just carry on, and keep telling users that it's their fault,
>> their app's fault, etc...
>
> ... and getting the problems fixed so they go away forever.
... or not. *cough*64 bit inodes*cough*
>> (I'm sympathetic to pushing the envelope and dragging apps into the 21st
>> century, but it's a double-edged sword).
>
> Yes, it is, but if we don't take a stand and say "we, as an
> ecosystem, need to support 4k sectors *everywhere*", then we are
> going to have such problems *forever*. This isn't purely an XFS
> problem - this is something that the entire storage stack needs to
> support, from the hardware at the very bottom to the applications at
> the very top.
>
> XFS is stuck in the middle, where we cop it from both
> the hardware side ("why don't you support our hardware efficiently
> yet?") and from the application side when we do ("4k sectors break
> our assumptions!"). It's a no win situation for us no matter what we
> do, and history has shown that when we don't take a strong
> leadership position the problems don't get solved.
>
> So, let's take the initiative and make sure that everyone knows how
> to deal with these problems and get them fixed in the right places.
> I don't want to be spending the next 10 years complaining about a
> lack of 4k sector support in qemu. It's too much like the inode64
> saga all over again.
which, TBH, has still never been fully addressed.
> Let's face it, it wouldn't be right if XFS wasn't fighting some
> battle to drag Linux kicking and screaming into the present...
Well. With my distro hat on I might have to be pragmatic, and keep
things working that are required to work.
Upstream, sure, we can keep beating users with a stick until they
force their app writers to make things work for them again. ;)
(Again, though, as middle ground - if there were a way for XFS to do
all internal IO efficiently as 4k-aligned, but allow applications
to do 512 emulation, that would be, IMHO, a great thing. I'm not
yet sure what it would take.)
-Eric