From: Eric Sandeen <sandeen@sandeen.net>
To: Dave Chinner <david@fromorbit.com>
Cc: Eric Sandeen <sandeen@redhat.com>, xfs-oss <xfs@oss.sgi.com>
Subject: Re: [PATCH] xfs: allow logical-sector sized O_DIRECT for any fs sector size
Date: Fri, 17 Jan 2014 11:35:30 -0600 [thread overview]
Message-ID: <52D969E2.7030308@sandeen.net> (raw)
In-Reply-To: <20140116232132.GT3431@dastard>
On 1/16/14, 5:21 PM, Dave Chinner wrote:
> On Wed, Jan 15, 2014 at 04:52:05PM -0600, Eric Sandeen wrote:
>> On 1/15/14, 4:38 PM, Dave Chinner wrote:
>>> On Wed, Jan 15, 2014 at 11:59:45AM -0600, Eric Sandeen wrote:
>>>> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
>>>> index 33ad9a7..1f3431f 100644
>>>> --- a/fs/xfs/xfs_ioctl.c
>>>> +++ b/fs/xfs/xfs_ioctl.c
>>>> @@ -1587,7 +1587,12 @@ xfs_file_ioctl(
>>>> XFS_IS_REALTIME_INODE(ip) ?
>>>> mp->m_rtdev_targp : mp->m_ddev_targp;
>>>>
>>>> - da.d_mem = da.d_miniosz = 1 << target->bt_sshift;
>>>> + /*
>>>> + * Report device physical sector size as "optimal" minimum,
>>>> + * unless blocksize is smaller than that.
>>>> + */
>>>> + da.d_miniosz = min(target->bt_pssize, target->bt_bsize);
>>>
>>> Just grab the filesystem block size from the xfs_mount:
>>>
>>> da.d_miniosz = min(target->bt_pssize, mp->m_sb.sb_blocksize);
>>>
>>>> + da.d_mem = da.d_miniosz;
>>>
>>> I'd suggest that this should be PAGE_SIZE - it's for memory buffer
>>> alignment, not IO alignment, so using the IO alignment just seems
>>> wrong to me...
>>
>> Ok. Was just sticking close to what we had before.
>>
>> So:
>> da.d_miniosz = min(target->bt_pssize, mp->m_sb.sb_blocksize);
>> da.d_mem = PAGE_SIZE;
>>
>> ? Then we can have a minimum IO size of 512, and a memory alignment of
>> 4k; isn't that a little odd?
>>
>> (IOWs we could do 512-aligned memory before, right? What's the downside,
>> or the value in changing it now?)
>
> We can do arbitrary byte aligned buffers if I understand
> get_user_pages() correctly - it just maps the page under the buffer
> into the kernel address space and then the bio is pointed at it.
> AFAICT, the reason for the "memory buffer needs 512 byte alignment"
> is simply that this is the minimum IO size supported.
Actually, it's fs/direct-io.c which enforces this (not sure why I couldn't
find that before), in do_blockdev_direct_IO(); the *enforced* minimum
memory alignment is the bdev's logical block size.
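
Roughly speaking, the check boils down to something like this
(paraphrasing from memory, not the literal fs/direct-io.c code):

#include <linux/fs.h>
#include <linux/blkdev.h>

/*
 * Paraphrased sketch of what do_blockdev_direct_IO() enforces: the
 * user buffer address, the file offset and the I/O length must all be
 * multiples of the bdev's logical block size, or the direct I/O is
 * rejected with -EINVAL.
 */
static bool dio_aligned(struct block_device *bdev, loff_t offset,
			unsigned long user_addr, size_t size)
{
	unsigned int blkbits = blksize_bits(bdev_logical_block_size(bdev));
	unsigned int blocksize_mask = (1 << blkbits) - 1;

	return !((user_addr | offset | size) & blocksize_mask);
}

So on a device with 512-byte logical sectors, 512-byte buffer alignment
is all that actually gets enforced, regardless of what we report in d_mem.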
> However, for large IOs, 512 byte alignment is not really optimal. If
> we don't align the buffer to PAGE_SIZE, then we have partial head
> and tail pages, so for a 512k IO we need to map 129 pages into a bio
> instead of 128. When you have hardware that can only handle 128
> segments in a DMA transfer, that means the 512k IO needs to be sent
> in two IOs rather than one.
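
To put numbers on that, a quick sketch (assuming 4k pages and a 512k IO):

#include <stdio.h>

#define PAGE_SIZE	4096UL

/* how many pages does a buffer at 'addr' of length 'len' touch? */
static unsigned long pages_spanned(unsigned long addr, unsigned long len)
{
	unsigned long first = addr / PAGE_SIZE;
	unsigned long last = (addr + len - 1) / PAGE_SIZE;

	return last - first + 1;
}

int main(void)
{
	printf("%lu\n", pages_spanned(0, 512 * 1024));   /* 128 */
	printf("%lu\n", pages_spanned(512, 512 * 1024)); /* 129 */
	return 0;
}

i.e. the same 512k IO needs one extra bio segment as soon as the buffer
is merely 512-byte aligned rather than page aligned.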
Ok, but I have a bit of a problem with changing what XFS_IOC_DIOINFO
reports. (I had originally thought to change the minimum IO size, but
I have talked myself out of that, too).
The xfsctl(3) manpage says that XFS_IOC_DIOINFO: "Get(s) information
required to perform direct I/O on the specified file descriptor."
and "the user’s data buffer must conform to the same type of
constraints as required for accessing a raw disk partition."
IOWs, the ioctl is documented as returning minimum, not optimal,
requirements, and it has always been implemented as such. Changing
its meaning now seems wrong. At the least, I would not like to do so
as part of this functional change; that would probably be better done
with a new ioctl which reports both minimum & optimal sizes. And at
that point we should just do a vfs interface. :)
Of course, applications don't have to use the minimum sizes reported
by the ioctl; they are free to be smarter and do larger sizes and
alignments. But if the ioctl was designed and documented to report
required *minimums*, I think we should leave it as such.
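
For example, a smarter application might do something like this
(untested sketch; struct dioattr and XFS_IOC_DIOINFO are as documented
in xfsctl(3), though the header path may vary with the xfsprogs
installation):

/*
 * Untested sketch: query the *minimum* O_DIRECT constraints via
 * XFS_IOC_DIOINFO, then choose to exceed them by rounding the buffer
 * alignment up to the page size.
 */
#define _GNU_SOURCE		/* O_DIRECT */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <xfs/xfs.h>		/* struct dioattr, XFS_IOC_DIOINFO */

int main(int argc, char **argv)
{
	struct dioattr da;
	size_t align, len;
	void *buf;
	int fd;

	if (argc != 2)
		return 1;

	fd = open(argv[1], O_RDONLY | O_DIRECT);
	if (fd < 0 || ioctl(fd, XFS_IOC_DIOINFO, &da) < 0) {
		perror(argv[1]);
		return 1;
	}

	/* d_mem and d_miniosz are minimums; nothing stops us using more */
	align = da.d_mem;
	if (align < (size_t)sysconf(_SC_PAGESIZE))
		align = sysconf(_SC_PAGESIZE);
	len = da.d_miniosz * 8;		/* a multiple of the minimum IO size */

	if (posix_memalign(&buf, align, len)) {
		close(fd);
		return 1;
	}

	if (pread(fd, buf, len, 0) < 0)
		perror("pread");

	free(buf);
	close(fd);
	return 0;
}

That keeps the reported values as true minimums while still letting
callers round up on their own when it helps throughput.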
I'm going to resend the change, split up a bit more to separate
cleanups from functional changes, and maybe we can discuss the ioctl
change as a potential additional patch.
Thanks,
-Eric
> There's quite a bit of hardware out there that have a limit of 128
> segments to each IO....
>
> Cheers,
>
> Dave.
>
Thread overview: 16+ messages
2014-01-15 17:59 [PATCH] xfs: allow logical-sector sized O_DIRECT for any fs sector size Eric Sandeen
2014-01-15 22:38 ` Dave Chinner
2014-01-15 22:52 ` Eric Sandeen
2014-01-16 23:21 ` Dave Chinner
2014-01-17 17:35 ` Eric Sandeen [this message]
2014-01-17 20:22 ` [PATCH 0/3 V2] " Eric Sandeen
2014-01-17 20:23 ` [PATCH 1/3] xfs: clean up xfs_buftarg Eric Sandeen
2014-01-20 14:21 ` Brian Foster
2014-01-17 20:26 ` [PATCH 2/3] xfs: rename xfs_buftarg structure members Eric Sandeen
2014-01-17 21:12 ` Roger Willcocks
2014-01-17 21:13 ` Eric Sandeen
2014-01-17 21:14 ` [PATCH 2/3 V2] " Eric Sandeen
2014-01-20 14:21 ` Brian Foster
2014-01-17 20:28 ` [PATCH 3/3] xfs: allow logical-sector sized O_DIRECT IOs Eric Sandeen
2014-01-20 14:21 ` Brian Foster
2014-01-20 14:53 ` Eric Sandeen