From: Eric Sandeen <sandeen@sandeen.net>
To: Dave Chinner <david@fromorbit.com>
Cc: Eric Sandeen <sandeen@redhat.com>, xfs-oss <xfs@oss.sgi.com>
Subject: Re: [PATCH] xfs: allow logical-sector sized O_DIRECT for any fs sector size
Date: Fri, 17 Jan 2014 11:35:30 -0600 [thread overview]
Message-ID: <52D969E2.7030308@sandeen.net> (raw)
In-Reply-To: <20140116232132.GT3431@dastard>
On 1/16/14, 5:21 PM, Dave Chinner wrote:
> On Wed, Jan 15, 2014 at 04:52:05PM -0600, Eric Sandeen wrote:
>> On 1/15/14, 4:38 PM, Dave Chinner wrote:
>>> On Wed, Jan 15, 2014 at 11:59:45AM -0600, Eric Sandeen wrote:
>>>> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
>>>> index 33ad9a7..1f3431f 100644
>>>> --- a/fs/xfs/xfs_ioctl.c
>>>> +++ b/fs/xfs/xfs_ioctl.c
>>>> @@ -1587,7 +1587,12 @@ xfs_file_ioctl(
>>>> XFS_IS_REALTIME_INODE(ip) ?
>>>> mp->m_rtdev_targp : mp->m_ddev_targp;
>>>>
>>>> - da.d_mem = da.d_miniosz = 1 << target->bt_sshift;
>>>> + /*
>>>> + * Report device physical sector size as "optimal" minimum,
>>>> + * unless blocksize is smaller than that.
>>>> + */
>>>> + da.d_miniosz = min(target->bt_pssize, target->bt_bsize);
>>>
>>> Just grab the filesysetm block size from the xfs_mount:
>>>
>>> da.d_miniosz = min(target->bt_pssize, mp->m_sb.sb_blocksize);
>>>
>>>> + da.d_mem = da.d_miniosz;
>>>
>>> I'd suggest that this should be PAGE_SIZE - it's for memory buffer
>>> alignment, not IO alignment, so using the IO alignment just seems
>>> wrong to me...
>>
>> Ok. Was just sticking close to what we had before.
>>
>> So:
>> da.d_miniosz = min(target->bt_pssize, mp->m_sb.sb_blocksize);
>> da.d_mem = PAGE_SIZE;
>>
>> ? Then we can have a minimum IO size of 512, and a memory alignment of
>> 4k, isn't that a little odd?
>>
>> (IOWs we could do 512-aligned memory before, right? What's the downside,
>> or the value in changing it now?)
>
> We can do arbitrary byte aligned buffers if I understand
> get_user_pages() correctly - it just maps the page under the buffer
> into the kernel address space and then the bio is pointed at it.
> AFAICT, the reason for the "memory buffer needs 512 byte alignment"
> is simply that this is the minimum IO size supported.
Actually, it's fs/direct-io.c which enforces this (not sure why I couldn't
find that before), in do_blockdev_direct_IO(); the *enforced* minimum
memory alignment is the size of the bev's logical block size.
> However, for large IOs, 512 byte alignment is not really optimal. If
> we don't align the buffer to PAGE_SIZE, then we have partial head
> and tail pages, so for a 512k IO we need to map 129 pages into a bio
> instead of 128. When you have hardware that can only handle 128
> segments in a DMA transfer, that means the 512k IO needs to be sent
> in two IOs rather than one.
Ok, but I have a bit of a problem with changing what XFS_IOC_DIOINFO
reports. (I had originally thought to change the minimum IO size, but
I have talked myself out of that, too).
The xfsctl(3) manpage says that XFS_IOC_DIOINFO: "Get(s) information
required to perform direct I/O on the specified file descriptor."
and "the user’s data buffer must conform to the same type of
constraints as required for accessing a raw disk partition."
IOWs, the ioctl is documented as returning minimum, not optimal,
requirements, and it has always been implemented as such. Changing
its meaning now seems wrong. At least, I would not like to do so
as part of this functional change; to do so would probably be best
in a new ioctl which reports both minimum & optimal sizes. And at
that point we should just do a vfs interface. :)
Of course, applications don't have to use the minimum sizes reported
by the ioctl; they are free to be smarter and do larger sizes and
alignments. But if the ioctl was designed and documented to report
required *minimums*, I think we should leave it as such.
I'm going to resend the change, split up a bit more to separate
cleanups from functional changes, and maybe we can discuss the ioctl
change as a potential additional patch.
Thanks,
-Eric
> There's quite a bit of hardware out there that have a limit of 128
> segments to each IO....
>
> Cheers,
>
> Dave.
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2014-01-17 17:35 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-15 17:59 [PATCH] xfs: allow logical-sector sized O_DIRECT for any fs sector size Eric Sandeen
2014-01-15 22:38 ` Dave Chinner
2014-01-15 22:52 ` Eric Sandeen
2014-01-16 23:21 ` Dave Chinner
2014-01-17 17:35 ` Eric Sandeen [this message]
2014-01-17 20:22 ` [PATCH 0/3 V2] " Eric Sandeen
2014-01-17 20:23 ` [PATCH 1/3] xfs: clean up xfs_buftarg Eric Sandeen
2014-01-20 14:21 ` Brian Foster
2014-01-17 20:26 ` [PATCH 2/3] xfs: rename xfs_buftarg structure members Eric Sandeen
2014-01-17 21:12 ` Roger Willcocks
2014-01-17 21:13 ` Eric Sandeen
2014-01-17 21:14 ` [PATCH 2/3 V2] " Eric Sandeen
2014-01-20 14:21 ` Brian Foster
2014-01-17 20:28 ` [PATCH 3/3] xfs: allow logical-sector sized O_DIRECT IOs Eric Sandeen
2014-01-20 14:21 ` Brian Foster
2014-01-20 14:53 ` Eric Sandeen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52D969E2.7030308@sandeen.net \
--to=sandeen@sandeen.net \
--cc=david@fromorbit.com \
--cc=sandeen@redhat.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.