linux-xfs.vger.kernel.org archive mirror
From: Eric Sandeen <sandeen@sandeen.net>
To: Dave Chinner <david@fromorbit.com>
Cc: Eric Sandeen <sandeen@redhat.com>, xfs-oss <xfs@oss.sgi.com>
Subject: Re: [PATCH] xfs: allow logical-sector sized O_DIRECT for any fs sector size
Date: Fri, 17 Jan 2014 11:35:30 -0600	[thread overview]
Message-ID: <52D969E2.7030308@sandeen.net> (raw)
In-Reply-To: <20140116232132.GT3431@dastard>

On 1/16/14, 5:21 PM, Dave Chinner wrote:
> On Wed, Jan 15, 2014 at 04:52:05PM -0600, Eric Sandeen wrote:
>> On 1/15/14, 4:38 PM, Dave Chinner wrote:
>>> On Wed, Jan 15, 2014 at 11:59:45AM -0600, Eric Sandeen wrote:
>>>> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
>>>> index 33ad9a7..1f3431f 100644
>>>> --- a/fs/xfs/xfs_ioctl.c
>>>> +++ b/fs/xfs/xfs_ioctl.c
>>>> @@ -1587,7 +1587,12 @@ xfs_file_ioctl(
>>>>  			XFS_IS_REALTIME_INODE(ip) ?
>>>>  			mp->m_rtdev_targp : mp->m_ddev_targp;
>>>>  
>>>> -		da.d_mem = da.d_miniosz = 1 << target->bt_sshift;
>>>> +		/*
>>>> +		 * Report device physical sector size as "optimal" minimum,
>>>> +		 * unless blocksize is smaller than that.
>>>> +		 */
>>>> +		da.d_miniosz = min(target->bt_pssize, target->bt_bsize);
>>>
>>> Just grab the filesystem block size from the xfs_mount:
>>>
>>> 		da.d_miniosz = min(target->bt_pssize, mp->m_sb.sb_blocksize);
>>>
>>>> +		da.d_mem = da.d_miniosz;
>>>
>>> I'd suggest that this should be PAGE_SIZE - it's for memory buffer
>>> alignment, not IO alignment, so using the IO alignment just seems
>>> wrong to me...
>>
>> Ok.  Was just sticking close to what we had before.
>>
>> So:
>>  		da.d_miniosz = min(target->bt_pssize, mp->m_sb.sb_blocksize);
>> 		da.d_mem = PAGE_SIZE;
>>
>> ?  Then we'd have a minimum IO size of 512 but a memory alignment of
>> 4k; isn't that a little odd?
>>
>> (IOWs we could do 512-aligned memory before, right?  What's the downside,
>> or the value in changing it now?)
> 
> We can do arbitrary byte aligned buffers if I understand
> get_user_pages() correctly - it just maps the page under the buffer
> into the kernel address space and then the bio is pointed at it.
> AFAICT, the reason for the "memory buffer needs 512 byte alignment"
> is simply that this is the minimum IO size supported.

Actually, it's fs/direct-io.c which enforces this (not sure why I couldn't
find that before), in do_blockdev_direct_IO(); the *enforced* minimum
memory alignment is the bdev's logical block size.

> However, for large IOs, 512 byte alignment is not really optimal. If
> we don't align the buffer to PAGE_SIZE, then we have partial head
> and tail pages, so for a 512k IO we need to map 129 pages into a bio
> instead of 128. When you have hardware that can only handle 128
> segments in a DMA transfer, that means the 512k IO needs to be sent
> in two IOs rather than one.

Ok, but I have a bit of a problem with changing what XFS_IOC_DIOINFO
reports.  (I had originally thought to change the minimum IO size, but
I have talked myself out of that, too).

The xfsctl(3) manpage says that XFS_IOC_DIOINFO:  "Get(s) information
required to perform direct I/O on the specified file descriptor."
and "the user’s data buffer must conform to the same type of 
constraints as required for accessing a raw disk partition."

IOWs, the ioctl is documented as returning minimum, not optimal,
requirements, and it has always been implemented as such.  Changing
its meaning now seems wrong.  At least, I would not like to do so
as part of this functional change; to do so would probably be best
in a new ioctl which reports both minimum & optimal sizes.  And at
that point we should just do a vfs interface.  :)

Of course, applications don't have to use the minimum sizes reported
by the ioctl; they are free to be smarter and do larger sizes and
alignments.  But if the ioctl was designed and documented to report
required *minimums*, I think we should leave it as such.

I'm going to resend the change, split up a bit more to separate
cleanups from functional changes, and maybe we can discuss the ioctl
change as a potential additional patch.

Thanks,
-Eric

> There's quite a bit of hardware out there that have a limit of 128
> segments to each IO....
> 
> Cheers,
> 
> Dave.
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

Thread overview: 16+ messages
2014-01-15 17:59 [PATCH] xfs: allow logical-sector sized O_DIRECT for any fs sector size Eric Sandeen
2014-01-15 22:38 ` Dave Chinner
2014-01-15 22:52   ` Eric Sandeen
2014-01-16 23:21     ` Dave Chinner
2014-01-17 17:35       ` Eric Sandeen [this message]
2014-01-17 20:22 ` [PATCH 0/3 V2] " Eric Sandeen
2014-01-17 20:23   ` [PATCH 1/3] xfs: clean up xfs_buftarg Eric Sandeen
2014-01-20 14:21     ` Brian Foster
2014-01-17 20:26   ` [PATCH 2/3] xfs: rename xfs_buftarg structure members Eric Sandeen
2014-01-17 21:12     ` Roger Willcocks
2014-01-17 21:13       ` Eric Sandeen
2014-01-17 21:14     ` [PATCH 2/3 V2] " Eric Sandeen
2014-01-20 14:21       ` Brian Foster
2014-01-17 20:28   ` [PATCH 3/3] xfs: allow logical-sector sized O_DIRECT IOs Eric Sandeen
2014-01-20 14:21     ` Brian Foster
2014-01-20 14:53       ` Eric Sandeen
