From: Eric Sandeen <sandeen@redhat.com>
To: xfs-oss <xfs@oss.sgi.com>
Subject: [PATCH] xfs: allow logical-sector sized O_DIRECT for any fs sector size
Date: Wed, 15 Jan 2014 11:59:45 -0600
Message-ID: <52D6CC91.6000408@redhat.com>

Some time ago, mkfs.xfs started picking the storage physical
sector size as the default filesystem "sector size" in order
to avoid the read-modify-write (RMW) costs incurred by doing
IOs at logical sector size alignments.

However, this means that for a filesystem made with e.g.
a 4k sector size on an "advanced format" 4k/512 disk,
512-byte direct IOs are no longer allowed. In effect,
XFS has turned this AF drive into a hard 4K device,
from the filesystem on up.

XFS's mkfs-specified "sector size" is really just controlling
the minimum size & alignment of filesystem metadata IO.
There is no real need to tightly couple XFS's minimal
metadata size to the minimum allowed direct IO size;
XFS can continue doing metadata in optimal sizes, but
still allow smaller DIOs for apps which issue them,
for whatever reason.

This patch adds 2 new fields to the xfs_buftarg, so that
we now track 3 sizes:

 1) The device logical sector size
 2) The device physical sector size
 3) The filesystem sector size, which is the minimum unit and
    alignment of IO which will be performed by metadata operations.

The first is used for the minimum allowed direct IO alignment,
the second is used to report DIO sizes in XFS_IOC_DIOINFO,
capped at the filesystem block size (the theory being, if an
app actually asks, we can give them the optimal answer, even
if we allow smaller IOs), and the third is used internally by
the filesystem for metadata IOs.
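
As a rough userspace illustration of how the three sizes are used
(not the kernel code itself), the sketch below models the old and
new DIO alignment checks and the value reported as d_miniosz. The
struct and function names (sector_sizes, dio_aligned_logical, and
so on) are hypothetical and exist only for this example; the mask
and min() logic mirrors what the diff does with bt_smask,
bt_lsmask, bt_pssize and bt_bsize.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the sizes tracked per buftarg. */
struct sector_sizes {
	size_t       lsmask;	/* device logical sector size - 1 */
	unsigned int pssize;	/* device physical sector size */
	size_t       smask;	/* filesystem (metadata) sector size - 1 */
	unsigned int bsize;	/* filesystem block size */
};

static inline bool is_pow2(unsigned int v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* "size - 1" is only a valid alignment mask for powers of two. */
static inline struct sector_sizes
sector_sizes_init(unsigned int logical, unsigned int physical,
		  unsigned int fs_sector, unsigned int fs_block)
{
	struct sector_sizes s;

	assert(is_pow2(logical) && is_pow2(physical));
	assert(is_pow2(fs_sector) && is_pow2(fs_block));

	s.lsmask = (size_t)logical - 1;
	s.pssize = physical;
	s.smask = (size_t)fs_sector - 1;
	s.bsize = fs_block;
	return s;
}

/* Old rule: DIO offset and length aligned to the filesystem sector size. */
static inline bool
dio_aligned_fs_sector(const struct sector_sizes *s, uint64_t pos, size_t len)
{
	return !((pos & s->smask) || (len & s->smask));
}

/* New rule: DIO offset and length aligned to the device logical sector size. */
static inline bool
dio_aligned_logical(const struct sector_sizes *s, uint64_t pos, size_t len)
{
	return !((pos & s->lsmask) || (len & s->lsmask));
}

/* Reported minimum: physical sector size, unless the block size is smaller. */
static inline unsigned int
dio_reported_miniosz(const struct sector_sizes *s)
{
	return s->pssize < s->bsize ? s->pssize : s->bsize;
}
```

With a 4k/512 AF disk and a 4k-sector, 4k-block filesystem, i.e.
sector_sizes_init(512, 4096, 4096, 4096), a 512-byte IO at offset
512 fails dio_aligned_fs_sector() but passes dio_aligned_logical(),
while dio_reported_miniosz() still advertises 4096.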

This has passed xfstests on filesystems made with 4k sectors,
including when run under the patch I sent to ignore
XFS_IOC_DIOINFO and issue 512-byte DIOs anyway. I also directly
tested end-of-block behavior on preallocated, sparse, and
existing files when we do a 512-byte IO into a 4k file on a
4k-sector filesystem, to be sure there were no unexpected
behaviors.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

NB: This depends on this patch which is in the xfs tree,
but not yet upstream:

  xfs: simplify xfs_setsize_buftarg callchain; remove unused arg

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 9fccfb5..a89dcdf 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1599,6 +1599,7 @@ xfs_setsize_buftarg(
 	unsigned int blocksize,
 	unsigned int sectorsize)
 {
+	/* Set up filesystem block and sector sizes */
 	btp->bt_bsize = blocksize;
 	btp->bt_sshift = ffs(sectorsize) - 1;
 	btp->bt_smask = sectorsize - 1;
@@ -1614,6 +1615,9 @@ xfs_setsize_buftarg(
 		return EINVAL;
 	}
+	/* Set up device logical & physical sector size info */
+	btp->bt_lsmask = bdev_logical_block_size(btp->bt_bdev) - 1;
+	btp->bt_pssize = bdev_physical_block_size(btp->bt_bdev);
 	return 0;
 }
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 1cf21a4..29a0db9 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -88,14 +88,28 @@ typedef unsigned int xfs_buf_flags_t;
  */
 #define XFS_BSTATE_DISPOSE (1 << 0) /* buffer being discarded */
+/*
+ * The xfs_buftarg contains 3 notions of "sector size" -
+ *
+ * 1) The device logical sector size
+ * 2) The device physical sector size
+ * 3) The filesystem sector size, which is the minimum unit and
+ *    alignment of IO which will be performed by metadata operations.
+ *
+ * The latter is specified at mkfs time, stored on-disk in the
+ * superblock's sb_sectsize, and is set from there.
+ */
+
 typedef struct xfs_buftarg {
 	dev_t bt_dev;
 	struct block_device *bt_bdev;
 	struct backing_dev_info *bt_bdi;
 	struct xfs_mount *bt_mount;
-	unsigned int bt_bsize;
-	unsigned int bt_sshift;
-	size_t bt_smask;
+	unsigned int bt_bsize; /* fs block size */
+	unsigned int bt_sshift; /* fs sector size shift */
+	size_t bt_smask; /* fs sector size mask */
+	size_t bt_lsmask; /* dev logical sectsz mask */
+	unsigned int bt_pssize; /* dev physical sector size */
 	/* LRU control structures */
 	struct shrinker bt_shrinker;
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 52c91e1..09f9df9 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -261,7 +261,8 @@ xfs_file_aio_read(
 		xfs_buftarg_t *target =
 			XFS_IS_REALTIME_INODE(ip) ?
 				mp->m_rtdev_targp : mp->m_ddev_targp;
-		if ((pos & target->bt_smask) || (size & target->bt_smask)) {
+		/* DIO must be aligned to device logical sector size */
+		if ((pos & target->bt_lsmask) || (size & target->bt_lsmask)) {
 			if (pos == i_size_read(inode))
 				return 0;
 			return -XFS_ERROR(EINVAL);
@@ -641,9 +642,11 @@ xfs_file_dio_aio_write(
 	struct xfs_buftarg *target = XFS_IS_REALTIME_INODE(ip) ?
 					mp->m_rtdev_targp : mp->m_ddev_targp;
-	if ((pos & target->bt_smask) || (count & target->bt_smask))
+	/* DIO must be aligned to device logical sector size */
+	if ((pos & target->bt_lsmask) || (count & target->bt_lsmask))
 		return -XFS_ERROR(EINVAL);
+	/* "unaligned" here means not aligned to a filesystem block */
 	if ((pos & mp->m_blockmask) || ((pos + count) & mp->m_blockmask))
 		unaligned_io = 1;
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 33ad9a7..1f3431f 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1587,7 +1587,12 @@ xfs_file_ioctl(
 			XFS_IS_REALTIME_INODE(ip) ?
 			mp->m_rtdev_targp : mp->m_ddev_targp;
-		da.d_mem = da.d_miniosz = 1 << target->bt_sshift;
+		/*
+		 * Report device physical sector size as "optimal" minimum,
+		 * unless blocksize is smaller than that.
+		 */
+		da.d_miniosz = min(target->bt_pssize, target->bt_bsize);
+		da.d_mem = da.d_miniosz;
 		da.d_maxiosz = INT_MAX & ~(da.d_miniosz - 1);
 		if (copy_to_user(arg, &da, sizeof(da)))