* [PATCH v6 0/3] XFS realtime device tweaks
From: Richard Wareing @ 2017-10-11 2:37 UTC
To: linux-xfs; +Cc: david, darrick.wong, hch
Sorry for the delay in turning this around. I believe I have addressed the
various points folks made in the last review:
1. The inode's realtime flag is now set while the locks are held, via the
XFS_BMAPI_RTDATA flag.
2. The realtime flag is honored when set by the user via ioctl or via the
inherit flag on the directory.
3. Misc changes around formatting & bounds checks on sysfs options.
See individual patches for more details.
Please pay close attention to the change in xfs_file_iomap_begin (patch 2).
The new version of the patch bypasses the xfs_file_iomap_begin_delay function
in the "realtime" case, since the realtime code in that function is
unreachable (see the assert in it). Instead, we bypass it and reach
xfs_iomap_write_direct, where XFS_BMAPI_RTDATA is passed on to the
xfs_bmapi_write function, where the realtime flag is set.
I'm curious whether there is a better approach, and would appreciate
verification that this is sane/safe.
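To make the control flow above easier to review, here is a simplified user-space model of which path a write takes in xfs_file_iomap_begin with patch 2 applied. It is a sketch only: DAX, COW/reflink handling and lookups of existing extents are left out, and the enum and function names (alloc_path, model_write_path) are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical labels for the two allocation paths discussed above. */
enum alloc_path {
	ALLOC_DELALLOC,	/* xfs_file_iomap_begin_delay(): reserve delalloc blocks */
	ALLOC_DIRECT,	/* xfs_iomap_write_direct(): allocate blocks up front */
};

/*
 * o_direct:   write issued with O_DIRECT
 * extsz_hint: inode has an explicit extent size hint
 * rt_inode:   XFS_DIFLAG_REALTIME already set on the inode
 * select_rt:  what xfs_inode_select_rt_target() would return
 * rtdata:     set to true if XFS_BMAPI_RTDATA would be passed down
 */
static enum alloc_path
model_write_path(bool o_direct, bool extsz_hint, bool rt_inode,
		 bool select_rt, bool *rtdata)
{
	/* xfs_get_extsz_hint() returns sb_rextsize for realtime inodes */
	bool hint = extsz_hint || rt_inode;

	*rtdata = false;

	/* Buffered write with no hint: delalloc unless the policy picks RT */
	if (!o_direct && !hint && !select_rt)
		return ALLOC_DELALLOC;

	/* xfs_iomap_write_direct() consults the policy for non-RT inodes */
	if (!rt_inode && select_rt)
		*rtdata = true;

	return ALLOC_DIRECT;
}
```

In this model a buffered write that the policy steers to realtime behaves exactly like a write to an inode that is already realtime: it skips delalloc, and the flag is only applied later inside the allocation.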
The patch set is based on Linux 4.12 (commit
6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c) from
https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git.
Richard Wareing (3):
xfs: Show realtime device stats on statfs calls if inherit flag set
xfs: Set realtime flag based on initial allocation size
xfs: Add realtime fallback if data device full
Documentation/filesystems/xfs.txt | 27 +++++++++++-
fs/xfs/libxfs/xfs_bmap.c | 35 +++++++++++++++
fs/xfs/libxfs/xfs_bmap.h | 3 ++
fs/xfs/xfs_bmap_util.c | 3 ++
fs/xfs/xfs_fsops.c | 2 +
fs/xfs/xfs_inode.c | 18 +++++---
fs/xfs/xfs_iomap.c | 19 +++++++--
fs/xfs/xfs_linux.h | 2 +
fs/xfs/xfs_mount.c | 24 +++++++++++
fs/xfs/xfs_mount.h | 8 ++++
fs/xfs/xfs_rtalloc.c | 90 +++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_rtalloc.h | 2 +
fs/xfs/xfs_super.c | 8 ++++
fs/xfs/xfs_sysfs.c | 80 ++++++++++++++++++++++++++++++++++
14 files changed, 311 insertions(+), 10 deletions(-)
--
2.9.5
* [PATCH v6 1/3] xfs: Show realtime device stats on statfs calls if inherit flag set
From: Richard Wareing @ 2017-10-11 2:37 UTC
To: linux-xfs; +Cc: david, darrick.wong, hch
- Reports realtime device free blocks in statfs calls if the realtime
inheritance bit is set on the directory inode. This is more intuitive,
especially for use cases where the realtime device is much larger than
the data device.
- Add an XFS_IS_REALTIME_MOUNT macro to gate on the existence of a
realtime device on the mount, similar to the XFS_IS_REALTIME_INODE
macro.
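For anyone sanity-checking the numbers, a minimal user-space sketch of the statfs override follows. The struct and function names (sb_sketch, statfs_sketch, rt_statfs_override) are hypothetical; the kernel code operates on struct xfs_sb and struct kstatfs directly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the superblock fields the patch reads. */
struct sb_sketch {
	uint64_t sb_rblocks;	/* realtime device size, in fs blocks */
	uint64_t sb_frextents;	/* free realtime extents */
	uint32_t sb_rextsize;	/* realtime extent size, in fs blocks */
};

/* Hypothetical stand-in for the statfs fields the patch overrides. */
struct statfs_sketch {
	uint64_t f_blocks;
	uint64_t f_bfree;
	uint64_t f_bavail;
};

/*
 * Same logic as the hunk added to xfs_fs_statfs(): override the
 * data-device numbers only when the mount has a realtime device and the
 * queried inode carries XFS_DIFLAG_RTINHERIT.
 */
static void
rt_statfs_override(const struct sb_sketch *sbp, bool rt_mount,
		   bool rtinherit, struct statfs_sketch *statp)
{
	if (!(rt_mount && rtinherit))
		return;

	statp->f_blocks = sbp->sb_rblocks;
	/* Free extents -> fs blocks; no m_alloc_set_aside on the RT device */
	statp->f_bavail = statp->f_bfree =
		sbp->sb_frextents * sbp->sb_rextsize;
}
```

For example, with a 16-block realtime extent size and 25,000 free realtime extents, statfs() on an RTINHERIT directory would report 400,000 free blocks, in units of the filesystem block size.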
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Richard Wareing <rwareing@fb.com>
---
Changes since v5:
* None
Changes since v4:
* None
Changes since v3:
* Fixed accounting bug: we do not need to subtract m_alloc_set_aside,
as this is a data-device-only requirement.
* Added XFS_IS_REALTIME_MOUNT macro based on lessons from CVE-2017-14340;
it now provides gating on the mount similar to what XFS_IS_REALTIME_INODE
does for the inode.
Changes since v2:
* Style updated per Christoph Hellwig's comment
* Fixed bug: statp->f_bavail = statp->f_bfree
fs/xfs/xfs_linux.h | 2 ++
fs/xfs/xfs_super.c | 8 ++++++++
2 files changed, 10 insertions(+)
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index 044fb0e..fe46e71 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -280,8 +280,10 @@ static inline __uint64_t howmany_64(__uint64_t x, __uint32_t y)
#ifdef CONFIG_XFS_RT
#define XFS_IS_REALTIME_INODE(ip) ((ip)->i_d.di_flags & XFS_DIFLAG_REALTIME)
+#define XFS_IS_REALTIME_MOUNT(mp) ((mp)->m_rtdev_targp ? 1 : 0)
#else
#define XFS_IS_REALTIME_INODE(ip) (0)
+#define XFS_IS_REALTIME_MOUNT(mp) (0)
#endif
#endif /* __XFS_LINUX__ */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 455a575..6d33a5e 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1136,6 +1136,14 @@ xfs_fs_statfs(
((mp->m_qflags & (XFS_PQUOTA_ACCT|XFS_PQUOTA_ENFD))) ==
(XFS_PQUOTA_ACCT|XFS_PQUOTA_ENFD))
xfs_qm_statvfs(ip, statp);
+
+ if (XFS_IS_REALTIME_MOUNT(mp) &&
+ (ip->i_d.di_flags & XFS_DIFLAG_RTINHERIT)) {
+ statp->f_blocks = sbp->sb_rblocks;
+ statp->f_bavail = statp->f_bfree =
+ sbp->sb_frextents * sbp->sb_rextsize;
+ }
+
return 0;
}
--
2.9.5
* [PATCH v6 2/3] xfs: Set realtime flag based on initial allocation size
From: Richard Wareing @ 2017-10-11 2:37 UTC
To: linux-xfs; +Cc: david, darrick.wong, hch
- The rt_alloc_min sysfs option automatically selects the device (data
device or realtime device) based on the size of the initial allocation
of the file.
- This option can be used to route the storage of small files (and the
inefficient workloads associated with them) to a suitable storage
device such as an SSD, while larger allocations are sent to a traditional
HDD.
- Supports writes via O_DIRECT, buffered (i.e. page cache), and
pre-allocations (i.e. fallocate).
- Available only when the kernel is compiled with the CONFIG_XFS_RT option.
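The selection rule described above can be sketched in user space as follows. This mirrors xfs_inode_select_rt_target() and xfs_rt_alloc_min() as they stand after this patch; the struct and function names (rt_target_state, select_rt_by_size) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the mount and inode state the policy reads. */
struct rt_target_state {
	bool	 rt_mount;	/* XFS_IS_REALTIME_MOUNT(mp) */
	int64_t	 rt_alloc_min;	/* mp->m_rt_alloc_min in bytes, 0 = off */
	uint64_t nextents;	/* ip->i_d.di_nextents */
	int64_t	 size;		/* ip->i_d.di_size */
};

static bool
select_rt_by_size(const struct rt_target_state *s, int64_t len)
{
	if (!s->rt_mount)
		return false;

	/* Blocks already allocated or data already written: too late */
	if (s->nextents || s->size)
		return false;

	/* Policy disabled */
	if (!s->rt_alloc_min)
		return false;

	/* Initial request at or above the threshold goes to realtime */
	return len >= s->rt_alloc_min;
}
```

With rt_alloc_min set to 262144 (256 KiB), a 4 KiB first write to an empty file stays on the data device, while a 1 MiB fallocate() on an empty file is steered to the realtime device. Because only the initial request is examined, a large file written in small chunks would, under this sketch, stay on the data device.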
Signed-off-by: Richard Wareing <rwareing@fb.com>
---
Changes since v5:
* xfs_inode_select_target renamed to xfs_inode_select_rt_target and returns
boolean to indicate if realtime device target is desired.
* Introduction of XFS_BMAPI_RTDATA, which signals to the
xfs_bmapi_allocate function that the realtime flag must be set on the inode
and the inode logged.
* Manual setting of the realtime flag by ioctl or directory rt inherit flag
now takes precedence over the policy.
* Documentation
Changes since v4:
* Added xfs_inode_select_target function to hold target selection
code
* XFS_IS_REALTIME_MOUNT check now moved inside xfs_inode_select_target
function for better gating
* Improved consistency in the sysfs set behavior
* Style fixes
Changes since v3:
* Now functions via initial allocation regardless of O_DIRECT, buffered or
pre-allocation code paths. Provides a consistent user-experience.
* I did experiment with putting this in the xfs_bmapi_write code path;
however, pre-allocation accounting unfortunately prevents this cleaner
approach. As such, this proved to be the cleanest approach that works.
* No longer a mount option, now a sysfs tunable
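The XFS_BMAPI_RTDATA change above comes down to the gate order in xfs_bmapi_rt_data_flag(). The sketch below reproduces that order in user space. Only the 0x800 value of XFS_BMAPI_RTDATA comes from the patch; the other SK_* bit values and the bma_sketch struct are illustrative stand-ins, not the kernel definitions.

```c
#include <stdbool.h>

#define SK_BMAPI_RTDATA		0x800	/* XFS_BMAPI_RTDATA (from the patch) */
#define SK_INITIAL_USER_DATA	0x2	/* XFS_ALLOC_INITIAL_USER_DATA (illustrative) */
#define SK_DIFLAG_REALTIME	0x1	/* XFS_DIFLAG_REALTIME (illustrative) */
#define SK_ILOG_CORE		0x1	/* XFS_ILOG_CORE (illustrative) */

/* Hypothetical subset of struct xfs_bmalloca and the inode flags. */
struct bma_sketch {
	unsigned int flags;	/* bma->flags */
	unsigned int datatype;	/* bma->datatype */
	unsigned int di_flags;	/* bma->ip->i_d.di_flags */
	unsigned int logflags;	/* bma->logflags */
};

static void
rt_data_flag(bool rt_mount, struct bma_sketch *bma)
{
	if (!rt_mount)				/* no realtime device */
		return;
	if (!(bma->datatype & SK_INITIAL_USER_DATA))	/* not first data */
		return;
	if (bma->di_flags & SK_DIFLAG_REALTIME)	/* already realtime */
		return;
	if (bma->flags & SK_BMAPI_RTDATA) {
		bma->di_flags |= SK_DIFLAG_REALTIME;
		bma->logflags |= SK_ILOG_CORE;	/* log the inode core */
	}
}
```

As I read the patch, because the flag is flipped inside xfs_bmapi_allocate() and the inode core is added to logflags, the change is logged with the allocation itself while the inode lock is held, which is what item 1 of the cover letter refers to.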
Documentation/filesystems/xfs.txt | 21 +++++++++++++++-
fs/xfs/libxfs/xfs_bmap.c | 35 +++++++++++++++++++++++++++
fs/xfs/libxfs/xfs_bmap.h | 3 +++
fs/xfs/xfs_bmap_util.c | 3 +++
fs/xfs/xfs_inode.c | 18 +++++++++-----
fs/xfs/xfs_iomap.c | 19 ++++++++++++---
fs/xfs/xfs_mount.h | 1 +
fs/xfs/xfs_rtalloc.c | 50 +++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_rtalloc.h | 2 ++
fs/xfs/xfs_sysfs.c | 42 ++++++++++++++++++++++++++++++++
10 files changed, 184 insertions(+), 10 deletions(-)
diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
index 3b9b5c1..0763972 100644
--- a/Documentation/filesystems/xfs.txt
+++ b/Documentation/filesystems/xfs.txt
@@ -94,7 +94,7 @@ default behaviour.
When inode64 is specified, it indicates that XFS is allowed
to create inodes at any location in the filesystem,
including those which will result in inode numbers occupying
- more than 32 bits of significance.
+ more than 32 bits of significance.
inode32 is provided for backwards compatibility with older
systems and applications, since 64 bits inode numbers might
@@ -467,3 +467,22 @@ the class and error context. For example, the default values for
"metadata/ENODEV" are "0" rather than "-1" so that this error handler defaults
to "fail immediately" behaviour. This is done because ENODEV is a fatal,
unrecoverable error no matter how many times the metadata IO is retried.
+
+Realtime Device Sysfs Options
+=============================
+
+When using a realtime sub-volume, the following sysfs options are supported:
+
+ /sys/fs/xfs/<dev>/rt_alloc_min
+ (Units: bytes Min: 0 Default: 0 Max: INT_MAX)
+ When set, the file will be allocated blocks from the realtime device if the
+ initial allocation request size (in bytes) is equal to or above this value.
+ For XFS use-cases where appends are unlikely or not supported, this option
+ can be used to place smaller files on the data device (typically an SSD),
+ while larger files are placed on the realtime device (typically an HDD).
+
+ Any files which have the realtime flag set by an ioctl call or realtime
+ inheritance flag on the directory will not be affected by this option.
+ Buffered, direct IO and pre-allocation are supported.
+
+ Setting the value to "0" disables this behavior.
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index a7048eaf..deb9ffd 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4185,6 +4185,39 @@ xfs_bmapi_reserve_delalloc(
return error;
}
+/*
+ * This function will set the XFS_DIFLAG_REALTIME flag on the inode if
+ * the XFS_BMAPI_RTDATA flag is set on the xfs_bmalloca struct.
+ *
+ * This function is only valid for realtime mounts, and only on the initial
+ * allocation for the file.
+ *
+ */
+void
+xfs_bmapi_rt_data_flag(
+ struct xfs_mount *mp,
+ struct xfs_bmalloca *bma)
+{
+
+ /* Only valid if this is a realtime mount */
+ if (!XFS_IS_REALTIME_MOUNT(mp))
+ return;
+
+ /* Only valid if file is empty */
+ if (!(bma->datatype & XFS_ALLOC_INITIAL_USER_DATA))
+ return;
+
+ /* Nothing to do, realtime flag already set */
+ if (bma->ip->i_d.di_flags & XFS_DIFLAG_REALTIME)
+ return;
+
+ /* Set realtime flag and log it if RTDATA flag is set */
+ if (bma->flags & XFS_BMAPI_RTDATA) {
+ bma->ip->i_d.di_flags |= XFS_DIFLAG_REALTIME;
+ bma->logflags |= XFS_ILOG_CORE;
+ }
+}
+
static int
xfs_bmapi_allocate(
struct xfs_bmalloca *bma)
@@ -4235,6 +4268,8 @@ xfs_bmapi_allocate(
bma->minlen = (bma->flags & XFS_BMAPI_CONTIG) ? bma->length : 1;
+ xfs_bmapi_rt_data_flag(mp, bma);
+
/*
* Only want to do the alignment at the eof if it is userdata and
* allocation length is larger than a stripe unit.
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index c35a14f..57bf954 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -113,6 +113,9 @@ struct xfs_extent_free_item
/* Only convert delalloc space, don't allocate entirely new extents */
#define XFS_BMAPI_DELALLOC 0x400
+/* Allocate to realtime device */
+#define XFS_BMAPI_RTDATA 0x800
+
#define XFS_BMAPI_FLAGS \
{ XFS_BMAPI_ENTIRE, "ENTIRE" }, \
{ XFS_BMAPI_METADATA, "METADATA" }, \
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 9e3cc21..7c07ec9 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1027,6 +1027,9 @@ xfs_alloc_file_space(
return -EINVAL;
rt = XFS_IS_REALTIME_INODE(ip);
+ if (!rt && (rt = xfs_inode_select_rt_target(ip, len)))
+ alloc_type |= XFS_BMAPI_RTDATA;
+
extsz = xfs_get_extsz_hint(ip);
count = len;
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index ec9826c..f9e2deb 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1620,12 +1620,18 @@ xfs_itruncate_extents(
if (error)
goto out;
- /*
- * Clear the reflink flag if we truncated everything.
- */
- if (ip->i_d.di_nblocks == 0 && xfs_is_reflink_inode(ip)) {
- ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
- xfs_inode_clear_cowblocks_tag(ip);
+ if (ip->i_d.di_nblocks == 0) {
+ /*
+ * Clear the reflink flag if we truncated everything.
+ */
+ if (xfs_is_reflink_inode(ip)) {
+ ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
+ xfs_inode_clear_cowblocks_tag(ip);
+ }
+ /* Clear realtime flag if m_rt_alloc_min policy is in place */
+ if (XFS_IS_REALTIME_MOUNT(mp) && mp->m_rt_alloc_min) {
+ ip->i_d.di_flags &= ~XFS_DIFLAG_REALTIME;
+ }
}
/*
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 94e5bdf..4c545c0 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -40,6 +40,7 @@
#include "xfs_dquot_item.h"
#include "xfs_dquot.h"
#include "xfs_reflink.h"
+#include "xfs_rtalloc.h"
#define XFS_WRITEIO_ALIGN(mp,off) (((off) >> mp->m_writeio_log) \
@@ -174,7 +175,11 @@ xfs_iomap_write_direct(
int bmapi_flags = XFS_BMAPI_PREALLOC;
uint tflags = 0;
+
rt = XFS_IS_REALTIME_INODE(ip);
+ if (!rt && (rt = xfs_inode_select_rt_target(ip, count)))
+ bmapi_flags |= XFS_BMAPI_RTDATA;
+
extsz = xfs_get_extsz_hint(ip);
lockmode = XFS_ILOCK_SHARED; /* locked by caller */
@@ -983,9 +988,17 @@ xfs_file_iomap_begin(
if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
- /* Reserve delalloc blocks for regular writeback. */
- return xfs_file_iomap_begin_delay(inode, offset, length, flags,
- iomap);
+ /*
+ * For buffered (non-O_DIRECT) writes, check if this will be allocated
+ * to realtime; if so, bypass xfs_file_iomap_begin_delay as if
+ * the inode was already marked realtime (see xfs_get_extsz_hint).
+ * The actual setting of the realtime flag on the inode will be
+ * done later on. Note that length is already in bytes.
+ */
+ if (!xfs_inode_select_rt_target(ip, length))
+ /* Reserve delalloc blocks for regular writeback. */
+ return xfs_file_iomap_begin_delay(inode, offset, length,
+ flags, iomap);
}
if (need_excl_ilock(ip, flags)) {
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 9fa312a..e64936f 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -197,6 +197,7 @@ typedef struct xfs_mount {
__uint32_t m_generation;
bool m_fail_unmount;
+ xfs_off_t m_rt_alloc_min; /* Min RT allocation */
#ifdef DEBUG
/*
* DEBUG mode instrumentation to test and/or trigger delayed allocation
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index c57aa7f..4866e52 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1284,3 +1284,53 @@ xfs_rtpick_extent(
*pick = b;
return 0;
}
+
+/*
+ * If allocation length is less than rt_alloc_min threshold select the
+ * data device. Otherwise, select the realtime device.
+ */
+bool
+xfs_rt_alloc_min(
+ struct xfs_mount *mp,
+ xfs_off_t len)
+{
+ if (!mp->m_rt_alloc_min)
+ return false;
+
+ if (len >= mp->m_rt_alloc_min)
+ return true;
+
+ return false;
+}
+
+/*
+* Select the target device for the inode based on either the size of the
+* initial allocation, or the amount of space available on the data device.
+*
+*/
+bool
+xfs_inode_select_rt_target(
+ struct xfs_inode *ip,
+ xfs_off_t len)
+{
+ struct xfs_mount *mp = ip->i_mount;
+
+ /* If the mount does not have a realtime device configured, there's
+ * nothing to do here.
+ */
+ if (!XFS_IS_REALTIME_MOUNT(mp))
+ return false;
+
+ /* You cannot select a new device target once blocks have been allocated
+ * (e.g. fallocate() beyond EOF), or if data has been written already.
+ */
+ if (ip->i_d.di_nextents)
+ return false;
+ if (ip->i_d.di_size)
+ return false;
+
+ /* Select realtime device as our target based on the value of
+ * mp->m_rt_alloc_min. This policy is inactive if it is not set.
+ */
+ return xfs_rt_alloc_min(mp, len);
+}
diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h
index f13133e..76868d2 100644
--- a/fs/xfs/xfs_rtalloc.h
+++ b/fs/xfs/xfs_rtalloc.h
@@ -136,6 +136,7 @@ int xfs_rtalloc_query_range(struct xfs_trans *tp,
int xfs_rtalloc_query_all(struct xfs_trans *tp,
xfs_rtalloc_query_range_fn fn,
void *priv);
+bool xfs_inode_select_rt_target(struct xfs_inode *ip, xfs_off_t len);
#else
# define xfs_rtallocate_extent(t,b,min,max,l,f,p,rb) (ENOSYS)
# define xfs_rtfree_extent(t,b,l) (ENOSYS)
@@ -155,6 +156,7 @@ xfs_rtmount_init(
}
# define xfs_rtmount_inodes(m) (((mp)->m_sb.sb_rblocks == 0)? 0 : (ENOSYS))
# define xfs_rtunmount_inodes(m)
+# define xfs_inode_select_rt_target(i,l) (0)
#endif /* CONFIG_XFS_RT */
#endif /* __XFS_RTALLOC_H__ */
diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
index 80ac15f..954398d 100644
--- a/fs/xfs/xfs_sysfs.c
+++ b/fs/xfs/xfs_sysfs.c
@@ -129,10 +129,52 @@ XFS_SYSFS_ATTR_RW(drop_writes);
#endif /* DEBUG */
+#ifdef CONFIG_XFS_RT
+STATIC ssize_t
+rt_alloc_min_store(
+ struct kobject *kobject,
+ const char *buf,
+ size_t count)
+{
+ struct xfs_mount *mp = to_mp(kobject);
+ int ret;
+ int val;
+
+ ret = kstrtoint(buf, 0, &val);
+ if (ret)
+ return ret;
+
+ /* Only valid if using a real-time device */
+ if (!XFS_IS_REALTIME_MOUNT(mp))
+ return -EINVAL;
+
+ if (val >= 0)
+ mp->m_rt_alloc_min = val;
+ else
+ return -EINVAL;
+
+ return count;
+}
+
+STATIC ssize_t
+rt_alloc_min_show(
+ struct kobject *kobject,
+ char *buf)
+{
+ struct xfs_mount *mp = to_mp(kobject);
+
+ return snprintf(buf, PAGE_SIZE, "%lld\n", mp->m_rt_alloc_min);
+}
+XFS_SYSFS_ATTR_RW(rt_alloc_min);
+#endif
+
static struct attribute *xfs_mp_attrs[] = {
#ifdef DEBUG
ATTR_LIST(drop_writes),
#endif
+#ifdef CONFIG_XFS_RT
+ ATTR_LIST(rt_alloc_min),
+#endif
NULL,
};
--
2.9.5
* [PATCH v6 3/3] xfs: Add realtime fallback if data device full
From: Richard Wareing @ 2017-10-11 2:37 UTC
To: linux-xfs; +Cc: david, darrick.wong, hch
- For filesystems which have a realtime device configured, rt_fallback_pct
forces the initial allocations of new (empty) files to the realtime device
once data device usage rises above rt_fallback_pct.
- Useful for realtime device users to help prevent ENOSPC errors when
selectively storing some files (e.g. small files) on the data device,
while others are stored on the realtime block device.
- Set via the "rt_fallback_pct" sysfs value, which is available if
the kernel is compiled with CONFIG_XFS_RT.
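The threshold arithmetic can be checked with a small user-space sketch. calc_min_free_dblocks() follows the formula in xfs_rt_calc_min_free_dblocks(), and fallback_to_rt() follows the comparison in xfs_rt_min_free_dblocks() for an inode that is not already realtime; both names are hypothetical. As a worked example, a 1,000,000-block data device with rt_fallback_pct = 90 gives a threshold of 1,000,000 * (100 - 90) / 100 = 100,000 free blocks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same formula as xfs_rt_calc_min_free_dblocks(): dblocks * (100 - pct) / 100 */
static uint64_t
calc_min_free_dblocks(uint64_t dblocks, unsigned int pct)
{
	return dblocks * (100 - pct) / 100;
}

/*
 * Same decision as xfs_rt_min_free_dblocks(). The clamp to 0 when
 * fdblocks < set_aside is an addition in this sketch; the patch
 * subtracts m_alloc_set_aside directly.
 */
static bool
fallback_to_rt(uint64_t fdblocks, uint64_t set_aside, unsigned int pct,
	       uint64_t min_free_dblocks)
{
	uint64_t free_dblocks;

	if (!pct)		/* rt_fallback_pct == 0: policy disabled */
		return false;

	free_dblocks = fdblocks > set_aside ? fdblocks - set_aside : 0;
	return free_dblocks < min_free_dblocks;
}
```

Per the formula, rt_fallback_pct = 100 yields a threshold of 0, so in that case the fallback would never trigger.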
Signed-off-by: Richard Wareing <rwareing@fb.com>
---
Changes since v5:
* Minor change to work with XFS_BMAPI_RTDATA method described
in rt_alloc_min patch
* Fixed bounds checks on sysfs option
* Documentation
Changes since v4:
* Refactored to align with xfs_inode_select_target change
* Fallback percentage reworked to trigger on % space used on data device.
I find this a bit more intuitive as it aligns well with "df" output.
* mp->m_rt_min_fdblocks now assigned via function call
* Better consistency on sysfs options
Changes since v3:
* None, new patch to patch set
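The "Fixed bounds checks on sysfs option" item boils down to rejecting anything outside 0..100 after kstrtoint(). Below is a user-space approximation using strtol(); parse_fallback_pct is hypothetical, and strtol()'s parsing rules (for example, it skips leading whitespace) are close to, but not identical to, kstrtoint()'s.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse an integer with base auto-detection, allow one trailing newline
 * (as written by `echo`), and accept only 0..100. Returns 0 and sets
 * *out on success, otherwise -EINVAL or -ERANGE.
 */
static int
parse_fallback_pct(const char *buf, int *out)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(buf, &end, 0);
	if (end == buf)			/* no digits at all */
		return -EINVAL;
	if (*end == '\n')
		end++;
	if (*end != '\0')		/* trailing garbage */
		return -EINVAL;
	if (errno == ERANGE || val > INT_MAX || val < INT_MIN)
		return -ERANGE;		/* does not fit in an int */
	if (val < 0 || val > 100)	/* the bounds check itself */
		return -EINVAL;

	*out = (int)val;
	return 0;
}
```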
Documentation/filesystems/xfs.txt | 6 ++++++
fs/xfs/xfs_fsops.c | 2 ++
fs/xfs/xfs_mount.c | 24 ++++++++++++++++++++++
fs/xfs/xfs_mount.h | 7 +++++++
fs/xfs/xfs_rtalloc.c | 42 ++++++++++++++++++++++++++++++++++++++-
fs/xfs/xfs_sysfs.c | 38 +++++++++++++++++++++++++++++++++++
6 files changed, 118 insertions(+), 1 deletion(-)
diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
index 0763972..ed6f6e2 100644
--- a/Documentation/filesystems/xfs.txt
+++ b/Documentation/filesystems/xfs.txt
@@ -486,3 +486,9 @@ When using a realtime sub-volume, the following sysfs options are supported:
Buffered, direct IO and pre-allocation are supported.
Setting the value to "0" disables this behavior.
+
+ /sys/fs/xfs/<dev>/rt_fallback_pct
+ (Units: percentage Min: 0 Default: 0 Max: 100)
+ When set, the file will be allocated blocks from the realtime device if the
+ data device space utilization rises above rt_fallback_pct. Setting the
+ value to "0" disables this behavior.
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 6ccaae9..80ccb14 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -610,6 +610,8 @@ xfs_growfs_data_private(
xfs_set_low_space_thresholds(mp);
mp->m_alloc_set_aside = xfs_alloc_set_aside(mp);
+ mp->m_rt_min_free_dblocks = xfs_rt_calc_min_free_dblocks(mp);
+
/*
* If we expanded the last AG, free the per-AG reservation
* so we can reinitialize it with the new size.
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 2eaf818..c91e6c4 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1396,3 +1396,27 @@ xfs_dev_is_read_only(
}
return 0;
}
+
+/*
+ * Precalculate the minimum number of free data blocks required; if we
+ * fall below this value, we fall back to the realtime device.
+ *
+ * m_rt_fallback_pct can only be non-zero if a real-time device
+ * is configured.
+ */
+uint64_t
+xfs_rt_calc_min_free_dblocks(
+ struct xfs_mount *mp)
+{
+ xfs_rfsblock_t min_free_dblocks = 0;
+
+ if (!XFS_IS_REALTIME_MOUNT(mp))
+ return 0;
+
+ /* Pre-compute minimum data blocks required before
+ * falling back to RT device for allocations
+ */
+ min_free_dblocks = mp->m_sb.sb_dblocks * (100 - mp->m_rt_fallback_pct);
+ do_div(min_free_dblocks, 100);
+ return min_free_dblocks;
+}
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index e64936f..318bacc 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -198,6 +198,12 @@ typedef struct xfs_mount {
bool m_fail_unmount;
xfs_off_t m_rt_alloc_min; /* Min RT allocation */
+ /* Fallback to realtime device if data device usage above rt_fallback_pct */
+ uint m_rt_fallback_pct;
+ /* Use realtime device if free data device blocks fall below this; computed
+ * from m_rt_fallback_pct.
+ */
+ xfs_rfsblock_t m_rt_min_free_dblocks;
#ifdef DEBUG
/*
* DEBUG mode instrumentation to test and/or trigger delayed allocation
@@ -463,4 +469,5 @@ int xfs_zero_extent(struct xfs_inode *ip, xfs_fsblock_t start_fsb,
struct xfs_error_cfg * xfs_error_get_cfg(struct xfs_mount *mp,
int error_class, int error);
+uint64_t xfs_rt_calc_min_free_dblocks(struct xfs_mount *mp);
#endif /* __XFS_MOUNT_H__ */
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 4866e52..2dc9761 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1304,6 +1304,37 @@ xfs_rt_alloc_min(
}
/*
+ * m_rt_min_free_dblocks is a pre-computed threshold, which controls target
+ * selection based on how many free blocks are available on the data device.
+ *
+ * If the number of free data device blocks falls below
+ * mp->m_rt_min_free_dblocks, the realtime device is selected as the target
+ * device. If rt_fallback_pct is not set, this target policy is inactive.
+ *
+ */
+bool
+xfs_rt_min_free_dblocks(
+ struct xfs_mount *mp,
+ struct xfs_inode *ip,
+ xfs_off_t len)
+{
+ /* Disabled */
+ if (!mp->m_rt_fallback_pct)
+ return false;
+
+ /* If inode target is already realtime device, nothing to do here */
+ if (!XFS_IS_REALTIME_INODE(ip)) {
+ uint64_t free_dblocks;
+ free_dblocks = percpu_counter_sum(&mp->m_fdblocks) -
+ mp->m_alloc_set_aside;
+ if (free_dblocks < mp->m_rt_min_free_dblocks) {
+ return true;
+ }
+ }
+ return false;
+}
+
+/*
* Select the target device for the inode based on either the size of the
* initial allocation, or the amount of space available on the data device.
*
@@ -1332,5 +1363,14 @@ xfs_inode_select_rt_target(
/* Select realtime device as our target based on the value of
* mp->m_rt_alloc_min. This policy is inactive if it is not set.
*/
- return xfs_rt_alloc_min(mp, len);
+ if (xfs_rt_alloc_min(mp, len))
+ return true;
+
+ /* Check if the data device has enough space; if not, fall back to the
+ * realtime device. Valid only if mp->m_rt_fallback_pct is set.
+ */
+ if (xfs_rt_min_free_dblocks(mp, ip, len))
+ return true;
+
+ return false;
}
diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
index 954398d..f8c3523 100644
--- a/fs/xfs/xfs_sysfs.c
+++ b/fs/xfs/xfs_sysfs.c
@@ -166,6 +166,43 @@ rt_alloc_min_show(
return snprintf(buf, PAGE_SIZE, "%lld\n", mp->m_rt_alloc_min);
}
XFS_SYSFS_ATTR_RW(rt_alloc_min);
+
+STATIC ssize_t
+rt_fallback_pct_store(
+ struct kobject *kobject,
+ const char *buf,
+ size_t count)
+{
+ struct xfs_mount *mp = to_mp(kobject);
+ int ret;
+ int val;
+
+ ret = kstrtoint(buf, 0, &val);
+ if (ret)
+ return ret;
+
+ /* Only valid if using a real-time device */
+ if (!XFS_IS_REALTIME_MOUNT(mp))
+ return -EINVAL;
+
+ if (val < 0 || val > 100)
+ return -EINVAL;
+
+ mp->m_rt_fallback_pct = val;
+ mp->m_rt_min_free_dblocks = xfs_rt_calc_min_free_dblocks(mp);
+ return count;
+}
+
+STATIC ssize_t
+rt_fallback_pct_show(
+ struct kobject *kobject,
+ char *buf)
+{
+ struct xfs_mount *mp = to_mp(kobject);
+
+ return snprintf(buf, PAGE_SIZE, "%u\n", mp->m_rt_fallback_pct);
+}
+XFS_SYSFS_ATTR_RW(rt_fallback_pct);
#endif
static struct attribute *xfs_mp_attrs[] = {
@@ -174,6 +211,7 @@ static struct attribute *xfs_mp_attrs[] = {
#endif
#ifdef CONFIG_XFS_RT
ATTR_LIST(rt_alloc_min),
+ ATTR_LIST(rt_fallback_pct),
#endif
NULL,
};
--
2.9.5
^ permalink raw reply related [flat|nested] 4+ messages in thread