linux-xfs.vger.kernel.org archive mirror
* [PATCH RESEND v6 0/3] XFS realtime device tweaks
@ 2017-11-22 22:40 Richard Wareing
  2017-11-22 22:40 ` [PATCH RESEND v6 1/3] xfs: Show realtime device stats on statfs calls if inherit flag set Richard Wareing
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Richard Wareing @ 2017-11-22 22:40 UTC
  To: linux-xfs; +Cc: david, darrick.wong, hch

Re-sending patch; re-based to 4.14-rc8 (& re-tested).  Patch 1 in this series
is reviewed and is ready to be merged independent of the others.

====

1. Inode flag now correctly set when locks are held via XFS_BMAPI_RTDATA
   flag.
2. Realtime flag is honored when set by user via ioctl or inherit flag on
   directory.
3. Misc changes around formatting & bounds checks on sysfs options.

See individual patches for more details.

Please pay close attention to the change in xfs_file_iomap_begin (patch 2).
The new version of the patch bypasses the xfs_file_iomap_begin_delay function
in the "realtime" case, since the realtime code in that function is
unreachable (see the assert in that function).  Instead, we fall through to
xfs_iomap_write_direct, which passes XFS_BMAPI_RTDATA on to the
xfs_bmapi_write function, where the realtime flag is set on the inode.

I'm curious whether there is a better approach, and would welcome
verification that this is sane/safe.
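
For reviewers who want the bypass spelled out, here is a minimal user-space
sketch of the branch in xfs_file_iomap_begin as patch 2 changes it.  It is an
illustration only: write_ctx_model, write_path and choose_write_path are
hypothetical names, and each boolean stands in for the kernel check named in
its comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the two outcomes of the branch. */
enum write_path {
	PATH_DELALLOC,		/* xfs_file_iomap_begin_delay() */
	PATH_FALLTHROUGH	/* continue in xfs_file_iomap_begin() */
};

/* Hypothetical stand-in for the state tested on a write request. */
struct write_ctx_model {
	bool	is_direct;		/* IOMAP_DIRECT set */
	bool	is_dax;			/* IS_DAX(inode) */
	bool	has_extsz_hint;		/* xfs_get_extsz_hint(ip) != 0 */
	bool	policy_selects_rt;	/* xfs_inode_select_rt_target() */
};

/*
 * Models the decision for a write request only.  A buffered write takes
 * the delalloc path unless the target policy picks the realtime device,
 * in which case it falls through as if the inode were already realtime.
 */
static enum write_path
choose_write_path(
	const struct write_ctx_model	*c)
{
	assert(c);

	if (!c->is_direct && !c->is_dax && !c->has_extsz_hint &&
	    !c->policy_selects_rt)
		return PATH_DELALLOC;

	return PATH_FALLTHROUGH;
}
```

In this model the only new input is policy_selects_rt; every other condition
is unchanged from the existing check.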

Patch set based off Linux 4.14-rc8 (commit
39dae59d66acd86d1de24294bd2f343fd5e7a625) located @
https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git .


Richard Wareing (3):
  xfs: Show realtime device stats on statfs calls if inherit flag set
  xfs: Set realtime flag based on initial allocation size
  xfs: Add realtime fallback if data device full

 Documentation/filesystems/xfs.txt | 27 +++++++++++-
 fs/xfs/libxfs/xfs_bmap.c          | 35 +++++++++++++++
 fs/xfs/libxfs/xfs_bmap.h          |  3 ++
 fs/xfs/xfs_bmap_util.c            |  3 ++
 fs/xfs/xfs_fsops.c                |  2 +
 fs/xfs/xfs_inode.c                |  6 +++
 fs/xfs/xfs_iomap.c                | 18 +++++++-
 fs/xfs/xfs_linux.h                |  2 +
 fs/xfs/xfs_mount.c                | 24 +++++++++++
 fs/xfs/xfs_mount.h                |  8 ++++
 fs/xfs/xfs_rtalloc.c              | 90 +++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_rtalloc.h              |  2 +
 fs/xfs/xfs_super.c                |  8 ++++
 fs/xfs/xfs_sysfs.c                | 80 ++++++++++++++++++++++++++++++++++
 14 files changed, 305 insertions(+), 3 deletions(-)

-- 
2.9.5



* [PATCH RESEND v6 1/3] xfs: Show realtime device stats on statfs calls if inherit flag set
  2017-11-22 22:40 [PATCH RESEND v6 0/3] XFS realtime device tweaks Richard Wareing
@ 2017-11-22 22:40 ` Richard Wareing
  2017-11-28 21:20   ` Darrick J. Wong
  2017-11-22 22:40 ` [PATCH RESEND v6 2/3] xfs: Set realtime flag based on initial allocation size Richard Wareing
  2017-11-22 22:40 ` [PATCH RESEND v6 3/3] xfs: Add realtime fallback if data device full Richard Wareing
  2 siblings, 1 reply; 5+ messages in thread
From: Richard Wareing @ 2017-11-22 22:40 UTC
  To: linux-xfs; +Cc: david, darrick.wong, hch

- Reports realtime device free blocks in statfs calls if the realtime
  inheritance bit is set on the directory's inode.  This is a bit more
  intuitive, especially for use-cases which are using a much larger device
  for the realtime device.
- Adds the XFS_IS_REALTIME_MOUNT macro to gate on the existence of a
  realtime device on the mount, similar to the XFS_IS_REALTIME_INODE
  macro.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Richard Wareing <rwareing@fb.com>
---
Changes since v5:
* None

Changes since v4:
* None

Changes since v3:
* Fixed accounting bug, we are not required to subtract m_alloc_set_aside
  as this is a data device only requirement.
* Added XFS_IS_REALTIME_MOUNT macro based on learnings from CVE-2017-14340,
  now provides similar gating on the mount as XFS_IS_REALTIME_INODE does
  for the inode.

Changes since v2:
* Style updated per Christoph Hellwig's comment
* Fixed bug: statp->f_bavail = statp->f_bfree
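
As an illustration of the accounting, the new statfs branch can be modelled
in user space.  This is a sketch under stated assumptions, not the kernel
code: rt_sb_model, statfs_model and report_rt_stats are hypothetical names,
while the field names mirror the ones the patch reads and writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock fields the patch reads. */
struct rt_sb_model {
	uint64_t	sb_rblocks;	/* total realtime blocks */
	uint64_t	sb_frextents;	/* free realtime extents */
	uint32_t	sb_rextsize;	/* realtime extent size, in blocks */
};

/* Hypothetical stand-in for the statfs fields the patch overrides. */
struct statfs_model {
	uint64_t	f_blocks;
	uint64_t	f_bfree;
	uint64_t	f_bavail;
};

/*
 * Override the block counts with realtime device numbers when the mount
 * has a realtime device and the inode carries the inherit flag.  Nothing
 * is subtracted for m_alloc_set_aside, which applies to the data device
 * only.
 */
static void
report_rt_stats(
	const struct rt_sb_model	*sbp,
	bool				has_rtdev,
	bool				rtinherit,
	struct statfs_model		*statp)
{
	assert(sbp && statp);

	if (!has_rtdev || !rtinherit)
		return;

	statp->f_blocks = sbp->sb_rblocks;
	statp->f_bfree = sbp->sb_frextents * sbp->sb_rextsize;
	statp->f_bavail = statp->f_bfree;
}
```

With 10 free realtime extents of 4 blocks each, the model reports 40 free
blocks; without the inherit flag the data device numbers are left untouched.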


 fs/xfs/xfs_linux.h | 2 ++
 fs/xfs/xfs_super.c | 8 ++++++++
 2 files changed, 10 insertions(+)

diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index dcd1292..944b02d 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -278,8 +278,10 @@ static inline uint64_t howmany_64(uint64_t x, uint32_t y)
 #define XFS_IS_REALTIME_INODE(ip)			\
 	(((ip)->i_d.di_flags & XFS_DIFLAG_REALTIME) &&	\
 	 (ip)->i_mount->m_rtdev_targp)
+#define XFS_IS_REALTIME_MOUNT(mp) ((mp)->m_rtdev_targp ? 1 : 0)
 #else
 #define XFS_IS_REALTIME_INODE(ip) (0)
+#define XFS_IS_REALTIME_MOUNT(mp) (0)
 #endif
 
 #endif /* __XFS_LINUX__ */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index f663022..3c9a989 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1153,6 +1153,14 @@ xfs_fs_statfs(
 	    ((mp->m_qflags & (XFS_PQUOTA_ACCT|XFS_PQUOTA_ENFD))) ==
 			      (XFS_PQUOTA_ACCT|XFS_PQUOTA_ENFD))
 		xfs_qm_statvfs(ip, statp);
+
+	if (XFS_IS_REALTIME_MOUNT(mp) &&
+	    (ip->i_d.di_flags & XFS_DIFLAG_RTINHERIT)) {
+		statp->f_blocks = sbp->sb_rblocks;
+		statp->f_bavail = statp->f_bfree =
+			sbp->sb_frextents * sbp->sb_rextsize;
+	}
+
 	return 0;
 }
 
-- 
2.9.5



* [PATCH RESEND v6 2/3] xfs: Set realtime flag based on initial allocation size
  2017-11-22 22:40 [PATCH RESEND v6 0/3] XFS realtime device tweaks Richard Wareing
  2017-11-22 22:40 ` [PATCH RESEND v6 1/3] xfs: Show realtime device stats on statfs calls if inherit flag set Richard Wareing
@ 2017-11-22 22:40 ` Richard Wareing
  2017-11-22 22:40 ` [PATCH RESEND v6 3/3] xfs: Add realtime fallback if data device full Richard Wareing
  2 siblings, 0 replies; 5+ messages in thread
From: Richard Wareing @ 2017-11-22 22:40 UTC
  To: linux-xfs; +Cc: david, darrick.wong, hch

- The rt_alloc_min sysfs option automatically selects the device (data
  device, or realtime) based on the size of the initial allocation of the
  file.
- This option can be used to route the storage of small files (and the
  inefficient workloads associated with them) to a suitable storage
  device such as an SSD, while larger allocations are sent to a traditional
  HDD.
- Supports writes via O_DIRECT, buffered (i.e. page cache), and
  pre-allocations (i.e. fallocate).
- Available only when the kernel is compiled with the CONFIG_XFS_RT option.

Signed-off-by: Richard Wareing <rwareing@fb.com>
---
Changes since v5:
* xfs_inode_select_target renamed to xfs_inode_select_rt_target and returns
  boolean to indicate if realtime device target is desired.
* Introduction of XFS_BMAPI_RTDATA, which signals to the xfs_bmapi_allocate
  function that the realtime flag must be set on the inode and the inode
  logged.
* Manual setting of the realtime flag by ioctl or directory rt inherit flag
  now takes precedence over the policy.
* Documentation

Changes since v4:
* Added xfs_inode_select_target function to hold target selection
  code
* XFS_IS_REALTIME_MOUNT check now moved inside xfs_inode_select_target
  function for better gating
* Improved consistency in the sysfs set behavior
* Style fixes

Changes since v3:
* Now functions via initial allocation regardless of O_DIRECT, buffered or
  pre-allocation code paths.  Provides a consistent user-experience.
* I did some experiments putting this in the xfs_bmapi_write code path;
  however, pre-allocation accounting unfortunately prevents this cleaner
  approach.  As such, this proved to be the cleanest functional approach.
* No longer a mount option, now a sysfs tunable
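
As an illustration, the target selection policy in this patch can be modelled
in user space as below.  This is a sketch only: rt_policy_model and
select_rt_target are hypothetical names, and the fields stand in for the
kernel state named in the comments.  In the patch the callers test
XFS_IS_REALTIME_INODE first, so a manually set realtime flag never reaches
this policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the inode and mount state the policy reads. */
struct rt_policy_model {
	bool		has_rtdev;	/* XFS_IS_REALTIME_MOUNT(mp) */
	uint64_t	nextents;	/* ip->i_d.di_nextents */
	int64_t		size;		/* ip->i_d.di_size */
	int64_t		rt_alloc_min;	/* mp->m_rt_alloc_min, 0 disables */
};

/* Returns true if the initial allocation should target the rt device. */
static bool
select_rt_target(
	const struct rt_policy_model	*m,
	int64_t				len)
{
	assert(m);

	/* No realtime device on this mount, nothing to select. */
	if (!m->has_rtdev)
		return false;

	/* The target cannot change once blocks or data exist. */
	if (m->nextents || m->size)
		return false;

	/* Policy disabled. */
	if (!m->rt_alloc_min)
		return false;

	/* Allocations at or above the threshold go to the rt device. */
	return len >= m->rt_alloc_min;
}
```

With rt_alloc_min set to 65536, a 4096 byte initial allocation stays on the
data device, while a 65536 byte one is sent to the realtime device.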

 Documentation/filesystems/xfs.txt | 21 +++++++++++++++-
 fs/xfs/libxfs/xfs_bmap.c          | 35 +++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_bmap.h          |  3 +++
 fs/xfs/xfs_bmap_util.c            |  3 +++
 fs/xfs/xfs_inode.c                |  6 +++++
 fs/xfs/xfs_iomap.c                | 18 ++++++++++++--
 fs/xfs/xfs_mount.h                |  1 +
 fs/xfs/xfs_rtalloc.c              | 50 +++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_rtalloc.h              |  2 ++
 fs/xfs/xfs_sysfs.c                | 42 ++++++++++++++++++++++++++++++++
 10 files changed, 178 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
index 3b9b5c1..0763972 100644
--- a/Documentation/filesystems/xfs.txt
+++ b/Documentation/filesystems/xfs.txt
@@ -94,7 +94,7 @@ default behaviour.
 	When inode64 is specified, it indicates that XFS is allowed
 	to create inodes at any location in the filesystem,
 	including those which will result in inode numbers occupying
-	more than 32 bits of significance. 
+	more than 32 bits of significance.
 
 	inode32 is provided for backwards compatibility with older
 	systems and applications, since 64 bits inode numbers might
@@ -467,3 +467,22 @@ the class and error context. For example, the default values for
 "metadata/ENODEV" are "0" rather than "-1" so that this error handler defaults
 to "fail immediately" behaviour. This is done because ENODEV is a fatal,
 unrecoverable error no matter how many times the metadata IO is retried.
+
+Realtime Device Sysfs Options
+=============================
+
+When using a realtime sub-volume, the following sysfs options are supported:
+
+  /sys/fs/xfs/<dev>/rt_alloc_min
+  (Units: bytes  Min: 0  Default: 0  Max: INT_MAX)
+	When set, the file will be allocated blocks from the realtime device if the
+	initial allocation request size (in bytes) is equal to or above this value.
+	For XFS use-cases where appends are unlikely or not supported, this option
+	can be used to place smaller files on the data device (typically an SSD),
+	while larger files are placed on the realtime device (typically an HDD).
+
+	Any files which have the realtime flag set by an ioctl call or realtime
+	inheritance flag on the directory will not be affected by this option.
+	Buffered, direct IO and pre-allocation are supported.
+
+	Setting the value to "0" disables this behavior.
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 8926379..dd02a52 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4188,6 +4188,39 @@ xfs_bmapi_reserve_delalloc(
 	return error;
 }
 
+/*
+ * This function will set the XFS_DIFLAG_REALTIME flag on the inode if
+ * the XFS_BMAPI_RTDATA flag is set on the xfs_bmalloca struct.
+ *
+ * This function is only valid for realtime mounts, and only on the initial
+ * allocation for the file.
+ *
+ */
+void
+xfs_bmapi_rt_data_flag(
+	struct xfs_mount	*mp,
+	struct xfs_bmalloca	*bma)
+{
+
+	/* Only valid if this is a realtime mount */
+	if (!XFS_IS_REALTIME_MOUNT(mp))
+		return;
+
+	/* Only valid if file is empty */
+	if (!(bma->datatype & XFS_ALLOC_INITIAL_USER_DATA))
+		return;
+
+	/* Nothing to do, realtime flag already set */
+	if (bma->ip->i_d.di_flags & XFS_DIFLAG_REALTIME)
+		return;
+
+	/* Set realtime flag and log it if RTDATA flag is set */
+	if (bma->flags & XFS_BMAPI_RTDATA) {
+		bma->ip->i_d.di_flags |= XFS_DIFLAG_REALTIME;
+		bma->logflags |= XFS_ILOG_CORE;
+	}
+}
+
 static int
 xfs_bmapi_allocate(
 	struct xfs_bmalloca	*bma)
@@ -4238,6 +4271,8 @@ xfs_bmapi_allocate(
 
 	bma->minlen = (bma->flags & XFS_BMAPI_CONTIG) ? bma->length : 1;
 
+	xfs_bmapi_rt_data_flag(mp, bma);
+
 	/*
 	 * Only want to do the alignment at the eof if it is userdata and
 	 * allocation length is larger than a stripe unit.
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 502e0d8..6f67588 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -113,6 +113,9 @@ struct xfs_extent_free_item
 /* Only convert delalloc space, don't allocate entirely new extents */
 #define XFS_BMAPI_DELALLOC	0x400
 
+/* Allocate to realtime device */
+#define XFS_BMAPI_RTDATA	0x800
+
 #define XFS_BMAPI_FLAGS \
 	{ XFS_BMAPI_ENTIRE,	"ENTIRE" }, \
 	{ XFS_BMAPI_METADATA,	"METADATA" }, \
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 6503cfa..b04363b 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -1053,6 +1053,9 @@ xfs_alloc_file_space(
 		return -EINVAL;
 
 	rt = XFS_IS_REALTIME_INODE(ip);
+	if (!rt && (rt = xfs_inode_select_rt_target(ip, len)))
+		alloc_type |= XFS_BMAPI_RTDATA;
+
 	extsz = xfs_get_extsz_hint(ip);
 
 	count = len;
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 4ec5b7f..ed29549 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1633,6 +1633,12 @@ xfs_itruncate_extents(
 		xfs_inode_clear_cowblocks_tag(ip);
 	}
 
+	if (ip->i_d.di_nblocks == 0 && XFS_IS_REALTIME_MOUNT(mp) &&
+	    mp->m_rt_alloc_min) {
+		/* Clear realtime flag if m_rt_alloc_min policy is in place */
+		ip->i_d.di_flags &= ~XFS_DIFLAG_REALTIME;
+	}
+
 	/*
 	 * Always re-log the inode so that our permanent transaction can keep
 	 * on rolling it forward in the log.
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index f179bdf..518a9bb 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -40,6 +40,7 @@
 #include "xfs_dquot_item.h"
 #include "xfs_dquot.h"
 #include "xfs_reflink.h"
+#include "xfs_rtalloc.h"
 
 
 #define XFS_WRITEIO_ALIGN(mp,off)	(((off) >> mp->m_writeio_log) \
@@ -175,7 +176,11 @@ xfs_iomap_write_direct(
 	int		bmapi_flags = XFS_BMAPI_PREALLOC;
 	uint		tflags = 0;
 
+
 	rt = XFS_IS_REALTIME_INODE(ip);
+	if (!rt && (rt = xfs_inode_select_rt_target(ip, count)))
+		bmapi_flags |= XFS_BMAPI_RTDATA;
+
 	extsz = xfs_get_extsz_hint(ip);
 	lockmode = XFS_ILOCK_SHARED;	/* locked by caller */
 
@@ -985,8 +990,17 @@ xfs_file_iomap_begin(
 
 	if (((flags & (IOMAP_WRITE | IOMAP_DIRECT)) == IOMAP_WRITE) &&
 			!IS_DAX(inode) && !xfs_get_extsz_hint(ip)) {
-		/* Reserve delalloc blocks for regular writeback. */
-		return xfs_file_iomap_begin_delay(inode, offset, length, iomap);
+		/*
+		 * For non-odirect writes, check if this will be allocated to
+		 * realtime, if so we by-pass xfs_file_iomap_begin_delay as if
+		 * the inode was already marked realtime (see xfs_get_extsz_hint).
+		 * The actual setting of the realtime flag on the inode will be
+		 * done later on.
+		 */
+		if (!xfs_inode_select_rt_target(ip, XFS_FSB_TO_B(mp, length)))
+			/* Reserve delalloc blocks for regular writeback. */
+			return xfs_file_iomap_begin_delay(inode, offset, length,
+					iomap);
 	}
 
 	if (need_excl_ilock(ip, flags)) {
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index e0792d0..0db9731 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -197,6 +197,7 @@ typedef struct xfs_mount {
 	uint32_t		m_generation;
 
 	bool			m_fail_unmount;
+	xfs_off_t		m_rt_alloc_min; /* Min RT allocation */
 #ifdef DEBUG
 	/*
 	 * Frequency with which errors are injected.  Replaces xfs_etest; the
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 488719d..145007b 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1284,3 +1284,53 @@ xfs_rtpick_extent(
 	*pick = b;
 	return 0;
 }
+
+/*
+ * If allocation length is less than rt_alloc_min threshold select the
+ * data device.   Otherwise, select the realtime device.
+ */
+bool
+xfs_rt_alloc_min(
+	struct xfs_mount	*mp,
+	xfs_off_t		len)
+{
+	if (!mp->m_rt_alloc_min)
+		return false;
+
+	if (len >= mp->m_rt_alloc_min)
+		return true;
+
+	return false;
+}
+
+/*
+* Select the target device for the inode based on either the size of the
+* initial allocation, or the amount of space available on the data device.
+*
+*/
+bool
+xfs_inode_select_rt_target(
+	struct xfs_inode	*ip,
+	xfs_off_t		len)
+{
+	struct xfs_mount    *mp = ip->i_mount;
+
+	/* If the mount does not have a realtime device configured, there's
+	 * nothing to do here.
+	 */
+	if (!XFS_IS_REALTIME_MOUNT(mp))
+		return false;
+
+	/* You cannot select a new device target once blocks have been allocated
+	 * (e.g. fallocate() beyond EOF), or if data has been written already.
+	 */
+	if (ip->i_d.di_nextents)
+		return false;
+	if (ip->i_d.di_size)
+		return false;
+
+	/* Select realtime device as our target based on the value of
+	 * mp->m_rt_alloc_min.  Target selection code if not valid if not set.
+	 */
+	return xfs_rt_alloc_min(mp, len);
+}
diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h
index 79defa7..4f058b5 100644
--- a/fs/xfs/xfs_rtalloc.h
+++ b/fs/xfs/xfs_rtalloc.h
@@ -138,6 +138,7 @@ int xfs_rtalloc_query_range(struct xfs_trans *tp,
 int xfs_rtalloc_query_all(struct xfs_trans *tp,
 			  xfs_rtalloc_query_range_fn fn,
 			  void *priv);
+bool xfs_inode_select_rt_target(struct xfs_inode *ip, xfs_off_t len);
 #else
 # define xfs_rtallocate_extent(t,b,min,max,l,f,p,rb)    (ENOSYS)
 # define xfs_rtfree_extent(t,b,l)                       (ENOSYS)
@@ -158,6 +159,7 @@ xfs_rtmount_init(
 }
 # define xfs_rtmount_inodes(m)  (((mp)->m_sb.sb_rblocks == 0)? 0 : (ENOSYS))
 # define xfs_rtunmount_inodes(m)
+# define xfs_inode_select_rt_target(i,l)		(0)
 #endif	/* CONFIG_XFS_RT */
 
 #endif	/* __XFS_RTALLOC_H__ */
diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
index 8b2ccc2..8b425be 100644
--- a/fs/xfs/xfs_sysfs.c
+++ b/fs/xfs/xfs_sysfs.c
@@ -90,7 +90,49 @@ to_mp(struct kobject *kobject)
 	return container_of(kobj, struct xfs_mount, m_kobj);
 }
 
+#ifdef CONFIG_XFS_RT
+STATIC ssize_t
+rt_alloc_min_store(
+	struct kobject		*kobject,
+	const char		*buf,
+	size_t			count)
+{
+	struct xfs_mount	*mp = to_mp(kobject);
+	int			ret;
+	int			val;
+
+	ret = kstrtoint(buf, 0, &val);
+	if (ret)
+		return ret;
+
+	/* Only valid if using a real-time device */
+	if (!XFS_IS_REALTIME_MOUNT(mp))
+		return -EINVAL;
+
+	if (val >= 0)
+		mp->m_rt_alloc_min = val;
+	else
+		return -EINVAL;
+
+	return count;
+}
+
+STATIC ssize_t
+rt_alloc_min_show(
+	struct kobject		*kobject,
+	char			*buf)
+{
+	struct xfs_mount	*mp = to_mp(kobject);
+
+	return snprintf(buf, PAGE_SIZE, "%lld\n", mp->m_rt_alloc_min);
+}
+XFS_SYSFS_ATTR_RW(rt_alloc_min);
+#endif
+
 static struct attribute *xfs_mp_attrs[] = {
+#ifdef CONFIG_XFS_RT
+	ATTR_LIST(rt_alloc_min),
+#endif
 	NULL,
 };
 
-- 
2.9.5



* [PATCH RESEND v6 3/3] xfs: Add realtime fallback if data device full
  2017-11-22 22:40 [PATCH RESEND v6 0/3] XFS realtime device tweaks Richard Wareing
  2017-11-22 22:40 ` [PATCH RESEND v6 1/3] xfs: Show realtime device stats on statfs calls if inherit flag set Richard Wareing
  2017-11-22 22:40 ` [PATCH RESEND v6 2/3] xfs: Set realtime flag based on initial allocation size Richard Wareing
@ 2017-11-22 22:40 ` Richard Wareing
  2 siblings, 0 replies; 5+ messages in thread
From: Richard Wareing @ 2017-11-22 22:40 UTC
  To: linux-xfs; +Cc: david, darrick.wong, hch

- For FSes which have a realtime device configured, rt_fallback_pct forces
  allocations to the realtime device after data device usage reaches
  rt_fallback_pct.
- Useful for realtime device users to help prevent ENOSPC errors when
  selectively storing some files (e.g. small files) on data device, while
  others are stored on realtime block device.
- Set via the "rt_fallback_pct" sysfs value which is available if
  the kernel is compiled with CONFIG_XFS_RT.

Signed-off-by: Richard Wareing <rwareing@fb.com>
---
Changes since v5:
* Minor change to work with XFS_BMAPI_RTDATA method described
  in rt_alloc_min patch
* Fixed bounds checks on sysfs option
* Documentation

Changes since v4:
* Refactored to align with xfs_inode_select_target change
* Fallback percentage reworked to trigger on % space used on data device.
  I find this a bit more intuitive as it aligns well with "df" output.
* mp->m_rt_min_fdblocks now assigned via function call
* Better consistency on sysfs options

Changes since v3:
* None, new patch to patch set
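
As an illustration, the fallback threshold arithmetic can be modelled in user
space as below.  This is a sketch only: calc_min_free_dblocks and
should_fall_back are hypothetical names, and free_dblocks is assumed to be
the free data block count after m_alloc_set_aside has been subtracted, as
the patch does before comparing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Minimum number of free data device blocks before falling back to the
 * realtime device: (100 - pct) percent of the data device size.
 */
static uint64_t
calc_min_free_dblocks(
	uint64_t	dblocks,
	unsigned int	fallback_pct)
{
	assert(fallback_pct <= 100);

	return dblocks * (100 - fallback_pct) / 100;
}

/* Returns true if a new file should be sent to the realtime device. */
static bool
should_fall_back(
	uint64_t	free_dblocks,
	uint64_t	dblocks,
	unsigned int	fallback_pct)
{
	/* A value of 0 disables the policy. */
	if (!fallback_pct)
		return false;

	return free_dblocks < calc_min_free_dblocks(dblocks, fallback_pct);
}
```

For a 1000 block data device with rt_fallback_pct set to 80, the threshold is
200 free blocks, so the fallback starts once usage rises above 80%.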

 Documentation/filesystems/xfs.txt |  6 ++++++
 fs/xfs/xfs_fsops.c                |  2 ++
 fs/xfs/xfs_mount.c                | 24 ++++++++++++++++++++++
 fs/xfs/xfs_mount.h                |  7 +++++++
 fs/xfs/xfs_rtalloc.c              | 42 ++++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_sysfs.c                | 38 +++++++++++++++++++++++++++++++++++
 6 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
index 0763972..ed6f6e2 100644
--- a/Documentation/filesystems/xfs.txt
+++ b/Documentation/filesystems/xfs.txt
@@ -486,3 +486,9 @@ When using a realtime sub-volume, the following sysfs options are supported:
 	Buffered, direct IO and pre-allocation are supported.
 
 	Setting the value to "0" disables this behavior.
+
+  /sys/fs/xfs/<dev>/rt_fallback_pct
+  (Units: percentage  Min: 0  Default: 0  Max: 100)
+	When set, the file will be allocated blocks from the realtime device if the
+	data device space utilization rises above rt_fallback_pct.  Setting the
+	value to "0" disables this behavior.
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 8f22fc5..89713f1 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -610,6 +610,8 @@ xfs_growfs_data_private(
 	xfs_set_low_space_thresholds(mp);
 	mp->m_alloc_set_aside = xfs_alloc_set_aside(mp);
 
+	mp->m_rt_min_free_dblocks = xfs_rt_calc_min_free_dblocks(mp);
+
 	/*
 	 * If we expanded the last AG, free the per-AG reservation
 	 * so we can reinitialize it with the new size.
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index e9727d0..3905e57 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1396,3 +1396,27 @@ xfs_dev_is_read_only(
 	}
 	return 0;
 }
+
+/*
+ * precalculate minimum of data blocks required, if we fall
+ * below this value, we will fallback to the real-time device.
+ *
+ * m_rt_fallback_pct can only be non-zero if a real-time device
+ * is configured.
+ */
+uint64_t
+xfs_rt_calc_min_free_dblocks(
+	struct xfs_mount	*mp)
+{
+	xfs_rfsblock_t		min_free_dblocks = 0;
+
+	if (!XFS_IS_REALTIME_MOUNT(mp))
+		return 0;
+
+	/* Pre-compute minimum data blocks required before
+	 * falling back to RT device for allocations
+	 */
+	min_free_dblocks = mp->m_sb.sb_dblocks * (100 - mp->m_rt_fallback_pct);
+	do_div(min_free_dblocks, 100);
+	return min_free_dblocks;
+}
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 0db9731..9dc17b8 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -198,6 +198,12 @@ typedef struct xfs_mount {
 
 	bool			m_fail_unmount;
 	xfs_off_t		m_rt_alloc_min; /* Min RT allocation */
+	/* Fallback to realtime device if data device usage above rt_fallback_pct */
+	uint			m_rt_fallback_pct;
+	/* Use realtime device if free data device blocks falls below this; computed
+	 * from m_rt_fallback_pct.
+	 */
+	xfs_rfsblock_t		m_rt_min_free_dblocks;
 #ifdef DEBUG
 	/*
 	 * Frequency with which errors are injected.  Replaces xfs_etest; the
@@ -447,4 +453,5 @@ int	xfs_zero_extent(struct xfs_inode *ip, xfs_fsblock_t start_fsb,
 struct xfs_error_cfg * xfs_error_get_cfg(struct xfs_mount *mp,
 		int error_class, int error);
 
+uint64_t	xfs_rt_calc_min_free_dblocks(struct xfs_mount *mp);
 #endif	/* __XFS_MOUNT_H__ */
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 145007b..3abd403 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1304,6 +1304,37 @@ xfs_rt_alloc_min(
 }
 
 /*
+ * m_rt_min_free_dblocks is a pre-computed threshold, which controls target
+ * selection based on how many free blocks are available on the data device.
+ *
+ * If the number of free data device blocks falls below
+ * mp->m_rt_min_free_dblocks, the realtime device is selected as the target
+ * device.  If this value is not set, this target policy is in-active.
+ *
+ */
+bool
+xfs_rt_min_free_dblocks(
+	struct xfs_mount	*mp,
+	struct xfs_inode	*ip,
+	xfs_off_t		len)
+{
+	/* Disabled */
+	if (!mp->m_rt_fallback_pct)
+		return false;
+
+	/* If inode target is already realtime device, nothing to do here */
+	if (!XFS_IS_REALTIME_INODE(ip)) {
+		uint64_t	free_dblocks;
+		free_dblocks = percpu_counter_sum(&mp->m_fdblocks) -
+			mp->m_alloc_set_aside;
+		if (free_dblocks < mp->m_rt_min_free_dblocks) {
+			return true;
+		}
+	}
+	return false;
+}
+
+/*
 * Select the target device for the inode based on either the size of the
 * initial allocation, or the amount of space available on the data device.
 *
@@ -1332,5 +1363,14 @@ xfs_inode_select_rt_target(
 	/* Select realtime device as our target based on the value of
 	 * mp->m_rt_alloc_min.  Target selection code if not valid if not set.
 	 */
-	return xfs_rt_alloc_min(mp, len);
+	if (xfs_rt_alloc_min(mp, len))
+		return true;
+
+	/* Check if data device has enough space, if not fallback to realtime
+	 * device.  Valid only if mp->m_rt_fallback_pct is set.
+	 */
+	if (xfs_rt_min_free_dblocks(mp, ip, len))
+		return true;
+
+	return false;
 }
diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
index 8b425be..64f29b6 100644
--- a/fs/xfs/xfs_sysfs.c
+++ b/fs/xfs/xfs_sysfs.c
@@ -127,11 +127,49 @@ rt_alloc_min_show(
 	return snprintf(buf, PAGE_SIZE, "%lld\n", mp->m_rt_alloc_min);
 }
 XFS_SYSFS_ATTR_RW(rt_alloc_min);
+
+STATIC ssize_t
+rt_fallback_pct_store(
+	struct kobject		*kobject,
+	const char		*buf,
+	size_t			count)
+{
+	struct xfs_mount	*mp = to_mp(kobject);
+	int			ret;
+	int			val;
+
+	ret = kstrtoint(buf, 0, &val);
+	if (ret)
+		return ret;
+
+	if (!XFS_IS_REALTIME_MOUNT(mp))
+		return -EINVAL;
+
+	if (val < 0 || val > 100)
+		return -EINVAL;
+
+	/* Only valid if using a real-time device */
+	mp->m_rt_fallback_pct = val;
+	mp->m_rt_min_free_dblocks = xfs_rt_calc_min_free_dblocks(mp);
+	return count;
+}
+
+STATIC ssize_t
+rt_fallback_pct_show(
+	struct kobject          *kobject,
+	char                    *buf)
+{
+	struct xfs_mount        *mp = to_mp(kobject);
+
+	return snprintf(buf, PAGE_SIZE, "%d\n", mp->m_rt_fallback_pct);
+}
+XFS_SYSFS_ATTR_RW(rt_fallback_pct);
 #endif
 
 static struct attribute *xfs_mp_attrs[] = {
 #ifdef CONFIG_XFS_RT
 	ATTR_LIST(rt_alloc_min),
+	ATTR_LIST(rt_fallback_pct),
 #endif
 	NULL,
 };
-- 
2.9.5



* Re: [PATCH RESEND v6 1/3] xfs: Show realtime device stats on statfs calls if inherit flag set
  2017-11-22 22:40 ` [PATCH RESEND v6 1/3] xfs: Show realtime device stats on statfs calls if inherit flag set Richard Wareing
@ 2017-11-28 21:20   ` Darrick J. Wong
  0 siblings, 0 replies; 5+ messages in thread
From: Darrick J. Wong @ 2017-11-28 21:20 UTC
  To: Richard Wareing; +Cc: linux-xfs, david, hch

On Wed, Nov 22, 2017 at 02:40:07PM -0800, Richard Wareing wrote:
> - Reports realtime device free blocks in statfs calls if the realtime
>   inheritance bit is set on the directory's inode.  This is a bit more
>   intuitive, especially for use-cases which are using a much larger device
>   for the realtime device.
> - Adds the XFS_IS_REALTIME_MOUNT macro to gate on the existence of a
>   realtime device on the mount, similar to the XFS_IS_REALTIME_INODE
>   macro.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Richard Wareing <rwareing@fb.com>
> ---
> Changes since v5:
> * None
> 
> Changes since v4:
> * None
> 
> Changes since v3:
> * Fixed accounting bug, we are not required to subtract m_alloc_set_aside
>   as this is a data device only requirement.
> * Added XFS_IS_REALTIME_MOUNT macro based on learnings from CVE-2017-14340,
>   now provides similar gating on the mount as XFS_IS_REALTIME_INODE does
>   for the inode.
> 
> Changes since v2:
> * Style updated per Christoph Hellwig's comment
> * Fixed bug: statp->f_bavail = statp->f_bfree
> 
> 
>  fs/xfs/xfs_linux.h | 2 ++
>  fs/xfs/xfs_super.c | 8 ++++++++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
> index dcd1292..944b02d 100644
> --- a/fs/xfs/xfs_linux.h
> +++ b/fs/xfs/xfs_linux.h
> @@ -278,8 +278,10 @@ static inline uint64_t howmany_64(uint64_t x, uint32_t y)
>  #define XFS_IS_REALTIME_INODE(ip)			\
>  	(((ip)->i_d.di_flags & XFS_DIFLAG_REALTIME) &&	\
>  	 (ip)->i_mount->m_rtdev_targp)
> +#define XFS_IS_REALTIME_MOUNT(mp) ((mp)->m_rtdev_targp ? 1 : 0)
>  #else
>  #define XFS_IS_REALTIME_INODE(ip) (0)
> +#define XFS_IS_REALTIME_MOUNT(mp) (0)
>  #endif
>  
>  #endif /* __XFS_LINUX__ */
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index f663022..3c9a989 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1153,6 +1153,14 @@ xfs_fs_statfs(
>  	    ((mp->m_qflags & (XFS_PQUOTA_ACCT|XFS_PQUOTA_ENFD))) ==
>  			      (XFS_PQUOTA_ACCT|XFS_PQUOTA_ENFD))
>  		xfs_qm_statvfs(ip, statp);
> +
> +	if (XFS_IS_REALTIME_MOUNT(mp) &&
> +	    (ip->i_d.di_flags & XFS_DIFLAG_RTINHERIT)) {

For everyone else following at home: I asked on IRC, shouldn't we report
rtdev stats for any file that has REALTIME, but not RTINHERIT, set?

--D

> +		statp->f_blocks = sbp->sb_rblocks;
> +		statp->f_bavail = statp->f_bfree =
> +			sbp->sb_frextents * sbp->sb_rextsize;
> +	}
> +
>  	return 0;
>  }
>  
> -- 
> 2.9.5
> 
