From: Dave Chinner <david@fromorbit.com>
To: Richard Wareing <rwareing@fb.com>
Cc: linux-xfs@vger.kernel.org, darrick.wong@oracle.com, hch@infradead.org
Subject: Re: [PATCH v4 2/3] xfs: Set realtime flag based on initial allocation size
Date: Fri, 22 Sep 2017 15:54:34 +1000 [thread overview]
Message-ID: <20170922055434.GD10955@dastard> (raw)
In-Reply-To: <20170919035238.3976871-3-rwareing@fb.com>
On Mon, Sep 18, 2017 at 08:52:37PM -0700, Richard Wareing wrote:
> - The rt_alloc_min sysfs option automatically selects the device (data
> device, or realtime) based on the size of the initial allocation of the
> file.
> - This option can be used to route the storage of small files (and the
> inefficient workloads associated with them) to a suitable storage
> device such a SSD, while larger allocations are sent to a traditional
> HDD.
> - Supports writes via O_DIRECT, buffered (i.e. page cache), and
> pre-allocations (i.e. fallocate)
> - Available only when kernel is compiled w/ CONFIG_XFS_RT option.
>
> Signed-off-by: Richard Wareing <rwareing@fb.com>
> ---
> Changes since v3:
> * Now functions via initial allocation regardless of O_DIRECT, buffered or
> pre-allocation code paths. Provides a consistent user-experience.
> * I Did do some experiments putting this in the xfs_bmapi_write code path
> however pre-allocation accounting unfortunately prevents this cleaner
> approach. As such, this proved to be the cleanest and functional approach.
> * No longer a mount option, now a sysfs tunable
>
> Changes since v2:
> * None
>
> fs/xfs/xfs_bmap_util.c | 3 +++
> fs/xfs/xfs_inode.c | 18 ++++++++++++------
> fs/xfs/xfs_iomap.c | 8 ++++++++
> fs/xfs/xfs_mount.h | 1 +
> fs/xfs/xfs_rtalloc.c | 26 ++++++++++++++++++++++++++
> fs/xfs/xfs_rtalloc.h | 4 ++++
> fs/xfs/xfs_sysfs.c | 41 +++++++++++++++++++++++++++++++++++++++++
> 7 files changed, 95 insertions(+), 6 deletions(-)
>
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 9e3cc21..2d253fb 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -1026,6 +1026,9 @@ xfs_alloc_file_space(
> if (len <= 0)
> return -EINVAL;
>
> + if (XFS_IS_REALTIME_MOUNT(mp))
> + xfs_rt_alloc_min(ip, len);
I'd put the XFS_IS_REALTIME_MOUNT() check inside xfs_rt_alloc_min().
That way we can compile the code out completely when
CONFIG_XFS_RT=n.
> rt = XFS_IS_REALTIME_INODE(ip);
> extsz = xfs_get_extsz_hint(ip);
>
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index ec9826c..f9e2deb 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -1620,12 +1620,18 @@ xfs_itruncate_extents(
> if (error)
> goto out;
>
> - /*
> - * Clear the reflink flag if we truncated everything.
> - */
> - if (ip->i_d.di_nblocks == 0 && xfs_is_reflink_inode(ip)) {
> - ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
> - xfs_inode_clear_cowblocks_tag(ip);
> + if (ip->i_d.di_nblocks == 0) {
> + /*
> + * Clear the reflink flag if we truncated everything.
> + */
> + if (xfs_is_reflink_inode(ip)) {
> + ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
> + xfs_inode_clear_cowblocks_tag(ip);
> + }
> + /* Clear realtime flag if m_rt_alloc_min policy is in place */
> + if (XFS_IS_REALTIME_MOUNT(mp) && mp->m_rt_alloc_min) {
Won't m_rt_alloc_min always be zero on non-rt filesystems?
> + ip->i_d.di_flags &= ~XFS_DIFLAG_REALTIME;
> + }
> }
>
> /*
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 94e5bdf..11f1c95 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -40,6 +40,7 @@
> #include "xfs_dquot_item.h"
> #include "xfs_dquot.h"
> #include "xfs_reflink.h"
> +#include "xfs_rtalloc.h"
>
>
> #define XFS_WRITEIO_ALIGN(mp,off) (((off) >> mp->m_writeio_log) \
> @@ -174,6 +175,10 @@ xfs_iomap_write_direct(
> int bmapi_flags = XFS_BMAPI_PREALLOC;
> uint tflags = 0;
>
> +
> + if (XFS_IS_REALTIME_MOUNT(mp))
> + xfs_rt_alloc_min(ip, count);
Reading this makes me wonder what we are allocating here :/
A better name might be in order - something that indicates we're
selecting the target device rather allocating something. e.g.
xfs_inode_select_target()?
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index 9fa312a..067be3b 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -197,6 +197,7 @@ typedef struct xfs_mount {
> __uint32_t m_generation;
>
> bool m_fail_unmount;
> + uint m_rt_alloc_min; /* Min RT allocation */
WHitespace problem - looks like spaces instead of tabs....
> #ifdef DEBUG
> /*
> * DEBUG mode instrumentation to test and/or trigger delayed allocation
> diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
> index c57aa7f..e51cb25 100644
> --- a/fs/xfs/xfs_rtalloc.c
> +++ b/fs/xfs/xfs_rtalloc.c
> @@ -1284,3 +1284,29 @@ xfs_rtpick_extent(
> *pick = b;
> return 0;
> }
> +
> +/*
> + * Automatically set real-time flag if initial write to inode is > m_rt_alloc_min
> + *
> + * Also valid on truncations.
> + *
> + */
> +void xfs_rt_alloc_min(
> + struct xfs_inode *ip,
> + xfs_off_t len)
> +{
> + struct xfs_mount *mp = ip->i_mount;
> +
> + if (!mp->m_rt_alloc_min || ip->i_d.di_size)
> + return;
I kinda prefer stacking single checks like
/*
* m_rt_alloc_min controls the target selection. It is
* inactive if it is not set.
*/
if (!mp->m_rt_alloc_min)
return;
/*
* Can't select a different target if we've already
* allocated blocks (e.g. fallocate() beyond EOF) or has
* data in it already.
*/
if (!ip->i_nextents)
return;
if (!ip->i_d.di_size)
return;
> + if (XFS_IS_REALTIME_INODE(ip)) {
> + if (len < mp->m_rt_alloc_min) {
> + ip->i_d.di_flags &= ~XFS_DIFLAG_REALTIME;
> + }
> + } else {
> + if (len >= mp->m_rt_alloc_min) {
> + ip->i_d.di_flags |= XFS_DIFLAG_REALTIME;
> + }
> + }
Checking for XFS_IS_REALTIME_INODE() is redundant here. This does
the same thing:
/*
* if the allocation length is less than the threshold,
* always select the data device. Otherwise we should
* select the realtime device.
*/
if (len < mp->m_rt_alloc_min)
ip->i_d.di_flags &= ~XFS_DIFLAG_REALTIME;
else
ip->i_d.di_flags |= XFS_DIFLAG_REALTIME;
> +}
> diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h
> index f13133e..12939d9 100644
> --- a/fs/xfs/xfs_rtalloc.h
> +++ b/fs/xfs/xfs_rtalloc.h
> @@ -136,6 +136,9 @@ int xfs_rtalloc_query_range(struct xfs_trans *tp,
> int xfs_rtalloc_query_all(struct xfs_trans *tp,
> xfs_rtalloc_query_range_fn fn,
> void *priv);
> +void xfs_rt_alloc_min(struct xfs_inode *ip, xfs_off_t len);
> +
> +
> #else
> # define xfs_rtallocate_extent(t,b,min,max,l,f,p,rb) (ENOSYS)
> # define xfs_rtfree_extent(t,b,l) (ENOSYS)
> @@ -155,6 +158,7 @@ xfs_rtmount_init(
> }
> # define xfs_rtmount_inodes(m) (((mp)->m_sb.sb_rblocks == 0)? 0 : (ENOSYS))
> # define xfs_rtunmount_inodes(m)
> +# define xfs_rt_alloc_min(i,l) (ENOSYS)
> #endif /* CONFIG_XFS_RT */
>
> #endif /* __XFS_RTALLOC_H__ */
> diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
> index 80ac15f..3c8dedb 100644
> --- a/fs/xfs/xfs_sysfs.c
> +++ b/fs/xfs/xfs_sysfs.c
> @@ -129,10 +129,51 @@ XFS_SYSFS_ATTR_RW(drop_writes);
>
> #endif /* DEBUG */
>
> +#ifdef CONFIG_XFS_RT
> +STATIC ssize_t
> +rt_alloc_min_store(
> + struct kobject *kobject,
> + const char *buf,
> + size_t count)
> +{
> + struct xfs_mount *mp = to_mp(kobject);
> + int ret;
> + int val;
> +
> + ret = kstrtoint(buf, 0, &val);
> + if (ret)
> + return ret;
> +
> + /* Only valid if using a real-time device */
> + if (XFS_IS_REALTIME_MOUNT(mp) && val > 0)
> + mp->m_rt_alloc_min = val;
> + else if (val <= 0)
> + mp->m_rt_alloc_min = 0;
> + else
> + return -EINVAL;
Seems inconsistent. This will allow a value <= 0 for a non-realtime
mount, but will return EINVAL for values > 0. Perhaps it would be
more consistent to return EINVAL for a non-realtime mount or
any value < 0?
> +
> + return count;
> +}
> +
> +STATIC ssize_t
> +rt_alloc_min_show(
> + struct kobject *kobject,
> + char *buf)
> +{
> + struct xfs_mount *mp = to_mp(kobject);
> +
> + return snprintf(buf, PAGE_SIZE, "%d\n", mp->m_rt_alloc_min);
> +}
> +XFS_SYSFS_ATTR_RW(rt_alloc_min);
> +#endif /* CONFIG_XFS_RT */
> +
> static struct attribute *xfs_mp_attrs[] = {
> #ifdef DEBUG
> ATTR_LIST(drop_writes),
> #endif
> +#ifdef CONFIG_XFS_RT
> + ATTR_LIST(rt_alloc_min),
> +#endif
This is userspace visible - shouldn't we always compile this in,
even if all it does it return EINVAL to attempts to change it when
CONFIG_XFS_RT=n?
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2017-09-22 5:54 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-09-19 3:52 [PATCH v4 0/3] XFS realtime device tweaks Richard Wareing
2017-09-19 3:52 ` [PATCH v4 1/3] xfs: Show realtime device stats on statfs calls if inherit flag set Richard Wareing
2017-09-22 5:35 ` Dave Chinner
2017-09-22 18:26 ` Richard Wareing
2017-09-19 3:52 ` [PATCH v4 2/3] xfs: Set realtime flag based on initial allocation size Richard Wareing
2017-09-22 5:54 ` Dave Chinner [this message]
2017-09-22 19:06 ` Richard Wareing
2017-09-19 3:52 ` [PATCH v4 3/3] xfs: Add realtime fallback if data device full Richard Wareing
2017-09-22 7:04 ` Dave Chinner
2017-09-25 18:37 ` Richard Wareing
2017-09-25 23:16 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170922055434.GD10955@dastard \
--to=david@fromorbit.com \
--cc=darrick.wong@oracle.com \
--cc=hch@infradead.org \
--cc=linux-xfs@vger.kernel.org \
--cc=rwareing@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.