All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 4/5] xfs: run an eofblocks scan on ENOSPC/EDQUOT
Date: Tue, 1 Apr 2014 09:22:47 +1100	[thread overview]
Message-ID: <20140331222246.GF17603@dastard> (raw)
In-Reply-To: <1396012563-60973-5-git-send-email-bfoster@redhat.com>

On Fri, Mar 28, 2014 at 09:16:02AM -0400, Brian Foster wrote:
> Speculative preallocation and and the associated throttling metrics
> assume we're working with large files on large filesystems. Users have
> reported inefficiencies in these mechanisms when we happen to be dealing
> with large files on smaller filesystems. This can occur because while
> prealloc throttling is aggressive under low free space conditions, it is
> not active until we reach 5% free space or less.
> 
> For example, a 40GB filesystem has enough space for several files large
> enough to have multi-GB preallocations at any given time. If those files
> are slow growing, they might reserve preallocation for long periods of
> time as well as avoid the background scanner due to frequent
> modification. If a new file is written under these conditions, said file
> has no access to this already reserved space and premature ENOSPC is
> imminent.
> 
> To handle this scenario, modify the buffered write ENOSPC handling and
> retry sequence to invoke an eofblocks scan. The eofblocks scan is
> attempted prior to the inode flush as it is lighter weight and the
> former is a last resort to free space. In the smaller filesystem
> scenario, the eofblocks scan resets the usage of preallocation such that
> when the 5% free space threshold is met, throttling effectively takes
> over to provide fair and efficient preallocation until legitimate
> ENOSPC.
> 
> The eofblocks scan is selective based on the nature of the failure. For
> example, an EDQUOT failure in a particular quota will use a filtered
> scan for that quota. Because we don't know which quota might have caused
> an allocation failure at any given time, we run a scan against each
> applicable quota determined to be under low free space conditions.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/xfs_dquot.h  | 15 ++++++++++++++
>  fs/xfs/xfs_file.c   | 32 ++++++++++++++++++++++++++---
>  fs/xfs/xfs_icache.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_icache.h |  1 +
>  4 files changed, 104 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
> index d22ed00..899b99f 100644
> --- a/fs/xfs/xfs_dquot.h
> +++ b/fs/xfs/xfs_dquot.h
> @@ -141,6 +141,21 @@ static inline xfs_dquot_t *xfs_inode_dquot(struct xfs_inode *ip, int type)
>  	}
>  }
>  
> +/*
> + * Check whether a dquot is under low free space conditions. We assume the quota
> + * is enabled and enforced.
> + */
> +static inline bool xfs_dquot_lowsp(struct xfs_dquot *dqp)
> +{
> +	int64_t freesp;
> +
> +	freesp = be64_to_cpu(dqp->q_core.d_blk_hardlimit) - dqp->q_res_bcount;
> +	if (freesp < dqp->q_low_space[XFS_QLOWSP_1_PCNT])
> +		return true;
> +
> +	return false;
> +}
> +
>  #define XFS_DQ_IS_LOCKED(dqp)	(mutex_is_locked(&((dqp)->q_qlock)))
>  #define XFS_DQ_IS_DIRTY(dqp)	((dqp)->dq_flags & XFS_DQ_DIRTY)
>  #define XFS_QM_ISUDQ(dqp)	((dqp)->dq_flags & XFS_DQ_USER)
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 2e7989e..1ca16ce 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -38,6 +38,7 @@
>  #include "xfs_trace.h"
>  #include "xfs_log.h"
>  #include "xfs_dinode.h"
> +#include "xfs_icache.h"
>  
>  #include <linux/aio.h>
>  #include <linux/dcache.h>
> @@ -723,6 +724,7 @@ xfs_file_buffered_aio_write(
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	ssize_t			ret;
>  	int			enospc = 0;
> +	int			scanned = 0;
>  	int			iolock = XFS_IOLOCK_EXCL;
>  	size_t			count = ocount;
>  
> @@ -741,10 +743,34 @@ write_retry:
>  			pos, &iocb->ki_pos, count, 0);
>  
>  	/*
> -	 * If we just got an ENOSPC, try to write back all dirty inodes to
> -	 * convert delalloc space to free up some of the excess reserved
> -	 * metadata space.
> +	 * If we hit ENOSPC or a quota limit, use the selective nature of the
> +	 * eofblocks scan to try and free up some lingering speculative
> +	 * preallocation delalloc blocks.
> +	 *
> +	 * If we hit a quota limit, only scan for files covered by the quota. We
> +	 * also consider ENOSPC here because project quota failure can return
> +	 * ENOSPC instead of EDQUOT. The quota scanning only sets 'scanned' if
> +	 * the inode is covered by a quota with low free space. This should
> +	 * minimize interference with global ENOSPC handling.
> +	 *
> +	 * If a scan does not free enough space, resort to the inode flush big
> +	 * hammer to convert delalloc space to free up some of the excess
> +	 * reserved metadata space.
>  	 */
> +	if ((ret == -EDQUOT || ret == -ENOSPC) && !scanned) {
> +		scanned = xfs_inode_free_quota_eofblocks(ip);
> +		if (scanned)
> +			goto write_retry;
> +	}
> +	if (ret == -ENOSPC && !scanned) {
> +		struct xfs_eofblocks eofb = {0,};

IIRC, you can just use "{ 0 }" for initialisation, no "," needed.

> +
> +		eofb.eof_scan_owner = ip->i_ino; /* for locking */
> +		eofb.eof_flags = XFS_EOF_FLAGS_SYNC | XFS_EOF_FLAGS_FLUSH;
> +		xfs_icache_free_eofblocks(ip->i_mount, &eofb);
> +		scanned = 1;
> +		goto write_retry;
> +	}
>  	if (ret == -ENOSPC && !enospc) {
>  		enospc = 1;
>  		xfs_flush_inodes(ip->i_mount);

This seems overly complex and fragile. I'd much prefer that we don't
bury data writeback deep in the EOF block freeing code - we've done
a lot of work in the past to remove exactly that sort of behaviour
from XFS inode scanners.

I'd prefer to see something like this:

	if (ret == -EDQUOT && !enospc) {
		enospc = 1;
		xfs_inode_free_quota_eofblocks(ip);
		goto retry;
	else if (ret == -ENOSPC && !enospc) {
		enospc = 1;
		xfs_flush_inodes(ip->i_mount);
		....
		xfs_icache_free_eofblocks(ip->i_mount, &eofb);
		goto retry;
	}

This way has specific behaviours for EDQUOT vs ENOSPC, and we treat
them appropriately with a minimum of differences. And ENOSPC is
global, because we can't tell the difference here between a project
quota ENOSPC and a global ENOSPC at this point.

> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index bd0ab7d..471ccfa 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -33,6 +33,9 @@
>  #include "xfs_trace.h"
>  #include "xfs_icache.h"
>  #include "xfs_bmap_util.h"
> +#include "xfs_quota.h"
> +#include "xfs_dquot_item.h"
> +#include "xfs_dquot.h"
>  
>  #include <linux/kthread.h>
>  #include <linux/freezer.h>
> @@ -1277,6 +1280,62 @@ xfs_icache_free_eofblocks(
>  					 eofb, XFS_ICI_EOFBLOCKS_TAG);
>  }
>  
> +/*
> + * Run eofblocks scans on the quotas applicable to the inode. For inodes with
> + * multiple quotas, we don't know exactly which quota caused an allocation
> + * failure. We make a best effort by running scans for each quota considered
> + * to be under low free space conditions (less than 1% available free space).
> + */
> +int
> +xfs_inode_free_quota_eofblocks(
> +	struct xfs_inode *ip)
> +{
> +	int scanned = 0;
> +	struct xfs_eofblocks eofb = {0,};
> +	struct xfs_dquot *dq;
> +
> +	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
> +
> +	/* set the scan owner to avoid potential livelock */
> +	eofb.eof_scan_owner = ip->i_ino;
> +
> +	if (XFS_IS_UQUOTA_ENFORCED(ip->i_mount)) {
> +		dq = xfs_inode_dquot(ip, XFS_DQ_USER);
> +		if (dq && xfs_dquot_lowsp(dq)) {
> +			eofb.eof_uid = VFS_I(ip)->i_uid;
> +			eofb.eof_flags = XFS_EOF_FLAGS_SYNC|

whitespace.

> +					 XFS_EOF_FLAGS_UID;
> +			xfs_icache_free_eofblocks(ip->i_mount, &eofb);
> +			scanned = 1;
> +		}
> +	}
> +
> +	if (XFS_IS_GQUOTA_ENFORCED(ip->i_mount)) {
> +		dq = xfs_inode_dquot(ip, XFS_DQ_GROUP);
> +		if (dq && xfs_dquot_lowsp(dq)) {
> +			eofb.eof_gid = VFS_I(ip)->i_gid;
> +			eofb.eof_flags = XFS_EOF_FLAGS_SYNC|
> +					 XFS_EOF_FLAGS_GID;
> +			xfs_icache_free_eofblocks(ip->i_mount, &eofb);
> +			scanned = 1;
> +		}
> +	}
> +
> +	if (XFS_IS_PQUOTA_ENFORCED(ip->i_mount)) {
> +		dq = xfs_inode_dquot(ip, XFS_DQ_PROJ);
> +		if (dq && xfs_dquot_lowsp(dq)) {
> +			eofb.eof_prid = xfs_get_projid(ip);
> +			eofb.eof_flags = XFS_EOF_FLAGS_SYNC|
> +					 XFS_EOF_FLAGS_PRID|
> +					 XFS_EOF_FLAGS_FLUSH;
> +			xfs_icache_free_eofblocks(ip->i_mount, &eofb);
> +			scanned = 1;
> +		}
> +	}

I really don't like the fact that project quota is hiding a data
flush in the "free_quota_eofblocks" logic. It just strikes me a the
wrong thing to do because if it's a real ENOSPC we're just going to
have to do this anyway...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2014-03-31 22:23 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-03-28 13:15 [PATCH 0/5] xfs: run eofblocks scan on ENOSPC Brian Foster
2014-03-28 13:15 ` [PATCH 1/5] xfs: do eofb filtering before dirty check Brian Foster
2014-03-28 13:16 ` [PATCH 2/5] xfs: add flush flag to xfs_eofblocks Brian Foster
2014-03-31 21:47   ` Dave Chinner
2014-04-01 13:48     ` Brian Foster
2014-03-28 13:16 ` [PATCH 3/5] xfs: add scan owner field " Brian Foster
2014-03-28 13:16 ` [PATCH 4/5] xfs: run an eofblocks scan on ENOSPC/EDQUOT Brian Foster
2014-03-31 22:22   ` Dave Chinner [this message]
2014-04-01 13:55     ` Brian Foster
2014-04-01 21:19       ` Dave Chinner
2014-04-01 23:20         ` Brian Foster
2014-04-02  5:11           ` Dave Chinner
2014-04-02 20:11             ` Brian Foster
2014-04-03 22:18               ` Dave Chinner
2014-03-28 13:16 ` [PATCH 5/5] xfs: squash prealloc while over quota free space as well Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140331222246.GF17603@dastard \
    --to=david@fromorbit.com \
    --cc=bfoster@redhat.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.