All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Jian Wen <wenjianhn@gmail.com>
Cc: linux-xfs@vger.kernel.org, djwong@kernel.org, hch@lst.de,
	dchinner@redhat.com, Jian Wen <wenjian1@xiaomi.com>
Subject: Re: [PATCH v2] xfs: improve handling of prjquot ENOSPC
Date: Tue, 19 Dec 2023 09:00:38 +1100	[thread overview]
Message-ID: <ZYDBBmZWabnbd3zq@dread.disaster.area> (raw)
In-Reply-To: <20231216153522.52767-1-wenjianhn@gmail.com>

On Sat, Dec 16, 2023 at 11:35:22PM +0800, Jian Wen wrote:
> Don't clear space of the whole fs when the project quota limit is
> reached, since it affects the writing performance of files of
> the directories that are under quota.
> 
> Changes since v1:
> - use the want_blockgc_free_quota helper that written by Darrick

I'm not convinced this is correct behaviour.

That is, we can get a real full filesystem ENOSPC even when project
quotas are on and the the project quota space is low. With this
change we will only flush project quotas rather than the whole
filesystem.

That seems like scope for real world ENOSPC regressions when project
quotas are enabled.

Hence my suggestion that we should be returning -EDQUOT from project
quotas and only converting it to -ENOSPC once the project quota has
been flushed and failed with EDQUOT a second time.

Keep in mind that I'm not interested in changing this code to
simplify it - this EDQUOT/ENOSPC flushing is replicated across
multiple fuinctions and so -all- of them need to change, not just
the buffered write path.

IOWs, I'm interested in having the code behave correctly in these
situations. If correctness means the code has to become more
complex, then so be it. However, with some simple refactoring, we
can isolate the complexity and make the code simpler.

> Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
> ---
>  fs/xfs/xfs_file.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index e33e5e13b95f..7764697e7822 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -24,6 +24,9 @@
>  #include "xfs_pnfs.h"
>  #include "xfs_iomap.h"
>  #include "xfs_reflink.h"
> +#include "xfs_quota.h"
> +#include "xfs_dquot_item.h"
> +#include "xfs_dquot.h"
>  
>  #include <linux/dax.h>
>  #include <linux/falloc.h>
> @@ -761,6 +764,20 @@ xfs_file_dax_write(
>  	return ret;
>  }
>  
> +static inline bool want_blockgc_free_quota(struct xfs_inode *ip, int ret)
> +{
> +	if (ret == -EDQUOT)
> +		return true;
> +	if (ret != -ENOSPC)
> +		return false;
> +
> +	if (XFS_IS_PQUOTA_ENFORCED(ip->i_mount) &&
> +	    ip->i_pdquot && xfs_dquot_lowsp(ip->i_pdquot))
> +		return true;
> +
> +	return false;
> +}

>  STATIC ssize_t
>  xfs_file_buffered_write(
>  	struct kiocb		*iocb,
> @@ -796,7 +813,7 @@ xfs_file_buffered_write(
>  	 * running at the same time.  Use a synchronous scan to increase the
>  	 * effectiveness of the scan.
>  	 */
> -	if (ret == -EDQUOT && !cleared_space) {
> +	if (want_blockgc_free_quota(ip, ret) && !cleared_space) {
>  		xfs_iunlock(ip, iolock);
>  		xfs_blockgc_free_quota(ip, XFS_ICWALK_FLAG_SYNC);
>  		cleared_space = true;

IMO, this makes messy code even more messy, and makes it even more
inconsistent with all the EDQUOT/ENOSPC flushing that is done in the
xfs_trans_alloc_*() inode transaction setup helpers.

So, with the assumption that project quotas return EDQUOT and not
ENOSPC, we add this helper to fs/xfs/xfs_dquot.h:

static inline bool
xfs_dquot_is_enospc(
	struct xfs_dquot	*dqp)
{
	if (!dqp)
		return false;
	if (!xfs_dquot_is_enforced(dqp)
		return false;
	if (dqp->q_blk.hardlimit - dqp->q_blk.reserved > 0)
		return false;
	return true;
}

And this helper to fs/xfs/xfs_icache.c:

static void
xfs_blockgc_nospace_flush(
	struct xfs_inode	*ip,
	int			what)
{
	if (what == -EDQUOT) {
		xfs_blockgc_free_quota(ip, XFS_ICWALK_FLAG_SYNC);
		return;
	}
	ASSERT(what == -ENOSPC);

	xfs_flush_inodes(ip->i_mount);
	icw.icw_flags = XFS_ICWALK_FLAG_SYNC;
	xfs_blockgc_free_space(ip->i_mount, &icw);
}

The buffered write code ends up as:

	.....
	do {
		iolock = XFS_IOLOCK_EXCL;
		ret = xfs_ilock_iocb(iocb, iolock);
		if (ret)
			return ret;

		ret = xfs_file_write_checks(iocb, from, &iolock);
		if (ret)
			goto out;

		trace_xfs_file_buffered_write(iocb, from);
		ret = iomap_file_buffered_write(iocb, from,
				&xfs_buffered_write_iomap_ops);
		if (!(ret == -EDQUOT || ret = -ENOSPC))
			break;

		xfs_iunlock(ip, iolock);
		xfs_blockgc_nospace_flush(ip, ret);
	} while (retries++ == 0);

	if (ret == -EDQUOT && xfs_dquot_is_enospc(ip->i_pdquot))
		ret = -ENOSPC;
	.....

This factors out the actual flushing behaviour when an error occurs
from the write code, and removes the clunky goto from the code. It
now clearly loops on a single retry after a ENOSPC/EDQUOT error,
and the high level code transforms the EDQUOT project quota error
once the loop errors out completely.

We can then do the same transformation to xfs_trans_alloc_icreate(),
xfs_trans_alloc_inode(), xfs_trans_alloc_ichange() and
xfs_trans_alloc_dir() using xfs_blockgc_nospace_flush() and
xfs_dquot_is_enospc().

This will then give us consistent project quota only flushing on
project quota failure, as well as consistent full filesystem ENOSPC
flushing behaviour across all types of inode operations.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2023-12-18 22:00 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-14 15:07 [PATCH] xfs: improve handling of prjquot ENOSPC Jian Wen
2023-12-14 15:29 ` Christoph Hellwig
2023-12-14 17:06 ` Darrick J. Wong
2023-12-14 21:13 ` Dave Chinner
2023-12-16 15:49   ` Jian Wen
2023-12-16 15:35 ` [PATCH v2] " Jian Wen
2023-12-18 22:00   ` Dave Chinner [this message]
2023-12-19  5:47     ` Christoph Hellwig
2023-12-19 13:50     ` Jian Wen
2023-12-23 11:00     ` Jian Wen
2024-01-04  6:22   ` [PATCH v4] " Jian Wen
2024-01-08  0:35     ` Dave Chinner
2024-01-09  6:14       ` Darrick J. Wong
2024-01-09  6:38         ` Dave Chinner
2024-01-11  1:42           ` Darrick J. Wong
2024-01-11  7:24             ` Dave Chinner
2024-01-10 14:08     ` kernel test robot
2023-12-23 10:56 ` [PATCH v3] " Jian Wen
2024-01-03  1:42   ` Darrick J. Wong
2024-01-03  3:45     ` Jian Wen
2024-01-04  1:46       ` Darrick J. Wong
2024-01-04  3:36         ` Jian Wen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZYDBBmZWabnbd3zq@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=linux-xfs@vger.kernel.org \
    --cc=wenjian1@xiaomi.com \
    --cc=wenjianhn@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.