* [PATCH] xfs: improve handling of prjquot ENOSPC
@ 2023-12-14 15:07 Jian Wen
2023-12-14 15:29 ` Christoph Hellwig
` (4 more replies)
0 siblings, 5 replies; 22+ messages in thread
From: Jian Wen @ 2023-12-14 15:07 UTC (permalink / raw)
To: linux-xfs; +Cc: Jian Wen, djwong, Jian Wen
Don't clear space of the whole fs when the project quota limit is
reached, since it affects the writing performance of files of
the directories that are under quota.
Only run cow/eofblocks scans on the quota attached to the inode.
Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
---
fs/xfs/xfs_file.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e33e5e13b95f..4fbe262d33cc 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -24,6 +24,9 @@
#include "xfs_pnfs.h"
#include "xfs_iomap.h"
#include "xfs_reflink.h"
+#include "xfs_quota.h"
+#include "xfs_dquot_item.h"
+#include "xfs_dquot.h"
#include <linux/dax.h>
#include <linux/falloc.h>
@@ -803,8 +806,18 @@ xfs_file_buffered_write(
goto write_retry;
} else if (ret == -ENOSPC && !cleared_space) {
struct xfs_icwalk icw = {0};
+ struct xfs_dquot *pdqp = ip->i_pdquot;
cleared_space = true;
+ if (XFS_IS_PQUOTA_ENFORCED(ip->i_mount) &&
+ pdqp && xfs_dquot_lowsp(pdqp)) {
+ xfs_iunlock(ip, iolock);
+ icw.icw_prid = pdqp->q_id;
+ icw.icw_flags |= XFS_ICWALK_FLAG_PRID;
+ xfs_blockgc_free_space(ip->i_mount, &icw);
+ goto write_retry;
+ }
+
xfs_flush_inodes(ip->i_mount);
xfs_iunlock(ip, iolock);
--
2.34.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] xfs: improve handling of prjquot ENOSPC
2023-12-14 15:07 [PATCH] xfs: improve handling of prjquot ENOSPC Jian Wen
@ 2023-12-14 15:29 ` Christoph Hellwig
2023-12-14 17:06 ` Darrick J. Wong
` (3 subsequent siblings)
4 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2023-12-14 15:29 UTC (permalink / raw)
To: Jian Wen; +Cc: linux-xfs, djwong, Jian Wen
On Thu, Dec 14, 2023 at 11:07:08PM +0800, Jian Wen wrote:
> } else if (ret == -ENOSPC && !cleared_space) {
> struct xfs_icwalk icw = {0};
> + struct xfs_dquot *pdqp = ip->i_pdquot;
>
> cleared_space = true;
> + if (XFS_IS_PQUOTA_ENFORCED(ip->i_mount) &&
> + pdqp && xfs_dquot_lowsp(pdqp)) {
wrong identation here, broken up control statements must not be
indented at the same level as the following block.
Otherwise this looks reaonable to me, but I'm a little worried
about the amount of ENOSPC/EDQUOT handling we're growing in
xfs_file_buffered_write. Can we split all this into a helper?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] xfs: improve handling of prjquot ENOSPC
2023-12-14 15:07 [PATCH] xfs: improve handling of prjquot ENOSPC Jian Wen
2023-12-14 15:29 ` Christoph Hellwig
@ 2023-12-14 17:06 ` Darrick J. Wong
2023-12-14 21:13 ` Dave Chinner
` (2 subsequent siblings)
4 siblings, 0 replies; 22+ messages in thread
From: Darrick J. Wong @ 2023-12-14 17:06 UTC (permalink / raw)
To: Jian Wen; +Cc: linux-xfs, Jian Wen
On Thu, Dec 14, 2023 at 11:07:08PM +0800, Jian Wen wrote:
> Don't clear space of the whole fs when the project quota limit is
> reached, since it affects the writing performance of files of
> the directories that are under quota.
>
> Only run cow/eofblocks scans on the quota attached to the inode.
>
> Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
> ---
> fs/xfs/xfs_file.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index e33e5e13b95f..4fbe262d33cc 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -24,6 +24,9 @@
> #include "xfs_pnfs.h"
> #include "xfs_iomap.h"
> #include "xfs_reflink.h"
> +#include "xfs_quota.h"
> +#include "xfs_dquot_item.h"
> +#include "xfs_dquot.h"
>
> #include <linux/dax.h>
> #include <linux/falloc.h>
> @@ -803,8 +806,18 @@ xfs_file_buffered_write(
> goto write_retry;
> } else if (ret == -ENOSPC && !cleared_space) {
> struct xfs_icwalk icw = {0};
> + struct xfs_dquot *pdqp = ip->i_pdquot;
>
> cleared_space = true;
> + if (XFS_IS_PQUOTA_ENFORCED(ip->i_mount) &&
> + pdqp && xfs_dquot_lowsp(pdqp)) {
Do not align the if test code with the body.
> + xfs_iunlock(ip, iolock);
> + icw.icw_prid = pdqp->q_id;
> + icw.icw_flags |= XFS_ICWALK_FLAG_PRID;
> + xfs_blockgc_free_space(ip->i_mount, &icw);
This is an open-coded version of the xfs_blockgc_free_quota above.
The decision-making is complex enough to warrant a helper predicate:
static inline bool want_blockgc_free_quota(struct xfs_inode *ip, int ret)
{
if (ret == -EDQUOT)
return true;
if (ret != -ENOSPC)
return false;
if (XFS_IS_PQUOTA_ENFORCED(ip->i_mount) &&
ip->i_pdquot && xfs_dquot_lowsp(ip->i_pdquot))
return true;
return false;
}
if (want_blockgc_free_quota(ip, ret) && !cleared_space) {
xfs_iunlock(ip, iolock);
xfs_blockgc_free_quota(ip, XFS_ICWALK_FLAG_SYNC);
cleared_space = true;
goto write_retry;
} else if (ret == -ENOSPC && !cleared_space) {
--D
> + goto write_retry;
> + }
> +
> xfs_flush_inodes(ip->i_mount);
>
> xfs_iunlock(ip, iolock);
> --
> 2.34.1
>
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] xfs: improve handling of prjquot ENOSPC
2023-12-14 15:07 [PATCH] xfs: improve handling of prjquot ENOSPC Jian Wen
2023-12-14 15:29 ` Christoph Hellwig
2023-12-14 17:06 ` Darrick J. Wong
@ 2023-12-14 21:13 ` Dave Chinner
2023-12-16 15:49 ` Jian Wen
2023-12-16 15:35 ` [PATCH v2] " Jian Wen
2023-12-23 10:56 ` [PATCH v3] " Jian Wen
4 siblings, 1 reply; 22+ messages in thread
From: Dave Chinner @ 2023-12-14 21:13 UTC (permalink / raw)
To: Jian Wen; +Cc: linux-xfs, djwong, Jian Wen
On Thu, Dec 14, 2023 at 11:07:08PM +0800, Jian Wen wrote:
> Don't clear space of the whole fs when the project quota limit is
> reached, since it affects the writing performance of files of
> the directories that are under quota.
>
> Only run cow/eofblocks scans on the quota attached to the inode.
>
> Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
> ---
> fs/xfs/xfs_file.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index e33e5e13b95f..4fbe262d33cc 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -24,6 +24,9 @@
> #include "xfs_pnfs.h"
> #include "xfs_iomap.h"
> #include "xfs_reflink.h"
> +#include "xfs_quota.h"
> +#include "xfs_dquot_item.h"
> +#include "xfs_dquot.h"
>
> #include <linux/dax.h>
> #include <linux/falloc.h>
> @@ -803,8 +806,18 @@ xfs_file_buffered_write(
> goto write_retry;
> } else if (ret == -ENOSPC && !cleared_space) {
> struct xfs_icwalk icw = {0};
> + struct xfs_dquot *pdqp = ip->i_pdquot;
>
> cleared_space = true;
> + if (XFS_IS_PQUOTA_ENFORCED(ip->i_mount) &&
> + pdqp && xfs_dquot_lowsp(pdqp)) {
> + xfs_iunlock(ip, iolock);
> + icw.icw_prid = pdqp->q_id;
> + icw.icw_flags |= XFS_ICWALK_FLAG_PRID;
> + xfs_blockgc_free_space(ip->i_mount, &icw);
> + goto write_retry;
> + }
This is just duplicating the EDQUOT error handling path for the
specific case that project quota exhaustion returns ENOSPC instead
of EDQUOT. i.e. the root cause of the problem is that project
quotas are returning ENOSPC rather than EDQUOT, right?
Perhaps we should look at having project quotas return EDQUOT like
the other quotas so we get the project quota block scan done in the
correct places, then convert the error to ENOSPC if we get a second
EDQUOT from the project quota on retry?
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v2] xfs: improve handling of prjquot ENOSPC
2023-12-14 15:07 [PATCH] xfs: improve handling of prjquot ENOSPC Jian Wen
` (2 preceding siblings ...)
2023-12-14 21:13 ` Dave Chinner
@ 2023-12-16 15:35 ` Jian Wen
2023-12-18 22:00 ` Dave Chinner
2024-01-04 6:22 ` [PATCH v4] " Jian Wen
2023-12-23 10:56 ` [PATCH v3] " Jian Wen
4 siblings, 2 replies; 22+ messages in thread
From: Jian Wen @ 2023-12-16 15:35 UTC (permalink / raw)
To: linux-xfs; +Cc: Jian Wen, djwong, hch, dchinner, Jian Wen
Don't clear space of the whole fs when the project quota limit is
reached, since it affects the writing performance of files of
the directories that are under quota.
Changes since v1:
- use the want_blockgc_free_quota helper that written by Darrick
Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
---
fs/xfs/xfs_file.c | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e33e5e13b95f..7764697e7822 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -24,6 +24,9 @@
#include "xfs_pnfs.h"
#include "xfs_iomap.h"
#include "xfs_reflink.h"
+#include "xfs_quota.h"
+#include "xfs_dquot_item.h"
+#include "xfs_dquot.h"
#include <linux/dax.h>
#include <linux/falloc.h>
@@ -761,6 +764,20 @@ xfs_file_dax_write(
return ret;
}
+static inline bool want_blockgc_free_quota(struct xfs_inode *ip, int ret)
+{
+ if (ret == -EDQUOT)
+ return true;
+ if (ret != -ENOSPC)
+ return false;
+
+ if (XFS_IS_PQUOTA_ENFORCED(ip->i_mount) &&
+ ip->i_pdquot && xfs_dquot_lowsp(ip->i_pdquot))
+ return true;
+
+ return false;
+}
+
STATIC ssize_t
xfs_file_buffered_write(
struct kiocb *iocb,
@@ -796,7 +813,7 @@ xfs_file_buffered_write(
* running at the same time. Use a synchronous scan to increase the
* effectiveness of the scan.
*/
- if (ret == -EDQUOT && !cleared_space) {
+ if (want_blockgc_free_quota(ip, ret) && !cleared_space) {
xfs_iunlock(ip, iolock);
xfs_blockgc_free_quota(ip, XFS_ICWALK_FLAG_SYNC);
cleared_space = true;
--
2.34.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] xfs: improve handling of prjquot ENOSPC
2023-12-14 21:13 ` Dave Chinner
@ 2023-12-16 15:49 ` Jian Wen
0 siblings, 0 replies; 22+ messages in thread
From: Jian Wen @ 2023-12-16 15:49 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs, djwong, Jian Wen
On Fri, Dec 15, 2023 at 5:13 AM Dave Chinner <david@fromorbit.com> wrote:
>
> On Thu, Dec 14, 2023 at 11:07:08PM +0800, Jian Wen wrote:
> > Don't clear space of the whole fs when the project quota limit is
> > reached, since it affects the writing performance of files of
> > the directories that are under quota.
> >
> > Only run cow/eofblocks scans on the quota attached to the inode.
> >
> > Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
> > ---
> > fs/xfs/xfs_file.c | 13 +++++++++++++
> > 1 file changed, 13 insertions(+)
> >
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index e33e5e13b95f..4fbe262d33cc 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -24,6 +24,9 @@
> > #include "xfs_pnfs.h"
> > #include "xfs_iomap.h"
> > #include "xfs_reflink.h"
> > +#include "xfs_quota.h"
> > +#include "xfs_dquot_item.h"
> > +#include "xfs_dquot.h"
> >
> > #include <linux/dax.h>
> > #include <linux/falloc.h>
> > @@ -803,8 +806,18 @@ xfs_file_buffered_write(
> > goto write_retry;
> > } else if (ret == -ENOSPC && !cleared_space) {
> > struct xfs_icwalk icw = {0};
> > + struct xfs_dquot *pdqp = ip->i_pdquot;
> >
> > cleared_space = true;
> > + if (XFS_IS_PQUOTA_ENFORCED(ip->i_mount) &&
> > + pdqp && xfs_dquot_lowsp(pdqp)) {
> > + xfs_iunlock(ip, iolock);
> > + icw.icw_prid = pdqp->q_id;
> > + icw.icw_flags |= XFS_ICWALK_FLAG_PRID;
> > + xfs_blockgc_free_space(ip->i_mount, &icw);
> > + goto write_retry;
> > + }
>
> This is just duplicating the EDQUOT error handling path for the
> specific case that project quota exhaustion returns ENOSPC instead
> of EDQUOT. i.e. the root cause of the problem is that project
> quotas are returning ENOSPC rather than EDQUOT, right?
>
Yes, it is.
> Perhaps we should look at having project quotas return EDQUOT like
> the other quotas so we get the project quota block scan done in the
> correct places, then convert the error to ENOSPC if we get a second
> EDQUOT from the project quota on retry?
>
I did so by only returning EDQUOT in xfs_trans_dqresv(), it made error
handling more complex.
And after we get a second EDQUOT, we still need to check if it is
project quota that is over limit.
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
--
Best,
Jian
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2] xfs: improve handling of prjquot ENOSPC
2023-12-16 15:35 ` [PATCH v2] " Jian Wen
@ 2023-12-18 22:00 ` Dave Chinner
2023-12-19 5:47 ` Christoph Hellwig
` (2 more replies)
2024-01-04 6:22 ` [PATCH v4] " Jian Wen
1 sibling, 3 replies; 22+ messages in thread
From: Dave Chinner @ 2023-12-18 22:00 UTC (permalink / raw)
To: Jian Wen; +Cc: linux-xfs, djwong, hch, dchinner, Jian Wen
On Sat, Dec 16, 2023 at 11:35:22PM +0800, Jian Wen wrote:
> Don't clear space of the whole fs when the project quota limit is
> reached, since it affects the writing performance of files of
> the directories that are under quota.
>
> Changes since v1:
> - use the want_blockgc_free_quota helper that written by Darrick
I'm not convinced this is correct behaviour.
That is, we can get a real full filesystem ENOSPC even when project
quotas are on and the the project quota space is low. With this
change we will only flush project quotas rather than the whole
filesystem.
That seems like scope for real world ENOSPC regressions when project
quotas are enabled.
Hence my suggestion that we should be returning -EDQUOT from project
quotas and only converting it to -ENOSPC once the project quota has
been flushed and failed with EDQUOT a second time.
Keep in mind that I'm not interested in changing this code to
simplify it - this EDQUOT/ENOSPC flushing is replicated across
multiple fuinctions and so -all- of them need to change, not just
the buffered write path.
IOWs, I'm interested in having the code behave correctly in these
situations. If correctness means the code has to become more
complex, then so be it. However, with some simple refactoring, we
can isolate the complexity and make the code simpler.
> Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
> ---
> fs/xfs/xfs_file.c | 19 ++++++++++++++++++-
> 1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index e33e5e13b95f..7764697e7822 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -24,6 +24,9 @@
> #include "xfs_pnfs.h"
> #include "xfs_iomap.h"
> #include "xfs_reflink.h"
> +#include "xfs_quota.h"
> +#include "xfs_dquot_item.h"
> +#include "xfs_dquot.h"
>
> #include <linux/dax.h>
> #include <linux/falloc.h>
> @@ -761,6 +764,20 @@ xfs_file_dax_write(
> return ret;
> }
>
> +static inline bool want_blockgc_free_quota(struct xfs_inode *ip, int ret)
> +{
> + if (ret == -EDQUOT)
> + return true;
> + if (ret != -ENOSPC)
> + return false;
> +
> + if (XFS_IS_PQUOTA_ENFORCED(ip->i_mount) &&
> + ip->i_pdquot && xfs_dquot_lowsp(ip->i_pdquot))
> + return true;
> +
> + return false;
> +}
> STATIC ssize_t
> xfs_file_buffered_write(
> struct kiocb *iocb,
> @@ -796,7 +813,7 @@ xfs_file_buffered_write(
> * running at the same time. Use a synchronous scan to increase the
> * effectiveness of the scan.
> */
> - if (ret == -EDQUOT && !cleared_space) {
> + if (want_blockgc_free_quota(ip, ret) && !cleared_space) {
> xfs_iunlock(ip, iolock);
> xfs_blockgc_free_quota(ip, XFS_ICWALK_FLAG_SYNC);
> cleared_space = true;
IMO, this makes messy code even more messy, and makes it even more
inconsistent with all the EDQUOT/ENOSPC flushing that is done in the
xfs_trans_alloc_*() inode transaction setup helpers.
So, with the assumption that project quotas return EDQUOT and not
ENOSPC, we add this helper to fs/xfs/xfs_dquot.h:
static inline bool
xfs_dquot_is_enospc(
struct xfs_dquot *dqp)
{
if (!dqp)
return false;
if (!xfs_dquot_is_enforced(dqp)
return false;
if (dqp->q_blk.hardlimit - dqp->q_blk.reserved > 0)
return false;
return true;
}
And this helper to fs/xfs/xfs_icache.c:
static void
xfs_blockgc_nospace_flush(
struct xfs_inode *ip,
int what)
{
if (what == -EDQUOT) {
xfs_blockgc_free_quota(ip, XFS_ICWALK_FLAG_SYNC);
return;
}
ASSERT(what == -ENOSPC);
xfs_flush_inodes(ip->i_mount);
icw.icw_flags = XFS_ICWALK_FLAG_SYNC;
xfs_blockgc_free_space(ip->i_mount, &icw);
}
The buffered write code ends up as:
.....
do {
iolock = XFS_IOLOCK_EXCL;
ret = xfs_ilock_iocb(iocb, iolock);
if (ret)
return ret;
ret = xfs_file_write_checks(iocb, from, &iolock);
if (ret)
goto out;
trace_xfs_file_buffered_write(iocb, from);
ret = iomap_file_buffered_write(iocb, from,
&xfs_buffered_write_iomap_ops);
if (!(ret == -EDQUOT || ret = -ENOSPC))
break;
xfs_iunlock(ip, iolock);
xfs_blockgc_nospace_flush(ip, ret);
} while (retries++ == 0);
if (ret == -EDQUOT && xfs_dquot_is_enospc(ip->i_pdquot))
ret = -ENOSPC;
.....
This factors out the actual flushing behaviour when an error occurs
from the write code, and removes the clunky goto from the code. It
now clearly loops on a single retry after a ENOSPC/EDQUOT error,
and the high level code transforms the EDQUOT project quota error
once the loop errors out completely.
We can then do the same transformation to xfs_trans_alloc_icreate(),
xfs_trans_alloc_inode(), xfs_trans_alloc_ichange() and
xfs_trans_alloc_dir() using xfs_blockgc_nospace_flush() and
xfs_dquot_is_enospc().
This will then give us consistent project quota only flushing on
project quota failure, as well as consistent full filesystem ENOSPC
flushing behaviour across all types of inode operations.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2] xfs: improve handling of prjquot ENOSPC
2023-12-18 22:00 ` Dave Chinner
@ 2023-12-19 5:47 ` Christoph Hellwig
2023-12-19 13:50 ` Jian Wen
2023-12-23 11:00 ` Jian Wen
2 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2023-12-19 5:47 UTC (permalink / raw)
To: Dave Chinner; +Cc: Jian Wen, linux-xfs, djwong, hch, dchinner, Jian Wen
On Tue, Dec 19, 2023 at 09:00:38AM +1100, Dave Chinner wrote:
> I'm not convinced this is correct behaviour.
>
> That is, we can get a real full filesystem ENOSPC even when project
> quotas are on and the the project quota space is low. With this
> change we will only flush project quotas rather than the whole
> filesystem.
Yes.
> quotas are enabled.
>
> Hence my suggestion that we should be returning -EDQUOT from project
> quotas and only converting it to -ENOSPC once the project quota has
> been flushed and failed with EDQUOT a second time.
FYI, my suggestion of turning cleared_space into a counter and still
falling back to the normal ENOSPC clearing would also work. But in
the long run moving this pretty messy abuse of ENOSPC for out of qupta
in the low-level code into the highest syscall boudary is probably a
good thing for maintainability.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2] xfs: improve handling of prjquot ENOSPC
2023-12-18 22:00 ` Dave Chinner
2023-12-19 5:47 ` Christoph Hellwig
@ 2023-12-19 13:50 ` Jian Wen
2023-12-23 11:00 ` Jian Wen
2 siblings, 0 replies; 22+ messages in thread
From: Jian Wen @ 2023-12-19 13:50 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs, djwong, hch, dchinner, Jian Wen
On Tue, Dec 19, 2023 at 6:00 AM Dave Chinner <david@fromorbit.com> wrote:
>
>
> This will then give us consistent project quota only flushing on
> project quota failure, as well as consistent full filesystem ENOSPC
> flushing behaviour across all types of inode operations.
Thanks for the detailed explanation. I will try to make it consistent this week.
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v3] xfs: improve handling of prjquot ENOSPC
2023-12-14 15:07 [PATCH] xfs: improve handling of prjquot ENOSPC Jian Wen
` (3 preceding siblings ...)
2023-12-16 15:35 ` [PATCH v2] " Jian Wen
@ 2023-12-23 10:56 ` Jian Wen
2024-01-03 1:42 ` Darrick J. Wong
4 siblings, 1 reply; 22+ messages in thread
From: Jian Wen @ 2023-12-23 10:56 UTC (permalink / raw)
To: linux-xfs; +Cc: Jian Wen, djwong, hch, dchinner, Dave Chinner, Jian Wen
Currently, xfs_trans_dqresv() return -ENOSPC when the project quota
limit is reached. As a result, xfs_file_buffered_write() will flush
the whole filesystem instead of the project quota.
Fix the issue by make xfs_trans_dqresv() return -EDQUOT rather than
-ENOSPC. Add a helper, xfs_blockgc_nospace_flush(), to make flushing
for both EDQUOT and ENOSPC consistent.
Changes since v2:
- completely rewrote based on the suggestions from Dave
Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
---
fs/xfs/xfs_dquot.h | 13 +++++++++++
fs/xfs/xfs_file.c | 40 +++++++++++---------------------
fs/xfs/xfs_icache.c | 50 +++++++++++++++++++++++++++++-----------
fs/xfs/xfs_icache.h | 7 +++---
fs/xfs/xfs_inode.c | 19 ++++++++-------
fs/xfs/xfs_reflink.c | 2 ++
fs/xfs/xfs_trans.c | 39 +++++++++++++++++++++++--------
fs/xfs/xfs_trans_dquot.c | 3 ---
8 files changed, 109 insertions(+), 64 deletions(-)
diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
index 80c8f851a2f3..c5f4a170eef1 100644
--- a/fs/xfs/xfs_dquot.h
+++ b/fs/xfs/xfs_dquot.h
@@ -183,6 +183,19 @@ xfs_dquot_is_enforced(
return false;
}
+static inline bool
+xfs_dquot_is_enospc(
+ struct xfs_dquot *dqp)
+{
+ if (!dqp)
+ return false;
+ if (!xfs_dquot_is_enforced(dqp))
+ return false;
+ if (dqp->q_blk.hardlimit - dqp->q_blk.reserved > 0)
+ return false;
+ return true;
+}
+
/*
* Check whether a dquot is under low free space conditions. We assume the quota
* is enabled and enforced.
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e33e5e13b95f..4b6e90bb1c59 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -24,6 +24,9 @@
#include "xfs_pnfs.h"
#include "xfs_iomap.h"
#include "xfs_reflink.h"
+#include "xfs_quota.h"
+#include "xfs_dquot_item.h"
+#include "xfs_dquot.h"
#include <linux/dax.h>
#include <linux/falloc.h>
@@ -785,32 +788,17 @@ xfs_file_buffered_write(
trace_xfs_file_buffered_write(iocb, from);
ret = iomap_file_buffered_write(iocb, from,
&xfs_buffered_write_iomap_ops);
-
- /*
- * If we hit a space limit, try to free up some lingering preallocated
- * space before returning an error. In the case of ENOSPC, first try to
- * write back all dirty inodes to free up some of the excess reserved
- * metadata space. This reduces the chances that the eofblocks scan
- * waits on dirty mappings. Since xfs_flush_inodes() is serialized, this
- * also behaves as a filter to prevent too many eofblocks scans from
- * running at the same time. Use a synchronous scan to increase the
- * effectiveness of the scan.
- */
- if (ret == -EDQUOT && !cleared_space) {
- xfs_iunlock(ip, iolock);
- xfs_blockgc_free_quota(ip, XFS_ICWALK_FLAG_SYNC);
- cleared_space = true;
- goto write_retry;
- } else if (ret == -ENOSPC && !cleared_space) {
- struct xfs_icwalk icw = {0};
-
- cleared_space = true;
- xfs_flush_inodes(ip->i_mount);
-
- xfs_iunlock(ip, iolock);
- icw.icw_flags = XFS_ICWALK_FLAG_SYNC;
- xfs_blockgc_free_space(ip->i_mount, &icw);
- goto write_retry;
+ if (ret == -EDQUOT || ret == -ENOSPC) {
+ if (!cleared_space) {
+ xfs_iunlock(ip, iolock);
+ xfs_blockgc_nospace_flush(ip->i_mount, ip->i_udquot,
+ ip->i_gdquot, ip->i_pdquot,
+ XFS_ICWALK_FLAG_SYNC, ret);
+ cleared_space = true;
+ goto write_retry;
+ }
+ if (ret == -EDQUOT && xfs_dquot_is_enospc(ip->i_pdquot))
+ ret = -ENOSPC;
}
out:
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index dba514a2c84d..d2dcb653befc 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -64,6 +64,10 @@ static int xfs_icwalk_ag(struct xfs_perag *pag,
XFS_ICWALK_FLAG_RECLAIM_SICK | \
XFS_ICWALK_FLAG_UNION)
+static int xfs_blockgc_free_dquots(struct xfs_mount *mp,
+ struct xfs_dquot *udqp, struct xfs_dquot *gdqp,
+ struct xfs_dquot *pdqp, unsigned int iwalk_flags);
+
/*
* Allocate and initialise an xfs_inode.
*/
@@ -1477,6 +1481,38 @@ xfs_blockgc_free_space(
return xfs_inodegc_flush(mp);
}
+/*
+ * If we hit a space limit, try to free up some lingering preallocated
+ * space before returning an error. In the case of ENOSPC, first try to
+ * write back all dirty inodes to free up some of the excess reserved
+ * metadata space. This reduces the chances that the eofblocks scan
+ * waits on dirty mappings. Since xfs_flush_inodes() is serialized, this
+ * also behaves as a filter to prevent too many eofblocks scans from
+ * running at the same time. Use a synchronous scan to increase the
+ * effectiveness of the scan.
+ */
+void
+xfs_blockgc_nospace_flush(
+ struct xfs_mount *mp,
+ struct xfs_dquot *udqp,
+ struct xfs_dquot *gdqp,
+ struct xfs_dquot *pdqp,
+ unsigned int iwalk_flags,
+ int what)
+{
+ ASSERT(what == -EDQUOT || what == -ENOSPC);
+
+ if (what == -EDQUOT) {
+ xfs_blockgc_free_dquots(mp, udqp, gdqp, pdqp, iwalk_flags);
+ } else if (what == -ENOSPC) {
+ struct xfs_icwalk icw = {0};
+
+ xfs_flush_inodes(mp);
+ icw.icw_flags = iwalk_flags;
+ xfs_blockgc_free_space(mp, &icw);
+ }
+}
+
/*
* Reclaim all the free space that we can by scheduling the background blockgc
* and inodegc workers immediately and waiting for them all to clear.
@@ -1515,7 +1551,7 @@ xfs_blockgc_flush_all(
* (XFS_ICWALK_FLAG_SYNC), the caller also must not hold any inode's IOLOCK or
* MMAPLOCK.
*/
-int
+static int
xfs_blockgc_free_dquots(
struct xfs_mount *mp,
struct xfs_dquot *udqp,
@@ -1559,18 +1595,6 @@ xfs_blockgc_free_dquots(
return xfs_blockgc_free_space(mp, &icw);
}
-/* Run cow/eofblocks scans on the quotas attached to the inode. */
-int
-xfs_blockgc_free_quota(
- struct xfs_inode *ip,
- unsigned int iwalk_flags)
-{
- return xfs_blockgc_free_dquots(ip->i_mount,
- xfs_inode_dquot(ip, XFS_DQTYPE_USER),
- xfs_inode_dquot(ip, XFS_DQTYPE_GROUP),
- xfs_inode_dquot(ip, XFS_DQTYPE_PROJ), iwalk_flags);
-}
-
/* XFS Inode Cache Walking Code */
/*
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index 905944dafbe5..c0833450969d 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -57,11 +57,10 @@ long xfs_reclaim_inodes_nr(struct xfs_mount *mp, unsigned long nr_to_scan);
void xfs_inode_mark_reclaimable(struct xfs_inode *ip);
-int xfs_blockgc_free_dquots(struct xfs_mount *mp, struct xfs_dquot *udqp,
- struct xfs_dquot *gdqp, struct xfs_dquot *pdqp,
- unsigned int iwalk_flags);
-int xfs_blockgc_free_quota(struct xfs_inode *ip, unsigned int iwalk_flags);
int xfs_blockgc_free_space(struct xfs_mount *mp, struct xfs_icwalk *icm);
+void xfs_blockgc_nospace_flush(struct xfs_mount *mp, struct xfs_dquot *udqp,
+ struct xfs_dquot *gdqp, struct xfs_dquot *pdqp,
+ unsigned int iwalk_flags, int what);
int xfs_blockgc_flush_all(struct xfs_mount *mp);
void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index c0f1c89786c2..e99ffa17d3d0 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -27,6 +27,8 @@
#include "xfs_errortag.h"
#include "xfs_error.h"
#include "xfs_quota.h"
+#include "xfs_dquot_item.h"
+#include "xfs_dquot.h"
#include "xfs_filestream.h"
#include "xfs_trace.h"
#include "xfs_icache.h"
@@ -1007,12 +1009,6 @@ xfs_create(
*/
error = xfs_trans_alloc_icreate(mp, tres, udqp, gdqp, pdqp, resblks,
&tp);
- if (error == -ENOSPC) {
- /* flush outstanding delalloc blocks and retry */
- xfs_flush_inodes(mp);
- error = xfs_trans_alloc_icreate(mp, tres, udqp, gdqp, pdqp,
- resblks, &tp);
- }
if (error)
goto out_release_dquots;
@@ -2951,14 +2947,21 @@ xfs_rename(
if (spaceres != 0) {
error = xfs_trans_reserve_quota_nblks(tp, target_dp, spaceres,
0, false);
- if (error == -EDQUOT || error == -ENOSPC) {
+ if (error == -EDQUOT) {
if (!retried) {
xfs_trans_cancel(tp);
- xfs_blockgc_free_quota(target_dp, 0);
+ xfs_blockgc_nospace_flush(target_dp->i_mount,
+ target_dp->i_udquot,
+ target_dp->i_gdquot,
+ target_dp->i_pdquot,
+ 0, error);
retried = true;
goto retry;
}
+ if (xfs_dquot_is_enospc(target_dp->i_pdquot))
+ error = -ENOSPC;
+
nospace_error = error;
spaceres = 0;
error = 0;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index e5b62dc28466..cb036e1173ae 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -25,6 +25,8 @@
#include "xfs_bit.h"
#include "xfs_alloc.h"
#include "xfs_quota.h"
+#include "xfs_dquot_item.h"
+#include "xfs_dquot.h"
#include "xfs_reflink.h"
#include "xfs_iomap.h"
#include "xfs_ag.h"
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 305c9d07bf1b..1574d7aa49c4 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -1217,15 +1217,21 @@ xfs_trans_alloc_inode(
}
error = xfs_trans_reserve_quota_nblks(tp, ip, dblocks, rblocks, force);
- if ((error == -EDQUOT || error == -ENOSPC) && !retried) {
+ if (error == -EDQUOT && !retried) {
xfs_trans_cancel(tp);
xfs_iunlock(ip, XFS_ILOCK_EXCL);
- xfs_blockgc_free_quota(ip, 0);
+ xfs_blockgc_nospace_flush(ip->i_mount, ip->i_udquot,
+ ip->i_gdquot, ip->i_pdquot,
+ 0, error);
retried = true;
goto retry;
}
- if (error)
+ if (error) {
+ if (error == -EDQUOT && xfs_dquot_is_enospc(ip->i_pdquot))
+ error = -ENOSPC;
+
goto out_cancel;
+ }
*tpp = tp;
return 0;
@@ -1260,13 +1266,16 @@ xfs_trans_alloc_icreate(
return error;
error = xfs_trans_reserve_quota_icreate(tp, udqp, gdqp, pdqp, dblocks);
- if ((error == -EDQUOT || error == -ENOSPC) && !retried) {
+ if (error == -EDQUOT && !retried) {
xfs_trans_cancel(tp);
- xfs_blockgc_free_dquots(mp, udqp, gdqp, pdqp, 0);
+ xfs_blockgc_nospace_flush(mp, udqp, gdqp, pdqp, 0, error);
retried = true;
goto retry;
}
if (error) {
+ if (error == -EDQUOT && xfs_dquot_is_enospc(pdqp))
+ error = -ENOSPC;
+
xfs_trans_cancel(tp);
return error;
}
@@ -1340,14 +1349,19 @@ xfs_trans_alloc_ichange(
error = xfs_trans_reserve_quota_bydquots(tp, mp, udqp, gdqp,
pdqp, ip->i_nblocks + ip->i_delayed_blks,
1, qflags);
- if ((error == -EDQUOT || error == -ENOSPC) && !retried) {
+ if (error == -EDQUOT && !retried) {
xfs_trans_cancel(tp);
- xfs_blockgc_free_dquots(mp, udqp, gdqp, pdqp, 0);
+ xfs_blockgc_nospace_flush(mp, udqp, gdqp, pdqp, 0,
+ error);
retried = true;
goto retry;
}
- if (error)
+ if (error) {
+ if (error == -EDQUOT && xfs_dquot_is_enospc(pdqp))
+ error = -ENOSPC;
+
goto out_cancel;
+ }
}
*tpp = tp;
@@ -1419,14 +1433,19 @@ xfs_trans_alloc_dir(
goto done;
error = xfs_trans_reserve_quota_nblks(tp, dp, resblks, 0, false);
- if (error == -EDQUOT || error == -ENOSPC) {
+ if (error == -EDQUOT) {
if (!retried) {
xfs_trans_cancel(tp);
- xfs_blockgc_free_quota(dp, 0);
+ xfs_blockgc_nospace_flush(dp->i_mount, ip->i_udquot,
+ ip->i_gdquot, ip->i_pdquot,
+ 0, error);
retried = true;
goto retry;
}
+ if (xfs_dquot_is_enospc(dp->i_pdquot))
+ error = -ENOSPC;
+
*nospace_error = error;
resblks = 0;
error = 0;
diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c
index aa00cf67ad72..7201b86ef2c2 100644
--- a/fs/xfs/xfs_trans_dquot.c
+++ b/fs/xfs/xfs_trans_dquot.c
@@ -700,8 +700,6 @@ xfs_trans_dqresv(
error_return:
xfs_dqunlock(dqp);
- if (xfs_dquot_type(dqp) == XFS_DQTYPE_PROJ)
- return -ENOSPC;
return -EDQUOT;
error_corrupt:
xfs_dqunlock(dqp);
@@ -717,7 +715,6 @@ xfs_trans_dqresv(
* approach.
*
* flags = XFS_QMOPT_FORCE_RES evades limit enforcement. Used by chown.
- * XFS_QMOPT_ENOSPC returns ENOSPC not EDQUOT. Used by pquota.
* XFS_TRANS_DQ_RES_BLKS reserves regular disk blocks
* XFS_TRANS_DQ_RES_RTBLKS reserves realtime disk blocks
* dquots are unlocked on return, if they were not locked by caller.
--
2.34.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v2] xfs: improve handling of prjquot ENOSPC
2023-12-18 22:00 ` Dave Chinner
2023-12-19 5:47 ` Christoph Hellwig
2023-12-19 13:50 ` Jian Wen
@ 2023-12-23 11:00 ` Jian Wen
2 siblings, 0 replies; 22+ messages in thread
From: Jian Wen @ 2023-12-23 11:00 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs, djwong, hch, dchinner, Jian Wen
On Tue, Dec 19, 2023 at 6:00 AM Dave Chinner <david@fromorbit.com> wrote:
>
> So, with the assumption that project quotas return EDQUOT and not
> ENOSPC, we add this helper to fs/xfs/xfs_dquot.h:
>
> static inline bool
> xfs_dquot_is_enospc(
> struct xfs_dquot *dqp)
> {
> if (!dqp)
> return false;
> if (!xfs_dquot_is_enforced(dqp)
> return false;
> if (dqp->q_blk.hardlimit - dqp->q_blk.reserved > 0)
I need some help on how to improve the above enospc check.
It seems that we need a value that is larger than 0.
4.0K space is available.
# df -h ./
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 100M 100M 4.0K 100% /tmp/roothome/vda
ENOSPC is expected, but gets EDQUOT.
# touch t
touch: cannot touch 't': Disk quota exceeded
>
> The buffered write code ends up as:
>
> .....
> do {
> iolock = XFS_IOLOCK_EXCL;
> ret = xfs_ilock_iocb(iocb, iolock);
> if (ret)
> return ret;
>
> ret = xfs_file_write_checks(iocb, from, &iolock);
> if (ret)
> goto out;
>
> trace_xfs_file_buffered_write(iocb, from);
> ret = iomap_file_buffered_write(iocb, from,
> &xfs_buffered_write_iomap_ops);
> if (!(ret == -EDQUOT || ret = -ENOSPC))
> break;
>
> xfs_iunlock(ip, iolock);
xfs_iunlock() is called after the retry.
> xfs_blockgc_nospace_flush(ip, ret);
> } while (retries++ == 0);
>
> if (ret == -EDQUOT && xfs_dquot_is_enospc(ip->i_pdquot))
> ret = -ENOSPC;
> .....
out:
if (iolock)
xfs_iunlock(ip, iolock);
Double xfs_iunlock().
Please take a look at the v3 patch.
Thanks.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] xfs: improve handling of prjquot ENOSPC
2023-12-23 10:56 ` [PATCH v3] " Jian Wen
@ 2024-01-03 1:42 ` Darrick J. Wong
2024-01-03 3:45 ` Jian Wen
0 siblings, 1 reply; 22+ messages in thread
From: Darrick J. Wong @ 2024-01-03 1:42 UTC (permalink / raw)
To: Jian Wen; +Cc: linux-xfs, hch, dchinner, Dave Chinner, Jian Wen
On Sat, Dec 23, 2023 at 06:56:32PM +0800, Jian Wen wrote:
> Currently, xfs_trans_dqresv() return -ENOSPC when the project quota
> limit is reached. As a result, xfs_file_buffered_write() will flush
> the whole filesystem instead of the project quota.
>
> Fix the issue by make xfs_trans_dqresv() return -EDQUOT rather than
> -ENOSPC. Add a helper, xfs_blockgc_nospace_flush(), to make flushing
> for both EDQUOT and ENOSPC consistent.
>
> Changes since v2:
> - completely rewrote based on the suggestions from Dave
>
> Suggested-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
> ---
> fs/xfs/xfs_dquot.h | 13 +++++++++++
> fs/xfs/xfs_file.c | 40 +++++++++++---------------------
> fs/xfs/xfs_icache.c | 50 +++++++++++++++++++++++++++++-----------
> fs/xfs/xfs_icache.h | 7 +++---
> fs/xfs/xfs_inode.c | 19 ++++++++-------
> fs/xfs/xfs_reflink.c | 2 ++
> fs/xfs/xfs_trans.c | 39 +++++++++++++++++++++++--------
> fs/xfs/xfs_trans_dquot.c | 3 ---
> 8 files changed, 109 insertions(+), 64 deletions(-)
>
> diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
> index 80c8f851a2f3..c5f4a170eef1 100644
> --- a/fs/xfs/xfs_dquot.h
> +++ b/fs/xfs/xfs_dquot.h
> @@ -183,6 +183,19 @@ xfs_dquot_is_enforced(
> return false;
> }
>
> +static inline bool
> +xfs_dquot_is_enospc(
I don't like encoding error codes in a function name, especially since
EDQUOT is used for more dquot types than ENOSPC.
"xfs_dquot_hardlimit_exceeded" ?
> + struct xfs_dquot *dqp)
> +{
> + if (!dqp)
> + return false;
> + if (!xfs_dquot_is_enforced(dqp))
> + return false;
> + if (dqp->q_blk.hardlimit - dqp->q_blk.reserved > 0)
> + return false;
return q_blk.reserved > dqp->q_blk.hardlimit; ?
hardlimit == reserved shouldn't be considered an edquot condition.
Also, locking is needed here.
> + return true;
> +}
> +
> /*
> * Check whether a dquot is under low free space conditions. We assume the quota
> * is enabled and enforced.
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index e33e5e13b95f..4b6e90bb1c59 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -24,6 +24,9 @@
> #include "xfs_pnfs.h"
> #include "xfs_iomap.h"
> #include "xfs_reflink.h"
> +#include "xfs_quota.h"
> +#include "xfs_dquot_item.h"
> +#include "xfs_dquot.h"
>
> #include <linux/dax.h>
> #include <linux/falloc.h>
> @@ -785,32 +788,17 @@ xfs_file_buffered_write(
> trace_xfs_file_buffered_write(iocb, from);
> ret = iomap_file_buffered_write(iocb, from,
> &xfs_buffered_write_iomap_ops);
> -
> - /*
> - * If we hit a space limit, try to free up some lingering preallocated
> - * space before returning an error. In the case of ENOSPC, first try to
> - * write back all dirty inodes to free up some of the excess reserved
> - * metadata space. This reduces the chances that the eofblocks scan
> - * waits on dirty mappings. Since xfs_flush_inodes() is serialized, this
> - * also behaves as a filter to prevent too many eofblocks scans from
> - * running at the same time. Use a synchronous scan to increase the
> - * effectiveness of the scan.
> - */
> - if (ret == -EDQUOT && !cleared_space) {
> - xfs_iunlock(ip, iolock);
> - xfs_blockgc_free_quota(ip, XFS_ICWALK_FLAG_SYNC);
> - cleared_space = true;
> - goto write_retry;
> - } else if (ret == -ENOSPC && !cleared_space) {
> - struct xfs_icwalk icw = {0};
> -
> - cleared_space = true;
> - xfs_flush_inodes(ip->i_mount);
> -
> - xfs_iunlock(ip, iolock);
> - icw.icw_flags = XFS_ICWALK_FLAG_SYNC;
> - xfs_blockgc_free_space(ip->i_mount, &icw);
> - goto write_retry;
> + if (ret == -EDQUOT || ret == -ENOSPC) {
Huh?
Dave commented earlier:
"Hence my suggestion that we should be returning -EDQUOT from project
quotas and only converting it to -ENOSPC once the project quota has been
flushed and failed with EDQUOT a second time."
I think what he meant was changing xfs_trans_dqresv to return EDQUOT in
all circumstances. I don't see that anywhere in this patch?
Granted I think it's messy to set the /wrong/ errno in low level code
and require higher level code to detect and change it. But I don't see
a better way to do that.
Also, a question for Dave: What happens if xfs_trans_dqresv detects a
fatal overage in the project dquot, but the overage condition clears by
the time this caller rechecks the dquot? Is it ok that we then return
EDQUOT whereas the current code would return ENOSPC?
--D
> + if (!cleared_space) {
> + xfs_iunlock(ip, iolock);
> + xfs_blockgc_nospace_flush(ip->i_mount, ip->i_udquot,
> + ip->i_gdquot, ip->i_pdquot,
> + XFS_ICWALK_FLAG_SYNC, ret);
> + cleared_space = true;
> + goto write_retry;
> + }
> + if (ret == -EDQUOT && xfs_dquot_is_enospc(ip->i_pdquot))
> + ret = -ENOSPC;
> }
>
> out:
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index dba514a2c84d..d2dcb653befc 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -64,6 +64,10 @@ static int xfs_icwalk_ag(struct xfs_perag *pag,
> XFS_ICWALK_FLAG_RECLAIM_SICK | \
> XFS_ICWALK_FLAG_UNION)
>
> +static int xfs_blockgc_free_dquots(struct xfs_mount *mp,
> + struct xfs_dquot *udqp, struct xfs_dquot *gdqp,
> + struct xfs_dquot *pdqp, unsigned int iwalk_flags);
> +
> /*
> * Allocate and initialise an xfs_inode.
> */
> @@ -1477,6 +1481,38 @@ xfs_blockgc_free_space(
> return xfs_inodegc_flush(mp);
> }
>
> +/*
> + * If we hit a space limit, try to free up some lingering preallocated
> + * space before returning an error. In the case of ENOSPC, first try to
> + * write back all dirty inodes to free up some of the excess reserved
> + * metadata space. This reduces the chances that the eofblocks scan
> + * waits on dirty mappings. Since xfs_flush_inodes() is serialized, this
> + * also behaves as a filter to prevent too many eofblocks scans from
> + * running at the same time. Use a synchronous scan to increase the
> + * effectiveness of the scan.
> + */
> +void
> +xfs_blockgc_nospace_flush(
> + struct xfs_mount *mp,
> + struct xfs_dquot *udqp,
> + struct xfs_dquot *gdqp,
> + struct xfs_dquot *pdqp,
> + unsigned int iwalk_flags,
> + int what)
> +{
> + ASSERT(what == -EDQUOT || what == -ENOSPC);
> +
> + if (what == -EDQUOT) {
> + xfs_blockgc_free_dquots(mp, udqp, gdqp, pdqp, iwalk_flags);
> + } else if (what == -ENOSPC) {
> + struct xfs_icwalk icw = {0};
> +
> + xfs_flush_inodes(mp);
> + icw.icw_flags = iwalk_flags;
> + xfs_blockgc_free_space(mp, &icw);
> + }
> +}
> +
> /*
> * Reclaim all the free space that we can by scheduling the background blockgc
> * and inodegc workers immediately and waiting for them all to clear.
> @@ -1515,7 +1551,7 @@ xfs_blockgc_flush_all(
> * (XFS_ICWALK_FLAG_SYNC), the caller also must not hold any inode's IOLOCK or
> * MMAPLOCK.
> */
> -int
> +static int
> xfs_blockgc_free_dquots(
> struct xfs_mount *mp,
> struct xfs_dquot *udqp,
> @@ -1559,18 +1595,6 @@ xfs_blockgc_free_dquots(
> return xfs_blockgc_free_space(mp, &icw);
> }
>
> -/* Run cow/eofblocks scans on the quotas attached to the inode. */
> -int
> -xfs_blockgc_free_quota(
> - struct xfs_inode *ip,
> - unsigned int iwalk_flags)
> -{
> - return xfs_blockgc_free_dquots(ip->i_mount,
> - xfs_inode_dquot(ip, XFS_DQTYPE_USER),
> - xfs_inode_dquot(ip, XFS_DQTYPE_GROUP),
> - xfs_inode_dquot(ip, XFS_DQTYPE_PROJ), iwalk_flags);
> -}
> -
> /* XFS Inode Cache Walking Code */
>
> /*
> diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
> index 905944dafbe5..c0833450969d 100644
> --- a/fs/xfs/xfs_icache.h
> +++ b/fs/xfs/xfs_icache.h
> @@ -57,11 +57,10 @@ long xfs_reclaim_inodes_nr(struct xfs_mount *mp, unsigned long nr_to_scan);
>
> void xfs_inode_mark_reclaimable(struct xfs_inode *ip);
>
> -int xfs_blockgc_free_dquots(struct xfs_mount *mp, struct xfs_dquot *udqp,
> - struct xfs_dquot *gdqp, struct xfs_dquot *pdqp,
> - unsigned int iwalk_flags);
> -int xfs_blockgc_free_quota(struct xfs_inode *ip, unsigned int iwalk_flags);
> int xfs_blockgc_free_space(struct xfs_mount *mp, struct xfs_icwalk *icm);
> +void xfs_blockgc_nospace_flush(struct xfs_mount *mp, struct xfs_dquot *udqp,
> + struct xfs_dquot *gdqp, struct xfs_dquot *pdqp,
> + unsigned int iwalk_flags, int what);
> int xfs_blockgc_flush_all(struct xfs_mount *mp);
>
> void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index c0f1c89786c2..e99ffa17d3d0 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -27,6 +27,8 @@
> #include "xfs_errortag.h"
> #include "xfs_error.h"
> #include "xfs_quota.h"
> +#include "xfs_dquot_item.h"
> +#include "xfs_dquot.h"
> #include "xfs_filestream.h"
> #include "xfs_trace.h"
> #include "xfs_icache.h"
> @@ -1007,12 +1009,6 @@ xfs_create(
> */
> error = xfs_trans_alloc_icreate(mp, tres, udqp, gdqp, pdqp, resblks,
> &tp);
> - if (error == -ENOSPC) {
> - /* flush outstanding delalloc blocks and retry */
> - xfs_flush_inodes(mp);
> - error = xfs_trans_alloc_icreate(mp, tres, udqp, gdqp, pdqp,
> - resblks, &tp);
> - }
> if (error)
> goto out_release_dquots;
>
> @@ -2951,14 +2947,21 @@ xfs_rename(
> if (spaceres != 0) {
> error = xfs_trans_reserve_quota_nblks(tp, target_dp, spaceres,
> 0, false);
> - if (error == -EDQUOT || error == -ENOSPC) {
> + if (error == -EDQUOT) {
> if (!retried) {
> xfs_trans_cancel(tp);
> - xfs_blockgc_free_quota(target_dp, 0);
> + xfs_blockgc_nospace_flush(target_dp->i_mount,
> + target_dp->i_udquot,
> + target_dp->i_gdquot,
> + target_dp->i_pdquot,
> + 0, error);
> retried = true;
> goto retry;
> }
>
> + if (xfs_dquot_is_enospc(target_dp->i_pdquot))
> + error = -ENOSPC;
> +
> nospace_error = error;
> spaceres = 0;
> error = 0;
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index e5b62dc28466..cb036e1173ae 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -25,6 +25,8 @@
> #include "xfs_bit.h"
> #include "xfs_alloc.h"
> #include "xfs_quota.h"
> +#include "xfs_dquot_item.h"
> +#include "xfs_dquot.h"
> #include "xfs_reflink.h"
> #include "xfs_iomap.h"
> #include "xfs_ag.h"
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index 305c9d07bf1b..1574d7aa49c4 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -1217,15 +1217,21 @@ xfs_trans_alloc_inode(
> }
>
> error = xfs_trans_reserve_quota_nblks(tp, ip, dblocks, rblocks, force);
> - if ((error == -EDQUOT || error == -ENOSPC) && !retried) {
> + if (error == -EDQUOT && !retried) {
> xfs_trans_cancel(tp);
> xfs_iunlock(ip, XFS_ILOCK_EXCL);
> - xfs_blockgc_free_quota(ip, 0);
> + xfs_blockgc_nospace_flush(ip->i_mount, ip->i_udquot,
> + ip->i_gdquot, ip->i_pdquot,
> + 0, error);
> retried = true;
> goto retry;
> }
> - if (error)
> + if (error) {
> + if (error == -EDQUOT && xfs_dquot_is_enospc(ip->i_pdquot))
> + error = -ENOSPC;
> +
> goto out_cancel;
> + }
>
> *tpp = tp;
> return 0;
> @@ -1260,13 +1266,16 @@ xfs_trans_alloc_icreate(
> return error;
>
> error = xfs_trans_reserve_quota_icreate(tp, udqp, gdqp, pdqp, dblocks);
> - if ((error == -EDQUOT || error == -ENOSPC) && !retried) {
> + if (error == -EDQUOT && !retried) {
> xfs_trans_cancel(tp);
> - xfs_blockgc_free_dquots(mp, udqp, gdqp, pdqp, 0);
> + xfs_blockgc_nospace_flush(mp, udqp, gdqp, pdqp, 0, error);
> retried = true;
> goto retry;
> }
> if (error) {
> + if (error == -EDQUOT && xfs_dquot_is_enospc(pdqp))
> + error = -ENOSPC;
> +
> xfs_trans_cancel(tp);
> return error;
> }
> @@ -1340,14 +1349,19 @@ xfs_trans_alloc_ichange(
> error = xfs_trans_reserve_quota_bydquots(tp, mp, udqp, gdqp,
> pdqp, ip->i_nblocks + ip->i_delayed_blks,
> 1, qflags);
> - if ((error == -EDQUOT || error == -ENOSPC) && !retried) {
> + if (error == -EDQUOT && !retried) {
> xfs_trans_cancel(tp);
> - xfs_blockgc_free_dquots(mp, udqp, gdqp, pdqp, 0);
> + xfs_blockgc_nospace_flush(mp, udqp, gdqp, pdqp, 0,
> + error);
> retried = true;
> goto retry;
> }
> - if (error)
> + if (error) {
> + if (error == -EDQUOT && xfs_dquot_is_enospc(pdqp))
> + error = -ENOSPC;
> +
> goto out_cancel;
> + }
> }
>
> *tpp = tp;
> @@ -1419,14 +1433,19 @@ xfs_trans_alloc_dir(
> goto done;
>
> error = xfs_trans_reserve_quota_nblks(tp, dp, resblks, 0, false);
> - if (error == -EDQUOT || error == -ENOSPC) {
> + if (error == -EDQUOT) {
> if (!retried) {
> xfs_trans_cancel(tp);
> - xfs_blockgc_free_quota(dp, 0);
> + xfs_blockgc_nospace_flush(dp->i_mount, ip->i_udquot,
> + ip->i_gdquot, ip->i_pdquot,
> + 0, error);
> retried = true;
> goto retry;
> }
>
> + if (xfs_dquot_is_enospc(dp->i_pdquot))
> + error = -ENOSPC;
> +
> *nospace_error = error;
> resblks = 0;
> error = 0;
> diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c
> index aa00cf67ad72..7201b86ef2c2 100644
> --- a/fs/xfs/xfs_trans_dquot.c
> +++ b/fs/xfs/xfs_trans_dquot.c
> @@ -700,8 +700,6 @@ xfs_trans_dqresv(
>
> error_return:
> xfs_dqunlock(dqp);
> - if (xfs_dquot_type(dqp) == XFS_DQTYPE_PROJ)
> - return -ENOSPC;
> return -EDQUOT;
> error_corrupt:
> xfs_dqunlock(dqp);
> @@ -717,7 +715,6 @@ xfs_trans_dqresv(
> * approach.
> *
> * flags = XFS_QMOPT_FORCE_RES evades limit enforcement. Used by chown.
> - * XFS_QMOPT_ENOSPC returns ENOSPC not EDQUOT. Used by pquota.
> * XFS_TRANS_DQ_RES_BLKS reserves regular disk blocks
> * XFS_TRANS_DQ_RES_RTBLKS reserves realtime disk blocks
> * dquots are unlocked on return, if they were not locked by caller.
> --
> 2.34.1
>
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] xfs: improve handling of prjquot ENOSPC
2024-01-03 1:42 ` Darrick J. Wong
@ 2024-01-03 3:45 ` Jian Wen
2024-01-04 1:46 ` Darrick J. Wong
0 siblings, 1 reply; 22+ messages in thread
From: Jian Wen @ 2024-01-03 3:45 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs, hch, dchinner, Dave Chinner, Jian Wen
> Dave commented earlier:
>
> "Hence my suggestion that we should be returning -EDQUOT from project
> quotas and only converting it to -ENOSPC once the project quota has been
> flushed and failed with EDQUOT a second time."
>
> I think what he meant was changing xfs_trans_dqresv to return EDQUOT in
> all circumstances. I don't see that anywhere in this patch?
The related code that makes xfs_trans_dqresv() return -EDQUOT if the
project quota limit is reached is as below.
+++ b/fs/xfs/xfs_trans_dquot.c
@@ -700,8 +700,6 @@ xfs_trans_dqresv(
error_return:
xfs_dqunlock(dqp);
- if (xfs_dquot_type(dqp) == XFS_DQTYPE_PROJ)
- return -ENOSPC;
return -EDQUOT;
error_corrupt:
xfs_dqunlock(dqp);
On Wed, Jan 3, 2024 at 9:42 AM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Sat, Dec 23, 2023 at 06:56:32PM +0800, Jian Wen wrote:
> > Currently, xfs_trans_dqresv() return -ENOSPC when the project quota
> > limit is reached. As a result, xfs_file_buffered_write() will flush
> > the whole filesystem instead of the project quota.
> >
> > Fix the issue by make xfs_trans_dqresv() return -EDQUOT rather than
> > -ENOSPC. Add a helper, xfs_blockgc_nospace_flush(), to make flushing
> > for both EDQUOT and ENOSPC consistent.
> >
> > Changes since v2:
> > - completely rewrote based on the suggestions from Dave
> >
> > Suggested-by: Dave Chinner <david@fromorbit.com>
> > Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
> > ---
> > fs/xfs/xfs_dquot.h | 13 +++++++++++
> > fs/xfs/xfs_file.c | 40 +++++++++++---------------------
> > fs/xfs/xfs_icache.c | 50 +++++++++++++++++++++++++++++-----------
> > fs/xfs/xfs_icache.h | 7 +++---
> > fs/xfs/xfs_inode.c | 19 ++++++++-------
> > fs/xfs/xfs_reflink.c | 2 ++
> > fs/xfs/xfs_trans.c | 39 +++++++++++++++++++++++--------
> > fs/xfs/xfs_trans_dquot.c | 3 ---
> > 8 files changed, 109 insertions(+), 64 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
> > index 80c8f851a2f3..c5f4a170eef1 100644
> > --- a/fs/xfs/xfs_dquot.h
> > +++ b/fs/xfs/xfs_dquot.h
> > @@ -183,6 +183,19 @@ xfs_dquot_is_enforced(
> > return false;
> > }
> >
> > +static inline bool
> > +xfs_dquot_is_enospc(
>
> I don't like encoding error codes in a function name, especially since
> EDQUOT is used for more dquot types than ENOSPC.
>
> "xfs_dquot_hardlimit_exceeded" ?
>
> > + struct xfs_dquot *dqp)
> > +{
> > + if (!dqp)
> > + return false;
> > + if (!xfs_dquot_is_enforced(dqp))
> > + return false;
> > + if (dqp->q_blk.hardlimit - dqp->q_blk.reserved > 0)
> > + return false;
>
> return q_blk.reserved > dqp->q_blk.hardlimit; ?
>
> hardlimit == reserved shouldn't be considered an edquot condition.
>
> Also, locking is needed here.
>
> > + return true;
> > +}
> > +
> > /*
> > * Check whether a dquot is under low free space conditions. We assume the quota
> > * is enabled and enforced.
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index e33e5e13b95f..4b6e90bb1c59 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -24,6 +24,9 @@
> > #include "xfs_pnfs.h"
> > #include "xfs_iomap.h"
> > #include "xfs_reflink.h"
> > +#include "xfs_quota.h"
> > +#include "xfs_dquot_item.h"
> > +#include "xfs_dquot.h"
> >
> > #include <linux/dax.h>
> > #include <linux/falloc.h>
> > @@ -785,32 +788,17 @@ xfs_file_buffered_write(
> > trace_xfs_file_buffered_write(iocb, from);
> > ret = iomap_file_buffered_write(iocb, from,
> > &xfs_buffered_write_iomap_ops);
> > -
> > - /*
> > - * If we hit a space limit, try to free up some lingering preallocated
> > - * space before returning an error. In the case of ENOSPC, first try to
> > - * write back all dirty inodes to free up some of the excess reserved
> > - * metadata space. This reduces the chances that the eofblocks scan
> > - * waits on dirty mappings. Since xfs_flush_inodes() is serialized, this
> > - * also behaves as a filter to prevent too many eofblocks scans from
> > - * running at the same time. Use a synchronous scan to increase the
> > - * effectiveness of the scan.
> > - */
> > - if (ret == -EDQUOT && !cleared_space) {
> > - xfs_iunlock(ip, iolock);
> > - xfs_blockgc_free_quota(ip, XFS_ICWALK_FLAG_SYNC);
> > - cleared_space = true;
> > - goto write_retry;
> > - } else if (ret == -ENOSPC && !cleared_space) {
> > - struct xfs_icwalk icw = {0};
> > -
> > - cleared_space = true;
> > - xfs_flush_inodes(ip->i_mount);
> > -
> > - xfs_iunlock(ip, iolock);
> > - icw.icw_flags = XFS_ICWALK_FLAG_SYNC;
> > - xfs_blockgc_free_space(ip->i_mount, &icw);
> > - goto write_retry;
> > + if (ret == -EDQUOT || ret == -ENOSPC) {
>
> Huh?
>
> Dave commented earlier:
>
> "Hence my suggestion that we should be returning -EDQUOT from project
> quotas and only converting it to -ENOSPC once the project quota has been
> flushed and failed with EDQUOT a second time."
>
> I think what he meant was changing xfs_trans_dqresv to return EDQUOT in
> all circumstances. I don't see that anywhere in this patch?
>
> Granted I think it's messy to set the /wrong/ errno in low level code
> and require higher level code to detect and change it. But I don't see
> a better way to do that.
>
> Also, a question for Dave: What happens if xfs_trans_dqresv detects a
> fatal overage in the project dquot, but the overage condition clears by
> the time this caller rechecks the dquot? Is it ok that we then return
> EDQUOT whereas the current code would return ENOSPC?
>
> --D
>
> > + if (!cleared_space) {
> > + xfs_iunlock(ip, iolock);
> > + xfs_blockgc_nospace_flush(ip->i_mount, ip->i_udquot,
> > + ip->i_gdquot, ip->i_pdquot,
> > + XFS_ICWALK_FLAG_SYNC, ret);
> > + cleared_space = true;
> > + goto write_retry;
> > + }
> > + if (ret == -EDQUOT && xfs_dquot_is_enospc(ip->i_pdquot))
> > + ret = -ENOSPC;
> > }
> >
> > out:
> > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > index dba514a2c84d..d2dcb653befc 100644
> > --- a/fs/xfs/xfs_icache.c
> > +++ b/fs/xfs/xfs_icache.c
> > @@ -64,6 +64,10 @@ static int xfs_icwalk_ag(struct xfs_perag *pag,
> > XFS_ICWALK_FLAG_RECLAIM_SICK | \
> > XFS_ICWALK_FLAG_UNION)
> >
> > +static int xfs_blockgc_free_dquots(struct xfs_mount *mp,
> > + struct xfs_dquot *udqp, struct xfs_dquot *gdqp,
> > + struct xfs_dquot *pdqp, unsigned int iwalk_flags);
> > +
> > /*
> > * Allocate and initialise an xfs_inode.
> > */
> > @@ -1477,6 +1481,38 @@ xfs_blockgc_free_space(
> > return xfs_inodegc_flush(mp);
> > }
> >
> > +/*
> > + * If we hit a space limit, try to free up some lingering preallocated
> > + * space before returning an error. In the case of ENOSPC, first try to
> > + * write back all dirty inodes to free up some of the excess reserved
> > + * metadata space. This reduces the chances that the eofblocks scan
> > + * waits on dirty mappings. Since xfs_flush_inodes() is serialized, this
> > + * also behaves as a filter to prevent too many eofblocks scans from
> > + * running at the same time. Use a synchronous scan to increase the
> > + * effectiveness of the scan.
> > + */
> > +void
> > +xfs_blockgc_nospace_flush(
> > + struct xfs_mount *mp,
> > + struct xfs_dquot *udqp,
> > + struct xfs_dquot *gdqp,
> > + struct xfs_dquot *pdqp,
> > + unsigned int iwalk_flags,
> > + int what)
> > +{
> > + ASSERT(what == -EDQUOT || what == -ENOSPC);
> > +
> > + if (what == -EDQUOT) {
> > + xfs_blockgc_free_dquots(mp, udqp, gdqp, pdqp, iwalk_flags);
> > + } else if (what == -ENOSPC) {
> > + struct xfs_icwalk icw = {0};
> > +
> > + xfs_flush_inodes(mp);
> > + icw.icw_flags = iwalk_flags;
> > + xfs_blockgc_free_space(mp, &icw);
> > + }
> > +}
> > +
> > /*
> > * Reclaim all the free space that we can by scheduling the background blockgc
> > * and inodegc workers immediately and waiting for them all to clear.
> > @@ -1515,7 +1551,7 @@ xfs_blockgc_flush_all(
> > * (XFS_ICWALK_FLAG_SYNC), the caller also must not hold any inode's IOLOCK or
> > * MMAPLOCK.
> > */
> > -int
> > +static int
> > xfs_blockgc_free_dquots(
> > struct xfs_mount *mp,
> > struct xfs_dquot *udqp,
> > @@ -1559,18 +1595,6 @@ xfs_blockgc_free_dquots(
> > return xfs_blockgc_free_space(mp, &icw);
> > }
> >
> > -/* Run cow/eofblocks scans on the quotas attached to the inode. */
> > -int
> > -xfs_blockgc_free_quota(
> > - struct xfs_inode *ip,
> > - unsigned int iwalk_flags)
> > -{
> > - return xfs_blockgc_free_dquots(ip->i_mount,
> > - xfs_inode_dquot(ip, XFS_DQTYPE_USER),
> > - xfs_inode_dquot(ip, XFS_DQTYPE_GROUP),
> > - xfs_inode_dquot(ip, XFS_DQTYPE_PROJ), iwalk_flags);
> > -}
> > -
> > /* XFS Inode Cache Walking Code */
> >
> > /*
> > diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
> > index 905944dafbe5..c0833450969d 100644
> > --- a/fs/xfs/xfs_icache.h
> > +++ b/fs/xfs/xfs_icache.h
> > @@ -57,11 +57,10 @@ long xfs_reclaim_inodes_nr(struct xfs_mount *mp, unsigned long nr_to_scan);
> >
> > void xfs_inode_mark_reclaimable(struct xfs_inode *ip);
> >
> > -int xfs_blockgc_free_dquots(struct xfs_mount *mp, struct xfs_dquot *udqp,
> > - struct xfs_dquot *gdqp, struct xfs_dquot *pdqp,
> > - unsigned int iwalk_flags);
> > -int xfs_blockgc_free_quota(struct xfs_inode *ip, unsigned int iwalk_flags);
> > int xfs_blockgc_free_space(struct xfs_mount *mp, struct xfs_icwalk *icm);
> > +void xfs_blockgc_nospace_flush(struct xfs_mount *mp, struct xfs_dquot *udqp,
> > + struct xfs_dquot *gdqp, struct xfs_dquot *pdqp,
> > + unsigned int iwalk_flags, int what);
> > int xfs_blockgc_flush_all(struct xfs_mount *mp);
> >
> > void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index c0f1c89786c2..e99ffa17d3d0 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -27,6 +27,8 @@
> > #include "xfs_errortag.h"
> > #include "xfs_error.h"
> > #include "xfs_quota.h"
> > +#include "xfs_dquot_item.h"
> > +#include "xfs_dquot.h"
> > #include "xfs_filestream.h"
> > #include "xfs_trace.h"
> > #include "xfs_icache.h"
> > @@ -1007,12 +1009,6 @@ xfs_create(
> > */
> > error = xfs_trans_alloc_icreate(mp, tres, udqp, gdqp, pdqp, resblks,
> > &tp);
> > - if (error == -ENOSPC) {
> > - /* flush outstanding delalloc blocks and retry */
> > - xfs_flush_inodes(mp);
> > - error = xfs_trans_alloc_icreate(mp, tres, udqp, gdqp, pdqp,
> > - resblks, &tp);
> > - }
> > if (error)
> > goto out_release_dquots;
> >
> > @@ -2951,14 +2947,21 @@ xfs_rename(
> > if (spaceres != 0) {
> > error = xfs_trans_reserve_quota_nblks(tp, target_dp, spaceres,
> > 0, false);
> > - if (error == -EDQUOT || error == -ENOSPC) {
> > + if (error == -EDQUOT) {
> > if (!retried) {
> > xfs_trans_cancel(tp);
> > - xfs_blockgc_free_quota(target_dp, 0);
> > + xfs_blockgc_nospace_flush(target_dp->i_mount,
> > + target_dp->i_udquot,
> > + target_dp->i_gdquot,
> > + target_dp->i_pdquot,
> > + 0, error);
> > retried = true;
> > goto retry;
> > }
> >
> > + if (xfs_dquot_is_enospc(target_dp->i_pdquot))
> > + error = -ENOSPC;
> > +
> > nospace_error = error;
> > spaceres = 0;
> > error = 0;
> > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > index e5b62dc28466..cb036e1173ae 100644
> > --- a/fs/xfs/xfs_reflink.c
> > +++ b/fs/xfs/xfs_reflink.c
> > @@ -25,6 +25,8 @@
> > #include "xfs_bit.h"
> > #include "xfs_alloc.h"
> > #include "xfs_quota.h"
> > +#include "xfs_dquot_item.h"
> > +#include "xfs_dquot.h"
> > #include "xfs_reflink.h"
> > #include "xfs_iomap.h"
> > #include "xfs_ag.h"
> > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> > index 305c9d07bf1b..1574d7aa49c4 100644
> > --- a/fs/xfs/xfs_trans.c
> > +++ b/fs/xfs/xfs_trans.c
> > @@ -1217,15 +1217,21 @@ xfs_trans_alloc_inode(
> > }
> >
> > error = xfs_trans_reserve_quota_nblks(tp, ip, dblocks, rblocks, force);
> > - if ((error == -EDQUOT || error == -ENOSPC) && !retried) {
> > + if (error == -EDQUOT && !retried) {
> > xfs_trans_cancel(tp);
> > xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > - xfs_blockgc_free_quota(ip, 0);
> > + xfs_blockgc_nospace_flush(ip->i_mount, ip->i_udquot,
> > + ip->i_gdquot, ip->i_pdquot,
> > + 0, error);
> > retried = true;
> > goto retry;
> > }
> > - if (error)
> > + if (error) {
> > + if (error == -EDQUOT && xfs_dquot_is_enospc(ip->i_pdquot))
> > + error = -ENOSPC;
> > +
> > goto out_cancel;
> > + }
> >
> > *tpp = tp;
> > return 0;
> > @@ -1260,13 +1266,16 @@ xfs_trans_alloc_icreate(
> > return error;
> >
> > error = xfs_trans_reserve_quota_icreate(tp, udqp, gdqp, pdqp, dblocks);
> > - if ((error == -EDQUOT || error == -ENOSPC) && !retried) {
> > + if (error == -EDQUOT && !retried) {
> > xfs_trans_cancel(tp);
> > - xfs_blockgc_free_dquots(mp, udqp, gdqp, pdqp, 0);
> > + xfs_blockgc_nospace_flush(mp, udqp, gdqp, pdqp, 0, error);
> > retried = true;
> > goto retry;
> > }
> > if (error) {
> > + if (error == -EDQUOT && xfs_dquot_is_enospc(pdqp))
> > + error = -ENOSPC;
> > +
> > xfs_trans_cancel(tp);
> > return error;
> > }
> > @@ -1340,14 +1349,19 @@ xfs_trans_alloc_ichange(
> > error = xfs_trans_reserve_quota_bydquots(tp, mp, udqp, gdqp,
> > pdqp, ip->i_nblocks + ip->i_delayed_blks,
> > 1, qflags);
> > - if ((error == -EDQUOT || error == -ENOSPC) && !retried) {
> > + if (error == -EDQUOT && !retried) {
> > xfs_trans_cancel(tp);
> > - xfs_blockgc_free_dquots(mp, udqp, gdqp, pdqp, 0);
> > + xfs_blockgc_nospace_flush(mp, udqp, gdqp, pdqp, 0,
> > + error);
> > retried = true;
> > goto retry;
> > }
> > - if (error)
> > + if (error) {
> > + if (error == -EDQUOT && xfs_dquot_is_enospc(pdqp))
> > + error = -ENOSPC;
> > +
> > goto out_cancel;
> > + }
> > }
> >
> > *tpp = tp;
> > @@ -1419,14 +1433,19 @@ xfs_trans_alloc_dir(
> > goto done;
> >
> > error = xfs_trans_reserve_quota_nblks(tp, dp, resblks, 0, false);
> > - if (error == -EDQUOT || error == -ENOSPC) {
> > + if (error == -EDQUOT) {
> > if (!retried) {
> > xfs_trans_cancel(tp);
> > - xfs_blockgc_free_quota(dp, 0);
> > + xfs_blockgc_nospace_flush(dp->i_mount, ip->i_udquot,
> > + ip->i_gdquot, ip->i_pdquot,
> > + 0, error);
> > retried = true;
> > goto retry;
> > }
> >
> > + if (xfs_dquot_is_enospc(dp->i_pdquot))
> > + error = -ENOSPC;
> > +
> > *nospace_error = error;
> > resblks = 0;
> > error = 0;
> > diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c
> > index aa00cf67ad72..7201b86ef2c2 100644
> > --- a/fs/xfs/xfs_trans_dquot.c
> > +++ b/fs/xfs/xfs_trans_dquot.c
> > @@ -700,8 +700,6 @@ xfs_trans_dqresv(
> >
> > error_return:
> > xfs_dqunlock(dqp);
> > - if (xfs_dquot_type(dqp) == XFS_DQTYPE_PROJ)
> > - return -ENOSPC;
> > return -EDQUOT;
> > error_corrupt:
> > xfs_dqunlock(dqp);
> > @@ -717,7 +715,6 @@ xfs_trans_dqresv(
> > * approach.
> > *
> > * flags = XFS_QMOPT_FORCE_RES evades limit enforcement. Used by chown.
> > - * XFS_QMOPT_ENOSPC returns ENOSPC not EDQUOT. Used by pquota.
> > * XFS_TRANS_DQ_RES_BLKS reserves regular disk blocks
> > * XFS_TRANS_DQ_RES_RTBLKS reserves realtime disk blocks
> > * dquots are unlocked on return, if they were not locked by caller.
> > --
> > 2.34.1
> >
> >
--
Best,
Jian
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] xfs: improve handling of prjquot ENOSPC
2024-01-03 3:45 ` Jian Wen
@ 2024-01-04 1:46 ` Darrick J. Wong
2024-01-04 3:36 ` Jian Wen
0 siblings, 1 reply; 22+ messages in thread
From: Darrick J. Wong @ 2024-01-04 1:46 UTC (permalink / raw)
To: Jian Wen; +Cc: linux-xfs, hch, dchinner, Dave Chinner, Jian Wen
On Wed, Jan 03, 2024 at 11:45:30AM +0800, Jian Wen wrote:
> > Dave commented earlier:
> >
> > "Hence my suggestion that we should be returning -EDQUOT from project
> > quotas and only converting it to -ENOSPC once the project quota has been
> > flushed and failed with EDQUOT a second time."
> >
> > I think what he meant was changing xfs_trans_dqresv to return EDQUOT in
> > all circumstances. I don't see that anywhere in this patch?
>
> The related code that makes xfs_trans_dqresv() return -EDQUOT if the
> project quota limit is reached is as below.
> +++ b/fs/xfs/xfs_trans_dquot.c
> @@ -700,8 +700,6 @@ xfs_trans_dqresv(
>
> error_return:
> xfs_dqunlock(dqp);
> - if (xfs_dquot_type(dqp) == XFS_DQTYPE_PROJ)
> - return -ENOSPC;
> return -EDQUOT;
> error_corrupt:
> xfs_dqunlock(dqp);
Oh, silly me, I missed that change, sorry about that.
> On Wed, Jan 3, 2024 at 9:42 AM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > On Sat, Dec 23, 2023 at 06:56:32PM +0800, Jian Wen wrote:
> > > Currently, xfs_trans_dqresv() return -ENOSPC when the project quota
> > > limit is reached. As a result, xfs_file_buffered_write() will flush
> > > the whole filesystem instead of the project quota.
> > >
> > > Fix the issue by make xfs_trans_dqresv() return -EDQUOT rather than
> > > -ENOSPC. Add a helper, xfs_blockgc_nospace_flush(), to make flushing
> > > for both EDQUOT and ENOSPC consistent.
> > >
> > > Changes since v2:
> > > - completely rewrote based on the suggestions from Dave
> > >
> > > Suggested-by: Dave Chinner <david@fromorbit.com>
> > > Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
> > > ---
> > > fs/xfs/xfs_dquot.h | 13 +++++++++++
> > > fs/xfs/xfs_file.c | 40 +++++++++++---------------------
> > > fs/xfs/xfs_icache.c | 50 +++++++++++++++++++++++++++++-----------
> > > fs/xfs/xfs_icache.h | 7 +++---
> > > fs/xfs/xfs_inode.c | 19 ++++++++-------
> > > fs/xfs/xfs_reflink.c | 2 ++
> > > fs/xfs/xfs_trans.c | 39 +++++++++++++++++++++++--------
> > > fs/xfs/xfs_trans_dquot.c | 3 ---
> > > 8 files changed, 109 insertions(+), 64 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
> > > index 80c8f851a2f3..c5f4a170eef1 100644
> > > --- a/fs/xfs/xfs_dquot.h
> > > +++ b/fs/xfs/xfs_dquot.h
> > > @@ -183,6 +183,19 @@ xfs_dquot_is_enforced(
> > > return false;
> > > }
> > >
> > > +static inline bool
> > > +xfs_dquot_is_enospc(
> >
> > I don't like encoding error codes in a function name, especially since
> > EDQUOT is used for more dquot types than ENOSPC.
> >
> > "xfs_dquot_hardlimit_exceeded" ?
> >
> > > + struct xfs_dquot *dqp)
> > > +{
> > > + if (!dqp)
> > > + return false;
> > > + if (!xfs_dquot_is_enforced(dqp))
> > > + return false;
> > > + if (dqp->q_blk.hardlimit - dqp->q_blk.reserved > 0)
> > > + return false;
> >
> > return q_blk.reserved > dqp->q_blk.hardlimit; ?
> >
> > hardlimit == reserved shouldn't be considered an edquot condition.
> >
> > Also, locking is needed here.
Any response to this?
> > > + return true;
> > > +}
> > > +
> > > /*
> > > * Check whether a dquot is under low free space conditions. We assume the quota
> > > * is enabled and enforced.
> > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > > index e33e5e13b95f..4b6e90bb1c59 100644
> > > --- a/fs/xfs/xfs_file.c
> > > +++ b/fs/xfs/xfs_file.c
> > > @@ -24,6 +24,9 @@
> > > #include "xfs_pnfs.h"
> > > #include "xfs_iomap.h"
> > > #include "xfs_reflink.h"
> > > +#include "xfs_quota.h"
> > > +#include "xfs_dquot_item.h"
> > > +#include "xfs_dquot.h"
> > >
> > > #include <linux/dax.h>
> > > #include <linux/falloc.h>
> > > @@ -785,32 +788,17 @@ xfs_file_buffered_write(
> > > trace_xfs_file_buffered_write(iocb, from);
> > > ret = iomap_file_buffered_write(iocb, from,
> > > &xfs_buffered_write_iomap_ops);
> > > -
> > > - /*
> > > - * If we hit a space limit, try to free up some lingering preallocated
> > > - * space before returning an error. In the case of ENOSPC, first try to
> > > - * write back all dirty inodes to free up some of the excess reserved
> > > - * metadata space. This reduces the chances that the eofblocks scan
> > > - * waits on dirty mappings. Since xfs_flush_inodes() is serialized, this
> > > - * also behaves as a filter to prevent too many eofblocks scans from
> > > - * running at the same time. Use a synchronous scan to increase the
> > > - * effectiveness of the scan.
> > > - */
> > > - if (ret == -EDQUOT && !cleared_space) {
> > > - xfs_iunlock(ip, iolock);
> > > - xfs_blockgc_free_quota(ip, XFS_ICWALK_FLAG_SYNC);
> > > - cleared_space = true;
> > > - goto write_retry;
> > > - } else if (ret == -ENOSPC && !cleared_space) {
> > > - struct xfs_icwalk icw = {0};
> > > -
> > > - cleared_space = true;
> > > - xfs_flush_inodes(ip->i_mount);
> > > -
> > > - xfs_iunlock(ip, iolock);
> > > - icw.icw_flags = XFS_ICWALK_FLAG_SYNC;
> > > - xfs_blockgc_free_space(ip->i_mount, &icw);
> > > - goto write_retry;
> > > + if (ret == -EDQUOT || ret == -ENOSPC) {
> >
> > Huh?
> >
> > Dave commented earlier:
> >
> > "Hence my suggestion that we should be returning -EDQUOT from project
> > quotas and only converting it to -ENOSPC once the project quota has been
> > flushed and failed with EDQUOT a second time."
> >
> > I think what he meant was changing xfs_trans_dqresv to return EDQUOT in
> > all circumstances. I don't see that anywhere in this patch?
> >
> > Granted I think it's messy to set the /wrong/ errno in low level code
> > and require higher level code to detect and change it. But I don't see
> > a better way to do that.
> >
> > Also, a question for Dave: What happens if xfs_trans_dqresv detects a
> > fatal overage in the project dquot, but the overage condition clears by
> > the time this caller rechecks the dquot? Is it ok that we then return
> > EDQUOT whereas the current code would return ENOSPC?
I think this question is still relevant, though. Or perhaps we should
define our own code for project quota exceeded, and translate that to
ENOSPC in the callers?
> >
> > --D
> >
> > > + if (!cleared_space) {
> > > + xfs_iunlock(ip, iolock);
> > > + xfs_blockgc_nospace_flush(ip->i_mount, ip->i_udquot,
> > > + ip->i_gdquot, ip->i_pdquot,
> > > + XFS_ICWALK_FLAG_SYNC, ret);
> > > + cleared_space = true;
> > > + goto write_retry;
> > > + }
> > > + if (ret == -EDQUOT && xfs_dquot_is_enospc(ip->i_pdquot))
> > > + ret = -ENOSPC;
> > > }
> > >
> > > out:
> > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > > index dba514a2c84d..d2dcb653befc 100644
> > > --- a/fs/xfs/xfs_icache.c
> > > +++ b/fs/xfs/xfs_icache.c
> > > @@ -64,6 +64,10 @@ static int xfs_icwalk_ag(struct xfs_perag *pag,
> > > XFS_ICWALK_FLAG_RECLAIM_SICK | \
> > > XFS_ICWALK_FLAG_UNION)
> > >
> > > +static int xfs_blockgc_free_dquots(struct xfs_mount *mp,
> > > + struct xfs_dquot *udqp, struct xfs_dquot *gdqp,
> > > + struct xfs_dquot *pdqp, unsigned int iwalk_flags);
> > > +
> > > /*
> > > * Allocate and initialise an xfs_inode.
> > > */
> > > @@ -1477,6 +1481,38 @@ xfs_blockgc_free_space(
> > > return xfs_inodegc_flush(mp);
> > > }
> > >
> > > +/*
> > > + * If we hit a space limit, try to free up some lingering preallocated
> > > + * space before returning an error. In the case of ENOSPC, first try to
> > > + * write back all dirty inodes to free up some of the excess reserved
> > > + * metadata space. This reduces the chances that the eofblocks scan
> > > + * waits on dirty mappings. Since xfs_flush_inodes() is serialized, this
> > > + * also behaves as a filter to prevent too many eofblocks scans from
> > > + * running at the same time. Use a synchronous scan to increase the
> > > + * effectiveness of the scan.
> > > + */
> > > +void
> > > +xfs_blockgc_nospace_flush(
> > > + struct xfs_mount *mp,
> > > + struct xfs_dquot *udqp,
> > > + struct xfs_dquot *gdqp,
> > > + struct xfs_dquot *pdqp,
> > > + unsigned int iwalk_flags,
> > > + int what)
> > > +{
> > > + ASSERT(what == -EDQUOT || what == -ENOSPC);
> > > +
> > > + if (what == -EDQUOT) {
> > > + xfs_blockgc_free_dquots(mp, udqp, gdqp, pdqp, iwalk_flags);
> > > + } else if (what == -ENOSPC) {
> > > + struct xfs_icwalk icw = {0};
> > > +
> > > + xfs_flush_inodes(mp);
> > > + icw.icw_flags = iwalk_flags;
> > > + xfs_blockgc_free_space(mp, &icw);
> > > + }
> > > +}
> > > +
> > > /*
> > > * Reclaim all the free space that we can by scheduling the background blockgc
> > > * and inodegc workers immediately and waiting for them all to clear.
> > > @@ -1515,7 +1551,7 @@ xfs_blockgc_flush_all(
> > > * (XFS_ICWALK_FLAG_SYNC), the caller also must not hold any inode's IOLOCK or
> > > * MMAPLOCK.
> > > */
> > > -int
> > > +static int
> > > xfs_blockgc_free_dquots(
> > > struct xfs_mount *mp,
> > > struct xfs_dquot *udqp,
> > > @@ -1559,18 +1595,6 @@ xfs_blockgc_free_dquots(
> > > return xfs_blockgc_free_space(mp, &icw);
> > > }
> > >
> > > -/* Run cow/eofblocks scans on the quotas attached to the inode. */
> > > -int
> > > -xfs_blockgc_free_quota(
> > > - struct xfs_inode *ip,
> > > - unsigned int iwalk_flags)
> > > -{
> > > - return xfs_blockgc_free_dquots(ip->i_mount,
> > > - xfs_inode_dquot(ip, XFS_DQTYPE_USER),
> > > - xfs_inode_dquot(ip, XFS_DQTYPE_GROUP),
> > > - xfs_inode_dquot(ip, XFS_DQTYPE_PROJ), iwalk_flags);
> > > -}
> > > -
> > > /* XFS Inode Cache Walking Code */
> > >
> > > /*
> > > diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
> > > index 905944dafbe5..c0833450969d 100644
> > > --- a/fs/xfs/xfs_icache.h
> > > +++ b/fs/xfs/xfs_icache.h
> > > @@ -57,11 +57,10 @@ long xfs_reclaim_inodes_nr(struct xfs_mount *mp, unsigned long nr_to_scan);
> > >
> > > void xfs_inode_mark_reclaimable(struct xfs_inode *ip);
> > >
> > > -int xfs_blockgc_free_dquots(struct xfs_mount *mp, struct xfs_dquot *udqp,
> > > - struct xfs_dquot *gdqp, struct xfs_dquot *pdqp,
> > > - unsigned int iwalk_flags);
> > > -int xfs_blockgc_free_quota(struct xfs_inode *ip, unsigned int iwalk_flags);
> > > int xfs_blockgc_free_space(struct xfs_mount *mp, struct xfs_icwalk *icm);
> > > +void xfs_blockgc_nospace_flush(struct xfs_mount *mp, struct xfs_dquot *udqp,
> > > + struct xfs_dquot *gdqp, struct xfs_dquot *pdqp,
> > > + unsigned int iwalk_flags, int what);
> > > int xfs_blockgc_flush_all(struct xfs_mount *mp);
> > >
> > > void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
> > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > index c0f1c89786c2..e99ffa17d3d0 100644
> > > --- a/fs/xfs/xfs_inode.c
> > > +++ b/fs/xfs/xfs_inode.c
> > > @@ -27,6 +27,8 @@
> > > #include "xfs_errortag.h"
> > > #include "xfs_error.h"
> > > #include "xfs_quota.h"
> > > +#include "xfs_dquot_item.h"
> > > +#include "xfs_dquot.h"
> > > #include "xfs_filestream.h"
> > > #include "xfs_trace.h"
> > > #include "xfs_icache.h"
> > > @@ -1007,12 +1009,6 @@ xfs_create(
> > > */
> > > error = xfs_trans_alloc_icreate(mp, tres, udqp, gdqp, pdqp, resblks,
> > > &tp);
> > > - if (error == -ENOSPC) {
> > > - /* flush outstanding delalloc blocks and retry */
> > > - xfs_flush_inodes(mp);
> > > - error = xfs_trans_alloc_icreate(mp, tres, udqp, gdqp, pdqp,
> > > - resblks, &tp);
> > > - }
> > > if (error)
> > > goto out_release_dquots;
> > >
> > > @@ -2951,14 +2947,21 @@ xfs_rename(
> > > if (spaceres != 0) {
> > > error = xfs_trans_reserve_quota_nblks(tp, target_dp, spaceres,
> > > 0, false);
> > > - if (error == -EDQUOT || error == -ENOSPC) {
> > > + if (error == -EDQUOT) {
> > > if (!retried) {
> > > xfs_trans_cancel(tp);
> > > - xfs_blockgc_free_quota(target_dp, 0);
> > > + xfs_blockgc_nospace_flush(target_dp->i_mount,
> > > + target_dp->i_udquot,
> > > + target_dp->i_gdquot,
> > > + target_dp->i_pdquot,
> > > + 0, error);
> > > retried = true;
> > > goto retry;
> > > }
> > >
> > > + if (xfs_dquot_is_enospc(target_dp->i_pdquot))
> > > + error = -ENOSPC;
> > > +
> > > nospace_error = error;
> > > spaceres = 0;
> > > error = 0;
> > > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > > index e5b62dc28466..cb036e1173ae 100644
> > > --- a/fs/xfs/xfs_reflink.c
> > > +++ b/fs/xfs/xfs_reflink.c
> > > @@ -25,6 +25,8 @@
> > > #include "xfs_bit.h"
> > > #include "xfs_alloc.h"
> > > #include "xfs_quota.h"
> > > +#include "xfs_dquot_item.h"
> > > +#include "xfs_dquot.h"
> > > #include "xfs_reflink.h"
> > > #include "xfs_iomap.h"
> > > #include "xfs_ag.h"
I wonder, what about the xfs_trans_reserve_quota_nblks in
xfs_reflink_remap_extent? Does it need to filter EDQUOT?
Just looking through the list, I think xfs_ioctl_setattr_get_trans and
xfs_setattr_nonsize also need to check for EDQUOT and project dquots
being over, don't they?
> > > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> > > index 305c9d07bf1b..1574d7aa49c4 100644
> > > --- a/fs/xfs/xfs_trans.c
> > > +++ b/fs/xfs/xfs_trans.c
> > > @@ -1217,15 +1217,21 @@ xfs_trans_alloc_inode(
> > > }
> > >
> > > error = xfs_trans_reserve_quota_nblks(tp, ip, dblocks, rblocks, force);
> > > - if ((error == -EDQUOT || error == -ENOSPC) && !retried) {
> > > + if (error == -EDQUOT && !retried) {
> > > xfs_trans_cancel(tp);
> > > xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > - xfs_blockgc_free_quota(ip, 0);
> > > + xfs_blockgc_nospace_flush(ip->i_mount, ip->i_udquot,
> > > + ip->i_gdquot, ip->i_pdquot,
> > > + 0, error);
> > > retried = true;
> > > goto retry;
> > > }
> > > - if (error)
> > > + if (error) {
> > > + if (error == -EDQUOT && xfs_dquot_is_enospc(ip->i_pdquot))
> > > + error = -ENOSPC;
> > > +
> > > goto out_cancel;
> > > + }
> > >
> > > *tpp = tp;
> > > return 0;
> > > @@ -1260,13 +1266,16 @@ xfs_trans_alloc_icreate(
> > > return error;
> > >
> > > error = xfs_trans_reserve_quota_icreate(tp, udqp, gdqp, pdqp, dblocks);
> > > - if ((error == -EDQUOT || error == -ENOSPC) && !retried) {
> > > + if (error == -EDQUOT && !retried) {
> > > xfs_trans_cancel(tp);
> > > - xfs_blockgc_free_dquots(mp, udqp, gdqp, pdqp, 0);
> > > + xfs_blockgc_nospace_flush(mp, udqp, gdqp, pdqp, 0, error);
> > > retried = true;
> > > goto retry;
> > > }
> > > if (error) {
> > > + if (error == -EDQUOT && xfs_dquot_is_enospc(pdqp))
> > > + error = -ENOSPC;
> > > +
> > > xfs_trans_cancel(tp);
> > > return error;
> > > }
> > > @@ -1340,14 +1349,19 @@ xfs_trans_alloc_ichange(
> > > error = xfs_trans_reserve_quota_bydquots(tp, mp, udqp, gdqp,
> > > pdqp, ip->i_nblocks + ip->i_delayed_blks,
> > > 1, qflags);
> > > - if ((error == -EDQUOT || error == -ENOSPC) && !retried) {
> > > + if (error == -EDQUOT && !retried) {
> > > xfs_trans_cancel(tp);
> > > - xfs_blockgc_free_dquots(mp, udqp, gdqp, pdqp, 0);
> > > + xfs_blockgc_nospace_flush(mp, udqp, gdqp, pdqp, 0,
> > > + error);
> > > retried = true;
> > > goto retry;
> > > }
> > > - if (error)
> > > + if (error) {
> > > + if (error == -EDQUOT && xfs_dquot_is_enospc(pdqp))
> > > + error = -ENOSPC;
> > > +
> > > goto out_cancel;
> > > + }
> > > }
> > >
> > > *tpp = tp;
> > > @@ -1419,14 +1433,19 @@ xfs_trans_alloc_dir(
> > > goto done;
> > >
> > > error = xfs_trans_reserve_quota_nblks(tp, dp, resblks, 0, false);
> > > - if (error == -EDQUOT || error == -ENOSPC) {
> > > + if (error == -EDQUOT) {
> > > if (!retried) {
> > > xfs_trans_cancel(tp);
> > > - xfs_blockgc_free_quota(dp, 0);
> > > + xfs_blockgc_nospace_flush(dp->i_mount, ip->i_udquot,
> > > + ip->i_gdquot, ip->i_pdquot,
> > > + 0, error);
> > > retried = true;
> > > goto retry;
> > > }
> > >
> > > + if (xfs_dquot_is_enospc(dp->i_pdquot))
> > > + error = -ENOSPC;
> > > +
> > > *nospace_error = error;
> > > resblks = 0;
> > > error = 0;
> > > diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c
> > > index aa00cf67ad72..7201b86ef2c2 100644
> > > --- a/fs/xfs/xfs_trans_dquot.c
> > > +++ b/fs/xfs/xfs_trans_dquot.c
> > > @@ -700,8 +700,6 @@ xfs_trans_dqresv(
> > >
> > > error_return:
> > > xfs_dqunlock(dqp);
> > > - if (xfs_dquot_type(dqp) == XFS_DQTYPE_PROJ)
> > > - return -ENOSPC;
> > > return -EDQUOT;
> > > error_corrupt:
> > > xfs_dqunlock(dqp);
> > > @@ -717,7 +715,6 @@ xfs_trans_dqresv(
> > > * approach.
> > > *
> > > * flags = XFS_QMOPT_FORCE_RES evades limit enforcement. Used by chown.
> > > - * XFS_QMOPT_ENOSPC returns ENOSPC not EDQUOT. Used by pquota.
> > > * XFS_TRANS_DQ_RES_BLKS reserves regular disk blocks
> > > * XFS_TRANS_DQ_RES_RTBLKS reserves realtime disk blocks
> > > * dquots are unlocked on return, if they were not locked by caller.
> > > --
> > > 2.34.1
> > >
> > >
>
>
>
> --
> Best,
>
> Jian
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v3] xfs: improve handling of prjquot ENOSPC
2024-01-04 1:46 ` Darrick J. Wong
@ 2024-01-04 3:36 ` Jian Wen
0 siblings, 0 replies; 22+ messages in thread
From: Jian Wen @ 2024-01-04 3:36 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs, hch, dchinner, Dave Chinner, Jian Wen
On Thu, Jan 4, 2024 at 9:46 AM Darrick J. Wong <djwong@kernel.org> wrote:
> > > > diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
> > > > index 80c8f851a2f3..c5f4a170eef1 100644
> > > > --- a/fs/xfs/xfs_dquot.h
> > > > +++ b/fs/xfs/xfs_dquot.h
> > > > @@ -183,6 +183,19 @@ xfs_dquot_is_enforced(
> > > > return false;
> > > > }
> > > >
> > > > +static inline bool
> > > > +xfs_dquot_is_enospc(
> > >
> > > I don't like encoding error codes in a function name, especially since
> > > EDQUOT is used for more dquot types than ENOSPC.
> > >
> > > "xfs_dquot_hardlimit_exceeded" ?
> > >
> > > > + struct xfs_dquot *dqp)
> > > > +{
> > > > + if (!dqp)
> > > > + return false;
> > > > + if (!xfs_dquot_is_enforced(dqp))
> > > > + return false;
> > > > + if (dqp->q_blk.hardlimit - dqp->q_blk.reserved > 0)
> > > > + return false;
> > >
> > > return q_blk.reserved > dqp->q_blk.hardlimit; ?
> > >
> > > hardlimit == reserved shouldn't be considered an edquot condition.
> > >
> > > Also, locking is needed here.
>
> Any response to this?
I will address it in v4.
> > >
> > > Also, a question for Dave: What happens if xfs_trans_dqresv detects a
> > > fatal overage in the project dquot, but the overage condition clears by
> > > the time this caller rechecks the dquot? Is it ok that we then return
> > > EDQUOT whereas the current code would return ENOSPC?
The v3 patch will return EDQUOT if the condition clears. e.g.
STATIC ssize_t
xfs_file_buffered_write(
...
if (ret == -EDQUOT && xfs_dquot_is_enospc(ip->i_pdquot))
ret = -ENOSPC;
ret will not be translated from EDQUOT to ENOSPC.
>
> I think this question is still relevant, though. Or perhaps we should
> define our own code for project quota exceeded, and translate that to
> ENOSPC in the callers?
>
> I wonder, what about the xfs_trans_reserve_quota_nblks in
> xfs_reflink_remap_extent? Does it need to filter EDQUOT?
Yes, it does. I will address it in v4.
>
> Just looking through the list, I think xfs_ioctl_setattr_get_trans and
> xfs_setattr_nonsize also need to check for EDQUOT and project dquots
> being over, don't they?
Yes, xfs_trans_alloc_ichange has got them covered.
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v4] xfs: improve handling of prjquot ENOSPC
2023-12-16 15:35 ` [PATCH v2] " Jian Wen
2023-12-18 22:00 ` Dave Chinner
@ 2024-01-04 6:22 ` Jian Wen
2024-01-08 0:35 ` Dave Chinner
2024-01-10 14:08 ` kernel test robot
1 sibling, 2 replies; 22+ messages in thread
From: Jian Wen @ 2024-01-04 6:22 UTC (permalink / raw)
To: linux-xfs; +Cc: Jian Wen, djwong, hch, dchinner, Dave Chinner, Jian Wen
From: Jian Wen <wenjianhn@gmail.com>
Currently, xfs_trans_dqresv() return -ENOSPC when the project quota
limit is reached. As a result, xfs_file_buffered_write() will flush
the whole filesystem instead of the project quota.
Fix the issue by make xfs_trans_dqresv() return -EDQUOT rather than
-ENOSPC. Add a helper, xfs_blockgc_nospace_flush(), to make flushing
for both EDQUOT and ENOSPC consistent.
Changes since v3:
- rename xfs_dquot_is_enospc to xfs_dquot_hardlimit_exceeded
- acquire the dquot lock before checking the free space
Changes since v2:
- completely rewrote based on the suggestions from Dave
Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
---
fs/xfs/xfs_dquot.h | 22 +++++++++++++++---
fs/xfs/xfs_file.c | 41 ++++++++++++--------------------
fs/xfs/xfs_icache.c | 50 +++++++++++++++++++++++++++++-----------
fs/xfs/xfs_icache.h | 7 +++---
fs/xfs/xfs_inode.c | 19 ++++++++-------
fs/xfs/xfs_reflink.c | 5 ++++
fs/xfs/xfs_trans.c | 41 ++++++++++++++++++++++++--------
fs/xfs/xfs_trans_dquot.c | 3 ---
8 files changed, 121 insertions(+), 67 deletions(-)
diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
index 80c8f851a2f3..d28dce0ed61a 100644
--- a/fs/xfs/xfs_dquot.h
+++ b/fs/xfs/xfs_dquot.h
@@ -183,6 +183,22 @@ xfs_dquot_is_enforced(
return false;
}
+static inline bool
+xfs_dquot_hardlimit_exceeded(
+ struct xfs_dquot *dqp)
+{
+ int64_t freesp;
+
+ if (!dqp)
+ return false;
+ if (!xfs_dquot_is_enforced(dqp))
+ return false;
+ xfs_dqlock(dqp);
+ freesp = dqp->q_blk.hardlimit - dqp->q_blk.reserved;
+ xfs_dqunlock(dqp);
+ return freesp < 0;
+}
+
/*
* Check whether a dquot is under low free space conditions. We assume the quota
* is enabled and enforced.
@@ -191,11 +207,11 @@ static inline bool xfs_dquot_lowsp(struct xfs_dquot *dqp)
{
int64_t freesp;
+ xfs_dqlock(dqp);
freesp = dqp->q_blk.hardlimit - dqp->q_blk.reserved;
- if (freesp < dqp->q_low_space[XFS_QLOWSP_1_PCNT])
- return true;
+ xfs_dqunlock(dqp);
- return false;
+ return freesp < dqp->q_low_space[XFS_QLOWSP_1_PCNT];
}
void xfs_dquot_to_disk(struct xfs_disk_dquot *ddqp, struct xfs_dquot *dqp);
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index e33e5e13b95f..c19d82d922c5 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -24,6 +24,9 @@
#include "xfs_pnfs.h"
#include "xfs_iomap.h"
#include "xfs_reflink.h"
+#include "xfs_quota.h"
+#include "xfs_dquot_item.h"
+#include "xfs_dquot.h"
#include <linux/dax.h>
#include <linux/falloc.h>
@@ -785,32 +788,18 @@ xfs_file_buffered_write(
trace_xfs_file_buffered_write(iocb, from);
ret = iomap_file_buffered_write(iocb, from,
&xfs_buffered_write_iomap_ops);
-
- /*
- * If we hit a space limit, try to free up some lingering preallocated
- * space before returning an error. In the case of ENOSPC, first try to
- * write back all dirty inodes to free up some of the excess reserved
- * metadata space. This reduces the chances that the eofblocks scan
- * waits on dirty mappings. Since xfs_flush_inodes() is serialized, this
- * also behaves as a filter to prevent too many eofblocks scans from
- * running at the same time. Use a synchronous scan to increase the
- * effectiveness of the scan.
- */
- if (ret == -EDQUOT && !cleared_space) {
- xfs_iunlock(ip, iolock);
- xfs_blockgc_free_quota(ip, XFS_ICWALK_FLAG_SYNC);
- cleared_space = true;
- goto write_retry;
- } else if (ret == -ENOSPC && !cleared_space) {
- struct xfs_icwalk icw = {0};
-
- cleared_space = true;
- xfs_flush_inodes(ip->i_mount);
-
- xfs_iunlock(ip, iolock);
- icw.icw_flags = XFS_ICWALK_FLAG_SYNC;
- xfs_blockgc_free_space(ip->i_mount, &icw);
- goto write_retry;
+ if (ret == -EDQUOT || ret == -ENOSPC) {
+ if (!cleared_space) {
+ xfs_iunlock(ip, iolock);
+ xfs_blockgc_nospace_flush(ip->i_mount, ip->i_udquot,
+ ip->i_gdquot, ip->i_pdquot,
+ XFS_ICWALK_FLAG_SYNC, ret);
+ cleared_space = true;
+ goto write_retry;
+ }
+ if (ret == -EDQUOT && xfs_dquot_hardlimit_exceeded(
+ ip->i_pdquot))
+ ret = -ENOSPC;
}
out:
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index dba514a2c84d..d2dcb653befc 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -64,6 +64,10 @@ static int xfs_icwalk_ag(struct xfs_perag *pag,
XFS_ICWALK_FLAG_RECLAIM_SICK | \
XFS_ICWALK_FLAG_UNION)
+static int xfs_blockgc_free_dquots(struct xfs_mount *mp,
+ struct xfs_dquot *udqp, struct xfs_dquot *gdqp,
+ struct xfs_dquot *pdqp, unsigned int iwalk_flags);
+
/*
* Allocate and initialise an xfs_inode.
*/
@@ -1477,6 +1481,38 @@ xfs_blockgc_free_space(
return xfs_inodegc_flush(mp);
}
+/*
+ * If we hit a space limit, try to free up some lingering preallocated
+ * space before returning an error. In the case of ENOSPC, first try to
+ * write back all dirty inodes to free up some of the excess reserved
+ * metadata space. This reduces the chances that the eofblocks scan
+ * waits on dirty mappings. Since xfs_flush_inodes() is serialized, this
+ * also behaves as a filter to prevent too many eofblocks scans from
+ * running at the same time. Use a synchronous scan to increase the
+ * effectiveness of the scan.
+ */
+void
+xfs_blockgc_nospace_flush(
+ struct xfs_mount *mp,
+ struct xfs_dquot *udqp,
+ struct xfs_dquot *gdqp,
+ struct xfs_dquot *pdqp,
+ unsigned int iwalk_flags,
+ int what)
+{
+ ASSERT(what == -EDQUOT || what == -ENOSPC);
+
+ if (what == -EDQUOT) {
+ xfs_blockgc_free_dquots(mp, udqp, gdqp, pdqp, iwalk_flags);
+ } else if (what == -ENOSPC) {
+ struct xfs_icwalk icw = {0};
+
+ xfs_flush_inodes(mp);
+ icw.icw_flags = iwalk_flags;
+ xfs_blockgc_free_space(mp, &icw);
+ }
+}
+
/*
* Reclaim all the free space that we can by scheduling the background blockgc
* and inodegc workers immediately and waiting for them all to clear.
@@ -1515,7 +1551,7 @@ xfs_blockgc_flush_all(
* (XFS_ICWALK_FLAG_SYNC), the caller also must not hold any inode's IOLOCK or
* MMAPLOCK.
*/
-int
+static int
xfs_blockgc_free_dquots(
struct xfs_mount *mp,
struct xfs_dquot *udqp,
@@ -1559,18 +1595,6 @@ xfs_blockgc_free_dquots(
return xfs_blockgc_free_space(mp, &icw);
}
-/* Run cow/eofblocks scans on the quotas attached to the inode. */
-int
-xfs_blockgc_free_quota(
- struct xfs_inode *ip,
- unsigned int iwalk_flags)
-{
- return xfs_blockgc_free_dquots(ip->i_mount,
- xfs_inode_dquot(ip, XFS_DQTYPE_USER),
- xfs_inode_dquot(ip, XFS_DQTYPE_GROUP),
- xfs_inode_dquot(ip, XFS_DQTYPE_PROJ), iwalk_flags);
-}
-
/* XFS Inode Cache Walking Code */
/*
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index 905944dafbe5..c0833450969d 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -57,11 +57,10 @@ long xfs_reclaim_inodes_nr(struct xfs_mount *mp, unsigned long nr_to_scan);
void xfs_inode_mark_reclaimable(struct xfs_inode *ip);
-int xfs_blockgc_free_dquots(struct xfs_mount *mp, struct xfs_dquot *udqp,
- struct xfs_dquot *gdqp, struct xfs_dquot *pdqp,
- unsigned int iwalk_flags);
-int xfs_blockgc_free_quota(struct xfs_inode *ip, unsigned int iwalk_flags);
int xfs_blockgc_free_space(struct xfs_mount *mp, struct xfs_icwalk *icm);
+void xfs_blockgc_nospace_flush(struct xfs_mount *mp, struct xfs_dquot *udqp,
+ struct xfs_dquot *gdqp, struct xfs_dquot *pdqp,
+ unsigned int iwalk_flags, int what);
int xfs_blockgc_flush_all(struct xfs_mount *mp);
void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index c0f1c89786c2..0dcb614da7df 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -27,6 +27,8 @@
#include "xfs_errortag.h"
#include "xfs_error.h"
#include "xfs_quota.h"
+#include "xfs_dquot_item.h"
+#include "xfs_dquot.h"
#include "xfs_filestream.h"
#include "xfs_trace.h"
#include "xfs_icache.h"
@@ -1007,12 +1009,6 @@ xfs_create(
*/
error = xfs_trans_alloc_icreate(mp, tres, udqp, gdqp, pdqp, resblks,
&tp);
- if (error == -ENOSPC) {
- /* flush outstanding delalloc blocks and retry */
- xfs_flush_inodes(mp);
- error = xfs_trans_alloc_icreate(mp, tres, udqp, gdqp, pdqp,
- resblks, &tp);
- }
if (error)
goto out_release_dquots;
@@ -2951,14 +2947,21 @@ xfs_rename(
if (spaceres != 0) {
error = xfs_trans_reserve_quota_nblks(tp, target_dp, spaceres,
0, false);
- if (error == -EDQUOT || error == -ENOSPC) {
+ if (error == -EDQUOT) {
if (!retried) {
xfs_trans_cancel(tp);
- xfs_blockgc_free_quota(target_dp, 0);
+ xfs_blockgc_nospace_flush(target_dp->i_mount,
+ target_dp->i_udquot,
+ target_dp->i_gdquot,
+ target_dp->i_pdquot,
+ 0, error);
retried = true;
goto retry;
}
+ if (xfs_dquot_hardlimit_exceeded(target_dp->i_pdquot))
+ error = -ENOSPC;
+
nospace_error = error;
spaceres = 0;
error = 0;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index e5b62dc28466..a94691348784 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -25,6 +25,8 @@
#include "xfs_bit.h"
#include "xfs_alloc.h"
#include "xfs_quota.h"
+#include "xfs_dquot_item.h"
+#include "xfs_dquot.h"
#include "xfs_reflink.h"
#include "xfs_iomap.h"
#include "xfs_ag.h"
@@ -1270,6 +1272,9 @@ xfs_reflink_remap_extent(
if (!quota_reserved && !smap_real && dmap_written) {
error = xfs_trans_reserve_quota_nblks(tp, ip,
dmap->br_blockcount, 0, false);
+ if (error == -EDQUOT && xfs_dquot_hardlimit_exceeded(
+ ip->i_pdquot))
+ error = -ENOSPC;
if (error)
goto out_cancel;
}
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 305c9d07bf1b..3b930f3472c5 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -1217,15 +1217,22 @@ xfs_trans_alloc_inode(
}
error = xfs_trans_reserve_quota_nblks(tp, ip, dblocks, rblocks, force);
- if ((error == -EDQUOT || error == -ENOSPC) && !retried) {
+ if (error == -EDQUOT && !retried) {
xfs_trans_cancel(tp);
xfs_iunlock(ip, XFS_ILOCK_EXCL);
- xfs_blockgc_free_quota(ip, 0);
+ xfs_blockgc_nospace_flush(ip->i_mount, ip->i_udquot,
+ ip->i_gdquot, ip->i_pdquot,
+ 0, error);
retried = true;
goto retry;
}
- if (error)
+ if (error) {
+ if (error == -EDQUOT && xfs_dquot_hardlimit_exceeded(
+ ip->i_pdquot))
+ error = -ENOSPC;
+
goto out_cancel;
+ }
*tpp = tp;
return 0;
@@ -1260,13 +1267,16 @@ xfs_trans_alloc_icreate(
return error;
error = xfs_trans_reserve_quota_icreate(tp, udqp, gdqp, pdqp, dblocks);
- if ((error == -EDQUOT || error == -ENOSPC) && !retried) {
+ if (error == -EDQUOT && !retried) {
xfs_trans_cancel(tp);
- xfs_blockgc_free_dquots(mp, udqp, gdqp, pdqp, 0);
+ xfs_blockgc_nospace_flush(mp, udqp, gdqp, pdqp, 0, error);
retried = true;
goto retry;
}
if (error) {
+ if (error == -EDQUOT && xfs_dquot_hardlimit_exceeded(pdqp))
+ error = -ENOSPC;
+
xfs_trans_cancel(tp);
return error;
}
@@ -1340,14 +1350,20 @@ xfs_trans_alloc_ichange(
error = xfs_trans_reserve_quota_bydquots(tp, mp, udqp, gdqp,
pdqp, ip->i_nblocks + ip->i_delayed_blks,
1, qflags);
- if ((error == -EDQUOT || error == -ENOSPC) && !retried) {
+ if (error == -EDQUOT && !retried) {
xfs_trans_cancel(tp);
- xfs_blockgc_free_dquots(mp, udqp, gdqp, pdqp, 0);
+ xfs_blockgc_nospace_flush(mp, udqp, gdqp, pdqp, 0,
+ error);
retried = true;
goto retry;
}
- if (error)
+ if (error) {
+ if (error == -EDQUOT && xfs_dquot_hardlimit_exceeded(
+ pdqp))
+ error = -ENOSPC;
+
goto out_cancel;
+ }
}
*tpp = tp;
@@ -1419,14 +1435,19 @@ xfs_trans_alloc_dir(
goto done;
error = xfs_trans_reserve_quota_nblks(tp, dp, resblks, 0, false);
- if (error == -EDQUOT || error == -ENOSPC) {
+ if (error == -EDQUOT) {
if (!retried) {
xfs_trans_cancel(tp);
- xfs_blockgc_free_quota(dp, 0);
+ xfs_blockgc_nospace_flush(dp->i_mount, ip->i_udquot,
+ ip->i_gdquot, ip->i_pdquot,
+ 0, error);
retried = true;
goto retry;
}
+ if (xfs_dquot_hardlimit_exceeded(dp->i_pdquot))
+ error = -ENOSPC;
+
*nospace_error = error;
resblks = 0;
error = 0;
diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c
index aa00cf67ad72..7201b86ef2c2 100644
--- a/fs/xfs/xfs_trans_dquot.c
+++ b/fs/xfs/xfs_trans_dquot.c
@@ -700,8 +700,6 @@ xfs_trans_dqresv(
error_return:
xfs_dqunlock(dqp);
- if (xfs_dquot_type(dqp) == XFS_DQTYPE_PROJ)
- return -ENOSPC;
return -EDQUOT;
error_corrupt:
xfs_dqunlock(dqp);
@@ -717,7 +715,6 @@ xfs_trans_dqresv(
* approach.
*
* flags = XFS_QMOPT_FORCE_RES evades limit enforcement. Used by chown.
- * XFS_QMOPT_ENOSPC returns ENOSPC not EDQUOT. Used by pquota.
* XFS_TRANS_DQ_RES_BLKS reserves regular disk blocks
* XFS_TRANS_DQ_RES_RTBLKS reserves realtime disk blocks
* dquots are unlocked on return, if they were not locked by caller.
--
2.25.1
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v4] xfs: improve handling of prjquot ENOSPC
2024-01-04 6:22 ` [PATCH v4] " Jian Wen
@ 2024-01-08 0:35 ` Dave Chinner
2024-01-09 6:14 ` Darrick J. Wong
2024-01-10 14:08 ` kernel test robot
1 sibling, 1 reply; 22+ messages in thread
From: Dave Chinner @ 2024-01-08 0:35 UTC (permalink / raw)
To: Jian Wen; +Cc: linux-xfs, djwong, hch, dchinner, Jian Wen
On Thu, Jan 04, 2024 at 02:22:48PM +0800, Jian Wen wrote:
> From: Jian Wen <wenjianhn@gmail.com>
>
> Currently, xfs_trans_dqresv() return -ENOSPC when the project quota
> limit is reached. As a result, xfs_file_buffered_write() will flush
> the whole filesystem instead of the project quota.
>
> Fix the issue by make xfs_trans_dqresv() return -EDQUOT rather than
> -ENOSPC. Add a helper, xfs_blockgc_nospace_flush(), to make flushing
> for both EDQUOT and ENOSPC consistent.
>
> Changes since v3:
> - rename xfs_dquot_is_enospc to xfs_dquot_hardlimit_exceeded
> - acquire the dquot lock before checking the free space
>
> Changes since v2:
> - completely rewrote based on the suggestions from Dave
>
> Suggested-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
Please send new patch versions as a new thread, not as a reply to
a random email in the middle of the review thread for a previous
version.
> ---
> fs/xfs/xfs_dquot.h | 22 +++++++++++++++---
> fs/xfs/xfs_file.c | 41 ++++++++++++--------------------
> fs/xfs/xfs_icache.c | 50 +++++++++++++++++++++++++++++-----------
> fs/xfs/xfs_icache.h | 7 +++---
> fs/xfs/xfs_inode.c | 19 ++++++++-------
> fs/xfs/xfs_reflink.c | 5 ++++
> fs/xfs/xfs_trans.c | 41 ++++++++++++++++++++++++--------
> fs/xfs/xfs_trans_dquot.c | 3 ---
> 8 files changed, 121 insertions(+), 67 deletions(-)
>
> diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
> index 80c8f851a2f3..d28dce0ed61a 100644
> --- a/fs/xfs/xfs_dquot.h
> +++ b/fs/xfs/xfs_dquot.h
> @@ -183,6 +183,22 @@ xfs_dquot_is_enforced(
> return false;
> }
>
> +static inline bool
> +xfs_dquot_hardlimit_exceeded(
> + struct xfs_dquot *dqp)
> +{
> + int64_t freesp;
> +
> + if (!dqp)
> + return false;
> + if (!xfs_dquot_is_enforced(dqp))
> + return false;
> + xfs_dqlock(dqp);
> + freesp = dqp->q_blk.hardlimit - dqp->q_blk.reserved;
> + xfs_dqunlock(dqp);
> + return freesp < 0;
> +}
Ok, what about if the project quota EDQUOT has come about because we
are over the inode count limit or the realtime block limit? Both of
those need to be converted to ENOSPC, too.
i.e. all the inode creation operation need to be checked against
both the data device block space and the inode count space, whilst
data writes need to be checked against data space for normal IO
and both data space and real time space for inodes that are writing
to real time devices.
Also, why do we care about locking here? If something is modifying
dqp->q_blk.reserved concurrently, holding the lock here does nothing
to protect this code from races. All it means is that we we'll block
waiting for the transaction that holds the dquot locked to complete
and we'll either get the same random failure or success as if we
didn't hold the lock during this calculation...
> +
> /*
> * Check whether a dquot is under low free space conditions. We assume the quota
> * is enabled and enforced.
> @@ -191,11 +207,11 @@ static inline bool xfs_dquot_lowsp(struct xfs_dquot *dqp)
> {
> int64_t freesp;
>
> + xfs_dqlock(dqp);
> freesp = dqp->q_blk.hardlimit - dqp->q_blk.reserved;
> - if (freesp < dqp->q_low_space[XFS_QLOWSP_1_PCNT])
> - return true;
> + xfs_dqunlock(dqp);
>
> - return false;
> + return freesp < dqp->q_low_space[XFS_QLOWSP_1_PCNT];
> }
That doesn't need locking for the same reason - it doesn't
serialise/synchronise the accounting against anything
useful, either..
>
> void xfs_dquot_to_disk(struct xfs_disk_dquot *ddqp, struct xfs_dquot *dqp);
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index e33e5e13b95f..c19d82d922c5 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -24,6 +24,9 @@
> #include "xfs_pnfs.h"
> #include "xfs_iomap.h"
> #include "xfs_reflink.h"
> +#include "xfs_quota.h"
> +#include "xfs_dquot_item.h"
> +#include "xfs_dquot.h"
>
> #include <linux/dax.h>
> #include <linux/falloc.h>
> @@ -785,32 +788,18 @@ xfs_file_buffered_write(
> trace_xfs_file_buffered_write(iocb, from);
> ret = iomap_file_buffered_write(iocb, from,
> &xfs_buffered_write_iomap_ops);
> -
> - /*
> - * If we hit a space limit, try to free up some lingering preallocated
> - * space before returning an error. In the case of ENOSPC, first try to
> - * write back all dirty inodes to free up some of the excess reserved
> - * metadata space. This reduces the chances that the eofblocks scan
> - * waits on dirty mappings. Since xfs_flush_inodes() is serialized, this
> - * also behaves as a filter to prevent too many eofblocks scans from
> - * running at the same time. Use a synchronous scan to increase the
> - * effectiveness of the scan.
> - */
> - if (ret == -EDQUOT && !cleared_space) {
> - xfs_iunlock(ip, iolock);
> - xfs_blockgc_free_quota(ip, XFS_ICWALK_FLAG_SYNC);
> - cleared_space = true;
> - goto write_retry;
> - } else if (ret == -ENOSPC && !cleared_space) {
> - struct xfs_icwalk icw = {0};
> -
> - cleared_space = true;
> - xfs_flush_inodes(ip->i_mount);
> -
> - xfs_iunlock(ip, iolock);
> - icw.icw_flags = XFS_ICWALK_FLAG_SYNC;
> - xfs_blockgc_free_space(ip->i_mount, &icw);
> - goto write_retry;
> + if (ret == -EDQUOT || ret == -ENOSPC) {
> + if (!cleared_space) {
> + xfs_iunlock(ip, iolock);
> + xfs_blockgc_nospace_flush(ip->i_mount, ip->i_udquot,
> + ip->i_gdquot, ip->i_pdquot,
> + XFS_ICWALK_FLAG_SYNC, ret);
> + cleared_space = true;
> + goto write_retry;
> + }
> + if (ret == -EDQUOT && xfs_dquot_hardlimit_exceeded(
> + ip->i_pdquot))
> + ret = -ENOSPC;
> }
This isn't really I suggested. The code needs to be restructured to
remove the "goto write_retry" case that makes understanding how this
code works difficult.
It's also unnecessary to enumerate all the possible dquots for
xfs_blockgc_nospace_flush() because we already have a blockgc
function that does this for us - xfs_blockgc_free_quota(). This
patch is made more complex than it needs to be by you attempt to
optimise away xfs_blockgc_free_quota()....
>
> out:
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index dba514a2c84d..d2dcb653befc 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -64,6 +64,10 @@ static int xfs_icwalk_ag(struct xfs_perag *pag,
> XFS_ICWALK_FLAG_RECLAIM_SICK | \
> XFS_ICWALK_FLAG_UNION)
>
> +static int xfs_blockgc_free_dquots(struct xfs_mount *mp,
> + struct xfs_dquot *udqp, struct xfs_dquot *gdqp,
> + struct xfs_dquot *pdqp, unsigned int iwalk_flags);
> +
> /*
> * Allocate and initialise an xfs_inode.
> */
If you need function prototypes, then it is likely that either the
new code is in the wrong place or the factoring can be improved.
> @@ -1477,6 +1481,38 @@ xfs_blockgc_free_space(
> return xfs_inodegc_flush(mp);
> }
>
> +/*
> + * If we hit a space limit, try to free up some lingering preallocated
> + * space before returning an error. In the case of ENOSPC, first try to
> + * write back all dirty inodes to free up some of the excess reserved
> + * metadata space. This reduces the chances that the eofblocks scan
> + * waits on dirty mappings. Since xfs_flush_inodes() is serialized, this
> + * also behaves as a filter to prevent too many eofblocks scans from
> + * running at the same time. Use a synchronous scan to increase the
> + * effectiveness of the scan.
> + */
> +void
> +xfs_blockgc_nospace_flush(
> + struct xfs_mount *mp,
> + struct xfs_dquot *udqp,
> + struct xfs_dquot *gdqp,
> + struct xfs_dquot *pdqp,
> + unsigned int iwalk_flags,
> + int what)
> +{
> + ASSERT(what == -EDQUOT || what == -ENOSPC);
> +
> + if (what == -EDQUOT) {
> + xfs_blockgc_free_dquots(mp, udqp, gdqp, pdqp, iwalk_flags);
> + } else if (what == -ENOSPC) {
> + struct xfs_icwalk icw = {0};
> +
> + xfs_flush_inodes(mp);
> + icw.icw_flags = iwalk_flags;
> + xfs_blockgc_free_space(mp, &icw);
> + }
> +}
This is messy, and it's also why you need that forward prototype.
The EDQUOT case should just call xfs_blockgc_free_quota() and
return, then there is no need for the "else if (ENOSPC)" case.
> +
> /*
> * Reclaim all the free space that we can by scheduling the background blockgc
> * and inodegc workers immediately and waiting for them all to clear.
> @@ -1515,7 +1551,7 @@ xfs_blockgc_flush_all(
> * (XFS_ICWALK_FLAG_SYNC), the caller also must not hold any inode's IOLOCK or
> * MMAPLOCK.
> */
> -int
> +static int
> xfs_blockgc_free_dquots(
> struct xfs_mount *mp,
> struct xfs_dquot *udqp,
> @@ -1559,18 +1595,6 @@ xfs_blockgc_free_dquots(
> return xfs_blockgc_free_space(mp, &icw);
> }
>
> -/* Run cow/eofblocks scans on the quotas attached to the inode. */
> -int
> -xfs_blockgc_free_quota(
> - struct xfs_inode *ip,
> - unsigned int iwalk_flags)
> -{
> - return xfs_blockgc_free_dquots(ip->i_mount,
> - xfs_inode_dquot(ip, XFS_DQTYPE_USER),
> - xfs_inode_dquot(ip, XFS_DQTYPE_GROUP),
> - xfs_inode_dquot(ip, XFS_DQTYPE_PROJ), iwalk_flags);
> -}
> -
> /* XFS Inode Cache Walking Code */
>
> /*
> diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
> index 905944dafbe5..c0833450969d 100644
> --- a/fs/xfs/xfs_icache.h
> +++ b/fs/xfs/xfs_icache.h
> @@ -57,11 +57,10 @@ long xfs_reclaim_inodes_nr(struct xfs_mount *mp, unsigned long nr_to_scan);
>
> void xfs_inode_mark_reclaimable(struct xfs_inode *ip);
>
> -int xfs_blockgc_free_dquots(struct xfs_mount *mp, struct xfs_dquot *udqp,
> - struct xfs_dquot *gdqp, struct xfs_dquot *pdqp,
> - unsigned int iwalk_flags);
> -int xfs_blockgc_free_quota(struct xfs_inode *ip, unsigned int iwalk_flags);
> int xfs_blockgc_free_space(struct xfs_mount *mp, struct xfs_icwalk *icm);
> +void xfs_blockgc_nospace_flush(struct xfs_mount *mp, struct xfs_dquot *udqp,
> + struct xfs_dquot *gdqp, struct xfs_dquot *pdqp,
> + unsigned int iwalk_flags, int what);
> int xfs_blockgc_flush_all(struct xfs_mount *mp);
>
> void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index c0f1c89786c2..0dcb614da7df 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -27,6 +27,8 @@
> #include "xfs_errortag.h"
> #include "xfs_error.h"
> #include "xfs_quota.h"
> +#include "xfs_dquot_item.h"
> +#include "xfs_dquot.h"
> #include "xfs_filestream.h"
> #include "xfs_trace.h"
> #include "xfs_icache.h"
> @@ -1007,12 +1009,6 @@ xfs_create(
> */
> error = xfs_trans_alloc_icreate(mp, tres, udqp, gdqp, pdqp, resblks,
> &tp);
> - if (error == -ENOSPC) {
> - /* flush outstanding delalloc blocks and retry */
> - xfs_flush_inodes(mp);
> - error = xfs_trans_alloc_icreate(mp, tres, udqp, gdqp, pdqp,
> - resblks, &tp);
> - }
> if (error)
> goto out_release_dquots;
>
> @@ -2951,14 +2947,21 @@ xfs_rename(
> if (spaceres != 0) {
> error = xfs_trans_reserve_quota_nblks(tp, target_dp, spaceres,
> 0, false);
> - if (error == -EDQUOT || error == -ENOSPC) {
> + if (error == -EDQUOT) {
> if (!retried) {
> xfs_trans_cancel(tp);
> - xfs_blockgc_free_quota(target_dp, 0);
> + xfs_blockgc_nospace_flush(target_dp->i_mount,
> + target_dp->i_udquot,
> + target_dp->i_gdquot,
> + target_dp->i_pdquot,
> + 0, error);
Why do we need to change this? The current call to
xfs_blockgc_free_quota() is correct and there's no need to call
xfs_blockgc_nospace_flush() because ENOSPC can no longer occur here.
> retried = true;
> goto retry;
> }
>
> + if (xfs_dquot_hardlimit_exceeded(target_dp->i_pdquot))
> + error = -ENOSPC;
> +
> nospace_error = error;
> spaceres = 0;
> error = 0;
> diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> index e5b62dc28466..a94691348784 100644
> --- a/fs/xfs/xfs_reflink.c
> +++ b/fs/xfs/xfs_reflink.c
> @@ -25,6 +25,8 @@
> #include "xfs_bit.h"
> #include "xfs_alloc.h"
> #include "xfs_quota.h"
> +#include "xfs_dquot_item.h"
> +#include "xfs_dquot.h"
> #include "xfs_reflink.h"
> #include "xfs_iomap.h"
> #include "xfs_ag.h"
> @@ -1270,6 +1272,9 @@ xfs_reflink_remap_extent(
> if (!quota_reserved && !smap_real && dmap_written) {
> error = xfs_trans_reserve_quota_nblks(tp, ip,
> dmap->br_blockcount, 0, false);
> + if (error == -EDQUOT && xfs_dquot_hardlimit_exceeded(
> + ip->i_pdquot))
> + error = -ENOSPC;
Break the line logically, not at function parameters.
if (error == -EDQUOT &&
xfs_dquot_hardlimit_exceeded(ip->i_pdquot))
error = -ENOSPC;
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index 305c9d07bf1b..3b930f3472c5 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -1217,15 +1217,22 @@ xfs_trans_alloc_inode(
> }
>
> error = xfs_trans_reserve_quota_nblks(tp, ip, dblocks, rblocks, force);
> - if ((error == -EDQUOT || error == -ENOSPC) && !retried) {
> + if (error == -EDQUOT && !retried) {
> xfs_trans_cancel(tp);
> xfs_iunlock(ip, XFS_ILOCK_EXCL);
> - xfs_blockgc_free_quota(ip, 0);
> + xfs_blockgc_nospace_flush(ip->i_mount, ip->i_udquot,
> + ip->i_gdquot, ip->i_pdquot,
> + 0, error);
Again, xfs_blockgc_free_quota() is the right thing to call here
because -ENOSPC cannot be returned anymore.
> retried = true;
> goto retry;
> }
> - if (error)
> + if (error) {
> + if (error == -EDQUOT && xfs_dquot_hardlimit_exceeded(
> + ip->i_pdquot))
> + error = -ENOSPC;
> +
> goto out_cancel;
> + }
The error handling can be done much cleaner:
if (error == -EDQUOT) {
if (retries++ > 0) {
if (xfs_dquot_hardlimit_exceeded(pdqp))
error = -ENOSPC;
goto out_cancel;
}
xfs_trans_cancel(tp);
xfs_blockgc_free_quota(ip, 0);
goto retry;
}
if (error)
goto out_cancel;
>
> *tpp = tp;
> return 0;
> @@ -1260,13 +1267,16 @@ xfs_trans_alloc_icreate(
> return error;
>
> error = xfs_trans_reserve_quota_icreate(tp, udqp, gdqp, pdqp, dblocks);
> - if ((error == -EDQUOT || error == -ENOSPC) && !retried) {
> + if (error == -EDQUOT && !retried) {
> xfs_trans_cancel(tp);
> - xfs_blockgc_free_dquots(mp, udqp, gdqp, pdqp, 0);
> + xfs_blockgc_nospace_flush(mp, udqp, gdqp, pdqp, 0, error);
> retried = true;
> goto retry;
> }
> if (error) {
> + if (error == -EDQUOT && xfs_dquot_hardlimit_exceeded(pdqp))
> + error = -ENOSPC;
This needs to check fo inode count beeing exceed for ENOSPC.
> +
> xfs_trans_cancel(tp);
> return error;
> }
Same comments - xfs_blockgc_free_dquots() is just fine here,
error handling is neater as per above.
> @@ -1340,14 +1350,20 @@ xfs_trans_alloc_ichange(
> error = xfs_trans_reserve_quota_bydquots(tp, mp, udqp, gdqp,
> pdqp, ip->i_nblocks + ip->i_delayed_blks,
> 1, qflags);
> - if ((error == -EDQUOT || error == -ENOSPC) && !retried) {
> + if (error == -EDQUOT && !retried) {
> xfs_trans_cancel(tp);
> - xfs_blockgc_free_dquots(mp, udqp, gdqp, pdqp, 0);
> + xfs_blockgc_nospace_flush(mp, udqp, gdqp, pdqp, 0,
> + error);
> retried = true;
> goto retry;
> }
> - if (error)
> + if (error) {
> + if (error == -EDQUOT && xfs_dquot_hardlimit_exceeded(
> + pdqp))
> + error = -ENOSPC;
> +
> goto out_cancel;
> + }
Same again.
> }
>
> *tpp = tp;
> @@ -1419,14 +1435,19 @@ xfs_trans_alloc_dir(
> goto done;
>
> error = xfs_trans_reserve_quota_nblks(tp, dp, resblks, 0, false);
> - if (error == -EDQUOT || error == -ENOSPC) {
> + if (error == -EDQUOT) {
> if (!retried) {
> xfs_trans_cancel(tp);
> - xfs_blockgc_free_quota(dp, 0);
> + xfs_blockgc_nospace_flush(dp->i_mount, ip->i_udquot,
> + ip->i_gdquot, ip->i_pdquot,
> + 0, error);
> retried = true;
> goto retry;
> }
>
> + if (xfs_dquot_hardlimit_exceeded(dp->i_pdquot))
> + error = -ENOSPC;
> +
And again.
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] xfs: improve handling of prjquot ENOSPC
2024-01-08 0:35 ` Dave Chinner
@ 2024-01-09 6:14 ` Darrick J. Wong
2024-01-09 6:38 ` Dave Chinner
0 siblings, 1 reply; 22+ messages in thread
From: Darrick J. Wong @ 2024-01-09 6:14 UTC (permalink / raw)
To: Dave Chinner; +Cc: Jian Wen, linux-xfs, hch, dchinner, Jian Wen
On Mon, Jan 08, 2024 at 11:35:17AM +1100, Dave Chinner wrote:
> On Thu, Jan 04, 2024 at 02:22:48PM +0800, Jian Wen wrote:
> > From: Jian Wen <wenjianhn@gmail.com>
> >
> > Currently, xfs_trans_dqresv() return -ENOSPC when the project quota
> > limit is reached. As a result, xfs_file_buffered_write() will flush
> > the whole filesystem instead of the project quota.
> >
> > Fix the issue by make xfs_trans_dqresv() return -EDQUOT rather than
> > -ENOSPC. Add a helper, xfs_blockgc_nospace_flush(), to make flushing
> > for both EDQUOT and ENOSPC consistent.
> >
> > Changes since v3:
> > - rename xfs_dquot_is_enospc to xfs_dquot_hardlimit_exceeded
> > - acquire the dquot lock before checking the free space
> >
> > Changes since v2:
> > - completely rewrote based on the suggestions from Dave
> >
> > Suggested-by: Dave Chinner <david@fromorbit.com>
> > Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
>
> Please send new patch versions as a new thread, not as a reply to
> a random email in the middle of the review thread for a previous
> version.
>
> > ---
> > fs/xfs/xfs_dquot.h | 22 +++++++++++++++---
> > fs/xfs/xfs_file.c | 41 ++++++++++++--------------------
> > fs/xfs/xfs_icache.c | 50 +++++++++++++++++++++++++++++-----------
> > fs/xfs/xfs_icache.h | 7 +++---
> > fs/xfs/xfs_inode.c | 19 ++++++++-------
> > fs/xfs/xfs_reflink.c | 5 ++++
> > fs/xfs/xfs_trans.c | 41 ++++++++++++++++++++++++--------
> > fs/xfs/xfs_trans_dquot.c | 3 ---
> > 8 files changed, 121 insertions(+), 67 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
> > index 80c8f851a2f3..d28dce0ed61a 100644
> > --- a/fs/xfs/xfs_dquot.h
> > +++ b/fs/xfs/xfs_dquot.h
> > @@ -183,6 +183,22 @@ xfs_dquot_is_enforced(
> > return false;
> > }
> >
> > +static inline bool
> > +xfs_dquot_hardlimit_exceeded(
> > + struct xfs_dquot *dqp)
> > +{
> > + int64_t freesp;
> > +
> > + if (!dqp)
> > + return false;
> > + if (!xfs_dquot_is_enforced(dqp))
> > + return false;
> > + xfs_dqlock(dqp);
> > + freesp = dqp->q_blk.hardlimit - dqp->q_blk.reserved;
> > + xfs_dqunlock(dqp);
> > + return freesp < 0;
> > +}
>
> Ok, what about if the project quota EDQUOT has come about because we
> are over the inode count limit or the realtime block limit? Both of
> those need to be converted to ENOSPC, too.
>
> i.e. all the inode creation operation need to be checked against
> both the data device block space and the inode count space, whilst
> data writes need to be checked against data space for normal IO
> and both data space and real time space for inodes that are writing
> to real time devices.
(Yeah.)
> Also, why do we care about locking here? If something is modifying
> dqp->q_blk.reserved concurrently, holding the lock here does nothing
> to protect this code from races. All it means is that we we'll block
> waiting for the transaction that holds the dquot locked to complete
> and we'll either get the same random failure or success as if we
> didn't hold the lock during this calculation...
I thought we had to hold the dquot lock before accessing its fields.
Or are you really saying that it's silly to take the dquot lock *again*
having already decided (under dqlock elsewhere) that we were over a
quota? In that case, perhaps it makes more sense to have
xfs_trans_dqresv return an unusual errno for "project quota over limits"
so that callers can trap that magic value and translate it into ENOSPC?
(Yes that's clunky but I've suggested that several times now and nobody
has replied to it directly.)
--D
>
> > +
> > /*
> > * Check whether a dquot is under low free space conditions. We assume the quota
> > * is enabled and enforced.
> > @@ -191,11 +207,11 @@ static inline bool xfs_dquot_lowsp(struct xfs_dquot *dqp)
> > {
> > int64_t freesp;
> >
> > + xfs_dqlock(dqp);
> > freesp = dqp->q_blk.hardlimit - dqp->q_blk.reserved;
> > - if (freesp < dqp->q_low_space[XFS_QLOWSP_1_PCNT])
> > - return true;
> > + xfs_dqunlock(dqp);
> >
> > - return false;
> > + return freesp < dqp->q_low_space[XFS_QLOWSP_1_PCNT];
> > }
>
> That doesn't need locking for the same reason - it doesn't
> serialise/synchronise the accounting against anything
> useful, either..
>
> >
> > void xfs_dquot_to_disk(struct xfs_disk_dquot *ddqp, struct xfs_dquot *dqp);
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index e33e5e13b95f..c19d82d922c5 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -24,6 +24,9 @@
> > #include "xfs_pnfs.h"
> > #include "xfs_iomap.h"
> > #include "xfs_reflink.h"
> > +#include "xfs_quota.h"
> > +#include "xfs_dquot_item.h"
> > +#include "xfs_dquot.h"
> >
> > #include <linux/dax.h>
> > #include <linux/falloc.h>
> > @@ -785,32 +788,18 @@ xfs_file_buffered_write(
> > trace_xfs_file_buffered_write(iocb, from);
> > ret = iomap_file_buffered_write(iocb, from,
> > &xfs_buffered_write_iomap_ops);
> > -
> > - /*
> > - * If we hit a space limit, try to free up some lingering preallocated
> > - * space before returning an error. In the case of ENOSPC, first try to
> > - * write back all dirty inodes to free up some of the excess reserved
> > - * metadata space. This reduces the chances that the eofblocks scan
> > - * waits on dirty mappings. Since xfs_flush_inodes() is serialized, this
> > - * also behaves as a filter to prevent too many eofblocks scans from
> > - * running at the same time. Use a synchronous scan to increase the
> > - * effectiveness of the scan.
> > - */
> > - if (ret == -EDQUOT && !cleared_space) {
> > - xfs_iunlock(ip, iolock);
> > - xfs_blockgc_free_quota(ip, XFS_ICWALK_FLAG_SYNC);
> > - cleared_space = true;
> > - goto write_retry;
> > - } else if (ret == -ENOSPC && !cleared_space) {
> > - struct xfs_icwalk icw = {0};
> > -
> > - cleared_space = true;
> > - xfs_flush_inodes(ip->i_mount);
> > -
> > - xfs_iunlock(ip, iolock);
> > - icw.icw_flags = XFS_ICWALK_FLAG_SYNC;
> > - xfs_blockgc_free_space(ip->i_mount, &icw);
> > - goto write_retry;
> > + if (ret == -EDQUOT || ret == -ENOSPC) {
> > + if (!cleared_space) {
> > + xfs_iunlock(ip, iolock);
> > + xfs_blockgc_nospace_flush(ip->i_mount, ip->i_udquot,
> > + ip->i_gdquot, ip->i_pdquot,
> > + XFS_ICWALK_FLAG_SYNC, ret);
> > + cleared_space = true;
> > + goto write_retry;
> > + }
> > + if (ret == -EDQUOT && xfs_dquot_hardlimit_exceeded(
> > + ip->i_pdquot))
> > + ret = -ENOSPC;
> > }
>
> This isn't really I suggested. The code needs to be restructured to
> remove the "goto write_retry" case that makes understanding how this
> code works difficult.
>
> It's also unnecessary to enumerate all the possible dquots for
> xfs_blockgc_nospace_flush() because we already have a blockgc
> function that does this for us - xfs_blockgc_free_quota(). This
> patch is made more complex than it needs to be by you attempt to
> optimise away xfs_blockgc_free_quota()....
>
> >
> > out:
> > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > index dba514a2c84d..d2dcb653befc 100644
> > --- a/fs/xfs/xfs_icache.c
> > +++ b/fs/xfs/xfs_icache.c
> > @@ -64,6 +64,10 @@ static int xfs_icwalk_ag(struct xfs_perag *pag,
> > XFS_ICWALK_FLAG_RECLAIM_SICK | \
> > XFS_ICWALK_FLAG_UNION)
> >
> > +static int xfs_blockgc_free_dquots(struct xfs_mount *mp,
> > + struct xfs_dquot *udqp, struct xfs_dquot *gdqp,
> > + struct xfs_dquot *pdqp, unsigned int iwalk_flags);
> > +
> > /*
> > * Allocate and initialise an xfs_inode.
> > */
>
> If you need function prototypes, then it is likely that either the
> new code is in the wrong place or the factoring can be improved.
>
> > @@ -1477,6 +1481,38 @@ xfs_blockgc_free_space(
> > return xfs_inodegc_flush(mp);
> > }
> >
> > +/*
> > + * If we hit a space limit, try to free up some lingering preallocated
> > + * space before returning an error. In the case of ENOSPC, first try to
> > + * write back all dirty inodes to free up some of the excess reserved
> > + * metadata space. This reduces the chances that the eofblocks scan
> > + * waits on dirty mappings. Since xfs_flush_inodes() is serialized, this
> > + * also behaves as a filter to prevent too many eofblocks scans from
> > + * running at the same time. Use a synchronous scan to increase the
> > + * effectiveness of the scan.
> > + */
> > +void
> > +xfs_blockgc_nospace_flush(
> > + struct xfs_mount *mp,
> > + struct xfs_dquot *udqp,
> > + struct xfs_dquot *gdqp,
> > + struct xfs_dquot *pdqp,
> > + unsigned int iwalk_flags,
> > + int what)
> > +{
> > + ASSERT(what == -EDQUOT || what == -ENOSPC);
> > +
> > + if (what == -EDQUOT) {
> > + xfs_blockgc_free_dquots(mp, udqp, gdqp, pdqp, iwalk_flags);
> > + } else if (what == -ENOSPC) {
> > + struct xfs_icwalk icw = {0};
> > +
> > + xfs_flush_inodes(mp);
> > + icw.icw_flags = iwalk_flags;
> > + xfs_blockgc_free_space(mp, &icw);
> > + }
> > +}
>
> This is messy, and it's also why you need that forward prototype.
> The EDQUOT case should just call xfs_blockgc_free_quota() and
> return, then there is no need for the "else if (ENOSPC)" case.
>
> > +
> > /*
> > * Reclaim all the free space that we can by scheduling the background blockgc
> > * and inodegc workers immediately and waiting for them all to clear.
> > @@ -1515,7 +1551,7 @@ xfs_blockgc_flush_all(
> > * (XFS_ICWALK_FLAG_SYNC), the caller also must not hold any inode's IOLOCK or
> > * MMAPLOCK.
> > */
> > -int
> > +static int
> > xfs_blockgc_free_dquots(
> > struct xfs_mount *mp,
> > struct xfs_dquot *udqp,
> > @@ -1559,18 +1595,6 @@ xfs_blockgc_free_dquots(
> > return xfs_blockgc_free_space(mp, &icw);
> > }
> >
> > -/* Run cow/eofblocks scans on the quotas attached to the inode. */
> > -int
> > -xfs_blockgc_free_quota(
> > - struct xfs_inode *ip,
> > - unsigned int iwalk_flags)
> > -{
> > - return xfs_blockgc_free_dquots(ip->i_mount,
> > - xfs_inode_dquot(ip, XFS_DQTYPE_USER),
> > - xfs_inode_dquot(ip, XFS_DQTYPE_GROUP),
> > - xfs_inode_dquot(ip, XFS_DQTYPE_PROJ), iwalk_flags);
> > -}
> > -
> > /* XFS Inode Cache Walking Code */
> >
> > /*
> > diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
> > index 905944dafbe5..c0833450969d 100644
> > --- a/fs/xfs/xfs_icache.h
> > +++ b/fs/xfs/xfs_icache.h
> > @@ -57,11 +57,10 @@ long xfs_reclaim_inodes_nr(struct xfs_mount *mp, unsigned long nr_to_scan);
> >
> > void xfs_inode_mark_reclaimable(struct xfs_inode *ip);
> >
> > -int xfs_blockgc_free_dquots(struct xfs_mount *mp, struct xfs_dquot *udqp,
> > - struct xfs_dquot *gdqp, struct xfs_dquot *pdqp,
> > - unsigned int iwalk_flags);
> > -int xfs_blockgc_free_quota(struct xfs_inode *ip, unsigned int iwalk_flags);
> > int xfs_blockgc_free_space(struct xfs_mount *mp, struct xfs_icwalk *icm);
> > +void xfs_blockgc_nospace_flush(struct xfs_mount *mp, struct xfs_dquot *udqp,
> > + struct xfs_dquot *gdqp, struct xfs_dquot *pdqp,
> > + unsigned int iwalk_flags, int what);
> > int xfs_blockgc_flush_all(struct xfs_mount *mp);
> >
> > void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index c0f1c89786c2..0dcb614da7df 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -27,6 +27,8 @@
> > #include "xfs_errortag.h"
> > #include "xfs_error.h"
> > #include "xfs_quota.h"
> > +#include "xfs_dquot_item.h"
> > +#include "xfs_dquot.h"
> > #include "xfs_filestream.h"
> > #include "xfs_trace.h"
> > #include "xfs_icache.h"
> > @@ -1007,12 +1009,6 @@ xfs_create(
> > */
> > error = xfs_trans_alloc_icreate(mp, tres, udqp, gdqp, pdqp, resblks,
> > &tp);
> > - if (error == -ENOSPC) {
> > - /* flush outstanding delalloc blocks and retry */
> > - xfs_flush_inodes(mp);
> > - error = xfs_trans_alloc_icreate(mp, tres, udqp, gdqp, pdqp,
> > - resblks, &tp);
> > - }
> > if (error)
> > goto out_release_dquots;
> >
> > @@ -2951,14 +2947,21 @@ xfs_rename(
> > if (spaceres != 0) {
> > error = xfs_trans_reserve_quota_nblks(tp, target_dp, spaceres,
> > 0, false);
> > - if (error == -EDQUOT || error == -ENOSPC) {
> > + if (error == -EDQUOT) {
> > if (!retried) {
> > xfs_trans_cancel(tp);
> > - xfs_blockgc_free_quota(target_dp, 0);
> > + xfs_blockgc_nospace_flush(target_dp->i_mount,
> > + target_dp->i_udquot,
> > + target_dp->i_gdquot,
> > + target_dp->i_pdquot,
> > + 0, error);
>
> Why do we need to change this? The current call to
> xfs_blockgc_free_quota() is correct and there's no need to call
> xfs_blockgc_nospace_flush() because ENOSPC can no longer occur here.
>
> > retried = true;
> > goto retry;
> > }
> >
> > + if (xfs_dquot_hardlimit_exceeded(target_dp->i_pdquot))
> > + error = -ENOSPC;
> > +
> > nospace_error = error;
> > spaceres = 0;
> > error = 0;
> > diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
> > index e5b62dc28466..a94691348784 100644
> > --- a/fs/xfs/xfs_reflink.c
> > +++ b/fs/xfs/xfs_reflink.c
> > @@ -25,6 +25,8 @@
> > #include "xfs_bit.h"
> > #include "xfs_alloc.h"
> > #include "xfs_quota.h"
> > +#include "xfs_dquot_item.h"
> > +#include "xfs_dquot.h"
> > #include "xfs_reflink.h"
> > #include "xfs_iomap.h"
> > #include "xfs_ag.h"
> > @@ -1270,6 +1272,9 @@ xfs_reflink_remap_extent(
> > if (!quota_reserved && !smap_real && dmap_written) {
> > error = xfs_trans_reserve_quota_nblks(tp, ip,
> > dmap->br_blockcount, 0, false);
> > + if (error == -EDQUOT && xfs_dquot_hardlimit_exceeded(
> > + ip->i_pdquot))
> > + error = -ENOSPC;
>
> Break the line logically, not at function parameters.
>
> if (error == -EDQUOT &&
> xfs_dquot_hardlimit_exceeded(ip->i_pdquot))
> error = -ENOSPC;
>
> > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> > index 305c9d07bf1b..3b930f3472c5 100644
> > --- a/fs/xfs/xfs_trans.c
> > +++ b/fs/xfs/xfs_trans.c
> > @@ -1217,15 +1217,22 @@ xfs_trans_alloc_inode(
> > }
> >
> > error = xfs_trans_reserve_quota_nblks(tp, ip, dblocks, rblocks, force);
> > - if ((error == -EDQUOT || error == -ENOSPC) && !retried) {
> > + if (error == -EDQUOT && !retried) {
> > xfs_trans_cancel(tp);
> > xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > - xfs_blockgc_free_quota(ip, 0);
> > + xfs_blockgc_nospace_flush(ip->i_mount, ip->i_udquot,
> > + ip->i_gdquot, ip->i_pdquot,
> > + 0, error);
>
> Again, xfs_blockgc_free_quota() is the right thing to call here
> because -ENOSPC cannot be returned anymore.
>
> > retried = true;
> > goto retry;
> > }
> > - if (error)
> > + if (error) {
> > + if (error == -EDQUOT && xfs_dquot_hardlimit_exceeded(
> > + ip->i_pdquot))
> > + error = -ENOSPC;
> > +
> > goto out_cancel;
> > + }
>
> The error handling can be done much cleaner:
>
> if (error == -EDQUOT) {
> if (retries++ > 0) {
> if (xfs_dquot_hardlimit_exceeded(pdqp))
> error = -ENOSPC;
> goto out_cancel;
> }
> xfs_trans_cancel(tp);
> xfs_blockgc_free_quota(ip, 0);
> goto retry;
> }
>
> if (error)
> goto out_cancel;
>
> >
> > *tpp = tp;
> > return 0;
> > @@ -1260,13 +1267,16 @@ xfs_trans_alloc_icreate(
> > return error;
> >
> > error = xfs_trans_reserve_quota_icreate(tp, udqp, gdqp, pdqp, dblocks);
> > - if ((error == -EDQUOT || error == -ENOSPC) && !retried) {
> > + if (error == -EDQUOT && !retried) {
> > xfs_trans_cancel(tp);
> > - xfs_blockgc_free_dquots(mp, udqp, gdqp, pdqp, 0);
> > + xfs_blockgc_nospace_flush(mp, udqp, gdqp, pdqp, 0, error);
> > retried = true;
> > goto retry;
> > }
> > if (error) {
> > + if (error == -EDQUOT && xfs_dquot_hardlimit_exceeded(pdqp))
> > + error = -ENOSPC;
>
> This needs to check fo inode count beeing exceed for ENOSPC.
>
> > +
> > xfs_trans_cancel(tp);
> > return error;
> > }
>
> Same comments - xfs_blockgc_free_dquots() is just fine here,
> error handling is neater as per above.
>
> > @@ -1340,14 +1350,20 @@ xfs_trans_alloc_ichange(
> > error = xfs_trans_reserve_quota_bydquots(tp, mp, udqp, gdqp,
> > pdqp, ip->i_nblocks + ip->i_delayed_blks,
> > 1, qflags);
> > - if ((error == -EDQUOT || error == -ENOSPC) && !retried) {
> > + if (error == -EDQUOT && !retried) {
> > xfs_trans_cancel(tp);
> > - xfs_blockgc_free_dquots(mp, udqp, gdqp, pdqp, 0);
> > + xfs_blockgc_nospace_flush(mp, udqp, gdqp, pdqp, 0,
> > + error);
> > retried = true;
> > goto retry;
> > }
> > - if (error)
> > + if (error) {
> > + if (error == -EDQUOT && xfs_dquot_hardlimit_exceeded(
> > + pdqp))
> > + error = -ENOSPC;
> > +
> > goto out_cancel;
> > + }
>
> Same again.
>
> > }
> >
> > *tpp = tp;
> > @@ -1419,14 +1435,19 @@ xfs_trans_alloc_dir(
> > goto done;
> >
> > error = xfs_trans_reserve_quota_nblks(tp, dp, resblks, 0, false);
> > - if (error == -EDQUOT || error == -ENOSPC) {
> > + if (error == -EDQUOT) {
> > if (!retried) {
> > xfs_trans_cancel(tp);
> > - xfs_blockgc_free_quota(dp, 0);
> > + xfs_blockgc_nospace_flush(dp->i_mount, ip->i_udquot,
> > + ip->i_gdquot, ip->i_pdquot,
> > + 0, error);
> > retried = true;
> > goto retry;
> > }
> >
> > + if (xfs_dquot_hardlimit_exceeded(dp->i_pdquot))
> > + error = -ENOSPC;
> > +
>
> And again.
>
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] xfs: improve handling of prjquot ENOSPC
2024-01-09 6:14 ` Darrick J. Wong
@ 2024-01-09 6:38 ` Dave Chinner
2024-01-11 1:42 ` Darrick J. Wong
0 siblings, 1 reply; 22+ messages in thread
From: Dave Chinner @ 2024-01-09 6:38 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Jian Wen, linux-xfs, hch, dchinner, Jian Wen
On Mon, Jan 08, 2024 at 10:14:42PM -0800, Darrick J. Wong wrote:
> On Mon, Jan 08, 2024 at 11:35:17AM +1100, Dave Chinner wrote:
> > On Thu, Jan 04, 2024 at 02:22:48PM +0800, Jian Wen wrote:
> > > From: Jian Wen <wenjianhn@gmail.com>
> > >
> > > Currently, xfs_trans_dqresv() return -ENOSPC when the project quota
> > > limit is reached. As a result, xfs_file_buffered_write() will flush
> > > the whole filesystem instead of the project quota.
> > >
> > > Fix the issue by make xfs_trans_dqresv() return -EDQUOT rather than
> > > -ENOSPC. Add a helper, xfs_blockgc_nospace_flush(), to make flushing
> > > for both EDQUOT and ENOSPC consistent.
> > >
> > > Changes since v3:
> > > - rename xfs_dquot_is_enospc to xfs_dquot_hardlimit_exceeded
> > > - acquire the dquot lock before checking the free space
> > >
> > > Changes since v2:
> > > - completely rewrote based on the suggestions from Dave
> > >
> > > Suggested-by: Dave Chinner <david@fromorbit.com>
> > > Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
> >
> > Please send new patch versions as a new thread, not as a reply to
> > a random email in the middle of the review thread for a previous
> > version.
> >
> > > ---
> > > fs/xfs/xfs_dquot.h | 22 +++++++++++++++---
> > > fs/xfs/xfs_file.c | 41 ++++++++++++--------------------
> > > fs/xfs/xfs_icache.c | 50 +++++++++++++++++++++++++++++-----------
> > > fs/xfs/xfs_icache.h | 7 +++---
> > > fs/xfs/xfs_inode.c | 19 ++++++++-------
> > > fs/xfs/xfs_reflink.c | 5 ++++
> > > fs/xfs/xfs_trans.c | 41 ++++++++++++++++++++++++--------
> > > fs/xfs/xfs_trans_dquot.c | 3 ---
> > > 8 files changed, 121 insertions(+), 67 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
> > > index 80c8f851a2f3..d28dce0ed61a 100644
> > > --- a/fs/xfs/xfs_dquot.h
> > > +++ b/fs/xfs/xfs_dquot.h
> > > @@ -183,6 +183,22 @@ xfs_dquot_is_enforced(
> > > return false;
> > > }
> > >
> > > +static inline bool
> > > +xfs_dquot_hardlimit_exceeded(
> > > + struct xfs_dquot *dqp)
> > > +{
> > > + int64_t freesp;
> > > +
> > > + if (!dqp)
> > > + return false;
> > > + if (!xfs_dquot_is_enforced(dqp))
> > > + return false;
> > > + xfs_dqlock(dqp);
> > > + freesp = dqp->q_blk.hardlimit - dqp->q_blk.reserved;
> > > + xfs_dqunlock(dqp);
> > > + return freesp < 0;
> > > +}
> >
> > Ok, what about if the project quota EDQUOT has come about because we
> > are over the inode count limit or the realtime block limit? Both of
> > those need to be converted to ENOSPC, too.
> >
> > i.e. all the inode creation operation need to be checked against
> > both the data device block space and the inode count space, whilst
> > data writes need to be checked against data space for normal IO
> > and both data space and real time space for inodes that are writing
> > to real time devices.
>
> (Yeah.)
>
> > Also, why do we care about locking here? If something is modifying
> > dqp->q_blk.reserved concurrently, holding the lock here does nothing
> > to protect this code from races. All it means is that we we'll block
> > waiting for the transaction that holds the dquot locked to complete
> > and we'll either get the same random failure or success as if we
> > didn't hold the lock during this calculation...
>
> I thought we had to hold the dquot lock before accessing its fields.
Only if we care about avoiding races with ongoing modifications or
we want to serialise against new references (e.g. because we are
about to reclaim the dquot).
The inode holds a reference to the dquot at this point (because of
xfs_qm_dqattach()), so we really don't need to hold a lock just
to sample the contents of the attached dquot.
> Or are you really saying that it's silly to take the dquot lock *again*
> having already decided (under dqlock elsewhere) that we were over a
> quota?
No, I'm saying that we really don't have to hold the dqlock to
determine if the dquot is over quota limits. It's either going to
over or under, and holding the dqlock while sampling it really
doesn't change the fact that it the dquot accounting can change
between the initial check under the dqlock and a subsequent check
on the second failure under a different hold of the dqlock.
It's an inherently racy check, and holding the dqlock does nothing
to make it less racy or more accurate.
> In that case, perhaps it makes more sense to have
> xfs_trans_dqresv return an unusual errno for "project quota over limits"
> so that callers can trap that magic value and translate it into ENOSPC?
Sure, that's another option, but it means we have to trap EDQUOT,
ENOSPC and the new special EDQUOT-but-really-means-ENOSPC return
errors. I'm not sure it will improve the code a great deal, but if
there's a clean way to implement such error handling it may make
more sense. Have you prototyped how such error handling would look
in these cases?
Which also makes me wonder if we should actually be returning what
quota limit failed, not EDQUOT. To take the correct flush action, we
really need to know if we failed on data blocks, inode count or rt
extents. e.g. flushing data won't help alleviate an inode count
overrun...
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] xfs: improve handling of prjquot ENOSPC
2024-01-04 6:22 ` [PATCH v4] " Jian Wen
2024-01-08 0:35 ` Dave Chinner
@ 2024-01-10 14:08 ` kernel test robot
1 sibling, 0 replies; 22+ messages in thread
From: kernel test robot @ 2024-01-10 14:08 UTC (permalink / raw)
To: Jian Wen
Cc: oe-lkp, lkp, Dave Chinner, Jian Wen, linux-xfs, Jian Wen, djwong,
hch, dchinner, oliver.sang
Hello,
kernel test robot noticed "xfstests.generic.603.fail" on:
commit: 7616836bd77299614d6ff314f6fd6c65b433f56a ("[PATCH v4] xfs: improve handling of prjquot ENOSPC")
url: https://github.com/intel-lab-lkp/linux/commits/Jian-Wen/xfs-improve-handling-of-prjquot-ENOSPC/20240104-142759
base: https://git.kernel.org/cgit/fs/xfs/xfs-linux.git for-next
patch link: https://lore.kernel.org/all/20240104062248.3245102-1-wenjian1@xiaomi.com/
patch subject: [PATCH v4] xfs: improve handling of prjquot ENOSPC
in testcase: xfstests
version: xfstests-x86_64-f814a0d8-1_20231225
with following parameters:
disk: 4HDD
fs: xfs
test: generic-603
compiler: gcc-12
test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (Skylake) with 32G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202401102103.1a66c72-oliver.sang@intel.com
2024-01-09 17:34:02 export TEST_DIR=/fs/sda1
2024-01-09 17:34:03 export TEST_DEV=/dev/sda1
2024-01-09 17:34:03 export FSTYP=xfs
2024-01-09 17:34:03 export SCRATCH_MNT=/fs/scratch
2024-01-09 17:34:03 mkdir /fs/scratch -p
2024-01-09 17:34:03 export SCRATCH_DEV=/dev/sda4
2024-01-09 17:34:03 export SCRATCH_LOGDEV=/dev/sda2
meta-data=/dev/sda1 isize=512 agcount=4, agsize=13107200 blks
= sectsz=4096 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1 bigtime=0
data = bsize=4096 blocks=52428800, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=25600, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
2024-01-09 17:34:03 export MKFS_OPTIONS=-mreflink=1
2024-01-09 17:34:03 echo generic/603
2024-01-09 17:34:04 ./check generic/603
FSTYP -- xfs (debug)
PLATFORM -- Linux/x86_64 lkp-skl-d03 6.7.0-rc4-00137-g7616836bd772 #1 SMP PREEMPT_DYNAMIC Mon Jan 8 09:32:19 CST 2024
MKFS_OPTIONS -- -f -mreflink=1 /dev/sda4
MOUNT_OPTIONS -- /dev/sda4 /fs/scratch
generic/603 - output mismatch (see /lkp/benchmarks/xfstests/results//generic/603.out.bad)
--- tests/generic/603.out 2023-12-25 18:24:02.000000000 +0000
+++ /lkp/benchmarks/xfstests/results//generic/603.out.bad 2024-01-09 17:37:42.162919905 +0000
@@ -7,11 +7,11 @@
Write 225 blocks...
Rewrite 250 blocks plus 1 byte, over the block softlimit...
Try to write 1 one more block after grace...
-pwrite: Disk quota exceeded
+pwrite: No space left on device
--- Test inode quota ---
Create 2 more files, over the inode softlimit...
...
(Run 'diff -u /lkp/benchmarks/xfstests/tests/generic/603.out /lkp/benchmarks/xfstests/results//generic/603.out.bad' to see the entire diff)
Ran: generic/603
Failures: generic/603
Failed 1 of 1 tests
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240110/202401102103.1a66c72-oliver.sang@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] xfs: improve handling of prjquot ENOSPC
2024-01-09 6:38 ` Dave Chinner
@ 2024-01-11 1:42 ` Darrick J. Wong
2024-01-11 7:24 ` Dave Chinner
0 siblings, 1 reply; 22+ messages in thread
From: Darrick J. Wong @ 2024-01-11 1:42 UTC (permalink / raw)
To: Dave Chinner; +Cc: Jian Wen, linux-xfs, hch, dchinner, Jian Wen
On Tue, Jan 09, 2024 at 05:38:16PM +1100, Dave Chinner wrote:
> On Mon, Jan 08, 2024 at 10:14:42PM -0800, Darrick J. Wong wrote:
> > On Mon, Jan 08, 2024 at 11:35:17AM +1100, Dave Chinner wrote:
> > > On Thu, Jan 04, 2024 at 02:22:48PM +0800, Jian Wen wrote:
> > > > From: Jian Wen <wenjianhn@gmail.com>
> > > >
> > > > Currently, xfs_trans_dqresv() return -ENOSPC when the project quota
> > > > limit is reached. As a result, xfs_file_buffered_write() will flush
> > > > the whole filesystem instead of the project quota.
> > > >
> > > > Fix the issue by make xfs_trans_dqresv() return -EDQUOT rather than
> > > > -ENOSPC. Add a helper, xfs_blockgc_nospace_flush(), to make flushing
> > > > for both EDQUOT and ENOSPC consistent.
> > > >
> > > > Changes since v3:
> > > > - rename xfs_dquot_is_enospc to xfs_dquot_hardlimit_exceeded
> > > > - acquire the dquot lock before checking the free space
> > > >
> > > > Changes since v2:
> > > > - completely rewrote based on the suggestions from Dave
> > > >
> > > > Suggested-by: Dave Chinner <david@fromorbit.com>
> > > > Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
> > >
> > > Please send new patch versions as a new thread, not as a reply to
> > > a random email in the middle of the review thread for a previous
> > > version.
> > >
> > > > ---
> > > > fs/xfs/xfs_dquot.h | 22 +++++++++++++++---
> > > > fs/xfs/xfs_file.c | 41 ++++++++++++--------------------
> > > > fs/xfs/xfs_icache.c | 50 +++++++++++++++++++++++++++++-----------
> > > > fs/xfs/xfs_icache.h | 7 +++---
> > > > fs/xfs/xfs_inode.c | 19 ++++++++-------
> > > > fs/xfs/xfs_reflink.c | 5 ++++
> > > > fs/xfs/xfs_trans.c | 41 ++++++++++++++++++++++++--------
> > > > fs/xfs/xfs_trans_dquot.c | 3 ---
> > > > 8 files changed, 121 insertions(+), 67 deletions(-)
> > > >
> > > > diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
> > > > index 80c8f851a2f3..d28dce0ed61a 100644
> > > > --- a/fs/xfs/xfs_dquot.h
> > > > +++ b/fs/xfs/xfs_dquot.h
> > > > @@ -183,6 +183,22 @@ xfs_dquot_is_enforced(
> > > > return false;
> > > > }
> > > >
> > > > +static inline bool
> > > > +xfs_dquot_hardlimit_exceeded(
> > > > + struct xfs_dquot *dqp)
> > > > +{
> > > > + int64_t freesp;
> > > > +
> > > > + if (!dqp)
> > > > + return false;
> > > > + if (!xfs_dquot_is_enforced(dqp))
> > > > + return false;
> > > > + xfs_dqlock(dqp);
> > > > + freesp = dqp->q_blk.hardlimit - dqp->q_blk.reserved;
> > > > + xfs_dqunlock(dqp);
> > > > + return freesp < 0;
> > > > +}
> > >
> > > Ok, what about if the project quota EDQUOT has come about because we
> > > are over the inode count limit or the realtime block limit? Both of
> > > those need to be converted to ENOSPC, too.
> > >
> > > i.e. all the inode creation operation need to be checked against
> > > both the data device block space and the inode count space, whilst
> > > data writes need to be checked against data space for normal IO
> > > and both data space and real time space for inodes that are writing
> > > to real time devices.
> >
> > (Yeah.)
> >
> > > Also, why do we care about locking here? If something is modifying
> > > dqp->q_blk.reserved concurrently, holding the lock here does nothing
> > > to protect this code from races. All it means is that we we'll block
> > > waiting for the transaction that holds the dquot locked to complete
> > > and we'll either get the same random failure or success as if we
> > > didn't hold the lock during this calculation...
> >
> > I thought we had to hold the dquot lock before accessing its fields.
>
> Only if we care about avoiding races with ongoing modifications or
> we want to serialise against new references (e.g. because we are
> about to reclaim the dquot).
>
> The inode holds a reference to the dquot at this point (because of
> xfs_qm_dqattach()), so we really don't need to hold a lock just
> to sample the contents of the attached dquot.
>
> > Or are you really saying that it's silly to take the dquot lock *again*
> > having already decided (under dqlock elsewhere) that we were over a
> > quota?
>
> No, I'm saying that we really don't have to hold the dqlock to
> determine if the dquot is over quota limits. It's either going to
> over or under, and holding the dqlock while sampling it really
> doesn't change the fact that it the dquot accounting can change
> between the initial check under the dqlock and a subsequent check
> on the second failure under a different hold of the dqlock.
>
> It's an inherently racy check, and holding the dqlock does nothing
> to make it less racy or more accurate.
>
> > In that case, perhaps it makes more sense to have
> > xfs_trans_dqresv return an unusual errno for "project quota over limits"
> > so that callers can trap that magic value and translate it into ENOSPC?
>
> Sure, that's another option, but it means we have to trap EDQUOT,
> ENOSPC and the new special EDQUOT-but-really-means-ENOSPC return
> errors. I'm not sure it will improve the code a great deal, but if
> there's a clean way to implement such error handling it may make
> more sense. Have you prototyped how such error handling would look
> in these cases?
>
> Which also makes me wonder if we should actually be returning what
> quota limit failed, not EDQUOT. To take the correct flush action, we
> really need to know if we failed on data blocks, inode count or rt
> extents. e.g. flushing data won't help alleviate an inode count
> overrun...
Yeah. If it's an rtbcount, then it only makes sense to flush realtime
files; if it's a bcount, we flush nonrt files, and if it's an inode
count then I guess we push inodegc?
--D
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] xfs: improve handling of prjquot ENOSPC
2024-01-11 1:42 ` Darrick J. Wong
@ 2024-01-11 7:24 ` Dave Chinner
0 siblings, 0 replies; 22+ messages in thread
From: Dave Chinner @ 2024-01-11 7:24 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Jian Wen, linux-xfs, hch, dchinner, Jian Wen
On Wed, Jan 10, 2024 at 05:42:04PM -0800, Darrick J. Wong wrote:
> On Tue, Jan 09, 2024 at 05:38:16PM +1100, Dave Chinner wrote:
> > On Mon, Jan 08, 2024 at 10:14:42PM -0800, Darrick J. Wong wrote:
> > > In that case, perhaps it makes more sense to have
> > > xfs_trans_dqresv return an unusual errno for "project quota over limits"
> > > so that callers can trap that magic value and translate it into ENOSPC?
> >
> > Sure, that's another option, but it means we have to trap EDQUOT,
> > ENOSPC and the new special EDQUOT-but-really-means-ENOSPC return
> > errors. I'm not sure it will improve the code a great deal, but if
> > there's a clean way to implement such error handling it may make
> > more sense. Have you prototyped how such error handling would look
> > in these cases?
> >
> > Which also makes me wonder if we should actually be returning what
> > quota limit failed, not EDQUOT. To take the correct flush action, we
> > really need to know if we failed on data blocks, inode count or rt
> > extents. e.g. flushing data won't help alleviate an inode count
> > overrun...
>
> Yeah. If it's an rtbcount, then it only makes sense to flush realtime
> files; if it's a bcount, we flush nonrt files, and if it's an inode
> count then I guess we push inodegc?
Yes, that sounds like a reasonable set of actions to take for the
different failure modes.
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2024-01-11 7:24 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-14 15:07 [PATCH] xfs: improve handling of prjquot ENOSPC Jian Wen
2023-12-14 15:29 ` Christoph Hellwig
2023-12-14 17:06 ` Darrick J. Wong
2023-12-14 21:13 ` Dave Chinner
2023-12-16 15:49 ` Jian Wen
2023-12-16 15:35 ` [PATCH v2] " Jian Wen
2023-12-18 22:00 ` Dave Chinner
2023-12-19 5:47 ` Christoph Hellwig
2023-12-19 13:50 ` Jian Wen
2023-12-23 11:00 ` Jian Wen
2024-01-04 6:22 ` [PATCH v4] " Jian Wen
2024-01-08 0:35 ` Dave Chinner
2024-01-09 6:14 ` Darrick J. Wong
2024-01-09 6:38 ` Dave Chinner
2024-01-11 1:42 ` Darrick J. Wong
2024-01-11 7:24 ` Dave Chinner
2024-01-10 14:08 ` kernel test robot
2023-12-23 10:56 ` [PATCH v3] " Jian Wen
2024-01-03 1:42 ` Darrick J. Wong
2024-01-03 3:45 ` Jian Wen
2024-01-04 1:46 ` Darrick J. Wong
2024-01-04 3:36 ` Jian Wen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).