From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Jian Wen <wenjianhn@gmail.com>,
linux-xfs@vger.kernel.org, hch@lst.de, dchinner@redhat.com,
Jian Wen <wenjian1@xiaomi.com>
Subject: Re: [PATCH v4] xfs: improve handling of prjquot ENOSPC
Date: Wed, 10 Jan 2024 17:42:04 -0800 [thread overview]
Message-ID: <20240111014204.GM722975@frogsfrogsfrogs> (raw)
In-Reply-To: <ZZzp2ARmwf3FrkUV@dread.disaster.area>
On Tue, Jan 09, 2024 at 05:38:16PM +1100, Dave Chinner wrote:
> On Mon, Jan 08, 2024 at 10:14:42PM -0800, Darrick J. Wong wrote:
> > On Mon, Jan 08, 2024 at 11:35:17AM +1100, Dave Chinner wrote:
> > > On Thu, Jan 04, 2024 at 02:22:48PM +0800, Jian Wen wrote:
> > > > From: Jian Wen <wenjianhn@gmail.com>
> > > >
> > > > Currently, xfs_trans_dqresv() return -ENOSPC when the project quota
> > > > limit is reached. As a result, xfs_file_buffered_write() will flush
> > > > the whole filesystem instead of the project quota.
> > > >
> > > > Fix the issue by make xfs_trans_dqresv() return -EDQUOT rather than
> > > > -ENOSPC. Add a helper, xfs_blockgc_nospace_flush(), to make flushing
> > > > for both EDQUOT and ENOSPC consistent.
> > > >
> > > > Changes since v3:
> > > > - rename xfs_dquot_is_enospc to xfs_dquot_hardlimit_exceeded
> > > > - acquire the dquot lock before checking the free space
> > > >
> > > > Changes since v2:
> > > > - completely rewrote based on the suggestions from Dave
> > > >
> > > > Suggested-by: Dave Chinner <david@fromorbit.com>
> > > > Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
> > >
> > > Please send new patch versions as a new thread, not as a reply to
> > > a random email in the middle of the review thread for a previous
> > > version.
> > >
> > > > ---
> > > > fs/xfs/xfs_dquot.h | 22 +++++++++++++++---
> > > > fs/xfs/xfs_file.c | 41 ++++++++++++--------------------
> > > > fs/xfs/xfs_icache.c | 50 +++++++++++++++++++++++++++++-----------
> > > > fs/xfs/xfs_icache.h | 7 +++---
> > > > fs/xfs/xfs_inode.c | 19 ++++++++-------
> > > > fs/xfs/xfs_reflink.c | 5 ++++
> > > > fs/xfs/xfs_trans.c | 41 ++++++++++++++++++++++++--------
> > > > fs/xfs/xfs_trans_dquot.c | 3 ---
> > > > 8 files changed, 121 insertions(+), 67 deletions(-)
> > > >
> > > > diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
> > > > index 80c8f851a2f3..d28dce0ed61a 100644
> > > > --- a/fs/xfs/xfs_dquot.h
> > > > +++ b/fs/xfs/xfs_dquot.h
> > > > @@ -183,6 +183,22 @@ xfs_dquot_is_enforced(
> > > > return false;
> > > > }
> > > >
> > > > +static inline bool
> > > > +xfs_dquot_hardlimit_exceeded(
> > > > + struct xfs_dquot *dqp)
> > > > +{
> > > > + int64_t freesp;
> > > > +
> > > > + if (!dqp)
> > > > + return false;
> > > > + if (!xfs_dquot_is_enforced(dqp))
> > > > + return false;
> > > > + xfs_dqlock(dqp);
> > > > + freesp = dqp->q_blk.hardlimit - dqp->q_blk.reserved;
> > > > + xfs_dqunlock(dqp);
> > > > + return freesp < 0;
> > > > +}
> > >
> > > Ok, what about if the project quota EDQUOT has come about because we
> > > are over the inode count limit or the realtime block limit? Both of
> > > those need to be converted to ENOSPC, too.
> > >
> > > i.e. all the inode creation operation need to be checked against
> > > both the data device block space and the inode count space, whilst
> > > data writes need to be checked against data space for normal IO
> > > and both data space and real time space for inodes that are writing
> > > to real time devices.
> >
> > (Yeah.)
> >
> > > Also, why do we care about locking here? If something is modifying
> > > dqp->q_blk.reserved concurrently, holding the lock here does nothing
> > > to protect this code from races. All it means is that we we'll block
> > > waiting for the transaction that holds the dquot locked to complete
> > > and we'll either get the same random failure or success as if we
> > > didn't hold the lock during this calculation...
> >
> > I thought we had to hold the dquot lock before accessing its fields.
>
> Only if we care about avoiding races with ongoing modifications or
> we want to serialise against new references (e.g. because we are
> about to reclaim the dquot).
>
> The inode holds a reference to the dquot at this point (because of
> xfs_qm_dqattach()), so we really don't need to hold a lock just
> to sample the contents of the attached dquot.
>
> > Or are you really saying that it's silly to take the dquot lock *again*
> > having already decided (under dqlock elsewhere) that we were over a
> > quota?
>
> No, I'm saying that we really don't have to hold the dqlock to
> determine if the dquot is over quota limits. It's either going to
> over or under, and holding the dqlock while sampling it really
> doesn't change the fact that it the dquot accounting can change
> between the initial check under the dqlock and a subsequent check
> on the second failure under a different hold of the dqlock.
>
> It's an inherently racy check, and holding the dqlock does nothing
> to make it less racy or more accurate.
>
> > In that case, perhaps it makes more sense to have
> > xfs_trans_dqresv return an unusual errno for "project quota over limits"
> > so that callers can trap that magic value and translate it into ENOSPC?
>
> Sure, that's another option, but it means we have to trap EDQUOT,
> ENOSPC and the new special EDQUOT-but-really-means-ENOSPC return
> errors. I'm not sure it will improve the code a great deal, but if
> there's a clean way to implement such error handling it may make
> more sense. Have you prototyped how such error handling would look
> in these cases?
>
> Which also makes me wonder if we should actually be returning what
> quota limit failed, not EDQUOT. To take the correct flush action, we
> really need to know if we failed on data blocks, inode count or rt
> extents. e.g. flushing data won't help alleviate an inode count
> overrun...
Yeah. If it's an rtbcount, then it only makes sense to flush realtime
files; if it's a bcount, we flush nonrt files, and if it's an inode
count then I guess we push inodegc?
--D
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
next prev parent reply other threads:[~2024-01-11 1:42 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-14 15:07 [PATCH] xfs: improve handling of prjquot ENOSPC Jian Wen
2023-12-14 15:29 ` Christoph Hellwig
2023-12-14 17:06 ` Darrick J. Wong
2023-12-14 21:13 ` Dave Chinner
2023-12-16 15:49 ` Jian Wen
2023-12-16 15:35 ` [PATCH v2] " Jian Wen
2023-12-18 22:00 ` Dave Chinner
2023-12-19 5:47 ` Christoph Hellwig
2023-12-19 13:50 ` Jian Wen
2023-12-23 11:00 ` Jian Wen
2024-01-04 6:22 ` [PATCH v4] " Jian Wen
2024-01-08 0:35 ` Dave Chinner
2024-01-09 6:14 ` Darrick J. Wong
2024-01-09 6:38 ` Dave Chinner
2024-01-11 1:42 ` Darrick J. Wong [this message]
2024-01-11 7:24 ` Dave Chinner
2024-01-10 14:08 ` kernel test robot
2023-12-23 10:56 ` [PATCH v3] " Jian Wen
2024-01-03 1:42 ` Darrick J. Wong
2024-01-03 3:45 ` Jian Wen
2024-01-04 1:46 ` Darrick J. Wong
2024-01-04 3:36 ` Jian Wen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240111014204.GM722975@frogsfrogsfrogs \
--to=djwong@kernel.org \
--cc=david@fromorbit.com \
--cc=dchinner@redhat.com \
--cc=hch@lst.de \
--cc=linux-xfs@vger.kernel.org \
--cc=wenjian1@xiaomi.com \
--cc=wenjianhn@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).