linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Jian Wen <wenjianhn@gmail.com>,
	linux-xfs@vger.kernel.org, hch@lst.de, dchinner@redhat.com,
	Jian Wen <wenjian1@xiaomi.com>
Subject: Re: [PATCH v4] xfs: improve handling of prjquot ENOSPC
Date: Wed, 10 Jan 2024 17:42:04 -0800	[thread overview]
Message-ID: <20240111014204.GM722975@frogsfrogsfrogs> (raw)
In-Reply-To: <ZZzp2ARmwf3FrkUV@dread.disaster.area>

On Tue, Jan 09, 2024 at 05:38:16PM +1100, Dave Chinner wrote:
> On Mon, Jan 08, 2024 at 10:14:42PM -0800, Darrick J. Wong wrote:
> > On Mon, Jan 08, 2024 at 11:35:17AM +1100, Dave Chinner wrote:
> > > On Thu, Jan 04, 2024 at 02:22:48PM +0800, Jian Wen wrote:
> > > > From: Jian Wen <wenjianhn@gmail.com>
> > > > 
> > > > Currently, xfs_trans_dqresv() return -ENOSPC when the project quota
> > > > limit is reached. As a result, xfs_file_buffered_write() will flush
> > > > the whole filesystem instead of the project quota.
> > > > 
> > > > Fix the issue by make xfs_trans_dqresv() return -EDQUOT rather than
> > > > -ENOSPC. Add a helper, xfs_blockgc_nospace_flush(), to make flushing
> > > > for both EDQUOT and ENOSPC consistent.
> > > > 
> > > > Changes since v3:
> > > >   - rename xfs_dquot_is_enospc to xfs_dquot_hardlimit_exceeded
> > > >   - acquire the dquot lock before checking the free space
> > > > 
> > > > Changes since v2:
> > > >   - completely rewrote based on the suggestions from Dave
> > > > 
> > > > Suggested-by: Dave Chinner <david@fromorbit.com>
> > > > Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
> > > 
> > > Please send new patch versions as a new thread, not as a reply to
> > > a random email in the middle of the review thread for a previous
> > > version.
> > > 
> > > > ---
> > > >  fs/xfs/xfs_dquot.h       | 22 +++++++++++++++---
> > > >  fs/xfs/xfs_file.c        | 41 ++++++++++++--------------------
> > > >  fs/xfs/xfs_icache.c      | 50 +++++++++++++++++++++++++++++-----------
> > > >  fs/xfs/xfs_icache.h      |  7 +++---
> > > >  fs/xfs/xfs_inode.c       | 19 ++++++++-------
> > > >  fs/xfs/xfs_reflink.c     |  5 ++++
> > > >  fs/xfs/xfs_trans.c       | 41 ++++++++++++++++++++++++--------
> > > >  fs/xfs/xfs_trans_dquot.c |  3 ---
> > > >  8 files changed, 121 insertions(+), 67 deletions(-)
> > > > 
> > > > diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
> > > > index 80c8f851a2f3..d28dce0ed61a 100644
> > > > --- a/fs/xfs/xfs_dquot.h
> > > > +++ b/fs/xfs/xfs_dquot.h
> > > > @@ -183,6 +183,22 @@ xfs_dquot_is_enforced(
> > > >  	return false;
> > > >  }
> > > >  
> > > > +static inline bool
> > > > +xfs_dquot_hardlimit_exceeded(
> > > > +	struct xfs_dquot	*dqp)
> > > > +{
> > > > +	int64_t freesp;
> > > > +
> > > > +	if (!dqp)
> > > > +		return false;
> > > > +	if (!xfs_dquot_is_enforced(dqp))
> > > > +		return false;
> > > > +	xfs_dqlock(dqp);
> > > > +	freesp = dqp->q_blk.hardlimit - dqp->q_blk.reserved;
> > > > +	xfs_dqunlock(dqp);
> > > > +	return freesp < 0;
> > > > +}
> > > 
> > > Ok, what about if the project quota EDQUOT has come about because we
> > > are over the inode count limit or the realtime block limit? Both of
> > > those need to be converted to ENOSPC, too.
> > > 
> > > i.e. all the inode creation operation need to be checked against
> > > both the data device block space and the inode count space, whilst
> > > data writes need to be checked against data space for normal IO
> > > and both data space and real time space for inodes that are writing
> > > to real time devices.
> > 
> > (Yeah.)
> > 
> > > Also, why do we care about locking here? If something is modifying
> > > dqp->q_blk.reserved concurrently, holding the lock here does nothing
> > > to protect this code from races. All it means is that we we'll block
> > > waiting for the transaction that holds the dquot locked to complete
> > > and we'll either get the same random failure or success as if we
> > > didn't hold the lock during this calculation...
> > 
> > I thought we had to hold the dquot lock before accessing its fields.
> 
> Only if we care about avoiding races with ongoing modifications or
> we want to serialise against new references (e.g. because we are
> about to reclaim the dquot).
> 
> The inode holds a reference to the dquot at this point (because of
> xfs_qm_dqattach()), so we really don't need to hold a lock just
> to sample the contents of the attached dquot.
> 
> > Or are you really saying that it's silly to take the dquot lock *again*
> > having already decided (under dqlock elsewhere) that we were over a
> > quota?
> 
> No, I'm saying that we really don't have to hold the dqlock to
> determine if the dquot is over quota limits. It's either going to
> over or under, and holding the dqlock while sampling it really
> doesn't change the fact that it the dquot accounting can change
> between the initial check under the dqlock and a subsequent check
> on the second failure under a different hold of the dqlock.
> 
> It's an inherently racy check, and holding the dqlock does nothing
> to make it less racy or more accurate.
> 
> > In that case, perhaps it makes more sense to have
> > xfs_trans_dqresv return an unusual errno for "project quota over limits"
> > so that callers can trap that magic value and translate it into ENOSPC?
> 
> Sure, that's another option, but it means we have to trap EDQUOT,
> ENOSPC and the new special EDQUOT-but-really-means-ENOSPC return
> errors. I'm not sure it will improve the code a great deal, but if
> there's a clean way to implement such error handling it may make
> more sense. Have you prototyped how such error handling would look
> in these cases?
> 
> Which also makes me wonder if we should actually be returning what
> quota limit failed, not EDQUOT. To take the correct flush action, we
> really need to know if we failed on data blocks, inode count or rt
> extents. e.g. flushing data won't help alleviate an inode count
> overrun...

Yeah.  If it's an rtbcount, then it only makes sense to flush realtime
files; if it's a bcount, we flush nonrt files, and if it's an inode
count then I guess we push inodegc?

--D

> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 

  reply	other threads:[~2024-01-11  1:42 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-14 15:07 [PATCH] xfs: improve handling of prjquot ENOSPC Jian Wen
2023-12-14 15:29 ` Christoph Hellwig
2023-12-14 17:06 ` Darrick J. Wong
2023-12-14 21:13 ` Dave Chinner
2023-12-16 15:49   ` Jian Wen
2023-12-16 15:35 ` [PATCH v2] " Jian Wen
2023-12-18 22:00   ` Dave Chinner
2023-12-19  5:47     ` Christoph Hellwig
2023-12-19 13:50     ` Jian Wen
2023-12-23 11:00     ` Jian Wen
2024-01-04  6:22   ` [PATCH v4] " Jian Wen
2024-01-08  0:35     ` Dave Chinner
2024-01-09  6:14       ` Darrick J. Wong
2024-01-09  6:38         ` Dave Chinner
2024-01-11  1:42           ` Darrick J. Wong [this message]
2024-01-11  7:24             ` Dave Chinner
2024-01-10 14:08     ` kernel test robot
2023-12-23 10:56 ` [PATCH v3] " Jian Wen
2024-01-03  1:42   ` Darrick J. Wong
2024-01-03  3:45     ` Jian Wen
2024-01-04  1:46       ` Darrick J. Wong
2024-01-04  3:36         ` Jian Wen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240111014204.GM722975@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=hch@lst.de \
    --cc=linux-xfs@vger.kernel.org \
    --cc=wenjian1@xiaomi.com \
    --cc=wenjianhn@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).