All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Jian Wen <wenjianhn@gmail.com>,
	linux-xfs@vger.kernel.org, hch@lst.de, dchinner@redhat.com,
	Jian Wen <wenjian1@xiaomi.com>
Subject: Re: [PATCH v4] xfs: improve handling of prjquot ENOSPC
Date: Tue, 9 Jan 2024 17:38:16 +1100	[thread overview]
Message-ID: <ZZzp2ARmwf3FrkUV@dread.disaster.area> (raw)
In-Reply-To: <20240109061442.GD722975@frogsfrogsfrogs>

On Mon, Jan 08, 2024 at 10:14:42PM -0800, Darrick J. Wong wrote:
> On Mon, Jan 08, 2024 at 11:35:17AM +1100, Dave Chinner wrote:
> > On Thu, Jan 04, 2024 at 02:22:48PM +0800, Jian Wen wrote:
> > > From: Jian Wen <wenjianhn@gmail.com>
> > > 
> > > Currently, xfs_trans_dqresv() return -ENOSPC when the project quota
> > > limit is reached. As a result, xfs_file_buffered_write() will flush
> > > the whole filesystem instead of the project quota.
> > > 
> > > Fix the issue by make xfs_trans_dqresv() return -EDQUOT rather than
> > > -ENOSPC. Add a helper, xfs_blockgc_nospace_flush(), to make flushing
> > > for both EDQUOT and ENOSPC consistent.
> > > 
> > > Changes since v3:
> > >   - rename xfs_dquot_is_enospc to xfs_dquot_hardlimit_exceeded
> > >   - acquire the dquot lock before checking the free space
> > > 
> > > Changes since v2:
> > >   - completely rewrote based on the suggestions from Dave
> > > 
> > > Suggested-by: Dave Chinner <david@fromorbit.com>
> > > Signed-off-by: Jian Wen <wenjian1@xiaomi.com>
> > 
> > Please send new patch versions as a new thread, not as a reply to
> > a random email in the middle of the review thread for a previous
> > version.
> > 
> > > ---
> > >  fs/xfs/xfs_dquot.h       | 22 +++++++++++++++---
> > >  fs/xfs/xfs_file.c        | 41 ++++++++++++--------------------
> > >  fs/xfs/xfs_icache.c      | 50 +++++++++++++++++++++++++++++-----------
> > >  fs/xfs/xfs_icache.h      |  7 +++---
> > >  fs/xfs/xfs_inode.c       | 19 ++++++++-------
> > >  fs/xfs/xfs_reflink.c     |  5 ++++
> > >  fs/xfs/xfs_trans.c       | 41 ++++++++++++++++++++++++--------
> > >  fs/xfs/xfs_trans_dquot.c |  3 ---
> > >  8 files changed, 121 insertions(+), 67 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
> > > index 80c8f851a2f3..d28dce0ed61a 100644
> > > --- a/fs/xfs/xfs_dquot.h
> > > +++ b/fs/xfs/xfs_dquot.h
> > > @@ -183,6 +183,22 @@ xfs_dquot_is_enforced(
> > >  	return false;
> > >  }
> > >  
> > > +static inline bool
> > > +xfs_dquot_hardlimit_exceeded(
> > > +	struct xfs_dquot	*dqp)
> > > +{
> > > +	int64_t freesp;
> > > +
> > > +	if (!dqp)
> > > +		return false;
> > > +	if (!xfs_dquot_is_enforced(dqp))
> > > +		return false;
> > > +	xfs_dqlock(dqp);
> > > +	freesp = dqp->q_blk.hardlimit - dqp->q_blk.reserved;
> > > +	xfs_dqunlock(dqp);
> > > +	return freesp < 0;
> > > +}
> > 
> > Ok, what about if the project quota EDQUOT has come about because we
> > are over the inode count limit or the realtime block limit? Both of
> > those need to be converted to ENOSPC, too.
> > 
> > i.e. all the inode creation operation need to be checked against
> > both the data device block space and the inode count space, whilst
> > data writes need to be checked against data space for normal IO
> > and both data space and real time space for inodes that are writing
> > to real time devices.
> 
> (Yeah.)
> 
> > Also, why do we care about locking here? If something is modifying
> > dqp->q_blk.reserved concurrently, holding the lock here does nothing
> > to protect this code from races. All it means is that we we'll block
> > waiting for the transaction that holds the dquot locked to complete
> > and we'll either get the same random failure or success as if we
> > didn't hold the lock during this calculation...
> 
> I thought we had to hold the dquot lock before accessing its fields.

Only if we care about avoiding races with ongoing modifications or
we want to serialise against new references (e.g. because we are
about to reclaim the dquot).

The inode holds a reference to the dquot at this point (because of
xfs_qm_dqattach()), so we really don't need to hold a lock just
to sample the contents of the attached dquot.

> Or are you really saying that it's silly to take the dquot lock *again*
> having already decided (under dqlock elsewhere) that we were over a
> quota?

No, I'm saying that we really don't have to hold the dqlock to
determine if the dquot is over quota limits. It's either going to
over or under, and holding the dqlock while sampling it really
doesn't change the fact that it the dquot accounting can change
between the initial check under the dqlock and a subsequent check
on the second failure under a different hold of the dqlock.

It's an inherently racy check, and holding the dqlock does nothing
to make it less racy or more accurate.

> In that case, perhaps it makes more sense to have
> xfs_trans_dqresv return an unusual errno for "project quota over limits"
> so that callers can trap that magic value and translate it into ENOSPC?

Sure, that's another option, but it means we have to trap EDQUOT,
ENOSPC and the new special EDQUOT-but-really-means-ENOSPC return
errors. I'm not sure it will improve the code a great deal, but if
there's a clean way to implement such error handling it may make
more sense. Have you prototyped how such error handling would look
in these cases?

Which also makes me wonder if we should actually be returning what
quota limit failed, not EDQUOT. To take the correct flush action, we
really need to know if we failed on data blocks, inode count or rt
extents. e.g. flushing data won't help alleviate an inode count
overrun...

-Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2024-01-09  6:38 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-14 15:07 [PATCH] xfs: improve handling of prjquot ENOSPC Jian Wen
2023-12-14 15:29 ` Christoph Hellwig
2023-12-14 17:06 ` Darrick J. Wong
2023-12-14 21:13 ` Dave Chinner
2023-12-16 15:49   ` Jian Wen
2023-12-16 15:35 ` [PATCH v2] " Jian Wen
2023-12-18 22:00   ` Dave Chinner
2023-12-19  5:47     ` Christoph Hellwig
2023-12-19 13:50     ` Jian Wen
2023-12-23 11:00     ` Jian Wen
2024-01-04  6:22   ` [PATCH v4] " Jian Wen
2024-01-08  0:35     ` Dave Chinner
2024-01-09  6:14       ` Darrick J. Wong
2024-01-09  6:38         ` Dave Chinner [this message]
2024-01-11  1:42           ` Darrick J. Wong
2024-01-11  7:24             ` Dave Chinner
2024-01-10 14:08     ` kernel test robot
2023-12-23 10:56 ` [PATCH v3] " Jian Wen
2024-01-03  1:42   ` Darrick J. Wong
2024-01-03  3:45     ` Jian Wen
2024-01-04  1:46       ` Darrick J. Wong
2024-01-04  3:36         ` Jian Wen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZZzp2ARmwf3FrkUV@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=linux-xfs@vger.kernel.org \
    --cc=wenjian1@xiaomi.com \
    --cc=wenjianhn@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.