From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ADCEA17C1 for ; Thu, 11 Jan 2024 01:42:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="R6QB0OGw" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 78541C433F1; Thu, 11 Jan 2024 01:42:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1704937325; bh=hS6kws158fUDrvPNkxqv4s+Lw2w2/Wxw6YCrLLqxL4w=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=R6QB0OGwqO6rp/m67gH+5lugnfQ6unFBwwy84ZiD8RGE/ImhNW1hEwrb6pDTRZGbR PupKoxTz1EKxWg5RZRiokR6wqEx0sA7VaJYxMZ2cNJKR527ffr3sl6SYXutH/5tg9m sWdD/VsgEIhVlIJ2XHMCaCr2uHjGhrzoO32nkvj6pXD2IGTgMJhLHnVVRFeaHd0xs2 xPYn+ojABRwTMzTIW9yZ8AFSboR7vfDxgL3r5IsmQN7j7b7b99nCGMJkMS1RhPBYAn xDs/SoAY7VBX6x6oWKBPfQL4aIqNAkT2fcEAFjRRQI2yFCsLS6N5j6rExgmpcjbV2o zJK41S3zfv25A== Date: Wed, 10 Jan 2024 17:42:04 -0800 From: "Darrick J. Wong" To: Dave Chinner Cc: Jian Wen , linux-xfs@vger.kernel.org, hch@lst.de, dchinner@redhat.com, Jian Wen Subject: Re: [PATCH v4] xfs: improve handling of prjquot ENOSPC Message-ID: <20240111014204.GM722975@frogsfrogsfrogs> References: <20231216153522.52767-1-wenjianhn@gmail.com> <20240104062248.3245102-1-wenjian1@xiaomi.com> <20240109061442.GD722975@frogsfrogsfrogs> Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: On Tue, Jan 09, 2024 at 05:38:16PM +1100, Dave Chinner wrote: > On Mon, Jan 08, 2024 at 10:14:42PM -0800, Darrick J. Wong wrote: > > On Mon, Jan 08, 2024 at 11:35:17AM +1100, Dave Chinner wrote: > > > On Thu, Jan 04, 2024 at 02:22:48PM +0800, Jian Wen wrote: > > > > From: Jian Wen > > > > > > > > Currently, xfs_trans_dqresv() return -ENOSPC when the project quota > > > > limit is reached. As a result, xfs_file_buffered_write() will flush > > > > the whole filesystem instead of the project quota. > > > > > > > > Fix the issue by make xfs_trans_dqresv() return -EDQUOT rather than > > > > -ENOSPC. Add a helper, xfs_blockgc_nospace_flush(), to make flushing > > > > for both EDQUOT and ENOSPC consistent. > > > > > > > > Changes since v3: > > > > - rename xfs_dquot_is_enospc to xfs_dquot_hardlimit_exceeded > > > > - acquire the dquot lock before checking the free space > > > > > > > > Changes since v2: > > > > - completely rewrote based on the suggestions from Dave > > > > > > > > Suggested-by: Dave Chinner > > > > Signed-off-by: Jian Wen > > > > > > Please send new patch versions as a new thread, not as a reply to > > > a random email in the middle of the review thread for a previous > > > version. > > > > > > > --- > > > > fs/xfs/xfs_dquot.h | 22 +++++++++++++++--- > > > > fs/xfs/xfs_file.c | 41 ++++++++++++-------------------- > > > > fs/xfs/xfs_icache.c | 50 +++++++++++++++++++++++++++++----------- > > > > fs/xfs/xfs_icache.h | 7 +++--- > > > > fs/xfs/xfs_inode.c | 19 ++++++++------- > > > > fs/xfs/xfs_reflink.c | 5 ++++ > > > > fs/xfs/xfs_trans.c | 41 ++++++++++++++++++++++++-------- > > > > fs/xfs/xfs_trans_dquot.c | 3 --- > > > > 8 files changed, 121 insertions(+), 67 deletions(-) > > > > > > > > diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h > > > > index 80c8f851a2f3..d28dce0ed61a 100644 > > > > --- a/fs/xfs/xfs_dquot.h > > > > +++ b/fs/xfs/xfs_dquot.h > > > > @@ -183,6 +183,22 @@ xfs_dquot_is_enforced( > > > > return false; > > > > } > > > > > > > > +static inline bool > > > > +xfs_dquot_hardlimit_exceeded( > > > > + struct xfs_dquot *dqp) > > > > +{ > > > > + int64_t freesp; > > > > + > > > > + if (!dqp) > > > > + return false; > > > > + if (!xfs_dquot_is_enforced(dqp)) > > > > + return false; > > > > + xfs_dqlock(dqp); > > > > + freesp = dqp->q_blk.hardlimit - dqp->q_blk.reserved; > > > > + xfs_dqunlock(dqp); > > > > + return freesp < 0; > > > > +} > > > > > > Ok, what about if the project quota EDQUOT has come about because we > > > are over the inode count limit or the realtime block limit? Both of > > > those need to be converted to ENOSPC, too. > > > > > > i.e. all the inode creation operation need to be checked against > > > both the data device block space and the inode count space, whilst > > > data writes need to be checked against data space for normal IO > > > and both data space and real time space for inodes that are writing > > > to real time devices. > > > > (Yeah.) > > > > > Also, why do we care about locking here? If something is modifying > > > dqp->q_blk.reserved concurrently, holding the lock here does nothing > > > to protect this code from races. All it means is that we we'll block > > > waiting for the transaction that holds the dquot locked to complete > > > and we'll either get the same random failure or success as if we > > > didn't hold the lock during this calculation... > > > > I thought we had to hold the dquot lock before accessing its fields. > > Only if we care about avoiding races with ongoing modifications or > we want to serialise against new references (e.g. because we are > about to reclaim the dquot). > > The inode holds a reference to the dquot at this point (because of > xfs_qm_dqattach()), so we really don't need to hold a lock just > to sample the contents of the attached dquot. > > > Or are you really saying that it's silly to take the dquot lock *again* > > having already decided (under dqlock elsewhere) that we were over a > > quota? > > No, I'm saying that we really don't have to hold the dqlock to > determine if the dquot is over quota limits. It's either going to > over or under, and holding the dqlock while sampling it really > doesn't change the fact that it the dquot accounting can change > between the initial check under the dqlock and a subsequent check > on the second failure under a different hold of the dqlock. > > It's an inherently racy check, and holding the dqlock does nothing > to make it less racy or more accurate. > > > In that case, perhaps it makes more sense to have > > xfs_trans_dqresv return an unusual errno for "project quota over limits" > > so that callers can trap that magic value and translate it into ENOSPC? > > Sure, that's another option, but it means we have to trap EDQUOT, > ENOSPC and the new special EDQUOT-but-really-means-ENOSPC return > errors. I'm not sure it will improve the code a great deal, but if > there's a clean way to implement such error handling it may make > more sense. Have you prototyped how such error handling would look > in these cases? > > Which also makes me wonder if we should actually be returning what > quota limit failed, not EDQUOT. To take the correct flush action, we > really need to know if we failed on data blocks, inode count or rt > extents. e.g. flushing data won't help alleviate an inode count > overrun... Yeah. If it's an rtbcount, then it only makes sense to flush realtime files; if it's a bcount, we flush nonrt files, and if it's an inode count then I guess we push inodegc? --D > -Dave. > -- > Dave Chinner > david@fromorbit.com >