public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>,
	linux-xfs@vger.kernel.org, hch@infradead.org,
	david@fromorbit.com
Subject: Re: [PATCH 01/16] xfs: fix chown leaking delalloc quota blocks when fssetxattr fails
Date: Tue, 2 Feb 2021 13:34:29 -0500	[thread overview]
Message-ID: <20210202183429.GK3336100@bfoster> (raw)
In-Reply-To: <20210202174726.GM7193@magnolia>

On Tue, Feb 02, 2021 at 09:47:26AM -0800, Darrick J. Wong wrote:
> On Tue, Feb 02, 2021 at 08:13:15AM -0500, Brian Foster wrote:
> > On Mon, Feb 01, 2021 at 06:03:23PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > While refactoring the quota code to create a function to allocate inode
> > > change transactions, I noticed that xfs_qm_vop_chown_reserve does more
> > > than just make reservations: it also *modifies* the incore counts
> > > directly to handle the owner id change for the delalloc blocks.
> > > 
> > > I then observed that the fssetxattr code continues validating input
> > > arguments after making the quota reservation but before dirtying the
> > > transaction.  If the routine decides to error out, it fails to undo the
> > > accounting switch!  This leads to incorrect quota reservation and
> > > failure down the line.
> > > 
> > > We can fix this by making the reservation function do only that -- for
> > > the new dquot, it reserves ondisk and delalloc blocks to the
> > > transaction, and the old dquot hangs on to its incore reservation for
> > > now.  Once we actually switch the dquots, we can then update the incore
> > > reservations because we've dirtied the transaction and it's too late to
> > > turn back now.
> > > 
> > > No fixes tag because this has been broken since the start of git.
> > > 
> > 
> > Do we have any test coverage for this problem?
> 
> Working on it, yeah.  The tricky part is finding a combination of
> FSSETXATTR values that will reliably trigger an error return.  I think
> it can be done by allocating delalloc blocks and then trying to set an
> extent size hint and change the project id in a single call.
> 
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > ---
> > >  fs/xfs/xfs_qm.c |   91 +++++++++++++++++++++----------------------------------
> > >  1 file changed, 34 insertions(+), 57 deletions(-)
> > > 
> > > 
> > > diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
> > > index c134eb4aeaa8..322d337b5dca 100644
> > > --- a/fs/xfs/xfs_qm.c
> > > +++ b/fs/xfs/xfs_qm.c
> > > @@ -1785,6 +1785,28 @@ xfs_qm_vop_chown(
> > >  	xfs_trans_mod_dquot(tp, newdq, bfield, ip->i_d.di_nblocks);
> > >  	xfs_trans_mod_dquot(tp, newdq, XFS_TRANS_DQ_ICOUNT, 1);
> > >  
> > > +	/*
> > > +	 * Back when we made quota reservations for the chown, we reserved the
> > > +	 * ondisk blocks + delalloc blocks with the new dquot.  Now that we've
> > > +	 * switched the dquots, decrease the new dquot's block reservation
> > > +	 * (having already bumped up the real counter) so that we don't have
> > > +	 * any reservation to give back when we commit.
> > > +	 */
> > > +	xfs_trans_mod_dquot(tp, newdq, XFS_TRANS_DQ_RES_BLKS,
> > > +			-ip->i_delayed_blks);
> > > +
> > 
> > Ok, so the reservation code code below continues to reserve di_blocks
> > and delalloc blocks as it did before, but associates it with the
> > transaction and no longer reduces delalloc reservation from the old
> > dquots. Instead, this function increases the consumption of reserved
> > blocks for di_blocks and then removes the additional quota reservation
> > from the transaction, but not the new dquots. Thus when the transaction
> > commits, this has the side effect of leaving additional reservation on
> > the new dquots that correspond to the delalloc blocks on the inode. Am I
> > following that correctly?
> 
> Correct.
> 

Thanks.

> > > +	/*
> > > +	 * Give the incore reservation for delalloc blocks back to the old
> > > +	 * dquot.  We don't normally handle delalloc quota reservations
> > > +	 * transactionally, so just lock the dquot and subtract from the
> > > +	 * reservation.  We've dirtied the transaction, so it's too late to
> > > +	 * turn back now.
> > > +	 */
> > > +	tp->t_flags |= XFS_TRANS_DIRTY;
> > > +	xfs_dqlock(prevdq);
> > > +	prevdq->q_blk.reserved -= ip->i_delayed_blks;
> > > +	xfs_dqunlock(prevdq);
> > > +
> > 
> > What's the reason for not using xfs_trans_reserve_quota_bydquots(NULL,
> > ...) here like the original code?
> 
> xfs_trans_reserve_quota_bydquots() makes the caller pass in user, group,
> and project dquots.  It's not difficult to add more code to declare and
> route parameters, but that just felt overdone.
> 

Ah, Ok.

> Given that this is the only place in the codebase where we want to
> change the incore quota reservation on a single dquot, I also didn't
> think it was worth making a whole new function.
> 
> FWIW I don't really mind doing it, it just seemed like more work.
> Alternately I suppose I could expose xfs_trans_dqresv.
> 

I have no strong preference. It seems fine to me as it is, the reason
for the change just wasn't clear to me at first glance:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> --D
> 
> > Brian
> > 
> > >  	/*
> > >  	 * Take an extra reference, because the inode is going to keep
> > >  	 * this dquot pointer even after the trans_commit.
> > > @@ -1807,84 +1829,39 @@ xfs_qm_vop_chown_reserve(
> > >  	uint			flags)
> > >  {
> > >  	struct xfs_mount	*mp = ip->i_mount;
> > > -	uint64_t		delblks;
> > >  	unsigned int		blkflags;
> > > -	struct xfs_dquot	*udq_unres = NULL;
> > > -	struct xfs_dquot	*gdq_unres = NULL;
> > > -	struct xfs_dquot	*pdq_unres = NULL;
> > >  	struct xfs_dquot	*udq_delblks = NULL;
> > >  	struct xfs_dquot	*gdq_delblks = NULL;
> > >  	struct xfs_dquot	*pdq_delblks = NULL;
> > > -	int			error;
> > > -
> > >  
> > >  	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
> > >  	ASSERT(XFS_IS_QUOTA_RUNNING(mp));
> > >  
> > > -	delblks = ip->i_delayed_blks;
> > >  	blkflags = XFS_IS_REALTIME_INODE(ip) ?
> > >  			XFS_QMOPT_RES_RTBLKS : XFS_QMOPT_RES_REGBLKS;
> > >  
> > >  	if (XFS_IS_UQUOTA_ON(mp) && udqp &&
> > > -	    i_uid_read(VFS_I(ip)) != udqp->q_id) {
> > > +	    i_uid_read(VFS_I(ip)) != udqp->q_id)
> > >  		udq_delblks = udqp;
> > > -		/*
> > > -		 * If there are delayed allocation blocks, then we have to
> > > -		 * unreserve those from the old dquot, and add them to the
> > > -		 * new dquot.
> > > -		 */
> > > -		if (delblks) {
> > > -			ASSERT(ip->i_udquot);
> > > -			udq_unres = ip->i_udquot;
> > > -		}
> > > -	}
> > > +
> > >  	if (XFS_IS_GQUOTA_ON(ip->i_mount) && gdqp &&
> > > -	    i_gid_read(VFS_I(ip)) != gdqp->q_id) {
> > > +	    i_gid_read(VFS_I(ip)) != gdqp->q_id)
> > >  		gdq_delblks = gdqp;
> > > -		if (delblks) {
> > > -			ASSERT(ip->i_gdquot);
> > > -			gdq_unres = ip->i_gdquot;
> > > -		}
> > > -	}
> > >  
> > >  	if (XFS_IS_PQUOTA_ON(ip->i_mount) && pdqp &&
> > > -	    ip->i_d.di_projid != pdqp->q_id) {
> > > +	    ip->i_d.di_projid != pdqp->q_id)
> > >  		pdq_delblks = pdqp;
> > > -		if (delblks) {
> > > -			ASSERT(ip->i_pdquot);
> > > -			pdq_unres = ip->i_pdquot;
> > > -		}
> > > -	}
> > > -
> > > -	error = xfs_trans_reserve_quota_bydquots(tp, ip->i_mount,
> > > -				udq_delblks, gdq_delblks, pdq_delblks,
> > > -				ip->i_d.di_nblocks, 1, flags | blkflags);
> > > -	if (error)
> > > -		return error;
> > >  
> > >  	/*
> > > -	 * Do the delayed blks reservations/unreservations now. Since, these
> > > -	 * are done without the help of a transaction, if a reservation fails
> > > -	 * its previous reservations won't be automatically undone by trans
> > > -	 * code. So, we have to do it manually here.
> > > +	 * Reserve enough quota to handle blocks on disk and reserved for a
> > > +	 * delayed allocation.  We'll actually transfer the delalloc
> > > +	 * reservation between dquots at chown time, even though that part is
> > > +	 * only semi-transactional.
> > >  	 */
> > > -	if (delblks) {
> > > -		/*
> > > -		 * Do the reservations first. Unreservation can't fail.
> > > -		 */
> > > -		ASSERT(udq_delblks || gdq_delblks || pdq_delblks);
> > > -		ASSERT(udq_unres || gdq_unres || pdq_unres);
> > > -		error = xfs_trans_reserve_quota_bydquots(NULL, ip->i_mount,
> > > -			    udq_delblks, gdq_delblks, pdq_delblks,
> > > -			    (xfs_qcnt_t)delblks, 0, flags | blkflags);
> > > -		if (error)
> > > -			return error;
> > > -		xfs_trans_reserve_quota_bydquots(NULL, ip->i_mount,
> > > -				udq_unres, gdq_unres, pdq_unres,
> > > -				-((xfs_qcnt_t)delblks), 0, blkflags);
> > > -	}
> > > -
> > > -	return 0;
> > > +	return xfs_trans_reserve_quota_bydquots(tp, ip->i_mount, udq_delblks,
> > > +			gdq_delblks, pdq_delblks,
> > > +			ip->i_d.di_nblocks + ip->i_delayed_blks,
> > > +			1, blkflags | flags);
> > >  }
> > >  
> > >  int
> > > 
> > 
> 


  parent reply	other threads:[~2021-02-02 18:37 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-02  2:03 [PATCHSET v6 00/16] xfs: minor cleanups of the quota functions Darrick J. Wong
2021-02-02  2:03 ` [PATCH 01/16] xfs: fix chown leaking delalloc quota blocks when fssetxattr fails Darrick J. Wong
2021-02-02 13:13   ` Brian Foster
2021-02-02 17:47     ` Darrick J. Wong
2021-02-02 17:56       ` Christoph Hellwig
2021-02-02 18:34       ` Brian Foster [this message]
2021-02-02 19:38   ` [PATCH v6.1 " Darrick J. Wong
2021-02-02  2:03 ` [PATCH 02/16] xfs: clean up quota reservation callsites Darrick J. Wong
2021-02-02  2:03 ` [PATCH 03/16] xfs: create convenience wrappers for incore quota block reservations Darrick J. Wong
2021-02-02  2:03 ` [PATCH 04/16] xfs: remove xfs_trans_unreserve_quota_nblks completely Darrick J. Wong
2021-02-02  2:03 ` [PATCH 05/16] xfs: clean up icreate quota reservation calls Darrick J. Wong
2021-02-02  2:03 ` [PATCH 06/16] xfs: fix up build warnings when quotas are disabled Darrick J. Wong
2021-02-02  2:03 ` [PATCH 07/16] xfs: reduce quota reservation when doing a dax unwritten extent conversion Darrick J. Wong
2021-02-02  2:04 ` [PATCH 08/16] xfs: reserve data and rt quota at the same time Darrick J. Wong
2021-02-02  2:04 ` [PATCH 09/16] xfs: refactor common transaction/inode/quota allocation idiom Darrick J. Wong
2021-02-02  2:04 ` [PATCH 10/16] xfs: allow reservation of rtblocks with xfs_trans_alloc_inode Darrick J. Wong
2021-02-02  2:04 ` [PATCH 11/16] xfs: refactor reflink functions to use xfs_trans_alloc_inode Darrick J. Wong
2021-02-02  2:04 ` [PATCH 12/16] xfs: refactor inode creation transaction/inode/quota allocation idiom Darrick J. Wong
2021-02-02  2:04 ` [PATCH 13/16] xfs: refactor inode ownership change " Darrick J. Wong
2021-02-02  7:04   ` Christoph Hellwig
2021-02-02 13:13   ` Brian Foster
2021-02-02  2:04 ` [PATCH 14/16] xfs: remove xfs_qm_vop_chown_reserve Darrick J. Wong
2021-02-02  7:05   ` Christoph Hellwig
2021-02-02 13:13   ` Brian Foster
2021-02-02  2:04 ` [PATCH 15/16] xfs: rename code to error in xfs_ioctl_setattr Darrick J. Wong
2021-02-02 13:13   ` Brian Foster
2021-02-02  2:04 ` [PATCH 16/16] xfs: shut down the filesystem if we screw up quota errors Darrick J. Wong
2021-02-02 13:13   ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210202183429.GK3336100@bfoster \
    --to=bfoster@redhat.com \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=hch@infradead.org \
    --cc=hch@lst.de \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox