From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>,
linux-xfs@vger.kernel.org, hch@infradead.org,
david@fromorbit.com
Subject: Re: [PATCH 01/16] xfs: fix chown leaking delalloc quota blocks when fssetxattr fails
Date: Tue, 2 Feb 2021 13:34:29 -0500 [thread overview]
Message-ID: <20210202183429.GK3336100@bfoster> (raw)
In-Reply-To: <20210202174726.GM7193@magnolia>
On Tue, Feb 02, 2021 at 09:47:26AM -0800, Darrick J. Wong wrote:
> On Tue, Feb 02, 2021 at 08:13:15AM -0500, Brian Foster wrote:
> > On Mon, Feb 01, 2021 at 06:03:23PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > >
> > > While refactoring the quota code to create a function to allocate inode
> > > change transactions, I noticed that xfs_qm_vop_chown_reserve does more
> > > than just make reservations: it also *modifies* the incore counts
> > > directly to handle the owner id change for the delalloc blocks.
> > >
> > > I then observed that the fssetxattr code continues validating input
> > > arguments after making the quota reservation but before dirtying the
> > > transaction. If the routine decides to error out, it fails to undo the
> > > accounting switch! This leads to incorrect quota reservation and
> > > failure down the line.
> > >
> > > We can fix this by making the reservation function do only that -- for
> > > the new dquot, it reserves ondisk and delalloc blocks to the
> > > transaction, and the old dquot hangs on to its incore reservation for
> > > now. Once we actually switch the dquots, we can then update the incore
> > > reservations because we've dirtied the transaction and it's too late to
> > > turn back now.
> > >
> > > No fixes tag because this has been broken since the start of git.
> > >
> >
> > Do we have any test coverage for this problem?
>
> Working on it, yeah. The tricky part is finding a combination of
> FSSETXATTR values that will reliably trigger an error return. I think
> it can be done by allocating delalloc blocks and then trying to set an
> extent size hint and change the project id in a single call.
>
> > > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > ---
> > > fs/xfs/xfs_qm.c | 91 +++++++++++++++++++++----------------------------------
> > > 1 file changed, 34 insertions(+), 57 deletions(-)
> > >
> > >
> > > diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
> > > index c134eb4aeaa8..322d337b5dca 100644
> > > --- a/fs/xfs/xfs_qm.c
> > > +++ b/fs/xfs/xfs_qm.c
> > > @@ -1785,6 +1785,28 @@ xfs_qm_vop_chown(
> > > xfs_trans_mod_dquot(tp, newdq, bfield, ip->i_d.di_nblocks);
> > > xfs_trans_mod_dquot(tp, newdq, XFS_TRANS_DQ_ICOUNT, 1);
> > >
> > > + /*
> > > + * Back when we made quota reservations for the chown, we reserved the
> > > + * ondisk blocks + delalloc blocks with the new dquot. Now that we've
> > > + * switched the dquots, decrease the new dquot's block reservation
> > > + * (having already bumped up the real counter) so that we don't have
> > > + * any reservation to give back when we commit.
> > > + */
> > > + xfs_trans_mod_dquot(tp, newdq, XFS_TRANS_DQ_RES_BLKS,
> > > + -ip->i_delayed_blks);
> > > +
> >
> > Ok, so the reservation code code below continues to reserve di_blocks
> > and delalloc blocks as it did before, but associates it with the
> > transaction and no longer reduces delalloc reservation from the old
> > dquots. Instead, this function increases the consumption of reserved
> > blocks for di_blocks and then removes the additional quota reservation
> > from the transaction, but not the new dquots. Thus when the transaction
> > commits, this has the side effect of leaving additional reservation on
> > the new dquots that correspond to the delalloc blocks on the inode. Am I
> > following that correctly?
>
> Correct.
>
Thanks.
> > > + /*
> > > + * Give the incore reservation for delalloc blocks back to the old
> > > + * dquot. We don't normally handle delalloc quota reservations
> > > + * transactionally, so just lock the dquot and subtract from the
> > > + * reservation. We've dirtied the transaction, so it's too late to
> > > + * turn back now.
> > > + */
> > > + tp->t_flags |= XFS_TRANS_DIRTY;
> > > + xfs_dqlock(prevdq);
> > > + prevdq->q_blk.reserved -= ip->i_delayed_blks;
> > > + xfs_dqunlock(prevdq);
> > > +
> >
> > What's the reason for not using xfs_trans_reserve_quota_bydquots(NULL,
> > ...) here like the original code?
>
> xfs_trans_reserve_quota_bydquots() makes the caller pass in user, group,
> and project dquots. It's not difficult to add more code to declare and
> route parameters, but that just felt overdone.
>
Ah, Ok.
> Given that this is the only place in the codebase where we want to
> change the incore quota reservation on a single dquot, I also didn't
> think it was worth making a whole new function.
>
> FWIW I don't really mind doing it, it just seemed like more work.
> Alternately I suppose I could expose xfs_trans_dqresv.
>
I have no strong preference. It seems fine to me as it is, the reason
for the change just wasn't clear to me at first glance:
Reviewed-by: Brian Foster <bfoster@redhat.com>
> --D
>
> > Brian
> >
> > > /*
> > > * Take an extra reference, because the inode is going to keep
> > > * this dquot pointer even after the trans_commit.
> > > @@ -1807,84 +1829,39 @@ xfs_qm_vop_chown_reserve(
> > > uint flags)
> > > {
> > > struct xfs_mount *mp = ip->i_mount;
> > > - uint64_t delblks;
> > > unsigned int blkflags;
> > > - struct xfs_dquot *udq_unres = NULL;
> > > - struct xfs_dquot *gdq_unres = NULL;
> > > - struct xfs_dquot *pdq_unres = NULL;
> > > struct xfs_dquot *udq_delblks = NULL;
> > > struct xfs_dquot *gdq_delblks = NULL;
> > > struct xfs_dquot *pdq_delblks = NULL;
> > > - int error;
> > > -
> > >
> > > ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
> > > ASSERT(XFS_IS_QUOTA_RUNNING(mp));
> > >
> > > - delblks = ip->i_delayed_blks;
> > > blkflags = XFS_IS_REALTIME_INODE(ip) ?
> > > XFS_QMOPT_RES_RTBLKS : XFS_QMOPT_RES_REGBLKS;
> > >
> > > if (XFS_IS_UQUOTA_ON(mp) && udqp &&
> > > - i_uid_read(VFS_I(ip)) != udqp->q_id) {
> > > + i_uid_read(VFS_I(ip)) != udqp->q_id)
> > > udq_delblks = udqp;
> > > - /*
> > > - * If there are delayed allocation blocks, then we have to
> > > - * unreserve those from the old dquot, and add them to the
> > > - * new dquot.
> > > - */
> > > - if (delblks) {
> > > - ASSERT(ip->i_udquot);
> > > - udq_unres = ip->i_udquot;
> > > - }
> > > - }
> > > +
> > > if (XFS_IS_GQUOTA_ON(ip->i_mount) && gdqp &&
> > > - i_gid_read(VFS_I(ip)) != gdqp->q_id) {
> > > + i_gid_read(VFS_I(ip)) != gdqp->q_id)
> > > gdq_delblks = gdqp;
> > > - if (delblks) {
> > > - ASSERT(ip->i_gdquot);
> > > - gdq_unres = ip->i_gdquot;
> > > - }
> > > - }
> > >
> > > if (XFS_IS_PQUOTA_ON(ip->i_mount) && pdqp &&
> > > - ip->i_d.di_projid != pdqp->q_id) {
> > > + ip->i_d.di_projid != pdqp->q_id)
> > > pdq_delblks = pdqp;
> > > - if (delblks) {
> > > - ASSERT(ip->i_pdquot);
> > > - pdq_unres = ip->i_pdquot;
> > > - }
> > > - }
> > > -
> > > - error = xfs_trans_reserve_quota_bydquots(tp, ip->i_mount,
> > > - udq_delblks, gdq_delblks, pdq_delblks,
> > > - ip->i_d.di_nblocks, 1, flags | blkflags);
> > > - if (error)
> > > - return error;
> > >
> > > /*
> > > - * Do the delayed blks reservations/unreservations now. Since, these
> > > - * are done without the help of a transaction, if a reservation fails
> > > - * its previous reservations won't be automatically undone by trans
> > > - * code. So, we have to do it manually here.
> > > + * Reserve enough quota to handle blocks on disk and reserved for a
> > > + * delayed allocation. We'll actually transfer the delalloc
> > > + * reservation between dquots at chown time, even though that part is
> > > + * only semi-transactional.
> > > */
> > > - if (delblks) {
> > > - /*
> > > - * Do the reservations first. Unreservation can't fail.
> > > - */
> > > - ASSERT(udq_delblks || gdq_delblks || pdq_delblks);
> > > - ASSERT(udq_unres || gdq_unres || pdq_unres);
> > > - error = xfs_trans_reserve_quota_bydquots(NULL, ip->i_mount,
> > > - udq_delblks, gdq_delblks, pdq_delblks,
> > > - (xfs_qcnt_t)delblks, 0, flags | blkflags);
> > > - if (error)
> > > - return error;
> > > - xfs_trans_reserve_quota_bydquots(NULL, ip->i_mount,
> > > - udq_unres, gdq_unres, pdq_unres,
> > > - -((xfs_qcnt_t)delblks), 0, blkflags);
> > > - }
> > > -
> > > - return 0;
> > > + return xfs_trans_reserve_quota_bydquots(tp, ip->i_mount, udq_delblks,
> > > + gdq_delblks, pdq_delblks,
> > > + ip->i_d.di_nblocks + ip->i_delayed_blks,
> > > + 1, blkflags | flags);
> > > }
> > >
> > > int
> > >
> >
>
next prev parent reply other threads:[~2021-02-02 18:37 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-02-02 2:03 [PATCHSET v6 00/16] xfs: minor cleanups of the quota functions Darrick J. Wong
2021-02-02 2:03 ` [PATCH 01/16] xfs: fix chown leaking delalloc quota blocks when fssetxattr fails Darrick J. Wong
2021-02-02 13:13 ` Brian Foster
2021-02-02 17:47 ` Darrick J. Wong
2021-02-02 17:56 ` Christoph Hellwig
2021-02-02 18:34 ` Brian Foster [this message]
2021-02-02 19:38 ` [PATCH v6.1 " Darrick J. Wong
2021-02-02 2:03 ` [PATCH 02/16] xfs: clean up quota reservation callsites Darrick J. Wong
2021-02-02 2:03 ` [PATCH 03/16] xfs: create convenience wrappers for incore quota block reservations Darrick J. Wong
2021-02-02 2:03 ` [PATCH 04/16] xfs: remove xfs_trans_unreserve_quota_nblks completely Darrick J. Wong
2021-02-02 2:03 ` [PATCH 05/16] xfs: clean up icreate quota reservation calls Darrick J. Wong
2021-02-02 2:03 ` [PATCH 06/16] xfs: fix up build warnings when quotas are disabled Darrick J. Wong
2021-02-02 2:03 ` [PATCH 07/16] xfs: reduce quota reservation when doing a dax unwritten extent conversion Darrick J. Wong
2021-02-02 2:04 ` [PATCH 08/16] xfs: reserve data and rt quota at the same time Darrick J. Wong
2021-02-02 2:04 ` [PATCH 09/16] xfs: refactor common transaction/inode/quota allocation idiom Darrick J. Wong
2021-02-02 2:04 ` [PATCH 10/16] xfs: allow reservation of rtblocks with xfs_trans_alloc_inode Darrick J. Wong
2021-02-02 2:04 ` [PATCH 11/16] xfs: refactor reflink functions to use xfs_trans_alloc_inode Darrick J. Wong
2021-02-02 2:04 ` [PATCH 12/16] xfs: refactor inode creation transaction/inode/quota allocation idiom Darrick J. Wong
2021-02-02 2:04 ` [PATCH 13/16] xfs: refactor inode ownership change " Darrick J. Wong
2021-02-02 7:04 ` Christoph Hellwig
2021-02-02 13:13 ` Brian Foster
2021-02-02 2:04 ` [PATCH 14/16] xfs: remove xfs_qm_vop_chown_reserve Darrick J. Wong
2021-02-02 7:05 ` Christoph Hellwig
2021-02-02 13:13 ` Brian Foster
2021-02-02 2:04 ` [PATCH 15/16] xfs: rename code to error in xfs_ioctl_setattr Darrick J. Wong
2021-02-02 13:13 ` Brian Foster
2021-02-02 2:04 ` [PATCH 16/16] xfs: shut down the filesystem if we screw up quota errors Darrick J. Wong
2021-02-02 13:13 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210202183429.GK3336100@bfoster \
--to=bfoster@redhat.com \
--cc=david@fromorbit.com \
--cc=djwong@kernel.org \
--cc=hch@infradead.org \
--cc=hch@lst.de \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox