From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: xfs <linux-xfs@vger.kernel.org>
Subject: Re: [PATCH] xfs: use ordered buffers to initialize dquot buffers during quotacheck
Date: Wed, 13 May 2020 10:06:47 -0700 [thread overview]
Message-ID: <20200513170647.GE1984748@magnolia> (raw)
In-Reply-To: <20200513161223.GA45326@bfoster>
On Wed, May 13, 2020 at 12:12:23PM -0400, Brian Foster wrote:
> On Wed, May 13, 2020 at 08:26:28AM -0700, Darrick J. Wong wrote:
> > On Wed, May 13, 2020 at 09:14:15AM -0400, Brian Foster wrote:
> > > On Tue, May 12, 2020 at 02:00:33PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > >
> > > > While QAing the new xfs_repair quotacheck code, I uncovered a quota
> > > > corruption bug resulting from a bad interaction between dquot buffer
> > > > initialization and quotacheck. The bug can be reproduced with the
> > > > following sequence:
> > > >
> > > > # mkfs.xfs -f /dev/sdf
> > > > # mount /dev/sdf /opt -o usrquota
> > > > # su nobody -s /bin/bash -c 'touch /opt/barf'
> > > > # sync
> > > > # xfs_quota -x -c 'report -ahi' /opt
> > > > User quota on /opt (/dev/sdf)
> > > >                         Inodes
> > > > User ID      Used   Soft   Hard Warn/Grace
> > > > ---------- ---------------------------------
> > > > root            3      0      0  00 [------]
> > > > nobody          1      0      0  00 [------]
> > > >
> > > > # xfs_io -x -c 'shutdown' /opt
> > > > # umount /opt
> > > > # mount /dev/sdf /opt -o usrquota
> > > > # touch /opt/man2
> > > > # xfs_quota -x -c 'report -ahi' /opt
> > > > User quota on /opt (/dev/sdf)
> > > >                         Inodes
> > > > User ID      Used   Soft   Hard Warn/Grace
> > > > ---------- ---------------------------------
> > > > root            1      0      0  00 [------]
> > > > nobody          1      0      0  00 [------]
> > > >
> > > > # umount /opt
> > > >
> > > > Notice how the initial quotacheck set the root dquot icount to 3
> > > > (rootino, rbmino, rsumino), but after shutdown -> remount -> recovery,
> > > > xfs_quota reports that the root dquot has only 1 icount. We haven't
> > > > deleted anything from the filesystem, which means that quota is now
> > > > under-counting. This behavior is not limited to icount or the root
> > > > dquot, but this is the shortest reproducer.
> > > >
> > > > I traced the cause of this discrepancy to the way that we handle ondisk
> > > > dquot updates during quotacheck vs. regular fs activity. Normally, when
> > > > we allocate a disk block for a dquot, we log the buffer as a regular
> > > > (dquot) buffer. Subsequent updates to the dquots backed by that block
> > > > are done via separate dquot log item updates, which means that they
> > > > depend on the logged buffer update being written to disk before the
> > > > dquot items. Because individual dquots have their own LSN fields, that
> > > > initial dquot buffer must always be recovered.
> > > >
> > > > However, the story changes for quotacheck, which can cause dquot block
> > > > allocations but persists the final dquot counter values via a delwri
> > > > list. Because recovery doesn't gate dquot buffer replay on an LSN, this
> > > > means that the initial dquot buffer can be replayed over the (newer)
> > > > contents that were delwritten at the end of quotacheck. In effect, this
> > > > re-initializes the dquot counters after they've been updated. If the
> > > > log does not contain any other dquot items to recover, the obsolete
> > > > dquot contents will not be corrected by log recovery.
> > > >
> > > > Because quotacheck uses a transaction to log the setting of the CHKD
> > > > flags in the superblock, we skip quotacheck during the second mount
> > > > call, which allows the incorrect icount to remain.
> > > >
> > > > Fix this by changing the ondisk dquot initialization function to use
> > > > ordered buffers to write out fresh dquot blocks if it detects that we're
> > > > running quotacheck. If the system goes down before quotacheck can
> > > > complete, the CHKD flags will not be set in the superblock and the next
> > > > mount will run quotacheck again, which can fix uninitialized dquot
> > > > buffers. This requires amending the defer code to maintain ordered
> > > > buffer state across defer rolls for the sake of the dquot allocation
> > > > code.
> > > >
> > > > For regular operations we preserve the current behavior since the dquot
> > > > items require properly initialized ondisk dquot records.
> > > >
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > >  fs/xfs/libxfs/xfs_defer.c |   10 ++++++++-
> > > >  fs/xfs/xfs_dquot.c        |   51 ++++++++++++++++++++++++++++++++++-----------
> > > >  2 files changed, 47 insertions(+), 14 deletions(-)
> > > >
> > > ...
> > > > diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
> > > > index 7f39dd24d475..cf8b2f4de587 100644
> > > > --- a/fs/xfs/xfs_dquot.c
> > > > +++ b/fs/xfs/xfs_dquot.c
> > > ...
> > > > @@ -277,11 +279,34 @@ xfs_qm_init_dquot_blk(
> > > > }
> > > > }
> > > >
> > > > - xfs_trans_dquot_buf(tp, bp,
> > > > - (type & XFS_DQ_USER ? XFS_BLF_UDQUOT_BUF :
> > > > - ((type & XFS_DQ_PROJ) ? XFS_BLF_PDQUOT_BUF :
> > > > - XFS_BLF_GDQUOT_BUF)));
> > > > - xfs_trans_log_buf(tp, bp, 0, BBTOB(q->qi_dqchunklen) - 1);
> > > > + if (type & XFS_DQ_USER) {
> > > > + qflag = XFS_UQUOTA_CHKD;
> > > > + blftype = XFS_BLF_UDQUOT_BUF;
> > > > + } else if (type & XFS_DQ_PROJ) {
> > > > + qflag = XFS_PQUOTA_CHKD;
> > > > + blftype = XFS_BLF_PDQUOT_BUF;
> > > > + } else {
> > > > + qflag = XFS_GQUOTA_CHKD;
> > > > + blftype = XFS_BLF_GDQUOT_BUF;
> > > > + }
> > > > +
> > > > + xfs_trans_dquot_buf(tp, bp, blftype);
> > > > +
> > > > + /*
> > > > + * If the CHKD flag isn't set, we're running quotacheck and need to use
> > > > + * ordered buffers so that the logged initialization buffer does not
> > > > + * get replayed over the delwritten quotacheck buffer. If we crash
> > > > + * before the end of quotacheck, the CHKD flags will not be set in the
> > > > + * superblock and we'll re-run quotacheck at next mount.
> > > > + *
> > > > + * Outside of quotacheck, dquot updates are logged via dquot items and
> > > > + * we must use the regular buffer logging mechanisms to ensure that the
> > > > + * initial buffer state is recovered before dquot items.
> > > > + */
> > > > + if (mp->m_qflags & qflag)
> > > > + xfs_trans_log_buf(tp, bp, 0, BBTOB(q->qi_dqchunklen) - 1);
> > > > + else
> > > > + xfs_trans_ordered_buf(tp, bp);
> > >
> > > Ok, I think I follow what's going on here. quotacheck runs and allocates
> > > a dquot block and inits it in a transaction. Some time later quotacheck
> > > updates dquots in said block and queues the buffer in its own delwri
> > > list. Quotacheck completes, the buffer is written back and then the
> > > filesystem immediately crashes. We replay the content of the alloc/init
> > > transaction over the updated content in the block on disk. This isn't a
> > > problem outside of quotacheck because a subsequent dquot update would
> > > have also been logged instead of directly written back.
> >
> > Correct.
> >
> > > Therefore, the purpose of the change above is primarily to avoid logging
> > > the init of the dquot buffer during quotacheck. That makes sense, but
> > > what about the integrity of this particular transaction? For example,
> > > what happens if this transaction commits to the on-disk log and we crash
> > > before quotacheck completes and the buffer is written back? I'm assuming
> > > recovery would replay the dquot allocation but not the initialization,
> > > then quotacheck would run again and find the buffer in an unknown state
> > > (perhaps failing a read verifier?). Hm?
> >
> > Yes, the read verifier can fail in quotacheck, but quotacheck reacts to
> > that by re-trying the buffer read with a NULL buffer ops (in
> > xfs_qm_reset_dqcounts_all). Next, it calls xfs_qm_reset_dqcounts, which
> > calls xfs_dqblk_repair to reinitialize the dquot records. After that,
> > xfs_qm_reset_dqcounts resets even more of the dquot state (counters,
> > timers, flags, warns).
> >
>
> Ok, I was thinking we might get back to this "read or allocate" path but
> it sounds like the earlier reset phase reads every chunk, does this
> verifier dance and reinits if necessary. It's somewhat unfortunate that
> we can create those conditions intentionally, but I suppose that's
> better than the current state in that we at least recover from it. We
> should probably document that somewhere to explain why ordering the
> buffer like this is safe in quotacheck context.
Yeah, I was trying to get everyone there in the comment above, but maybe
this ought to be more explicitly stated in the comment. The current
version of that looks like:
	/*
	 * When quotacheck runs, we use delayed writes to update all the
	 * dquots on disk in an efficient manner instead of logging the
	 * individual dquot changes as they are made.
	 *
	 * Hence if we log the buffer that we allocate here, then crash
	 * post-quotacheck while the logged initialisation is still in
	 * the active region of the log, we can lose the information
	 * quotacheck wrote directly to the buffer. That is, log
	 * recovery will replay the dquot buffer initialisation over the
	 * top of whatever information quotacheck had written to the
	 * buffer.
	 *
	 * To avoid this problem, dquot allocation during quotacheck
	 * needs to avoid logging the initialised buffer, but we still
	 * need to have writeback of the buffer pin the tail of the log
	 * so that it is initialised on disk before we remove the
	 * allocation transaction from the active region of the log.
	 * Marking the buffer as ordered instead of logging it provides
	 * this behaviour.
	 *
	 * If we crash before quotacheck completes, a subsequent
	 * quotacheck run will re-allocate and re-initialize the dquot
	 * records as needed.
	 */
	if (!(mp->m_qflags & qflag))
		xfs_trans_ordered_buf(tp, bp);
	else
		xfs_trans_log_buf(tp, bp, 0, BBTOB(q->qi_dqchunklen) - 1);
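
To make the failure mode concrete, here is a toy model of the recovery
interaction (plain Python, not XFS code; the dict-based "disk", "log",
and counter names are invented purely for illustration):

```python
# Toy model: why replaying a logged dquot-buffer init over quotacheck's
# delwri writeback loses counts, and why an ordered (unlogged) buffer
# avoids it. Nothing here corresponds to real XFS structures.

def recover(disk, log):
    """Replay every logged buffer image over the current disk contents."""
    for buf_image in log:
        disk.update(buf_image)
    return disk

# Case 1: the freshly initialized dquot block is logged during
# quotacheck. The log holds the zeroed init image; the delwri
# writeback put the final counts on disk before the crash.
log = [{"root_icount": 0}]            # logged init: counters zeroed
disk = {"root_icount": 3}             # delwri writeback: final counts
after_crash = recover(dict(disk), log)
assert after_crash["root_icount"] == 0   # replay clobbers newer data

# Case 2: the init buffer is ordered, so it pins the log tail but
# contributes no image for recovery to replay over the delwri write.
after_crash = recover(dict(disk), [])
assert after_crash["root_icount"] == 3   # final counts survive
```

Case 1 is the under-count in the reproducer above; case 2 is what the
ordered-buffer path gives us.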
--D
> Brian
>
> > In short, quotacheck will fix anything it doesn't like about the dquot
> > buffers that are attached to the quota inodes.
> >
> > --D
> >
> > > Brian
> > >
> > > > }
> > > >
> > > > /*
> > > >
> > >
> >
>
Thread overview: 8+ messages
2020-05-12 21:00 [PATCH] xfs: use ordered buffers to initialize dquot buffers during quotacheck Darrick J. Wong
2020-05-13 0:06 ` Dave Chinner
2020-05-13 0:25 ` Darrick J. Wong
2020-05-13 0:38 ` Dave Chinner
2020-05-13 13:14 ` Brian Foster
2020-05-13 15:26 ` Darrick J. Wong
2020-05-13 16:12 ` Brian Foster
2020-05-13 17:06 ` Darrick J. Wong [this message]