From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: linux-xfs <linux-xfs@vger.kernel.org>
Subject: Re: [PATCH 05/14] xfs: repair the rmapbt
Date: Wed, 6 Jun 2018 14:13:34 -0700 [thread overview]
Message-ID: <20180606211334.GA25007@magnolia> (raw)
In-Reply-To: <CAOQ4uxhz5M_3iYgtdsD-t9PDTy3mGXYfSqy0iYHwujPw-5dbng@mail.gmail.com>
On Thu, May 31, 2018 at 08:42:06AM +0300, Amir Goldstein wrote:
> On Wed, May 30, 2018 at 10:31 PM, Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> >
> > Rebuild the reverse mapping btree from all primary metadata.
> >
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > fs/xfs/Makefile | 1
> > fs/xfs/scrub/common.c | 6
> > fs/xfs/scrub/repair.c | 119 +++++++
> > fs/xfs/scrub/repair.h | 27 +
> > fs/xfs/scrub/rmap.c | 6
> > fs/xfs/scrub/rmap_repair.c | 796 ++++++++++++++++++++++++++++++++++++++++++++
> > fs/xfs/scrub/scrub.c | 18 +
> > fs/xfs/scrub/scrub.h | 2
> > fs/xfs/xfs_mount.h | 1
> > fs/xfs/xfs_super.c | 27 +
> > fs/xfs/xfs_trans.c | 7
> > 11 files changed, 1004 insertions(+), 6 deletions(-)
> > create mode 100644 fs/xfs/scrub/rmap_repair.c
> >
> >
> > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> > index 7c442f83b179..b9bbac3d5075 100644
> > --- a/fs/xfs/Makefile
> > +++ b/fs/xfs/Makefile
> > @@ -178,6 +178,7 @@ xfs-y += $(addprefix scrub/, \
> > alloc_repair.o \
> > ialloc_repair.o \
> > repair.o \
> > + rmap_repair.o \
> > )
> > endif
> > endif
> > diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> > index 89938b328954..f92994716522 100644
> > --- a/fs/xfs/scrub/common.c
> > +++ b/fs/xfs/scrub/common.c
> > @@ -603,9 +603,13 @@ xfs_scrub_trans_alloc(
> > struct xfs_scrub_context *sc,
> > uint resblks)
> > {
> > + uint flags = 0;
> > +
> > + if (sc->fs_frozen)
> > + flags |= XFS_TRANS_NO_WRITECOUNT;
> > if (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)
> > return xfs_trans_alloc(sc->mp, &M_RES(sc->mp)->tr_itruncate,
> > - resblks, 0, 0, &sc->tp);
> > + resblks, 0, flags, &sc->tp);
> >
> > return xfs_trans_alloc_empty(sc->mp, &sc->tp);
> > }
> > diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
> > index 45a91841c0ac..4b5d599d53b9 100644
> > --- a/fs/xfs/scrub/repair.c
> > +++ b/fs/xfs/scrub/repair.c
> > @@ -43,6 +43,8 @@
> > #include "xfs_ag_resv.h"
> > #include "xfs_trans_space.h"
> > #include "xfs_quota.h"
> > +#include "xfs_bmap.h"
> > +#include "xfs_bmap_util.h"
> > #include "scrub/xfs_scrub.h"
> > #include "scrub/scrub.h"
> > #include "scrub/common.h"
> > @@ -1146,3 +1148,120 @@ xfs_repair_mod_ino_counts(
> > (int64_t)freecount - old_freecount);
> > }
> > }
> > +
> > +/*
> > + * Freeze the FS against all other activity so that we can avoid ABBA
> > + * deadlocks while taking locks in unusual orders so that we can rebuild
> > + * metadata structures such as the rmapbt.
> > + */
> > +int
> > +xfs_repair_fs_freeze(
> > + struct xfs_scrub_context *sc)
> > +{
> > + int error;
> > +
> > + error = freeze_super(sc->mp->m_super);
> > + if (error)
> > + return error;
> > + sc->fs_frozen = true;
> > + return 0;
> > +}
> > +
> > +/* Unfreeze the FS. */
> > +int
> > +xfs_repair_fs_thaw(
> > + struct xfs_scrub_context *sc)
> > +{
> > + struct inode *inode, *o;
> > + int error;
> > +
> > + sc->fs_frozen = false;
> > + error = thaw_super(sc->mp->m_super);
> > +
> > + inode = sc->frozen_inode_list;
> > + while (inode) {
> > + o = inode->i_private;
> > + inode->i_private = NULL;
> > + iput(inode);
> > + inode = o;
> > + }
> > +
> > + return error;
> > +}
> > +
>
>
> I think that new mechanism is worth a mention in the commit message,
> if not a patch of its own with cc to fsdevel.
> In a discussion on said patch I would ask: how does xfs_repair_fs_freeze()
> work in collaboration with user initiated fsfreeze?
> Is there a situation where LVM can be fooled to think that XFS is really
> frozen, but it is actually "repair frozen"? and metadata can change while
> taking a snapshot?
Notice how xfs added a ->freeze_super handler to the superblock
operations that prohibits userspace from initiating a freeze while any
repair operations are running? If userspace (lvm, etc.) try to initiate
a freeze while repair is running, the freeze attempt is kicked back to
userspace with -EBUSY. Similarly, a new xfs ->thaw_super handler
prevents userspace from unfreezing while repair is running.
Granted, the current patch doesn't quite work right either; I've
replaced m_scrubbers with a mutex that repair holds for the duration of
the repair freeze; this way regular freeze/thaw requests will block
until the repair is finished.
> This is why I suggested to add a VFS freeze level, e.g.
> SB_FREEZE_FS_MAINTAINANCE so that you don't publish XFS state
> as SB_FREEZE_COMPLETE while you are modifying metadata on disk.
> It might be sufficient to get XFS to state SB_FREEZE_COMPLETE and
> then up only to SB_FREEZE_FS in xfs_repair_fs_freeze() without
> adding any new states.
I tried that, and it didn't work. We actually /do/ want to be at
SB_FREEZE_COMPLETE so that repair is the /only/ thread that can change
any filesystem state. Under normal circumstances, XFS transaction
allocation will block until the SB_FREEZE_COMPLETE condition clears.
This stops any background space reclamation from happening, at least if
it requires a transaction. Online repair of course grants itself the
ability to run transactions even during FREEZE_COMPLETE.
Will the following comment (to be embedded in the repair code) explain
this all sufficiently?
/*
* Freezing the Filesystem for a Repair
* ====================================
*
* While most repair activity can occur while the filesystem is live,
* there are certain scenarios where we cannot tolerate concurrent
* metadata updates. We therefore must freeze the filesystem against
* all other changes.
*
* The typical scenarios envisioned for repair freezes are (a) to avoid
* ABBA deadlocks when need to take locks in an unusual order; or (b) to
* update global filesystem state. For example, reconstruction of a
* damaged reverse mapping btree requires us to hold the AG header locks
* while scanning inodes, which goes against the usual inode -> AG
* header locking order.
*
* A note about inode reclaim: when we freeze the filesystem, users
* can't modify things and periodic background reclaim of speculative
* preallocations and copy-on-write staging extents is stopped.
* However, the repair thread must be careful about evicting an inode
* from memory -- if the eviction would require a transaction, we must
* defer the iput until after the repair freeze. The reasons for this
* are twofold: first, repair already has a transaction and xfs can't
* nest transactions; and second, we froze the fs to prevent
* modifications that repair doesn't control directly.
*
* Userspace is prevented from freezing or thawing the filesystem during
* a repair freeze by the ->freeze_super and ->thaw_super superblock
* operations, which block any changes to the freeze state while a
* repair freeze is running through the use of the m_repair_freeze
* mutex. It only makes sense to run one repair freeze at a time, so
* the mutex is fine.
*
* Repair freezes cannot be initiated during a regular freeze because
* freeze_super does not allow nested freeze. Repair activity that does
* not require a repair freeze is also prevented from running during a
* regular freeze because transaction allocation blocks on the regular
* freeze. We assume that the only other users of
* XFS_TRANS_NO_WRITECOUNT transactions either aren't modifying space
* metadata in a way that would affect repair, or that we can inhibit
* any of the ones that do.
*/
--D
>
> Thanks,
> Amir.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2018-06-06 21:13 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-05-30 19:30 [PATCH v15.2 00/14] xfs-4.18: online repair support Darrick J. Wong
2018-05-30 19:30 ` [PATCH 01/14] xfs: repair the AGF and AGFL Darrick J. Wong
2018-06-04 1:52 ` Dave Chinner
2018-06-05 23:18 ` Darrick J. Wong
2018-06-06 4:06 ` Dave Chinner
2018-06-06 4:56 ` Darrick J. Wong
2018-06-07 0:31 ` Dave Chinner
2018-06-07 4:42 ` Darrick J. Wong
2018-06-08 0:55 ` Dave Chinner
2018-06-08 1:23 ` Darrick J. Wong
2018-05-30 19:30 ` [PATCH 02/14] xfs: repair the AGI Darrick J. Wong
2018-06-04 1:56 ` Dave Chinner
2018-06-05 23:54 ` Darrick J. Wong
2018-05-30 19:30 ` [PATCH 03/14] xfs: repair free space btrees Darrick J. Wong
2018-06-04 2:12 ` Dave Chinner
2018-06-06 1:50 ` Darrick J. Wong
2018-06-06 3:34 ` Dave Chinner
2018-06-06 4:01 ` Darrick J. Wong
2018-05-30 19:31 ` [PATCH 04/14] xfs: repair inode btrees Darrick J. Wong
2018-06-04 3:41 ` Dave Chinner
2018-06-06 3:55 ` Darrick J. Wong
2018-06-06 4:32 ` Dave Chinner
2018-06-06 4:58 ` Darrick J. Wong
2018-05-30 19:31 ` [PATCH 05/14] xfs: repair the rmapbt Darrick J. Wong
2018-05-31 5:42 ` Amir Goldstein
2018-06-06 21:13 ` Darrick J. Wong [this message]
2018-05-30 19:31 ` [PATCH 06/14] xfs: repair refcount btrees Darrick J. Wong
2018-05-30 19:31 ` [PATCH 07/14] xfs: repair inode records Darrick J. Wong
2018-05-30 19:31 ` [PATCH 08/14] xfs: zap broken inode forks Darrick J. Wong
2018-05-30 19:31 ` [PATCH 09/14] xfs: repair inode block maps Darrick J. Wong
2018-05-30 19:31 ` [PATCH 10/14] xfs: repair damaged symlinks Darrick J. Wong
2018-05-30 19:31 ` [PATCH 11/14] xfs: repair extended attributes Darrick J. Wong
2018-05-30 19:31 ` [PATCH 12/14] xfs: scrub should set preen if attr leaf has holes Darrick J. Wong
2018-05-30 19:32 ` [PATCH 13/14] xfs: repair quotas Darrick J. Wong
2018-05-30 19:32 ` [PATCH 14/14] xfs: implement live quotacheck as part of quota repair Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180606211334.GA25007@magnolia \
--to=darrick.wong@oracle.com \
--cc=amir73il@gmail.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox