From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: xfs <linux-xfs@vger.kernel.org>, Alex Lyakas <alex@zadara.com>,
Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH] xfs: don't commit sunit/swidth updates to disk if that would cause repair failures
Date: Thu, 5 Dec 2019 13:42:22 -0800 [thread overview]
Message-ID: <20191205214222.GE13260@magnolia> (raw)
In-Reply-To: <20191205143618.GA48368@bfoster>
On Thu, Dec 05, 2019 at 09:36:18AM -0500, Brian Foster wrote:
> On Wed, Dec 04, 2019 at 09:03:40AM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> >
> > Alex Lyakas reported[1] that mounting an xfs filesystem with new sunit
> > and swidth values could cause xfs_repair to fail loudly. The problem
> > here is that repair calculates the where mkfs should have allocated the
> > root inode, based on the superblock geometry. The allocation decisions
> > depend on sunit, which means that we really can't go updating sunit if
> > it would lead to a subsequent repair failure on an otherwise correct
> > filesystem.
> >
> > Port the computation code from xfs_repair and teach mount to avoid the
> > ondisk update if it would cause problems for repair. We allow the mount
> > to proceed (and new allocations will reflect this new geometry) because
> > we've never screened this kind of thing before.
> >
> > [1] https://lore.kernel.org/linux-xfs/20191125130744.GA44777@bfoster/T/#m00f9594b511e076e2fcdd489d78bc30216d72a7d
> >
> > Reported-by: Alex Lyakas <alex@zadara.com>
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > v2: compute the root inode location directly
> > ---
> > fs/xfs/libxfs/xfs_ialloc.c | 81 ++++++++++++++++++++++++++++++++++++++++++++
> > fs/xfs/libxfs/xfs_ialloc.h | 1 +
> > fs/xfs/xfs_mount.c | 51 ++++++++++++++++++----------
> > 3 files changed, 115 insertions(+), 18 deletions(-)
> >
> > diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> > index 988cde7744e6..6df9bcc96251 100644
> > --- a/fs/xfs/libxfs/xfs_ialloc.c
> > +++ b/fs/xfs/libxfs/xfs_ialloc.c
> > @@ -2909,3 +2909,84 @@ xfs_ialloc_setup_geometry(
> > else
> > igeo->ialloc_align = 0;
> > }
> > +
> > +/*
> > + * Compute the location of the root directory inode that is laid out by mkfs.
> > + * The @sunit parameter will be copied from the superblock if it is negative.
> > + */
> > +xfs_ino_t
> > +xfs_ialloc_calc_rootino(
> > + struct xfs_mount *mp,
> > + int sunit)
> > +{
> > + struct xfs_ino_geometry *igeo = M_IGEO(mp);
> > + xfs_agino_t first_agino;
> > + xfs_agblock_t first_bno;
> > +
> > + if (sunit < 0)
> > + sunit = mp->m_sb.sb_unit;
> > +
> > + /*
> > + * Pre-calculate the geometry of ag 0. We know what it looks like
> > + * because we know what mkfs does: 2 allocation btree roots (by block
> > + * and by size), the inode allocation btree root, the free inode
> > + * allocation btree root (if enabled) and some number of blocks to
> > + * prefill the agfl.
> > + *
> > + * Because the current shape of the btrees may differ from the current
> > + * shape, we open code the mkfs freelist block count here. mkfs creates
> > + * single level trees, so the calculation is pretty straight forward for
> > + * the trees that use the AGFL.
> > + */
> > +
>
> I know this code is lifted from userspace, but.. "the current shape of
> the btrees may differ from the current shape, .." Eh?
Heh, ok, I'll clean up the comment too.
> > + /* free space by block btree root comes after the ag headers */
> > + first_bno = howmany(4 * mp->m_sb.sb_sectsize, mp->m_sb.sb_blocksize);
> > +
> > + /* free space by length btree root */
> > + first_bno += 1;
> > +
> > + /* inode btree root */
> > + first_bno += 1;
> > +
> > + /* agfl */
> > + first_bno += (2 * min_t(xfs_agblock_t, 2, mp->m_ag_maxlevels)) + 1;
> > +
>
> This is a little subtle from the userspace code. The extra +1 here is
> where we go from pointing at metadata blocks (i.e. bnobt root) to the
> first free block (i.e., past metadata blocks), right? If so, I wonder if
> this should be incorporated either at the beginning or end with a
> comment for explanation (i.e. "Start by pointing at the first block
> after the AG headers and increment by size of applicable metadata to
> locate the first free block ...").
>
> I'm guessing this is a historical artifact of the userspace code as
> features were added. The fact that userspace uses different variables
> somewhat helps self-document in that context, and we lose that here.
Yeah. I'll rework the comments to make it clearer what's going on.
"first_bno is the first block in which we could allocate an inode
chunk after factoring in..."
"...the ag headers"
"...the bnobt / cntbt"
"...the inobt"
etc.
> > + if (xfs_sb_version_hasfinobt(&mp->m_sb))
> > + first_bno++;
> > +
> > + if (xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> > + first_bno++;
> > + /* agfl blocks */
> > + first_bno += min_t(xfs_agblock_t, 2, mp->m_rmap_maxlevels);
> > + }
> > +
> > + if (xfs_sb_version_hasreflink(&mp->m_sb))
> > + first_bno++;
> > +
> > + /*
> > + * If the log is allocated in the first allocation group we need to
> > + * add the number of blocks used by the log to the above calculation.
> > + *
> > + * This can happens with filesystems that only have a single
> > + * allocation group, or very odd geometries created by old mkfs
> > + * versions on very small filesystems.
> > + */
> > + if (mp->m_sb.sb_logstart &&
> > + XFS_FSB_TO_AGNO(mp, mp->m_sb.sb_logstart) == 0)
> > + first_bno += mp->m_sb.sb_logblocks;
> > +
> > + /*
> > + * ditto the location of the first inode chunks in the fs ('/')
> > + */
> > + if (xfs_sb_version_hasdalign(&mp->m_sb) && igeo->ialloc_align > 0) {
> > + first_agino = XFS_AGB_TO_AGINO(mp, roundup(first_bno, sunit));
> > + } else if (xfs_sb_version_hasalign(&mp->m_sb) &&
> > + mp->m_sb.sb_inoalignmt > 1) {
> > + first_agino = XFS_AGB_TO_AGINO(mp,
> > + roundup(first_bno, mp->m_sb.sb_inoalignmt));
> > + } else {
> > + first_agino = XFS_AGB_TO_AGINO(mp, first_bno);
> > + }
> > +
> > + return XFS_AGINO_TO_INO(mp, 0, first_agino);
> > +}
> > diff --git a/fs/xfs/libxfs/xfs_ialloc.h b/fs/xfs/libxfs/xfs_ialloc.h
> > index 323592d563d5..72b3468b97b1 100644
> > --- a/fs/xfs/libxfs/xfs_ialloc.h
> > +++ b/fs/xfs/libxfs/xfs_ialloc.h
> > @@ -152,5 +152,6 @@ int xfs_inobt_insert_rec(struct xfs_btree_cur *cur, uint16_t holemask,
> >
> > int xfs_ialloc_cluster_alignment(struct xfs_mount *mp);
> > void xfs_ialloc_setup_geometry(struct xfs_mount *mp);
> > +xfs_ino_t xfs_ialloc_calc_rootino(struct xfs_mount *mp, int sunit);
> >
> > #endif /* __XFS_IALLOC_H__ */
> > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > index fca65109cf24..a4eb3ae34a84 100644
> > --- a/fs/xfs/xfs_mount.c
> > +++ b/fs/xfs/xfs_mount.c
> > @@ -363,9 +363,10 @@ xfs_readsb(
> > * Update alignment values based on mount options and sb values
> > */
> > STATIC int
> > -xfs_update_alignment(xfs_mount_t *mp)
> > +xfs_update_alignment(
> > + struct xfs_mount *mp)
> > {
> > - xfs_sb_t *sbp = &(mp->m_sb);
> > + struct xfs_sb *sbp = &mp->m_sb;
> >
> > if (mp->m_dalign) {
> > /*
> > @@ -398,28 +399,42 @@ xfs_update_alignment(xfs_mount_t *mp)
> > }
> > }
> >
> > - /*
> > - * Update superblock with new values
> > - * and log changes
> > - */
> > - if (xfs_sb_version_hasdalign(sbp)) {
> > - if (sbp->sb_unit != mp->m_dalign) {
> > - sbp->sb_unit = mp->m_dalign;
> > - mp->m_update_sb = true;
> > - }
> > - if (sbp->sb_width != mp->m_swidth) {
> > - sbp->sb_width = mp->m_swidth;
> > - mp->m_update_sb = true;
> > - }
> > - } else {
> > + /* Update superblock with new values and log changes. */
> > + if (!xfs_sb_version_hasdalign(sbp)) {
> > xfs_warn(mp,
> > "cannot change alignment: superblock does not support data alignment");
> > return -EINVAL;
> > }
> > +
> > + if (sbp->sb_unit == mp->m_dalign &&
> > + sbp->sb_width == mp->m_swidth)
> > + return 0;
> > +
> > + /*
> > + * If the sunit/swidth change would move the precomputed root
> > + * inode value, we must reject the ondisk change because repair
> > + * will stumble over that. However, we allow the mount to
> > + * proceed because we never rejected this combination before.
> > + */
> > + if (sbp->sb_rootino !=
> > + xfs_ialloc_calc_rootino(mp, mp->m_dalign)) {
> > + xfs_warn(mp,
> > + "cannot change stripe alignment: would require moving root inode");
> > +
>
> FWIW, I read this error message as the mount option was ignored. I don't
> much care whether we ignore the mount option or simply the on-disk
> update, but the error could be a bit more clear in the latter case.
Ok, I'll add a message about how we're skipping the sb update.
> Also, what is the expected behavior for xfs_info in the latter
> situation?
A previous revision of the patch had the ioctl feeding xfs_info using
the incore values, but Dave objected so I dropped it.
--D
> Brian
>
> > + /*
> > + * XXX: Next time we add a new incompat feature, this
> > + * should start returning -EINVAL.
> > + */
> > + return 0;
> > + }
> > +
> > + sbp->sb_unit = mp->m_dalign;
> > + sbp->sb_width = mp->m_swidth;
> > + mp->m_update_sb = true;
> > } else if ((mp->m_flags & XFS_MOUNT_NOALIGN) != XFS_MOUNT_NOALIGN &&
> > xfs_sb_version_hasdalign(&mp->m_sb)) {
> > - mp->m_dalign = sbp->sb_unit;
> > - mp->m_swidth = sbp->sb_width;
> > + mp->m_dalign = sbp->sb_unit;
> > + mp->m_swidth = sbp->sb_width;
> > }
> >
> > return 0;
> >
>
next prev parent reply other threads:[~2019-12-05 21:42 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-12-04 17:03 [PATCH] xfs: don't commit sunit/swidth updates to disk if that would cause repair failures Darrick J. Wong
2019-12-05 14:36 ` Brian Foster
2019-12-05 21:42 ` Darrick J. Wong [this message]
2019-12-06 10:57 ` Brian Foster
2019-12-13 16:10 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191205214222.GE13260@magnolia \
--to=darrick.wong@oracle.com \
--cc=alex@zadara.com \
--cc=bfoster@redhat.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox