From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 3/3] xfs: terminate perag iteration reliably on end agno
Date: Fri, 8 Oct 2021 10:45:13 -0400 [thread overview]
Message-ID: <YWBZef87p55+XKNh@bfoster> (raw)
In-Reply-To: <20211007230259.GG54211@dread.disaster.area>
On Fri, Oct 08, 2021 at 10:02:59AM +1100, Dave Chinner wrote:
> On Thu, Oct 07, 2021 at 08:50:53AM -0400, Brian Foster wrote:
> > The for_each_perag*() set of macros are hacky in that some (i.e. those
> > based on sb_agcount) rely on the assumption that perag iteration
> > terminates naturally with a NULL perag at the specified end agno. Others
> > allow for the final AG to have a valid perag and require the calling
> > function to clean up any potential leftover xfs_perag reference on
> > termination of the loop.
> >
> > Aside from providing a subtly inconsistent interface, the former variant
> > is racy with a potential growfs in progress because growfs can create
> > discoverable post-eofs perags before the final superblock update that
> > completes the grow operation and increases sb_agcount. This leads to
> > unexpected assert failures (reproduced by xfs/104) such as the following
> > in the superblock buffer write verifier path:
> >
> > XFS: Assertion failed: agno < mp->m_sb.sb_agcount, file: fs/xfs/libxfs/xfs_types.c, line: 22
>
> Yeah, that's a bad assert. It's not valid in the context of grow or
> shrink or any of the future advanced per-ag management things we
> want to do.
>
I think it depends on the context. I don't think it's unreasonable to
expect certain paths to not want to process post-eofs perags. I don't
really like the placement of this assert tbh (with it being in a generic
helper) and the sb write verifier is probably not ideal context for the
check in a generic sense, but it does flag unexpected behavior when you
consider the higher level iteration is based on sb_agcount.
> I'm ok with the change being proposed as a expedient bug fix, but
> I'll note that the approach taken to fix it is not compatible with
> future plans for managing shrink and perag operations. I'll comment
> on the patch first, then the rest of the email is commentary about
> how xfs_perag_get() is intended to be used...
>
> > diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
> > index d05c9217c3af..edcdd4fbc225 100644
> > --- a/fs/xfs/libxfs/xfs_ag.h
> > +++ b/fs/xfs/libxfs/xfs_ag.h
> > @@ -116,34 +116,30 @@ void xfs_perag_put(struct xfs_perag *pag);
> >
> > /*
> > * Perag iteration APIs
> > - *
> > - * XXX: for_each_perag_range() usage really needs an iterator to clean up when
> > - * we terminate at end_agno because we may have taken a reference to the perag
> > - * beyond end_agno. Right now callers have to be careful to catch and clean that
> > - * up themselves. This is not necessary for the callers of for_each_perag() and
> > - * for_each_perag_from() because they terminate at sb_agcount where there are
> > - * no perag structures in tree beyond end_agno.
>
> We still really need an iterator for the range iterations so that we
> can have a consistent set of behaviours for all iterations and
> don't need a special case just for the "mid walk break" where the
> code keeps the active reference to the perag for itself...
>
Ok, but what exactly are you referring to by "an iterator" beyond what
we have here to this point? A walker function with a callback or
something? And why wouldn't we have done that in the first place instead
of introducing the API wart documented above?
> > */
> > static inline
> > struct xfs_perag *xfs_perag_next(
> > struct xfs_perag *pag,
> > - xfs_agnumber_t *agno)
> > + xfs_agnumber_t *agno,
> > + xfs_agnumber_t end_agno)
> > {
> > struct xfs_mount *mp = pag->pag_mount;
> >
> > *agno = pag->pag_agno + 1;
> > xfs_perag_put(pag);
> > - pag = xfs_perag_get(mp, *agno);
> > + pag = NULL;
> > + if (*agno <= end_agno)
> > + pag = xfs_perag_get(mp, *agno);
> > return pag;
>
> *agno = pag->pag_agno + 1;
> xfs_perag_put(pag);
> if (*agno > end_agno)
> return NULL;
> return xfs_perag_get(mp, *agno);
>
Will fix.
> > }
> >
> > #define for_each_perag_range(mp, agno, end_agno, pag) \
> > for ((pag) = xfs_perag_get((mp), (agno)); \
> > - (pag) != NULL && (agno) <= (end_agno); \
> > - (pag) = xfs_perag_next((pag), &(agno)))
> > + (pag) != NULL; \
> > + (pag) = xfs_perag_next((pag), &(agno), (end_agno)))
> >
> > #define for_each_perag_from(mp, agno, pag) \
> > - for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount, (pag))
> > + for_each_perag_range((mp), (agno), (mp)->m_sb.sb_agcount - 1, (pag))
>
> Isn't this one line the entire bug fix right here? i.e. the
> factoring is largely unnecessary, the grow race bug is fixed by just
> this one-liner?
>
No, the reference count problems can still occur regardless of this
particular change.
...
>
> > The following assert failure occasionally triggers during the xfs_perag
> > free path on unmount, presumably because one of the many
> > for_each_perag() loops in the code that is expected to terminate with a
> > NULL pag raced with a growfs and actually terminated with a non-NULL
> > reference to post-eofs (at the time) perag.
> >
> > XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/libxfs/xfs_ag.c, line: 195
> >
> > Rework the lower level perag iteration logic to explicitly terminate
> > on the specified end agno, not implicitly rely on pag == NULL as a
> > termination clause and thus avoid these problems.
>
> IMO, this just hides the symptom that results from code that isn't
> handling unexpected adverse loop termination correctly. The
> iterators are going to get more complex in the near future, so we
> really need them to have a robust iterator API that does all the
> cleanup work correctly, rather than try to hide it all in a
> increasingly complex for loop construct.
>
Any loop that uses one of the sb_agcount based iteration macros in a
context that can race with growfs and doesn't check pag != NULL post
loop is not handling loop termination correctly. The sb_agcount check
effectively builds in an early termination vector to every such usage,
because we can't guarantee that pag == NULL when the sb_agcount check
causes loop termination.
IOW, the following usage pattern documented in the comment above is not
universally correct/safe:
for_each_perag(...) {
/* no early termination */
}
/* no perag check because no early termination */
... because for_each_perag() is not implemented correctly to guarantee
that pag == NULL on exit of the loop. This is a simple logic bug with a
simple fix.
Brian
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
next prev parent reply other threads:[~2021-10-08 14:45 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-07 12:50 [PATCH 0/3] xfs: fix perag iteration raciness Brian Foster
2021-10-07 12:50 ` [PATCH 1/3] xfs: fold perag loop iteration logic into helper function Brian Foster
2021-10-07 12:50 ` [PATCH 2/3] xfs: rename the next_agno perag iteration variable Brian Foster
2021-10-07 12:50 ` [PATCH 3/3] xfs: terminate perag iteration reliably on end agno Brian Foster
2021-10-07 23:02 ` Dave Chinner
2021-10-08 14:45 ` Brian Foster [this message]
2021-10-08 21:20 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YWBZef87p55+XKNh@bfoster \
--to=bfoster@redhat.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.