From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH v2 18/21] xfs: cross-reference the rmapbt data with the refcountbt
Date: Tue, 16 Jan 2018 11:47:04 -0800 [thread overview]
Message-ID: <20180116194704.GT5602@magnolia> (raw)
In-Reply-To: <20180116064955.GI5602@magnolia>
On Mon, Jan 15, 2018 at 10:49:55PM -0800, Darrick J. Wong wrote:
> On Tue, Jan 16, 2018 at 10:49:28AM +1100, Dave Chinner wrote:
> > On Tue, Jan 09, 2018 at 01:25:17PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > >
> > > Cross reference the refcount data with the rmap data to check that the
> > > number of rmaps for a given block match the refcount of that block, and
> > > that CoW blocks (which are owned entirely by the refcountbt) are tracked
> > > as well.
> > >
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > > v2: streamline scrubber arguments, remove stack allocated objects
> > > ---
> > > fs/xfs/scrub/refcount.c | 318 +++++++++++++++++++++++++++++++++++++++++++++++
> > > 1 file changed, 316 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/xfs/scrub/refcount.c b/fs/xfs/scrub/refcount.c
> > > index 700f8f1..df18e47 100644
> > > --- a/fs/xfs/scrub/refcount.c
> > > +++ b/fs/xfs/scrub/refcount.c
> > > @@ -50,6 +50,274 @@ xfs_scrub_setup_ag_refcountbt(
> > >
> > > /* Reference count btree scrubber. */
> > >
> > > +/*
> > > + * Confirming Reference Counts via Reverse Mappings
> > > + *
> > > + * We want to count the reverse mappings overlapping a refcount record
> > > + * (bno, len, refcount), allowing for the possibility that some of the
> > > + * overlap may come from smaller adjoining reverse mappings, while some
> > > + * comes from single extents which overlap the range entirely. The
> > > + * outer loop is as follows:
> > > + *
> > > + * 1. For all reverse mappings overlapping the refcount extent,
> > > + * a. If a given rmap completely overlaps, mark it as seen.
> > > + * b. Otherwise, record the fragment for later processing.
> > > + *
> > > + * Once we've seen all the rmaps, we know that for all blocks in the
> > > + * refcount record we want to find $refcount owners and we've already
> > > + * visited $seen extents that overlap all the blocks. Therefore, we
> > > + * need to find ($refcount - $seen) owners for every block in the
> > > + * extent; call that quantity $target_nr. Proceed as follows:
> > > + *
> > > + * 2. Pull the first $target_nr fragments from the list; all of them
> > > + * should start at or before the start of the extent.
> > > + * Call this subset of fragments the working set.
> > > + * 3. Until there are no more unprocessed fragments,
> > > + * a. Find the shortest fragments in the set and remove them.
> > > + * b. Note the block number of the end of these fragments.
> > > + * c. Pull the same number of fragments from the list. All of these
> > > + * fragments should start at the block number recorded in the
> > > + * previous step.
> > > + * d. Put those fragments in the set.
> > > + * 4. Check that there are $target_nr fragments remaining in the list,
> > > + * and that they all end at or beyond the end of the refcount extent.
> > > + *
> > > + * If the refcount is correct, all the check conditions in the algorithm
> > > + * should always hold true. If not, the refcount is incorrect.
> >
> > This needs a comment somewhere in here describing the order of
> > the records on the fragment list. AFAICT, it's ordered by start
> > bno, but I'm not 100% sure and it seems the code is dependent on
> > strict ordering of records the frag list....
>
> Yes, the list must be ordered by agbno, which should be the case if we
> iterated the rmap records in order. We /could/ potentially list_sort
> to ensure that this code doesn't blow up even if the rmapbt decides to
> feed us out of order garbage.
>
> Could? Should.
Hm. In theory the rmap should never feed us out of order rmaps, but if
it does that's an xref corruption, so we can employ a computationally
less expensive order check (over list_sort) and bail out if the list
is out of order.
> > .....
> >
> > > + } else {
> > > + /*
> > > + * This rmap covers only part of the refcount record, so
> > > + * save the fragment for later processing.
> > > + */
> > > + frag = kmem_alloc(sizeof(struct xfs_scrub_refcnt_frag),
> > > + KM_MAYFAIL | KM_NOFS);
> > > + if (!frag)
> > > + return -ENOMEM;
> > > + memcpy(&frag->rm, rec, sizeof(frag->rm));
> > > + list_add_tail(&frag->list, &refchk->fragments);
> >
> > I'm making the assumption here that we're seeing records in the
> > order they are in the rmap tree and that it's in increase startblock
> > order, hence the list is ordered that way....
> >
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +/*
> > > + * Given a bunch of rmap fragments, iterate through them, keeping
> > > + * a running tally of the refcount. If this ever deviates from
> > > + * what we expect (which is the refcountbt's refcount minus the
> > > + * number of extents that totally covered the refcountbt extent),
> > > + * we have a refcountbt error.
> > > + */
> > > +STATIC void
> > > +xfs_scrub_refcountbt_process_rmap_fragments(
> > > + struct xfs_scrub_refcnt_check *refchk)
> > > +{
> > > + struct list_head worklist;
> > > + struct xfs_scrub_refcnt_frag *frag;
> > > + struct xfs_scrub_refcnt_frag *n;
> > > + xfs_agblock_t bno;
> > > + xfs_agblock_t rbno;
> > > + xfs_agblock_t next_rbno;
> > > + xfs_nlink_t nr;
> > > + xfs_nlink_t target_nr;
> > > +
> > > + target_nr = refchk->refcount - refchk->seen;
> > > + if (target_nr == 0)
> > > + return;
> > > +
> > > + /*
> > > + * There are (refchk->rc.rc_refcount - refchk->nr refcount)
> > > + * references we haven't found yet. Pull that many off the
> > > + * fragment list and figure out where the smallest rmap ends
> > > + * (and therefore the next rmap should start). All the rmaps
> > > + * we pull off should start at or before the beginning of the
> > > + * refcount record's range.
> > > + */
> > > + INIT_LIST_HEAD(&worklist);
> > > + rbno = NULLAGBLOCK;
> > > + nr = 1;
> > > +
> > > + /* Find all the rmaps that start at or before the refc extent. */
> > > + list_for_each_entry_safe(frag, n, &refchk->fragments, list) {
> > > + if (frag->rm.rm_startblock > refchk->bno)
> > > + goto done;
> >
> > .... and is where the code implies lowest to highest startblock
> > ordering on the frag list.
> >
> > > + bno = frag->rm.rm_startblock + frag->rm.rm_blockcount;
> > > + if (rbno > bno)
> > > + rbno = bno;
> >
> > Can we put that check the other way around? we're looking for the
> > shortest/smallest end block, so
> >
> > if (bno < rbno)
> > rbno = bno;
> >
> > Makes a lot more sense to me.
>
> Ok.
>
> >
> > > + list_del(&frag->list);
> > > + list_add_tail(&frag->list, &worklist);
> >
> > list_move_tail()?
> >
> > Ok, so we are moving fragments that start before the recount bno to
> > the work list.
>
> <nod>
>
> > > + if (nr == target_nr)
> > > + break;
> > > + nr++;
> > > + }
> > > +
> > > + /*
> > > + * We should have found exactly $target_nr rmap fragments starting
> > > + * at or before the refcount extent.
> > > + */
> > > + if (nr != target_nr)
> > > + goto done;
> >
> > ok. so on error we clean up and free the frag list and work list....
> >
> > > +
> > > + while (!list_empty(&refchk->fragments)) {
> > > + /* Discard any fragments ending at rbno. */
> > > + nr = 0;
> > > + next_rbno = NULLAGBLOCK;
> > > + list_for_each_entry_safe(frag, n, &worklist, list) {
> >
> > Ok, this needs to be clearer than it's walking the working set
> > of fragments. I had to read this code several times before I worked
> > out this is where the "working set" was being processed....
>
> /* Walk the working set of rmap fragments... */
Correction: just update the comment to:
/* Discard any fragments ending at rbno from the worklist. */
--D
>
> >
> > > + bno = frag->rm.rm_startblock + frag->rm.rm_blockcount;
> > > + if (bno != rbno) {
> > > + if (next_rbno > bno)
> > > + next_rbno = bno;
> >
> > Same comment here about being a next_rbno "smallest bno" variable.
>
> <nod>
>
> > > + continue;
> > > + }
> > > + list_del(&frag->list);
> > > + kmem_free(frag);
> > > + nr++;
> > > + }
> > > +
> > > + /* Empty list? We're done. */
> > > + if (list_empty(&refchk->fragments))
> > > + break;
> > > +
> > > + /* Try to add nr rmaps starting at rbno to the worklist. */
> > > + list_for_each_entry_safe(frag, n, &refchk->fragments, list) {
> > > + bno = frag->rm.rm_startblock + frag->rm.rm_blockcount;
> > > + if (frag->rm.rm_startblock != rbno)
> > > + goto done;
> >
> > definitely assuming the frag list is ordered here :P
> >
> > > + list_del(&frag->list);
> > > + list_add_tail(&frag->list, &worklist);
> > > + if (next_rbno > bno)
> > > + next_rbno = bno;
> > > + nr--;
> > > + if (nr == 0)
> > > + break;
> > > + }
> >
> > Ok, so if we get here with nr > 0, then we must have emptied the
> > fragment list onto the work list, right? At this point, the outer
> > loop will terminate. Don't we need to run the worklist processing
> > loop one last time?
>
> Nope. Throughout xfs_scrub_refcountbt_process_rmap_fragments, we're
> checking that the number of rmaps for a given refcount extent (i.e.
> target_nr) remains the same. nr is the number of rmaps we discarded
> from the worklist at the top of the loop, so if we can't add the exact
> same number of rmaps back to the worklist then we know that the refcount
> is wrong.
>
> So there /is/ a bug here, and the bug is that we need "if (nr) break;"
> because we don't need to process more of the loop, we already know the
> refcountbt is broken.
>
> > > + rbno = next_rbno;
> > > + }
> >
> > .....
> > > +/* Use the rmap entries covering this extent to verify the refcount. */
> > > +STATIC void
> > > +xfs_scrub_refcountbt_xref_rmap(
> > > + struct xfs_scrub_context *sc,
> > > + xfs_agblock_t bno,
> > > + xfs_extlen_t len,
> > > + xfs_nlink_t refcount)
> > > +{
> > > + struct xfs_scrub_refcnt_check refchk = {
> > > + .sc = sc,
> > > + .bno = bno,
> > > + .len = len,
> > > + .refcount = refcount,
> > > + .seen = 0,
> > > + };
> > > + struct xfs_rmap_irec low;
> > > + struct xfs_rmap_irec high;
> > > + struct xfs_scrub_refcnt_frag *frag;
> > > + struct xfs_scrub_refcnt_frag *n;
> > > + int error;
> > > +
> > > + if (!sc->sa.rmap_cur)
> > > + return;
> > > +
> > > + /* Cross-reference with the rmapbt to confirm the refcount. */
> > > + memset(&low, 0, sizeof(low));
> > > + low.rm_startblock = bno;
> > > + memset(&high, 0xFF, sizeof(high));
> > > + high.rm_startblock = bno + len - 1;
> >
> > This range query init feels like a familiar pattern now. Helper
> > function (separate patch)?
>
> Yeah, these can all get cleaned up at the end of the series.
>
> > ....
> >
> > > +/* Make sure we have as many refc blocks as the rmap says. */
> > > +STATIC void
> > > +xfs_scrub_refcount_xref_rmap(
> > > + struct xfs_scrub_context *sc,
> > > + struct xfs_owner_info *oinfo,
> > > + xfs_filblks_t cow_blocks)
> > > +{
> > > + xfs_extlen_t refcbt_blocks = 0;
> > > + xfs_filblks_t blocks;
> > > + int error;
> > > +
> > > + /* Check that we saw as many refcbt blocks as the rmap knows about. */
> > > + error = xfs_btree_count_blocks(sc->sa.refc_cur, &refcbt_blocks);
> > > + if (!xfs_scrub_btree_process_error(sc, sc->sa.refc_cur, 0, &error))
> > > + return;
> > > + error = xfs_scrub_count_rmap_ownedby_ag(sc, sc->sa.rmap_cur, oinfo,
> > > + &blocks);
> > > + if (xfs_scrub_should_check_xref(sc, &error, &sc->sa.rmap_cur) &&
> > > + blocks != refcbt_blocks)
> > > + xfs_scrub_btree_xref_set_corrupt(sc, sc->sa.rmap_cur, 0);
> > > +
> > > + if (!sc->sa.rmap_cur)
> > > + return;
> > > +
> > > + /* Check that we saw as many cow blocks as the rmap knows about. */
> > > + xfs_rmap_ag_owner(oinfo, XFS_RMAP_OWN_COW);
> > > + error = xfs_scrub_count_rmap_ownedby_ag(sc, sc->sa.rmap_cur, oinfo,
> > > + &blocks);
> > > + if (xfs_scrub_should_check_xref(sc, &error, &sc->sa.rmap_cur) &&
> > > + blocks != cow_blocks)
> > > + xfs_scrub_btree_xref_set_corrupt(sc, sc->sa.rmap_cur, 0);
> >
> > Bit of a landmine that this code changes the owner info structure
> > that was passed in....
>
> Will fix.
>
> > > +}
> > > +
> > > /* Scrub the refcount btree for some AG. */
> > > int
> > > xfs_scrub_refcountbt(
> > > struct xfs_scrub_context *sc)
> > > {
> > > struct xfs_owner_info oinfo;
> > > + xfs_agblock_t cow_blocks = 0;
> > > + int error;
> > >
> > > xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_REFC);
> > > - return xfs_scrub_btree(sc, sc->sa.refc_cur, xfs_scrub_refcountbt_rec,
> > > - &oinfo, NULL);
> > > + error = xfs_scrub_btree(sc, sc->sa.refc_cur, xfs_scrub_refcountbt_rec,
> > > + &oinfo, &cow_blocks);
> > > + if (error)
> > > + return error;
> > > +
> > > + if (sc->sa.rmap_cur)
> > > + xfs_scrub_refcount_xref_rmap(sc, &oinfo, cow_blocks);
> >
> > .... because that's not obvious in this code here if we add anymore
> > code after this call.
>
> <nod>
>
> > > +
> > > + return error;
> >
> > error is zero here, so "return 0" instead?
>
> Ok.
>
> --D
>
> >
> > Cheers,
> >
> > Dave.
> >
> > --
> > Dave Chinner
> > david@fromorbit.com
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2018-01-16 19:52 UTC|newest]
Thread overview: 94+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-12-23 0:42 [PATCH v11 00/21] xfs: online scrub xref support Darrick J. Wong
2017-12-23 0:42 ` [PATCH 01/21] xfs: ignore agfl read errors when not scrubbing agfl Darrick J. Wong
2018-01-05 1:12 ` Dave Chinner
2017-12-23 0:43 ` [PATCH 02/21] xfs: catch a few more error codes when scrubbing secondary sb Darrick J. Wong
2018-01-05 1:17 ` Dave Chinner
2018-01-05 1:24 ` Darrick J. Wong
2018-01-05 2:10 ` Dave Chinner
2017-12-23 0:43 ` [PATCH 03/21] xfs: xfs_scrub_bmap should use for_each_xfs_iext Darrick J. Wong
2018-01-05 1:17 ` Dave Chinner
2017-12-23 0:43 ` [PATCH 04/21] xfs: always grab transaction when scrubbing inode Darrick J. Wong
2018-01-05 1:18 ` Dave Chinner
2017-12-23 0:43 ` [PATCH 05/21] xfs: distinguish between corrupt inode and invalid inum in xfs_scrub_get_inode Darrick J. Wong
2018-01-05 1:23 ` Dave Chinner
2017-12-23 0:43 ` [PATCH 06/21] xfs: add scrub cross-referencing helpers for the free space btrees Darrick J. Wong
2018-01-05 1:29 ` Dave Chinner
2017-12-23 0:43 ` [PATCH 07/21] xfs: add scrub cross-referencing helpers for the inode btrees Darrick J. Wong
2018-01-05 1:36 ` Dave Chinner
2018-01-05 2:19 ` Darrick J. Wong
2018-01-05 21:51 ` [PATCH v2 " Darrick J. Wong
2018-01-16 23:05 ` Darrick J. Wong
2018-01-17 0:36 ` Dave Chinner
2017-12-23 0:43 ` [PATCH 08/21] xfs: add scrub cross-referencing helpers for the rmap btrees Darrick J. Wong
2018-01-05 1:40 ` Dave Chinner
2018-01-05 2:49 ` Darrick J. Wong
2018-01-05 3:38 ` Dave Chinner
2018-01-05 21:53 ` [PATCH v2 " Darrick J. Wong
2018-01-06 20:46 ` Dave Chinner
2017-12-23 0:43 ` [PATCH 09/21] xfs: add scrub cross-referencing helpers for the refcount btrees Darrick J. Wong
2018-01-05 1:41 ` Dave Chinner
2017-12-23 0:43 ` [PATCH 10/21] xfs: set up scrub cross-referencing helpers Darrick J. Wong
2018-01-05 2:08 ` Dave Chinner
2018-01-05 3:05 ` Darrick J. Wong
2018-01-05 21:54 ` [PATCH v2 " Darrick J. Wong
2018-01-16 23:06 ` Darrick J. Wong
2018-01-17 0:41 ` Dave Chinner
2017-12-23 0:44 ` [PATCH 11/21] xfs: fix a few erroneous process_error calls in the scrubbers Darrick J. Wong
2018-01-05 2:11 ` Dave Chinner
2017-12-23 0:44 ` [PATCH 12/21] xfs: check btree block ownership with bnobt/rmapbt when scrubbing btree Darrick J. Wong
2018-01-05 2:24 ` Dave Chinner
2018-01-05 2:53 ` Darrick J. Wong
2018-01-05 3:39 ` Dave Chinner
2017-12-23 0:44 ` [PATCH 13/21] xfs: introduce scrubber cross-referencing stubs Darrick J. Wong
2018-01-08 23:36 ` Dave Chinner
2018-01-08 23:59 ` Darrick J. Wong
2018-01-09 21:00 ` [PATCH v2 " Darrick J. Wong
2018-01-10 0:12 ` Dave Chinner
2017-12-23 0:44 ` [PATCH 14/21] xfs: cross-reference with the bnobt Darrick J. Wong
2018-01-08 23:51 ` Dave Chinner
2018-01-09 0:34 ` Darrick J. Wong
2018-01-09 0:57 ` Dave Chinner
2018-01-09 21:15 ` [PATCH v2 " Darrick J. Wong
2018-01-10 0:15 ` Dave Chinner
2017-12-23 0:44 ` [PATCH 15/21] xfs: cross-reference bnobt records with cntbt Darrick J. Wong
2018-01-08 23:55 ` Dave Chinner
2018-01-09 0:37 ` Darrick J. Wong
2018-01-09 21:20 ` [PATCH v2 " Darrick J. Wong
2018-01-10 0:19 ` Dave Chinner
2017-12-23 0:44 ` [PATCH 16/21] xfs: cross-reference inode btrees during scrub Darrick J. Wong
2018-01-09 21:22 ` [PATCH v2 " Darrick J. Wong
2018-01-15 22:17 ` Dave Chinner
2018-01-16 6:30 ` Darrick J. Wong
2018-01-16 23:23 ` [PATCH v3 " Darrick J. Wong
2018-01-17 0:44 ` Dave Chinner
2017-12-23 0:44 ` [PATCH 17/21] xfs: cross-reference reverse-mapping btree Darrick J. Wong
2018-01-09 21:24 ` [PATCH v2 " Darrick J. Wong
2018-01-15 23:04 ` Dave Chinner
2018-01-16 6:38 ` Darrick J. Wong
2018-01-16 23:25 ` [PATCH v3 " Darrick J. Wong
2018-01-17 0:52 ` Dave Chinner
2017-12-23 0:44 ` [PATCH 18/21] xfs: cross-reference the rmapbt data with the refcountbt Darrick J. Wong
2018-01-09 21:25 ` [PATCH v2 " Darrick J. Wong
2018-01-15 23:49 ` Dave Chinner
2018-01-16 6:49 ` Darrick J. Wong
2018-01-16 19:47 ` Darrick J. Wong [this message]
2018-01-16 23:26 ` [PATCH v3 " Darrick J. Wong
2018-01-17 1:00 ` Dave Chinner
2018-01-17 1:11 ` Darrick J. Wong
2017-12-23 0:44 ` [PATCH 19/21] xfs: cross-reference refcount btree during scrub Darrick J. Wong
2018-01-09 21:25 ` [PATCH v2 " Darrick J. Wong
2018-01-16 2:44 ` Dave Chinner
2018-01-16 6:52 ` Darrick J. Wong
2018-01-16 20:26 ` Darrick J. Wong
2018-01-16 23:27 ` [PATCH v3 " Darrick J. Wong
2018-01-17 1:02 ` Dave Chinner
2017-12-23 0:44 ` [PATCH 20/21] xfs: cross-reference the realtime bitmap Darrick J. Wong
2018-01-09 21:26 ` [PATCH v2 " Darrick J. Wong
2018-01-16 2:57 ` Dave Chinner
2018-01-16 6:55 ` Darrick J. Wong
2018-01-16 23:27 ` [PATCH v3 " Darrick J. Wong
2018-01-17 1:03 ` Dave Chinner
2017-12-23 0:45 ` [PATCH 21/21] xfs: cross-reference the block mappings when possible Darrick J. Wong
2018-01-09 21:26 ` [PATCH v2 " Darrick J. Wong
2018-01-16 2:58 ` Dave Chinner
2018-01-16 6:55 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180116194704.GT5602@magnolia \
--to=darrick.wong@oracle.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).