From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
linux-xfs@vger.kernel.org, Eric Sandeen <sandeen@redhat.com>
Subject: Re: [PATCH RFC] xfs: convert between packed and unpacked agfls on-demand
Date: Tue, 13 Mar 2018 08:14:55 +1100 [thread overview]
Message-ID: <20180312211455.GY18129@dastard> (raw)
In-Reply-To: <20180312173552.GA30332@bfoster.bfoster>
On Mon, Mar 12, 2018 at 01:35:53PM -0400, Brian Foster wrote:
> On Mon, Mar 12, 2018 at 09:11:58AM -0400, Brian Foster wrote:
> > On Fri, Mar 09, 2018 at 02:37:57PM -0800, Darrick J. Wong wrote:
> > > On Fri, Mar 09, 2018 at 04:20:31PM -0500, Brian Foster wrote:
> > > > On Fri, Mar 09, 2018 at 11:08:32AM -0800, Darrick J. Wong wrote:
> > > > > On Fri, Mar 09, 2018 at 01:37:28PM -0500, Brian Foster wrote:
> > > > > > On Fri, Mar 09, 2018 at 09:33:18AM -0800, Darrick J. Wong wrote:
> ...
> > > <nod> I've been mulling over rewriting your previous rfc patch that put
> > > all the scanning/lifting in {get,put}_freelist but having it reset the
> > > agfl instead of doing the swizzling stuff.
> > >
> >
> > Something to be careful of... emptying the agfl obviously means the
> > subsequent fixup is a btree block allocation. That may limit the context
> > of where the fixup can be performed. IOW, deferring it to
> > _get_freelist() might no longer be safe. Instead, I think we'd have to
> > implement it such that the on-disk flcount is completely untrusted when
> > the mismatch is detected.
> >
> ...
>
> Here's a variant of that patch that does a reset. It's definitely much
> simpler. Thoughts?
I like it - it is so much simpler than the other proposals, and it's
done on demand rather than by a brute-force mount/unmount scan.
> Brian
>
> --- 8< ---
>
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index c02781a4c091..7d313bb4677d 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -2053,6 +2053,59 @@ xfs_alloc_space_available(
> return true;
> }
>
> +static bool
> +xfs_agf_verify_flcount(
> + struct xfs_mount *mp,
> + struct xfs_agf *agf)
Needs a comment explaining what it's checking and the return value.
Might actually read better as "xfs_agf_need_flcount_reset()"
returning true if fixup is required, especially at the caller
site...
> +{
> + int f = be32_to_cpu(agf->agf_flfirst);
> + int l = be32_to_cpu(agf->agf_fllast);
> + int c = be32_to_cpu(agf->agf_flcount);
These should probably be uint32_t - they are unsigned on disk.
> + int active = c;
> + int agfl_size = XFS_AGFL_SIZE(mp);
> +
> + if (!xfs_sb_version_hascrc(&mp->m_sb))
> + return true;
> +
> + if (c && l >= f)
> + active = l - f + 1;
> + else if (c)
> + active = agfl_size - f + l + 1;
else
active = 0;
To move all the initialisation of active into the one logic
statement and not make it dependent on the original value in the
AGF?
> + if (active != c)
> + return false;
> + if (f >= agfl_size || l >= agfl_size)
> + return false;
Shouldn't we be range checking these first? Also should probably
checking c the same way.
> +
> + return true;
> +}
> +
> +static void
> +xfs_agfl_reset(
> + struct xfs_trans *tp,
> + struct xfs_buf *agbp,
> + struct xfs_perag *pag)
> +{
> + struct xfs_mount *mp = tp->t_mountp;
> + struct xfs_agf *agf = XFS_BUF_TO_AGF(agbp);
> +
> + if (!pag->pagf_needreset)
> + return;
> +
> + trace_xfs_agfl_reset(pag);
> + xfs_warn(mp, "agfl reset agno %u flcount %d", pag->pag_agno,
> + pag->pagf_flcount);
> +
> + agf->agf_flfirst = 0;
> + agf->agf_fllast = cpu_to_be32(XFS_AGFL_SIZE(mp) - 1);
> + agf->agf_flcount = 0;
> + xfs_alloc_log_agf(tp, agbp, XFS_AGF_FLFIRST | XFS_AGF_FLLAST |
> + XFS_AGF_FLCOUNT);
> +
> + pag->pagf_flcount = 0;
> + pag->pagf_needreset = false;
> +}
> +
> /*
> * Decide whether to use this allocation group for this allocation.
> * If so, fix up the btree freelist's size.
> @@ -2119,6 +2172,9 @@ xfs_alloc_fix_freelist(
> if (!xfs_alloc_space_available(args, need, flags))
> goto out_agbp_relse;
>
> + if (pag->pagf_needreset)
> + xfs_agfl_reset(tp, agbp, pag);
> +
> /*
> * Make the freelist shorter if it's too long.
> *
> @@ -2588,6 +2644,7 @@ xfs_alloc_read_agf(
> pag->pagb_count = 0;
> pag->pagb_tree = RB_ROOT;
> pag->pagf_init = 1;
> + pag->pagf_needreset = !xfs_agf_verify_flcount(mp, agf);
> }
> #ifdef DEBUG
> else if (!XFS_FORCED_SHUTDOWN(mp)) {
> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index e0792d036be2..5dd36920b7d6 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -353,6 +353,7 @@ typedef struct xfs_perag {
> char pagi_inodeok; /* The agi is ok for inodes */
> uint8_t pagf_levels[XFS_BTNUM_AGF];
> /* # of levels in bno & cnt btree */
> + bool pagf_needreset;
> uint32_t pagf_flcount; /* count of blocks in freelist */
> xfs_extlen_t pagf_freeblks; /* total free blocks */
> xfs_extlen_t pagf_longest; /* longest free space */
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index 945de08af7ba..678d602dc400 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -3339,6 +3339,24 @@ TRACE_EVENT(xfs_trans_resv_calc,
> __entry->logflags)
> );
>
> +TRACE_EVENT(xfs_agfl_reset,
> + TP_PROTO(struct xfs_perag *pag),
> + TP_ARGS(pag),
> + TP_STRUCT__entry(
> + __field(dev_t, dev)
> + __field(xfs_agnumber_t, agno)
> + __field(int, flcount)
> + ),
> + TP_fast_assign(
> + __entry->dev = pag->pag_mount->m_super->s_dev;
> + __entry->agno = pag->pag_agno;
> + __entry->flcount = pag->pagf_flcount;
> + ),
> + TP_printk("dev %d:%d agno %u flcount %d",
> + MAJOR(__entry->dev), MINOR(__entry->dev),
> + __entry->agno, __entry->flcount)
> +);
Do we need a new tracepoint? Wouldn't it be better to just
call trace_xfs_agf() which will dump all the information in the AGF
just before we do the reset? We'll know it's a reset from the caller
information that is dumped by that tracepoint....
> +
> #endif /* _TRACE_XFS_H */
>
> #undef TRACE_INCLUDE_PATH
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2018-03-12 21:14 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-03-07 19:24 [PATCH RFC] xfs: convert between packed and unpacked agfls on-demand Brian Foster
2018-03-08 14:03 ` Carlos Maiolino
2018-03-08 14:15 ` Carlos Maiolino
2018-03-09 13:16 ` Brian Foster
2018-03-09 17:33 ` Darrick J. Wong
2018-03-09 18:37 ` Brian Foster
2018-03-09 19:08 ` Darrick J. Wong
2018-03-09 21:20 ` Brian Foster
2018-03-09 22:37 ` Darrick J. Wong
2018-03-12 13:11 ` Brian Foster
2018-03-12 17:35 ` Brian Foster
2018-03-12 21:14 ` Dave Chinner [this message]
2018-03-13 11:27 ` Brian Foster
2018-03-14 3:07 ` Dave Chiluk
2018-03-14 11:02 ` Brian Foster
2018-03-14 15:28 ` Darrick J. Wong
2018-03-09 22:10 ` Dave Chinner
2018-03-12 13:26 ` Brian Foster
2018-03-12 13:29 ` Carlos Maiolino
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180312211455.GY18129@dastard \
--to=david@fromorbit.com \
--cc=bfoster@redhat.com \
--cc=darrick.wong@oracle.com \
--cc=linux-xfs@vger.kernel.org \
--cc=sandeen@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).