Re: [PATCH 5/5] xfs: fix agfl wrapping

linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org, djwong@kernel.org
Subject: Re: [PATCH 5/5] xfs: fix agfl wrapping
Date: Mon, 5 Mar 2018 18:15:39 -0800	[thread overview]
Message-ID: <20180306021539.GF18989@magnolia> (raw)
In-Reply-To: <20180305222430.GE18989@magnolia>

On Mon, Mar 05, 2018 at 02:24:30PM -0800, Darrick J. Wong wrote:
> On Sat, Mar 03, 2018 at 08:59:50AM -0500, Brian Foster wrote:
> > On Fri, Mar 02, 2018 at 08:12:07AM -0500, Brian Foster wrote:
> > > On Thu, Mar 01, 2018 at 12:55:41PM -0800, Darrick J. Wong wrote:
> > > > On Thu, Mar 01, 2018 at 12:28:33PM -0500, Brian Foster wrote:
> > > > > On Wed, Feb 28, 2018 at 03:20:32PM -0800, Darrick J. Wong wrote:
> > > > > > On Wed, Feb 28, 2018 at 05:43:51PM -0500, Brian Foster wrote:
> > > > > > > On Tue, Feb 27, 2018 at 01:03:13PM -0800, Darrick J. Wong wrote:
> > > > > > > > On Tue, Feb 27, 2018 at 02:35:49PM -0500, Brian Foster wrote:
> > > > > > > > > On Thu, Feb 22, 2018 at 06:00:15PM -0800, Darrick J. Wong wrote:
> > > > > > > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > ...
> > > > Going forward, I want the number of unpacked kernels to decrease as
> > > > quickly as possible.  I understand that distro kernel maintainers are
> > > > not willing to apply the packing patch to their kernel until we come up
> > > > with a smooth transition path.
> > > > 
> > > 
> > > I agree wrt to upstream, but note that I don't anticipate including the
> > > padding fix downstream any time soon.
> > > 
> > > > I don't want to support fixing agfls to be 118 units long on 64-bit
> > > > unpacked kernels and 119 units long on 32-bit unpacked kernels, and I
> > > > only want to support the packed kernels with their 119 unit long agfls.
> > > > An AGFL that starts at 0 and ends at flcount-1 is compatible with packed
> > > > and unpacked kernels, so the v2 patch I sent last night removes the
> > > > delicate per-case surgery in favor of a new strategy where the mount
> > > > time and unmount time helpers both look for agfl configurationss that
> > > > are known to cause problems, and solves them all with the same solution:
> > > > moving the agfl list towards the start of the block.
> > > > 
> > > 
> > > The purpose of the patch I sent was not for upstream unpacked support
> > > going forward. Upstream has clearly moved forward with the packed
> > > format. The goal of the patch was to explore a single/generic patch that
> > > could be merged upstream/downstream and handle compatibility cleanly.
> > > 
> > 
> > FWIW, here's a new variant of a bidirectional fixup. It's refactored and
> > polished up a bit into a patch. It basically inspects the agf when first
> > read for any evidence that the on-disk fields reflect a size mismatch
> > with the current kernel and sets a flag if so. agfl gets/puts check the
> > flag and thus the first transaction that attempts to modify a mismatched
> > agfl swaps a block into or out of the gap slot appropriately.
> > 
> > This avoids the need for any new transactions or mount time scan and the
> > (downstream motivated) packed -> unpacked case is only 10 or so lines of
> > additional code. Only spot tested, but I _think_ it covers all of the
> > cases. Hm?
> 
> Just from a quick glance this looks like a reasonable way to fix the
> agfl wrapping to whatever the running kernel expects.  I tried feeding
> it to the xfstest I wrote to exercise my agfl fixer[1], but even with
> changing the test to fill the fs to enospc and delete everything I
> couldn't get it to trigger reliably.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/commit/?h=djwong-experimental&id=f085bb09c839da69daf921da33f5d13c80c9f165

Ok, I got it to trigger by updating that xfstest to fragment the free
space after mounting.  According to the test results on 4.16-rc4 it
fixes the deliberately tweaked agfls to work without complaint, so could
you please write this up as a proper patch?

--D

> 
> --D
> 
> > Brian
> > 
> > --- 8< ---
> > 
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index c02781a4c091..1cbf80d8481f 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -2053,6 +2053,115 @@ xfs_alloc_space_available(
> >  	return true;
> >  }
> >  
> > +static int
> > +xfs_agfl_ondisk_size(
> > +	struct xfs_mount	*mp,
> > +	int			first,
> > +	int			last,
> > +	int			count)
> > +{
> > +	int			active = count;
> > +	int			agfl_size = XFS_AGFL_SIZE(mp);
> > +
> > +	if (count && last >= first)
> > +		active = last - first + 1;
> > +	else if (count)
> > +		active = agfl_size - first + last + 1;
> > +
> > +	if (active == count + 1)
> > +		return agfl_size - 1;
> > +	if (active == count - 1 || first == agfl_size || last == agfl_size)
> > +		return agfl_size + 1;
> > +
> > +	ASSERT(active == count);
> > +	return agfl_size;
> > +}
> > +
> > +static bool
> > +xfs_agfl_need_padfix(
> > +	struct xfs_mount	*mp,
> > +	struct xfs_agf		*agf)
> > +{
> > +	int			f = be32_to_cpu(agf->agf_flfirst);
> > +	int			l = be32_to_cpu(agf->agf_fllast);
> > +	int			c = be32_to_cpu(agf->agf_flcount);
> > +
> > +	return xfs_agfl_ondisk_size(mp, f, l, c) != XFS_AGFL_SIZE(mp);
> > +}
> > +
> > +static int
> > +xfs_agfl_check_padfix(
> > +	struct xfs_trans	*tp,
> > +	struct xfs_buf		*agbp,
> > +	struct xfs_buf		*agflbp,
> > +	struct xfs_perag	*pag)
> > +{
> > +	struct xfs_mount	*mp = tp->t_mountp;
> > +	struct xfs_agf		*agf = XFS_BUF_TO_AGF(agbp);
> > +	__be32			*agfl_bno = XFS_BUF_TO_AGFL_BNO(mp, agflbp);
> > +	int			agfl_size = XFS_AGFL_SIZE(mp);
> > +	int			ofirst, olast, osize;
> > +	int			nfirst, nlast;
> > +	int			logflags = 0;
> > +	int			startoff = 0;
> > +
> > +	if (!pag->pagf_needpadfix)
> > +		return 0;
> > +
> > +	ofirst = nfirst = be32_to_cpu(agf->agf_flfirst);
> > +	olast = nlast = be32_to_cpu(agf->agf_fllast);
> > +	osize = xfs_agfl_ondisk_size(mp, ofirst, olast, pag->pagf_flcount);
> > +
> > +	/*
> > +	 * If the on-disk agfl is smaller than what the kernel expects, the
> > +	 * last slot of the on-disk agfl is a gap with bogus data. Move the
> > +	 * first valid block into the gap and bump the pointer.
> > +	 */
> > +	if (osize < agfl_size) {
> > +		ASSERT(pag->pagf_flcount != 0);
> > +		agfl_bno[agfl_size - 1] = agfl_bno[ofirst];
> > +		startoff = (char *) &agfl_bno[agfl_size - 1] - (char *) agflbp->b_addr;
> > +		nfirst++;
> > +		goto done;
> > +	}
> > +
> > +	/*
> > +	 * Otherwise, the on-disk agfl is larger than what the current kernel
> > +	 * expects. If empty, just fix up the first and last pointers. If not,
> > +	 * move the inaccessible block to the end of the valid range.
> > +	 */
> > +	nfirst = do_mod(nfirst, agfl_size);
> > +	if (pag->pagf_flcount == 0) {
> > +		nlast = (nfirst == 0 ? agfl_size - 1 : nfirst - 1);
> > +		goto done;
> > +	}
> > +	if (nlast != agfl_size)
> > +		nlast++;
> > +	nlast = do_mod(nlast, agfl_size);
> > +	agfl_bno[nlast] = agfl_bno[osize - 1];
> > +	startoff = (char *) &agfl_bno[nlast] - (char *) agflbp->b_addr;
> > +
> > +done:
> > +	if (nfirst != ofirst) {
> > +		agf->agf_flfirst = cpu_to_be32(nfirst);
> > +		logflags |= XFS_AGF_FLFIRST;
> > +	}
> > +	if (nlast != olast) {
> > +		agf->agf_fllast = cpu_to_be32(nlast);
> > +		logflags |= XFS_AGF_FLLAST;
> > +	}
> > +	if (startoff) {
> > +		xfs_trans_buf_set_type(tp, agflbp, XFS_BLFT_AGFL_BUF);
> > +		xfs_trans_log_buf(tp, agflbp, startoff,
> > +				  startoff + sizeof(xfs_agblock_t) - 1);
> > +	}
> > +	if (logflags)
> > +		xfs_alloc_log_agf(tp, agbp, logflags);
> > +
> > +	pag->pagf_needpadfix = false;
> > +	return 0;
> > +}
> > +
> >  /*
> >   * Decide whether to use this allocation group for this allocation.
> >   * If so, fix up the btree freelist's size.
> > @@ -2258,6 +2367,12 @@ xfs_alloc_get_freelist(
> >  	if (error)
> >  		return error;
> >  
> > +	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> > +	error = xfs_agfl_check_padfix(tp, agbp, agflbp, pag);
> > +	if (error) {
> > +		xfs_perag_put(pag);
> > +		return error;
> > +	}
> >  
> >  	/*
> >  	 * Get the block number and update the data structures.
> > @@ -2269,7 +2384,6 @@ xfs_alloc_get_freelist(
> >  	if (be32_to_cpu(agf->agf_flfirst) == XFS_AGFL_SIZE(mp))
> >  		agf->agf_flfirst = 0;
> >  
> > -	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> >  	be32_add_cpu(&agf->agf_flcount, -1);
> >  	xfs_trans_agflist_delta(tp, -1);
> >  	pag->pagf_flcount--;
> > @@ -2376,11 +2490,18 @@ xfs_alloc_put_freelist(
> >  	if (!agflbp && (error = xfs_alloc_read_agfl(mp, tp,
> >  			be32_to_cpu(agf->agf_seqno), &agflbp)))
> >  		return error;
> > +
> > +	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> > +	error = xfs_agfl_check_padfix(tp, agbp, agflbp, pag);
> > +	if (error) {
> > +		xfs_perag_put(pag);
> > +		return error;
> > +	}
> > +
> >  	be32_add_cpu(&agf->agf_fllast, 1);
> >  	if (be32_to_cpu(agf->agf_fllast) == XFS_AGFL_SIZE(mp))
> >  		agf->agf_fllast = 0;
> >  
> > -	pag = xfs_perag_get(mp, be32_to_cpu(agf->agf_seqno));
> >  	be32_add_cpu(&agf->agf_flcount, 1);
> >  	xfs_trans_agflist_delta(tp, 1);
> >  	pag->pagf_flcount++;
> > @@ -2588,6 +2709,7 @@ xfs_alloc_read_agf(
> >  		pag->pagb_count = 0;
> >  		pag->pagb_tree = RB_ROOT;
> >  		pag->pagf_init = 1;
> > +		pag->pagf_needpadfix = xfs_agfl_need_padfix(mp, agf);
> >  	}
> >  #ifdef DEBUG
> >  	else if (!XFS_FORCED_SHUTDOWN(mp)) {
> > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> > index e0792d036be2..78a6377a9b38 100644
> > --- a/fs/xfs/xfs_mount.h
> > +++ b/fs/xfs/xfs_mount.h
> > @@ -353,6 +353,7 @@ typedef struct xfs_perag {
> >  	char		pagi_inodeok;	/* The agi is ok for inodes */
> >  	uint8_t		pagf_levels[XFS_BTNUM_AGF];
> >  					/* # of levels in bno & cnt btree */
> > +	bool		pagf_needpadfix;
> >  	uint32_t	pagf_flcount;	/* count of blocks in freelist */
> >  	xfs_extlen_t	pagf_freeblks;	/* total free blocks */
> >  	xfs_extlen_t	pagf_longest;	/* longest free space */
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

next prev parent reply	other threads:[~2018-03-06  2:20 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-23  1:59 [PATCH 0/5] xfs: fix various problems Darrick J. Wong
2018-02-23  1:59 ` [PATCH 1/5] xfs: don't iunlock the quota ip when quota block allocation fails Darrick J. Wong
2018-02-27 13:55   ` Brian Foster
2018-02-23  1:59 ` [PATCH 2/5] xfs: convert a few more directory asserts to corruption returns Darrick J. Wong
2018-02-27 13:55   ` Brian Foster
2018-02-23  2:00 ` [PATCH 3/5] xfs: check for cow blocks before trying to clear them during inode reclaim Darrick J. Wong
2018-02-27 13:55   ` Brian Foster
2018-02-23  2:00 ` [PATCH 4/5] xfs: convert XFS_AGFL_SIZE to a helper function Darrick J. Wong
2018-02-27 19:34   ` Brian Foster
2018-02-23  2:00 ` [PATCH 5/5] xfs: fix agfl wrapping Darrick J. Wong
2018-02-23  4:40   ` Darrick J. Wong
2018-02-23 20:33   ` Darrick J. Wong
2018-02-27 19:35   ` Brian Foster
2018-02-27 21:03     ` Darrick J. Wong
2018-02-28 22:43       ` Brian Foster
2018-02-28 23:20         ` Darrick J. Wong
2018-03-01 17:28           ` Brian Foster
2018-03-01 20:55             ` Darrick J. Wong
2018-03-02 13:12               ` Brian Foster
2018-03-03 13:59                 ` Brian Foster
2018-03-05 22:24                   ` Darrick J. Wong
2018-03-05 22:53                     ` Dave Chinner
2018-03-06  2:15                     ` Darrick J. Wong [this message]
2018-03-06 12:02                       ` Brian Foster
2018-03-01  6:37     ` Darrick J. Wong
2018-03-01  6:42   ` [PATCH v2 " Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180306021539.GF18989@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=bfoster@redhat.com \
    --cc=djwong@kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).