Re: [RFC PATCH] xfs: always honor OWN_UNKNOWN rmap removal requests

From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [RFC PATCH] xfs: always honor OWN_UNKNOWN rmap removal requests
Date: Wed, 6 Dec 2017 09:53:00 -0800
Message-ID: <20171206175300.GJ19219@magnolia>
In-Reply-To: <20171206141406.GA46723@bfoster.bfoster>

On Wed, Dec 06, 2017 at 09:14:07AM -0500, Brian Foster wrote:
> On Tue, Dec 05, 2017 at 03:34:20PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > Calling xfs_rmap_free with an unknown owner is supposed to remove any
> > rmaps covering that range regardless of owner.  This is used by the EFI
> > recovery code to say "we're freeing this, it mustn't be owned by
> > anything anymore", but for whatever reason xfs_free_ag_extent filters
> > them out.
> > 
> > Therefore, remove the filter and make xfs_rmap_unmap actually treat it
> > as a wildcard owner -- free anything that's already there, and if
> > there's no owner at all then that's fine too.
> > 
> > There are two existing callers of bmap_add_free that take care of the rmap
> > deferred ops themselves and use OWN_UNKNOWN to skip the EFI-based rmap
> > cleanup; convert these to use OWN_NULL, and ensure that the RUI gets
> > added to the defer ops ahead of any EFI.
> > 
> > Lastly, now that xfs_free_extent filters out OWN_NULL rmap free requests,
> > growfs will have to consult directly with the rmap to ensure that there
> > aren't any rmaps in the grown region.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> 
> Thanks... this resolves the log recovery problem on a quick test.
> 
> >  fs/xfs/libxfs/xfs_alloc.c    |    2 +-
> >  fs/xfs/libxfs/xfs_bmap.c     |    2 +-
> >  fs/xfs/libxfs/xfs_refcount.c |   52 +++++++++++++++---------------------------
> >  fs/xfs/libxfs/xfs_rmap.c     |   15 +++++++++---
> >  fs/xfs/xfs_fsops.c           |    5 ++++
> >  5 files changed, 37 insertions(+), 39 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index a840028..0f260eeb 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -1696,7 +1696,7 @@ xfs_free_ag_extent(
> >  	bno_cur = cnt_cur = NULL;
> >  	mp = tp->t_mountp;
> >  
> > -	if (oinfo->oi_owner != XFS_RMAP_OWN_UNKNOWN) {
> > +	if (oinfo->oi_owner != XFS_RMAP_OWN_NULL) {
> >  		error = xfs_rmap_free(tp, agbp, agno, bno, len, oinfo);
> >  		if (error)
> >  			goto error0;
> > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > index 16df627..89bb3d9 100644
> > --- a/fs/xfs/libxfs/xfs_bmap.c
> > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > @@ -573,7 +573,7 @@ xfs_bmap_add_free(
> >  	if (oinfo)
> >  		new->xefi_oinfo = *oinfo;
> >  	else
> > -		xfs_rmap_skip_owner_update(&new->xefi_oinfo);
> > +		xfs_rmap_ag_owner(&new->xefi_oinfo, XFS_RMAP_OWN_NULL);
> 
> So what is the difference now between xfs_rmap_skip_owner_update(),
> which sets OWN_UNKNOWN, and OWN_NULL, which skips owner updates in
> certain cases? Should we be using OWN_NULL consistently to skip owner
> updates (not that UNKNOWN makes much sense in some of the other cases,
> like allocation)?

Yeah, there's a bunch of cleanups that I was intending to do (most of
which you've caught below) prior to making a non-RFC submission.

> >  	trace_xfs_bmap_free_defer(mp, XFS_FSB_TO_AGNO(mp, bno), 0,
> >  			XFS_FSB_TO_AGBNO(mp, bno), len);
> >  	xfs_defer_add(dfops, XFS_DEFER_OPS_TYPE_FREE, &new->xefi_list);
> > diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
> > index 73f8058..9103be0 100644
> > --- a/fs/xfs/libxfs/xfs_refcount.c
> > +++ b/fs/xfs/libxfs/xfs_refcount.c
> > @@ -1505,27 +1505,12 @@ __xfs_refcount_cow_alloc(
> >  	xfs_extlen_t		aglen,
> >  	struct xfs_defer_ops	*dfops)
> >  {
> > -	int			error;
> > -
> >  	trace_xfs_refcount_cow_increase(rcur->bc_mp, rcur->bc_private.a.agno,
> >  			agbno, aglen);
> >  
> >  	/* Add refcount btree reservation */
> > -	error = xfs_refcount_adjust_cow(rcur, agbno, aglen,
> > +	return xfs_refcount_adjust_cow(rcur, agbno, aglen,
> >  			XFS_REFCOUNT_ADJUST_COW_ALLOC, dfops);
> > -	if (error)
> > -		return error;
> > -
> > -	/* Add rmap entry */
> > -	if (xfs_sb_version_hasrmapbt(&rcur->bc_mp->m_sb)) {
> > -		error = xfs_rmap_alloc_extent(rcur->bc_mp, dfops,
> > -				rcur->bc_private.a.agno,
> > -				agbno, aglen, XFS_RMAP_OWN_COW);
> > -		if (error)
> > -			return error;
> > -	}
> > -
> > -	return error;
> >  }
> 
> I think the refcount fixup probably warrants an independent patch with a
> more detailed commit log around the ordering requirement and how this
> changes behavior.

Yep.

> >  
> >  /*
> > @@ -1538,27 +1523,12 @@ __xfs_refcount_cow_free(
> >  	xfs_extlen_t		aglen,
> >  	struct xfs_defer_ops	*dfops)
> >  {
> > -	int			error;
> > -
> >  	trace_xfs_refcount_cow_decrease(rcur->bc_mp, rcur->bc_private.a.agno,
> >  			agbno, aglen);
> >  
> >  	/* Remove refcount btree reservation */
> > -	error = xfs_refcount_adjust_cow(rcur, agbno, aglen,
> > +	return xfs_refcount_adjust_cow(rcur, agbno, aglen,
> >  			XFS_REFCOUNT_ADJUST_COW_FREE, dfops);
> 
> xfs_refcount_finish_one() -> xfs_refcount_cow_[alloc|free]() ->
> xfs_refcount_adjust_cow() -> ...
> 
> Hmm, seems like there's opportunity for more cleanup here. Do we really
> need separate xfs_refcount_cow_*() functions just for tracepoints? Seems
> like we could just fold these into xfs_refcount_finish_one().

Yep.

> > -	if (error)
> > -		return error;
> > -
> > -	/* Remove rmap entry */
> > -	if (xfs_sb_version_hasrmapbt(&rcur->bc_mp->m_sb)) {
> > -		error = xfs_rmap_free_extent(rcur->bc_mp, dfops,
> > -				rcur->bc_private.a.agno,
> > -				agbno, aglen, XFS_RMAP_OWN_COW);
> > -		if (error)
> > -			return error;
> > -	}
> > -
> > -	return error;
> >  }
> >  
> >  /* Record a CoW staging extent in the refcount btree. */
> > @@ -1569,11 +1539,19 @@ xfs_refcount_alloc_cow_extent(
> >  	xfs_fsblock_t			fsb,
> >  	xfs_extlen_t			len)
> >  {
> > +	int				error;
> > +
> >  	if (!xfs_sb_version_hasreflink(&mp->m_sb))
> >  		return 0;
> >  
> > -	return __xfs_refcount_add(mp, dfops, XFS_REFCOUNT_ALLOC_COW,
> > +	error = __xfs_refcount_add(mp, dfops, XFS_REFCOUNT_ALLOC_COW,
> >  			fsb, len);
> > +	if (error)
> > +		return error;
> > +
> > +	/* Add rmap entry */
> > +	return xfs_rmap_alloc_extent(mp, dfops, XFS_FSB_TO_AGNO(mp, fsb),
> > +			XFS_FSB_TO_AGBNO(mp, fsb), len, XFS_RMAP_OWN_COW);
> >  }
> >  
> >  /* Forget a CoW staging event in the refcount btree. */
> > @@ -1584,9 +1562,17 @@ xfs_refcount_free_cow_extent(
> >  	xfs_fsblock_t			fsb,
> >  	xfs_extlen_t			len)
> >  {
> > +	int				error;
> > +
> >  	if (!xfs_sb_version_hasreflink(&mp->m_sb))
> >  		return 0;
> >  
> > +	/* Remove rmap entry */
> > +	error = xfs_rmap_free_extent(mp, dfops, XFS_FSB_TO_AGNO(mp, fsb),
> > +			XFS_FSB_TO_AGBNO(mp, fsb), len, XFS_RMAP_OWN_COW);
> > +	if (error)
> > +		return error;
> > +
> >  	return __xfs_refcount_add(mp, dfops, XFS_REFCOUNT_FREE_COW,
> >  			fsb, len);
> >  }
> > diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
> > index 5f3a3d9..fd0e630 100644
> > --- a/fs/xfs/libxfs/xfs_rmap.c
> > +++ b/fs/xfs/libxfs/xfs_rmap.c
> > @@ -484,10 +484,17 @@ xfs_rmap_unmap(
> >  	XFS_WANT_CORRUPTED_GOTO(mp, (flags & XFS_RMAP_UNWRITTEN) ==
> >  			(ltrec.rm_flags & XFS_RMAP_UNWRITTEN), out_error);
> >  
> > -	/* Make sure the extent we found covers the entire freeing range. */
> > -	XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_startblock <= bno &&
> > -		ltrec.rm_startblock + ltrec.rm_blockcount >=
> > -		bno + len, out_error);
> > +	/*
> > +	 * Make sure the extent we found covers the entire freeing range.
> > +	 * If this is a wildcard free, we're already done, otherwise there's
> > +	 * something wrong with the rmapbt.
> > +	 */
> 
> What does this mean by "we're already done?" This logic appears to mean
> that we don't do anything (as opposed to throwing an error). I think the
> comment would be more clear if it pointed out that/why we have nothing
> to do here (due to OWN_UNKNOWN). I.e., caller passed in a wildcard and
> we essentially didn't find a match..?

"Make sure the extent we found covers the entire freeing range.  Passing
in an owner of OWN_UNKNOWN means that the caller wants to remove any
reverse mapping that may exist for this range of blocks regardless of
owner; if there are no mappings at all, we're done."
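
As a rough illustration of that comment, here is a minimal standalone sketch of the coverage decision. It is not the kernel code: the sketch_* names, the record struct and the verdict enum are hypothetical, and SKETCH_OWN_UNKNOWN is only a stand-in for XFS_RMAP_OWN_UNKNOWN.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the wildcard owner; an assumption, not the kernel definition. */
#define SKETCH_OWN_UNKNOWN	((uint64_t)-2)

/* Hypothetical outcomes of the coverage check. */
enum sketch_verdict {
	SKETCH_PROCEED,		/* record covers the range; go on to unmap it */
	SKETCH_NOTHING_TO_DO,	/* wildcard free and no covering record */
	SKETCH_CORRUPT,		/* specific owner expected, record does not cover */
};

/* Hypothetical, simplified view of the record found by the btree lookup. */
struct sketch_rmap_rec {
	uint32_t	startblock;
	uint32_t	blockcount;
};

static enum sketch_verdict
sketch_check_coverage(
	const struct sketch_rmap_rec	*rec,
	uint32_t			bno,
	uint32_t			len,
	uint64_t			owner)
{
	/* Widen to 64 bits so start + count cannot wrap. */
	uint64_t	rec_end = (uint64_t)rec->startblock + rec->blockcount;
	uint64_t	req_end = (uint64_t)bno + len;

	assert(len > 0);

	if (rec->startblock > bno || rec_end < req_end) {
		/* A wildcard caller only wants the range to end up unmapped. */
		if (owner == SKETCH_OWN_UNKNOWN)
			return SKETCH_NOTHING_TO_DO;
		return SKETCH_CORRUPT;
	}
	return SKETCH_PROCEED;
}
```

The only point of the sketch is that a missing or short record is an error for a specific owner but a no-op for the wildcard.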

> > +	if (ltrec.rm_startblock > bno ||
> > +	    ltrec.rm_startblock + ltrec.rm_blockcount < bno + len) {
> > +		if (owner == XFS_RMAP_OWN_UNKNOWN)
> > +			goto out_done;
> > +		XFS_WANT_CORRUPTED_GOTO(mp, false, out_error);
> > +	}
> >  
> 
> Also... unrelated, but is this check immediately below really intending
> to ignore owner inconsistencies for all !inode owners?

I had my eye on that one too, though I think that could be a
freestanding cleanup.

> >  	/* Make sure the owner matches what we expect to find in the tree. */
> >  	XFS_WANT_CORRUPTED_GOTO(mp, owner == ltrec.rm_owner ||
> > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> > index 8f22fc5..60a2e12 100644
> > --- a/fs/xfs/xfs_fsops.c
> > +++ b/fs/xfs/xfs_fsops.c
> > @@ -571,6 +571,11 @@ xfs_growfs_data_private(
> >  		 * this doesn't actually exist in the rmap btree.
> >  		 */
> >  		xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_NULL);
> > +		error = xfs_rmap_free(tp, bp, agno,
> > +				be32_to_cpu(agf->agf_length) - new,
> > +				new, &oinfo);
> > +		if (error)
> > +			goto error0;
> 
> OWN_NULL makes sense from the perspective of needing to avoid some error
> down in the free code where we need to free some space without needing
> to remove an owner, but what is the purpose of the above? It doesn't
> look like this really does anything beyond checking that the associated
> space is beyond the end of the rmapbt. If that's the intent, then it
> probably makes sense to update this comment as well.

Yes, that's exactly the intent.

Hmm, come to think of it, the rmap xref patch adds an
xfs_rmap_has_record helper that does exactly what we want here (decides
if there are any records covering this range).
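
To make "any records covering this range" concrete, here is a small sketch of the kind of overlap test such a check boils down to. It assumes a flat array of records rather than a btree, and sketch_has_record and struct sketch_rec are hypothetical names; this is not the xfs_rmap_has_record helper itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified record: a run of blocks with some owner. */
struct sketch_rec {
	uint32_t	startblock;
	uint32_t	blockcount;
};

/* Return true if any record overlaps the half-open range [bno, bno + len). */
static bool
sketch_has_record(
	const struct sketch_rec	*recs,
	size_t			nr,
	uint32_t		bno,
	uint32_t		len)
{
	uint64_t	lo = bno;
	uint64_t	hi = (uint64_t)bno + len;
	size_t		i;

	assert(len > 0);

	for (i = 0; i < nr; i++) {
		uint64_t	start = recs[i].startblock;
		uint64_t	end = start + recs[i].blockcount;

		/* Two half-open ranges overlap iff each starts before the other ends. */
		if (start < hi && lo < end)
			return true;
	}
	return false;
}
```

For growfs, the caller would expect false for the newly added region and treat true as a sign that something is wrong.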

--D

> Brian
> 
> >  		error = xfs_free_extent(tp,
> >  				XFS_AGB_TO_FSB(mp, agno,
> >  					be32_to_cpu(agf->agf_length) - new),
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
