All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: xfs <linux-xfs@vger.kernel.org>, Eryu Guan <eguan@redhat.com>
Subject: Re: [PATCH] xfs: recheck reflink / dirty page status before freeing CoW reservations
Date: Thu, 11 Jan 2018 12:32:46 -0800	[thread overview]
Message-ID: <20180111203246.GJ5602@magnolia> (raw)
In-Reply-To: <20180111193828.GC22600@bfoster.bfoster>

On Thu, Jan 11, 2018 at 02:38:28PM -0500, Brian Foster wrote:
> On Thu, Jan 11, 2018 at 09:40:27AM -0800, Darrick J. Wong wrote:
> > On Thu, Jan 11, 2018 at 07:04:10AM -0500, Brian Foster wrote:
> > > On Wed, Jan 10, 2018 at 02:03:36PM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > 
> > > > Eryu Guan reported seeing occasional hangs when running generic/269 with
> > > > a new fsstress that supports clonerange/deduperange.  The cause of this
> > > > hang is an infinite loop when we convert the CoW fork extents from
> > > > unwritten to real just prior to writing the pages out; the infinite
> > > > loop happens because there's nothing in the CoW fork to convert, and so
> > > > it spins forever.
> > > > 
> > > > The underlying issue here is that when we go to perform these CoW fork
> > > > conversions, we're supposed to have an extent waiting for us, but the
> > > > low space CoW reaper has snuck in and blown them away!  There are four
> > > > conditions that can dissuade the reaper from touching our file -- no
> > > > reflink iflag; dirty page cache; writeback in progress; or directio in
> > > > progress.  We check the four conditions prior to taking the locks, but
> > > > we neglect to recheck them once we have the locks, which is how we end
> > > > up whacking the writeback that's in progress.
> > > > 
> > > > Therefore, refactor the four checks into a helper function and call it
> > > > once again once we have the locks to make sure we really want to reap
> > > > the inode.  While we're at it, add an ASSERT for this weird condition so
> > > > that we'll fail noisily if we ever screw this up again.
> > > > 
> > > > Reported-by: Eryu Guan <eguan@redhat.com>
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > >  fs/xfs/libxfs/xfs_bmap.c |    7 +++++
> > > >  fs/xfs/xfs_icache.c      |   61 +++++++++++++++++++++++++++++-----------------
> > > >  2 files changed, 46 insertions(+), 22 deletions(-)
> > > > 
> > > > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > > > index a01cef4..7bd933f 100644
> > > > --- a/fs/xfs/libxfs/xfs_bmap.c
> > > > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > > > @@ -4311,6 +4311,13 @@ xfs_bmapi_write(
> > > >  	while (bno < end && n < *nmap) {
> > > >  		bool			need_alloc = false, wasdelay = false;
> > > >  
> > > > +		/*
> > > > +		 * CoW fork conversions should /never/ hit EOF.  There should
> > > > +		 * always be something for us to work on.
> > > > +		 */
> > > > +		ASSERT(!eof || !(flags & XFS_BMAPI_CONVERT) ||
> > > > +			       !(flags & XFS_BMAPI_COWFORK));
> > > > +
> > > 
> > > The hunk just below asserts for BMAPI_COWFORK in a case that explicitly
> > > considers eof. That makes the logic confusing to follow IMO, but I'm
> > > more wondering whether pushing something like ASSERT(!((flags & CONVERT)
> > > && (flags & COWFORK))) down into that hunk is effectively the same
> > > thing..?  I.e., is it also true that we should not find a hole in the
> > > (CONVERT & COW) case?
> > 
> > Yes.
> > 
> > > >  		/* in hole or beyoned EOF? */
> > > >  		if (eof || bma.got.br_startoff > bno) {
> > > >  			if (flags & XFS_BMAPI_DELALLOC) {
> > > > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > > > index 1f84562..3fbcc03 100644
> > > > --- a/fs/xfs/xfs_icache.c
> > > > +++ b/fs/xfs/xfs_icache.c
> > > > @@ -1654,6 +1654,35 @@ xfs_inode_clear_eofblocks_tag(
> > > >  			trace_xfs_perag_clear_eofblocks, XFS_ICI_EOFBLOCKS_TAG);
> > > >  }
> > > >  
> > > > +/* Is this a good time to reap the CoW reservations for this file? */
> > > > +static bool
> > > > +xfs_can_free_cowblocks(
> > > > +	struct xfs_inode	*ip,
> > > > +	struct xfs_ifork	*ifp)
> > > > +{
> > > > +	/*
> > > > +	 * Just clear the tag if we have an empty cow fork or none at all. It's
> > > > +	 * possible the inode was fully unshared since it was originally tagged.
> > > > +	 */
> > > > +	if (!xfs_is_reflink_inode(ip) || !ifp->if_bytes) {
> > > > +		trace_xfs_inode_free_cowblocks_invalid(ip);
> > > > +		xfs_inode_clear_cowblocks_tag(ip);
> > > > +		return false;
> > > 
> > > I think the flag update and tracepoint should probably remain in the
> > > caller. They're somewhat misplaced for a "xfs_can_do_something()"
> > > helper, particularly if it's ever exported and used in other contexts in
> > > the future. Otherwise seems fine.
> > 
> > Hmm.  There's a subtlety to step around here, which is that this
> > predicate can return false to mean "nothing here to see" or to mean
> > "cannot clear anything at this time".  We want the trace+clear for the
> > first case, but not the second.
> > 
> > I suppose this function could return the regular error code int and the
> > caller can figure out what that means to it, but then all the post-check
> > stuff ends up duplicated in the callers... so maybe I should just rename
> > it xfs_prep_free_cowblocks().
> > 
> 
> Are both checks necessarily required to be repeated under lock to fix
> the bug? IOW, Could the !fork || !flag check remain in the caller to
> cover the first case?

They're necessary in both cases; generic/269 (when it wasn't hanging)
would also blow the xfs_is_reflink_inode assert in
xfs_reflink_cancel_cow_range if the flag got cleared before we can grab
the iolock/mmaplock.

> > And change the comment to:
> > 
> > /*
> >  * Set ourselves up to free CoW blocks from this file.  If it's already
> >  * clean then we can bail out quickly, but otherwise we must back off if
> >  * the file is undergoing some kind of write.
> >  */
> > 
> 
> That sounds reasonable too.

<nod>

--D

> Brian
> 
> > --D
> > 
> > > Brian
> > > 
> > > > +	}
> > > > +
> > > > +	/*
> > > > +	 * If the mapping is dirty or under writeback we cannot touch the
> > > > +	 * CoW fork.  Leave it alone if we're in the midst of a directio.
> > > > +	 */
> > > > +	if ((VFS_I(ip)->i_state & I_DIRTY_PAGES) ||
> > > > +	    mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_DIRTY) ||
> > > > +	    mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_WRITEBACK) ||
> > > > +	    atomic_read(&VFS_I(ip)->i_dio_count))
> > > > +		return false;
> > > > +
> > > > +	return true;
> > > > +}
> > > > +
> > > >  /*
> > > >   * Automatic CoW Reservation Freeing
> > > >   *
> > > > @@ -1672,29 +1701,12 @@ xfs_inode_free_cowblocks(
> > > >  	int			flags,
> > > >  	void			*args)
> > > >  {
> > > > -	int ret;
> > > > -	struct xfs_eofblocks *eofb = args;
> > > > -	int match;
> > > > +	struct xfs_eofblocks	*eofb = args;
> > > >  	struct xfs_ifork	*ifp = XFS_IFORK_PTR(ip, XFS_COW_FORK);
> > > > +	int			match;
> > > > +	int			ret = 0;
> > > >  
> > > > -	/*
> > > > -	 * Just clear the tag if we have an empty cow fork or none at all. It's
> > > > -	 * possible the inode was fully unshared since it was originally tagged.
> > > > -	 */
> > > > -	if (!xfs_is_reflink_inode(ip) || !ifp->if_bytes) {
> > > > -		trace_xfs_inode_free_cowblocks_invalid(ip);
> > > > -		xfs_inode_clear_cowblocks_tag(ip);
> > > > -		return 0;
> > > > -	}
> > > > -
> > > > -	/*
> > > > -	 * If the mapping is dirty or under writeback we cannot touch the
> > > > -	 * CoW fork.  Leave it alone if we're in the midst of a directio.
> > > > -	 */
> > > > -	if ((VFS_I(ip)->i_state & I_DIRTY_PAGES) ||
> > > > -	    mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_DIRTY) ||
> > > > -	    mapping_tagged(VFS_I(ip)->i_mapping, PAGECACHE_TAG_WRITEBACK) ||
> > > > -	    atomic_read(&VFS_I(ip)->i_dio_count))
> > > > +	if (!xfs_can_free_cowblocks(ip, ifp))
> > > >  		return 0;
> > > >  
> > > >  	if (eofb) {
> > > > @@ -1715,7 +1727,12 @@ xfs_inode_free_cowblocks(
> > > >  	xfs_ilock(ip, XFS_IOLOCK_EXCL);
> > > >  	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
> > > >  
> > > > -	ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, false);
> > > > +	/*
> > > > +	 * Check again, nobody else should be able to dirty blocks or change
> > > > +	 * the reflink iflag now that we have the first two locks held.
> > > > +	 */
> > > > +	if (xfs_can_free_cowblocks(ip, ifp))
> > > > +		ret = xfs_reflink_cancel_cow_range(ip, 0, NULLFILEOFF, false);
> > > >  
> > > >  	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);
> > > >  	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2018-01-11 20:32 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-10 22:03 [PATCH] xfs: recheck reflink / dirty page status before freeing CoW reservations Darrick J. Wong
2018-01-11  7:54 ` Eryu Guan
2018-01-12  3:32   ` Eryu Guan
2018-01-15  6:36     ` Eryu Guan
2018-01-15 20:08       ` Darrick J. Wong
2018-01-11 12:04 ` Brian Foster
2018-01-11 17:40   ` Darrick J. Wong
2018-01-11 19:38     ` Brian Foster
2018-01-11 20:32       ` Darrick J. Wong [this message]
2018-01-17  1:18 ` [PATCH v2] " Darrick J. Wong
2018-01-17 12:56   ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180111203246.GJ5602@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=bfoster@redhat.com \
    --cc=eguan@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.