Re: [PATCH] xfs: clear XBF_ASYNC if we attach an iodone callback

linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Brian Foster <bfoster@redhat.com>
To: Josef Bacik <jbacik@fb.com>
Cc: Dave Chinner <david@fromorbit.com>,
	"linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
	"darrick.wong@oracle.com" <darrick.wong@oracle.com>,
	Kernel Team <Kernel-team@fb.com>,
	Carlos Maiolino <cmaiolino@redhat.com>
Subject: Re: [PATCH] xfs: clear XBF_ASYNC if we attach an iodone callback
Date: Tue, 7 Mar 2017 11:12:46 -0500	[thread overview]
Message-ID: <20170307161246.GB8144@bfoster.bfoster> (raw)
In-Reply-To: <1488902165.9307.26.camel@fb.com>

On Tue, Mar 07, 2017 at 10:56:05AM -0500, Josef Bacik wrote:
> On Tue, 2017-03-07 at 12:22 +1100, Dave Chinner wrote:
> > On Mon, Mar 06, 2017 at 02:00:13PM -0500, Josef Bacik wrote:
> > > 
> > > so you retry once, and the second time through we will return false
> > > if
> > > we are unmounting, which means that xfs_buf_do_callbacks gets
> > > called.
> > > 
> > > Now you say that xfs_buf_do_callbacks() being called when we
> > > errored
> > > out is dangerous, but I don't understand why.  We have this code
> > > 
> > >         if (need_ail) {
> > >                 struct xfs_log_item *log_items[need_ail];
> > >                 int i = 0;
> > >                 spin_lock(&ailp->xa_lock);
> > >                 for (blip = lip; blip; blip = blip->li_bio_list) {
> > >                         iip = INODE_ITEM(blip);
> > >                         if (iip->ili_logged &&
> > >                             blip->li_lsn == iip->ili_flush_lsn) {
> > >                                 log_items[i++] = blip;
> > >                         }
> > >                         ASSERT(i <= need_ail);
> > >                 }
> > >                 /* xfs_trans_ail_delete_bulk() drops the AIL lock.
> > > */
> > >                 xfs_trans_ail_delete_bulk(ailp, log_items, i,
> > >                                           SHUTDOWN_CORRUPT_INCORE);
> > >         }
> > [ Your cut-n-paste is horribly mangled in something non-ascii. ]
> > 
> > > 
> > > and need_ail is set if we didn't actually end up on disk properly.
> > >  So
> > > we delete the thing and mark the fs as fucked.  Am I mis-reading
> > > what
> > > is going on here?
> > Yes. That won't shut down the filesystem on an inode write error. It
> > will simply remove the inode from the AIL, the error will do dropped
> > on the floor, and the filesystem will not be shut down even though
> > it's now in a corrupt state. xfs_trans_ail_delete_bulk() only shuts
> > down the filesytem if it doesn't find the item being removed form
> > the AIL in the AIL.
> > 
> > As I keep telling people, what needs to happen is this:
> > 
> > 	1. buffer write error occurs
> > 	2. retries exhausted, mark buffer as errored, run callbacks
> > 	3. xfs_iflush_done() (and other iodone callbacks) need to
> > 	   look at the buffer error state to determine what to do.
> > 	4. if the buffer is errored and recovery is not possible,
> > 	   the callback needs to shut down the filesytem.
> > 	5. the callback then needs to call flush abort function for
> > 	   item (e.g. xfs_iflush_abort()) rather than the normal
> > 	   IO completion processing path.
> > 
> > > 
> > >  This is what I did to our 4.6 tree (in testing
> > > still, not actually pushed to production)
> > > 
> > >         if (bp->b_flags & XBF_ASYNC) {
> > >                 ASSERT(bp->b_iodone != NULL);
> > > 
> > >                 trace_xfs_buf_item_iodone_async(bp, _RET_IP_);
> > > 
> > >                 xfs_buf_ioerror(bp, 0); /* errno of 0 unsets the
> > > flag
> > > */
> > > 
> > >                 if (!(bp->b_flags & (XBF_STALE|XBF_WRITE_FAIL))) {
> > >                         bp->b_flags |= XBF_WRITE | XBF_ASYNC
> > >                                        XBF_DONE | XBF_WRITE_FAIL;
> > >                         xfs_buf_submit(bp);
> > >                 } else {
> > > +                       xfs_buf_do_callbacks(bp);
> > > +                       bp->b_fspriv = NULL;
> > > +                       bp->b_iodone = NULL;
> > >                         xfs_buf_relse(bp);
> > >                 }
> > > 
> > >                 return;
> > >         }
> > It's likely you'll end up with silent on disk corruption if you do
> > that because it's just tossing the error state away.
> 
> I think I'm getting a handle on this, but I wonder if we don't have
> this problem currently?  We get an error on a buffer, we retry it and
> it fails again.  If we don't have XFS_ERR_RETRY_FOREVER set then we
> error out and do the normal callbacks as if nothing ever happened, and
> we get the silent corruption without shutting down the file system.
>  How is the code as it stands without XFS_ERR_RETRY_FOREVER safe?
>  Thanks,
> 

We shouldn't get into a situation where we run the callbacks when the
I/O has failed and the fs isn't shut down. IOWs, we do run the callbacks
only when the I/O succeeds, or it fails and the fs is shutdown as a
result (never when the I/O fails but we haven't shut down, which is the
problematic case the patch above introduces).

That tunable error handling logic (XFS_ERR_RETRY_FOREVER) basically
determines when an I/O error is considered permanent, which means
retries are futile and we have no choice but to shutdown the fs. E.g.,
if an error is deemed permanent, we jump to permanent_error:

        /*
         * Permanent error - we need to trigger a shutdown if we haven't already
         * to indicate that inconsistency will result from this action.
         */
permanent_error:
        xfs_force_shutdown(mp, SHUTDOWN_META_IO_ERROR);
out_stale:
        xfs_buf_stale(bp);
        bp->b_flags |= XBF_DONE;
        trace_xfs_buf_error_relse(bp, _RET_IP_);
        return false;
}

... which shuts down the filesystem before we go any farther.

Brian

> Josef

next prev parent reply	other threads:[~2017-03-07 16:13 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-03 14:46 [PATCH] xfs: clear XBF_ASYNC if we attach an iodone callback Josef Bacik
2017-03-03 20:49 ` Brian Foster
2017-03-03 21:34   ` Josef Bacik
2017-03-03 22:20     ` Brian Foster
2017-03-04  1:26       ` Josef Bacik
2017-03-04 13:58         ` Brian Foster
2017-03-06 19:00           ` Josef Bacik
2017-03-07  1:22             ` Dave Chinner
2017-03-07  2:37               ` Josef Bacik
2017-03-07 15:56               ` Josef Bacik
2017-03-07 16:12                 ` Brian Foster [this message]
2017-04-03  9:19                   ` Carlos Maiolino

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170307161246.GB8144@bfoster.bfoster \
    --to=bfoster@redhat.com \
    --cc=Kernel-team@fb.com \
    --cc=cmaiolino@redhat.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=jbacik@fb.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).