From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 1/9] xfs: synchronous buffer IO needs a reference
Date: Mon, 18 Aug 2014 10:15:26 -0400 [thread overview]
Message-ID: <20140818141525.GA30093@bfoster.bfoster> (raw)
In-Reply-To: <20140815231736.GT26465@dastard>
On Sat, Aug 16, 2014 at 09:17:36AM +1000, Dave Chinner wrote:
> On Fri, Aug 15, 2014 at 09:18:04AM -0400, Brian Foster wrote:
> > On Fri, Aug 15, 2014 at 04:38:59PM +1000, Dave Chinner wrote:
> ....
> > > if (bp->b_flags & XBF_WRITE)
> > > xfs_buf_wait_unpin(bp);
> > > +
> > > + /*
> > > + * Take references to the buffer. For XBF_ASYNC buffers, holding a
> > > + * reference for as long as submission takes is all that is necessary
> > > + * here. The IO inherits the lock and hold count from the submitter,
> > > + * and these are release during IO completion processing. Taking a hold
> > > + * over submission ensures that the buffer is not freed until we have
> > > + * completed all processing, regardless of when IO errors occur or are
> > > + * reported.
> > > + *
> > > + * However, for synchronous IO, the IO does not inherit the submitters
> > > + * reference count, nor the buffer lock. Hence we need to take an extra
> > > + * reference to the buffer for the for the IO context so that we can
> > > + * guarantee the buffer is not freed until all IO completion processing
> > > + * is done. Otherwise the caller can drop their reference while the IO
> > > + * is still in progress and hence trigger a use-after-free situation.
> > > + */
> > > xfs_buf_hold(bp);
> > > + if (!(bp->b_flags & XBF_ASYNC))
> > > + xfs_buf_hold(bp);
> > > +
> > >
> > > /*
> > > - * Set the count to 1 initially, this will stop an I/O
> > > - * completion callout which happens before we have started
> > > - * all the I/O from calling xfs_buf_ioend too early.
> > > + * Set the count to 1 initially, this will stop an I/O completion
> > > + * callout which happens before we have started all the I/O from calling
> > > + * xfs_buf_ioend too early.
> > > */
> > > atomic_set(&bp->b_io_remaining, 1);
> > > _xfs_buf_ioapply(bp);
> > > +
> > > /*
> > > - * If _xfs_buf_ioapply failed, we'll get back here with
> > > - * only the reference we took above. _xfs_buf_ioend will
> > > - * drop it to zero, so we'd better not queue it for later,
> > > - * or we'll free it before it's done.
> > > + * If _xfs_buf_ioapply failed or we are doing synchronous IO that
> > > + * completes extremely quickly, we can get back here with only the IO
> > > + * reference we took above. _xfs_buf_ioend will drop it to zero, so
> > > + * we'd better run completion processing synchronously so that the we
> > > + * don't return to the caller with completion still pending. In the
> > > + * error case, this allows the caller to check b_error safely without
> > > + * waiting, and in the synchronous IO case it avoids unnecessary context
> > > + * switches an latency for high-peformance devices.
> > > */
> >
> > AFAICT there is no real wait if the buf has completed at this point. The
> > wait just decrements the completion counter.
>
> If the IO has completed, then we run the completion code.
>
> > So what's the benefit of
> > "not waiting?" Where is the potential context switch?
>
> async work for completion processing on synchrnous IO means we queue
> the work, then sleep in xfs_buf_iowait(). Two context switches, plus
> a work queue execution
>
Right...
> > Are you referring
> > to the case where error is set but I/O is not complete? Are you saying
> > the advantage to the caller is it doesn't have to care about the state
> > of further I/O once it has been determined at least one error has
> > occurred? (If so, who cares about latency given that some operation that
> > depends on this I/O is already doomed to fail?).
>
> No, you're reading *way* too much into this. For sync IO, it's
> always best to process completion inline. For async, it doesn't
> matter, but if there's a submission error is *more effecient* to
> process it in the current context.
>
Heh. Sure, that makes sense. Perhaps it's just the way I read it,
implying that how we process I/O completion effects what the calling
code should look like. Simple case of the comment being a bit more
confusing than the code. ;) FWIW, the following is more clear to me:
/*
* If _xfs_buf_ioapply failed or we are doing synchronous IO that
* completes extremely quickly, we can get back here with only the IO
* reference we took above. _xfs_buf_ioend will drop it to zero. Run
* completion processing synchronously so that we don't return to the
* caller with completion still pending. This avoids unnecessary context
* switches associated with the end_io workqueue.
*/
Thanks for the explanation.
Brian
> > The code looks fine, but I'm trying to understand the reasoning better
> > (and I suspect we can clarify the comment).
> >
> > > - _xfs_buf_ioend(bp, bp->b_error ? 0 : 1);
> > > + if (bp->b_error || !(bp->b_flags & XBF_ASYNC))
> > > + _xfs_buf_ioend(bp, 0);
> > > + else
> > > + _xfs_buf_ioend(bp, 1);
> >
> > Not related to this patch, but it seems like the problem this code tries
> > to address is still possible.
>
> The race condition is still possible - it just won't result in a
> use-after-free. The race condition is not fixed until patch 8,
> but as a backportable fix, this patch is much, much simpler.
>
> > Perhaps this papers over a particular
> > instance. Consider the case where an I/O fails immediately after this
> > call completes, but not before. We have an extra reference now for
> > completion, but we can still return to the caller with completion
> > pending. I suppose its fine if we consider the "problem" to be that the
> > reference goes away underneath the completion, as opposed to the caller
> > caring about the status of completion.
>
> Precisely.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2014-08-18 14:15 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-08-15 6:38 [RFC PATCH 0/9] xfs: clean up xfs_buf io interfaces Dave Chinner
2014-08-15 6:38 ` [PATCH 1/9] xfs: synchronous buffer IO needs a reference Dave Chinner
2014-08-15 13:18 ` Brian Foster
2014-08-15 23:17 ` Dave Chinner
2014-08-18 14:15 ` Brian Foster [this message]
2014-08-29 0:18 ` Christoph Hellwig
2014-08-15 6:39 ` [PATCH 2/9] xfs: xfs_buf_ioend and xfs_buf_iodone_work duplicate functionality Dave Chinner
2014-08-15 13:18 ` Brian Foster
2014-08-15 23:21 ` Dave Chinner
2014-08-18 14:15 ` Brian Foster
2014-08-29 0:22 ` Christoph Hellwig
2014-08-29 0:55 ` Dave Chinner
2014-08-15 6:39 ` [PATCH 3/9] xfs: rework xfs_buf_bio_endio error handling Dave Chinner
2014-08-15 13:18 ` Brian Foster
2014-08-15 23:25 ` Dave Chinner
2014-08-29 0:23 ` Christoph Hellwig
2014-08-15 6:39 ` [PATCH 4/9] xfs: kill xfs_bdstrat_cb Dave Chinner
2014-08-29 0:24 ` Christoph Hellwig
2014-08-15 6:39 ` [PATCH 5/9] xfs: xfs_bioerror can die Dave Chinner
2014-08-15 14:35 ` Brian Foster
2014-08-15 23:27 ` Dave Chinner
2014-08-29 0:28 ` Christoph Hellwig
2014-08-29 1:05 ` Dave Chinner
2014-08-15 6:39 ` [PATCH 6/9] xfs: kill xfs_bioerror_relse Dave Chinner
2014-08-29 0:32 ` Christoph Hellwig
2014-08-29 1:12 ` Dave Chinner
2014-08-29 18:26 ` Christoph Hellwig
2014-08-30 0:05 ` Dave Chinner
2014-08-15 6:39 ` [PATCH 7/9] xfs: clean up xfs_trans_buf_read_map Dave Chinner
2014-08-15 6:39 ` [PATCH 8/9] xfs: introduce xfs_buf_submit[_wait] Dave Chinner
2014-08-15 13:10 ` Christoph Hellwig
2014-08-15 23:37 ` Dave Chinner
2014-08-16 4:55 ` Christoph Hellwig
2014-08-15 14:35 ` Brian Foster
2014-08-15 23:39 ` Dave Chinner
2014-08-18 14:16 ` Brian Foster
2014-08-15 16:13 ` Brian Foster
2014-08-15 23:58 ` Dave Chinner
2014-08-18 14:26 ` Brian Foster
2014-08-15 6:39 ` [PATCH 9/9] xfs: check xfs_buf_read_uncached returns correctly Dave Chinner
2014-08-15 12:56 ` Christoph Hellwig
2014-08-15 23:58 ` Dave Chinner
2014-08-29 0:37 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140818141525.GA30093@bfoster.bfoster \
--to=bfoster@redhat.com \
--cc=david@fromorbit.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.