All of lore.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	Christoph Hellwig <hch@lst.de>,
	xfs@oss.sgi.com
Subject: Re: stop using ioends for direct write completions
Date: Tue, 2 Feb 2016 10:26:21 -0500	[thread overview]
Message-ID: <20160202152620.GA1853@bfoster.bfoster> (raw)
In-Reply-To: <20160201215407.GI20038@birch.djwong.org>

On Mon, Feb 01, 2016 at 01:54:07PM -0800, Darrick J. Wong wrote:
> On Fri, Jan 29, 2016 at 09:12:33AM -0500, Brian Foster wrote:
> > On Thu, Jan 28, 2016 at 05:16:56AM -0800, Christoph Hellwig wrote:
> > > Any chance to get a review for this?  It should really help
> > > with sorting out the buffered I/O COW code.
> > > 
> > 
> > I haven't taken a closer look at the patch yet. I was kind of waiting
> > for Dave to chime in because I'm a little confused over the back and
> > forth nature of dio/ioend completion lately.
> > 
> > My understanding is that the original requirement for ioends here was to
> > track the state necessary in order to defer (to wq) completions that had
> > to allocate transactions. Eventually, the deferred buffer state was
> > implemented and we no longer required an ioend for that, so we removed
> > the ioends here:
> > 
> >   2ba6623 xfs: don't allocate an ioend for direct I/O completions
> > 
> > Then just a couple months later, we merged the following to re-add the
> > ioend into the dio path:
> > 
> >   d5cc2e3f xfs: DIO needs an ioend for writes
> > 
> > I recall reviewing that one, but looking back afaict the ioend was used
> > simply to enable reuse of the ioend completion code. Now we have this
> > patch which presumably removes much of that code to eliminate the ioend
> > allocation overhead.
> > 
> > Neither this patch nor the previous has any technical reasoning for one
> > approach over the other in the commit logs, so afaics this appears to be
> > a matter of preference. Can we get some agreement (or discussion) on
> > what the right interface is to transfer information to dio completion?
> > E.g., is this allocation overhead noticeable? Is ioend usage problematic
> > for other reasons (such as copy-on-write)? Going back to the previous
> > patch, were there explicit reasons for using ioends aside from code
> > reuse? Note that the subsequent commit 6dfa1b67 ("xfs: handle DIO
> > overwrite EOF update completion correctly") does refer to some problems
> > not running dio completions in the right context... Dave?
> 
> I don't know the reason /for/ using ioends, but I'll share an argument in favor
> of Christoph's patch to remove them:
> 

The only thing I can gather from the historical information is that they
are used to facilitate reuse of the completion path code responsible for
unwritten conversion, inode size updates, etc. (i.e., convenience).

> Each ioend has a type code which determines what we do at the end of the IO.
> For buffered writes, we can create as many ioends as necessary to handle all
> the different dispositions of all the blocks in the range of dirty pages.
> 
> For directio, however, the requirements are a little different -- we want to be
> able to say "change all the extents in this part of the file from unwritten to
> written", and to be able to change i_size after a write.  We can reuse the
> ioend structure to encode these desires until CoW came along.  Now we also want
> to be able to say "remap all extents in this part of a file if the write
> succeeds".  Since we can only have one type code per ioend, either we have to
> hide extra offset/size fields in the ioend for this purpose, add a flags field,
> revise the code to create a bunch of ioends for each disposition type,
> unconditionally try to remap blocks for any reflink'd inode, or do what
> Christoph proposes, which is to stop (ab)using ioends and simply encode status
> flags into dio->private.
> 
> Christoph's approach seems like the easiest, since we're already provided
> with the offset and size, and both the unwritten extent conversion and the
> CoW remap code know how to iterate the data fork to look for the extents
> they want to alter.
> 
> The overhead probably goes down once we get rid of the memory allocation,
> but the primary purpose is to make life easier for CoW to avoid unnecessary
> overhead.
> 

That sounds reasonable to me, though I'd probably have to see the cow
bits to grok/appreciate the complexity avoided by killing off ioends
here. I wonder if then perhaps the right thing to do is package this
change with the associated cow bits (as a dependency) wherever/once they
are available..?

> Also: I've noticed that if I write 8MB file to a file backed by a failing
> device, xfs_end_io_direct_write() gets called with size == 8MB even if the
> writes failed.  If the write was to an unwritten extent, the extent will be
> converted to a regular extent even though the blocks contain old file contents
> and other garbage, which can subsequently be read off the disk.
> 
> For CoW I'd only want to remap the blocks if the write totally succeeds.  If a
> disk error happens, the EIO code gets passed back to userspace, and it seems
> logical to me that the file contents should remain unchanged.
> 
> To that end, I'm changing dio_iodone_t to pass an error code, and I think I
> want to change XFS to avoid doing unwritten extent conversions when the write
> fails.  Does that make sense?
> 

Yep, sounds like a fail to me.

Brian

> --D
> 
> > 
> > Brian
> > 
> > > _______________________________________________
> > > xfs mailing list
> > > xfs@oss.sgi.com
> > > http://oss.sgi.com/mailman/listinfo/xfs
> > 
> > _______________________________________________
> > xfs mailing list
> > xfs@oss.sgi.com
> > http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  parent reply	other threads:[~2016-02-02 15:26 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-01-14 10:10 stop using ioends for direct write completions Christoph Hellwig
2016-01-14 10:10 ` [PATCH] xfs: don't use " Christoph Hellwig
2016-01-28 13:16 ` stop using " Christoph Hellwig
2016-01-28 20:53   ` Darrick J. Wong
2016-01-28 21:10     ` Christoph Hellwig
2016-01-28 21:58       ` Darrick J. Wong
2016-01-28 22:02         ` Christoph Hellwig
2016-01-28 22:31           ` Darrick J. Wong
2016-01-29  8:01             ` Christoph Hellwig
2016-01-29 14:12   ` Brian Foster
2016-02-01 21:54     ` Darrick J. Wong
2016-02-02 11:17       ` Christoph Hellwig
2016-02-02 15:26       ` Brian Foster [this message]
2016-02-02 11:20     ` Christoph Hellwig
2016-02-02 15:31       ` Brian Foster
2016-02-02 16:42         ` Christoph Hellwig
2016-02-03 22:22           ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160202152620.GA1853@bfoster.bfoster \
    --to=bfoster@redhat.com \
    --cc=darrick.wong@oracle.com \
    --cc=hch@infradead.org \
    --cc=hch@lst.de \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.