From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 4/8] xfs: handle DIO overwrite EOF update completion correctly
Date: Tue, 14 Apr 2015 11:35:39 -0400 [thread overview]
Message-ID: <20150414153538.GJ36198@bfoster.bfoster> (raw)
In-Reply-To: <20150414143501.GE36198@bfoster.bfoster>
On Tue, Apr 14, 2015 at 10:35:02AM -0400, Brian Foster wrote:
> On Tue, Apr 14, 2015 at 05:26:47PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > Currently a DIO overwrite that extends the EOF (e.g sub-block IO or
> > write into allocated blocks beyond EOF) requires a transaction for
> > the EOF update. Thi is done in IO completion context, but we aren't
> > explicitly handling this situation properly and so it can run in
> > interrupt context. Ensure that we defer IO that spans EOF correctly
> > to the DIO completion workqueue, and now that we have an ioend in IO
> > completion we can use the common ioend completion path to do all the
> > work.
> >
> > Note: we do not preallocate the append transaction as we can have
> > multiple mapping and allocation calls per direct IO. hence
> > preallocating can still leave us with nested transactions by
> > attempting to map and allocate more blocks after we've preallocated
> > an append transaction.
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> > fs/xfs/xfs_aops.c | 61 +++++++++++++++++++++++++++---------------------------
> > fs/xfs/xfs_trace.h | 1 +
> > 2 files changed, 31 insertions(+), 31 deletions(-)
> >
> > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> > index e1fa926..e3968a3 100644
> > --- a/fs/xfs/xfs_aops.c
> > +++ b/fs/xfs/xfs_aops.c
> > @@ -1293,7 +1293,7 @@ xfs_map_direct(
> > imap);
> > }
> >
> > - if (ioend->io_type == XFS_IO_UNWRITTEN)
> > + if (ioend->io_type == XFS_IO_UNWRITTEN || xfs_ioend_is_append(ioend))
> > set_buffer_defer_completion(bh_result);
> > }
> >
> > @@ -1531,8 +1531,10 @@ xfs_end_io_direct_write(
> > struct xfs_mount *mp = ip->i_mount;
> > struct xfs_ioend *ioend = private;
> >
> > + trace_xfs_gbmap_direct_endio(ip, offset, size, ioend->io_type, NULL);
> > +
> > if (XFS_FORCED_SHUTDOWN(mp))
> > - goto out_destroy_ioend;
> > + goto out_end_io;
> >
> > /*
> > * dio completion end_io functions are only called on writes if more
> > @@ -1553,40 +1555,37 @@ xfs_end_io_direct_write(
> > ioend->io_offset = offset;
> >
> > /*
> > - * While the generic direct I/O code updates the inode size, it does
> > - * so only after the end_io handler is called, which means our
> > - * end_io handler thinks the on-disk size is outside the in-core
> > - * size. To prevent this just update it a little bit earlier here.
> > + * The ioend tells us whether we are doing unwritten extent conversion
> > + * or an append transaction that updates the on-disk file size. These
> > + * cases are the only cases where we should *potentially* be needing
> > + * to update the VFS inode size. When the ioend indicates this, we
> > + * are *guaranteed* to be running in non-interrupt context.
> > + *
> > + * We need to update the in-core inode size here so that we don't end up
> > + * with the on-disk inode size being outside the in-core inode size.
> > + * While we can do this in the process context after the IO has
> > + * completed, this does not work for AIO and hence we always update
> > + * the in-core inode size here if necessary.
> > */
> > - if (offset + size > i_size_read(inode))
> > - i_size_write(inode, offset + size);
> > + if (ioend->io_type == XFS_IO_UNWRITTEN || xfs_ioend_is_append(ioend)) {
> > + if (offset + size > i_size_read(inode))
> > + i_size_write(inode, offset + size);
> > + } else
> > + ASSERT(offset + size <= i_size_read(inode));
>
> The code was obviously incorrect prior to this change, potentially
> running some of these transactions in irq context. That said, it occurs
> to me that one thing that the previous implementation looked to handle
> that this does not is racing of in-flight aio with other operations.
> E.g., what happens now if a non-extending, overwrite aio is submitted
> and races with a truncate that causes it to be extending by the time we
> get here? It looks like it would have been racy regardless, so maybe
> that's just a separate problem...
>
Looking further, we actually wait on dio in the truncate path with
IOLOCK_EXCL (e.g., similar to what we now do for extending aio itself),
so this is probably irrelevant...
Brian
> >
> > /*
> > - * For direct I/O we do not know if we need to allocate blocks or not,
> > - * so we can't preallocate an append transaction, as that results in
> > - * nested reservations and log space deadlocks. Hence allocate the
> > - * transaction here. While this is sub-optimal and can block IO
> > - * completion for some time, we're stuck with doing it this way until
> > - * we can pass the ioend to the direct IO allocation callbacks and
> > - * avoid nesting that way.
> > + * If we are doing an append IO that needs to update the EOF on disk,
> > + * do the transaction reserve now so we can use common end io
> > + * processing. Stashing the error (if there is one) in the ioend will
> > + * result in the ioend processing passing on the error if it is
> > + * possible as we can't return it from here.
> > */
> > - if (ioend->io_type == XFS_IO_UNWRITTEN) {
> > - xfs_iomap_write_unwritten(ip, offset, size);
> > - } else if (offset + size > ip->i_d.di_size) {
> > - struct xfs_trans *tp;
> > - int error;
> > -
> > - tp = xfs_trans_alloc(mp, XFS_TRANS_FSYNC_TS);
> > - error = xfs_trans_reserve(tp, &M_RES(mp)->tr_fsyncts, 0, 0);
> > - if (error) {
> > - xfs_trans_cancel(tp, 0);
> > - goto out_destroy_ioend;
> > - }
> > + if (ioend->io_type == XFS_IO_OVERWRITE && xfs_ioend_is_append(ioend))
> > + ioend->io_error = xfs_setfilesize_trans_alloc(ioend);
>
> As you mentioned previously, we no longer need the transaction context
> manipulation stuff in xfs_setfilesize_trans_alloc() with this approach.
> It's still called from the writepage path though, so I guess it needs to
> stay.
>
> Brian
>
> >
> > - xfs_setfilesize(ip, tp, offset, size);
> > - }
> > -out_destroy_ioend:
> > - xfs_destroy_ioend(ioend);
> > +out_end_io:
> > + xfs_end_io(&ioend->io_work);
> > + return;
> > }
> >
> > STATIC ssize_t
> > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > index e78b64e..967993b 100644
> > --- a/fs/xfs/xfs_trace.h
> > +++ b/fs/xfs/xfs_trace.h
> > @@ -1224,6 +1224,7 @@ DEFINE_IOMAP_EVENT(xfs_get_blocks_alloc);
> > DEFINE_IOMAP_EVENT(xfs_gbmap_direct);
> > DEFINE_IOMAP_EVENT(xfs_gbmap_direct_new);
> > DEFINE_IOMAP_EVENT(xfs_gbmap_direct_update);
> > +DEFINE_IOMAP_EVENT(xfs_gbmap_direct_endio);
> >
> > DECLARE_EVENT_CLASS(xfs_simple_io_class,
> > TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count),
> > --
> > 2.0.0
> >
> > _______________________________________________
> > xfs mailing list
> > xfs@oss.sgi.com
> > http://oss.sgi.com/mailman/listinfo/xfs
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2015-04-14 15:35 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-14 7:26 [PATCH 0/8 v2] xfs: fix direct IO completion issues Dave Chinner
2015-04-14 7:26 ` [PATCH 1/8] xfs: factor DIO write mapping from get_blocks Dave Chinner
2015-04-14 14:23 ` Brian Foster
2015-04-14 20:06 ` Dave Chinner
2015-04-14 7:26 ` [PATCH 2/8] xfs: move DIO mapping size calculation Dave Chinner
2015-04-14 14:24 ` Brian Foster
2015-04-14 7:26 ` [PATCH 3/8] xfs: DIO needs an ioend for writes Dave Chinner
2015-04-14 14:24 ` Brian Foster
2015-04-14 7:26 ` [PATCH 4/8] xfs: handle DIO overwrite EOF update completion correctly Dave Chinner
2015-04-14 14:35 ` Brian Foster
2015-04-14 15:35 ` Brian Foster [this message]
2015-04-14 20:12 ` Dave Chinner
2015-04-14 7:26 ` [PATCH 5/8] xfs: DIO writes within EOF don't need an ioend Dave Chinner
2015-04-14 14:35 ` Brian Foster
2015-04-14 20:18 ` Dave Chinner
2015-04-14 7:26 ` [PATCH 6/8] xfs: DIO write completion size updates race Dave Chinner
2015-04-14 14:35 ` Brian Foster
2015-04-14 7:26 ` [PATCH 7/8] xfs: direct IO EOF zeroing needs to drain AIO Dave Chinner
2015-04-14 14:35 ` Brian Foster
2015-04-14 7:26 ` [PATCH 8/8] xfs: using generic_file_direct_write() is unnecessary Dave Chinner
2015-04-14 14:35 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150414153538.GJ36198@bfoster.bfoster \
--to=bfoster@redhat.com \
--cc=david@fromorbit.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox