From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 4/8] xfs: handle DIO overwrite EOF update completion correctly
Date: Tue, 14 Apr 2015 10:35:02 -0400 [thread overview]
Message-ID: <20150414143501.GE36198@bfoster.bfoster> (raw)
In-Reply-To: <1428996411-1507-5-git-send-email-david@fromorbit.com>
On Tue, Apr 14, 2015 at 05:26:47PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Currently a DIO overwrite that extends the EOF (e.g sub-block IO or
> write into allocated blocks beyond EOF) requires a transaction for
> the EOF update. Thi is done in IO completion context, but we aren't
> explicitly handling this situation properly and so it can run in
> interrupt context. Ensure that we defer IO that spans EOF correctly
> to the DIO completion workqueue, and now that we have an ioend in IO
> completion we can use the common ioend completion path to do all the
> work.
>
> Note: we do not preallocate the append transaction as we can have
> multiple mapping and allocation calls per direct IO. hence
> preallocating can still leave us with nested transactions by
> attempting to map and allocate more blocks after we've preallocated
> an append transaction.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/xfs_aops.c | 61 +++++++++++++++++++++++++++---------------------------
> fs/xfs/xfs_trace.h | 1 +
> 2 files changed, 31 insertions(+), 31 deletions(-)
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index e1fa926..e3968a3 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -1293,7 +1293,7 @@ xfs_map_direct(
> imap);
> }
>
> - if (ioend->io_type == XFS_IO_UNWRITTEN)
> + if (ioend->io_type == XFS_IO_UNWRITTEN || xfs_ioend_is_append(ioend))
> set_buffer_defer_completion(bh_result);
> }
>
> @@ -1531,8 +1531,10 @@ xfs_end_io_direct_write(
> struct xfs_mount *mp = ip->i_mount;
> struct xfs_ioend *ioend = private;
>
> + trace_xfs_gbmap_direct_endio(ip, offset, size, ioend->io_type, NULL);
> +
> if (XFS_FORCED_SHUTDOWN(mp))
> - goto out_destroy_ioend;
> + goto out_end_io;
>
> /*
> * dio completion end_io functions are only called on writes if more
> @@ -1553,40 +1555,37 @@ xfs_end_io_direct_write(
> ioend->io_offset = offset;
>
> /*
> - * While the generic direct I/O code updates the inode size, it does
> - * so only after the end_io handler is called, which means our
> - * end_io handler thinks the on-disk size is outside the in-core
> - * size. To prevent this just update it a little bit earlier here.
> + * The ioend tells us whether we are doing unwritten extent conversion
> + * or an append transaction that updates the on-disk file size. These
> + * cases are the only cases where we should *potentially* be needing
> + * to update the VFS inode size. When the ioend indicates this, we
> + * are *guaranteed* to be running in non-interrupt context.
> + *
> + * We need to update the in-core inode size here so that we don't end up
> + * with the on-disk inode size being outside the in-core inode size.
> + * While we can do this in the process context after the IO has
> + * completed, this does not work for AIO and hence we always update
> + * the in-core inode size here if necessary.
> */
> - if (offset + size > i_size_read(inode))
> - i_size_write(inode, offset + size);
> + if (ioend->io_type == XFS_IO_UNWRITTEN || xfs_ioend_is_append(ioend)) {
> + if (offset + size > i_size_read(inode))
> + i_size_write(inode, offset + size);
> + } else
> + ASSERT(offset + size <= i_size_read(inode));
The code was obviously incorrect prior to this change, potentially
running some of these transactions in irq context. That said, it occurs
to me that one thing that the previous implementation looked to handle
that this does not is racing of in-flight aio with other operations.
E.g., what happens now if a non-extending, overwrite aio is submitted
and races with a truncate that causes it to be extending by the time we
get here? It looks like it would have been racy regardless, so maybe
that's just a separate problem...
>
> /*
> - * For direct I/O we do not know if we need to allocate blocks or not,
> - * so we can't preallocate an append transaction, as that results in
> - * nested reservations and log space deadlocks. Hence allocate the
> - * transaction here. While this is sub-optimal and can block IO
> - * completion for some time, we're stuck with doing it this way until
> - * we can pass the ioend to the direct IO allocation callbacks and
> - * avoid nesting that way.
> + * If we are doing an append IO that needs to update the EOF on disk,
> + * do the transaction reserve now so we can use common end io
> + * processing. Stashing the error (if there is one) in the ioend will
> + * result in the ioend processing passing on the error if it is
> + * possible as we can't return it from here.
> */
> - if (ioend->io_type == XFS_IO_UNWRITTEN) {
> - xfs_iomap_write_unwritten(ip, offset, size);
> - } else if (offset + size > ip->i_d.di_size) {
> - struct xfs_trans *tp;
> - int error;
> -
> - tp = xfs_trans_alloc(mp, XFS_TRANS_FSYNC_TS);
> - error = xfs_trans_reserve(tp, &M_RES(mp)->tr_fsyncts, 0, 0);
> - if (error) {
> - xfs_trans_cancel(tp, 0);
> - goto out_destroy_ioend;
> - }
> + if (ioend->io_type == XFS_IO_OVERWRITE && xfs_ioend_is_append(ioend))
> + ioend->io_error = xfs_setfilesize_trans_alloc(ioend);
As you mentioned previously, we no longer need the transaction context
manipulation stuff in xfs_setfilesize_trans_alloc() with this approach.
It's still called from the writepage path though, so I guess it needs to
stay.
Brian
>
> - xfs_setfilesize(ip, tp, offset, size);
> - }
> -out_destroy_ioend:
> - xfs_destroy_ioend(ioend);
> +out_end_io:
> + xfs_end_io(&ioend->io_work);
> + return;
> }
>
> STATIC ssize_t
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index e78b64e..967993b 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -1224,6 +1224,7 @@ DEFINE_IOMAP_EVENT(xfs_get_blocks_alloc);
> DEFINE_IOMAP_EVENT(xfs_gbmap_direct);
> DEFINE_IOMAP_EVENT(xfs_gbmap_direct_new);
> DEFINE_IOMAP_EVENT(xfs_gbmap_direct_update);
> +DEFINE_IOMAP_EVENT(xfs_gbmap_direct_endio);
>
> DECLARE_EVENT_CLASS(xfs_simple_io_class,
> TP_PROTO(struct xfs_inode *ip, xfs_off_t offset, ssize_t count),
> --
> 2.0.0
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2015-04-14 14:35 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-14 7:26 [PATCH 0/8 v2] xfs: fix direct IO completion issues Dave Chinner
2015-04-14 7:26 ` [PATCH 1/8] xfs: factor DIO write mapping from get_blocks Dave Chinner
2015-04-14 14:23 ` Brian Foster
2015-04-14 20:06 ` Dave Chinner
2015-04-14 7:26 ` [PATCH 2/8] xfs: move DIO mapping size calculation Dave Chinner
2015-04-14 14:24 ` Brian Foster
2015-04-14 7:26 ` [PATCH 3/8] xfs: DIO needs an ioend for writes Dave Chinner
2015-04-14 14:24 ` Brian Foster
2015-04-14 7:26 ` [PATCH 4/8] xfs: handle DIO overwrite EOF update completion correctly Dave Chinner
2015-04-14 14:35 ` Brian Foster [this message]
2015-04-14 15:35 ` Brian Foster
2015-04-14 20:12 ` Dave Chinner
2015-04-14 7:26 ` [PATCH 5/8] xfs: DIO writes within EOF don't need an ioend Dave Chinner
2015-04-14 14:35 ` Brian Foster
2015-04-14 20:18 ` Dave Chinner
2015-04-14 7:26 ` [PATCH 6/8] xfs: DIO write completion size updates race Dave Chinner
2015-04-14 14:35 ` Brian Foster
2015-04-14 7:26 ` [PATCH 7/8] xfs: direct IO EOF zeroing needs to drain AIO Dave Chinner
2015-04-14 14:35 ` Brian Foster
2015-04-14 7:26 ` [PATCH 8/8] xfs: using generic_file_direct_write() is unnecessary Dave Chinner
2015-04-14 14:35 ` Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150414143501.GE36198@bfoster.bfoster \
--to=bfoster@redhat.com \
--cc=david@fromorbit.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox