From: Dave Chinner <david@fromorbit.com>
To: Avi Kivity <avi@scylladb.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: xfs_buf_lock vs aio
Date: Mon, 19 Feb 2018 15:48:00 +1100 [thread overview]
Message-ID: <20180219044800.GX7000@dastard> (raw)
In-Reply-To: <20180219024055.GW7000@dastard>
On Mon, Feb 19, 2018 at 01:40:55PM +1100, Dave Chinner wrote:
> On Fri, Feb 16, 2018 at 10:07:55AM +0200, Avi Kivity wrote:
> > On 02/15/2018 11:30 PM, Dave Chinner wrote:
> > >On Thu, Feb 15, 2018 at 11:36:54AM +0200, Avi Kivity wrote:
> > >>On 02/15/2018 01:56 AM, Dave Chinner wrote:
> > >>A little bird whispered in my ear to try XFS_IOC_OPEN_BY_HANDLE to
> > >>avoid the the time update lock, so we'll be trying that next, to
> > >>emulate lazytime.
> > >Biggest problem with that is it requires root permissions. It's not
> > >a solution that can be deployed in practice, so I haven't bothered
> > >suggesting it as something to try.
> > >
> > >If you want to try lazytime, an easier test might be to rebuild the
> > >kernel with this change below to support the lazytime mount option
> > >and not log the timestamp updates. This is essentially the mechanism
> > >that I'll use for this, but it will need to grow more stuff to have
> > >the correct lazytime semantics...
> > >
> >
> > We tried open by handle to see if lazytime would provide relief, but
> > it looks like it just pushes the lock acquisition to another place:
>
> Whack-a-mole.
>
> This is the whole problem with driving the "nowait" semantics into
> the filesystem implementations - every time we fix one blocking
> point, we find a deeper one, and we have to drive the "nowait"
> semantics deeper into code that should not have to care about IO
> level blocking semantics. And by doing it in a "slap-a-bandaid on
> it" process, we end up with spagetti code that is fragile and
> unmaintainable...
>
> > However, that function can EAGAIN (it does for IOLOCK) so maybe we
> > can change xfs_ilock to xfs_ilock_nowait and teach it about not
> > waiting for ILOCK too.
>
> If only it were that simple. Why, exactly, does the direct IO write
> code require the ILOCK exclusive? Indeed, if it goes to allocate
> blocks, we do this:
>
> /*
> * xfs_iomap_write_direct() expects the shared lock. It
> * is unlocked on return.
> */
> if (lockmode == XFS_ILOCK_EXCL)
> xfs_ilock_demote(ip, lockmode);
>
> We demote the lock to shared before we call into the allocation
> code. And for pure direct IO writes, all we care about is ensuring
> the extent map does not change while we do the lookup and check.
> That only requires a shared lock.
>
> So now I've got to go work out why need_excl_ilock() says we need
> an exclusive ilock for direct IO writes when it looks pretty clear
> to me that we don't.
>
> But that's only half the problem. The other problem is that even if
> we take it shared, we're still going to block on IO completion
> taking the ILOCK exclusive to do things like unwritten extent
> completion. So we still need to hack about with "trylock" operations
> into functions into various functions (xfs_ilock_data_map_shared()
> for one).
>
> What a mess....
Try the patch below on top of the lazytime stuff. Let's see where
the next layer of the onion is found.
-Dave.
--
Dave Chinner
david@fromorbit.com
xfs: clean up xfs_file_iomap_begin, make it non-blocking
From: Dave Chinner <dchinner@redhat.com>
Gah, what a friggin' mess.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_iomap.c | 166 ++++++++++++++++++++++++++++++++---------------------
1 file changed, 101 insertions(+), 65 deletions(-)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 66e1edbfb2b2..9c7398c8f7e7 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -955,19 +955,52 @@ static inline bool imap_needs_alloc(struct inode *inode,
(IS_DAX(inode) && imap->br_state == XFS_EXT_UNWRITTEN);
}
-static inline bool need_excl_ilock(struct xfs_inode *ip, unsigned flags)
+static int
+xfs_ilock_for_iomap(
+ struct xfs_inode *ip,
+ unsigned flags,
+ unsigned *lockmode)
{
+ unsigned mode = XFS_ILOCK_SHARED;
+
+ /* Modifications to reflink files require exclusive access */
+ if (xfs_is_reflink_inode(ip) && (flags & (IOMAP_WRITE | IOMAP_ZERO))) {
+ /*
+ * A reflinked inode will result in CoW alloc.
+ * FIXME: It could still overwrite on unshared extents
+ * and not need allocation in the direct IO path.
+ */
+ if ((flags & IOMAP_NOWAIT) && (flags & IOMAP_DIRECT))
+ return -EAGAIN;
+ mode = XFS_ILOCK_EXCL;
+ }
+
+ /* Non-direct IO modifications require exclusive access */
+ if (!(flags & IOMAP_DIRECT) && (flags & IOMAP_WRITE))
+ mode = XFS_ILOCK_EXCL;
+
/*
- * COW writes will allocate delalloc space, so we need to make sure
- * to take the lock exclusively here.
+ * Extents not yet cached requires exclusive access, don't block.
+ * This is an opencoded xfs_ilock_data_map_shared() call but with
+ * non-blocking behaviour.
*/
- if (xfs_is_reflink_inode(ip) && (flags & (IOMAP_WRITE | IOMAP_ZERO)))
- return true;
- if ((flags & IOMAP_DIRECT) && (flags & IOMAP_WRITE))
- return true;
- return false;
+ if (!(ip->i_df.if_flags & XFS_IFEXTENTS)) {
+ if (flags & IOMAP_NOWAIT)
+ return -EAGAIN;
+ mode = XFS_ILOCK_EXCL;
+ }
+
+ if (!xfs_ilock_nowait(ip, mode)) {
+ if (flags & IOMAP_NOWAIT)
+ return -EAGAIN;
+ xfs_ilock(ip, mode);
+ }
+
+ *lockmode = mode;
+ return 0;
}
+
static int
xfs_file_iomap_begin(
struct inode *inode,
@@ -993,17 +1026,15 @@ xfs_file_iomap_begin(
return xfs_file_iomap_begin_delay(inode, offset, length, iomap);
}
- if (need_excl_ilock(ip, flags)) {
- lockmode = XFS_ILOCK_EXCL;
- xfs_ilock(ip, XFS_ILOCK_EXCL);
- } else {
- lockmode = xfs_ilock_data_map_shared(ip);
- }
-
- if ((flags & IOMAP_NOWAIT) && !(ip->i_df.if_flags & XFS_IFEXTENTS)) {
- error = -EAGAIN;
- goto out_unlock;
- }
+ /*
+ * Lock the inode in the manner required for the specified operation and
+ * check for as many conditions that would result in blocking as
+ * possible. This removes most of the non-blocking checks from the
+ * mapping code below.
+ */
+ error = xfs_ilock_for_iomap(ip, flags, &lockmode);
+ if (error)
+ return error;
ASSERT(offset <= mp->m_super->s_maxbytes);
if (offset > mp->m_super->s_maxbytes - length)
@@ -1024,17 +1055,16 @@ xfs_file_iomap_begin(
goto out_unlock;
}
- if ((flags & (IOMAP_WRITE | IOMAP_ZERO)) && xfs_is_reflink_inode(ip)) {
+ /* Non-modifying mapping requested, so we are done */
+ if (!(flags & (IOMAP_WRITE | IOMAP_ZERO)))
+ goto iomap_found;
+
+ /*
+ * Break shared extents if necessary. Checks for blocking in the direct
+ * IO case have been done up front, so we don't need to do them here.
+ */
+ if (xfs_is_reflink_inode(ip)) {
if (flags & IOMAP_DIRECT) {
- /*
- * A reflinked inode will result in CoW alloc.
- * FIXME: It could still overwrite on unshared extents
- * and not need allocation.
- */
- if (flags & IOMAP_NOWAIT) {
- error = -EAGAIN;
- goto out_unlock;
- }
/* may drop and re-acquire the ilock */
error = xfs_reflink_allocate_cow(ip, &imap, &shared,
&lockmode);
@@ -1050,46 +1080,45 @@ xfs_file_iomap_begin(
length = XFS_FSB_TO_B(mp, end_fsb) - offset;
}
- if ((flags & IOMAP_WRITE) && imap_needs_alloc(inode, &imap, nimaps)) {
- /*
- * If nowait is set bail since we are going to make
- * allocations.
- */
- if (flags & IOMAP_NOWAIT) {
- error = -EAGAIN;
- goto out_unlock;
- }
- /*
- * We cap the maximum length we map here to MAX_WRITEBACK_PAGES
- * pages to keep the chunks of work done where somewhat symmetric
- * with the work writeback does. This is a completely arbitrary
- * number pulled out of thin air as a best guess for initial
- * testing.
- *
- * Note that the values needs to be less than 32-bits wide until
- * the lower level functions are updated.
- */
- length = min_t(loff_t, length, 1024 * PAGE_SIZE);
- /*
- * xfs_iomap_write_direct() expects the shared lock. It
- * is unlocked on return.
- */
- if (lockmode == XFS_ILOCK_EXCL)
- xfs_ilock_demote(ip, lockmode);
- error = xfs_iomap_write_direct(ip, offset, length, &imap,
- nimaps);
- if (error)
- return error;
+ /* Don't need to allocate over holes when doing zeroing operations. */
+ if (flags & IOMAP_ZERO)
+ goto iomap_found;
+ ASSERT(flags & IOMAP_WRITE);
- iomap->flags = IOMAP_F_NEW;
- trace_xfs_iomap_alloc(ip, offset, length, 0, &imap);
- } else {
- ASSERT(nimaps);
+ if (!imap_needs_alloc(inode, &imap, nimaps))
+ goto iomap_found;
- xfs_iunlock(ip, lockmode);
- trace_xfs_iomap_found(ip, offset, length, 0, &imap);
+ /* Don't block on allocation if we are doing non-blocking IO */
+ if (flags & IOMAP_NOWAIT) {
+ error = -EAGAIN;
+ goto out_unlock;
}
+ /*
+ * We cap the maximum length we map here to a sane size keep the chunks
+ * of work done where somewhat symmetric with the work writeback does.
+ * This is a completely arbitrary number pulled out of thin air as a
+ * best guess for initial testing.
+ *
+ * Note that the values needs to be less than 32-bits wide until the
+ * lower level functions are updated.
+ */
+ length = min_t(loff_t, length, 1024 * PAGE_SIZE);
+
+ /*
+ * xfs_iomap_write_direct() expects the shared lock. It is unlocked on
+ * return.
+ */
+ if (lockmode == XFS_ILOCK_EXCL)
+ xfs_ilock_demote(ip, lockmode);
+ error = xfs_iomap_write_direct(ip, offset, length, &imap, nimaps);
+ if (error)
+ return error;
+
+ iomap->flags = IOMAP_F_NEW;
+ trace_xfs_iomap_alloc(ip, offset, length, 0, &imap);
+
+iomap_finish:
if (xfs_ipincount(ip) && (ip->i_itemp->ili_fsync_fields
& ~XFS_ILOG_TIMESTAMP))
iomap->flags |= IOMAP_F_DIRTY;
@@ -1099,6 +1128,13 @@ xfs_file_iomap_begin(
if (shared)
iomap->flags |= IOMAP_F_SHARED;
return 0;
+
+iomap_found:
+ ASSERT(nimaps);
+ xfs_iunlock(ip, lockmode);
+ trace_xfs_iomap_found(ip, offset, length, 0, &imap);
+ goto iomap_finish;
+
out_unlock:
xfs_iunlock(ip, lockmode);
return error;
next prev parent reply other threads:[~2018-02-19 4:48 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-02-07 17:20 xfs_buf_lock vs aio Avi Kivity
2018-02-07 23:33 ` Dave Chinner
2018-02-08 8:24 ` Avi Kivity
2018-02-08 22:11 ` Dave Chinner
2018-02-09 12:11 ` Avi Kivity
2018-02-09 23:10 ` Dave Chinner
2018-02-12 9:33 ` Avi Kivity
2018-02-13 5:18 ` Dave Chinner
2018-02-13 23:14 ` Darrick J. Wong
2018-02-14 2:16 ` Dave Chinner
2018-02-14 12:01 ` Avi Kivity
2018-02-14 12:07 ` Avi Kivity
2018-02-14 12:18 ` Avi Kivity
2018-02-14 23:56 ` Dave Chinner
2018-02-15 9:36 ` Avi Kivity
2018-02-15 21:30 ` Dave Chinner
2018-02-16 8:07 ` Avi Kivity
2018-02-19 2:40 ` Dave Chinner
2018-02-19 4:48 ` Dave Chinner [this message]
2018-02-25 17:47 ` Avi Kivity
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180219044800.GX7000@dastard \
--to=david@fromorbit.com \
--cc=avi@scylladb.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.