public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com
Subject: [PATCH 7/7] xfs: serialise unaligned direct IOs
Date: Wed, 15 Dec 2010 12:23:28 +1100	[thread overview]
Message-ID: <1292376208-16282-8-git-send-email-david@fromorbit.com> (raw)
In-Reply-To: <1292376208-16282-1-git-send-email-david@fromorbit.com>

From: Dave Chinner <dchinner@redhat.com>

When two concurrent unaligned, non-overlapping direct IOs are issued
to the same block, the direct Io layer will race to zero the block.
The result is that one of the concurrent IOs will overwrite data
written by the other IO with zeros. This is demonstrated by the
xfsqa test 240.

To avoid this problem, serialise all unaligned direct IOs to an
inode with a big hammer. We need a big hammer approach as we need to
serialise AIO as well, so we can't just block writes on locks.
Hence, the big hammer is calling xfs_ioend_wait() while holding out
other unaligned direct IOs from starting.

We don't bother trying to serialised aligned vs unaligned IOs as
they are overlapping IO and the result of concurrent overlapping IOs
is undefined - the result of either IO is a valid result so we let
them race. Hence we only penalise unaligned IO, which already has a
major overhead compared to aligned IO so this isn't a major problem.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/linux-2.6/xfs_file.c |   29 ++++++++++++++++++++++++-----
 1 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
index 269da12..68f88a5 100644
--- a/fs/xfs/linux-2.6/xfs_file.c
+++ b/fs/xfs/linux-2.6/xfs_file.c
@@ -648,10 +648,21 @@ xfs_file_aio_write_checks(
  * xfs_file_dio_aio_write - handle direct IO writes
  *
  * Lock the inode appropriately to prepare for and issue a direct IO write.
- * By spearating it from the buffered write path we remove all the tricky to
+ * By separating it from the buffered write path we remove all the tricky to
  * follow locking changes and looping. This also clearly indicates that XFS
  * does not fall back to buffered IO in the direct IO write path.
  *
+ * In most cases the direct IO writes will be done with IOLOCK_SHARED allowing
+ * them to be done in parallel with reads and other direct IO writes. However,
+ * if the IO is not aligned to filesystem blocks, the direct IO layer needs to
+ * do sub-block zeroing and that requires serialisation against other direct
+ * IOs to the same block. In this case we need to serialise the submission of
+ * the unaligned IOs so that we don't get racing block zeroing in the dio layer.
+ * To avoid the problem with aio, we also need to wait for outstanding IOs to
+ * complete so that unwritten extent conversion is completed before we try to
+ * map the overlapping block. This is currently implemented by hitting it
+ * with a big hammer (i.e. xfs_ioend_wait()).
+ *
  * Returns with locks held indicated by @need_i_mutex and @iolock.
  */
 STATIC ssize_t
@@ -671,6 +682,7 @@ xfs_file_dio_aio_write(
 	struct xfs_mount	*mp = ip->i_mount;
 	ssize_t			error = 0;
 	size_t			count = ocount;
+	int			unaligned_io = 0;
 	xfs_buftarg_t		*target = XFS_IS_REALTIME_INODE(ip) ?
 					mp->m_rtdev_targp : mp->m_ddev_targp;
 
@@ -679,10 +691,15 @@ xfs_file_dio_aio_write(
 	if ((pos & target->bt_smask) || (count & target->bt_smask))
 		return XFS_ERROR(-EINVAL);
 
-	if (mapping->nrpages || pos > ip->i_size) {
+	if ((pos & mp->m_blockmask) || ((pos + count) & mp->m_blockmask))
+		unaligned_io = 1;
+
+	if (unaligned_io || mapping->nrpages || pos > ip->i_size) {
 		*iolock = XFS_IOLOCK_EXCL;
 		*need_i_mutex = 1;
 		mutex_lock(&inode->i_mutex);
+		if (unaligned_io)
+			xfs_ioend_wait(ip);
 	} else {
 		*iolock = XFS_IOLOCK_SHARED;
 	}
@@ -700,10 +717,12 @@ xfs_file_dio_aio_write(
 	}
 
 	if (*need_i_mutex) {
-		/* demote the lock now the cached pages are gone */
-		xfs_ilock_demote(ip, XFS_IOLOCK_EXCL);
+		/* demote the lock now the cached pages are gone if we can */
+		if (!unaligned_io) {
+			xfs_ilock_demote(ip, XFS_IOLOCK_EXCL);
+			*iolock = XFS_IOLOCK_SHARED;
+		}
 		mutex_unlock(&inode->i_mutex);
-		*iolock = XFS_IOLOCK_SHARED;
 		*need_i_mutex = 0;
 	}
 
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

      parent reply	other threads:[~2010-12-15  1:23 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-15  1:23 [PATCH 0/7] xfs: serialise unaligned direct IOs Dave Chinner
2010-12-15  1:23 ` [PATCH 1/7] xfs: ensure sync write errors are returned Dave Chinner
2010-12-16 11:57   ` Christoph Hellwig
2010-12-16 21:57     ` Dave Chinner
2010-12-16 20:50   ` Alex Elder
2010-12-17  6:43     ` Dave Chinner
2010-12-15  1:23 ` [PATCH 2/7] xfs: factor common post-write isize handling code Dave Chinner
2010-12-15  1:23 ` [PATCH 4/7] xfs: split direct IO write path from xfs_file_aio_write Dave Chinner
2010-12-16 12:06   ` Christoph Hellwig
2010-12-17  7:31     ` Dave Chinner
2010-12-20 11:29       ` Christoph Hellwig
2010-12-21  0:51         ` Dave Chinner
2010-12-15  1:23 ` [PATCH 5/7] xfs: split buffered " Dave Chinner
2010-12-15  1:23 ` [PATCH 6/7] xfs: factor common write setup code Dave Chinner
2010-12-15  1:23 ` Dave Chinner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1292376208-16282-8-git-send-email-david@fromorbit.com \
    --to=david@fromorbit.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox