[PATCH 0/8] xfs: prevent corruption due to overlapping AIO DIO V2

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/8] xfs: prevent corruption due to overlapping AIO DIO V2
@ 2011-01-04  4:48 Dave Chinner
  2011-01-04  4:48 ` [PATCH 1/8] xfs: ensure sync write errors are returned Dave Chinner
                   ` (7 more replies)
  0 siblings, 8 replies; 27+ messages in thread
From: Dave Chinner @ 2011-01-04  4:48 UTC (permalink / raw)
  To: xfs

This series aims to serialise unaligned direct IOs to an inode to
avoid corruption caused by sub-block zeroing races. The previous
approaches at the direct IO layer fail because for !DIO_LOCKING
filesystems like XFS, there is no way we can track and serialise all
the direct IOs to a given inode in a race free manner. While we can
track them, we cannot close the races between mapping blocks and
tracked IO completion occuring before subsequent tracking lookups
without adding some kind of locking to the DIO layer. Hence for
!DIO_LOCKING users, unaligned direct IO needs to be serialised at a
higher layer.

Because the xfs_file_aio_write() path is so twisted and difficult to
follow, adding new locking cases to the code is difficult to verify
that it is correct in all cases. Hence the series starts by cleaning
up the code and splitting apart the direct IO and buffered IO paths
before adding the unaligned direct IO detection and serialisation.

The first patch fixes a sync write error handling bug - we should
consider pushing that to .38. The next patches factor code that is
common to write and splice into helpers. The direct and buffered IO
paths are then separated out and the common write checks and bounds
limiting is factored out into a helper.

Finally, the serialisation of unaligned direct IOs is added by a
big-hammer approach. That is, we take the i_mutex and
XFS_IOLOCK_EXCL and hold them across the unaligned IO submission.
This means that unaligned direct IO submission is serialised, and
non-AIO DIO is serialised completely.

For unaligned AIO DIO, this would only serialise the submission of
the DIO, leaving the sub-block zeroing races open for unaligned
writes into unwritten extents. To avoid this problem, we use
xfs_ioend_wait() to ensure all AIO writes have completed before we
submit the unaligned write. We do this wait holding the i_mutex so
we serialise against other unaligned AIO as there is no need to
serialise against aligned DIO.

Version 2:
- fix initial sync write error return fixup
- add new patch to abstract locking from read/write path and remove
  the need for the need_i_mutex variable.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/8] xfs: ensure sync write errors are returned
  2011-01-04  4:48 [PATCH 0/8] xfs: prevent corruption due to overlapping AIO DIO V2 Dave Chinner
@ 2011-01-04  4:48 ` Dave Chinner
  2011-01-05  1:53   ` Alex Elder
  2011-01-07  8:45   ` Christoph Hellwig
  2011-01-04  4:48 ` [PATCH 2/8] xfs: factor common post-write isize handling code Dave Chinner
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 27+ messages in thread
From: Dave Chinner @ 2011-01-04  4:48 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_file_aio_write() only returns the error from synchronous
flushing of the data and inode if error == 0. At the point where
error is being checked, it is guaranteed to be > 0. Therefore any
errors returned by the data or fsync flush will never be returned.
Fix the checks so we overwrite the current error once and only if an
error really occurred.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/linux-2.6/xfs_file.c |   51 +++++++++++++++++++++----------------------
 1 files changed, 25 insertions(+), 26 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
index ba8ad42..de71a84 100644
--- a/fs/xfs/linux-2.6/xfs_file.c
+++ b/fs/xfs/linux-2.6/xfs_file.c
@@ -574,7 +574,7 @@ xfs_file_aio_write(
 	struct inode		*inode = mapping->host;
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
-	ssize_t			ret = 0, error = 0;
+	ssize_t			ret = 0;
 	int			ioflags = 0;
 	xfs_fsize_t		isize, new_size;
 	int			iolock;
@@ -590,9 +590,9 @@ xfs_file_aio_write(
 	if (file->f_mode & FMODE_NOCMTIME)
 		ioflags |= IO_INVIS;
 
-	error = generic_segment_checks(iovp, &nr_segs, &ocount, VERIFY_READ);
-	if (error)
-		return error;
+	ret = generic_segment_checks(iovp, &nr_segs, &ocount, VERIFY_READ);
+	if (ret)
+		return ret;
 
 	count = ocount;
 	if (count == 0)
@@ -616,9 +616,9 @@ relock:
 	xfs_ilock(ip, XFS_ILOCK_EXCL|iolock);
 
 start:
-	error = -generic_write_checks(file, &pos, &count,
+	ret = generic_write_checks(file, &pos, &count,
 					S_ISBLK(inode->i_mode));
-	if (error) {
+	if (ret) {
 		xfs_iunlock(ip, XFS_ILOCK_EXCL|iolock);
 		goto out_unlock_mutex;
 	}
@@ -660,8 +660,8 @@ start:
 	 */
 
 	if (pos > ip->i_size) {
-		error = xfs_zero_eof(ip, pos, ip->i_size);
-		if (error) {
+		ret = -xfs_zero_eof(ip, pos, ip->i_size);
+		if (ret) {
 			xfs_iunlock(ip, XFS_ILOCK_EXCL);
 			goto out_unlock_internal;
 		}
@@ -674,8 +674,8 @@ start:
 	 * by root.  This keeps people from modifying setuid and
 	 * setgid binaries.
 	 */
-	error = -file_remove_suid(file);
-	if (unlikely(error))
+	ret = file_remove_suid(file);
+	if (unlikely(ret))
 		goto out_unlock_internal;
 
 	/* We can write back this queue in page reclaim */
@@ -684,10 +684,10 @@ start:
 	if ((ioflags & IO_ISDIRECT)) {
 		if (mapping->nrpages) {
 			WARN_ON(need_i_mutex == 0);
-			error = xfs_flushinval_pages(ip,
+			ret = -xfs_flushinval_pages(ip,
 					(pos & PAGE_CACHE_MASK),
 					-1, FI_REMAPF_LOCKED);
-			if (error)
+			if (ret)
 				goto out_unlock_internal;
 		}
 
@@ -720,24 +720,24 @@ start:
 		}
 	} else {
 		int enospc = 0;
-		ssize_t ret2 = 0;
 
 write_retry:
 		trace_xfs_file_buffered_write(ip, count, iocb->ki_pos, ioflags);
-		ret2 = generic_file_buffered_write(iocb, iovp, nr_segs,
+		ret = generic_file_buffered_write(iocb, iovp, nr_segs,
 				pos, &iocb->ki_pos, count, ret);
 		/*
 		 * if we just got an ENOSPC, flush the inode now we
 		 * aren't holding any page locks and retry *once*
 		 */
-		if (ret2 == -ENOSPC && !enospc) {
-			error = xfs_flush_pages(ip, 0, -1, 0, FI_NONE);
-			if (error)
+		if (ret == -ENOSPC && !enospc) {
+			ssize_t ret2 = xfs_flush_pages(ip, 0, -1, 0, FI_NONE);
+			if (ret2) {
+				ret = ret2;
 				goto out_unlock_internal;
+			}
 			enospc = 1;
 			goto write_retry;
 		}
-		ret = ret2;
 	}
 
 	current->backing_dev_info = NULL;
@@ -753,7 +753,6 @@ write_retry:
 		xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	}
 
-	error = -ret;
 	if (ret <= 0)
 		goto out_unlock_internal;
 
@@ -762,23 +761,23 @@ write_retry:
 	/* Handle various SYNC-type writes */
 	if ((file->f_flags & O_DSYNC) || IS_SYNC(inode)) {
 		loff_t end = pos + ret - 1;
-		int error2;
+		int error, error2;
 
 		xfs_iunlock(ip, iolock);
 		if (need_i_mutex)
 			mutex_unlock(&inode->i_mutex);
 
-		error2 = filemap_write_and_wait_range(mapping, pos, end);
-		if (!error)
-			error = error2;
+		error = filemap_write_and_wait_range(mapping, pos, end);
 		if (need_i_mutex)
 			mutex_lock(&inode->i_mutex);
 		xfs_ilock(ip, iolock);
 
 		error2 = -xfs_file_fsync(file,
 					 (file->f_flags & __O_SYNC) ? 0 : 1);
-		if (!error)
-			error = error2;
+		if (error)
+			ret = error;
+		else if (error2)
+			ret = error2;
 	}
 
  out_unlock_internal:
@@ -800,7 +799,7 @@ write_retry:
  out_unlock_mutex:
 	if (need_i_mutex)
 		mutex_unlock(&inode->i_mutex);
-	return -error;
+	return ret;
 }
 
 STATIC int
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/8] xfs: factor common post-write isize handling code
  2011-01-04  4:48 [PATCH 0/8] xfs: prevent corruption due to overlapping AIO DIO V2 Dave Chinner
  2011-01-04  4:48 ` [PATCH 1/8] xfs: ensure sync write errors are returned Dave Chinner
@ 2011-01-04  4:48 ` Dave Chinner
  2011-01-05  1:54   ` Alex Elder
  2011-01-04  4:48 ` [PATCH 4/8] xfs: introduce xfs_rw_lock() helpers for locking the inode Dave Chinner
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Dave Chinner @ 2011-01-04  4:48 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/linux-2.6/xfs_file.c |   52 +++++++++++++++++++++++-------------------
 1 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
index de71a84..841529e 100644
--- a/fs/xfs/linux-2.6/xfs_file.c
+++ b/fs/xfs/linux-2.6/xfs_file.c
@@ -321,6 +321,30 @@ xfs_file_splice_read(
 	return ret;
 }
 
+STATIC void
+xfs_aio_write_isize_update(
+	struct inode	*inode,
+	loff_t		*ppos,
+	ssize_t		bytes_written)
+{
+	struct xfs_inode	*ip = XFS_I(inode);
+	xfs_fsize_t		isize = i_size_read(inode);
+
+	if (bytes_written > 0)
+		XFS_STATS_ADD(xs_write_bytes, bytes_written);
+
+	if (unlikely(bytes_written < 0 && bytes_written != -EFAULT &&
+					*ppos > isize))
+		*ppos = isize;
+
+	if (*ppos > ip->i_size) {
+		xfs_ilock(ip, XFS_ILOCK_EXCL);
+		if (*ppos > ip->i_size)
+			ip->i_size = *ppos;
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	}
+}
+
 STATIC ssize_t
 xfs_file_splice_write(
 	struct pipe_inode_info	*pipe,
@@ -331,7 +355,7 @@ xfs_file_splice_write(
 {
 	struct inode		*inode = outfilp->f_mapping->host;
 	struct xfs_inode	*ip = XFS_I(inode);
-	xfs_fsize_t		isize, new_size;
+	xfs_fsize_t		new_size;
 	int			ioflags = 0;
 	ssize_t			ret;
 
@@ -355,19 +379,8 @@ xfs_file_splice_write(
 	trace_xfs_file_splice_write(ip, count, *ppos, ioflags);
 
 	ret = generic_file_splice_write(pipe, outfilp, ppos, count, flags);
-	if (ret > 0)
-		XFS_STATS_ADD(xs_write_bytes, ret);
-
-	isize = i_size_read(inode);
-	if (unlikely(ret < 0 && ret != -EFAULT && *ppos > isize))
-		*ppos = isize;
 
-	if (*ppos > ip->i_size) {
-		xfs_ilock(ip, XFS_ILOCK_EXCL);
-		if (*ppos > ip->i_size)
-			ip->i_size = *ppos;
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	}
+	xfs_aio_write_isize_update(inode, ppos, ret);
 
 	if (ip->i_new_size) {
 		xfs_ilock(ip, XFS_ILOCK_EXCL);
@@ -576,7 +589,7 @@ xfs_file_aio_write(
 	struct xfs_mount	*mp = ip->i_mount;
 	ssize_t			ret = 0;
 	int			ioflags = 0;
-	xfs_fsize_t		isize, new_size;
+	xfs_fsize_t		new_size;
 	int			iolock;
 	size_t			ocount = 0, count;
 	int			need_i_mutex;
@@ -742,16 +755,7 @@ write_retry:
 
 	current->backing_dev_info = NULL;
 
-	isize = i_size_read(inode);
-	if (unlikely(ret < 0 && ret != -EFAULT && iocb->ki_pos > isize))
-		iocb->ki_pos = isize;
-
-	if (iocb->ki_pos > ip->i_size) {
-		xfs_ilock(ip, XFS_ILOCK_EXCL);
-		if (iocb->ki_pos > ip->i_size)
-			ip->i_size = iocb->ki_pos;
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	}
+	xfs_aio_write_isize_update(inode, &iocb->ki_pos, ret);
 
 	if (ret <= 0)
 		goto out_unlock_internal;
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 4/8] xfs: introduce xfs_rw_lock() helpers for locking the inode
  2011-01-04  4:48 [PATCH 0/8] xfs: prevent corruption due to overlapping AIO DIO V2 Dave Chinner
  2011-01-04  4:48 ` [PATCH 1/8] xfs: ensure sync write errors are returned Dave Chinner
  2011-01-04  4:48 ` [PATCH 2/8] xfs: factor common post-write isize handling code Dave Chinner
@ 2011-01-04  4:48 ` Dave Chinner
  2011-01-05  1:54   ` Alex Elder
  2011-01-04  4:48 ` [PATCH 5/8] xfs: split direct IO write path from xfs_file_aio_write Dave Chinner
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Dave Chinner @ 2011-01-04  4:48 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

We need to obtain the i_mutex, i_iolock and i_ilock during the read
and write paths. Add a set of wrapper functions to neatly
encapsulate the lock ordering and shared/exclusive semantics to make
the locking easier to follow and get right.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/linux-2.6/xfs_file.c |  123 ++++++++++++++++++++++++-------------------
 1 files changed, 68 insertions(+), 55 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
index 33a688c..0d6111e 100644
--- a/fs/xfs/linux-2.6/xfs_file.c
+++ b/fs/xfs/linux-2.6/xfs_file.c
@@ -41,6 +41,40 @@
 static const struct vm_operations_struct xfs_file_vm_ops;
 
 /*
+ * Locking primitives for read and write IO paths to ensure we consistently use
+ * and order the inode->i_mutex, ip->i_lock and ip->i_iolock.
+ */
+static inline void
+xfs_rw_ilock(
+	struct xfs_inode	*ip,
+	int			type)
+{
+	if (type & XFS_IOLOCK_EXCL)
+		mutex_lock(&VFS_I(ip)->i_mutex);
+	xfs_ilock(ip, type);
+}
+
+static inline void
+xfs_rw_iunlock(
+	struct xfs_inode	*ip,
+	int			type)
+{
+	xfs_iunlock(ip, type);
+	if (type & XFS_IOLOCK_EXCL)
+		mutex_unlock(&VFS_I(ip)->i_mutex);
+}
+
+static inline void
+xfs_rw_ilock_demote(
+	struct xfs_inode	*ip,
+	int			type)
+{
+	xfs_ilock_demote(ip, type);
+	if (type & XFS_IOLOCK_EXCL)
+		mutex_unlock(&VFS_I(ip)->i_mutex);
+}
+
+/*
  *	xfs_iozero
  *
  *	xfs_iozero clears the specified range of buffer supplied,
@@ -262,22 +296,21 @@ xfs_file_aio_read(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	if (unlikely(ioflags & IO_ISDIRECT))
-		mutex_lock(&inode->i_mutex);
-	xfs_ilock(ip, XFS_IOLOCK_SHARED);
-
 	if (unlikely(ioflags & IO_ISDIRECT)) {
+		xfs_rw_ilock(ip, XFS_IOLOCK_EXCL);
+
 		if (inode->i_mapping->nrpages) {
 			ret = -xfs_flushinval_pages(ip,
 					(iocb->ki_pos & PAGE_CACHE_MASK),
 					-1, FI_REMAPF_LOCKED);
+			if (ret) {
+				xfs_rw_iunlock(ip, XFS_IOLOCK_EXCL);
+				return ret;
+			}
 		}
-		mutex_unlock(&inode->i_mutex);
-		if (ret) {
-			xfs_iunlock(ip, XFS_IOLOCK_SHARED);
-			return ret;
-		}
-	}
+		xfs_rw_ilock_demote(ip, XFS_IOLOCK_EXCL);
+	} else
+		xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);
 
 	trace_xfs_file_read(ip, size, iocb->ki_pos, ioflags);
 
@@ -285,7 +318,7 @@ xfs_file_aio_read(
 	if (ret > 0)
 		XFS_STATS_ADD(xs_read_bytes, ret);
 
-	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
+	xfs_rw_iunlock(ip, XFS_IOLOCK_SHARED);
 	return ret;
 }
 
@@ -309,7 +342,7 @@ xfs_file_splice_read(
 	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
 		return -EIO;
 
-	xfs_ilock(ip, XFS_IOLOCK_SHARED);
+	xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);
 
 	trace_xfs_file_splice_read(ip, count, *ppos, ioflags);
 
@@ -317,7 +350,7 @@ xfs_file_splice_read(
 	if (ret > 0)
 		XFS_STATS_ADD(xs_read_bytes, ret);
 
-	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
+	xfs_rw_iunlock(ip, XFS_IOLOCK_SHARED);
 	return ret;
 }
 
@@ -338,10 +371,10 @@ xfs_aio_write_isize_update(
 		*ppos = isize;
 
 	if (*ppos > ip->i_size) {
-		xfs_ilock(ip, XFS_ILOCK_EXCL);
+		xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
 		if (*ppos > ip->i_size)
 			ip->i_size = *ppos;
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
+		xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
 	}
 }
 
@@ -356,11 +389,11 @@ xfs_aio_write_newsize_update(
 	struct xfs_inode	*ip)
 {
 	if (ip->i_new_size) {
-		xfs_ilock(ip, XFS_ILOCK_EXCL);
+		xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
 		ip->i_new_size = 0;
 		if (ip->i_d.di_size > ip->i_size)
 			ip->i_d.di_size = ip->i_size;
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
+		xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
 	}
 }
 
@@ -386,14 +419,13 @@ xfs_file_splice_write(
 	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
 		return -EIO;
 
-	xfs_ilock(ip, XFS_IOLOCK_EXCL);
+	xfs_rw_ilock(ip, XFS_ILOCK_EXCL|XFS_IOLOCK_EXCL);
 
 	new_size = *ppos + count;
 
-	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	if (new_size > ip->i_size)
 		ip->i_new_size = new_size;
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
 
 	trace_xfs_file_splice_write(ip, count, *ppos, ioflags);
 
@@ -401,7 +433,7 @@ xfs_file_splice_write(
 
 	xfs_aio_write_isize_update(inode, ppos, ret);
 	xfs_aio_write_newsize_update(ip);
-	xfs_iunlock(ip, XFS_IOLOCK_EXCL);
+	xfs_rw_iunlock(ip, XFS_IOLOCK_EXCL);
 	return ret;
 }
 
@@ -604,7 +636,6 @@ xfs_file_aio_write(
 	xfs_fsize_t		new_size;
 	int			iolock;
 	size_t			ocount = 0, count;
-	int			need_i_mutex;
 
 	XFS_STATS_INC(xs_write_calls);
 
@@ -631,21 +662,16 @@ xfs_file_aio_write(
 relock:
 	if (ioflags & IO_ISDIRECT) {
 		iolock = XFS_IOLOCK_SHARED;
-		need_i_mutex = 0;
 	} else {
 		iolock = XFS_IOLOCK_EXCL;
-		need_i_mutex = 1;
-		mutex_lock(&inode->i_mutex);
 	}
 
-	xfs_ilock(ip, XFS_ILOCK_EXCL|iolock);
-
 start:
 	ret = generic_write_checks(file, &pos, &count,
 					S_ISBLK(inode->i_mode));
 	if (ret) {
-		xfs_iunlock(ip, XFS_ILOCK_EXCL|iolock);
-		goto out_unlock_mutex;
+		xfs_rw_iunlock(ip, XFS_ILOCK_EXCL|iolock);
+		return ret;
 	}
 
 	if (ioflags & IO_ISDIRECT) {
@@ -654,16 +680,14 @@ start:
 				mp->m_rtdev_targp : mp->m_ddev_targp;
 
 		if ((pos & target->bt_smask) || (count & target->bt_smask)) {
-			xfs_iunlock(ip, XFS_ILOCK_EXCL|iolock);
+			xfs_rw_iunlock(ip, XFS_ILOCK_EXCL|iolock);
 			return XFS_ERROR(-EINVAL);
 		}
 
-		if (!need_i_mutex && (mapping->nrpages || pos > ip->i_size)) {
-			xfs_iunlock(ip, XFS_ILOCK_EXCL|iolock);
+		if (iolock != XFS_IOLOCK_EXCL &&
+		    (mapping->nrpages || pos > ip->i_size)) {
+			xfs_rw_iunlock(ip, XFS_ILOCK_EXCL|iolock);
 			iolock = XFS_IOLOCK_EXCL;
-			need_i_mutex = 1;
-			mutex_lock(&inode->i_mutex);
-			xfs_ilock(ip, XFS_ILOCK_EXCL|iolock);
 			goto start;
 		}
 	}
@@ -687,11 +711,11 @@ start:
 	if (pos > ip->i_size) {
 		ret = -xfs_zero_eof(ip, pos, ip->i_size);
 		if (ret) {
-			xfs_iunlock(ip, XFS_ILOCK_EXCL);
+			xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
 			goto out_unlock_internal;
 		}
 	}
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
 
 	/*
 	 * If we're writing the file then make sure to clear the
@@ -708,7 +732,7 @@ start:
 
 	if ((ioflags & IO_ISDIRECT)) {
 		if (mapping->nrpages) {
-			WARN_ON(need_i_mutex == 0);
+			WARN_ON(iolock != XFS_IOLOCK_EXCL);
 			ret = -xfs_flushinval_pages(ip,
 					(pos & PAGE_CACHE_MASK),
 					-1, FI_REMAPF_LOCKED);
@@ -716,13 +740,10 @@ start:
 				goto out_unlock_internal;
 		}
 
-		if (need_i_mutex) {
+		if (iolock == XFS_IOLOCK_EXCL) {
 			/* demote the lock now the cached pages are gone */
-			xfs_ilock_demote(ip, XFS_IOLOCK_EXCL);
-			mutex_unlock(&inode->i_mutex);
-
+			xfs_rw_ilock_demote(ip, XFS_IOLOCK_EXCL);
 			iolock = XFS_IOLOCK_SHARED;
-			need_i_mutex = 0;
 		}
 
 		trace_xfs_file_direct_write(ip, count, iocb->ki_pos, ioflags);
@@ -740,7 +761,7 @@ start:
 			count -= ret;
 
 			ioflags &= ~IO_ISDIRECT;
-			xfs_iunlock(ip, iolock);
+			xfs_rw_iunlock(ip, iolock);
 			goto relock;
 		}
 	} else {
@@ -779,14 +800,9 @@ write_retry:
 		loff_t end = pos + ret - 1;
 		int error, error2;
 
-		xfs_iunlock(ip, iolock);
-		if (need_i_mutex)
-			mutex_unlock(&inode->i_mutex);
-
+		xfs_rw_iunlock(ip, iolock);
 		error = filemap_write_and_wait_range(mapping, pos, end);
-		if (need_i_mutex)
-			mutex_lock(&inode->i_mutex);
-		xfs_ilock(ip, iolock);
+		xfs_rw_ilock(ip, iolock);
 
 		error2 = -xfs_file_fsync(file,
 					 (file->f_flags & __O_SYNC) ? 0 : 1);
@@ -798,10 +814,7 @@ write_retry:
 
  out_unlock_internal:
 	xfs_aio_write_newsize_update(ip);
-	xfs_iunlock(ip, iolock);
- out_unlock_mutex:
-	if (need_i_mutex)
-		mutex_unlock(&inode->i_mutex);
+	xfs_rw_iunlock(ip, iolock);
 	return ret;
 }
 
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 5/8] xfs: split direct IO write path from xfs_file_aio_write
  2011-01-04  4:48 [PATCH 0/8] xfs: prevent corruption due to overlapping AIO DIO V2 Dave Chinner
                   ` (2 preceding siblings ...)
  2011-01-04  4:48 ` [PATCH 4/8] xfs: introduce xfs_rw_lock() helpers for locking the inode Dave Chinner
@ 2011-01-04  4:48 ` Dave Chinner
  2011-01-05  1:54   ` Alex Elder
  2011-01-07  8:58   ` Christoph Hellwig
  2011-01-04  4:48 ` [PATCH 6/8] xfs: split buffered " Dave Chinner
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 27+ messages in thread
From: Dave Chinner @ 2011-01-04  4:48 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The current xfs_file_aio_write code is a mess of locking shenanigans
to handle the different locking requirements of buffered and direct
IO. Start to clean this up by disentangling the direct IO path from
the mess.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/linux-2.6/xfs_file.c |  168 ++++++++++++++++++++++++++++---------------
 1 files changed, 110 insertions(+), 58 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
index 0d6111e..d546953 100644
--- a/fs/xfs/linux-2.6/xfs_file.c
+++ b/fs/xfs/linux-2.6/xfs_file.c
@@ -619,6 +619,110 @@ out_lock:
 	return error;
 }
 
+/*
+ * xfs_file_dio_aio_write - handle direct IO writes
+ *
+ * Lock the inode appropriately to prepare for and issue a direct IO write.
+ * By spearating it from the buffered write path we remove all the tricky to
+ * follow locking changes and looping. This also clearly indicates that XFS
+ * does not fall back to buffered IO in the direct IO write path.
+ *
+ * Returns with locks held indicated by @iolock and errors indicated by
+ * negative return values.
+ */
+STATIC ssize_t
+xfs_file_dio_aio_write(
+	struct kiocb		*iocb,
+	const struct iovec	*iovp,
+	unsigned long		nr_segs,
+	loff_t			pos,
+	size_t			ocount,
+	int			*iolock)
+{
+	struct file		*file = iocb->ki_filp;
+	struct address_space	*mapping = file->f_mapping;
+	struct inode		*inode = mapping->host;
+	struct xfs_inode	*ip = XFS_I(inode);
+	struct xfs_mount	*mp = ip->i_mount;
+	ssize_t			ret = 0;
+	xfs_fsize_t		new_size;
+	size_t			count = ocount;
+	xfs_buftarg_t		*target = XFS_IS_REALTIME_INODE(ip) ?
+					mp->m_rtdev_targp : mp->m_ddev_targp;
+
+	*iolock = 0;
+	if ((pos & target->bt_smask) || (count & target->bt_smask))
+		return -XFS_ERROR(EINVAL);
+
+	if (mapping->nrpages || pos > ip->i_size)
+		*iolock = XFS_IOLOCK_EXCL;
+	else
+		*iolock = XFS_IOLOCK_SHARED;
+	xfs_rw_ilock(ip, XFS_ILOCK_EXCL|*iolock);
+
+	ret = generic_write_checks(file, &pos, &count,
+					S_ISBLK(inode->i_mode));
+	if (ret) {
+		xfs_rw_iunlock(ip, XFS_ILOCK_EXCL|*iolock);
+		*iolock = 0;
+		return ret;
+	}
+
+	new_size = pos + count;
+	if (new_size > ip->i_size)
+		ip->i_new_size = new_size;
+
+	if (likely(!(file->f_mode & FMODE_NOCMTIME)))
+		file_update_time(file);
+
+	/*
+	 * If the offset is beyond the size of the file, we have a couple of
+	 * things to do. First, if there is already space allocated we need to
+	 * either create holes or zero the disk or ...
+	 *
+	 * If there is a page where the previous size lands, we need to zero it
+	 * out up to the new size.
+	 */
+	if (pos > ip->i_size) {
+		ret = -xfs_zero_eof(ip, pos, ip->i_size);
+		if (ret) {
+			xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
+			return ret;
+		}
+	}
+	xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
+
+	/*
+	 * If we're writing the file then make sure to clear the setuid and
+	 * setgid bits if the process is not being run by root.  This keeps
+	 * people from modifying setuid and setgid binaries.
+	 */
+	ret = file_remove_suid(file);
+	if (unlikely(ret))
+		return ret;
+
+	if (mapping->nrpages) {
+		WARN_ON(*iolock != XFS_IOLOCK_EXCL);
+		ret = -xfs_flushinval_pages(ip, (pos & PAGE_CACHE_MASK), -1,
+							FI_REMAPF_LOCKED);
+		if (ret)
+			return ret;
+	}
+
+	if (*iolock == XFS_IOLOCK_EXCL) {
+		/* demote the lock now the cached pages are gone */
+		xfs_rw_ilock_demote(ip, XFS_IOLOCK_EXCL);
+		*iolock = XFS_IOLOCK_SHARED;
+	}
+
+	trace_xfs_file_direct_write(ip, count, iocb->ki_pos, 0);
+	ret = generic_file_direct_write(iocb, iovp,
+			&nr_segs, pos, &iocb->ki_pos, count, ocount);
+
+	/* No fallback to buffered IO on errors for XFS. */
+	return ret;
+}
+
 STATIC ssize_t
 xfs_file_aio_write(
 	struct kiocb		*iocb,
@@ -659,14 +763,13 @@ xfs_file_aio_write(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-relock:
 	if (ioflags & IO_ISDIRECT) {
-		iolock = XFS_IOLOCK_SHARED;
-	} else {
-		iolock = XFS_IOLOCK_EXCL;
+		ret = xfs_file_dio_aio_write(iocb, iovp, nr_segs, pos,
+						ocount, &iolock);
+		goto done_io;
 	}
 
-start:
+	iolock = XFS_IOLOCK_EXCL;
 	ret = generic_write_checks(file, &pos, &count,
 					S_ISBLK(inode->i_mode));
 	if (ret) {
@@ -674,24 +777,6 @@ start:
 		return ret;
 	}
 
-	if (ioflags & IO_ISDIRECT) {
-		xfs_buftarg_t	*target =
-			XFS_IS_REALTIME_INODE(ip) ?
-				mp->m_rtdev_targp : mp->m_ddev_targp;
-
-		if ((pos & target->bt_smask) || (count & target->bt_smask)) {
-			xfs_rw_iunlock(ip, XFS_ILOCK_EXCL|iolock);
-			return XFS_ERROR(-EINVAL);
-		}
-
-		if (iolock != XFS_IOLOCK_EXCL &&
-		    (mapping->nrpages || pos > ip->i_size)) {
-			xfs_rw_iunlock(ip, XFS_ILOCK_EXCL|iolock);
-			iolock = XFS_IOLOCK_EXCL;
-			goto start;
-		}
-	}
-
 	new_size = pos + count;
 	if (new_size > ip->i_size)
 		ip->i_new_size = new_size;
@@ -730,41 +815,7 @@ start:
 	/* We can write back this queue in page reclaim */
 	current->backing_dev_info = mapping->backing_dev_info;
 
-	if ((ioflags & IO_ISDIRECT)) {
-		if (mapping->nrpages) {
-			WARN_ON(iolock != XFS_IOLOCK_EXCL);
-			ret = -xfs_flushinval_pages(ip,
-					(pos & PAGE_CACHE_MASK),
-					-1, FI_REMAPF_LOCKED);
-			if (ret)
-				goto out_unlock_internal;
-		}
-
-		if (iolock == XFS_IOLOCK_EXCL) {
-			/* demote the lock now the cached pages are gone */
-			xfs_rw_ilock_demote(ip, XFS_IOLOCK_EXCL);
-			iolock = XFS_IOLOCK_SHARED;
-		}
-
-		trace_xfs_file_direct_write(ip, count, iocb->ki_pos, ioflags);
-		ret = generic_file_direct_write(iocb, iovp,
-				&nr_segs, pos, &iocb->ki_pos, count, ocount);
-
-		/*
-		 * direct-io write to a hole: fall through to buffered I/O
-		 * for completing the rest of the request.
-		 */
-		if (ret >= 0 && ret != count) {
-			XFS_STATS_ADD(xs_write_bytes, ret);
-
-			pos += ret;
-			count -= ret;
-
-			ioflags &= ~IO_ISDIRECT;
-			xfs_rw_iunlock(ip, iolock);
-			goto relock;
-		}
-	} else {
+	if (!(ioflags & IO_ISDIRECT)) {
 		int enospc = 0;
 
 write_retry:
@@ -788,6 +839,7 @@ write_retry:
 
 	current->backing_dev_info = NULL;
 
+done_io:
 	xfs_aio_write_isize_update(inode, &iocb->ki_pos, ret);
 
 	if (ret <= 0)
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 6/8] xfs: split buffered IO write path from xfs_file_aio_write
  2011-01-04  4:48 [PATCH 0/8] xfs: prevent corruption due to overlapping AIO DIO V2 Dave Chinner
                   ` (3 preceding siblings ...)
  2011-01-04  4:48 ` [PATCH 5/8] xfs: split direct IO write path from xfs_file_aio_write Dave Chinner
@ 2011-01-04  4:48 ` Dave Chinner
  2011-01-05  1:55   ` Alex Elder
  2011-01-04  4:48 ` [PATCH 7/8] xfs: factor common write setup code Dave Chinner
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Dave Chinner @ 2011-01-04  4:48 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Complete the split of the different write IO paths by splitting the
buffered IO write path out of xfs_file_aio_write(). This makes the
different mechanisms of the write patchs easier to follow.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/linux-2.6/xfs_file.c |  146 ++++++++++++++++++++-----------------------
 1 files changed, 69 insertions(+), 77 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
index d546953..7e7805c 100644
--- a/fs/xfs/linux-2.6/xfs_file.c
+++ b/fs/xfs/linux-2.6/xfs_file.c
@@ -724,56 +724,31 @@ xfs_file_dio_aio_write(
 }
 
 STATIC ssize_t
-xfs_file_aio_write(
+xfs_file_buffered_aio_write(
 	struct kiocb		*iocb,
 	const struct iovec	*iovp,
 	unsigned long		nr_segs,
-	loff_t			pos)
+	loff_t			pos,
+	size_t			ocount,
+	int			*iolock)
 {
 	struct file		*file = iocb->ki_filp;
 	struct address_space	*mapping = file->f_mapping;
 	struct inode		*inode = mapping->host;
 	struct xfs_inode	*ip = XFS_I(inode);
-	struct xfs_mount	*mp = ip->i_mount;
-	ssize_t			ret = 0;
-	int			ioflags = 0;
+	ssize_t			ret;
+	int			enospc = 0;
 	xfs_fsize_t		new_size;
-	int			iolock;
-	size_t			ocount = 0, count;
-
-	XFS_STATS_INC(xs_write_calls);
-
-	BUG_ON(iocb->ki_pos != pos);
-
-	if (unlikely(file->f_flags & O_DIRECT))
-		ioflags |= IO_ISDIRECT;
-	if (file->f_mode & FMODE_NOCMTIME)
-		ioflags |= IO_INVIS;
-
-	ret = generic_segment_checks(iovp, &nr_segs, &ocount, VERIFY_READ);
-	if (ret)
-		return ret;
-
-	count = ocount;
-	if (count == 0)
-		return 0;
-
-	xfs_wait_for_freeze(mp, SB_FREEZE_WRITE);
-
-	if (XFS_FORCED_SHUTDOWN(mp))
-		return -EIO;
+	size_t			count = ocount;
 
-	if (ioflags & IO_ISDIRECT) {
-		ret = xfs_file_dio_aio_write(iocb, iovp, nr_segs, pos,
-						ocount, &iolock);
-		goto done_io;
-	}
+	*iolock = XFS_IOLOCK_EXCL;
+	xfs_rw_ilock(ip, XFS_ILOCK_EXCL|*iolock);
 
-	iolock = XFS_IOLOCK_EXCL;
 	ret = generic_write_checks(file, &pos, &count,
 					S_ISBLK(inode->i_mode));
 	if (ret) {
-		xfs_rw_iunlock(ip, XFS_ILOCK_EXCL|iolock);
+		xfs_rw_iunlock(ip, XFS_ILOCK_EXCL|*iolock);
+		*iolock = 0;
 		return ret;
 	}
 
@@ -781,69 +756,86 @@ xfs_file_aio_write(
 	if (new_size > ip->i_size)
 		ip->i_new_size = new_size;
 
-	if (likely(!(ioflags & IO_INVIS)))
+	if (likely(!(file->f_mode & FMODE_NOCMTIME)))
 		file_update_time(file);
 
-	/*
-	 * If the offset is beyond the size of the file, we have a couple
-	 * of things to do. First, if there is already space allocated
-	 * we need to either create holes or zero the disk or ...
-	 *
-	 * If there is a page where the previous size lands, we need
-	 * to zero it out up to the new size.
-	 */
-
 	if (pos > ip->i_size) {
 		ret = -xfs_zero_eof(ip, pos, ip->i_size);
 		if (ret) {
 			xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
-			goto out_unlock_internal;
+			return ret;
 		}
 	}
 	xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
 
-	/*
-	 * If we're writing the file then make sure to clear the
-	 * setuid and setgid bits if the process is not being run
-	 * by root.  This keeps people from modifying setuid and
-	 * setgid binaries.
-	 */
 	ret = file_remove_suid(file);
 	if (unlikely(ret))
-		goto out_unlock_internal;
+		return ret;
 
 	/* We can write back this queue in page reclaim */
 	current->backing_dev_info = mapping->backing_dev_info;
 
-	if (!(ioflags & IO_ISDIRECT)) {
-		int enospc = 0;
-
 write_retry:
-		trace_xfs_file_buffered_write(ip, count, iocb->ki_pos, ioflags);
-		ret = generic_file_buffered_write(iocb, iovp, nr_segs,
-				pos, &iocb->ki_pos, count, ret);
-		/*
-		 * if we just got an ENOSPC, flush the inode now we
-		 * aren't holding any page locks and retry *once*
-		 */
-		if (ret == -ENOSPC && !enospc) {
-			ssize_t ret2 = xfs_flush_pages(ip, 0, -1, 0, FI_NONE);
-			if (ret2) {
-				ret = ret2;
-				goto out_unlock_internal;
-			}
-			enospc = 1;
-			goto write_retry;
-		}
+	trace_xfs_file_buffered_write(ip, count, iocb->ki_pos, 0);
+	ret = generic_file_buffered_write(iocb, iovp, nr_segs,
+			pos, &iocb->ki_pos, count, ret);
+	/*
+	 * if we just got an ENOSPC, flush the inode now we aren't holding any
+	 * page locks and retry *once*
+	 */
+	if (ret == -ENOSPC && !enospc) {
+		ret = -xfs_flush_pages(ip, 0, -1, 0, FI_NONE);
+		if (ret)
+			return ret;
+		enospc = 1;
+		goto write_retry;
 	}
-
 	current->backing_dev_info = NULL;
+	return ret;
+}
+
+STATIC ssize_t
+xfs_file_aio_write(
+	struct kiocb		*iocb,
+	const struct iovec	*iovp,
+	unsigned long		nr_segs,
+	loff_t			pos)
+{
+	struct file		*file = iocb->ki_filp;
+	struct address_space	*mapping = file->f_mapping;
+	struct inode		*inode = mapping->host;
+	struct xfs_inode	*ip = XFS_I(inode);
+	ssize_t			ret;
+	int			iolock;
+	size_t			ocount = 0;
+
+	XFS_STATS_INC(xs_write_calls);
+
+	BUG_ON(iocb->ki_pos != pos);
+
+	ret = generic_segment_checks(iovp, &nr_segs, &ocount, VERIFY_READ);
+	if (ret)
+		return ret;
+
+	if (ocount == 0)
+		return 0;
+
+	xfs_wait_for_freeze(ip->i_mount, SB_FREEZE_WRITE);
+
+	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
+		return -EIO;
+
+	if (unlikely(file->f_flags & O_DIRECT))
+		ret = xfs_file_dio_aio_write(iocb, iovp, nr_segs, pos,
+						ocount, &iolock);
+	else
+		ret = xfs_file_buffered_aio_write(iocb, iovp, nr_segs, pos,
+						ocount, &iolock);
 
-done_io:
 	xfs_aio_write_isize_update(inode, &iocb->ki_pos, ret);
 
 	if (ret <= 0)
-		goto out_unlock_internal;
+		goto out_unlock;
 
 	XFS_STATS_ADD(xs_write_bytes, ret);
 
@@ -864,7 +856,7 @@ done_io:
 			ret = error2;
 	}
 
- out_unlock_internal:
+out_unlock:
 	xfs_aio_write_newsize_update(ip);
 	xfs_rw_iunlock(ip, iolock);
 	return ret;
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 7/8] xfs: factor common write setup code
  2011-01-04  4:48 [PATCH 0/8] xfs: prevent corruption due to overlapping AIO DIO V2 Dave Chinner
                   ` (4 preceding siblings ...)
  2011-01-04  4:48 ` [PATCH 6/8] xfs: split buffered " Dave Chinner
@ 2011-01-04  4:48 ` Dave Chinner
  2011-01-05  1:55   ` Alex Elder
  2011-01-07  8:53   ` Christoph Hellwig
  2011-01-04  4:48 ` [PATCH 8/8] xfs: serialise unaligned direct IOs Dave Chinner
  2011-01-05  1:53 ` [PATCH 0/8] xfs: prevent corruption due to overlapping AIO DIO V2 Alex Elder
  7 siblings, 2 replies; 27+ messages in thread
From: Dave Chinner @ 2011-01-04  4:48 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The buffered IO and direct Io write paths share a common set of
checks and limiting code prior to issuing the write. Factor that
into a common helper function.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/linux-2.6/xfs_file.c |  129 ++++++++++++++++++++-----------------------
 1 files changed, 59 insertions(+), 70 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
index 7e7805c..f21b6ad 100644
--- a/fs/xfs/linux-2.6/xfs_file.c
+++ b/fs/xfs/linux-2.6/xfs_file.c
@@ -620,6 +620,61 @@ out_lock:
 }
 
 /*
+ * Common pre-write limit and setup checks.
+ *
+ * Returns with iolock held according to @iolock.
+ */
+STATIC ssize_t
+xfs_file_aio_write_checks(
+	struct file		*file,
+	loff_t			*pos,
+	size_t			*count,
+	int			*iolock)
+{
+	struct inode		*inode = file->f_mapping->host;
+	struct xfs_inode	*ip = XFS_I(inode);
+	xfs_fsize_t		new_size;
+	int			error = 0;
+
+	xfs_rw_ilock(ip, XFS_ILOCK_EXCL|*iolock);
+
+	error = generic_write_checks(file, pos, count, S_ISBLK(inode->i_mode));
+	if (error) {
+		xfs_rw_iunlock(ip, XFS_ILOCK_EXCL|*iolock);
+		*iolock = 0;
+		return error;
+	}
+
+	new_size = *pos + *count;
+	if (new_size > ip->i_size)
+		ip->i_new_size = new_size;
+
+	if (likely(!(file->f_mode & FMODE_NOCMTIME)))
+		file_update_time(file);
+
+	/*
+	 * If the offset is beyond the size of the file, we need to zero any
+	 * blocks that fall between the existing EOF and the start of this
+	 * write.
+	 */
+	if (*pos > ip->i_size)
+		error = -xfs_zero_eof(ip, *pos, ip->i_size);
+
+	xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
+	if (error)
+		return error;
+
+	/*
+	 * If we're writing the file then make sure to clear the setuid and
+	 * setgid bits if the process is not being run by root.  This keeps
+	 * people from modifying setuid and setgid binaries.
+	 */
+	error = file_remove_suid(file);
+	return error;
+
+}
+
+/*
  * xfs_file_dio_aio_write - handle direct IO writes
  *
  * Lock the inode appropriately to prepare for and issue a direct IO write.
@@ -645,7 +700,6 @@ xfs_file_dio_aio_write(
 	struct xfs_inode	*ip = XFS_I(inode);
 	struct xfs_mount	*mp = ip->i_mount;
 	ssize_t			ret = 0;
-	xfs_fsize_t		new_size;
 	size_t			count = ocount;
 	xfs_buftarg_t		*target = XFS_IS_REALTIME_INODE(ip) ?
 					mp->m_rtdev_targp : mp->m_ddev_targp;
@@ -658,47 +712,9 @@ xfs_file_dio_aio_write(
 		*iolock = XFS_IOLOCK_EXCL;
 	else
 		*iolock = XFS_IOLOCK_SHARED;
-	xfs_rw_ilock(ip, XFS_ILOCK_EXCL|*iolock);
 
-	ret = generic_write_checks(file, &pos, &count,
-					S_ISBLK(inode->i_mode));
-	if (ret) {
-		xfs_rw_iunlock(ip, XFS_ILOCK_EXCL|*iolock);
-		*iolock = 0;
-		return ret;
-	}
-
-	new_size = pos + count;
-	if (new_size > ip->i_size)
-		ip->i_new_size = new_size;
-
-	if (likely(!(file->f_mode & FMODE_NOCMTIME)))
-		file_update_time(file);
-
-	/*
-	 * If the offset is beyond the size of the file, we have a couple of
-	 * things to do. First, if there is already space allocated we need to
-	 * either create holes or zero the disk or ...
-	 *
-	 * If there is a page where the previous size lands, we need to zero it
-	 * out up to the new size.
-	 */
-	if (pos > ip->i_size) {
-		ret = -xfs_zero_eof(ip, pos, ip->i_size);
-		if (ret) {
-			xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
-			return ret;
-		}
-	}
-	xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
-
-	/*
-	 * If we're writing the file then make sure to clear the setuid and
-	 * setgid bits if the process is not being run by root.  This keeps
-	 * people from modifying setuid and setgid binaries.
-	 */
-	ret = file_remove_suid(file);
-	if (unlikely(ret))
+	ret = xfs_file_aio_write_checks(file, &pos, &count, iolock);
+	if (ret)
 		return ret;
 
 	if (mapping->nrpages) {
@@ -738,38 +754,11 @@ xfs_file_buffered_aio_write(
 	struct xfs_inode	*ip = XFS_I(inode);
 	ssize_t			ret;
 	int			enospc = 0;
-	xfs_fsize_t		new_size;
 	size_t			count = ocount;
 
 	*iolock = XFS_IOLOCK_EXCL;
-	xfs_rw_ilock(ip, XFS_ILOCK_EXCL|*iolock);
-
-	ret = generic_write_checks(file, &pos, &count,
-					S_ISBLK(inode->i_mode));
-	if (ret) {
-		xfs_rw_iunlock(ip, XFS_ILOCK_EXCL|*iolock);
-		*iolock = 0;
-		return ret;
-	}
-
-	new_size = pos + count;
-	if (new_size > ip->i_size)
-		ip->i_new_size = new_size;
-
-	if (likely(!(file->f_mode & FMODE_NOCMTIME)))
-		file_update_time(file);
-
-	if (pos > ip->i_size) {
-		ret = -xfs_zero_eof(ip, pos, ip->i_size);
-		if (ret) {
-			xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
-			return ret;
-		}
-	}
-	xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
-
-	ret = file_remove_suid(file);
-	if (unlikely(ret))
+	ret = xfs_file_aio_write_checks(file, &pos, &count, iolock);
+	if (ret)
 		return ret;
 
 	/* We can write back this queue in page reclaim */
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 8/8] xfs: serialise unaligned direct IOs
  2011-01-04  4:48 [PATCH 0/8] xfs: prevent corruption due to overlapping AIO DIO V2 Dave Chinner
                   ` (5 preceding siblings ...)
  2011-01-04  4:48 ` [PATCH 7/8] xfs: factor common write setup code Dave Chinner
@ 2011-01-04  4:48 ` Dave Chinner
  2011-01-05  1:55   ` Alex Elder
  2011-01-05  1:53 ` [PATCH 0/8] xfs: prevent corruption due to overlapping AIO DIO V2 Alex Elder
  7 siblings, 1 reply; 27+ messages in thread
From: Dave Chinner @ 2011-01-04  4:48 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When two concurrent unaligned, non-overlapping direct IOs are issued
to the same block, the direct Io layer will race to zero the block.
The result is that one of the concurrent IOs will overwrite data
written by the other IO with zeros. This is demonstrated by the
xfsqa test 240.

To avoid this problem, serialise all unaligned direct IOs to an
inode with a big hammer. We need a big hammer approach as we need to
serialise AIO as well, so we can't just block writes on locks.
Hence, the big hammer is calling xfs_ioend_wait() while holding out
other unaligned direct IOs from starting.

We don't bother trying to serialised aligned vs unaligned IOs as
they are overlapping IO and the result of concurrent overlapping IOs
is undefined - the result of either IO is a valid result so we let
them race. Hence we only penalise unaligned IO, which already has a
major overhead compared to aligned IO so this isn't a major problem.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/linux-2.6/xfs_file.c |   28 ++++++++++++++++++++++++----
 1 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
index f21b6ad..a91e1a8 100644
--- a/fs/xfs/linux-2.6/xfs_file.c
+++ b/fs/xfs/linux-2.6/xfs_file.c
@@ -678,10 +678,21 @@ xfs_file_aio_write_checks(
  * xfs_file_dio_aio_write - handle direct IO writes
  *
  * Lock the inode appropriately to prepare for and issue a direct IO write.
- * By spearating it from the buffered write path we remove all the tricky to
+ * By separating it from the buffered write path we remove all the tricky to
  * follow locking changes and looping. This also clearly indicates that XFS
  * does not fall back to buffered IO in the direct IO write path.
  *
+ * In most cases the direct IO writes will be done with IOLOCK_SHARED allowing
+ * them to be done in parallel with reads and other direct IO writes. However,
+ * if the IO is not aligned to filesystem blocks, the direct IO layer needs to
+ * do sub-block zeroing and that requires serialisation against other direct
+ * IOs to the same block. In this case we need to serialise the submission of
+ * the unaligned IOs so that we don't get racing block zeroing in the dio layer.
+ * To avoid the problem with aio, we also need to wait for outstanding IOs to
+ * complete so that unwritten extent conversion is completed before we try to
+ * map the overlapping block. This is currently implemented by hitting it
+ * with a big hammer (i.e. xfs_ioend_wait()).
+ *
  * Returns with locks held indicated by @iolock and errors indicated by
  * negative return values.
  */
@@ -701,6 +712,7 @@ xfs_file_dio_aio_write(
 	struct xfs_mount	*mp = ip->i_mount;
 	ssize_t			ret = 0;
 	size_t			count = ocount;
+	int			unaligned_io = 0;
 	xfs_buftarg_t		*target = XFS_IS_REALTIME_INODE(ip) ?
 					mp->m_rtdev_targp : mp->m_ddev_targp;
 
@@ -708,7 +720,10 @@ xfs_file_dio_aio_write(
 	if ((pos & target->bt_smask) || (count & target->bt_smask))
 		return -XFS_ERROR(EINVAL);
 
-	if (mapping->nrpages || pos > ip->i_size)
+	if ((pos & mp->m_blockmask) || ((pos + count) & mp->m_blockmask))
+		unaligned_io = 1;
+
+	if (unaligned_io || mapping->nrpages || pos > ip->i_size)
 		*iolock = XFS_IOLOCK_EXCL;
 	else
 		*iolock = XFS_IOLOCK_SHARED;
@@ -725,8 +740,13 @@ xfs_file_dio_aio_write(
 			return ret;
 	}
 
-	if (*iolock == XFS_IOLOCK_EXCL) {
-		/* demote the lock now the cached pages are gone */
+	/*
+	 * If we are doing unaligned IO, wait for all other IO to drain,
+	 * otherwise demote the lock if we had to flush cached pages
+	 */
+	if (unaligned_io)
+		xfs_ioend_wait(ip);
+	else if (*iolock == XFS_IOLOCK_EXCL) {
 		xfs_rw_ilock_demote(ip, XFS_IOLOCK_EXCL);
 		*iolock = XFS_IOLOCK_SHARED;
 	}
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 0/8] xfs: prevent corruption due to overlapping AIO DIO V2
  2011-01-04  4:48 [PATCH 0/8] xfs: prevent corruption due to overlapping AIO DIO V2 Dave Chinner
                   ` (6 preceding siblings ...)
  2011-01-04  4:48 ` [PATCH 8/8] xfs: serialise unaligned direct IOs Dave Chinner
@ 2011-01-05  1:53 ` Alex Elder
  7 siblings, 0 replies; 27+ messages in thread
From: Alex Elder @ 2011-01-05  1:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, 2011-01-04 at 15:48 +1100, Dave Chinner wrote:
> This series aims to serialise unaligned direct IOs to an inode to
> avoid corruption caused by sub-block zeroing races. The previous
> approaches at the direct IO layer fail because for !DIO_LOCKING
> filesystems like XFS, there is no way we can track and serialise all
> the direct IOs to a given inode in a race free manner. While we can
> track them, we cannot close the races between mapping blocks and
> tracked IO completion occuring before subsequent tracking lookups
> without adding some kind of locking to the DIO layer. Hence for
> !DIO_LOCKING users, unaligned direct IO needs to be serialised at a
> higher layer.
> 
> Because the xfs_file_aio_write() path is so twisted and difficult to
> follow, adding new locking cases to the code is difficult to verify
> that it is correct in all cases. Hence the series starts by cleaning
> up the code and splitting apart the direct IO and buffered IO paths
> before adding the unaligned direct IO detection and serialisation.
> 
> The first patch fixes a sync write error handling bug - we should
> consider pushing that to .38. The next patches factor code that is
> common to write and splice into helpers. The direct and buffered IO
> paths are then separated out and the common write checks and bounds
> limiting is factored out into a helper.
> 
> Finally, the serialisation of unaligned direct IOs is added by a
> big-hammer approach. That is, we take the i_mutex and
> XFS_IOLOCK_EXCL and hold them across the unaligned IO submission.
> This means that unaligned direct IO submission is serialised, and
> non-AIO DIO is serialised completely.
> 
> For unaligned AIO DIO, this would only serialise the submission of
> the DIO, leaving the sub-block zeroing races open for unaligned
> writes into unwritten extents. To avoid this problem, we use
> xfs_ioend_wait() to ensure all AIO writes have completed before we
> submit the unaligned write. We do this wait holding the i_mutex so
> we serialise against other unaligned AIO as there is no need to
> serialise against aligned DIO.
> 
> Version 2:
> - fix initial sync write error return fixup
> - add new patch to abstract locking from read/write path and remove
>   the need for the need_i_mutex variable.
> 

I've reviewed this series, and the net result (the
code cleanup leading to the real fix in particular)
is great.

There are some small things among the patches that
ought to be fixed in order to allow them to stand
alone without bugs (to facilitate git bisects, for
example).  I've noted them in my reviews.

I also think that, having looked over them entirely,
a few of my earlier comments get addressed or made
non-applicable because of your changes.  I'm not
going to go fix them now; if you see things like
that you can explain if you feel it's useful or
just mention that the series renders my comment
inoperative or something.

					-Alex



_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/8] xfs: ensure sync write errors are returned
  2011-01-04  4:48 ` [PATCH 1/8] xfs: ensure sync write errors are returned Dave Chinner
@ 2011-01-05  1:53   ` Alex Elder
  2011-01-07  8:45   ` Christoph Hellwig
  1 sibling, 0 replies; 27+ messages in thread
From: Alex Elder @ 2011-01-05  1:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, 2011-01-04 at 15:48 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> xfs_file_aio_write() only returns the error from synchronous
> flushing of the data and inode if error == 0. At the point where
> error is being checked, it is guaranteed to be > 0. Therefore any
> errors returned by the data or fsync flush will never be returned.
> Fix the checks so we overwrite the current error once and only if an
> error really occurred.
> 

This looks good.  You improved it a lot by
fixing up the bizarre "error" logic.

Reviewed-by: Alex Elder <aelder@sgi.com>

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/linux-2.6/xfs_file.c |   51 +++++++++++++++++++++----------------------
>  1 files changed, 25 insertions(+), 26 deletions(-)



_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/8] xfs: factor common post-write isize handling code
  2011-01-04  4:48 ` [PATCH 2/8] xfs: factor common post-write isize handling code Dave Chinner
@ 2011-01-05  1:54   ` Alex Elder
  0 siblings, 0 replies; 27+ messages in thread
From: Alex Elder @ 2011-01-05  1:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, 2011-01-04 at 15:48 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 

With one exception noted below, this looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/linux-2.6/xfs_file.c |   52 +++++++++++++++++++++++-------------------
>  1 files changed, 28 insertions(+), 24 deletions(-)

. . .

> @@ -742,16 +755,7 @@ write_retry:
>  
>  	current->backing_dev_info = NULL;
>  
> -	isize = i_size_read(inode);
> -	if (unlikely(ret < 0 && ret != -EFAULT && iocb->ki_pos > isize))
> -		iocb->ki_pos = isize;
> -
> -	if (iocb->ki_pos > ip->i_size) {
> -		xfs_ilock(ip, XFS_ILOCK_EXCL);
> -		if (iocb->ki_pos > ip->i_size)
> -			ip->i_size = iocb->ki_pos;
> -		xfs_iunlock(ip, XFS_ILOCK_EXCL);
> -	}
> +	xfs_aio_write_isize_update(inode, &iocb->ki_pos, ret);
>  
>  	if (ret <= 0)
>  		goto out_unlock_internal;

Just below this point in the original code there is a
call to XFS_STATS_ADD(xs_write_bytes, ret); that also
needs to be removed, because xfs_aio_write_isize_update()
now does it for you.


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/8] xfs: introduce xfs_rw_lock() helpers for locking the inode
  2011-01-04  4:48 ` [PATCH 4/8] xfs: introduce xfs_rw_lock() helpers for locking the inode Dave Chinner
@ 2011-01-05  1:54   ` Alex Elder
  2011-01-05  7:55     ` Dave Chinner
  0 siblings, 1 reply; 27+ messages in thread
From: Alex Elder @ 2011-01-05  1:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, 2011-01-04 at 15:48 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We need to obtain the i_mutex, i_iolock and i_ilock during the read
> and write paths. Add a set of wrapper functions to neatly
> encapsulate the lock ordering and shared/exclusive semantics to make
> the locking easier to follow and get right.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

I like this change, but I think you missed a lock call.
I also notice there are some locking differences, and
I don't really question them but I wonder if you can
offer a little more explanation.

> ---
>  fs/xfs/linux-2.6/xfs_file.c |  123 ++++++++++++++++++++++++-------------------
>  1 files changed, 68 insertions(+), 55 deletions(-)
> 
> diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
> index 33a688c..0d6111e 100644
> --- a/fs/xfs/linux-2.6/xfs_file.c
> +++ b/fs/xfs/linux-2.6/xfs_file.c

. . .

> @@ -262,22 +296,21 @@ xfs_file_aio_read(
>  	if (XFS_FORCED_SHUTDOWN(mp))
>  		return -EIO;
>  
> -	if (unlikely(ioflags & IO_ISDIRECT))
> -		mutex_lock(&inode->i_mutex);
> -	xfs_ilock(ip, XFS_IOLOCK_SHARED);
> -
>  	if (unlikely(ioflags & IO_ISDIRECT)) {
> +		xfs_rw_ilock(ip, XFS_IOLOCK_EXCL);
> +

Previously only XFS_IOLOCK_SHARED was used here.
I understand that using the IOLOCK_EXCL now gets
the desired mutex_lock() call.  Is the previous
code in error here though?  Can you anticipate
any different behavior because of this lock change?
Does this specific change justify separating it
into a small patch just before this one?

>  		if (inode->i_mapping->nrpages) {
>  			ret = -xfs_flushinval_pages(ip,
>  					(iocb->ki_pos & PAGE_CACHE_MASK),
>  					-1, FI_REMAPF_LOCKED);
> +			if (ret) {
> +				xfs_rw_iunlock(ip, XFS_IOLOCK_EXCL);
> +				return ret;
> +			}
>  		}
> -		mutex_unlock(&inode->i_mutex);
> -		if (ret) {
> -			xfs_iunlock(ip, XFS_IOLOCK_SHARED);
> -			return ret;
> -		}
> -	}
> +		xfs_rw_ilock_demote(ip, XFS_IOLOCK_EXCL);
> +	} else
> +		xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);
>  
>  	trace_xfs_file_read(ip, size, iocb->ki_pos, ioflags);
>  

. . .

> @@ -386,14 +419,13 @@ xfs_file_splice_write(
>  	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
>  		return -EIO;
>  
> -	xfs_ilock(ip, XFS_IOLOCK_EXCL);
> +	xfs_rw_ilock(ip, XFS_ILOCK_EXCL|XFS_IOLOCK_EXCL);

Similar sentiments here.  We will now be acquiring i_mutex
here where previously we did not.  Is that OK?

>  	new_size = *ppos + count;
>  
> -	xfs_ilock(ip, XFS_ILOCK_EXCL);
>  	if (new_size > ip->i_size)
>  		ip->i_new_size = new_size;
> -	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +	xfs_rw_iunlock(ip, XFS_ILOCK_EXCL);
>  
>  	trace_xfs_file_splice_write(ip, count, *ppos, ioflags);
>  

. . .

> @@ -631,21 +662,16 @@ xfs_file_aio_write(
>  relock:
>  	if (ioflags & IO_ISDIRECT) {
>  		iolock = XFS_IOLOCK_SHARED;
> -		need_i_mutex = 0;
>  	} else {
>  		iolock = XFS_IOLOCK_EXCL;
> -		need_i_mutex = 1;
> -		mutex_lock(&inode->i_mutex);
>  	}
>  
> -	xfs_ilock(ip, XFS_ILOCK_EXCL|iolock);
> -

Maybe I'm missing something, but I think you want to
insert this here:
	xfs_rw_ilock(ip, XFS_ILOCK_EXCL|iolock);
...because (for starters) if generic_write_checks()
returns an error below you're going to be calling
the unlock routine.

>  start:
>  	ret = generic_write_checks(file, &pos, &count,
>  					S_ISBLK(inode->i_mode));
>  	if (ret) {
> -		xfs_iunlock(ip, XFS_ILOCK_EXCL|iolock);
> -		goto out_unlock_mutex;
> +		xfs_rw_iunlock(ip, XFS_ILOCK_EXCL|iolock);
> +		return ret;
>  	}
>  
>  	if (ioflags & IO_ISDIRECT) {
> @@ -654,16 +680,14 @@ start:
>  				mp->m_rtdev_targp : mp->m_ddev_targp;
>  
>  		if ((pos & target->bt_smask) || (count & target->bt_smask)) {
> -			xfs_iunlock(ip, XFS_ILOCK_EXCL|iolock);
> +			xfs_rw_iunlock(ip, XFS_ILOCK_EXCL|iolock);
>  			return XFS_ERROR(-EINVAL);
>  		}
>  

One can get a little lost in this code.  I don't know if
this comment is exactly right, but something like it might
be helpful (while you're in here).

		/*
		 * For direct I/O, if there are cached pages or
		 * we're extending the file, we need IOLOCK_EXCL
		 * until we're sure the bytes at the new EOF have
		 * been zeroed and/or the cached pages are flushed
		 * out.  Upgrade the I/O lock and start again.
		 */

> -		if (!need_i_mutex && (mapping->nrpages || pos > ip->i_size)) {
> -			xfs_iunlock(ip, XFS_ILOCK_EXCL|iolock);
> +		if (iolock != XFS_IOLOCK_EXCL &&
> +		    (mapping->nrpages || pos > ip->i_size)) {
> +			xfs_rw_iunlock(ip, XFS_ILOCK_EXCL|iolock);
>  			iolock = XFS_IOLOCK_EXCL;
> -			need_i_mutex = 1;
> -			mutex_lock(&inode->i_mutex);
> -			xfs_ilock(ip, XFS_ILOCK_EXCL|iolock);
>  			goto start;
>  		}
>  	}

. . .




_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/8] xfs: split direct IO write path from xfs_file_aio_write
  2011-01-04  4:48 ` [PATCH 5/8] xfs: split direct IO write path from xfs_file_aio_write Dave Chinner
@ 2011-01-05  1:54   ` Alex Elder
  2011-01-05  7:36     ` Dave Chinner
  2011-01-07  8:58   ` Christoph Hellwig
  1 sibling, 1 reply; 27+ messages in thread
From: Alex Elder @ 2011-01-05  1:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, 2011-01-04 at 15:48 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The current xfs_file_aio_write code is a mess of locking shenanigans
> to handle the different locking requirements of buffered and direct
> IO. Start to clean this up by disentangling the direct IO path from
> the mess.

All good, very good.  But I'm not sure why you cut
out the code that backed off to buffered I/O if
generic_file_direct_write() returns an error.
(You gave no explanation.)

					-Alex

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/linux-2.6/xfs_file.c |  168 ++++++++++++++++++++++++++++---------------
>  1 files changed, 110 insertions(+), 58 deletions(-)
> 
> diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
> index 0d6111e..d546953 100644
> --- a/fs/xfs/linux-2.6/xfs_file.c
> +++ b/fs/xfs/linux-2.6/xfs_file.c
> @@ -619,6 +619,110 @@ out_lock:
>  	return error;
>  }
>  
> +/*
> + * xfs_file_dio_aio_write - handle direct IO writes
> + *
> + * Lock the inode appropriately to prepare for and issue a direct IO write.
> + * By spearating it from the buffered write path we remove all the tricky to
         separating

> + * follow locking changes and looping. This also clearly indicates that XFS
> + * does not fall back to buffered IO in the direct IO write path.
> + *
> + * Returns with locks held indicated by @iolock and errors indicated by
> + * negative return values.
> + */
> +STATIC ssize_t
> +xfs_file_dio_aio_write(
> +	struct kiocb		*iocb,
> +	const struct iovec	*iovp,

. . .

> +	trace_xfs_file_direct_write(ip, count, iocb->ki_pos, 0);
> +	ret = generic_file_direct_write(iocb, iovp,
> +			&nr_segs, pos, &iocb->ki_pos, count, ocount);
> +
> +	/* No fallback to buffered IO on errors for XFS. */

Why is this?  The previous code did fall back (so this change
is doing more than just splitting out the direct I/O path).

> +	return ret;
> +}
> +
>  STATIC ssize_t
>  xfs_file_aio_write(
>  	struct kiocb		*iocb,

. . .

> @@ -788,6 +839,7 @@ write_retry:
>  
>  	current->backing_dev_info = NULL;
>  
> +done_io:
>  	xfs_aio_write_isize_update(inode, &iocb->ki_pos, ret);
>  
>  	if (ret <= 0)



_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/8] xfs: split buffered IO write path from xfs_file_aio_write
  2011-01-04  4:48 ` [PATCH 6/8] xfs: split buffered " Dave Chinner
@ 2011-01-05  1:55   ` Alex Elder
  0 siblings, 0 replies; 27+ messages in thread
From: Alex Elder @ 2011-01-05  1:55 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, 2011-01-04 at 15:48 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Complete the split of the different write IO paths by splitting the
> buffered IO write path out of xfs_file_aio_write(). This makes the
> different mechanisms of the write patchs easier to follow.

Looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/linux-2.6/xfs_file.c |  146 ++++++++++++++++++++-----------------------
>  1 files changed, 69 insertions(+), 77 deletions(-)
> 
> diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
> index d546953..7e7805c 100644
> --- a/fs/xfs/linux-2.6/xfs_file.c
> +++ b/fs/xfs/linux-2.6/xfs_file.c


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 7/8] xfs: factor common write setup code
  2011-01-04  4:48 ` [PATCH 7/8] xfs: factor common write setup code Dave Chinner
@ 2011-01-05  1:55   ` Alex Elder
  2011-01-07  8:53   ` Christoph Hellwig
  1 sibling, 0 replies; 27+ messages in thread
From: Alex Elder @ 2011-01-05  1:55 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, 2011-01-04 at 15:48 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The buffered IO and direct Io write paths share a common set of
> checks and limiting code prior to issuing the write. Factor that
> into a common helper function.

Looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/linux-2.6/xfs_file.c |  129 ++++++++++++++++++++-----------------------
>  1 files changed, 59 insertions(+), 70 deletions(-)
> 
> diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
> index 7e7805c..f21b6ad 100644
> --- a/fs/xfs/linux-2.6/xfs_file.c

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 8/8] xfs: serialise unaligned direct IOs
  2011-01-04  4:48 ` [PATCH 8/8] xfs: serialise unaligned direct IOs Dave Chinner
@ 2011-01-05  1:55   ` Alex Elder
  0 siblings, 0 replies; 27+ messages in thread
From: Alex Elder @ 2011-01-05  1:55 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, 2011-01-04 at 15:48 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When two concurrent unaligned, non-overlapping direct IOs are issued
> to the same block, the direct Io layer will race to zero the block.
> The result is that one of the concurrent IOs will overwrite data
> written by the other IO with zeros. This is demonstrated by the
> xfsqa test 240.
> 
> To avoid this problem, serialise all unaligned direct IOs to an
> inode with a big hammer. We need a big hammer approach as we need to
> serialise AIO as well, so we can't just block writes on locks.
> Hence, the big hammer is calling xfs_ioend_wait() while holding out
> other unaligned direct IOs from starting.
> 
> We don't bother trying to serialised aligned vs unaligned IOs as
> they are overlapping IO and the result of concurrent overlapping IOs
> is undefined - the result of either IO is a valid result so we let
> them race. Hence we only penalise unaligned IO, which already has a
> major overhead compared to aligned IO so this isn't a major problem.

Wow, after the rest of this series it gets easy!
Looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/linux-2.6/xfs_file.c |   28 ++++++++++++++++++++++++----
>  1 files changed, 24 insertions(+), 4 deletions(-)


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/8] xfs: split direct IO write path from xfs_file_aio_write
  2011-01-05  1:54   ` Alex Elder
@ 2011-01-05  7:36     ` Dave Chinner
  0 siblings, 0 replies; 27+ messages in thread
From: Dave Chinner @ 2011-01-05  7:36 UTC (permalink / raw)
  To: Alex Elder; +Cc: xfs

On Tue, Jan 04, 2011 at 07:54:55PM -0600, Alex Elder wrote:
> On Tue, 2011-01-04 at 15:48 +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > The current xfs_file_aio_write code is a mess of locking shenanigans
> > to handle the different locking requirements of buffered and direct
> > IO. Start to clean this up by disentangling the direct IO path from
> > the mess.
> 
> All good, very good.  But I'm not sure why you cut
> out the code that backed off to buffered I/O if
> generic_file_direct_write() returns an error.
> (You gave no explanation.)

Oh, I thought I put one in there. It's simple, though - XFS handles
all allocation cases in the direct IO code and never returns a
result that will require falling back to the buffered IO path.  It
is basically a code path that has been unused for years and all it
does is make the logic much more complex to understand.

I'll add that to the commit message....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/8] xfs: introduce xfs_rw_lock() helpers for locking the inode
  2011-01-05  1:54   ` Alex Elder
@ 2011-01-05  7:55     ` Dave Chinner
  0 siblings, 0 replies; 27+ messages in thread
From: Dave Chinner @ 2011-01-05  7:55 UTC (permalink / raw)
  To: Alex Elder; +Cc: xfs

On Tue, Jan 04, 2011 at 07:54:41PM -0600, Alex Elder wrote:
> On Tue, 2011-01-04 at 15:48 +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > We need to obtain the i_mutex, i_iolock and i_ilock during the read
> > and write paths. Add a set of wrapper functions to neatly
> > encapsulate the lock ordering and shared/exclusive semantics to make
> > the locking easier to follow and get right.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> 
> I like this change, but I think you missed a lock call.
> I also notice there are some locking differences, and
> I don't really question them but I wonder if you can
> offer a little more explanation.
> 
> > ---
> >  fs/xfs/linux-2.6/xfs_file.c |  123 ++++++++++++++++++++++++-------------------
> >  1 files changed, 68 insertions(+), 55 deletions(-)
> > 
> > diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
> > index 33a688c..0d6111e 100644
> > --- a/fs/xfs/linux-2.6/xfs_file.c
> > +++ b/fs/xfs/linux-2.6/xfs_file.c
> 
> . . .
> 
> > @@ -262,22 +296,21 @@ xfs_file_aio_read(
> >  	if (XFS_FORCED_SHUTDOWN(mp))
> >  		return -EIO;
> >  
> > -	if (unlikely(ioflags & IO_ISDIRECT))
> > -		mutex_lock(&inode->i_mutex);
> > -	xfs_ilock(ip, XFS_IOLOCK_SHARED);
> > -
> >  	if (unlikely(ioflags & IO_ISDIRECT)) {
> > +		xfs_rw_ilock(ip, XFS_IOLOCK_EXCL);
> > +
> 
> Previously only XFS_IOLOCK_SHARED was used here.
> I understand that using the IOLOCK_EXCL now gets
> the desired mutex_lock() call.  Is the previous
> code in error here though?

No, it isn't wrong.  However, the new code takes advantage of the
demote process to drop back to IOLOCK_SHARED before the i_mutex is
dropped so there's no externally visible change in behaviour.

This means that the locking heirarchy is much easier to maintain and
understand as all operations use the same wrappers.  i.e. there is
no real need for a i_mutex + IOLOCK_SHARED state because i_mutex
means exclusive access and is the outer lock. Hence the
IOLOCK_SHARED vs IOLOCK_EXCL is pretty much irrelevant so I picked
the simpler one to implement and understand.

> Can you anticipate
> any different behavior because of this lock change?

None that I can think of, because we are supposed to hold the
i_mutex over page cache flushes and it is the outermost lock.

> Does this specific change justify separating it
> into a small patch just before this one?

IMO, not really, but I can if you want. It'll just churn the patch
series, though...

> >  		if (inode->i_mapping->nrpages) {
> >  			ret = -xfs_flushinval_pages(ip,
> >  					(iocb->ki_pos & PAGE_CACHE_MASK),
> >  					-1, FI_REMAPF_LOCKED);
> > +			if (ret) {
> > +				xfs_rw_iunlock(ip, XFS_IOLOCK_EXCL);
> > +				return ret;
> > +			}
> >  		}
> > -		mutex_unlock(&inode->i_mutex);
> > -		if (ret) {
> > -			xfs_iunlock(ip, XFS_IOLOCK_SHARED);
> > -			return ret;
> > -		}
> > -	}
> > +		xfs_rw_ilock_demote(ip, XFS_IOLOCK_EXCL);
> > +	} else
> > +		xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);
> >  
> >  	trace_xfs_file_read(ip, size, iocb->ki_pos, ioflags);
> >  
> 
> . . .
> 
> > @@ -386,14 +419,13 @@ xfs_file_splice_write(
> >  	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
> >  		return -EIO;
> >  
> > -	xfs_ilock(ip, XFS_IOLOCK_EXCL);
> > +	xfs_rw_ilock(ip, XFS_ILOCK_EXCL|XFS_IOLOCK_EXCL);
> 
> Similar sentiments here.  We will now be acquiring i_mutex
> here where previously we did not.  Is that OK?

It should be because it will have exactly the same serialisation
effect as the current XFS_IOLOCK_EXCL has - exclusive access across
the write.

> > @@ -631,21 +662,16 @@ xfs_file_aio_write(
> >  relock:
> >  	if (ioflags & IO_ISDIRECT) {
> >  		iolock = XFS_IOLOCK_SHARED;
> > -		need_i_mutex = 0;
> >  	} else {
> >  		iolock = XFS_IOLOCK_EXCL;
> > -		need_i_mutex = 1;
> > -		mutex_lock(&inode->i_mutex);
> >  	}
> >  
> > -	xfs_ilock(ip, XFS_ILOCK_EXCL|iolock);
> > -
> 
> Maybe I'm missing something, but I think you want to
> insert this here:
> 	xfs_rw_ilock(ip, XFS_ILOCK_EXCL|iolock);
> ...because (for starters) if generic_write_checks()
> returns an error below you're going to be calling
> the unlock routine.

Yes, good catch. I didn't notice that because I didn't test the
series patch by patch - just the end result. Will fix.

> 
> >  start:
> >  	ret = generic_write_checks(file, &pos, &count,
> >  					S_ISBLK(inode->i_mode));
> >  	if (ret) {
> > -		xfs_iunlock(ip, XFS_ILOCK_EXCL|iolock);
> > -		goto out_unlock_mutex;
> > +		xfs_rw_iunlock(ip, XFS_ILOCK_EXCL|iolock);
> > +		return ret;
> >  	}
> >  
> >  	if (ioflags & IO_ISDIRECT) {
> > @@ -654,16 +680,14 @@ start:
> >  				mp->m_rtdev_targp : mp->m_ddev_targp;
> >  
> >  		if ((pos & target->bt_smask) || (count & target->bt_smask)) {
> > -			xfs_iunlock(ip, XFS_ILOCK_EXCL|iolock);
> > +			xfs_rw_iunlock(ip, XFS_ILOCK_EXCL|iolock);
> >  			return XFS_ERROR(-EINVAL);
> >  		}
> >  
> 
> One can get a little lost in this code.  I don't know if
> this comment is exactly right, but something like it might
> be helpful (while you're in here).
> 
> 		/*
> 		 * For direct I/O, if there are cached pages or
> 		 * we're extending the file, we need IOLOCK_EXCL
> 		 * until we're sure the bytes at the new EOF have
> 		 * been zeroed and/or the cached pages are flushed
> 		 * out.  Upgrade the I/O lock and start again.
> 		 */

Probably a good idea. I'll add something there.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/8] xfs: ensure sync write errors are returned
  2011-01-04  4:48 ` [PATCH 1/8] xfs: ensure sync write errors are returned Dave Chinner
  2011-01-05  1:53   ` Alex Elder
@ 2011-01-07  8:45   ` Christoph Hellwig
  2011-01-07  9:07     ` Dave Chinner
  1 sibling, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2011-01-07  8:45 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

> +		if (ret == -ENOSPC && !enospc) {
> +			ssize_t ret2 = xfs_flush_pages(ip, 0, -1, 0, FI_NONE);
> +			if (ret2) {
> +				ret = ret2;
>  				goto out_unlock_internal;
> +			}
>  			enospc = 1;
>  			goto write_retry;
>  		}

I don't think there is any reason for the ret2 variale here.  If it's
non-zero we assign it to ret, and if it's zero we overwrite it again
after jumping to the write_retry label.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 7/8] xfs: factor common write setup code
  2011-01-04  4:48 ` [PATCH 7/8] xfs: factor common write setup code Dave Chinner
  2011-01-05  1:55   ` Alex Elder
@ 2011-01-07  8:53   ` Christoph Hellwig
  2011-01-07  9:20     ` Dave Chinner
  1 sibling, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2011-01-07  8:53 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, Jan 04, 2011 at 03:48:37PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The buffered IO and direct Io write paths share a common set of
> checks and limiting code prior to issuing the write. Factor that
> into a common helper function.

Wouldn't it be better to create the helper before duplicating the code,
aka move it before the buffer/direct splits in the series?

> +	/*
> +	 * If we're writing the file then make sure to clear the setuid and
> +	 * setgid bits if the process is not being run by root.  This keeps
> +	 * people from modifying setuid and setgid binaries.
> +	 */
> +	error = file_remove_suid(file);
> +	return error;

just

	return file_remove_suid(file);
?

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/8] xfs: split direct IO write path from xfs_file_aio_write
  2011-01-04  4:48 ` [PATCH 5/8] xfs: split direct IO write path from xfs_file_aio_write Dave Chinner
  2011-01-05  1:54   ` Alex Elder
@ 2011-01-07  8:58   ` Christoph Hellwig
  2011-01-07  9:21     ` Dave Chinner
  1 sibling, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2011-01-07  8:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

> +This also clearly indicates that XFS
> + * does not fall back to buffered IO in the direct IO write path.

I don't think this comment belongs into the function header.  It's
already present in the right place anyway, so it can just be removed.

> +	xfs_buftarg_t		*target = XFS_IS_REALTIME_INODE(ip) ?
> +					mp->m_rtdev_targp : mp->m_ddev_targp;

struct xfs_buftarg, please.

> +	trace_xfs_file_direct_write(ip, count, iocb->ki_pos, 0);
> +	ret = generic_file_direct_write(iocb, iovp,
> +			&nr_segs, pos, &iocb->ki_pos, count, ocount);
> +
> +	/* No fallback to buffered IO on errors for XFS. */
> +	return ret;

I'd add an

	ASSERT(ret < 0 || ret == count);

here to make sure we don't get problems due to changes in the core
direct I/O code.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/8] xfs: ensure sync write errors are returned
  2011-01-07  8:45   ` Christoph Hellwig
@ 2011-01-07  9:07     ` Dave Chinner
  0 siblings, 0 replies; 27+ messages in thread
From: Dave Chinner @ 2011-01-07  9:07 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Jan 07, 2011 at 03:45:18AM -0500, Christoph Hellwig wrote:
> > +		if (ret == -ENOSPC && !enospc) {
> > +			ssize_t ret2 = xfs_flush_pages(ip, 0, -1, 0, FI_NONE);
> > +			if (ret2) {
> > +				ret = ret2;
> >  				goto out_unlock_internal;
> > +			}
> >  			enospc = 1;
> >  			goto write_retry;
> >  		}
> 
> I don't think there is any reason for the ret2 variale here.  If it's
> non-zero we assign it to ret, and if it's zero we overwrite it again
> after jumping to the write_retry label.

True, I'll fix it.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 7/8] xfs: factor common write setup code
  2011-01-07  8:53   ` Christoph Hellwig
@ 2011-01-07  9:20     ` Dave Chinner
  0 siblings, 0 replies; 27+ messages in thread
From: Dave Chinner @ 2011-01-07  9:20 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Jan 07, 2011 at 03:53:57AM -0500, Christoph Hellwig wrote:
> On Tue, Jan 04, 2011 at 03:48:37PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > The buffered IO and direct Io write paths share a common set of
> > checks and limiting code prior to issuing the write. Factor that
> > into a common helper function.
> 
> Wouldn't it be better to create the helper before duplicating the code,
> aka move it before the buffer/direct splits in the series?

Maybe, but at this I'd prefer not to move it as it would mean having
to rework several other patches in the series as well.  Moving it
doesn't change the end result at all, so unless you feel really
strongly about this I'm going to say no this time because we're
now under time pressure to get this tested and finished for 2.6.38.

> > +	/*
> > +	 * If we're writing the file then make sure to clear the setuid and
> > +	 * setgid bits if the process is not being run by root.  This keeps
> > +	 * people from modifying setuid and setgid binaries.
> > +	 */
> > +	error = file_remove_suid(file);
> > +	return error;
> 
> just
> 
> 	return file_remove_suid(file);

Yup, will fix.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/8] xfs: split direct IO write path from xfs_file_aio_write
  2011-01-07  8:58   ` Christoph Hellwig
@ 2011-01-07  9:21     ` Dave Chinner
  0 siblings, 0 replies; 27+ messages in thread
From: Dave Chinner @ 2011-01-07  9:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Jan 07, 2011 at 03:58:30AM -0500, Christoph Hellwig wrote:
> > +This also clearly indicates that XFS
> > + * does not fall back to buffered IO in the direct IO write path.
> 
> I don't think this comment belongs into the function header.  It's
> already present in the right place anyway, so it can just be removed.

OK.

> > +	xfs_buftarg_t		*target = XFS_IS_REALTIME_INODE(ip) ?
> > +					mp->m_rtdev_targp : mp->m_ddev_targp;
> 
> struct xfs_buftarg, please.

Was copy-n-paste. Will fix.

> > +	trace_xfs_file_direct_write(ip, count, iocb->ki_pos, 0);
> > +	ret = generic_file_direct_write(iocb, iovp,
> > +			&nr_segs, pos, &iocb->ki_pos, count, ocount);
> > +
> > +	/* No fallback to buffered IO on errors for XFS. */
> > +	return ret;
> 
> I'd add an
> 
> 	ASSERT(ret < 0 || ret == count);
> 
> here to make sure we don't get problems due to changes in the core
> direct I/O code.

Yup, good idea.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 2/8] xfs: factor common post-write isize handling code
  2011-01-07 11:30 [PATCH 0/8] xfs: prevent corruption due to overlapping AIO DIO V3 Dave Chinner
@ 2011-01-07 11:30 ` Dave Chinner
  2011-01-10 19:19   ` Christoph Hellwig
  0 siblings, 1 reply; 27+ messages in thread
From: Dave Chinner @ 2011-01-07 11:30 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
---
 fs/xfs/linux-2.6/xfs_file.c |   54 ++++++++++++++++++++++--------------------
 1 files changed, 28 insertions(+), 26 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
index 10b7fb4..b3915bf 100644
--- a/fs/xfs/linux-2.6/xfs_file.c
+++ b/fs/xfs/linux-2.6/xfs_file.c
@@ -321,6 +321,30 @@ xfs_file_splice_read(
 	return ret;
 }
 
+STATIC void
+xfs_aio_write_isize_update(
+	struct inode	*inode,
+	loff_t		*ppos,
+	ssize_t		bytes_written)
+{
+	struct xfs_inode	*ip = XFS_I(inode);
+	xfs_fsize_t		isize = i_size_read(inode);
+
+	if (bytes_written > 0)
+		XFS_STATS_ADD(xs_write_bytes, bytes_written);
+
+	if (unlikely(bytes_written < 0 && bytes_written != -EFAULT &&
+					*ppos > isize))
+		*ppos = isize;
+
+	if (*ppos > ip->i_size) {
+		xfs_ilock(ip, XFS_ILOCK_EXCL);
+		if (*ppos > ip->i_size)
+			ip->i_size = *ppos;
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	}
+}
+
 STATIC ssize_t
 xfs_file_splice_write(
 	struct pipe_inode_info	*pipe,
@@ -331,7 +355,7 @@ xfs_file_splice_write(
 {
 	struct inode		*inode = outfilp->f_mapping->host;
 	struct xfs_inode	*ip = XFS_I(inode);
-	xfs_fsize_t		isize, new_size;
+	xfs_fsize_t		new_size;
 	int			ioflags = 0;
 	ssize_t			ret;
 
@@ -355,19 +379,8 @@ xfs_file_splice_write(
 	trace_xfs_file_splice_write(ip, count, *ppos, ioflags);
 
 	ret = generic_file_splice_write(pipe, outfilp, ppos, count, flags);
-	if (ret > 0)
-		XFS_STATS_ADD(xs_write_bytes, ret);
-
-	isize = i_size_read(inode);
-	if (unlikely(ret < 0 && ret != -EFAULT && *ppos > isize))
-		*ppos = isize;
 
-	if (*ppos > ip->i_size) {
-		xfs_ilock(ip, XFS_ILOCK_EXCL);
-		if (*ppos > ip->i_size)
-			ip->i_size = *ppos;
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	}
+	xfs_aio_write_isize_update(inode, ppos, ret);
 
 	if (ip->i_new_size) {
 		xfs_ilock(ip, XFS_ILOCK_EXCL);
@@ -576,7 +589,7 @@ xfs_file_aio_write(
 	struct xfs_mount	*mp = ip->i_mount;
 	ssize_t			ret = 0;
 	int			ioflags = 0;
-	xfs_fsize_t		isize, new_size;
+	xfs_fsize_t		new_size;
 	int			iolock;
 	size_t			ocount = 0, count;
 	int			need_i_mutex;
@@ -740,22 +753,11 @@ write_retry:
 
 	current->backing_dev_info = NULL;
 
-	isize = i_size_read(inode);
-	if (unlikely(ret < 0 && ret != -EFAULT && iocb->ki_pos > isize))
-		iocb->ki_pos = isize;
-
-	if (iocb->ki_pos > ip->i_size) {
-		xfs_ilock(ip, XFS_ILOCK_EXCL);
-		if (iocb->ki_pos > ip->i_size)
-			ip->i_size = iocb->ki_pos;
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	}
+	xfs_aio_write_isize_update(inode, &iocb->ki_pos, ret);
 
 	if (ret <= 0)
 		goto out_unlock_internal;
 
-	XFS_STATS_ADD(xs_write_bytes, ret);
-
 	/* Handle various SYNC-type writes */
 	if ((file->f_flags & O_DSYNC) || IS_SYNC(inode)) {
 		loff_t end = pos + ret - 1;
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/8] xfs: factor common post-write isize handling code
  2011-01-07 11:30 ` [PATCH 2/8] xfs: factor common post-write isize handling code Dave Chinner
@ 2011-01-10 19:19   ` Christoph Hellwig
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2011-01-10 19:19 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 2/8] xfs: factor common post-write isize handling code
  2011-01-10 23:37 [PATCH 0/8] xfs: prevent corruption due to overlapping AIO DIO V4 Dave Chinner
@ 2011-01-10 23:37 ` Dave Chinner
  0 siblings, 0 replies; 27+ messages in thread
From: Dave Chinner @ 2011-01-10 23:37 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/linux-2.6/xfs_file.c |   54 ++++++++++++++++++++++--------------------
 1 files changed, 28 insertions(+), 26 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
index 10b7fb4..b3915bf 100644
--- a/fs/xfs/linux-2.6/xfs_file.c
+++ b/fs/xfs/linux-2.6/xfs_file.c
@@ -321,6 +321,30 @@ xfs_file_splice_read(
 	return ret;
 }
 
+STATIC void
+xfs_aio_write_isize_update(
+	struct inode	*inode,
+	loff_t		*ppos,
+	ssize_t		bytes_written)
+{
+	struct xfs_inode	*ip = XFS_I(inode);
+	xfs_fsize_t		isize = i_size_read(inode);
+
+	if (bytes_written > 0)
+		XFS_STATS_ADD(xs_write_bytes, bytes_written);
+
+	if (unlikely(bytes_written < 0 && bytes_written != -EFAULT &&
+					*ppos > isize))
+		*ppos = isize;
+
+	if (*ppos > ip->i_size) {
+		xfs_ilock(ip, XFS_ILOCK_EXCL);
+		if (*ppos > ip->i_size)
+			ip->i_size = *ppos;
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	}
+}
+
 STATIC ssize_t
 xfs_file_splice_write(
 	struct pipe_inode_info	*pipe,
@@ -331,7 +355,7 @@ xfs_file_splice_write(
 {
 	struct inode		*inode = outfilp->f_mapping->host;
 	struct xfs_inode	*ip = XFS_I(inode);
-	xfs_fsize_t		isize, new_size;
+	xfs_fsize_t		new_size;
 	int			ioflags = 0;
 	ssize_t			ret;
 
@@ -355,19 +379,8 @@ xfs_file_splice_write(
 	trace_xfs_file_splice_write(ip, count, *ppos, ioflags);
 
 	ret = generic_file_splice_write(pipe, outfilp, ppos, count, flags);
-	if (ret > 0)
-		XFS_STATS_ADD(xs_write_bytes, ret);
-
-	isize = i_size_read(inode);
-	if (unlikely(ret < 0 && ret != -EFAULT && *ppos > isize))
-		*ppos = isize;
 
-	if (*ppos > ip->i_size) {
-		xfs_ilock(ip, XFS_ILOCK_EXCL);
-		if (*ppos > ip->i_size)
-			ip->i_size = *ppos;
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	}
+	xfs_aio_write_isize_update(inode, ppos, ret);
 
 	if (ip->i_new_size) {
 		xfs_ilock(ip, XFS_ILOCK_EXCL);
@@ -576,7 +589,7 @@ xfs_file_aio_write(
 	struct xfs_mount	*mp = ip->i_mount;
 	ssize_t			ret = 0;
 	int			ioflags = 0;
-	xfs_fsize_t		isize, new_size;
+	xfs_fsize_t		new_size;
 	int			iolock;
 	size_t			ocount = 0, count;
 	int			need_i_mutex;
@@ -740,22 +753,11 @@ write_retry:
 
 	current->backing_dev_info = NULL;
 
-	isize = i_size_read(inode);
-	if (unlikely(ret < 0 && ret != -EFAULT && iocb->ki_pos > isize))
-		iocb->ki_pos = isize;
-
-	if (iocb->ki_pos > ip->i_size) {
-		xfs_ilock(ip, XFS_ILOCK_EXCL);
-		if (iocb->ki_pos > ip->i_size)
-			ip->i_size = iocb->ki_pos;
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	}
+	xfs_aio_write_isize_update(inode, &iocb->ki_pos, ret);
 
 	if (ret <= 0)
 		goto out_unlock_internal;
 
-	XFS_STATS_ADD(xs_write_bytes, ret);
-
 	/* Handle various SYNC-type writes */
 	if ((file->f_flags & O_DSYNC) || IS_SYNC(inode)) {
 		loff_t end = pos + ret - 1;
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2011-01-10 23:36 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-01-04  4:48 [PATCH 0/8] xfs: prevent corruption due to overlapping AIO DIO V2 Dave Chinner
2011-01-04  4:48 ` [PATCH 1/8] xfs: ensure sync write errors are returned Dave Chinner
2011-01-05  1:53   ` Alex Elder
2011-01-07  8:45   ` Christoph Hellwig
2011-01-07  9:07     ` Dave Chinner
2011-01-04  4:48 ` [PATCH 2/8] xfs: factor common post-write isize handling code Dave Chinner
2011-01-05  1:54   ` Alex Elder
2011-01-04  4:48 ` [PATCH 4/8] xfs: introduce xfs_rw_lock() helpers for locking the inode Dave Chinner
2011-01-05  1:54   ` Alex Elder
2011-01-05  7:55     ` Dave Chinner
2011-01-04  4:48 ` [PATCH 5/8] xfs: split direct IO write path from xfs_file_aio_write Dave Chinner
2011-01-05  1:54   ` Alex Elder
2011-01-05  7:36     ` Dave Chinner
2011-01-07  8:58   ` Christoph Hellwig
2011-01-07  9:21     ` Dave Chinner
2011-01-04  4:48 ` [PATCH 6/8] xfs: split buffered " Dave Chinner
2011-01-05  1:55   ` Alex Elder
2011-01-04  4:48 ` [PATCH 7/8] xfs: factor common write setup code Dave Chinner
2011-01-05  1:55   ` Alex Elder
2011-01-07  8:53   ` Christoph Hellwig
2011-01-07  9:20     ` Dave Chinner
2011-01-04  4:48 ` [PATCH 8/8] xfs: serialise unaligned direct IOs Dave Chinner
2011-01-05  1:55   ` Alex Elder
2011-01-05  1:53 ` [PATCH 0/8] xfs: prevent corruption due to overlapping AIO DIO V2 Alex Elder
  -- strict thread matches above, loose matches on Subject: below --
2011-01-07 11:30 [PATCH 0/8] xfs: prevent corruption due to overlapping AIO DIO V3 Dave Chinner
2011-01-07 11:30 ` [PATCH 2/8] xfs: factor common post-write isize handling code Dave Chinner
2011-01-10 19:19   ` Christoph Hellwig
2011-01-10 23:37 [PATCH 0/8] xfs: prevent corruption due to overlapping AIO DIO V4 Dave Chinner
2011-01-10 23:37 ` [PATCH 2/8] xfs: factor common post-write isize handling code Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox