* [PATCH 1/7] xfs: ensure sync write errors are returned
2010-12-15 1:23 [PATCH 0/7] xfs: serialise unaligned direct IOs Dave Chinner
@ 2010-12-15 1:23 ` Dave Chinner
2010-12-16 11:57 ` Christoph Hellwig
2010-12-16 20:50 ` Alex Elder
2010-12-15 1:23 ` [PATCH 2/7] xfs: factor common post-write isize handling code Dave Chinner
` (4 subsequent siblings)
5 siblings, 2 replies; 15+ messages in thread
From: Dave Chinner @ 2010-12-15 1:23 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
xfs_file_aio_write() only returns the error from synchronous
flushing of the data and inode if error == 0. At the point where
error is being checked, it is guaranteed to be > 0. Therefore any
errors returned by the data or fsync flush will never be returned.
Fix the checks so we overwrite the current error once and only if an
error really occurred.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/linux-2.6/xfs_file.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
index ba8ad42..e1eaec2 100644
--- a/fs/xfs/linux-2.6/xfs_file.c
+++ b/fs/xfs/linux-2.6/xfs_file.c
@@ -769,7 +769,7 @@ write_retry:
mutex_unlock(&inode->i_mutex);
error2 = filemap_write_and_wait_range(mapping, pos, end);
- if (!error)
+ if (error2)
error = error2;
if (need_i_mutex)
mutex_lock(&inode->i_mutex);
@@ -777,7 +777,7 @@ write_retry:
error2 = -xfs_file_fsync(file,
(file->f_flags & __O_SYNC) ? 0 : 1);
- if (!error)
+ if (error2 && error >= 0)
error = error2;
}
--
1.7.2.3
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [PATCH 1/7] xfs: ensure sync write errors are returned
2010-12-15 1:23 ` [PATCH 1/7] xfs: ensure sync write errors are returned Dave Chinner
@ 2010-12-16 11:57 ` Christoph Hellwig
2010-12-16 21:57 ` Dave Chinner
2010-12-16 20:50 ` Alex Elder
1 sibling, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2010-12-16 11:57 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
> error2 = filemap_write_and_wait_range(mapping, pos, end);
> - if (!error)
> + if (error2)
> error = error2;
> if (need_i_mutex)
> mutex_lock(&inode->i_mutex);
> @@ -777,7 +777,7 @@ write_retry:
>
> error2 = -xfs_file_fsync(file,
> (file->f_flags & __O_SYNC) ? 0 : 1);
> - if (!error)
> + if (error2 && error >= 0)
> error = error2;
Shouldn't the filemap_write_and_wait_range clause use a similar check as
this one?
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/7] xfs: ensure sync write errors are returned
2010-12-16 11:57 ` Christoph Hellwig
@ 2010-12-16 21:57 ` Dave Chinner
0 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2010-12-16 21:57 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Thu, Dec 16, 2010 at 06:57:38AM -0500, Christoph Hellwig wrote:
> > error2 = filemap_write_and_wait_range(mapping, pos, end);
> > - if (!error)
> > + if (error2)
> > error = error2;
> > if (need_i_mutex)
> > mutex_lock(&inode->i_mutex);
> > @@ -777,7 +777,7 @@ write_retry:
> >
> > error2 = -xfs_file_fsync(file,
> > (file->f_flags & __O_SYNC) ? 0 : 1);
> > - if (!error)
> > + if (error2 && error >= 0)
> > error = error2;
>
> Shouldn't the filemap_write_and_wait_range clause use a similar check as
> this one?
No need, because when we enter the branch, error is guaranteed to be
greater than zero as the code jumps over this branch if it is <= 0.
Hence we only need to overwrite error with error2 from
filemap_write_and_wait_range() if error2 actually contains an error
(i.e. < 0).
The second hunk then has to deal with the fact that error can have
any value, and we only want to overwrite error if it doesn't already
contain an error value and error2 does. Hence the check for error >=
0 to ensure we don't overwrite an error from
filemap_write_and_wait_range().
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/7] xfs: ensure sync write errors are returned
2010-12-15 1:23 ` [PATCH 1/7] xfs: ensure sync write errors are returned Dave Chinner
2010-12-16 11:57 ` Christoph Hellwig
@ 2010-12-16 20:50 ` Alex Elder
2010-12-17 6:43 ` Dave Chinner
1 sibling, 1 reply; 15+ messages in thread
From: Alex Elder @ 2010-12-16 20:50 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Wed, 2010-12-15 at 12:23 +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> xfs_file_aio_write() only returns the error from synchronous
> flushing of the data and inode if error == 0. At the point where
> error is being checked, it is guaranteed to be > 0. Therefore any
Actually, we have this above the affected code:
...
error = -ret;
if (ret <= 0)
goto out_unlock_internal;
So ret must be positive, therefore error is negative
(the negative of the number of bytes written). In
other words, we enter this block without having seen
an error.
The return at the end of the function is:
return -error;
And by the convoluted logic here, that means that the
value of error should be a negative byte count if
successful, or a positive errno otherwise.
And since filemap_write_and_wait_range() returns a
negative errno, your fix doesn't look right to me.
The existing code is wrong and should be fixed, but a
better fix might make the meaning of the variable "error"
a little less weird.
-Alex
> errors returned by the data or fsync flush will never be returned.
> Fix the checks so we overwrite the current error once and only if an
> error really occurred.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/xfs/linux-2.6/xfs_file.c | 4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
> index ba8ad42..e1eaec2 100644
> --- a/fs/xfs/linux-2.6/xfs_file.c
> +++ b/fs/xfs/linux-2.6/xfs_file.c
> @@ -769,7 +769,7 @@ write_retry:
> mutex_unlock(&inode->i_mutex);
>
> error2 = filemap_write_and_wait_range(mapping, pos, end);
> - if (!error)
> + if (error2)
> error = error2;
> if (need_i_mutex)
> mutex_lock(&inode->i_mutex);
> @@ -777,7 +777,7 @@ write_retry:
>
> error2 = -xfs_file_fsync(file,
> (file->f_flags & __O_SYNC) ? 0 : 1);
> - if (!error)
> + if (error2 && error >= 0)
> error = error2;
> }
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH 1/7] xfs: ensure sync write errors are returned
2010-12-16 20:50 ` Alex Elder
@ 2010-12-17 6:43 ` Dave Chinner
0 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2010-12-17 6:43 UTC (permalink / raw)
To: Alex Elder; +Cc: xfs
On Thu, Dec 16, 2010 at 02:50:49PM -0600, Alex Elder wrote:
> On Wed, 2010-12-15 at 12:23 +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > xfs_file_aio_write() only returns the error from synchronous
> > flushing of the data and inode if error == 0. At the point where
> > error is being checked, it is guaranteed to be > 0. Therefore any
>
> Actually, we have this above the affected code:
> ...
> error = -ret;
> if (ret <= 0)
> goto out_unlock_internal;
>
> So ret must be positive, therefore error is negative
> (the negative of the number of bytes written). In
> other words, we enter this block without having seen
> an error.
>
> The return at the end of the function is:
> return -error;
>
> And by the convoluted logic here, that means that the
> value of error should be a negative byte count if
> successful, or a positive errno otherwise.
Oh, right, yeah, I screwed that up, didn't I?
> And since filemap_write_and_wait_range() returns a
> negative errno, your fix doesn't look right to me.
I will fix it up.
> The existing code is wrong and should be fixed, but a
> better fix might make the meaning of the variable "error"
> a little less weird.
The real problem is that xfs functions return positive errors, and
the linux functions return negative errors. It would be much less
of a hassle if we fixed that problem...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/7] xfs: factor common post-write isize handling code
2010-12-15 1:23 [PATCH 0/7] xfs: serialise unaligned direct IOs Dave Chinner
2010-12-15 1:23 ` [PATCH 1/7] xfs: ensure sync write errors are returned Dave Chinner
@ 2010-12-15 1:23 ` Dave Chinner
2010-12-15 1:23 ` [PATCH 4/7] xfs: split direct IO write path from xfs_file_aio_write Dave Chinner
` (3 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2010-12-15 1:23 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/linux-2.6/xfs_file.c | 52 +++++++++++++++++++++++-------------------
1 files changed, 28 insertions(+), 24 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
index e1eaec2..42adfff 100644
--- a/fs/xfs/linux-2.6/xfs_file.c
+++ b/fs/xfs/linux-2.6/xfs_file.c
@@ -321,6 +321,30 @@ xfs_file_splice_read(
return ret;
}
+STATIC void
+xfs_aio_write_isize_update(
+ struct inode *inode,
+ loff_t *ppos,
+ ssize_t bytes_written)
+{
+ struct xfs_inode *ip = XFS_I(inode);
+ xfs_fsize_t isize = i_size_read(inode);
+
+ if (bytes_written > 0)
+ XFS_STATS_ADD(xs_write_bytes, bytes_written);
+
+ if (unlikely(bytes_written < 0 && bytes_written != -EFAULT &&
+ *ppos > isize))
+ *ppos = isize;
+
+ if (*ppos > ip->i_size) {
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ if (*ppos > ip->i_size)
+ ip->i_size = *ppos;
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ }
+}
+
STATIC ssize_t
xfs_file_splice_write(
struct pipe_inode_info *pipe,
@@ -331,7 +355,7 @@ xfs_file_splice_write(
{
struct inode *inode = outfilp->f_mapping->host;
struct xfs_inode *ip = XFS_I(inode);
- xfs_fsize_t isize, new_size;
+ xfs_fsize_t new_size;
int ioflags = 0;
ssize_t ret;
@@ -355,19 +379,8 @@ xfs_file_splice_write(
trace_xfs_file_splice_write(ip, count, *ppos, ioflags);
ret = generic_file_splice_write(pipe, outfilp, ppos, count, flags);
- if (ret > 0)
- XFS_STATS_ADD(xs_write_bytes, ret);
-
- isize = i_size_read(inode);
- if (unlikely(ret < 0 && ret != -EFAULT && *ppos > isize))
- *ppos = isize;
- if (*ppos > ip->i_size) {
- xfs_ilock(ip, XFS_ILOCK_EXCL);
- if (*ppos > ip->i_size)
- ip->i_size = *ppos;
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
- }
+ xfs_aio_write_isize_update(inode, ppos, ret);
if (ip->i_new_size) {
xfs_ilock(ip, XFS_ILOCK_EXCL);
@@ -576,7 +589,7 @@ xfs_file_aio_write(
struct xfs_mount *mp = ip->i_mount;
ssize_t ret = 0, error = 0;
int ioflags = 0;
- xfs_fsize_t isize, new_size;
+ xfs_fsize_t new_size;
int iolock;
size_t ocount = 0, count;
int need_i_mutex;
@@ -742,16 +755,7 @@ write_retry:
current->backing_dev_info = NULL;
- isize = i_size_read(inode);
- if (unlikely(ret < 0 && ret != -EFAULT && iocb->ki_pos > isize))
- iocb->ki_pos = isize;
-
- if (iocb->ki_pos > ip->i_size) {
- xfs_ilock(ip, XFS_ILOCK_EXCL);
- if (iocb->ki_pos > ip->i_size)
- ip->i_size = iocb->ki_pos;
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
- }
+ xfs_aio_write_isize_update(inode, &iocb->ki_pos, ret);
error = -ret;
if (ret <= 0)
--
1.7.2.3
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 15+ messages in thread* [PATCH 4/7] xfs: split direct IO write path from xfs_file_aio_write
2010-12-15 1:23 [PATCH 0/7] xfs: serialise unaligned direct IOs Dave Chinner
2010-12-15 1:23 ` [PATCH 1/7] xfs: ensure sync write errors are returned Dave Chinner
2010-12-15 1:23 ` [PATCH 2/7] xfs: factor common post-write isize handling code Dave Chinner
@ 2010-12-15 1:23 ` Dave Chinner
2010-12-16 12:06 ` Christoph Hellwig
2010-12-15 1:23 ` [PATCH 5/7] xfs: split buffered " Dave Chinner
` (2 subsequent siblings)
5 siblings, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2010-12-15 1:23 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The current xfs_file_aio_write code is a mess of locking shenanigans
to handle the different locking requirements of buffered and direct
IO. Start to clean this up by disentangling the direct IO path from
the mess.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/linux-2.6/xfs_file.c | 184 +++++++++++++++++++++++++++---------------
1 files changed, 118 insertions(+), 66 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
index 4608aab..d70a249 100644
--- a/fs/xfs/linux-2.6/xfs_file.c
+++ b/fs/xfs/linux-2.6/xfs_file.c
@@ -587,6 +587,116 @@ out_lock:
return error;
}
+/*
+ * xfs_file_dio_aio_write - handle direct IO writes
+ *
+ * Lock the inode appropriately to prepare for and issue a direct IO write.
+ * By spearating it from the buffered write path we remove all the tricky to
+ * follow locking changes and looping. This also clearly indicates that XFS
+ * does not fall back to buffered IO in the direct IO write path.
+ *
+ * Returns with locks held indicated by @need_i_mutex and @iolock.
+ */
+STATIC ssize_t
+xfs_file_dio_aio_write(
+ struct kiocb *iocb,
+ const struct iovec *iovp,
+ unsigned long nr_segs,
+ loff_t pos,
+ size_t ocount,
+ int *need_i_mutex,
+ int *iolock)
+{
+ struct file *file = iocb->ki_filp;
+ struct address_space *mapping = file->f_mapping;
+ struct inode *inode = mapping->host;
+ struct xfs_inode *ip = XFS_I(inode);
+ struct xfs_mount *mp = ip->i_mount;
+ ssize_t error = 0;
+ xfs_fsize_t new_size;
+ size_t count = ocount;
+ xfs_buftarg_t *target = XFS_IS_REALTIME_INODE(ip) ?
+ mp->m_rtdev_targp : mp->m_ddev_targp;
+
+ *need_i_mutex = 0;
+ *iolock = 0;
+ if ((pos & target->bt_smask) || (count & target->bt_smask))
+ return XFS_ERROR(-EINVAL);
+
+ if (mapping->nrpages || pos > ip->i_size) {
+ *iolock = XFS_IOLOCK_EXCL;
+ *need_i_mutex = 1;
+ mutex_lock(&inode->i_mutex);
+ } else {
+ *iolock = XFS_IOLOCK_SHARED;
+ }
+ xfs_ilock(ip, XFS_ILOCK_EXCL|*iolock);
+
+ error = generic_write_checks(file, &pos, &count,
+ S_ISBLK(inode->i_mode));
+ if (error) {
+ xfs_iunlock(ip, XFS_ILOCK_EXCL|*iolock);
+ *iolock = 0;
+ return error;
+ }
+
+ new_size = pos + count;
+ if (new_size > ip->i_size)
+ ip->i_new_size = new_size;
+
+ if (likely(!(file->f_mode & FMODE_NOCMTIME)))
+ file_update_time(file);
+
+ /*
+ * If the offset is beyond the size of the file, we have a couple of
+ * things to do. First, if there is already space allocated we need to
+ * either create holes or zero the disk or ...
+ *
+ * If there is a page where the previous size lands, we need to zero it
+ * out up to the new size.
+ */
+ if (pos > ip->i_size) {
+ error = -xfs_zero_eof(ip, pos, ip->i_size);
+ if (error) {
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ return error;
+ }
+ }
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+ /*
+ * If we're writing the file then make sure to clear the setuid and
+ * setgid bits if the process is not being run by root. This keeps
+ * people from modifying setuid and setgid binaries.
+ */
+ error = file_remove_suid(file);
+ if (unlikely(error))
+ return error;
+
+ if (mapping->nrpages) {
+ WARN_ON(*need_i_mutex == 0);
+ error = -xfs_flushinval_pages(ip, (pos & PAGE_CACHE_MASK), -1,
+ FI_REMAPF_LOCKED);
+ if (error)
+ return error;
+ }
+
+ if (*need_i_mutex) {
+ /* demote the lock now the cached pages are gone */
+ xfs_ilock_demote(ip, XFS_IOLOCK_EXCL);
+ mutex_unlock(&inode->i_mutex);
+ *iolock = XFS_IOLOCK_SHARED;
+ *need_i_mutex = 0;
+ }
+
+ trace_xfs_file_direct_write(ip, count, iocb->ki_pos, 0);
+ error = generic_file_direct_write(iocb, iovp,
+ &nr_segs, pos, &iocb->ki_pos, count, ocount);
+
+ /* No fallback to buffered IO on errors for XFS. */
+ return error;
+}
+
STATIC ssize_t
xfs_file_aio_write(
struct kiocb *iocb,
@@ -628,19 +738,17 @@ xfs_file_aio_write(
if (XFS_FORCED_SHUTDOWN(mp))
return -EIO;
-relock:
if (ioflags & IO_ISDIRECT) {
- iolock = XFS_IOLOCK_SHARED;
- need_i_mutex = 0;
- } else {
- iolock = XFS_IOLOCK_EXCL;
- need_i_mutex = 1;
- mutex_lock(&inode->i_mutex);
+ ret = xfs_file_dio_aio_write(iocb, iovp, nr_segs, pos,
+ ocount, &need_i_mutex, &iolock);
+ goto done_io;
}
+ iolock = XFS_IOLOCK_EXCL;
+ need_i_mutex = 1;
+ mutex_lock(&inode->i_mutex);
xfs_ilock(ip, XFS_ILOCK_EXCL|iolock);
-start:
error = -generic_write_checks(file, &pos, &count,
S_ISBLK(inode->i_mode));
if (error) {
@@ -648,26 +756,6 @@ start:
goto out_unlock_mutex;
}
- if (ioflags & IO_ISDIRECT) {
- xfs_buftarg_t *target =
- XFS_IS_REALTIME_INODE(ip) ?
- mp->m_rtdev_targp : mp->m_ddev_targp;
-
- if ((pos & target->bt_smask) || (count & target->bt_smask)) {
- xfs_iunlock(ip, XFS_ILOCK_EXCL|iolock);
- return XFS_ERROR(-EINVAL);
- }
-
- if (!need_i_mutex && (mapping->nrpages || pos > ip->i_size)) {
- xfs_iunlock(ip, XFS_ILOCK_EXCL|iolock);
- iolock = XFS_IOLOCK_EXCL;
- need_i_mutex = 1;
- mutex_lock(&inode->i_mutex);
- xfs_ilock(ip, XFS_ILOCK_EXCL|iolock);
- goto start;
- }
- }
-
new_size = pos + count;
if (new_size > ip->i_size)
ip->i_new_size = new_size;
@@ -706,44 +794,7 @@ start:
/* We can write back this queue in page reclaim */
current->backing_dev_info = mapping->backing_dev_info;
- if ((ioflags & IO_ISDIRECT)) {
- if (mapping->nrpages) {
- WARN_ON(need_i_mutex == 0);
- error = xfs_flushinval_pages(ip,
- (pos & PAGE_CACHE_MASK),
- -1, FI_REMAPF_LOCKED);
- if (error)
- goto out_unlock_internal;
- }
-
- if (need_i_mutex) {
- /* demote the lock now the cached pages are gone */
- xfs_ilock_demote(ip, XFS_IOLOCK_EXCL);
- mutex_unlock(&inode->i_mutex);
-
- iolock = XFS_IOLOCK_SHARED;
- need_i_mutex = 0;
- }
-
- trace_xfs_file_direct_write(ip, count, iocb->ki_pos, ioflags);
- ret = generic_file_direct_write(iocb, iovp,
- &nr_segs, pos, &iocb->ki_pos, count, ocount);
-
- /*
- * direct-io write to a hole: fall through to buffered I/O
- * for completing the rest of the request.
- */
- if (ret >= 0 && ret != count) {
- XFS_STATS_ADD(xs_write_bytes, ret);
-
- pos += ret;
- count -= ret;
-
- ioflags &= ~IO_ISDIRECT;
- xfs_iunlock(ip, iolock);
- goto relock;
- }
- } else {
+ if (!(ioflags & IO_ISDIRECT)) {
int enospc = 0;
ssize_t ret2 = 0;
@@ -767,6 +818,7 @@ write_retry:
current->backing_dev_info = NULL;
+done_io:
xfs_aio_write_isize_update(inode, &iocb->ki_pos, ret);
error = -ret;
--
1.7.2.3
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [PATCH 4/7] xfs: split direct IO write path from xfs_file_aio_write
2010-12-15 1:23 ` [PATCH 4/7] xfs: split direct IO write path from xfs_file_aio_write Dave Chinner
@ 2010-12-16 12:06 ` Christoph Hellwig
2010-12-17 7:31 ` Dave Chinner
0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2010-12-16 12:06 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
> +STATIC ssize_t
> +xfs_file_dio_aio_write(
> + struct kiocb *iocb,
> + const struct iovec *iovp,
> + unsigned long nr_segs,
> + loff_t pos,
> + size_t ocount,
> + int *need_i_mutex,
> + int *iolock)
need_i_mutex == 1 is equivalent to iolock == XFS_IOLOCK_EXCL, I think
it's a good idea to throw in a patch to remove the need_i_mutex variable
before this patch.
Maybe that patch can even add xfs_rw_lock or similar helpers to
manipulate the i_mutex, the iolock, and the iolock variable together?
Speaking of that, shouldn't xfs_file_aio_read also take the iolock
exclusive during the page invalidation and then demote it, just like
the write case? The above helpers would enforce that nicely.
> + if ((pos & target->bt_smask) || (count & target->bt_smask))
> + return XFS_ERROR(-EINVAL);
For the error traps to work this needs to be
-XFS_ERROR(EINVAL);
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4/7] xfs: split direct IO write path from xfs_file_aio_write
2010-12-16 12:06 ` Christoph Hellwig
@ 2010-12-17 7:31 ` Dave Chinner
2010-12-20 11:29 ` Christoph Hellwig
0 siblings, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2010-12-17 7:31 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Thu, Dec 16, 2010 at 07:06:29AM -0500, Christoph Hellwig wrote:
> > +STATIC ssize_t
> > +xfs_file_dio_aio_write(
> > + struct kiocb *iocb,
> > + const struct iovec *iovp,
> > + unsigned long nr_segs,
> > + loff_t pos,
> > + size_t ocount,
> > + int *need_i_mutex,
> > + int *iolock)
>
> need_i_mutex == 1 is equivalent to iolock == XFS_IOLOCK_EXCL, I think
> it's a good idea to throw in a patch to remove the need_i_mutex variable
> before this patch.
>
> Maybe that patch can even add xfs_rw_lock or similar helpers to
> manipulate the i_mutex, the iolock, and the iolock variable together?
That sounds like a fine idea.
> Speaking of that, shouldn't xfs_file_aio_read also take the iolock
> exclusive during the page invalidation and then demote it, just like
> the write case? The above helpers would enforce that nicely.
Probably, though it might be best to leave that to another cleanup
series. I'll see how much perturbation of the read path it makes....
> > + if ((pos & target->bt_smask) || (count & target->bt_smask))
> > + return XFS_ERROR(-EINVAL);
>
> For the error traps to work this needs to be
>
> -XFS_ERROR(EINVAL);
Ok, will fix. I just copied the existing code.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4/7] xfs: split direct IO write path from xfs_file_aio_write
2010-12-17 7:31 ` Dave Chinner
@ 2010-12-20 11:29 ` Christoph Hellwig
2010-12-21 0:51 ` Dave Chinner
0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2010-12-20 11:29 UTC (permalink / raw)
To: Dave Chinner; +Cc: Christoph Hellwig, xfs
On Fri, Dec 17, 2010 at 06:31:25PM +1100, Dave Chinner wrote:
> > Speaking of that, shouldn't xfs_file_aio_read also take the iolock
> > exclusive during the page invalidation and then demote it, just like
> > the write case? The above helpers would enforce that nicely.
>
> Probably, though it might be best to leave that to another cleanup
> series. I'll see how much perturbation of the read path it makes....
Yes, it should be a separate patch for sure. If you prefer another
series that's fine with me, too.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4/7] xfs: split direct IO write path from xfs_file_aio_write
2010-12-20 11:29 ` Christoph Hellwig
@ 2010-12-21 0:51 ` Dave Chinner
0 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2010-12-21 0:51 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Mon, Dec 20, 2010 at 06:29:47AM -0500, Christoph Hellwig wrote:
> On Fri, Dec 17, 2010 at 06:31:25PM +1100, Dave Chinner wrote:
> > > Speaking of that, shouldn't xfs_file_aio_read also take the iolock
> > > exclusive during the page invalidation and then demote it, just like
> > > the write case? The above helpers would enforce that nicely.
> >
> > Probably, though it might be best to leave that to another cleanup
> > series. I'll see how much perturbation of the read path it makes....
>
> Yes, it should be a separate patch for sure. If you prefer another
> series that's fine with me, too.
Turns out to be pretty trivial to do - I included it in the
new xfs_rw_ilock conversion patch for the moment.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 5/7] xfs: split buffered IO write path from xfs_file_aio_write
2010-12-15 1:23 [PATCH 0/7] xfs: serialise unaligned direct IOs Dave Chinner
` (2 preceding siblings ...)
2010-12-15 1:23 ` [PATCH 4/7] xfs: split direct IO write path from xfs_file_aio_write Dave Chinner
@ 2010-12-15 1:23 ` Dave Chinner
2010-12-15 1:23 ` [PATCH 6/7] xfs: factor common write setup code Dave Chinner
2010-12-15 1:23 ` [PATCH 7/7] xfs: serialise unaligned direct IOs Dave Chinner
5 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2010-12-15 1:23 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Complete the split of the different write IO paths by splitting the
buffered IO write path out of xfs_file_aio_write(). This makes the
different mechanisms of the write patchs easier to follow.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/linux-2.6/xfs_file.c | 160 ++++++++++++++++++++-----------------------
1 files changed, 75 insertions(+), 85 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
index d70a249..e10e703 100644
--- a/fs/xfs/linux-2.6/xfs_file.c
+++ b/fs/xfs/linux-2.6/xfs_file.c
@@ -698,132 +698,123 @@ xfs_file_dio_aio_write(
}
STATIC ssize_t
-xfs_file_aio_write(
+xfs_file_buffered_aio_write(
struct kiocb *iocb,
const struct iovec *iovp,
unsigned long nr_segs,
- loff_t pos)
+ loff_t pos,
+ size_t ocount,
+ int *need_i_mutex,
+ int *iolock)
{
struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
struct inode *inode = mapping->host;
struct xfs_inode *ip = XFS_I(inode);
- struct xfs_mount *mp = ip->i_mount;
ssize_t ret = 0, error = 0;
- int ioflags = 0;
+ int enospc = 0;
xfs_fsize_t new_size;
- int iolock;
- size_t ocount = 0, count;
- int need_i_mutex;
-
- XFS_STATS_INC(xs_write_calls);
-
- BUG_ON(iocb->ki_pos != pos);
-
- if (unlikely(file->f_flags & O_DIRECT))
- ioflags |= IO_ISDIRECT;
- if (file->f_mode & FMODE_NOCMTIME)
- ioflags |= IO_INVIS;
-
- error = generic_segment_checks(iovp, &nr_segs, &ocount, VERIFY_READ);
- if (error)
- return error;
-
- count = ocount;
- if (count == 0)
- return 0;
-
- xfs_wait_for_freeze(mp, SB_FREEZE_WRITE);
-
- if (XFS_FORCED_SHUTDOWN(mp))
- return -EIO;
-
- if (ioflags & IO_ISDIRECT) {
- ret = xfs_file_dio_aio_write(iocb, iovp, nr_segs, pos,
- ocount, &need_i_mutex, &iolock);
- goto done_io;
- }
+ size_t count = ocount;
- iolock = XFS_IOLOCK_EXCL;
- need_i_mutex = 1;
+ *iolock = XFS_IOLOCK_EXCL;
+ *need_i_mutex = 1;
mutex_lock(&inode->i_mutex);
- xfs_ilock(ip, XFS_ILOCK_EXCL|iolock);
+ xfs_ilock(ip, XFS_ILOCK_EXCL|*iolock);
- error = -generic_write_checks(file, &pos, &count,
+ error = generic_write_checks(file, &pos, &count,
S_ISBLK(inode->i_mode));
if (error) {
- xfs_iunlock(ip, XFS_ILOCK_EXCL|iolock);
- goto out_unlock_mutex;
+ xfs_iunlock(ip, XFS_ILOCK_EXCL|*iolock);
+ *iolock = 0;
+ return error;
}
new_size = pos + count;
if (new_size > ip->i_size)
ip->i_new_size = new_size;
- if (likely(!(ioflags & IO_INVIS)))
+ if (likely(!(file->f_mode & FMODE_NOCMTIME)))
file_update_time(file);
- /*
- * If the offset is beyond the size of the file, we have a couple
- * of things to do. First, if there is already space allocated
- * we need to either create holes or zero the disk or ...
- *
- * If there is a page where the previous size lands, we need
- * to zero it out up to the new size.
- */
-
if (pos > ip->i_size) {
- error = xfs_zero_eof(ip, pos, ip->i_size);
+ error = -xfs_zero_eof(ip, pos, ip->i_size);
if (error) {
xfs_iunlock(ip, XFS_ILOCK_EXCL);
- goto out_unlock_internal;
+ return error;
}
}
xfs_iunlock(ip, XFS_ILOCK_EXCL);
- /*
- * If we're writing the file then make sure to clear the
- * setuid and setgid bits if the process is not being run
- * by root. This keeps people from modifying setuid and
- * setgid binaries.
- */
- error = -file_remove_suid(file);
+ error = file_remove_suid(file);
if (unlikely(error))
- goto out_unlock_internal;
+ return error;
/* We can write back this queue in page reclaim */
current->backing_dev_info = mapping->backing_dev_info;
- if (!(ioflags & IO_ISDIRECT)) {
- int enospc = 0;
- ssize_t ret2 = 0;
-
write_retry:
- trace_xfs_file_buffered_write(ip, count, iocb->ki_pos, ioflags);
- ret2 = generic_file_buffered_write(iocb, iovp, nr_segs,
- pos, &iocb->ki_pos, count, ret);
- /*
- * if we just got an ENOSPC, flush the inode now we
- * aren't holding any page locks and retry *once*
- */
- if (ret2 == -ENOSPC && !enospc) {
- error = xfs_flush_pages(ip, 0, -1, 0, FI_NONE);
- if (error)
- goto out_unlock_internal;
- enospc = 1;
- goto write_retry;
- }
- ret = ret2;
+ trace_xfs_file_buffered_write(ip, count, iocb->ki_pos, 0);
+ error = generic_file_buffered_write(iocb, iovp, nr_segs,
+ pos, &iocb->ki_pos, count, ret);
+ /*
+ * if we just got an ENOSPC, flush the inode now we aren't holding any
+ * page locks and retry *once*
+ */
+ if (error == -ENOSPC && !enospc) {
+ error = -xfs_flush_pages(ip, 0, -1, 0, FI_NONE);
+ if (error)
+ return error;
+ enospc = 1;
+ goto write_retry;
}
-
current->backing_dev_info = NULL;
+ return error;
+}
+
+STATIC ssize_t
+xfs_file_aio_write(
+ struct kiocb *iocb,
+ const struct iovec *iovp,
+ unsigned long nr_segs,
+ loff_t pos)
+{
+ struct file *file = iocb->ki_filp;
+ struct address_space *mapping = file->f_mapping;
+ struct inode *inode = mapping->host;
+ struct xfs_inode *ip = XFS_I(inode);
+ ssize_t ret = 0, error = 0;
+ int iolock;
+ size_t ocount = 0;
+ int need_i_mutex;
+
+ XFS_STATS_INC(xs_write_calls);
+
+ BUG_ON(iocb->ki_pos != pos);
+
+ error = generic_segment_checks(iovp, &nr_segs, &ocount, VERIFY_READ);
+ if (error)
+ return error;
+
+ if (ocount == 0)
+ return 0;
+
+ xfs_wait_for_freeze(ip->i_mount, SB_FREEZE_WRITE);
+
+ if (XFS_FORCED_SHUTDOWN(ip->i_mount))
+ return -EIO;
+
+ if (unlikely(file->f_flags & O_DIRECT))
+ ret = xfs_file_dio_aio_write(iocb, iovp, nr_segs, pos,
+ ocount, &need_i_mutex, &iolock);
+ else
+ ret = xfs_file_buffered_aio_write(iocb, iovp, nr_segs, pos,
+ ocount, &need_i_mutex, &iolock);
-done_io:
xfs_aio_write_isize_update(inode, &iocb->ki_pos, ret);
error = -ret;
if (ret <= 0)
- goto out_unlock_internal;
+ goto out_unlock;
XFS_STATS_ADD(xs_write_bytes, ret);
@@ -849,10 +840,9 @@ done_io:
error = error2;
}
- out_unlock_internal:
+ out_unlock:
xfs_aio_write_newsize_update(ip);
xfs_iunlock(ip, iolock);
- out_unlock_mutex:
if (need_i_mutex)
mutex_unlock(&inode->i_mutex);
return -error;
--
1.7.2.3
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 15+ messages in thread* [PATCH 6/7] xfs: factor common write setup code
2010-12-15 1:23 [PATCH 0/7] xfs: serialise unaligned direct IOs Dave Chinner
` (3 preceding siblings ...)
2010-12-15 1:23 ` [PATCH 5/7] xfs: split buffered " Dave Chinner
@ 2010-12-15 1:23 ` Dave Chinner
2010-12-15 1:23 ` [PATCH 7/7] xfs: serialise unaligned direct IOs Dave Chinner
5 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2010-12-15 1:23 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The buffered IO and direct Io write paths share a common set of
checks and limiting code prior to issuing the write. Factor that
into a common helper function.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/linux-2.6/xfs_file.c | 130 ++++++++++++++++++++-----------------------
1 files changed, 61 insertions(+), 69 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
index e10e703..269da12 100644
--- a/fs/xfs/linux-2.6/xfs_file.c
+++ b/fs/xfs/linux-2.6/xfs_file.c
@@ -588,6 +588,63 @@ out_lock:
}
/*
+ * Common pre-write limit and setup checks.
+ *
+ * Returns with iolock held according to @iolock.
+ */
+STATIC ssize_t
+xfs_file_aio_write_checks(
+ struct file *file,
+ loff_t *pos,
+ size_t *count,
+ int *iolock)
+{
+ struct inode *inode = file->f_mapping->host;
+ struct xfs_inode *ip = XFS_I(inode);
+ xfs_fsize_t new_size;
+ int error;
+
+ xfs_ilock(ip, XFS_ILOCK_EXCL|*iolock);
+
+ error = generic_write_checks(file, pos, count, S_ISBLK(inode->i_mode));
+ if (error) {
+ xfs_iunlock(ip, XFS_ILOCK_EXCL|*iolock);
+ *iolock = 0;
+ return error;
+ }
+
+ new_size = *pos + *count;
+ if (new_size > ip->i_size)
+ ip->i_new_size = new_size;
+
+ if (likely(!(file->f_mode & FMODE_NOCMTIME)))
+ file_update_time(file);
+
+ /*
+ * If the offset is beyond the size of the file, we need to zero any
+ * blocks that fall between the existing EOF and the start of this
+ * write.
+ */
+ if (*pos > ip->i_size) {
+ error = -xfs_zero_eof(ip, *pos, ip->i_size);
+ if (error) {
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+ return error;
+ }
+ }
+ xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
+ /*
+ * If we're writing the file then make sure to clear the setuid and
+ * setgid bits if the process is not being run by root. This keeps
+ * people from modifying setuid and setgid binaries.
+ */
+ error = file_remove_suid(file);
+ return error;
+
+}
+
+/*
* xfs_file_dio_aio_write - handle direct IO writes
*
* Lock the inode appropriately to prepare for and issue a direct IO write.
@@ -613,7 +670,6 @@ xfs_file_dio_aio_write(
struct xfs_inode *ip = XFS_I(inode);
struct xfs_mount *mp = ip->i_mount;
ssize_t error = 0;
- xfs_fsize_t new_size;
size_t count = ocount;
xfs_buftarg_t *target = XFS_IS_REALTIME_INODE(ip) ?
mp->m_rtdev_targp : mp->m_ddev_targp;
@@ -630,47 +686,9 @@ xfs_file_dio_aio_write(
} else {
*iolock = XFS_IOLOCK_SHARED;
}
- xfs_ilock(ip, XFS_ILOCK_EXCL|*iolock);
-
- error = generic_write_checks(file, &pos, &count,
- S_ISBLK(inode->i_mode));
- if (error) {
- xfs_iunlock(ip, XFS_ILOCK_EXCL|*iolock);
- *iolock = 0;
- return error;
- }
- new_size = pos + count;
- if (new_size > ip->i_size)
- ip->i_new_size = new_size;
-
- if (likely(!(file->f_mode & FMODE_NOCMTIME)))
- file_update_time(file);
-
- /*
- * If the offset is beyond the size of the file, we have a couple of
- * things to do. First, if there is already space allocated we need to
- * either create holes or zero the disk or ...
- *
- * If there is a page where the previous size lands, we need to zero it
- * out up to the new size.
- */
- if (pos > ip->i_size) {
- error = -xfs_zero_eof(ip, pos, ip->i_size);
- if (error) {
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
- return error;
- }
- }
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
-
- /*
- * If we're writing the file then make sure to clear the setuid and
- * setgid bits if the process is not being run by root. This keeps
- * people from modifying setuid and setgid binaries.
- */
- error = file_remove_suid(file);
- if (unlikely(error))
+ error = xfs_file_aio_write_checks(file, &pos, &count, iolock);
+ if (error)
return error;
if (mapping->nrpages) {
@@ -713,40 +731,14 @@ xfs_file_buffered_aio_write(
struct xfs_inode *ip = XFS_I(inode);
ssize_t ret = 0, error = 0;
int enospc = 0;
- xfs_fsize_t new_size;
size_t count = ocount;
*iolock = XFS_IOLOCK_EXCL;
*need_i_mutex = 1;
mutex_lock(&inode->i_mutex);
- xfs_ilock(ip, XFS_ILOCK_EXCL|*iolock);
-
- error = generic_write_checks(file, &pos, &count,
- S_ISBLK(inode->i_mode));
- if (error) {
- xfs_iunlock(ip, XFS_ILOCK_EXCL|*iolock);
- *iolock = 0;
- return error;
- }
-
- new_size = pos + count;
- if (new_size > ip->i_size)
- ip->i_new_size = new_size;
- if (likely(!(file->f_mode & FMODE_NOCMTIME)))
- file_update_time(file);
-
- if (pos > ip->i_size) {
- error = -xfs_zero_eof(ip, pos, ip->i_size);
- if (error) {
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
- return error;
- }
- }
- xfs_iunlock(ip, XFS_ILOCK_EXCL);
-
- error = file_remove_suid(file);
- if (unlikely(error))
+ error = xfs_file_aio_write_checks(file, &pos, &count, iolock);
+ if (error)
return error;
/* We can write back this queue in page reclaim */
--
1.7.2.3
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 15+ messages in thread* [PATCH 7/7] xfs: serialise unaligned direct IOs
2010-12-15 1:23 [PATCH 0/7] xfs: serialise unaligned direct IOs Dave Chinner
` (4 preceding siblings ...)
2010-12-15 1:23 ` [PATCH 6/7] xfs: factor common write setup code Dave Chinner
@ 2010-12-15 1:23 ` Dave Chinner
5 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2010-12-15 1:23 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
When two concurrent unaligned, non-overlapping direct IOs are issued
to the same block, the direct Io layer will race to zero the block.
The result is that one of the concurrent IOs will overwrite data
written by the other IO with zeros. This is demonstrated by the
xfsqa test 240.
To avoid this problem, serialise all unaligned direct IOs to an
inode with a big hammer. We need a big hammer approach as we need to
serialise AIO as well, so we can't just block writes on locks.
Hence, the big hammer is calling xfs_ioend_wait() while holding out
other unaligned direct IOs from starting.
We don't bother trying to serialised aligned vs unaligned IOs as
they are overlapping IO and the result of concurrent overlapping IOs
is undefined - the result of either IO is a valid result so we let
them race. Hence we only penalise unaligned IO, which already has a
major overhead compared to aligned IO so this isn't a major problem.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/linux-2.6/xfs_file.c | 29 ++++++++++++++++++++++++-----
1 files changed, 24 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_file.c b/fs/xfs/linux-2.6/xfs_file.c
index 269da12..68f88a5 100644
--- a/fs/xfs/linux-2.6/xfs_file.c
+++ b/fs/xfs/linux-2.6/xfs_file.c
@@ -648,10 +648,21 @@ xfs_file_aio_write_checks(
* xfs_file_dio_aio_write - handle direct IO writes
*
* Lock the inode appropriately to prepare for and issue a direct IO write.
- * By spearating it from the buffered write path we remove all the tricky to
+ * By separating it from the buffered write path we remove all the tricky to
* follow locking changes and looping. This also clearly indicates that XFS
* does not fall back to buffered IO in the direct IO write path.
*
+ * In most cases the direct IO writes will be done with IOLOCK_SHARED allowing
+ * them to be done in parallel with reads and other direct IO writes. However,
+ * if the IO is not aligned to filesystem blocks, the direct IO layer needs to
+ * do sub-block zeroing and that requires serialisation against other direct
+ * IOs to the same block. In this case we need to serialise the submission of
+ * the unaligned IOs so that we don't get racing block zeroing in the dio layer.
+ * To avoid the problem with aio, we also need to wait for outstanding IOs to
+ * complete so that unwritten extent conversion is completed before we try to
+ * map the overlapping block. This is currently implemented by hitting it
+ * with a big hammer (i.e. xfs_ioend_wait()).
+ *
* Returns with locks held indicated by @need_i_mutex and @iolock.
*/
STATIC ssize_t
@@ -671,6 +682,7 @@ xfs_file_dio_aio_write(
struct xfs_mount *mp = ip->i_mount;
ssize_t error = 0;
size_t count = ocount;
+ int unaligned_io = 0;
xfs_buftarg_t *target = XFS_IS_REALTIME_INODE(ip) ?
mp->m_rtdev_targp : mp->m_ddev_targp;
@@ -679,10 +691,15 @@ xfs_file_dio_aio_write(
if ((pos & target->bt_smask) || (count & target->bt_smask))
return XFS_ERROR(-EINVAL);
- if (mapping->nrpages || pos > ip->i_size) {
+ if ((pos & mp->m_blockmask) || ((pos + count) & mp->m_blockmask))
+ unaligned_io = 1;
+
+ if (unaligned_io || mapping->nrpages || pos > ip->i_size) {
*iolock = XFS_IOLOCK_EXCL;
*need_i_mutex = 1;
mutex_lock(&inode->i_mutex);
+ if (unaligned_io)
+ xfs_ioend_wait(ip);
} else {
*iolock = XFS_IOLOCK_SHARED;
}
@@ -700,10 +717,12 @@ xfs_file_dio_aio_write(
}
if (*need_i_mutex) {
- /* demote the lock now the cached pages are gone */
- xfs_ilock_demote(ip, XFS_IOLOCK_EXCL);
+ /* demote the lock now the cached pages are gone if we can */
+ if (!unaligned_io) {
+ xfs_ilock_demote(ip, XFS_IOLOCK_EXCL);
+ *iolock = XFS_IOLOCK_SHARED;
+ }
mutex_unlock(&inode->i_mutex);
- *iolock = XFS_IOLOCK_SHARED;
*need_i_mutex = 0;
}
--
1.7.2.3
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 15+ messages in thread