* [046/104] xfs: dont serialise direct IO reads on page cache checks
[not found] <20111207161246.GA10995@kroah.com>
@ 2011-12-07 16:11 ` Greg KH
2011-12-07 16:11 ` [047/104] xfs: avoid direct I/O write vs buffered I/O race Greg KH
` (4 subsequent siblings)
5 siblings, 0 replies; 6+ messages in thread
From: Greg KH @ 2011-12-07 16:11 UTC (permalink / raw)
To: linux-kernel, stable, greg
Cc: Alex Elder, xfs, bpm, Dave Chinner, akpm, torvalds, alan
3.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Chinner <dchinner@redhat.com>
commit 0c38a2512df272b14ef4238b476a2e4f70da1479 upstream.
There is no need to grab the i_mutex or the IO lock in exclusive
mode if we don't need to invalidate the page cache. Taking these
locks on every direct IO effectively serialises them, as taking the IO
lock in exclusive mode has to wait for all shared holders to drop
the lock. That only happens when IO is complete, so effectively it
prevents dispatch of concurrent direct IO reads to the same inode.
Fix this by taking the IO lock shared to check the page cache state,
and only then drop it and take the IO lock exclusively if there is
work to be done. Hence for the normal direct IO case, no exclusive
locking will occur.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Joern Engel <joern@logfs.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Cc: Ben Myers <bpm@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
fs/xfs/xfs_file.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -317,7 +317,19 @@ xfs_file_aio_read(
if (XFS_FORCED_SHUTDOWN(mp))
return -EIO;
- if (unlikely(ioflags & IO_ISDIRECT)) {
+ /*
+ * Locking is a bit tricky here. If we take an exclusive lock
+ * for direct IO, we effectively serialise all new concurrent
+ * read IO to this file and block it behind IO that is currently in
+ * progress because IO in progress holds the IO lock shared. We only
+ * need to hold the lock exclusive to blow away the page cache, so
+ * only take lock exclusively if the page cache needs invalidation.
+ * This allows the normal direct IO case of no page cache pages to
+ * proceeed concurrently without serialisation.
+ */
+ xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);
+ if ((ioflags & IO_ISDIRECT) && inode->i_mapping->nrpages) {
+ xfs_rw_iunlock(ip, XFS_IOLOCK_SHARED);
xfs_rw_ilock(ip, XFS_IOLOCK_EXCL);
if (inode->i_mapping->nrpages) {
@@ -330,8 +342,7 @@ xfs_file_aio_read(
}
}
xfs_rw_ilock_demote(ip, XFS_IOLOCK_EXCL);
- } else
- xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);
+ }
trace_xfs_file_read(ip, size, iocb->ki_pos, ioflags);
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* [047/104] xfs: avoid direct I/O write vs buffered I/O race
[not found] <20111207161246.GA10995@kroah.com>
2011-12-07 16:11 ` [046/104] xfs: dont serialise direct IO reads on page cache checks Greg KH
@ 2011-12-07 16:11 ` Greg KH
2011-12-07 16:11 ` [048/104] xfs: Return -EIO when xfs_vn_getattr() failed Greg KH
` (3 subsequent siblings)
5 siblings, 0 replies; 6+ messages in thread
From: Greg KH @ 2011-12-07 16:11 UTC (permalink / raw)
To: linux-kernel, stable, greg
Cc: xfs, Christoph Hellwig, bpm, Alex Elder, akpm, torvalds,
Christoph Hellwig, alan
3.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig <hch@infradead.org>
commit c58cb165bd44de8aaee9755a144136ae743be116 upstream.
Currently a buffered reader or writer can add pages to the pagecache
while we are waiting for the iolock in xfs_file_dio_aio_write. Prevent
this by re-checking mapping->nrpages after we get the iolock and, if
necessary, upgrading the lock to exclusive mode. To simplify this a bit,
only take the ilock inside of xfs_file_aio_write_checks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Cc: Ben Myers <bpm@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
fs/xfs/xfs_file.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -677,6 +677,7 @@ xfs_file_aio_write_checks(
xfs_fsize_t new_size;
int error = 0;
+ xfs_rw_ilock(ip, XFS_ILOCK_EXCL);
error = generic_write_checks(file, pos, count, S_ISBLK(inode->i_mode));
if (error) {
xfs_rw_iunlock(ip, XFS_ILOCK_EXCL | *iolock);
@@ -768,14 +769,24 @@ xfs_file_dio_aio_write(
*iolock = XFS_IOLOCK_EXCL;
else
*iolock = XFS_IOLOCK_SHARED;
- xfs_rw_ilock(ip, XFS_ILOCK_EXCL | *iolock);
+ xfs_rw_ilock(ip, *iolock);
ret = xfs_file_aio_write_checks(file, &pos, &count, iolock);
if (ret)
return ret;
+ /*
+ * Recheck if there are cached pages that need invalidate after we got
+ * the iolock to protect against other threads adding new pages while
+ * we were waiting for the iolock.
+ */
+ if (mapping->nrpages && *iolock == XFS_IOLOCK_SHARED) {
+ xfs_rw_iunlock(ip, *iolock);
+ *iolock = XFS_IOLOCK_EXCL;
+ xfs_rw_ilock(ip, *iolock);
+ }
+
if (mapping->nrpages) {
- WARN_ON(*iolock != XFS_IOLOCK_EXCL);
ret = -xfs_flushinval_pages(ip, (pos & PAGE_CACHE_MASK), -1,
FI_REMAPF_LOCKED);
if (ret)
@@ -820,7 +831,7 @@ xfs_file_buffered_aio_write(
size_t count = ocount;
*iolock = XFS_IOLOCK_EXCL;
- xfs_rw_ilock(ip, XFS_ILOCK_EXCL | *iolock);
+ xfs_rw_ilock(ip, *iolock);
ret = xfs_file_aio_write_checks(file, &pos, &count, iolock);
if (ret)
* [048/104] xfs: Return -EIO when xfs_vn_getattr() failed
[not found] <20111207161246.GA10995@kroah.com>
2011-12-07 16:11 ` [046/104] xfs: dont serialise direct IO reads on page cache checks Greg KH
2011-12-07 16:11 ` [047/104] xfs: avoid direct I/O write vs buffered I/O race Greg KH
@ 2011-12-07 16:11 ` Greg KH
2011-12-07 16:11 ` [049/104] xfs: fix buffer flushing during unmount Greg KH
` (2 subsequent siblings)
5 siblings, 0 replies; 6+ messages in thread
From: Greg KH @ 2011-12-07 16:11 UTC (permalink / raw)
To: linux-kernel, stable, greg
Cc: Mitsuo Hayasaka, xfs, bpm, Alex Elder, akpm, torvalds, alan
3.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
commit ed32201e65e15f3e6955cb84cbb544b08f81e5a5 upstream.
An inode's attributes can be fetched via xfs_vn_getattr() in XFS.
Currently it returns EIO, a positive value, when it fails. As a
result, the system call returns a non-negative value even though an
error occurred. The stat(2) system call and the ls and mv commands
cannot handle this error and do not work correctly.
This patch fixes the bug by returning -EIO instead of EIO when an
error is detected in xfs_vn_getattr().
Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Cc: Ben Myers <bpm@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
fs/xfs/xfs_iops.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -465,7 +465,7 @@ xfs_vn_getattr(
trace_xfs_getattr(ip);
if (XFS_FORCED_SHUTDOWN(mp))
- return XFS_ERROR(EIO);
+ return -XFS_ERROR(EIO);
stat->size = XFS_ISIZE(ip);
stat->dev = inode->i_sb->s_dev;
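
To illustrate why the sign matters, here is a small userspace sketch. The function names are hypothetical and the caller is simplified to a single check; the only assumption is that the caller treats negative return values as errors.

```c
#include <errno.h>

/* Hypothetical caller-side check: only negative values count as errors. */
static int demo_is_error(int ret)
{
	return ret < 0;
}

/* Before the fix: a positive errno slips past the caller's check. */
static int demo_getattr_buggy(int shut_down)
{
	if (shut_down)
		return EIO;
	return 0;
}

/* After the fix: the negated errno is recognised as a failure. */
static int demo_getattr_fixed(int shut_down)
{
	if (shut_down)
		return -EIO;
	return 0;
}
```

In this sketch demo_is_error() does not flag the buggy variant's return value, which mirrors the symptom described in the commit message.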
* [049/104] xfs: fix buffer flushing during unmount
[not found] <20111207161246.GA10995@kroah.com>
` (2 preceding siblings ...)
2011-12-07 16:11 ` [048/104] xfs: Return -EIO when xfs_vn_getattr() failed Greg KH
@ 2011-12-07 16:11 ` Greg KH
2011-12-07 16:11 ` [050/104] xfs: Fix possible memory corruption in xfs_readlink Greg KH
2011-12-07 16:11 ` [051/104] xfs: use doalloc flag in xfs_qm_dqattach_one() Greg KH
5 siblings, 0 replies; 6+ messages in thread
From: Greg KH @ 2011-12-07 16:11 UTC (permalink / raw)
To: linux-kernel, stable, greg
Cc: xfs, Christoph Hellwig, bpm, Alex Elder, akpm, torvalds,
Christoph Hellwig, alan
3.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Hellwig <hch@infradead.org>
commit 87c7bec7fc3377b3873eb3a0f4b603981ea16ebb upstream.
The code to flush buffers in the umount code is a bit iffy: we first
flush all delwri buffers out, but then might be able to queue up a
new one when logging the sb counts. On a normal shutdown that one
would get flushed out when doing the synchronous superblock write in
xfs_unmountfs_writesb, but we skip that one if the filesystem has
been shut down.
Fix this by moving the delwri list flushing to just before unmounting
the log, and while we're at it also remove the superfluous delwri list
and buffer lru flushing for the rt and log device that can never have
cached or delwri buffers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Amit Sahrawat <amit.sahrawat83@gmail.com>
Tested-by: Amit Sahrawat <amit.sahrawat83@gmail.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Cc: Ben Myers <bpm@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
fs/xfs/xfs_buf.h | 1 -
fs/xfs/xfs_mount.c | 29 ++++++++++-------------------
2 files changed, 10 insertions(+), 20 deletions(-)
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -320,7 +320,6 @@ extern struct list_head *xfs_get_buftarg
#define xfs_getsize_buftarg(buftarg) block_size((buftarg)->bt_bdev)
#define xfs_readonly_buftarg(buftarg) bdev_read_only((buftarg)->bt_bdev)
-#define xfs_binval(buftarg) xfs_flush_buftarg(buftarg, 1)
#define XFS_bflush(buftarg) xfs_flush_buftarg(buftarg, 1)
#endif /* __XFS_BUF_H__ */
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -44,9 +44,6 @@
#include "xfs_trace.h"
-STATIC void xfs_unmountfs_wait(xfs_mount_t *);
-
-
#ifdef HAVE_PERCPU_SB
STATIC void xfs_icsb_balance_counter(xfs_mount_t *, xfs_sb_field_t,
int);
@@ -1496,11 +1493,6 @@ xfs_unmountfs(
*/
xfs_log_force(mp, XFS_LOG_SYNC);
- xfs_binval(mp->m_ddev_targp);
- if (mp->m_rtdev_targp) {
- xfs_binval(mp->m_rtdev_targp);
- }
-
/*
* Unreserve any blocks we have so that when we unmount we don't account
* the reserved free space as used. This is really only necessary for
@@ -1526,7 +1518,16 @@ xfs_unmountfs(
xfs_warn(mp, "Unable to update superblock counters. "
"Freespace may not be correct on next mount.");
xfs_unmountfs_writesb(mp);
- xfs_unmountfs_wait(mp); /* wait for async bufs */
+
+ /*
+ * Make sure all buffers have been flushed and completed before
+ * unmounting the log.
+ */
+ error = xfs_flush_buftarg(mp->m_ddev_targp, 1);
+ if (error)
+ xfs_warn(mp, "%d busy buffers during unmount.", error);
+ xfs_wait_buftarg(mp->m_ddev_targp);
+
xfs_log_unmount_write(mp);
xfs_log_unmount(mp);
xfs_uuid_unmount(mp);
@@ -1537,16 +1538,6 @@ xfs_unmountfs(
xfs_free_perag(mp);
}
-STATIC void
-xfs_unmountfs_wait(xfs_mount_t *mp)
-{
- if (mp->m_logdev_targp != mp->m_ddev_targp)
- xfs_wait_buftarg(mp->m_logdev_targp);
- if (mp->m_rtdev_targp)
- xfs_wait_buftarg(mp->m_rtdev_targp);
- xfs_wait_buftarg(mp->m_ddev_targp);
-}
-
int
xfs_fs_writable(xfs_mount_t *mp)
{
* [050/104] xfs: Fix possible memory corruption in xfs_readlink
[not found] <20111207161246.GA10995@kroah.com>
` (3 preceding siblings ...)
2011-12-07 16:11 ` [049/104] xfs: fix buffer flushing during unmount Greg KH
@ 2011-12-07 16:11 ` Greg KH
2011-12-07 16:11 ` [051/104] xfs: use doalloc flag in xfs_qm_dqattach_one() Greg KH
5 siblings, 0 replies; 6+ messages in thread
From: Greg KH @ 2011-12-07 16:11 UTC (permalink / raw)
To: linux-kernel, stable, greg
Cc: Carlos Maiolino, xfs, bpm, Alex Elder, akpm, torvalds, alan
3.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Carlos Maiolino <cmaiolino@redhat.com>
commit b52a360b2aa1c59ba9970fb0f52bbb093fcc7a24 upstream.
Fixes a possible memory corruption when the link is larger than
MAXPATHLEN and XFS_DEBUG is not enabled. This also removes the
S_ISLNK assert, since the inode mode is checked previously in
xfs_readlink_by_handle() and via VFS.
Updated to address concerns raised by Ben Hutchings about the loose
attention paid to 32- vs 64-bit values, and the lack of handling a
potentially negative pathlen value:
- Changed type of "pathlen" to be xfs_fsize_t, to match that of
ip->i_d.di_size
- Added checking for a negative pathlen to the too-long pathlen
test, and generalized the message that gets reported in that case
to reflect the change
As a result, if a negative pathlen were encountered, this function
would return EFSCORRUPTED (and would fail an assertion for a debug
build)--just as would a too-long pathlen.
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Ben Myers <bpm@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
fs/xfs/xfs_vnodeops.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -113,7 +113,7 @@ xfs_readlink(
char *link)
{
xfs_mount_t *mp = ip->i_mount;
- int pathlen;
+ xfs_fsize_t pathlen;
int error = 0;
trace_xfs_readlink(ip);
@@ -123,13 +123,19 @@ xfs_readlink(
xfs_ilock(ip, XFS_ILOCK_SHARED);
- ASSERT(S_ISLNK(ip->i_d.di_mode));
- ASSERT(ip->i_d.di_size <= MAXPATHLEN);
-
pathlen = ip->i_d.di_size;
if (!pathlen)
goto out;
+ if (pathlen < 0 || pathlen > MAXPATHLEN) {
+ xfs_alert(mp, "%s: inode (%llu) bad symlink length (%lld)",
+ __func__, (unsigned long long) ip->i_ino,
+ (long long) pathlen);
+ ASSERT(0);
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+
+
if (ip->i_df.if_flags & XFS_IFINLINE) {
memcpy(link, ip->i_df.if_u1.if_data, pathlen);
link[pathlen] = '\0';
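
The length validation above can be sketched in plain C as follows. This is a hedged illustration, not the kernel code: demo_readlink() and DEMO_MAXPATHLEN are hypothetical, int64_t stands in for xfs_fsize_t, and -EINVAL stands in for the EFSCORRUPTED error the patch returns.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAXPATHLEN 1024	/* assumed limit for this sketch */

/*
 * Copy an inline link target of length 'size' into 'link', which must
 * have room for DEMO_MAXPATHLEN + 1 bytes. The length is kept in a signed
 * 64-bit variable so negative and oversized values are both rejected
 * before any copy happens.
 */
static int demo_readlink(const char *data, int64_t size, char *link)
{
	int64_t pathlen = size;

	if (!pathlen)
		return 0;
	if (pathlen < 0 || pathlen > DEMO_MAXPATHLEN)
		return -EINVAL;	/* stand-in for EFSCORRUPTED */

	memcpy(link, data, (size_t)pathlen);
	link[pathlen] = '\0';
	return 0;
}
```

Doing the range check in the same 64-bit type as the on-disk size avoids relying on how an out-of-range value would behave after narrowing to int.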
* [051/104] xfs: use doalloc flag in xfs_qm_dqattach_one()
[not found] <20111207161246.GA10995@kroah.com>
` (4 preceding siblings ...)
2011-12-07 16:11 ` [050/104] xfs: Fix possible memory corruption in xfs_readlink Greg KH
@ 2011-12-07 16:11 ` Greg KH
5 siblings, 0 replies; 6+ messages in thread
From: Greg KH @ 2011-12-07 16:11 UTC (permalink / raw)
To: linux-kernel, stable, greg
Cc: Mitsuo Hayasaka, xfs, Christoph Hellwig, bpm, Alex Elder, akpm,
torvalds, alan
3.1-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
commit db3e74b582915d66e10b0c73a62763418f54c340 upstream.
The doalloc arg in xfs_qm_dqattach_one() is a flag that indicates
whether a new area to handle quota information will be allocated
if needed. Originally, it was passed to xfs_qm_dqget(), but has
been removed by the following commit (probably by mistake):
commit 8e9b6e7fa4544ea8a0e030c8987b918509c8ff47
Author: Christoph Hellwig <hch@lst.de>
Date: Sun Feb 8 21:51:42 2009 +0100
xfs: remove the unused XFS_QMOPT_DQLOCK flag
As a result, xfs_qm_dqget() called from xfs_qm_dqattach_one()
never allocates the new area even if it is needed.
This patch passes the doalloc arg to xfs_qm_dqget() in
xfs_qm_dqattach_one() to fix this problem.
Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com>
Cc: Alex Elder <aelder@sgi.com>
Cc: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
fs/xfs/xfs_qm.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -674,7 +674,8 @@ xfs_qm_dqattach_one(
* disk and we didn't ask it to allocate;
* ESRCH if quotas got turned off suddenly.
*/
- error = xfs_qm_dqget(ip->i_mount, ip, id, type, XFS_QMOPT_DOWARN, &dqp);
+ error = xfs_qm_dqget(ip->i_mount, ip, id, type,
+ doalloc | XFS_QMOPT_DOWARN, &dqp);
if (error)
return error;
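
A minimal sketch of the flag propagation problem, assuming a simplified lookup function. All names and flag values here are hypothetical, and -ENOENT is used as an assumed "not found and not allowed to allocate" result.

```c
#include <errno.h>

#define DEMO_OPT_DOALLOC 0x1u	/* hypothetical: allocate if missing */
#define DEMO_OPT_DOWARN  0x2u	/* hypothetical: warn on problems */

/* Simplified lookup: succeeds if the record exists or may be allocated. */
static int demo_dqget(unsigned int flags, int on_disk)
{
	if (on_disk)
		return 0;
	if (!(flags & DEMO_OPT_DOALLOC))
		return -ENOENT;
	return 0;	/* record would be allocated here */
}

/* Before the fix: the caller's doalloc request is silently dropped. */
static int demo_attach_buggy(unsigned int doalloc, int on_disk)
{
	(void)doalloc;
	return demo_dqget(DEMO_OPT_DOWARN, on_disk);
}

/* After the fix: doalloc is OR-ed into the flags passed down. */
static int demo_attach_fixed(unsigned int doalloc, int on_disk)
{
	return demo_dqget(doalloc | DEMO_OPT_DOWARN, on_disk);
}
```

The two variants differ only when the record is missing and allocation was requested, which is why a dropped flag like this can go unnoticed for a long time.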