From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com
Cc: Christoph Hellwig <hch@infradead.org>
Subject: [PATCH 8/9] xfs: log changed inodes instead of writing them synchronously
Date: Tue, 9 Feb 2010 14:56:41 +1100 [thread overview]
Message-ID: <1265687802-23043-9-git-send-email-david@fromorbit.com> (raw)
In-Reply-To: <1265687802-23043-1-git-send-email-david@fromorbit.com>
From: Christoph Hellwig <hch@infradead.org>
When an inode has already be flushed delayed write,
xfs_inode_clean() returns true and hence xfs_fs_write_inode() can
return on a synchronous inode write without having written the
inode. Currently these sycnhronous writes only come sync(1),
unmount, a sycnhronous NFS export and cachefiles so should be
relatively rare and out of common performance paths.
Realistically, a synchronous inode write is not necessary here; we
can avoid writing the inode by logging any non-transactional changes
that are pending. This needs to be done with synchronous
transactions, but it avoids seeking between the log and inode
clusters as we do now. We don't force the log if the inode is
pinned, though, so this differs from the fsync case. For normal
sys_sync and unmount behaviour this is fine because we do a
synchronous log force in xfs_sync_data which is called from the
->sync_fs code.
It does however break the NFS synchronous export guarantees for now,
but work is under way to fix this at a higher level or for the
higher level to provide an additional flag in the writeback control
to tell us that a log force is needed.
Portions of this patch are based on work from Dave Chinner.
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
fs/xfs/linux-2.6/xfs_super.c | 111 +++++++++++++++++++++++++++++++-----------
1 files changed, 82 insertions(+), 29 deletions(-)
diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index 3b5b46b..25ea240 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -1021,12 +1021,45 @@ xfs_fs_dirty_inode(
XFS_I(inode)->i_update_core = 1;
}
-/*
- * Attempt to flush the inode, this will actually fail
- * if the inode is pinned, but we dirty the inode again
- * at the point when it is unpinned after a log write,
- * since this is when the inode itself becomes flushable.
- */
+STATIC int
+xfs_log_inode(
+ struct xfs_inode *ip)
+{
+ struct xfs_mount *mp = ip->i_mount;
+ struct xfs_trans *tp;
+ int error;
+
+ xfs_iunlock(ip, XFS_ILOCK_SHARED);
+ tp = xfs_trans_alloc(mp, XFS_TRANS_FSYNC_TS);
+ error = xfs_trans_reserve(tp, 0, XFS_FSYNC_TS_LOG_RES(mp), 0, 0, 0);
+
+ if (error) {
+ xfs_trans_cancel(tp, 0);
+ /* we need to return with the lock hold shared */
+ xfs_ilock(ip, XFS_ILOCK_SHARED);
+ return error;
+ }
+
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+
+ /*
+ * Note - it's possible that we might have pushed ourselves out of the
+ * way during trans_reserve which would flush the inode. But there's
+ * no guarantee that the inode buffer has actually gone out yet (it's
+ * delwri). Plus the buffer could be pinned anyway if it's part of
+ * an inode in another recent transaction. So we play it safe and
+ * fire off the transaction anyway.
+ */
+ xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+ xfs_trans_ihold(tp, ip);
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+ xfs_trans_set_sync(tp);
+ error = xfs_trans_commit(tp, 0);
+ xfs_ilock_demote(ip, XFS_ILOCK_EXCL);
+
+ return error;
+}
+
STATIC int
xfs_fs_write_inode(
struct inode *inode,
@@ -1034,7 +1067,7 @@ xfs_fs_write_inode(
{
struct xfs_inode *ip = XFS_I(inode);
struct xfs_mount *mp = ip->i_mount;
- int error = 0;
+ int error = EAGAIN;
xfs_itrace_entry(ip);
@@ -1045,35 +1078,55 @@ xfs_fs_write_inode(
error = xfs_wait_on_pages(ip, 0, -1);
if (error)
goto out;
- }
-
- /*
- * Bypass inodes which have already been cleaned by
- * the inode flush clustering code inside xfs_iflush
- */
- if (xfs_inode_clean(ip))
- goto out;
- /*
- * We make this non-blocking if the inode is contended, return
- * EAGAIN to indicate to the caller that they did not succeed.
- * This prevents the flush path from blocking on inodes inside
- * another operation right now, they get caught later by xfs_sync.
- */
- if (sync) {
+ /*
+ * Make sure the inode has hit stable storage. By using the
+ * log and the fsync transactions we reduce the IOs we have
+ * to do here from two (log and inode) to just the log.
+ *
+ * Note: We still need to do a delwri write of the inode after
+ * this to flush it to the backing buffer so that bulkstat
+ * works properly if this is the first time the inode has been
+ * written. Because we hold the ilock atomically over the
+ * transaction commit and the inode flush we are guaranteed
+ * that the inode is not pinned when it returns. If the flush
+ * lock is already held, then the inode has already been
+ * flushed once and we don't need to flush it again. Hence
+ * the code will only flush the inode if it isn't already
+ * being flushed.
+ */
xfs_ilock(ip, XFS_ILOCK_SHARED);
- xfs_iflock(ip);
-
- error = xfs_iflush(ip, SYNC_WAIT);
+ if (ip->i_update_core) {
+ error = xfs_log_inode(ip);
+ if (error)
+ goto out_unlock;
+ }
} else {
- error = EAGAIN;
+ /*
+ * We make this non-blocking if the inode is contended, return
+ * EAGAIN to indicate to the caller that they did not succeed.
+ * This prevents the flush path from blocking on inodes inside
+ * another operation right now, they get caught later by xfs_sync.
+ */
if (!xfs_ilock_nowait(ip, XFS_ILOCK_SHARED))
goto out;
- if (xfs_ipincount(ip) || !xfs_iflock_nowait(ip))
- goto out_unlock;
+ }
+
+ if (xfs_ipincount(ip) || !xfs_iflock_nowait(ip))
+ goto out_unlock;
- error = xfs_iflush(ip, 0);
+ /*
+ * Now we have the flush lock and the inode is not pinned, we can check
+ * if the inode is really clean as we know that there are no pending
+ * transaction completions, it is not waiting on the delayed write
+ * queue and there is no IO in progress.
+ */
+ if (xfs_inode_clean(ip)) {
+ xfs_ifunlock(ip);
+ error = 0;
+ goto out_unlock;
}
+ error = xfs_iflush(ip, 0);
out_unlock:
xfs_iunlock(ip, XFS_ILOCK_SHARED);
--
1.6.5
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2010-02-09 3:55 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-02-09 3:56 [PATCH 0/9] Delayed write metadata writeback V5 Dave Chinner
2010-02-09 3:56 ` [PATCH 1/9] xfs: Make inode reclaim states explicit Dave Chinner
2010-02-09 3:56 ` [PATCH 2/9] xfs: Use delayed write for inodes rather than async V2 Dave Chinner
2010-02-09 3:56 ` [PATCH 3/9] xfs: Don't issue buffer IO direct from AIL push V2 Dave Chinner
2010-02-09 3:56 ` [PATCH 4/9] xfs: Sort delayed write buffers before dispatch Dave Chinner
2010-02-09 3:56 ` [PATCH 5/9] xfs: Use delay write promotion for dquot flushing Dave Chinner
2010-02-09 3:56 ` [PATCH 6/9] xfs: kill the unused XFS_QMOPT_* flush flags V2 Dave Chinner
2010-02-09 3:56 ` [PATCH 7/9] xfs: remove invalid barrier optimization from xfs_fsync Dave Chinner
2010-02-09 3:56 ` Dave Chinner [this message]
2010-02-09 3:56 ` [PATCH 9/9] xfs: kill xfs_bawrite Dave Chinner
2010-02-09 19:10 ` [PATCH 0/9] Delayed write metadata writeback V5 Alex Elder
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1265687802-23043-9-git-send-email-david@fromorbit.com \
--to=david@fromorbit.com \
--cc=hch@infradead.org \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox