From: Christoph Hellwig <hch@infradead.org>
To: xfs@oss.sgi.com
Cc: Paul Anderson <pha@umich.edu>, Sean Thomas Caron <scaron@umich.edu>
Subject: [PATCH 2/2 v2] xfs: log all dirty inodes in xfs_fs_sync_fs
Date: Tue, 20 Dec 2011 15:08:41 -0500 [thread overview]
Message-ID: <20111220200841.GA2788@infradead.org> (raw)
In-Reply-To: <20111218155015.GC17626@infradead.org>
Since Linux 2.6.36 the writeback code has introduces various measures for
live lock prevention during sync(). Unfortunately some of these are
actively harmful for the XFS model, where the inode gets marked dirty for
metadata from the data I/O handler.
The older_than_this checks that are now more strictly enforced since
writeback: avoid livelocking WB_SYNC_ALL writeback
by only calling into __writeback_inodes_sb and thus only sampling the
current cut off time once. But on a slow enough devices the previous
asynchronous sync pass might not have fully completed yet, and thus XFS
might mark metadata dirty only after that sampling of the cut off time for
the blocking pass already happened. I have not myself reproduced this
myself on a real system, but by introducing artificial delay into the
XFS I/O completion workqueues it can be reproduced easily.
Fix this by iterating over all XFS inodes in ->sync_fs and log all that
are dirty. This might log inode that only got redirtied after the
previous pass, but given how cheap delayed logging of inodes is it
isn't a major concern for performance.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Index: xfs/fs/xfs/xfs_sync.c
===================================================================
--- xfs.orig/fs/xfs/xfs_sync.c 2011-12-18 23:35:30.909834592 +0100
+++ xfs/fs/xfs/xfs_sync.c 2011-12-18 23:36:40.823167508 +0100
@@ -336,6 +336,32 @@ xfs_sync_fsdata(
return error;
}
+int
+xfs_log_dirty_inode(
+ struct xfs_inode *ip,
+ struct xfs_perag *pag,
+ int flags)
+{
+ struct xfs_mount *mp = ip->i_mount;
+ struct xfs_trans *tp;
+ int error;
+
+ if (!ip->i_update_core)
+ return 0;
+
+ tp = xfs_trans_alloc(mp, XFS_TRANS_FSYNC_TS);
+ error = xfs_trans_reserve(tp, 0, XFS_FSYNC_TS_LOG_RES(mp), 0, 0, 0);
+ if (error) {
+ xfs_trans_cancel(tp, 0);
+ return error;
+ }
+
+ xfs_ilock(ip, XFS_ILOCK_EXCL);
+ xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+ xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+ return xfs_trans_commit(tp, 0);
+}
+
/*
* When remounting a filesystem read-only or freezing the filesystem, we have
* two phases to execute. This first phase is syncing the data before we
@@ -359,6 +385,16 @@ xfs_quiesce_data(
{
int error, error2 = 0;
+ /*
+ * Log all pending size and timestamp updates. The vfs writeback
+ * code is supposed to do this, but due to its overagressive
+ * livelock detection it will skip inodes where appending writes
+ * were written out in the first non-blocking sync phase if their
+ * completion took long enough that it happened after taking the
+ * timestamp for the cut-off in the blocking phase.
+ */
+ xfs_inode_ag_iterator(mp, xfs_log_dirty_inode, 0);
+
xfs_qm_sync(mp, SYNC_TRYLOCK);
xfs_qm_sync(mp, SYNC_WAIT);
Index: xfs/fs/xfs/xfs_sync.h
===================================================================
--- xfs.orig/fs/xfs/xfs_sync.h 2011-12-18 16:47:25.000000000 +0100
+++ xfs/fs/xfs/xfs_sync.h 2011-12-18 23:36:45.119834149 +0100
@@ -34,6 +34,8 @@ void xfs_quiesce_attr(struct xfs_mount *
void xfs_flush_inodes(struct xfs_inode *ip);
+int xfs_log_dirty_inode(struct xfs_inode *ip, struct xfs_perag *pag, int flags);
+
int xfs_reclaim_inodes(struct xfs_mount *mp, int mode);
int xfs_reclaim_inodes_count(struct xfs_mount *mp);
void xfs_reclaim_inodes_nr(struct xfs_mount *mp, int nr_to_scan);
Index: xfs/fs/xfs/xfs_super.c
===================================================================
--- xfs.orig/fs/xfs/xfs_super.c 2011-12-18 23:35:42.863167854 +0100
+++ xfs/fs/xfs/xfs_super.c 2011-12-18 23:36:42.963167495 +0100
@@ -869,27 +869,6 @@ xfs_fs_dirty_inode(
}
STATIC int
-xfs_log_inode(
- struct xfs_inode *ip)
-{
- struct xfs_mount *mp = ip->i_mount;
- struct xfs_trans *tp;
- int error;
-
- tp = xfs_trans_alloc(mp, XFS_TRANS_FSYNC_TS);
- error = xfs_trans_reserve(tp, 0, XFS_FSYNC_TS_LOG_RES(mp), 0, 0, 0);
- if (error) {
- xfs_trans_cancel(tp, 0);
- return error;
- }
-
- xfs_ilock(ip, XFS_ILOCK_EXCL);
- xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
- xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
- return xfs_trans_commit(tp, 0);
-}
-
-STATIC int
xfs_fs_write_inode(
struct inode *inode,
struct writeback_control *wbc)
@@ -902,8 +881,6 @@ xfs_fs_write_inode(
if (XFS_FORCED_SHUTDOWN(mp))
return -XFS_ERROR(EIO);
- if (!ip->i_update_core)
- return 0;
if (wbc->sync_mode == WB_SYNC_ALL || wbc->for_kupdate) {
/*
@@ -913,11 +890,14 @@ xfs_fs_write_inode(
* ->sync_fs call do that for thus, which reduces the number
* of synchronous log forces dramatically.
*/
- error = xfs_log_inode(ip);
+ error = xfs_log_dirty_inode(ip, NULL, 0);
if (error)
goto out;
return 0;
} else {
+ if (!ip->i_update_core)
+ return 0;
+
/*
* We make this non-blocking if the inode is contended, return
* EAGAIN to indicate to the caller that they did not succeed.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2011-12-20 20:08 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20111218154936.GA17626@infradead.org>
2011-12-18 15:49 ` [PATCH 1/1] xfs: log the inode in ->write_inode calls for kupdate Christoph Hellwig
2011-12-20 21:19 ` Dave Chinner
2011-12-23 15:55 ` Mark Tinguely
2011-12-23 17:58 ` Ben Myers
2011-12-28 21:35 ` Hans-Peter Jansen
2011-12-28 21:39 ` Christoph Hellwig
2011-12-18 15:50 ` [PATCH 2/2] xfs: log all dirty inodes in xfs_fs_sync_fs Christoph Hellwig
2011-12-18 20:03 ` Sean Thomas Caron
2011-12-18 20:09 ` Christoph Hellwig
2011-12-18 22:17 ` Dave Chinner
2011-12-18 22:32 ` Christoph Hellwig
2011-12-20 20:08 ` Christoph Hellwig [this message]
2011-12-20 21:21 ` [PATCH 2/2 v2] " Dave Chinner
2011-12-23 15:55 ` Mark Tinguely
2011-12-23 21:47 ` Ben Myers
2011-12-26 12:13 ` Dave Chinner
2011-12-29 15:42 ` Ben Myers
2011-12-29 21:44 ` Dave Chinner
2012-01-03 15:48 ` Mark Tinguely
2011-12-21 17:40 ` sync fixes Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20111220200841.GA2788@infradead.org \
--to=hch@infradead.org \
--cc=pha@umich.edu \
--cc=scaron@umich.edu \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox