* [PATCH V2] Ensure sync flushes all dirty data to disk]
@ 2007-10-09 7:34 Lachlan McIlroy
2007-10-09 8:54 ` David Chinner
From: Lachlan McIlroy @ 2007-10-09 7:34 UTC
To: xfs-dev, xfs-oss
[-- Attachment #1: Type: text/plain, Size: 565 bytes --]
[V2 adds a comment for dgc]
In xfs_fs_sync_super() treat a sync the same as a filesystem freeze.
This is needed to force the log to disk for inodes which are not marked
dirty in the Linux inode (the inodes are marked dirty on completion of
the log I/O) and so sync_inodes() will not flush them.
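A minimal sketch of the flag selection described above, comparing the old and new behaviour of xfs_fs_sync_super(). The SKETCH_* constants and sketch_* functions are hypothetical stand-ins invented for illustration; the real SYNC_* values are defined in the XFS headers and are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values, NOT the real XFS SYNC_* definitions. */
enum sketch_sync_flags {
    SKETCH_SYNC_FSDATA       = 1 << 0,
    SKETCH_SYNC_WAIT         = 1 << 1,
    SKETCH_SYNC_DATA_QUIESCE = 1 << 2,
};

/* Before the patch: only a frozen filesystem takes the quiesce path. */
int sketch_flags_before(bool wait, bool frozen)
{
    if (frozen)
        return SKETCH_SYNC_DATA_QUIESCE;
    return SKETCH_SYNC_FSDATA | (wait ? SKETCH_SYNC_WAIT : 0);
}

/* After the patch: a waiting sync is treated like a freeze. */
int sketch_flags_after(bool wait, bool frozen)
{
    if (wait || frozen)
        return SKETCH_SYNC_DATA_QUIESCE;
    return SKETCH_SYNC_FSDATA;
}

/* Shows the behavioural change for a waiting sync on an unfrozen fs. */
void sketch_check_waiting_sync(void)
{
    assert(sketch_flags_before(true, false) ==
           (SKETCH_SYNC_FSDATA | SKETCH_SYNC_WAIT));
    assert(sketch_flags_after(true, false) == SKETCH_SYNC_DATA_QUIESCE);
}
```

The only case that changes is wait set on an unfrozen filesystem; the non-waiting and frozen cases select the same flags as before.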
In xfs_fs_write_inode() a synchronous flush will not get an EAGAIN
from xfs_inode_flush() and if an asynchronous flush returns EAGAIN
we should pass it on to the caller. If we get an error while flushing
the inode then re-dirty it so we can try again later.
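The error handling described above can be sketched roughly as follows. This is a user-space illustration only: struct sketch_inode, sketch_inode_flush() and sketch_write_inode() are hypothetical stand-ins, and the sketch assumes the caller has already cleared the dirty state before the write-out is attempted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a Linux inode; tracks only what we need. */
struct sketch_inode {
    bool dirty;         /* stands in for the Linux inode dirty state */
    int  flush_result;  /* positive errno the simulated flush returns */
};

/* Hypothetical stand-in for xfs_inode_flush(): positive errno or 0. */
int sketch_inode_flush(struct sketch_inode *ip)
{
    return ip->flush_result;
}

/*
 * Mirrors the patched control flow: no retry on EAGAIN, any error
 * re-dirties the inode, and the error is returned negated.
 */
int sketch_write_inode(struct sketch_inode *ip)
{
    int error;

    assert(ip);
    ip->dirty = false;      /* assumption: caller cleared the dirty flag */
    error = sketch_inode_flush(ip);
    if (error)
        ip->dirty = true;   /* stands in for mark_inode_dirty_sync() */
    return -error;
}
```

With flush_result set to EAGAIN the sketch returns -EAGAIN and leaves the inode dirty, so a later pass can retry; with 0 it returns 0 and the inode stays clean.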
Lachlan
[-- Attachment #2: sync.diff --]
[-- Type: text/x-patch, Size: 1665 bytes --]
--- fs/xfs/linux-2.6/xfs_super.c_1.400 2007-10-03 17:17:21.000000000 +1000
+++ fs/xfs/linux-2.6/xfs_super.c 2007-10-09 17:31:36.000000000 +1000
@@ -410,13 +410,12 @@ xfs_fs_write_inode(
flags |= FLUSH_SYNC;
}
error = xfs_inode_flush(XFS_I(inode), flags);
- if (error == EAGAIN) {
- if (sync)
- error = xfs_inode_flush(XFS_I(inode),
- flags | FLUSH_LOG);
- else
- error = 0;
- }
+ /*
+ * if we failed to write out the inode then mark
+ * it dirty again so we'll try again later.
+ */
+ if (error)
+ mark_inode_dirty_sync(inode);
return -error;
}
@@ -621,7 +620,19 @@ xfs_fs_sync_super(
int error;
int flags;
- if (unlikely(sb->s_frozen == SB_FREEZE_WRITE)) {
+ /*
+ * Treat a sync operation like a freeze. This is to work
+ * around a race in sync_inodes() which works in two phases
+ * - an asynchronous flush, which can write out an inode
+ * without waiting for file size updates to complete, and a
+ * synchronous flush, which won't do anything because the
+ * async flush removed the inode's dirty flag. Also
+ * sync_inodes() will not see any files that just have
+ * outstanding transactions to be flushed because we don't
+ * dirty the Linux inode until after the transaction I/O
+ * completes.
+ */
+ if (wait || unlikely(sb->s_frozen == SB_FREEZE_WRITE)) {
/*
* First stage of freeze - no more writers will make progress
* now we are here, so we flush delwri and delalloc buffers
@@ -632,7 +643,7 @@ xfs_fs_sync_super(
*/
flags = SYNC_DATA_QUIESCE;
} else
- flags = SYNC_FSDATA | (wait ? SYNC_WAIT : 0);
+ flags = SYNC_FSDATA;
error = xfs_sync(mp, flags);
sb->s_dirt = 0;
* Re: [PATCH V2] Ensure sync flushes all dirty data to disk]
2007-10-09 7:34 [PATCH V2] Ensure sync flushes all dirty data to disk] Lachlan McIlroy
@ 2007-10-09 8:54 ` David Chinner
From: David Chinner @ 2007-10-09 8:54 UTC
To: Lachlan McIlroy; +Cc: xfs-dev, xfs-oss
On Tue, Oct 09, 2007 at 05:34:04PM +1000, Lachlan McIlroy wrote:
> [V2 adds a comment for dgc]
>
> In xfs_fs_sync_super() treat a sync the same as a filesystem freeze.
> This is needed to force the log to disk for inodes which are not marked
> dirty in the Linux inode (the inodes are marked dirty on completion of
> the log I/O) and so sync_inodes() will not flush them.
>
> In xfs_fs_write_inode() a synchronous flush will not get an EAGAIN
> from xfs_inode_flush() and if an asynchronous flush returns EAGAIN
> we should pass it on to the caller. If we get an error while flushing
> the inode then re-dirty it so we can try again later.
Looks good now, Lachlan.
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group