From: Jan Kara <jack@suse.cz>
To: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>,
linux-fsdevel@vger.kernel.org,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] fsync: wait for data writeout completion before calling ->fsync
Date: Thu, 3 Sep 2009 12:47:04 +0200 [thread overview]
Message-ID: <20090903104704.GD5060@duck.novell.com> (raw)
In-Reply-To: <20090902221838.GA5311@lst.de>
On Thu 03-09-09 00:18:38, Christoph Hellwig wrote:
> I think we should add this one ontop:
Agreed. Added to the series. I'll now push the whole series to linux-next
via my fs tree.
Honza
>
> --
> Subject: [PATCH] fsync: wait for data writeout completion before calling ->fsync
> From: Christoph Hellwig <hch@lst.de>
>
> Currenly vfs_fsync(_range) first calls filemap_fdatawrite to write out
> the data, the calls into ->fsync to write out the metadata and then finally
> calls filemap_fdatawait to wait for the data I/O to complete. What sounds
> like a clever micro-optimization actually is nast trap for many filesystems.
>
> For many modern filesystems i_size or other inode information is only
> updated on I/O completion and we need to wait for I/O to finish before
> we can write out the metadata. For old fashionen filesystems that
> instanciate blocks during the actual write and also update the metadata
> at that point it opens up a large window were we could expose uninitialized
> blocks after a crash. While a few filesystems that need it already wait
> for the I/O to finish inside their ->fsync methods it is rather suboptimal
> as it is done under the i_mutex and also always for the whole file instead
> of just a part as we could do for O_SYNC handling.
>
> Here is a small audit of all fsync instances in the tree:
>
> - spufs_mfc_fsync:
> - ps3flash_fsync:
> - vol_cdev_fsync:
> - printer_fsync:
> - fb_deferred_io_fsync:
> - bad_file_fsync:
> - simple_sync_file:
>
> don't care - filesystems/drivers do't use the page cache or are
> purely in-memory.
>
> - simple_fsync:
> - file_fsync:
> - affs_file_fsync:
> - fat_file_fsync:
> - jfs_fsync:
> - ubifs_fsync:
> - reiserfs_dir_fsync:
> - reiserfs_sync_file:
>
> never touch pagecache themselves. We need to wait before if we do
> not want to expose stale data after an allocation.
>
> - afs_fsync:
> - fuse_fsync_common:
>
> do the waiting writeback itself in awkward ways, would benefit from
> proper semantics
>
> - block_fsync:
>
> Does a filemap_write_and_wait on the block device inode. Because we
> now have f_mapping that is the same inode we call it on in vfs_fsync.
> So just removing it and letting the VFS do the work in one go would
> be an improvement.
>
> - btrfs_sync_file:
> - cifs_fsync:
> - xfs_file_fsync:
>
> need the wait first and currently do it themselves. would benefit from
> doing it outside i_mutex.
>
> - coda_fsync:
> - ecryptfs_fsync:
> - exofs_file_fsync:
> - shm_fsync:
>
> only passes the fsync through to the lower layer
>
> - ext3_sync_file:
>
> doesn't seem to care, comments are confusing.
>
> - ext4_sync_file:
>
> would need the wait to work correctly for delalloc mode with late
> i_size updates. Otherwise the ext3 comment applies.
>
>
> currently implemens it's own writeback and wait in an odd way,
> could benefit from doing it properly.
>
> - gfs2_fsync:
>
> not needed for journaled data mode, but probably harmless there.
> Currently writes back data asynchronously itself. Needs some
> major audit.
>
> - hostfs_fsync:
>
> just calls fsync/datasync on the host FD. Without the wait before
> data might not even be inflight yet if we're unlucky.
>
> - hpfs_file_fsync:
> - ncp_fsync:
>
> no-ops. Dangerous before and after.
>
> - jffs2_fsync:
>
> just calls jffs2_flush_wbuf_gc, not sure how this relates to data.
>
> - nfs_fsync_dir:
>
> just increments stats, claims all directory operations are synchronous
>
> - nfs_file_fsync:
>
> only writes out data??? Looks very odd.
>
> - nilfs_sync_file:
>
> looks like it expects all data done, but not sure from the code
>
> - ntfs_dir_fsync:
> - ntfs_file_fsync:
>
> appear to do their own data writeback. Very convoluted code.
>
> - ocfs2_sync_file:
>
> does it's own data writeback, but no wait. probably needs the wait.
>
> - smb_fsync:
>
> according to a comment expects all pages written already, probably needs
> the wait before.
>
>
> This patch only changes vfs_fsync_range, removal of the wait in the methods
> that have it is left to the filesystem maintainers. Note that most
> filesystems really do need an audit for their fsync methods given the
> gems found in this very brief audit.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> Index: linux-2.6/fs/sync.c
> ===================================================================
> --- linux-2.6.orig/fs/sync.c 2009-09-02 15:03:41.073271287 -0300
> +++ linux-2.6/fs/sync.c 2009-09-02 15:04:34.401269249 -0300
> @@ -216,7 +216,7 @@ int vfs_fsync_range(struct file *file, s
> goto out;
> }
>
> - ret = filemap_fdatawrite_range(mapping, start, end);
> + ret = filemap_write_and_wait_range(mapping, start, end);
>
> /*
> * We need to protect against concurrent writers, which could cause
> @@ -228,9 +228,6 @@ int vfs_fsync_range(struct file *file, s
> ret = err;
> mutex_unlock(&mapping->host->i_mutex);
>
> - err = filemap_fdatawait_range(mapping, start, end);
> - if (!ret)
> - ret = err;
> out:
> return ret;
> }
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
next prev parent reply other threads:[~2009-09-03 10:47 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-09-02 13:59 [PATCH 0/16] Make O_SYNC handling use standard syncing path (version 4) Jan Kara
2009-09-02 13:59 ` [PATCH 01/16] vfs: Introduce filemap_fdatawait_range Jan Kara
2009-09-02 13:59 ` [PATCH 02/16] vfs: Export __generic_file_aio_write() and add some comments Jan Kara
2009-09-02 13:59 ` [PATCH 03/16] vfs: Remove syncing from generic_file_direct_write() and generic_file_buffered_write() Jan Kara
2009-09-02 13:59 ` [PATCH 04/16] pohmelfs: Use __generic_file_aio_write instead of generic_file_aio_write_nolock Jan Kara
2009-09-02 13:59 ` [PATCH 05/16] ocfs2: " Jan Kara
2009-09-02 13:59 ` [PATCH 06/16] vfs: Rename generic_file_aio_write_nolock Jan Kara
2009-09-02 21:47 ` Christoph Hellwig
2009-09-03 10:24 ` Jan Kara
2009-09-03 15:37 ` Christoph Hellwig
2009-09-02 13:59 ` [PATCH 07/16] vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode Jan Kara
2009-09-02 13:59 ` [PATCH 08/16] ext2: Update comment about generic_osync_inode Jan Kara
2009-09-02 13:59 ` [PATCH 09/16] ext3: Remove syncing logic from ext3_file_write Jan Kara
2009-09-02 13:59 ` [PATCH 10/16] ext4: Remove syncing logic from ext4_file_write Jan Kara
2009-09-02 13:59 ` [PATCH 11/16] ntfs: Use new syncing helpers and update comments Jan Kara
2009-09-02 13:59 ` [PATCH 12/16] ocfs2: Update syncing after splicing to match generic version Jan Kara
2009-09-02 13:59 ` [PATCH 13/16] xfs: Convert sync_page_range() to simple filemap_write_and_wait_range() Jan Kara
2009-09-02 13:59 ` [PATCH 14/16] pohmelfs: Use new syncing helper Jan Kara
2009-09-02 13:59 ` [PATCH 15/16] fat: Opencode sync_page_range_nolock() Jan Kara
2009-09-02 13:59 ` [PATCH 16/16] vfs: Remove generic_osync_inode() and sync_page_range{_nolock}() Jan Kara
2009-09-02 14:16 ` [PATCH 0/16] Make O_SYNC handling use standard syncing path (version 4) Christoph Hellwig
2009-09-02 22:18 ` [PATCH] fsync: wait for data writeout completion before calling ->fsync Christoph Hellwig
2009-09-02 22:37 ` Joel Becker
2009-09-03 10:47 ` Jan Kara [this message]
2009-09-03 15:39 ` Christoph Hellwig
2009-09-10 20:25 ` [PATCH 18/16] implement posix O_SYNC and O_DSYNC semantics Christoph Hellwig
2009-09-10 20:38 ` Trond Myklebust
2009-09-10 20:40 ` Christoph Hellwig
2009-09-10 20:43 ` Trond Myklebust
2009-09-10 20:44 ` Christoph Hellwig
2009-09-10 23:07 ` Andreas Dilger
2009-09-10 23:18 ` Christoph Hellwig
2009-09-11 19:16 ` [PATCHv2 " Christoph Hellwig
2009-09-14 16:54 ` Jan Kara
2009-09-14 17:02 ` Christoph Hellwig
2009-09-15 13:12 ` [PATCH] " Christoph Hellwig
2009-09-15 14:10 ` Jan Kara
2009-09-15 14:50 ` Ulrich Drepper
2009-09-17 17:16 ` Christoph Hellwig
2009-09-17 21:03 ` Kyle McMartin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090903104704.GD5060@duck.novell.com \
--to=jack@suse.cz \
--cc=hch@lst.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox