linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chris Mason <mason@suse.com>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-fsdevel@vger.kernel.org,
	OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Subject: Re: [PATCH] add -o flush for fat
Date: Mon, 7 Aug 2006 16:23:55 -0400	[thread overview]
Message-ID: <20060807202355.GF26133@watt.suse.com> (raw)
In-Reply-To: <20060805121243.1e976639.akpm@osdl.org>

On Sat, Aug 05, 2006 at 12:12:43PM -0700, Andrew Morton wrote:

[ thanks for the comments everyone ]

> Consequently the semantics which we require of the library functions which
> you've added should be:
> 
> /*
>  * On return from this function all dirty data has been cleaned - it is either
>  * on disk or is in flight in the I/O queues.
>  */
> 
> These semantics are exactly provided by
> do_sync_file_range(SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE).
> I'd suggest that do_sync_file_range() could be used to sync the file's
> data in this application, if only for reasons of clarity and precision.

do_sync_file_range is for files.  I don't have files for the directory
inodes.

> 
> That leaves the file metadata, the inode itself and the superblock, for
> which we'd ideally likely same semantics, but stronger sync semantics would
> of course meet our requirement.
> 
> metadata: we don't have an
> SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE implementation of
> fsync_buffers_list() so unless we add such, we need to use
> fsync_buffers_list()'s
> (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER)
> semantics.  I'd suggest a call to sync_mapping_buffers() directly.

Sadly fat doesn't use mark_buffer_dirty_inode.  We can change it to do
so of course.

> 
> The inode: write_inode_now(wait=0) will I think be appropriate here.

The fun part of write_inode_now is the part where __sync_single_inode
does this:

	int wait = wbc->sync_mode == WB_SYNC_ALL;
	...
	if (wait) {
		int err = filemap_fdatawait(mapping);
	...

I forget this every time and had to sysrq-t my way back to this
discovery again today.  This may or may not be the desired result of
write_inode_now, but it is not quite right for -o flush.

> 
> superblock: it's possible that the proposed three-step scheme will leave
> the filesystem's superblock unwritten.  Need to think about that.
> 
> So I don't think any additional helper functions are needed.  The above
> three are all exported to filesystems.
> 
> 
> To test all this I'd suggest that you verify that `-o flush' takes
> /rpoc/meminfo:Dirty reliably down to zero on a quiet system for various
> operations.
> 
My test:

cp -a /usr/share/doc/pages fat_fs/orig
cd fat_fs ; cp -a orig copy
sleep 2
reset

Then diff the directories after things come back.

> 
> If we're not adding new VFS functions then fatfs can generally hack around
> with these functions until it does what we want it to do.  If we _do_ add
> new VFS functions then their exact data integrity behaviour should be
> carefully designed and documented, please.  Otherwise we'll lose data or
> make people's machines slower than they need to be - we have a track record
> of screwing this stuff up.
> 
> There was quite some duplication of code within the fat patch you sent - it
> looked like a little helper function was needed.

New patch below, it is slightly different.  I'm still hoping we can
avoid the timer to cover in flight ios ignored by filemap_flush, but it
may come to that later on.

> 
> I expect the `-o flush' behaviour will be useful on other filesystem types.
> Did you consider moving it into the generic mount options (MS_FLUSH?)
> rather than making it fat-specific?

I did, but this is one of those fuzzy patches.  It's not actually a good
idea, but it's the best compromise I can find between what the users
wanted and what I was willing to ship.

If other filesystems what this bad idea, we can certainly add it, but I
wanted to start small...

> 
> Sorry to complicate a little patch, but I'd prefer to get this right
> going-in.
> 
> iirc, Hirofumi-san had a patch which did the same thing as this one 6-odd
> months ago, but it seemed to die off.
> 

The ones I found 6-odd months ago didn't cover metadata correctly.
His stuff may have improved, sorry if I missed it.

From: Chris Mason <mason@suse.com>
Subject: add -o flush for fat

Fat is commonly used on removable media. Mounting with -o flush tells the
FS to write things to disk as quickly as possible.  It is like -o sync, but
much faster (and not as safe).

Signed-off-by: Chris Mason <mason@suse.com>

---
diff -r 8a50d9b6ce6f fs/fat/file.c
--- a/fs/fat/file.c	Fri Aug 04 14:02:26 2006 -0400
+++ b/fs/fat/file.c	Mon Aug 07 15:45:42 2006 -0400
@@ -13,6 +13,7 @@
 #include <linux/smp_lock.h>
 #include <linux/buffer_head.h>
 #include <linux/writeback.h>
+#include <linux/blkdev.h>
 
 int fat_generic_ioctl(struct inode *inode, struct file *filp,
 		      unsigned int cmd, unsigned long arg)
@@ -112,6 +113,16 @@ int fat_generic_ioctl(struct inode *inod
 	}
 }
 
+static int fat_file_release(struct inode *inode, struct file *filp)
+{
+	if ((filp->f_mode & FMODE_WRITE) &&
+	     MSDOS_SB(inode->i_sb)->options.flush) {
+		fat_flush_inodes(inode->i_sb, inode, NULL);
+		blk_congestion_wait(WRITE, HZ/10);
+	}
+	return 0;
+}
+
 const struct file_operations fat_file_operations = {
 	.llseek		= generic_file_llseek,
 	.read		= do_sync_read,
@@ -121,6 +132,7 @@ const struct file_operations fat_file_op
 	.aio_read	= generic_file_aio_read,
 	.aio_write	= generic_file_aio_write,
 	.mmap		= generic_file_mmap,
+	.release	= fat_file_release,
 	.ioctl		= fat_generic_ioctl,
 	.fsync		= file_fsync,
 	.sendfile	= generic_file_sendfile,
@@ -289,6 +301,7 @@ void fat_truncate(struct inode *inode)
 	lock_kernel();
 	fat_free(inode, nr_clusters);
 	unlock_kernel();
+	fat_flush_inodes(inode->i_sb, inode, NULL);
 }
 
 struct inode_operations fat_file_inode_operations = {
diff -r 8a50d9b6ce6f fs/fat/inode.c
--- a/fs/fat/inode.c	Fri Aug 04 14:02:26 2006 -0400
+++ b/fs/fat/inode.c	Mon Aug 07 16:19:59 2006 -0400
@@ -24,6 +24,7 @@
 #include <linux/vfs.h>
 #include <linux/parser.h>
 #include <linux/uio.h>
+#include <linux/writeback.h>
 #include <asm/unaligned.h>
 
 #ifndef CONFIG_FAT_DEFAULT_IOCHARSET
@@ -861,7 +862,7 @@ enum {
 	Opt_charset, Opt_shortname_lower, Opt_shortname_win95,
 	Opt_shortname_winnt, Opt_shortname_mixed, Opt_utf8_no, Opt_utf8_yes,
 	Opt_uni_xl_no, Opt_uni_xl_yes, Opt_nonumtail_no, Opt_nonumtail_yes,
-	Opt_obsolate, Opt_err,
+	Opt_obsolate, Opt_flush, Opt_err,
 };
 
 static match_table_t fat_tokens = {
@@ -893,7 +894,8 @@ static match_table_t fat_tokens = {
 	{Opt_obsolate, "cvf_format=%20s"},
 	{Opt_obsolate, "cvf_options=%100s"},
 	{Opt_obsolate, "posix"},
-	{Opt_err, NULL}
+	{Opt_flush, "flush"},
+	{Opt_err, NULL},
 };
 static match_table_t msdos_tokens = {
 	{Opt_nodots, "nodots"},
@@ -1034,6 +1036,9 @@ static int parse_options(char *options, 
 				return 0;
 			opts->codepage = option;
 			break;
+		case Opt_flush:
+			opts->flush = 1;
+			break;
 
 		/* msdos specific */
 		case Opt_dots:
@@ -1435,6 +1440,56 @@ out_fail:
 
 EXPORT_SYMBOL_GPL(fat_fill_super);
 
+/*
+ * helper function for fat_flush_inodes.  This writes both the inode
+ * and the file data blocks, waiting for in flight data blocks before
+ * the start of the call.  It does not wait for any io started
+ * during the call
+ */
+static int writeback_inode(struct inode *inode)
+{
+
+	int ret;
+	struct address_space *mapping = inode->i_mapping;
+	struct writeback_control wbc = {
+	       .sync_mode = WB_SYNC_NONE,
+	      .nr_to_write = 0,
+	};
+	/* if we used WB_SYNC_ALL, sync_inode waits for the io for the
+	* inode to finish.  So WB_SYNC_NONE is sent down to sync_inode
+	* and filemap_fdatawrite is used for the data blocks
+	*/
+	ret = sync_inode(inode, &wbc);
+	if (!ret)
+	       ret = filemap_fdatawrite(mapping);
+	return ret;
+}
+
+/*
+ * write data and metadata corresponding to i1 and i2.  The io is
+ * started but we do not wait for any of it to finish.
+ *
+ * filemap_flush is used for the block device, so if there is a dirty
+ * page for a block already in flight, we will not wait and start the
+ * io over again
+ */
+int fat_flush_inodes(struct super_block *sb, struct inode *i1, struct inode *i2)
+{
+	int ret = 0;
+	if (!MSDOS_SB(sb)->options.flush)
+		return 0;
+	if (i1)
+		ret = writeback_inode(i1);
+	if (!ret && i2)
+		ret = writeback_inode(i2);
+	if (!ret && sb) {
+		struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
+		ret = filemap_flush(mapping);
+	}
+	return ret;
+}
+EXPORT_SYMBOL_GPL(fat_flush_inodes);
+
 static int __init init_fat_fs(void)
 {
 	int err;
diff -r 8a50d9b6ce6f fs/msdos/namei.c
--- a/fs/msdos/namei.c	Fri Aug 04 14:02:26 2006 -0400
+++ b/fs/msdos/namei.c	Mon Aug 07 14:35:17 2006 -0400
@@ -280,7 +280,7 @@ static int msdos_create(struct inode *di
 			struct nameidata *nd)
 {
 	struct super_block *sb = dir->i_sb;
-	struct inode *inode;
+	struct inode *inode = NULL;
 	struct fat_slot_info sinfo;
 	struct timespec ts;
 	unsigned char msdos_name[MSDOS_NAME];
@@ -316,6 +316,8 @@ static int msdos_create(struct inode *di
 	d_instantiate(dentry, inode);
 out:
 	unlock_kernel();
+	if (!err)
+		err = fat_flush_inodes(sb, dir, inode);
 	return err;
 }
 
@@ -348,6 +350,8 @@ static int msdos_rmdir(struct inode *dir
 	fat_detach(inode);
 out:
 	unlock_kernel();
+	if (!err)
+		err = fat_flush_inodes(inode->i_sb, dir, inode);
 
 	return err;
 }
@@ -401,6 +405,7 @@ static int msdos_mkdir(struct inode *dir
 	d_instantiate(dentry, inode);
 
 	unlock_kernel();
+	fat_flush_inodes(sb, dir, inode);
 	return 0;
 
 out_free:
@@ -430,6 +435,8 @@ static int msdos_unlink(struct inode *di
 	fat_detach(inode);
 out:
 	unlock_kernel();
+	if (!err)
+		err = fat_flush_inodes(inode->i_sb, dir, inode);
 
 	return err;
 }
@@ -635,6 +642,8 @@ static int msdos_rename(struct inode *ol
 			      new_dir, new_msdos_name, new_dentry, is_hid);
 out:
 	unlock_kernel();
+	if (!err)
+		err = fat_flush_inodes(old_dir->i_sb, old_dir, new_dir);
 	return err;
 }
 
diff -r 8a50d9b6ce6f include/linux/msdos_fs.h
--- a/include/linux/msdos_fs.h	Fri Aug 04 14:02:26 2006 -0400
+++ b/include/linux/msdos_fs.h	Mon Aug 07 14:35:17 2006 -0400
@@ -204,6 +204,7 @@ struct fat_mount_options {
 		 unicode_xlate:1, /* create escape sequences for unhandled Unicode */
 		 numtail:1,       /* Does first alias have a numeric '~1' type tail? */
 		 atari:1,         /* Use Atari GEMDOS variation of MS-DOS fs */
+		 flush:1,	  /* write things quickly */
 		 nocase:1;	  /* Does this need case conversion? 0=need case conversion*/
 };
 
@@ -412,6 +413,8 @@ extern int fat_fill_super(struct super_b
 extern int fat_fill_super(struct super_block *sb, void *data, int silent,
 			struct inode_operations *fs_dir_inode_ops, int isvfat);
 
+extern int fat_flush_inodes(struct super_block *sb, struct inode *i1,
+		            struct inode *i2);
 /* fat/misc.c */
 extern void fat_fs_panic(struct super_block *s, const char *fmt, ...);
 extern void fat_clusters_flush(struct super_block *sb);

  parent reply	other threads:[~2006-08-07 20:24 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-08-04 19:27 [PATCH] add -o flush for fat Chris Mason
2006-08-05  1:31 ` Andrew Morton
2006-08-05 12:26   ` Chris Mason
2006-08-05 19:12     ` Andrew Morton
2006-08-05 20:54       ` OGAWA Hirofumi
2006-08-07 20:23       ` Chris Mason [this message]
2006-08-07 23:58         ` Andrew Morton
2006-08-08  8:05           ` Chris Mason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060807202355.GF26133@watt.suse.com \
    --to=mason@suse.com \
    --cc=akpm@osdl.org \
    --cc=hirofumi@mail.parknet.co.jp \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).