From: Chris Mason <mason@suse.com>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-fsdevel@vger.kernel.org,
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Subject: Re: [PATCH] add -o flush for fat
Date: Mon, 7 Aug 2006 16:23:55 -0400 [thread overview]
Message-ID: <20060807202355.GF26133@watt.suse.com> (raw)
In-Reply-To: <20060805121243.1e976639.akpm@osdl.org>
On Sat, Aug 05, 2006 at 12:12:43PM -0700, Andrew Morton wrote:
[ thanks for the comments everyone ]
> Consequently the semantics which we require of the library functions which
> you've added should be:
>
> /*
> * On return from this function all dirty data has been cleaned - it is either
> * on disk or is in flight in the I/O queues.
> */
>
> These semantics are exactly provided by
> do_sync_file_range(SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE).
> I'd suggest that do_sync_file_range() could be used to sync the file's
> data in this application, if only for reasons of clarity and precision.
do_sync_file_range is for files. I don't have files for the directory
inodes.
>
> That leaves the file metadata, the inode itself and the superblock, for
> which we'd ideally likely same semantics, but stronger sync semantics would
> of course meet our requirement.
>
> metadata: we don't have an
> SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE implementation of
> fsync_buffers_list() so unless we add such, we need to use
> fsync_buffers_list()'s
> (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER)
> semantics. I'd suggest a call to sync_mapping_buffers() directly.
Sadly fat doesn't use mark_buffer_dirty_inode. We can change it to do
so of course.
>
> The inode: write_inode_now(wait=0) will I think be appropriate here.
The fun part of write_inode_now is the part where __sync_single_inode
does this:
int wait = wbc->sync_mode == WB_SYNC_ALL;
...
if (wait) {
int err = filemap_fdatawait(mapping);
...
I forget this every time and had to sysrq-t my way back to this
discovery again today. This may or may not be the desired result of
write_inode_now, but it is not quite right for -o flush.
>
> superblock: it's possible that the proposed three-step scheme will leave
> the filesystem's superblock unwritten. Need to think about that.
>
> So I don't think any additional helper functions are needed. The above
> three are all exported to filesystems.
>
>
> To test all this I'd suggest that you verify that `-o flush' takes
> /rpoc/meminfo:Dirty reliably down to zero on a quiet system for various
> operations.
>
My test:
cp -a /usr/share/doc/pages fat_fs/orig
cd fat_fs ; cp -a orig copy
sleep 2
reset
Then diff the directories after things come back.
>
> If we're not adding new VFS functions then fatfs can generally hack around
> with these functions until it does what we want it to do. If we _do_ add
> new VFS functions then their exact data integrity behaviour should be
> carefully designed and documented, please. Otherwise we'll lose data or
> make people's machines slower than they need to be - we have a track record
> of screwing this stuff up.
>
> There was quite some duplication of code within the fat patch you sent - it
> looked like a little helper function was needed.
New patch below, it is slightly different. I'm still hoping we can
avoid the timer to cover in flight ios ignored by filemap_flush, but it
may come to that later on.
>
> I expect the `-o flush' behaviour will be useful on other filesystem types.
> Did you consider moving it into the generic mount options (MS_FLUSH?)
> rather than making it fat-specific?
I did, but this is one of those fuzzy patches. It's not actually a good
idea, but it's the best compromise I can find between what the users
wanted and what I was willing to ship.
If other filesystems what this bad idea, we can certainly add it, but I
wanted to start small...
>
> Sorry to complicate a little patch, but I'd prefer to get this right
> going-in.
>
> iirc, Hirofumi-san had a patch which did the same thing as this one 6-odd
> months ago, but it seemed to die off.
>
The ones I found 6-odd months ago didn't cover metadata correctly.
His stuff may have improved, sorry if I missed it.
From: Chris Mason <mason@suse.com>
Subject: add -o flush for fat
Fat is commonly used on removable media. Mounting with -o flush tells the
FS to write things to disk as quickly as possible. It is like -o sync, but
much faster (and not as safe).
Signed-off-by: Chris Mason <mason@suse.com>
---
diff -r 8a50d9b6ce6f fs/fat/file.c
--- a/fs/fat/file.c Fri Aug 04 14:02:26 2006 -0400
+++ b/fs/fat/file.c Mon Aug 07 15:45:42 2006 -0400
@@ -13,6 +13,7 @@
#include <linux/smp_lock.h>
#include <linux/buffer_head.h>
#include <linux/writeback.h>
+#include <linux/blkdev.h>
int fat_generic_ioctl(struct inode *inode, struct file *filp,
unsigned int cmd, unsigned long arg)
@@ -112,6 +113,16 @@ int fat_generic_ioctl(struct inode *inod
}
}
+static int fat_file_release(struct inode *inode, struct file *filp)
+{
+ if ((filp->f_mode & FMODE_WRITE) &&
+ MSDOS_SB(inode->i_sb)->options.flush) {
+ fat_flush_inodes(inode->i_sb, inode, NULL);
+ blk_congestion_wait(WRITE, HZ/10);
+ }
+ return 0;
+}
+
const struct file_operations fat_file_operations = {
.llseek = generic_file_llseek,
.read = do_sync_read,
@@ -121,6 +132,7 @@ const struct file_operations fat_file_op
.aio_read = generic_file_aio_read,
.aio_write = generic_file_aio_write,
.mmap = generic_file_mmap,
+ .release = fat_file_release,
.ioctl = fat_generic_ioctl,
.fsync = file_fsync,
.sendfile = generic_file_sendfile,
@@ -289,6 +301,7 @@ void fat_truncate(struct inode *inode)
lock_kernel();
fat_free(inode, nr_clusters);
unlock_kernel();
+ fat_flush_inodes(inode->i_sb, inode, NULL);
}
struct inode_operations fat_file_inode_operations = {
diff -r 8a50d9b6ce6f fs/fat/inode.c
--- a/fs/fat/inode.c Fri Aug 04 14:02:26 2006 -0400
+++ b/fs/fat/inode.c Mon Aug 07 16:19:59 2006 -0400
@@ -24,6 +24,7 @@
#include <linux/vfs.h>
#include <linux/parser.h>
#include <linux/uio.h>
+#include <linux/writeback.h>
#include <asm/unaligned.h>
#ifndef CONFIG_FAT_DEFAULT_IOCHARSET
@@ -861,7 +862,7 @@ enum {
Opt_charset, Opt_shortname_lower, Opt_shortname_win95,
Opt_shortname_winnt, Opt_shortname_mixed, Opt_utf8_no, Opt_utf8_yes,
Opt_uni_xl_no, Opt_uni_xl_yes, Opt_nonumtail_no, Opt_nonumtail_yes,
- Opt_obsolate, Opt_err,
+ Opt_obsolate, Opt_flush, Opt_err,
};
static match_table_t fat_tokens = {
@@ -893,7 +894,8 @@ static match_table_t fat_tokens = {
{Opt_obsolate, "cvf_format=%20s"},
{Opt_obsolate, "cvf_options=%100s"},
{Opt_obsolate, "posix"},
- {Opt_err, NULL}
+ {Opt_flush, "flush"},
+ {Opt_err, NULL},
};
static match_table_t msdos_tokens = {
{Opt_nodots, "nodots"},
@@ -1034,6 +1036,9 @@ static int parse_options(char *options,
return 0;
opts->codepage = option;
break;
+ case Opt_flush:
+ opts->flush = 1;
+ break;
/* msdos specific */
case Opt_dots:
@@ -1435,6 +1440,56 @@ out_fail:
EXPORT_SYMBOL_GPL(fat_fill_super);
+/*
+ * helper function for fat_flush_inodes. This writes both the inode
+ * and the file data blocks, waiting for in flight data blocks before
+ * the start of the call. It does not wait for any io started
+ * during the call
+ */
+static int writeback_inode(struct inode *inode)
+{
+
+ int ret;
+ struct address_space *mapping = inode->i_mapping;
+ struct writeback_control wbc = {
+ .sync_mode = WB_SYNC_NONE,
+ .nr_to_write = 0,
+ };
+ /* if we used WB_SYNC_ALL, sync_inode waits for the io for the
+ * inode to finish. So WB_SYNC_NONE is sent down to sync_inode
+ * and filemap_fdatawrite is used for the data blocks
+ */
+ ret = sync_inode(inode, &wbc);
+ if (!ret)
+ ret = filemap_fdatawrite(mapping);
+ return ret;
+}
+
+/*
+ * write data and metadata corresponding to i1 and i2. The io is
+ * started but we do not wait for any of it to finish.
+ *
+ * filemap_flush is used for the block device, so if there is a dirty
+ * page for a block already in flight, we will not wait and start the
+ * io over again
+ */
+int fat_flush_inodes(struct super_block *sb, struct inode *i1, struct inode *i2)
+{
+ int ret = 0;
+ if (!MSDOS_SB(sb)->options.flush)
+ return 0;
+ if (i1)
+ ret = writeback_inode(i1);
+ if (!ret && i2)
+ ret = writeback_inode(i2);
+ if (!ret && sb) {
+ struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
+ ret = filemap_flush(mapping);
+ }
+ return ret;
+}
+EXPORT_SYMBOL_GPL(fat_flush_inodes);
+
static int __init init_fat_fs(void)
{
int err;
diff -r 8a50d9b6ce6f fs/msdos/namei.c
--- a/fs/msdos/namei.c Fri Aug 04 14:02:26 2006 -0400
+++ b/fs/msdos/namei.c Mon Aug 07 14:35:17 2006 -0400
@@ -280,7 +280,7 @@ static int msdos_create(struct inode *di
struct nameidata *nd)
{
struct super_block *sb = dir->i_sb;
- struct inode *inode;
+ struct inode *inode = NULL;
struct fat_slot_info sinfo;
struct timespec ts;
unsigned char msdos_name[MSDOS_NAME];
@@ -316,6 +316,8 @@ static int msdos_create(struct inode *di
d_instantiate(dentry, inode);
out:
unlock_kernel();
+ if (!err)
+ err = fat_flush_inodes(sb, dir, inode);
return err;
}
@@ -348,6 +350,8 @@ static int msdos_rmdir(struct inode *dir
fat_detach(inode);
out:
unlock_kernel();
+ if (!err)
+ err = fat_flush_inodes(inode->i_sb, dir, inode);
return err;
}
@@ -401,6 +405,7 @@ static int msdos_mkdir(struct inode *dir
d_instantiate(dentry, inode);
unlock_kernel();
+ fat_flush_inodes(sb, dir, inode);
return 0;
out_free:
@@ -430,6 +435,8 @@ static int msdos_unlink(struct inode *di
fat_detach(inode);
out:
unlock_kernel();
+ if (!err)
+ err = fat_flush_inodes(inode->i_sb, dir, inode);
return err;
}
@@ -635,6 +642,8 @@ static int msdos_rename(struct inode *ol
new_dir, new_msdos_name, new_dentry, is_hid);
out:
unlock_kernel();
+ if (!err)
+ err = fat_flush_inodes(old_dir->i_sb, old_dir, new_dir);
return err;
}
diff -r 8a50d9b6ce6f include/linux/msdos_fs.h
--- a/include/linux/msdos_fs.h Fri Aug 04 14:02:26 2006 -0400
+++ b/include/linux/msdos_fs.h Mon Aug 07 14:35:17 2006 -0400
@@ -204,6 +204,7 @@ struct fat_mount_options {
unicode_xlate:1, /* create escape sequences for unhandled Unicode */
numtail:1, /* Does first alias have a numeric '~1' type tail? */
atari:1, /* Use Atari GEMDOS variation of MS-DOS fs */
+ flush:1, /* write things quickly */
nocase:1; /* Does this need case conversion? 0=need case conversion*/
};
@@ -412,6 +413,8 @@ extern int fat_fill_super(struct super_b
extern int fat_fill_super(struct super_block *sb, void *data, int silent,
struct inode_operations *fs_dir_inode_ops, int isvfat);
+extern int fat_flush_inodes(struct super_block *sb, struct inode *i1,
+ struct inode *i2);
/* fat/misc.c */
extern void fat_fs_panic(struct super_block *s, const char *fmt, ...);
extern void fat_clusters_flush(struct super_block *sb);
next prev parent reply other threads:[~2006-08-07 20:24 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-08-04 19:27 [PATCH] add -o flush for fat Chris Mason
2006-08-05 1:31 ` Andrew Morton
2006-08-05 12:26 ` Chris Mason
2006-08-05 19:12 ` Andrew Morton
2006-08-05 20:54 ` OGAWA Hirofumi
2006-08-07 20:23 ` Chris Mason [this message]
2006-08-07 23:58 ` Andrew Morton
2006-08-08 8:05 ` Chris Mason
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20060807202355.GF26133@watt.suse.com \
--to=mason@suse.com \
--cc=akpm@osdl.org \
--cc=hirofumi@mail.parknet.co.jp \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).