From: Chris Mason <mason@suse.com>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-fsdevel@vger.kernel.org,
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Subject: Re: [PATCH] add -o flush for fat
Date: Mon, 7 Aug 2006 16:23:55 -0400 [thread overview]
Message-ID: <20060807202355.GF26133@watt.suse.com> (raw)
In-Reply-To: <20060805121243.1e976639.akpm@osdl.org>
On Sat, Aug 05, 2006 at 12:12:43PM -0700, Andrew Morton wrote:
[ thanks for the comments everyone ]
> Consequently the semantics which we require of the library functions which
> you've added should be:
>
> /*
> * On return from this function all dirty data has been cleaned - it is either
> * on disk or is in flight in the I/O queues.
> */
>
> These semantics are exactly provided by
> do_sync_file_range(SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE).
> I'd suggest that do_sync_file_range() could be used to sync the file's
> data in this application, if only for reasons of clarity and precision.
do_sync_file_range is for files. I don't have files for the directory
inodes.
>
> That leaves the file metadata, the inode itself and the superblock, for
> which we'd ideally likely same semantics, but stronger sync semantics would
> of course meet our requirement.
>
> metadata: we don't have an
> SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE implementation of
> fsync_buffers_list() so unless we add such, we need to use
> fsync_buffers_list()'s
> (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER)
> semantics. I'd suggest a call to sync_mapping_buffers() directly.
Sadly fat doesn't use mark_buffer_dirty_inode. We can change it to do
so of course.
>
> The inode: write_inode_now(wait=0) will I think be appropriate here.
The fun part of write_inode_now is the part where __sync_single_inode
does this:
int wait = wbc->sync_mode == WB_SYNC_ALL;
...
if (wait) {
int err = filemap_fdatawait(mapping);
...
I forget this every time and had to sysrq-t my way back to this
discovery again today. This may or may not be the desired result of
write_inode_now, but it is not quite right for -o flush.
>
> superblock: it's possible that the proposed three-step scheme will leave
> the filesystem's superblock unwritten. Need to think about that.
>
> So I don't think any additional helper functions are needed. The above
> three are all exported to filesystems.
>
>
> To test all this I'd suggest that you verify that `-o flush' takes
> /rpoc/meminfo:Dirty reliably down to zero on a quiet system for various
> operations.
>
My test:
cp -a /usr/share/doc/pages fat_fs/orig
cd fat_fs ; cp -a orig copy
sleep 2
reset
Then diff the directories after things come back.
>
> If we're not adding new VFS functions then fatfs can generally hack around
> with these functions until it does what we want it to do. If we _do_ add
> new VFS functions then their exact data integrity behaviour should be
> carefully designed and documented, please. Otherwise we'll lose data or
> make people's machines slower than they need to be - we have a track record
> of screwing this stuff up.
>
> There was quite some duplication of code within the fat patch you sent - it
> looked like a little helper function was needed.
New patch below, it is slightly different. I'm still hoping we can
avoid the timer to cover in flight ios ignored by filemap_flush, but it
may come to that later on.
>
> I expect the `-o flush' behaviour will be useful on other filesystem types.
> Did you consider moving it into the generic mount options (MS_FLUSH?)
> rather than making it fat-specific?
I did, but this is one of those fuzzy patches. It's not actually a good
idea, but it's the best compromise I can find between what the users
wanted and what I was willing to ship.
If other filesystems what this bad idea, we can certainly add it, but I
wanted to start small...
>
> Sorry to complicate a little patch, but I'd prefer to get this right
> going-in.
>
> iirc, Hirofumi-san had a patch which did the same thing as this one 6-odd
> months ago, but it seemed to die off.
>
The ones I found 6-odd months ago didn't cover metadata correctly.
His stuff may have improved, sorry if I missed it.
From: Chris Mason <mason@suse.com>
Subject: add -o flush for fat
Fat is commonly used on removable media. Mounting with -o flush tells the
FS to write things to disk as quickly as possible. It is like -o sync, but
much faster (and not as safe).
Signed-off-by: Chris Mason <mason@suse.com>
---
diff -r 8a50d9b6ce6f fs/fat/file.c
--- a/fs/fat/file.c Fri Aug 04 14:02:26 2006 -0400
+++ b/fs/fat/file.c Mon Aug 07 15:45:42 2006 -0400
@@ -13,6 +13,7 @@
#include <linux/smp_lock.h>
#include <linux/buffer_head.h>
#include <linux/writeback.h>
+#include <linux/blkdev.h>
int fat_generic_ioctl(struct inode *inode, struct file *filp,
unsigned int cmd, unsigned long arg)
@@ -112,6 +113,16 @@ int fat_generic_ioctl(struct inode *inod
}
}
+static int fat_file_release(struct inode *inode, struct file *filp)
+{
+ if ((filp->f_mode & FMODE_WRITE) &&
+ MSDOS_SB(inode->i_sb)->options.flush) {
+ fat_flush_inodes(inode->i_sb, inode, NULL);
+ blk_congestion_wait(WRITE, HZ/10);
+ }
+ return 0;
+}
+
const struct file_operations fat_file_operations = {
.llseek = generic_file_llseek,
.read = do_sync_read,
@@ -121,6 +132,7 @@ const struct file_operations fat_file_op
.aio_read = generic_file_aio_read,
.aio_write = generic_file_aio_write,
.mmap = generic_file_mmap,
+ .release = fat_file_release,
.ioctl = fat_generic_ioctl,
.fsync = file_fsync,
.sendfile = generic_file_sendfile,
@@ -289,6 +301,7 @@ void fat_truncate(struct inode *inode)
lock_kernel();
fat_free(inode, nr_clusters);
unlock_kernel();
+ fat_flush_inodes(inode->i_sb, inode, NULL);
}
struct inode_operations fat_file_inode_operations = {
diff -r 8a50d9b6ce6f fs/fat/inode.c
--- a/fs/fat/inode.c Fri Aug 04 14:02:26 2006 -0400
+++ b/fs/fat/inode.c Mon Aug 07 16:19:59 2006 -0400
@@ -24,6 +24,7 @@
#include <linux/vfs.h>
#include <linux/parser.h>
#include <linux/uio.h>
+#include <linux/writeback.h>
#include <asm/unaligned.h>
#ifndef CONFIG_FAT_DEFAULT_IOCHARSET
@@ -861,7 +862,7 @@ enum {
Opt_charset, Opt_shortname_lower, Opt_shortname_win95,
Opt_shortname_winnt, Opt_shortname_mixed, Opt_utf8_no, Opt_utf8_yes,
Opt_uni_xl_no, Opt_uni_xl_yes, Opt_nonumtail_no, Opt_nonumtail_yes,
- Opt_obsolate, Opt_err,
+ Opt_obsolate, Opt_flush, Opt_err,
};
static match_table_t fat_tokens = {
@@ -893,7 +894,8 @@ static match_table_t fat_tokens = {
{Opt_obsolate, "cvf_format=%20s"},
{Opt_obsolate, "cvf_options=%100s"},
{Opt_obsolate, "posix"},
- {Opt_err, NULL}
+ {Opt_flush, "flush"},
+ {Opt_err, NULL},
};
static match_table_t msdos_tokens = {
{Opt_nodots, "nodots"},
@@ -1034,6 +1036,9 @@ static int parse_options(char *options,
return 0;
opts->codepage = option;
break;
+ case Opt_flush:
+ opts->flush = 1;
+ break;
/* msdos specific */
case Opt_dots:
@@ -1435,6 +1440,56 @@ out_fail:
EXPORT_SYMBOL_GPL(fat_fill_super);
+/*
+ * helper function for fat_flush_inodes. This writes both the inode
+ * and the file data blocks, waiting for in flight data blocks before
+ * the start of the call. It does not wait for any io started
+ * during the call
+ */
+static int writeback_inode(struct inode *inode)
+{
+
+ int ret;
+ struct address_space *mapping = inode->i_mapping;
+ struct writeback_control wbc = {
+ .sync_mode = WB_SYNC_NONE,
+ .nr_to_write = 0,
+ };
+ /* if we used WB_SYNC_ALL, sync_inode waits for the io for the
+ * inode to finish. So WB_SYNC_NONE is sent down to sync_inode
+ * and filemap_fdatawrite is used for the data blocks
+ */
+ ret = sync_inode(inode, &wbc);
+ if (!ret)
+ ret = filemap_fdatawrite(mapping);
+ return ret;
+}
+
+/*
+ * write data and metadata corresponding to i1 and i2. The io is
+ * started but we do not wait for any of it to finish.
+ *
+ * filemap_flush is used for the block device, so if there is a dirty
+ * page for a block already in flight, we will not wait and start the
+ * io over again
+ */
+int fat_flush_inodes(struct super_block *sb, struct inode *i1, struct inode *i2)
+{
+ int ret = 0;
+ if (!MSDOS_SB(sb)->options.flush)
+ return 0;
+ if (i1)
+ ret = writeback_inode(i1);
+ if (!ret && i2)
+ ret = writeback_inode(i2);
+ if (!ret && sb) {
+ struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
+ ret = filemap_flush(mapping);
+ }
+ return ret;
+}
+EXPORT_SYMBOL_GPL(fat_flush_inodes);
+
static int __init init_fat_fs(void)
{
int err;
diff -r 8a50d9b6ce6f fs/msdos/namei.c
--- a/fs/msdos/namei.c Fri Aug 04 14:02:26 2006 -0400
+++ b/fs/msdos/namei.c Mon Aug 07 14:35:17 2006 -0400
@@ -280,7 +280,7 @@ static int msdos_create(struct inode *di
struct nameidata *nd)
{
struct super_block *sb = dir->i_sb;
- struct inode *inode;
+ struct inode *inode = NULL;
struct fat_slot_info sinfo;
struct timespec ts;
unsigned char msdos_name[MSDOS_NAME];
@@ -316,6 +316,8 @@ static int msdos_create(struct inode *di
d_instantiate(dentry, inode);
out:
unlock_kernel();
+ if (!err)
+ err = fat_flush_inodes(sb, dir, inode);
return err;
}
@@ -348,6 +350,8 @@ static int msdos_rmdir(struct inode *dir
fat_detach(inode);
out:
unlock_kernel();
+ if (!err)
+ err = fat_flush_inodes(inode->i_sb, dir, inode);
return err;
}
@@ -401,6 +405,7 @@ static int msdos_mkdir(struct inode *dir
d_instantiate(dentry, inode);
unlock_kernel();
+ fat_flush_inodes(sb, dir, inode);
return 0;
out_free:
@@ -430,6 +435,8 @@ static int msdos_unlink(struct inode *di
fat_detach(inode);
out:
unlock_kernel();
+ if (!err)
+ err = fat_flush_inodes(inode->i_sb, dir, inode);
return err;
}
@@ -635,6 +642,8 @@ static int msdos_rename(struct inode *ol
new_dir, new_msdos_name, new_dentry, is_hid);
out:
unlock_kernel();
+ if (!err)
+ err = fat_flush_inodes(old_dir->i_sb, old_dir, new_dir);
return err;
}
diff -r 8a50d9b6ce6f include/linux/msdos_fs.h
--- a/include/linux/msdos_fs.h Fri Aug 04 14:02:26 2006 -0400
+++ b/include/linux/msdos_fs.h Mon Aug 07 14:35:17 2006 -0400
@@ -204,6 +204,7 @@ struct fat_mount_options {
unicode_xlate:1, /* create escape sequences for unhandled Unicode */
numtail:1, /* Does first alias have a numeric '~1' type tail? */
atari:1, /* Use Atari GEMDOS variation of MS-DOS fs */
+ flush:1, /* write things quickly */
nocase:1; /* Does this need case conversion? 0=need case conversion*/
};
@@ -412,6 +413,8 @@ extern int fat_fill_super(struct super_b
extern int fat_fill_super(struct super_block *sb, void *data, int silent,
struct inode_operations *fs_dir_inode_ops, int isvfat);
+extern int fat_flush_inodes(struct super_block *sb, struct inode *i1,
+ struct inode *i2);
/* fat/misc.c */
extern void fat_fs_panic(struct super_block *s, const char *fmt, ...);
extern void fat_clusters_flush(struct super_block *sb);
next prev parent reply other threads:[~2006-08-07 20:24 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-08-04 19:27 [PATCH] add -o flush for fat Chris Mason
2006-08-05 1:31 ` Andrew Morton
2006-08-05 12:26 ` Chris Mason
2006-08-05 19:12 ` Andrew Morton
2006-08-05 20:54 ` OGAWA Hirofumi
2006-08-07 20:23 ` Chris Mason [this message]
2006-08-07 23:58 ` Andrew Morton
2006-08-08 8:05 ` Chris Mason
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20060807202355.GF26133@watt.suse.com \
--to=mason@suse.com \
--cc=akpm@osdl.org \
--cc=hirofumi@mail.parknet.co.jp \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.