* [PATCH 1/3] block: Add block_flush_device()
[not found] ` <1234768181.32677.7.camel@sebastian.kern.oss.ntt.co.jp>
@ 2009-02-16 7:25 ` Fernando Luis Vázquez Cao
2009-02-16 7:29 ` [2/3] ext3: call block_flush_device() on fsync Fernando Luis Vázquez Cao
2009-02-16 7:31 ` [PATCH 3/3] ext4: " Fernando Luis Vázquez Cao
2 siblings, 0 replies; 4+ messages in thread
From: Fernando Luis Vázquez Cao @ 2009-02-16 7:25 UTC (permalink / raw)
To: Theodore Tso
Cc: Jan Kara, Alan Cox, Pavel Machek, kernel list, Jens Axboe,
sandeen, fernando, rwheeler, linux-fsdevel
This patch adds a helper function that should be used by filesystems that need
to flush the underlying block device on fsync()/fdatasync().
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-2.6.29-rc4-orig/fs/buffer.c linux-2.6.29-rc4/fs/buffer.c
--- linux-2.6.29-rc4-orig/fs/buffer.c 2009-02-16 14:45:11.000000000 +0900
+++ linux-2.6.29-rc4/fs/buffer.c 2009-02-16 14:53:26.000000000 +0900
@@ -165,6 +165,17 @@ void end_buffer_write_sync(struct buffer
put_bh(bh);
}
+/* Issue flush of write caches on the block device */
+int block_flush_device(struct super_block *sb)
+{
+ int ret = 0;
+
+ ret = blkdev_issue_flush(sb->s_bdev, NULL);
+
+ return (ret == -EOPNOTSUPP) ? 0 : ret;
+}
+EXPORT_SYMBOL(block_flush_device);
+
/*
* Write out and wait upon all the dirty data associated with a block
* device via its mapping. Does not take the superblock lock.
diff -urNp linux-2.6.29-rc4-orig/include/linux/buffer_head.h linux-2.6.29-rc4/include/linux/buffer_head.h
--- linux-2.6.29-rc4-orig/include/linux/buffer_head.h 2009-02-16 14:45:12.000000000 +0900
+++ linux-2.6.29-rc4/include/linux/buffer_head.h 2009-02-16 14:48:28.000000000 +0900
@@ -238,6 +238,7 @@ int nobh_write_end(struct file *, struct
int nobh_truncate_page(struct address_space *, loff_t, get_block_t *);
int nobh_writepage(struct page *page, get_block_t *get_block,
struct writeback_control *wbc);
+int block_flush_device(struct super_block *sb);
void buffer_init(void);
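
For readers who want to see the error handling of the helper in isolation, here is a minimal userspace sketch. It is not kernel code: `flush_fn`, `flush_device()` and the three stub devices are hypothetical names made up for illustration, with `flush_fn` standing in for blkdev_issue_flush(). The only behavior it is meant to show is the one in the hunk above: a device that reports -EOPNOTSUPP is treated as success, while any other error is passed through to the caller.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for blkdev_issue_flush(): 0 or a negative errno. */
typedef int (*flush_fn)(void *dev);

/* Same shape as block_flush_device(): "not supported" is not an error. */
static int flush_device(flush_fn flush, void *dev)
{
	int ret = flush(dev);

	return (ret == -EOPNOTSUPP) ? 0 : ret;
}

/* Stub devices, used only for illustration. */
static int flush_ok(void *dev)          { (void)dev; return 0; }
static int flush_unsupported(void *dev) { (void)dev; return -EOPNOTSUPP; }
static int flush_io_error(void *dev)    { (void)dev; return -EIO; }
```

Presumably the masking exists so that fsync() does not start failing on devices that have no cache-flush command at all; a real I/O error such as -EIO still comes back from the helper.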
* [2/3] ext3: call block_flush_device() on fsync
From: Fernando Luis Vázquez Cao @ 2009-02-16 7:29 UTC (permalink / raw)
To: Theodore Tso
Cc: Jan Kara, Alan Cox, Pavel Machek, kernel list, Jens Axboe,
sandeen, fernando, rwheeler, linux-fsdevel
To ensure that bits are truly on-disk after an fsync or fdatasync, we
should force a disk flush explicitly when there is dirty data/metadata
and the journal didn't emit a write barrier (either because metadata is
not being synched or barriers are disabled).
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-2.6.29-rc5-orig/fs/ext3/fsync.c linux-2.6.29-rc5/fs/ext3/fsync.c
--- linux-2.6.29-rc5-orig/fs/ext3/fsync.c 2008-12-25 08:26:37.000000000 +0900
+++ linux-2.6.29-rc5/fs/ext3/fsync.c 2009-02-16 15:56:05.000000000 +0900
@@ -45,6 +45,8 @@
int ext3_sync_file(struct file * file, struct dentry *dentry, int datasync)
{
struct inode *inode = dentry->d_inode;
+ journal_t *journal = EXT3_SB(inode->i_sb)->s_journal;
+ unsigned long i_state = inode->i_state;
int ret = 0;
J_ASSERT(ext3_journal_current_handle() == NULL);
@@ -69,23 +71,30 @@ int ext3_sync_file(struct file * file, s
*/
if (ext3_should_journal_data(inode)) {
ret = ext3_force_commit(inode->i_sb);
- goto out;
+ if (!(journal->j_flags & JFS_BARRIER))
+ block_flush_device(inode->i_sb);
+ return ret;
}
- if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
- goto out;
+ if (datasync && !(i_state & I_DIRTY_DATASYNC)) {
+ if (i_state & I_DIRTY_PAGES)
+ block_flush_device(inode->i_sb);
+ return ret;
+ }
/*
* The VFS has written the file data. If the inode is unaltered
* then we need not start a commit.
*/
- if (inode->i_state & (I_DIRTY_SYNC|I_DIRTY_DATASYNC)) {
+ if (i_state & (I_DIRTY_SYNC|I_DIRTY_DATASYNC)) {
struct writeback_control wbc = {
.sync_mode = WB_SYNC_ALL,
.nr_to_write = 0, /* sys_fsync did this */
};
ret = sync_inode(inode, &wbc);
+ if (journal && !(journal->j_flags & JFS_BARRIER))
+ block_flush_device(inode->i_sb);
}
-out:
+
return ret;
}
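
The branch structure of the patched ext3_sync_file() can be hard to read in diff form, so below is a rough userspace model of just the "do we issue an explicit flush?" decision. It is a sketch under stated assumptions: `struct sync_ctx`, `needs_explicit_flush()` and the `DIRTY_*` constants are hypothetical stand-ins (with arbitrary values) for the journal flags and `I_DIRTY_*` inode state bits, and the commit and writeback steps themselves are left out.

```c
#include <stdbool.h>

/* Illustrative stand-ins for the I_DIRTY_* bits; the values are arbitrary. */
enum {
	DIRTY_SYNC     = 1 << 0,
	DIRTY_DATASYNC = 1 << 1,
	DIRTY_PAGES    = 1 << 2,
};

/* Hypothetical snapshot of what the fsync path looks at. */
struct sync_ctx {
	bool journal_data;  /* models ext3_should_journal_data(inode) */
	bool has_journal;   /* models journal != NULL */
	bool barriers;      /* models journal->j_flags & JFS_BARRIER */
	bool datasync;      /* fdatasync() rather than fsync() */
	unsigned i_state;   /* inode state sampled on entry */
};

/* Returns true where the patched code would call block_flush_device(). */
static bool needs_explicit_flush(const struct sync_ctx *c)
{
	/* data=journal: a commit was forced; flush only if it had no barrier. */
	if (c->journal_data)
		return !c->barriers;

	/* fdatasync with clean metadata: no commit, so flush written pages. */
	if (c->datasync && !(c->i_state & DIRTY_DATASYNC))
		return (c->i_state & DIRTY_PAGES) != 0;

	/* Inode was written back; flush if the journal did not use a barrier. */
	if (c->i_state & (DIRTY_SYNC | DIRTY_DATASYNC))
		return c->has_journal && !c->barriers;

	return false;
}
```

In this model a plain fsync() on an inode whose only set bit is `DIRTY_PAGES` returns false, which mirrors the fall-through to `return ret` in the diff above.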
* [PATCH 3/3] ext4: call block_flush_device() on fsync
From: Fernando Luis Vázquez Cao @ 2009-02-16 7:31 UTC (permalink / raw)
To: Theodore Tso
Cc: Jan Kara, Alan Cox, Pavel Machek, kernel list, Jens Axboe,
sandeen, fernando, rwheeler, linux-fsdevel
To ensure that bits are truly on-disk after an fsync or fdatasync, we
should force a disk flush explicitly when there is dirty data/metadata
and the journal didn't emit a write barrier (either because metadata is
not being synched or barriers are disabled).
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
---
diff -urNp linux-2.6.29-rc5-orig/fs/ext4/fsync.c linux-2.6.29-rc5/fs/ext4/fsync.c
--- linux-2.6.29-rc5-orig/fs/ext4/fsync.c 2008-12-25 08:26:37.000000000 +0900
+++ linux-2.6.29-rc5/fs/ext4/fsync.c 2009-02-16 15:52:56.000000000 +0900
@@ -48,6 +48,7 @@ int ext4_sync_file(struct file *file, st
{
struct inode *inode = dentry->d_inode;
journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
+ unsigned long i_state = inode->i_state;
int ret = 0;
J_ASSERT(ext4_journal_current_handle() == NULL);
@@ -76,25 +77,30 @@ int ext4_sync_file(struct file *file, st
*/
if (ext4_should_journal_data(inode)) {
ret = ext4_force_commit(inode->i_sb);
- goto out;
+ if (!(journal->j_flags & JBD2_BARRIER))
+ block_flush_device(inode->i_sb);
+ return ret;
}
- if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
- goto out;
+ if (datasync && !(i_state & I_DIRTY_DATASYNC)) {
+ if (i_state & I_DIRTY_PAGES)
+ block_flush_device(inode->i_sb);
+ return ret;
+ }
/*
* The VFS has written the file data. If the inode is unaltered
* then we need not start a commit.
*/
- if (inode->i_state & (I_DIRTY_SYNC|I_DIRTY_DATASYNC)) {
+ if (i_state & (I_DIRTY_SYNC|I_DIRTY_DATASYNC)) {
struct writeback_control wbc = {
.sync_mode = WB_SYNC_ALL,
.nr_to_write = 0, /* sys_fsync did this */
};
ret = sync_inode(inode, &wbc);
- if (journal && (journal->j_flags & JBD2_BARRIER))
- blkdev_issue_flush(inode->i_sb->s_bdev, NULL);
+ if (journal && !(journal->j_flags & JBD2_BARRIER))
+ block_flush_device(inode->i_sb);
}
-out:
+
return ret;
}
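
From userspace, the code path touched by these patches is reached through fsync() and fdatasync(). The following is a minimal sketch of a caller that relies on those guarantees; `write_and_sync()` is a hypothetical helper name, not part of the series, and it uses only standard POSIX calls. Whether the data actually reaches the platter after the call returns depends on the filesystem and device behavior discussed in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write len bytes to path and ask the kernel to push them to stable storage.
 * Returns 0 on success or a negative errno value.
 */
static int write_and_sync(const char *path, const void *buf, size_t len,
			  int data_only)
{
	const char *p = buf;
	int ret = 0;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -errno;

	/* write() may be partial or interrupted, so loop until done. */
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			ret = -errno;
			goto out;
		}
		p += n;
		len -= (size_t)n;
	}

	/* fdatasync() may skip metadata that is not needed to read the data. */
	if ((data_only ? fdatasync(fd) : fsync(fd)) < 0)
		ret = -errno;
out:
	if (close(fd) < 0 && ret == 0)
		ret = -errno;
	return ret;
}
```

A caller would use `data_only = 1` for the fdatasync() case, which is the branch where the patches add a flush when only `I_DIRTY_PAGES` is set.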
* Re: vfs: Add MS_FLUSHONFSYNC mount flag
[not found] ` <20090215225427.GH10706@mini-me.lan>
@ 2009-02-16 7:47 ` Fernando Luis Vázquez Cao
0 siblings, 0 replies; 4+ messages in thread
From: Fernando Luis Vázquez Cao @ 2009-02-16 7:47 UTC (permalink / raw)
To: Theodore Tso
Cc: Christoph Hellwig, Jeff Garzik, Eric Sandeen, Jan Kara, Alan Cox,
Pavel Machek, kernel list, Jens Axboe, fernando, Ric Wheeler,
linux-fsdevel
On Sun, 2009-02-15 at 17:54 -0500, Theodore Tso wrote:
> On Sun, Feb 15, 2009 at 04:23:26PM +0900, Fernando Luis Vázquez Cao wrote:
> > You mentioned "we should integrate this with the barrier settings". Do
> > you imply we should make it a per-device tunable too? Should we keep the
> > barrier-related mount options some filesystems provide?
> >
>
> Making barriers a per-device tunable makes sense. The only
> reason why we kept it as a mount option in ext4 is for benchmarking
> purposes, and in ext3, because the filesystem predated the barrier
> code, and there was a desire to be able to benchmark with and without
> the old behavior --- and because akpm is still worried about the
> performance hit of the barrier code, so he's been resistant to
> changing the default for ext3.
Ok, I'll turn both barriers and flushonfsync into a sysfs-exported
per-device knob and see how it turns out.
By the way, should we also add/keep a mount option for "benchmarking
purposes"? I guess that once we get the per-device tunable we probably
will not need it anymore.
Regards,
Fernando