linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] ext3[34] barrier changes
@ 2008-05-16 14:37 Eric Sandeen
  2008-05-16 14:37 ` [PATCH 1/4] ext3: enable barriers by default Eric Sandeen
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Eric Sandeen @ 2008-05-16 14:37 UTC (permalink / raw)
  To: sandeen


A collection of patches to make ext3 & 4 use barriers by
default, and to call blkdev_issue_flush on fsync if they
are enabled.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/4] ext3: enable barriers by default
  2008-05-16 14:37 [PATCH 0/4] ext3[34] barrier changes Eric Sandeen
@ 2008-05-16 14:37 ` Eric Sandeen
  2008-05-16 14:37 ` [PATCH 2/4] ext3: call blkdev_issue_flush on fsync Eric Sandeen
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Eric Sandeen @ 2008-05-16 14:37 UTC (permalink / raw)
  To: sandeen; +Cc: Eric Sandeen

I can't think of any valid reason for ext3 to not use barriers when
they are available;  I believe this is necessary for filesystem
integrity in the face of a volatile write cache on storage.

An administrator who trusts that the cache is sufficiently battery-
backed (and power supplies are sufficiently redundant, etc...)
can always turn it back off again.

SuSE has carried such a patch for quite some time now.

Also document the mount option while we're at it.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Acked-by: Jan Kara <jack@suse.cz>
---
 Documentation/filesystems/ext3.txt |   12 ++++++++++--
 fs/ext3/super.c                    |   11 +++++++++--
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt
index b45f3c1..daab1f5 100644
--- a/Documentation/filesystems/ext3.txt
+++ b/Documentation/filesystems/ext3.txt
@@ -52,8 +52,16 @@ commit=nrsec	(*)	Ext3 can be told to sync all its data and metadata
 			Setting it to very large values will improve
 			performance.
 
-barrier=1		This enables/disables barriers.  barrier=0 disables
-			it, barrier=1 enables it.
+barrier=<0|1(*)>	This enables/disables the use of write barriers in
+			the jbd code.  barrier=0 disables, barrier=1 enables.
+			This also requires an IO stack which can support
+			barriers, and if jbd gets an error on a barrier
+			write, it will disable again with a warning.
+			Write barriers enforce proper on-disk ordering
+			of journal commits, making volatile disk write caches
+			safe to use, at some performance penalty.  If
+			your disks are battery-backed in one way or another,
+			disabling barriers may safely improve performance.
 
 orlov		(*)	This enables the new Orlov block allocator. It is
 			enabled by default.
diff --git a/fs/ext3/super.c b/fs/ext3/super.c
index fe3119a..9c30dc7 100644
--- a/fs/ext3/super.c
+++ b/fs/ext3/super.c
@@ -555,6 +555,7 @@ static int ext3_show_options(struct seq_file *seq, struct vfsmount *vfs)
 	struct super_block *sb = vfs->mnt_sb;
 	struct ext3_sb_info *sbi = EXT3_SB(sb);
 	struct ext3_super_block *es = sbi->s_es;
+	journal_t *journal = sbi->s_journal;
 	unsigned long def_mount_opts;
 
 	def_mount_opts = le32_to_cpu(es->s_default_mount_opts);
@@ -613,8 +614,13 @@ static int ext3_show_options(struct seq_file *seq, struct vfsmount *vfs)
 		seq_printf(seq, ",commit=%u",
 			   (unsigned) (sbi->s_commit_interval / HZ));
 	}
-	if (test_opt(sb, BARRIER))
-		seq_puts(seq, ",barrier=1");
+	/*
+ 	 * jbd inherits the barrier flag from ext3, and may actually
+ 	 * turn off barriers if a write fails, so it's the real test.
+ 	 */
+	if (!test_opt(sb, BARRIER) ||
+	    (journal && !(journal->j_flags & JFS_BARRIER)))
+		seq_puts(seq, ",barrier=0");
 	if (test_opt(sb, NOBH))
 		seq_puts(seq, ",nobh");
 
@@ -1589,6 +1595,7 @@ static int ext3_fill_super (struct super_block *sb, void *data, int silent)
 	sbi->s_resgid = le16_to_cpu(es->s_def_resgid);
 
 	set_opt(sbi->s_mount_opt, RESERVATION);
+	set_opt(sbi->s_mount_opt, BARRIER);
 
 	if (!parse_options ((char *) data, sb, &journal_inum, &journal_devnum,
 			    NULL, 0))
-- 
1.5.3.6


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/4] ext3: call blkdev_issue_flush on fsync
  2008-05-16 14:37 [PATCH 0/4] ext3[34] barrier changes Eric Sandeen
  2008-05-16 14:37 ` [PATCH 1/4] ext3: enable barriers by default Eric Sandeen
@ 2008-05-16 14:37 ` Eric Sandeen
  2008-05-16 14:37 ` [PATCH 3/4] ext4: enable barriers by default Eric Sandeen
  2008-05-16 14:37 ` [PATCH 4/4] ext4: call blkdev_issue_flush on fsync Eric Sandeen
  3 siblings, 0 replies; 7+ messages in thread
From: Eric Sandeen @ 2008-05-16 14:37 UTC (permalink / raw)
  To: sandeen; +Cc: Eric Sandeen

To ensure that bits are truly on-disk after an fsync,
ext3 should call blkdev_issue_flush if barriers are supported.

Inspired by an old thread on barriers, by reiserfs & xfs
which do the same, and by a patch SuSE ships with their kernel

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---
 fs/ext3/fsync.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/fs/ext3/fsync.c b/fs/ext3/fsync.c
index d336341..f6167ec 100644
--- a/fs/ext3/fsync.c
+++ b/fs/ext3/fsync.c
@@ -27,6 +27,7 @@
 #include <linux/sched.h>
 #include <linux/writeback.h>
 #include <linux/jbd.h>
+#include <linux/blkdev.h>
 #include <linux/ext3_fs.h>
 #include <linux/ext3_jbd.h>
 
@@ -45,6 +46,7 @@
 int ext3_sync_file(struct file * file, struct dentry *dentry, int datasync)
 {
 	struct inode *inode = dentry->d_inode;
+	journal_t *journal = EXT3_SB(inode->i_sb)->s_journal;
 	int ret = 0;
 
 	J_ASSERT(ext3_journal_current_handle() == NULL);
@@ -85,6 +87,9 @@ int ext3_sync_file(struct file * file, struct dentry *dentry, int datasync)
 			.nr_to_write = 0, /* sys_fsync did this */
 		};
 		ret = sync_inode(inode, &wbc);
+
+		if (journal && (journal->j_flags & JFS_BARRIER))
+			blkdev_issue_flush(inode->i_sb->s_bdev, NULL);
 	}
 out:
 	return ret;
-- 
1.5.3.6


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/4] ext4: enable barriers by default
  2008-05-16 14:37 [PATCH 0/4] ext3[34] barrier changes Eric Sandeen
  2008-05-16 14:37 ` [PATCH 1/4] ext3: enable barriers by default Eric Sandeen
  2008-05-16 14:37 ` [PATCH 2/4] ext3: call blkdev_issue_flush on fsync Eric Sandeen
@ 2008-05-16 14:37 ` Eric Sandeen
  2008-05-16 14:37 ` [PATCH 4/4] ext4: call blkdev_issue_flush on fsync Eric Sandeen
  3 siblings, 0 replies; 7+ messages in thread
From: Eric Sandeen @ 2008-05-16 14:37 UTC (permalink / raw)
  To: sandeen; +Cc: Eric Sandeen

I can't think of any valid reason for ext4 to not use barriers when
they are available;  I believe this is necessary for filesystem
integrity in the face of a volatile write cache on storage.

An administrator who trusts that the cache is sufficiently battery-
backed (and power supplies are sufficiently redundant, etc...)
can always turn it back off again.

SuSE has carried such a patch for ext3 for quite some time now.

Also document the mount option while we're at it.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---
 Documentation/filesystems/ext4.txt |   12 ++++++++++--
 fs/ext4/super.c                    |   11 +++++++++--
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt
index 560f88d..0c5086d 100644
--- a/Documentation/filesystems/ext4.txt
+++ b/Documentation/filesystems/ext4.txt
@@ -139,8 +139,16 @@ commit=nrsec	(*)	Ext4 can be told to sync all its data and metadata
 			Setting it to very large values will improve
 			performance.
 
-barrier=1		This enables/disables barriers.  barrier=0 disables
-			it, barrier=1 enables it.
+barrier=<0|1(*)>	This enables/disables the use of write barriers in
+			the jbd code.  barrier=0 disables, barrier=1 enables.
+			This also requires an IO stack which can support
+			barriers, and if jbd gets an error on a barrier
+			write, it will disable again with a warning.
+			Write barriers enforce proper on-disk ordering
+			of journal commits, making volatile disk write caches
+			safe to use, at some performance penalty.  If
+			your disks are battery-backed in one way or another,
+			disabling barriers may safely improve performance.
 
 orlov		(*)	This enables the new Orlov block allocator. It is
 			enabled by default.
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 52dd067..77b036a 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -671,6 +671,7 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
 	unsigned long def_mount_opts;
 	struct super_block *sb = vfs->mnt_sb;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	journal_t *journal = sbi->s_journal;
 	struct ext4_super_block *es = sbi->s_es;
 
 	def_mount_opts = le32_to_cpu(es->s_default_mount_opts);
@@ -729,8 +730,13 @@ static int ext4_show_options(struct seq_file *seq, struct vfsmount *vfs)
 		seq_printf(seq, ",commit=%u",
 			   (unsigned) (sbi->s_commit_interval / HZ));
 	}
-	if (test_opt(sb, BARRIER))
-		seq_puts(seq, ",barrier=1");
+	/*
+	 * jbd2 inherits the barrier flag from ext4, and may actually
+	 * turn off barriers if a write fails, so it's the real test.
+	 */
+	if (!test_opt(sb, BARRIER) ||
+	    (journal && !(journal->j_flags & JBD2_BARRIER)))
+		seq_puts(seq, ",barrier=0");
 	if (test_opt(sb, NOBH))
 		seq_puts(seq, ",nobh");
 	if (!test_opt(sb, EXTENTS))
@@ -1890,6 +1896,7 @@ static int ext4_fill_super (struct super_block *sb, void *data, int silent)
 	sbi->s_resgid = le16_to_cpu(es->s_def_resgid);
 
 	set_opt(sbi->s_mount_opt, RESERVATION);
+	set_opt(sbi->s_mount_opt, BARRIER);
 
 	/*
 	 * turn on extents feature by default in ext4 filesystem
-- 
1.5.3.6


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 4/4] ext4: call blkdev_issue_flush on fsync
  2008-05-16 14:37 [PATCH 0/4] ext3[34] barrier changes Eric Sandeen
                   ` (2 preceding siblings ...)
  2008-05-16 14:37 ` [PATCH 3/4] ext4: enable barriers by default Eric Sandeen
@ 2008-05-16 14:37 ` Eric Sandeen
  3 siblings, 0 replies; 7+ messages in thread
From: Eric Sandeen @ 2008-05-16 14:37 UTC (permalink / raw)
  To: sandeen; +Cc: Eric Sandeen

To ensure that bits are truly on-disk after an fsync,
we should call blkdev_issue_flush if barriers are supported.

Inspired by an old thread on barriers, by reiserfs & xfs
which do the same, and by a patch SuSE ships with their kernel

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---
 fs/ext4/fsync.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index 1c8ba48..a45c373 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -27,6 +27,7 @@
 #include <linux/sched.h>
 #include <linux/writeback.h>
 #include <linux/jbd2.h>
+#include <linux/blkdev.h>
 #include "ext4.h"
 #include "ext4_jbd2.h"
 
@@ -45,6 +46,7 @@
 int ext4_sync_file(struct file * file, struct dentry *dentry, int datasync)
 {
 	struct inode *inode = dentry->d_inode;
+	journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
 	int ret = 0;
 
 	J_ASSERT(ext4_journal_current_handle() == NULL);
@@ -85,6 +87,8 @@ int ext4_sync_file(struct file * file, struct dentry *dentry, int datasync)
 			.nr_to_write = 0, /* sys_fsync did this */
 		};
 		ret = sync_inode(inode, &wbc);
+		if (journal && (journal->j_flags & JBD2_BARRIER))
+			blkdev_issue_flush(inode->i_sb->s_bdev, NULL);
 	}
 out:
 	return ret;
-- 
1.5.3.6


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/4] ext3: call blkdev_issue_flush on fsync
  2008-05-16 19:02 [PATCH 0/4] (RESEND) ext3[34] barrier changes Eric Sandeen
@ 2008-05-16 19:07 ` Eric Sandeen
  2008-05-16 22:15   ` Jamie Lokier
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Sandeen @ 2008-05-16 19:07 UTC (permalink / raw)
  To: ext4 development, linux-kernel Mailing List, linux-fsdevel
  Cc: Andrew Morton, Jamie Lokier

To ensure that bits are truly on-disk after an fsync,
ext3 should call blkdev_issue_flush if barriers are supported.

Inspired by an old thread on barriers, by reiserfs & xfs
which do the same, and by a patch SuSE ships with their kernel

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---
 fs/ext3/fsync.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/fs/ext3/fsync.c b/fs/ext3/fsync.c
index d336341..f6167ec 100644
--- a/fs/ext3/fsync.c
+++ b/fs/ext3/fsync.c
@@ -27,6 +27,7 @@
 #include <linux/sched.h>
 #include <linux/writeback.h>
 #include <linux/jbd.h>
+#include <linux/blkdev.h>
 #include <linux/ext3_fs.h>
 #include <linux/ext3_jbd.h>
 
@@ -45,6 +46,7 @@
 int ext3_sync_file(struct file * file, struct dentry *dentry, int datasync)
 {
 	struct inode *inode = dentry->d_inode;
+	journal_t *journal = EXT3_SB(inode->i_sb)->s_journal;
 	int ret = 0;
 
 	J_ASSERT(ext3_journal_current_handle() == NULL);
@@ -85,6 +87,9 @@ int ext3_sync_file(struct file * file, struct dentry *dentry, int datasync)
 			.nr_to_write = 0, /* sys_fsync did this */
 		};
 		ret = sync_inode(inode, &wbc);
+
+		if (journal && (journal->j_flags & JFS_BARRIER))
+			blkdev_issue_flush(inode->i_sb->s_bdev, NULL);
 	}
 out:
 	return ret;
-- 1.5.3.6


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/4] ext3: call blkdev_issue_flush on fsync
  2008-05-16 19:07 ` [PATCH 2/4] ext3: call blkdev_issue_flush on fsync Eric Sandeen
@ 2008-05-16 22:15   ` Jamie Lokier
  0 siblings, 0 replies; 7+ messages in thread
From: Jamie Lokier @ 2008-05-16 22:15 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: ext4 development, linux-kernel Mailing List, linux-fsdevel,
	Andrew Morton

Eric Sandeen wrote:
> To ensure that bits are truly on-disk after an fsync,
> ext3 should call blkdev_issue_flush if barriers are supported.
> 
> Inspired by an old thread on barriers, by reiserfs & xfs
> which do the same, and by a patch SuSE ships with their kernel

blkdev_issue_flush uses BIO_RW_BARRIER, which becomes REQ_HARDBARRIER.
REQ_HARDBARRIER does _not_ mean force the data to disk.

It is a barrier request, which when optimised for the highest
performance can return a long time before the data is physically on
the disk.  (It can even do nothing, correctly).

In many cases, block drivers implement a barrier using a hardware
flush command, so it turns out the same.  Then your patch is effective.

(It is however sending unnecessary hardware commands sometimes,
because fsync's journal write (when there is one) will also issue a
flush command, and another is not required; it may reduce performance.)

But that is not what REQ_HARDBARRIER means: block drivers are allowed
to implement barriers in other ways which are more efficient.

I'm not sure if this problem is theoretical only, or if any block
drivers actually do take advantage of this.

A good fix would be to add REQ_HARDFLUSH, with a precise meaning:
don't return until the hardware reports the data safely committed.  It
would be very little change to most (if not all) current drivers.

-- Jamie

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2008-05-16 22:15 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-05-16 14:37 [PATCH 0/4] ext3[34] barrier changes Eric Sandeen
2008-05-16 14:37 ` [PATCH 1/4] ext3: enable barriers by default Eric Sandeen
2008-05-16 14:37 ` [PATCH 2/4] ext3: call blkdev_issue_flush on fsync Eric Sandeen
2008-05-16 14:37 ` [PATCH 3/4] ext4: enable barriers by default Eric Sandeen
2008-05-16 14:37 ` [PATCH 4/4] ext4: call blkdev_issue_flush on fsync Eric Sandeen
  -- strict thread matches above, loose matches on Subject: below --
2008-05-16 19:02 [PATCH 0/4] (RESEND) ext3[34] barrier changes Eric Sandeen
2008-05-16 19:07 ` [PATCH 2/4] ext3: call blkdev_issue_flush on fsync Eric Sandeen
2008-05-16 22:15   ` Jamie Lokier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).