linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Theodore Tso <tytso@MIT.EDU>
To: Chris Mason <chris.mason@oracle.com>,
	Andi Kleen <andi@firstfloor.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Eric Sandeen <sandeen@redhat.com>,
	linux-ext4@vger.kernel.org, linu
Subject: [PATCH, RFC] ext4: Fix use of write barrier in commit logic
Date: Mon, 19 May 2008 22:51:12 -0400	[thread overview]
Message-ID: <20080520025112.GN15035@mit.edu> (raw)
In-Reply-To: <20080519144654.GC15035@mit.edu>

On Mon, May 19, 2008 at 10:46:54AM -0400, Theodore Tso wrote:
> Funny thing, I was looking in this over the weekend.  It currently
> avoids barriers entirely if journal_async_commit is enabled (which is
> not the default); if enabled, it effectively forces barrier=0.  This
> is IMHO a bug.
> 
> I'm working on a patch where "barrier=1" will use a barrier before and
> after, and "barrier=1,journal_async_commit" will use a barrier after.

And here it is.... 

Comments, please.  And Eric, if you have a chance to benchmark this to
see how this patch works out, comparing "barrier=0, "barrier=1", and
"barrier=1,journal_async_commit", as we had discussed earlier on the
ext4, I'd be very much obliged.

As I mentioned earlier, adding blkdev_issue_flush() to ext[34]/fsync.c
is I believe not necessary.  We should be doing the right thing in the
commit.c file.  In the future, if we want some extra bonus points, in
the case where writes have taken place to the inode, but no metadata
operations have taken place (the common case when a database is
writing to a pre-existing tablespace), it would be nice if fsync()
could notice this case, force out the datablocks the old-fashioned way
without forcing an journal commit, and then calling
blkdev_issue_flush().  That should give us some nice performance wins
for database workloads; unfortunately it probably won't do us any good
on mailserver workloads.

Regards,

						- Ted

>From ce99d82266d003e9b2284a1f46bd6f0e3a87c066 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso@mit.edu>
Date: Mon, 19 May 2008 22:40:52 -0400
Subject: [PATCH] ext4: Fix use of write barrier in commit logic

We were previously using set_buffer_ordered() and then calling
submit_bh() to write the commit block, which resulted in this flush
behavior:

write log blocks
blkdev_issue_flush()	// flush #1
write commit block
blkdev_issue_flush()	// flush #2
write metadata blocks

And if the journal_async_commit mount option was used, the commit was
written without using set_buffer_ordered(), which resulted in no
flushes happening at all.

Change the commit code to use blkdev_issue_flush() explicitly, and to
omit flush #1 if journal_async_commit is enabled, but to always
perform the second flush if write barriers are enabled.

This change should hopefully reduce the overhead of using write
barriers when an ext4 filesystem is mounted using
"barrier=1,journal_async_commit", while still maintaining filesystem
safety.  If this works out, we should probably make this the default
mode for ext4.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/jbd2/commit.c |   44 +++++++++++++++++---------------------------
 1 files changed, 17 insertions(+), 27 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 3041d75..3fe5be5 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -22,6 +22,7 @@
 #include <linux/pagemap.h>
 #include <linux/jiffies.h>
 #include <linux/crc32.h>
+#include <linux/blkdev.h>
 
 /*
  * Default IO end handler for temporary BJ_IO buffer_heads.
@@ -111,7 +112,6 @@ static int journal_submit_commit_record(journal_t *journal,
 	struct commit_header *tmp;
 	struct buffer_head *bh;
 	int ret;
-	int barrier_done = 0;
 	struct timespec now = current_kernel_time();
 
 	if (is_journal_aborted(journal))
@@ -147,34 +147,24 @@ static int journal_submit_commit_record(journal_t *journal,
 	if (journal->j_flags & JBD2_BARRIER &&
 		!JBD2_HAS_INCOMPAT_FEATURE(journal,
 					 JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)) {
-		set_buffer_ordered(bh);
-		barrier_done = 1;
+		ret = blkdev_issue_flush(bh->b_bdev, NULL);
+		if (ret == -EOPNOTSUPP) {
+			char b[BDEVNAME_SIZE];
+
+			printk(KERN_WARNING
+			       "JBD: barrier-based sync failed on %s - "
+			       "disabling barriers\n",
+			       bdevname(journal->j_dev, b));
+			spin_lock(&journal->j_state_lock);
+			journal->j_flags &= ~JBD2_BARRIER;
+			spin_unlock(&journal->j_state_lock);
+		}
 	}
 	ret = submit_bh(WRITE, bh);
-	if (barrier_done)
-		clear_buffer_ordered(bh);
 
-	/* is it possible for another commit to fail at roughly
-	 * the same time as this one?  If so, we don't want to
-	 * trust the barrier flag in the super, but instead want
-	 * to remember if we sent a barrier request
-	 */
-	if (ret == -EOPNOTSUPP && barrier_done) {
-		char b[BDEVNAME_SIZE];

  reply	other threads:[~2008-05-20  2:53 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-16 19:02 [PATCH 0/4] (RESEND) ext3[34] barrier changes Eric Sandeen
2008-05-16 19:05 ` [PATCH 1/4] ext3: enable barriers by default Eric Sandeen
2008-05-19  8:58   ` Pavel Machek
2008-05-16 19:07 ` [PATCH 2/4] ext3: call blkdev_issue_flush on fsync Eric Sandeen
2008-05-16 22:15   ` Jamie Lokier
2008-05-16 19:08 ` [PATCH 3/4] ext4: enable barriers by default Eric Sandeen
2008-05-16 19:09 ` [PATCH 4/4] ext4: call blkdev_issue_flush on fsync Eric Sandeen
2008-05-20  2:34   ` Theodore Tso
2008-05-20 15:43     ` Jamie Lokier
2008-05-20 15:52       ` Eric Sandeen
2008-05-20 20:14         ` Jens Axboe
2008-05-20 19:54       ` Jens Axboe
2008-05-20 22:02         ` Jamie Lokier
2008-05-21  7:30           ` Jens Axboe
2008-05-16 20:05 ` [PATCH 0/4] (RESEND) ext3[34] barrier changes Andrew Morton
2008-05-16 20:53   ` Eric Sandeen
2008-05-16 20:58     ` Andrew Morton
2008-05-16 21:45       ` Jamie Lokier
2008-05-16 22:03         ` Eric Sandeen
2008-05-16 22:09           ` Jamie Lokier
2008-05-16 22:03     ` Jamie Lokier
2008-05-16 22:21       ` Eric Sandeen
2008-05-16 22:53         ` Jamie Lokier
2008-05-17  0:20           ` Theodore Tso
2008-05-17  0:35             ` Andrew Morton
2008-05-17 13:43               ` Theodore Tso
2008-05-17 17:59                 ` Andreas Dilger
2008-05-17 20:44                 ` Theodore Tso
2008-05-20 14:45                   ` Jamie Lokier
2008-05-18  0:48               ` Chris Mason
2008-05-18  1:36                 ` Theodore Tso
2008-05-18 14:49                   ` Ric Wheeler
2008-05-20 14:42                     ` Jamie Lokier
2008-05-20 23:48                     ` Jamie Lokier
     [not found]                   ` <4830420D.4080608__28835.4277647615$1211137279$gmane$org@gmail.com>
2008-05-18 19:59                     ` Andi Kleen
2008-05-18 16:07                       ` Ric Wheeler
2008-05-20 23:44                 ` Jamie Lokier
2008-05-18 20:03         ` Andi Kleen
2008-05-19  0:43           ` Theodore Tso
2008-05-19  2:29             ` Eric Sandeen
2008-05-19  4:11               ` Andrew Morton
2008-05-19 17:16                 ` Chris Mason
2008-05-19 18:39                   ` Chris Mason
2008-05-19 22:39                     ` Jan Kara
2008-05-20  0:29                       ` Chris Mason
2008-05-20  3:29                         ` Timothy Shimmin
2008-05-20 12:04                           ` Chris Mason
2008-05-20  8:25                     ` Jens Axboe
2008-05-20 12:17                       ` Chris Mason
2008-05-21 11:22                     ` Pavel Machek
2008-05-21 12:32                       ` Theodore Tso
2008-05-21 18:03                       ` Andrew Morton
2008-05-21 18:15                         ` Eric Sandeen
2008-05-21 19:43                           ` Jamie Lokier
2008-05-21 18:29                         ` Theodore Tso
2008-05-21 18:49                           ` Andrew Morton
2008-05-21 19:42                             ` Jamie Lokier
2008-05-21 19:36                           ` Jamie Lokier
2008-05-21 19:40                             ` Chris Mason
2008-05-21 19:54                         ` Jamie Lokier
2008-05-20 14:58                   ` Jamie Lokier
2008-05-21 22:30                   ` Daniel Phillips
2008-05-20 23:35               ` Jamie Lokier
2008-05-19  0:28       ` Theodore Tso
2008-05-20 15:13         ` Jamie Lokier
2008-05-21 20:25           ` Greg Smith
2008-05-16 22:30   ` Jamie Lokier
2008-05-18 19:54   ` Andi Kleen
2008-05-19 13:26     ` Chris Mason
2008-05-19 14:46       ` Theodore Tso
2008-05-20  2:51         ` Theodore Tso [this message]
2008-05-20 15:23           ` [PATCH, RFC] ext4: Fix use of write barrier in commit logic Jamie Lokier
2008-05-23 18:33         ` [PATCH 0/4] (RESEND) ext3[34] barrier changes Ric Wheeler
2008-05-20 15:36       ` Jamie Lokier
2008-05-20 16:02         ` Chris Mason
2008-05-20 16:27           ` Jamie Lokier
2008-05-20 17:08             ` Chris Mason
2008-05-20 22:26               ` Jamie Lokier
2008-05-19  9:04   ` Pavel Machek
2008-05-29 13:36   ` Eric Sandeen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080520025112.GN15035@mit.edu \
    --to=tytso@mit.edu \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=chris.mason@oracle.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=sandeen@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).