linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dmitry Monakhov <dmonakhov@gmail.com>
To: jaxboe@fusionio.com
Cc: linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	Dmitry Monakhov <dmonakhov@gmail.com>
Subject: [PATCH 2/2] writeback: skip useless data integrity tricks for sync_filesystem
Date: Fri,  8 Oct 2010 19:00:27 +0400	[thread overview]
Message-ID: <1286550027-9684-2-git-send-email-dmonakhov@gmail.com> (raw)
In-Reply-To: <1286550027-9684-1-git-send-email-dmonakhov@gmail.com>

If we about to write many inodes at the time, even in for
data integrity sync, it is reasonable to skip data integrity
logic for each inode, but perform all necessary steps at the end.

The frozen sync() issue:
If we try to call sync() then other process dirties inodes in
parallels we end up with writing inodes in sync mode, which
usually result in io_barriers spam. Which result in almost 100 times
performance degradation.

___sync_task____			____writer_task____
sync_filesystem() {
  __sync_filesystem(sb, 0)
                                while(num--) {mark_inode_dirty(inode++);}
  __sync_filesystem(sb, 1)
     ->sb->s_opt->sync_fs()
 }
But in case of sync_fs we do know that final ->sync_fs() is responsible
for data integrity guaranties.

Signed-off-by: Dmitry Monakhov <dmonakhov@gmail.com>
---
 fs/ext3/inode.c           |    3 ++-
 fs/ext4/inode.c           |    4 +++-
 fs/fs-writeback.c         |    3 +++
 fs/reiserfs/inode.c       |    3 ++-
 include/linux/writeback.h |    1 +
 5 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index 5e0faf4..8d90ada 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -3153,7 +3153,8 @@ int ext3_write_inode(struct inode *inode, struct writeback_control *wbc)
 
 	if (wbc->sync_mode != WB_SYNC_ALL)
 		return 0;
-
+	if (wbc->for_sync_fs)
+		return 0;
 	return ext3_force_commit(inode->i_sb);
 }
 
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 4b8debe..8d627e8 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5416,7 +5416,9 @@ int ext4_write_inode(struct inode *inode, struct writeback_control *wbc)
 
 		if (wbc->sync_mode != WB_SYNC_ALL)
 			return 0;
-
+		/* Caller is responsible to call ->sync_fs() after writeback */
+		if (wbc->for_sync_fs)
+			return 0;
 		err = ext4_force_commit(inode->i_sb);
 	} else {
 		struct ext4_iloc iloc;
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 81e086d..4f50067 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -39,6 +39,7 @@ struct wb_writeback_work {
 	unsigned int for_kupdate:1;
 	unsigned int range_cyclic:1;
 	unsigned int for_background:1;
+	unsigned int for_sync_fs:1;
 
 	struct list_head list;		/* pending work list */
 	struct completion *done;	/* set if the caller waits */
@@ -600,6 +601,7 @@ static long wb_writeback(struct bdi_writeback *wb,
 		.older_than_this	= NULL,
 		.for_kupdate		= work->for_kupdate,
 		.for_background		= work->for_background,
+		.for_sync_fs		= work->for_sync_fs,
 		.range_cyclic		= work->range_cyclic,
 	};
 	unsigned long oldest_jif;
@@ -1124,6 +1126,7 @@ void sync_inodes_sb(struct super_block *sb)
 		.sync_mode	= WB_SYNC_ALL,
 		.nr_pages	= LONG_MAX,
 		.range_cyclic	= 0,
+		.for_sync_fs	= 1,
 		.done		= &done,
 	};
 
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index caa7583..244bf0e 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -1638,7 +1638,8 @@ int reiserfs_write_inode(struct inode *inode, struct writeback_control *wbc)
 	 ** inode needs to reach disk for safety, and they can safely be
 	 ** ignored because the altered inode has already been logged.
 	 */
-	if (wbc->sync_mode == WB_SYNC_ALL && !(current->flags & PF_MEMALLOC)) {
+	if (wbc->sync_mode == WB_SYNC_ALL && !(current->flags & PF_MEMALLOC) &&
+		!wbc->for_sync_fs) {
 		reiserfs_write_lock(inode->i_sb);
 		if (!journal_begin(&th, inode->i_sb, jbegin_count)) {
 			reiserfs_update_sd(&th, inode);
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 72a5d64..a2f7c57 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -50,6 +50,7 @@ struct writeback_control {
 	unsigned for_kupdate:1;		/* A kupdate writeback */
 	unsigned for_background:1;	/* A background writeback */
 	unsigned for_reclaim:1;		/* Invoked from the page allocator */
+	unsigned for_sync_fs:1;		/* Invoked from sync_filesystem*/
 	unsigned range_cyclic:1;	/* range_start is cyclic */
 	unsigned more_io:1;		/* more io to be dispatched */
 };
-- 
1.6.6.1


  reply	other threads:[~2010-10-08 15:00 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-10-08 15:00 [PATCH 1/2] writeback: sync quota after inodes writeback Dmitry Monakhov
2010-10-08 15:00 ` Dmitry Monakhov [this message]
2010-10-08 17:09   ` [PATCH 2/2] writeback: skip useless data integrity tricks for sync_filesystem Christoph Hellwig
2010-10-08 17:08 ` [PATCH 1/2] writeback: sync quota after inodes writeback Christoph Hellwig
2010-10-20 12:25   ` Jan Kara
2010-10-20 15:11     ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1286550027-9684-2-git-send-email-dmonakhov@gmail.com \
    --to=dmonakhov@gmail.com \
    --cc=jaxboe@fusionio.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).