All of lore.kernel.org
 help / color / mirror / Atom feed
From: Feng Tang <feng.tang@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: no To-header on input <""@suse.de>,
	"Wu, Fengguang" <fengguang.wu@intel.com>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] jbd2: avoid the concurrent data writeback
Date: Wed, 17 Nov 2010 09:36:21 +0800	[thread overview]
Message-ID: <20101117093621.29fed027@feng-i7> (raw)
In-Reply-To: <20101116121323.GE4757@quack.suse.cz>

Hi Jan,

On Tue, 16 Nov 2010 20:13:23 +0800
Jan Kara <jack@suse.cz> wrote:

>   Hi,
> 
>   sorry for chiming in a bit late...
> On Mon 15-11-10 17:59:43, Feng Tang wrote:
> > From b16cfc5a560f2549ac69dbb235a550500ea1719f Mon Sep 17 00:00:00
> > 2001 From: Feng Tang <feng.tang@intel.com>
> > Date: Mon, 15 Nov 2010 21:06:44 +0800
> > Subject: [PATCH] jbd2: avoid the concurrent data writeback
> > 
> > When dd a big file to an ext4 partition, it is very likely to happen
> > that both the background flush thread and kjounald try to do data
> > writeback for it, that the flush thread is doing the writeback for
> > this file and jbd2 thread are also waken up to commit the
> > transaction. Because kjounald only calls the generic_writepages()
> > whose path doesn't really allocate disk blocks, the ext4_witepage()
> > may be called lots of times (100000+ for a 1g file dd) without
> > really writing one page back (skipped), which will consume lots of
> > unnecessary CPU time
> > 
> > This could be found by a simple test case with ftrace:
> > $ sync;
> > $ echo 40960 > buffer_size_kb;echo 1 > events/writeback/enable;echo
> > 1 > events/jbd2/enable;echo 1 > events/ext4/enable; $ dd
> > if=/dev/zero of=/home/test/1g.bin bs=1M count=1024;sync; $ cat
> > trace > /home/test/jbd2_ext4_1g_dd.log $ grep -c
> > wcb_writepage /home/test/jbd2_ext4_1g_dd.log
> > 
> > This patch will check if the inode is under data syncing, if yes
> > then don't start the writeback from kjournald
> > 
> > The Perf statics (On my Core Duo 2 + 4G RAM + SATA disk + Ext4 in
> > all default modes): before the patch >  112191
> > writeback:wbc_writepage  #      0.005 M/sec after the patch  >  54
> > writeback:wbc_writepage  #      0.000 M/sec
> > 
> > Signed-off-by: Feng Tang <feng.tang@intel.com>
> > ---
> >  fs/jbd2/commit.c |   11 +++++++++++
> >  1 files changed, 11 insertions(+), 0 deletions(-)
> > 
> > diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> > index f3ad159..0f3e356 100644
> > --- a/fs/jbd2/commit.c
> > +++ b/fs/jbd2/commit.c
> > @@ -170,6 +170,10 @@ static int
> > journal_wait_on_commit_record(journal_t *journal,
> >   * We don't do block allocation here even for delalloc. We don't
> >   * use writepages() because with dealyed allocation we may be doing
> >   * block allocation in writepages().
> > + *
> > + * Sometimes when this get called, the host inode may be under data
> > + * syncing initiated by flush thread(especially for a large file),
> > and 
> > + * in such situation, we should skip this path of writeback
> >   */
> >  static int journal_submit_inode_data_buffers(struct address_space
> > *mapping) {
> > @@ -181,6 +185,13 @@ static int
> > journal_submit_inode_data_buffers(struct address_space
> > *mapping) .range_end = i_size_read(mapping->host), };
> >  
> > +	spin_lock(&inode_lock);
> > +	if (mapping->host->i_state & I_SYNC) {
> > +		spin_unlock(&inode_lock);
> > +		return 0;
> > +	}
> > +	spin_unlock(&inode_lock);
> > +
>   Sorry, but this is just wrong. Not only because of inode_lock as
> Christoph pointed out but mainly principially. ext4 and ocfs2 in
> data=ordered mode rely on data pages (with underlying blocks already
> allocated) being written out before transaction commit proceeds for
> data integrity. So you cannot just go and remove the writeback saying
> it improves performance.
> 
> I'm not saying that ext4 handling of ordered mode does not need a
> revision (we actually talked with Ted about it at Kernel Summit). But
> the solution for it is to use IO completion callback to do extent
> tree manipulations and stop using JBD2 for data syncing. We already
> do that for direct IO and conversion of preallocated space so doing
> it in all cases should be reasonably easy. Until that happens, you
> can run ext4 in data=writeback mode which will also stop JBD2 from
> doing the writeback (and effectively is rather similar to your patch).

Glad to know that the revision is on the way, and thanks for the 
detailed clarification.

- Feng 

      reply	other threads:[~2010-11-17  1:36 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1289827533-2576-1-git-send-email-feng.tang@intel.com>
2010-11-15  5:54 ` [PATCH] jbd2: avoid the concurrent data writeback Wu Fengguang
2010-11-15  9:59   ` Feng Tang
2010-11-15 11:27     ` Christoph Hellwig
2010-11-16  8:13       ` Feng Tang
2010-11-16 12:13     ` Jan Kara
2010-11-17  1:36       ` Feng Tang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101117093621.29fed027@feng-i7 \
    --to=feng.tang@intel.com \
    --cc=fengguang.wu@intel.com \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.