From mboxrd@z Thu Jan  1 00:00:00 1970
From: Wu Fengguang <fengguang.wu@intel.com>
Subject: Re: [PATCH 1/2] writeback: Improve busyloop prevention
Date: Sat, 22 Oct 2011 12:20:19 +0800
Message-ID: <20111022042019.GA10287@localhost>
References: <20111013201835.GD27363@quack.suse.cz>
 <20111014160047.GA13330@localhost>
 <20111014162807.GA4617@localhost>
 <20111018005128.GI4528@quack.suse.cz>
 <20111018143504.GA17818@localhost>
 <20111019115630.GA22266@quack.suse.cz>
 <20111020120909.GA8193@localhost>
 <20111020123300.GA12317@localhost>
 <20111020133938.GA18058@localhost>
 <20111020222616.GA20542@quack.suse.cz>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from mga14.intel.com ([143.182.124.37]:32599 "EHLO mga14.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752013Ab1JVEWN (ORCPT <rfc822;linux-fsdevel@vger.kernel.org>);
	Sat, 22 Oct 2011 00:22:13 -0400
Content-Disposition: inline
In-Reply-To: <20111020222616.GA20542@quack.suse.cz>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Fri, Oct 21, 2011 at 06:26:16AM +0800, Jan Kara wrote:
> On Thu 20-10-11 21:39:38, Wu Fengguang wrote:
> > On Thu, Oct 20, 2011 at 08:33:00PM +0800, Wu Fengguang wrote:
> > > On Thu, Oct 20, 2011 at 08:09:09PM +0800, Wu Fengguang wrote:
> > > > Jan,
> > > > 
> > > > I tried the below combined patch over the ioless one, and find some
> > > > minor regressions. I studied the thresh=1G/ext3-1dd case in particular
> > > > and find that nr_writeback and the iostat avgrq-sz drops from time to time.
> > > > 
> > > > I'll try to bisect the changeset.
> > 
> > This is interesting, the culprit is found to be patch 1, which is
> > simply
> >                 if (work->for_kupdate) {
> >                         oldest_jif = jiffies -
> >                                 msecs_to_jiffies(dirty_expire_interval * 10);
> > -                       work->older_than_this = &oldest_jif;
> > -               }
> > +               } else if (work->for_background)
> > +                       oldest_jif = jiffies;
>   Yeah. I had a look into the trace and you can notice that during the
> whole dd run, we were running a single background writeback work (you can
> verify that by work->nr_pages decreasing steadily).

Yes, it is.

> Without refreshing
> oldest_jif, we'd write block device inode for /dev/sda (you can identify
> that by bdi=8:0, ino=0) only once. When refreshing oldest_jif, we write it
> every 5 seconds (kjournald dirties the device inode after committing a
> transaction by dirtying metadata buffers which were just committed and can
> now be checkpointed either by kjournald or flusher thread).

OK, now I understand the regular drops of nr_writeback and avgrq-sz:
every 5s, it takes _some time_ to write inode 0, during which the
flusher is blocked and the IO queue runs low.
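
To make the effect of the cutoff concrete, here is a rough user-space
sketch. It is not kernel code: count_bdev_inode_writes(), the one-second
time step and COMMIT_INTERVAL are all made up for illustration, and the
only thing taken from the discussion is the rule that an inode is queued
only if it was dirtied at or before the cutoff, plus the 5 second redirty
period.

```c
#include <stdbool.h>

/* Hypothetical model, not kernel code: time is in whole seconds. */
#define COMMIT_INTERVAL 5   /* assumed period at which the journal commit
                               redirties the device inode (seconds) */

/*
 * Count how often the block device inode is written during one long
 * background work lasting `duration` seconds.
 *
 * refresh_cutoff == false: the cutoff is sampled once when the work starts.
 * refresh_cutoff == true:  the cutoff is re-sampled on every flusher pass,
 *                          mimicking "oldest_jif = jiffies" in the patch.
 */
int count_bdev_inode_writes(int duration, bool refresh_cutoff)
{
	int cutoff = 0;        /* stands in for oldest_jif */
	int dirtied_when = 0;  /* the inode is already dirty at work start */
	bool dirty = true;
	int writes = 0;

	for (int now = 0; now < duration; now++) {
		/* A journal commit redirties the (clean) device inode. */
		if (now > 0 && now % COMMIT_INTERVAL == 0 && !dirty) {
			dirty = true;
			dirtied_when = now;
		}

		if (refresh_cutoff)
			cutoff = now;

		/* Only inodes dirtied at or before the cutoff are queued. */
		if (dirty && dirtied_when <= cutoff) {
			dirty = false;
			writes++;
		}
	}
	return writes;
}
```

For a 30 second work, this model writes the device inode once with the
fixed cutoff and six times with the refreshed one, which matches the
"only once" versus "every 5 seconds" behavior described above.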

> So although the performance is slightly reduced, I'd say that the
> behavior is a desired one.

OK. However it's sad to see the flusher get blocked from time to time...

> Also if you observed the performance on a really long run, the difference
> should get smaller because eventually, kjournald has to flush the metadata
> blocks when the journal fills up and we need to free some journal space and
> at that point flushing is even more expensive because we have to do a
> blocking write during which all transaction operations, thus effectively
> the whole filesystem, are blocked.

OK. The dd test ran for 300s; I'll increase it to 900s (I cannot go
longer because it's a 90GB disk partition).

Thanks,
Fengguang