From: Wu Fengguang <fengguang.wu@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Christoph Hellwig <hch@infradead.org>,
Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH 1/2] writeback: Improve busyloop prevention
Date: Thu, 3 Nov 2011 22:52:22 +0800
Message-ID: <20111103145222.GA9681@localhost>
In-Reply-To: <20111103015136.GB13266@quack.suse.cz>
On Thu, Nov 03, 2011 at 09:51:36AM +0800, Jan Kara wrote:
> On Thu 03-11-11 02:56:03, Wu Fengguang wrote:
> > On Fri, Oct 28, 2011 at 04:31:04AM +0800, Jan Kara wrote:
> > > On Thu 27-10-11 14:31:33, Wu Fengguang wrote:
> > > > On Fri, Oct 21, 2011 at 06:26:16AM +0800, Jan Kara wrote:
> > > > > On Thu 20-10-11 21:39:38, Wu Fengguang wrote:
> > > > > > On Thu, Oct 20, 2011 at 08:33:00PM +0800, Wu Fengguang wrote:
> > > > > > > On Thu, Oct 20, 2011 at 08:09:09PM +0800, Wu Fengguang wrote:
> > > > > > > > Jan,
> > > > > > > >
> > > > > > > > I tried the below combined patch over the ioless one, and found some
> > > > > > > > minor regressions. I studied the thresh=1G/ext3-1dd case in particular
> > > > > > > > and found that nr_writeback and the iostat avgrq-sz drop from time to time.
> > > > > > > >
> > > > > > > > I'll try to bisect the changeset.
> > > > > >
> > > > > > This is interesting: the culprit turns out to be patch 1, which is
> > > > > > simply
> > > > > >                 if (work->for_kupdate) {
> > > > > >                         oldest_jif = jiffies -
> > > > > >                                 msecs_to_jiffies(dirty_expire_interval * 10);
> > > > > > -                       work->older_than_this = &oldest_jif;
> > > > > > -               }
> > > > > > +               } else if (work->for_background)
> > > > > > +                       oldest_jif = jiffies;
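
Just to make the effect concrete, the relevant part of wb_writeback() ends up
looking roughly like the sketch below after the patch. This is hand-written
from the hunk above rather than copied from the tree, so take the surrounding
loop structure and comments as an approximation:

	for (;;) {
		if (work->for_kupdate) {
			/* kupdate work: keep the cutoff a fixed
			 * dirty_expire_interval in the past */
			oldest_jif = jiffies -
				msecs_to_jiffies(dirty_expire_interval * 10);
		} else if (work->for_background) {
			/* background work: refresh the cutoff on every
			 * iteration, so inodes redirtied meanwhile (e.g.
			 * the bdev inode dirtied by kjournald) become
			 * eligible for writeback again */
			oldest_jif = jiffies;
		}
		/* ... queue and write back expired inodes ... */
	}
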
> > > > > Yeah. I had a look into the trace and you can notice that during the
> > > > > whole dd run, we were running a single background writeback work (you can
> > > > > verify that by work->nr_pages decreasing steadily). Without refreshing
> > > > > oldest_jif, we'd write the block device inode for /dev/sda (you can
> > > > > identify it by bdi=8:0, ino=0) only once. When refreshing oldest_jif, we
> > > > > write it every 5 seconds (kjournald dirties the device inode after
> > > > > committing a transaction by dirtying the metadata buffers which were just
> > > > > committed and can now be checkpointed either by kjournald or the flusher
> > > > > thread). So although the performance is slightly reduced, I'd say the
> > > > > behavior is the desired one.
> > > > >
> > > > > Also, if you observed the performance over a really long run, the
> > > > > difference should get smaller, because eventually kjournald has to flush
> > > > > the metadata blocks when the journal fills up and we need to free some
> > > > > journal space. At that point flushing is even more expensive because we
> > > > > have to do a blocking write, during which all transaction operations, and
> > > > > thus effectively the whole filesystem, are blocked.
> > > >
> > > > Jan, I got figures for test case
> > > >
> > > > ext3-1dd-4k-8p-2941M-1000M:10-3.1.0-rc9-ioless-full-nfs-wq5-next-20111014+
> > > >
> > > > There is not a single drop of nr_writeback in the longer 1200s run,
> > > > which wrote ~60GB of data.
> > > I did some calculations. The default journal size for a filesystem of
> > > your size is 128 MB, which allows recording around 128 GB of data. So your
> > > test probably didn't hit the point where the journal is recycled yet. An
> > > easy way to make sure the journal gets recycled is to set its size to a
> > > lower value when creating the filesystem with
> > > mke2fs -J size=8
> > >
> > > Then, at the latest after writing 8 GB, the effect of journal recycling
> > > should be visible (I suggest writing at least 16 GB or so, so that we can
> > > see some pattern). Also note that without the patch altering background
> > > writeback, kjournald will do all the writeback of the metadata, and
> > > kjournald works with buffer heads. Thus the IO it does is *not* accounted
> > > in the mm statistics. You will observe its effects only as a sudden
> > > increase in await or svctm, because the disk gets busy with IO you don't
> > > see. Secondarily, you could probably also observe it as a hiccup in the
> > > number of dirtied/written pages.
> >
> > Jan, finally the `correct' results for "-J size=8" w/o the patch
> > altering background writeback.
> >
> > I noticed the periodic small drops of nr_writeback in
> > global_dirty_state.png; other than that it looks pretty good.
> If you look at the iostat graphs, you'll notice periodic increases in await
> time at roughly 100 s intervals. I believe this could be checkpointing
Yes it is. And there are frequent drops in the IO queue size.
> that's going on in the background. Also there are (negative) peaks in the
> "paused" graph.
Yeah, this happens on all ext3/4 workloads and I'm kind of used to it ;)
> Anyway, the main question is - do you see any throughput
> difference with/without the background writeback patch with the small
> journal?
Here are the comparisons w/o the patch. The results w/ the patch
should be available tomorrow.
wfg@bee /export/writeback% ./compare.rb -g ext4 -c fs -e io_wkB_s thresh*/*-20111102+
ext4 ext4:jsize=8
------------------------ ------------------------
47684.35 -0.3% 47546.62 thresh=1000M/X-100dd-4k-8p-4096M-1000M:10-3.1.0-ioless-full-next-20111102+
54015.86 -1.6% 53166.76 thresh=1000M/X-10dd-4k-8p-4096M-1000M:10-3.1.0-ioless-full-next-20111102+
55320.03 +0.6% 55657.48 thresh=1000M/X-1dd-4k-8p-4096M-1000M:10-3.1.0-ioless-full-next-20111102+
44271.29 +2.6% 45443.23 thresh=100M/X-10dd-4k-8p-4096M-100M:10-3.1.0-ioless-full-next-20111102+
54334.22 -1.0% 53801.15 thresh=100M/X-1dd-4k-8p-4096M-100M:10-3.1.0-ioless-full-next-20111102+
52563.67 -0.7% 52207.05 thresh=100M/X-2dd-4k-8p-4096M-100M:10-3.1.0-ioless-full-next-20111102+
308189.41 -0.1% 307822.30 TOTAL io_wkB_s
wfg@bee /export/writeback% ./compare.rb -g ext3 -c fs -e io_wkB_s thresh*/*-20111102+
ext3 ext3:jsize=8
------------------------ ------------------------
36231.89 -1.6% 35659.34 thresh=1000M/X-100dd-4k-8p-4096M-1000M:10-3.1.0-ioless-full-next-20111102+
41115.07 -6.2% 38564.52 thresh=1000M/X-10dd-4k-8p-4096M-1000M:10-3.1.0-ioless-full-next-20111102+
48025.75 -3.8% 46213.55 thresh=1000M/X-1dd-4k-8p-4096M-1000M:10-3.1.0-ioless-full-next-20111102+
45317.31 +1.6% 46023.21 thresh=100M/X-1dd-4k-8p-4096M-100M:10-3.1.0-ioless-full-next-20111102+
40552.64 +4.0% 42182.84 thresh=100M/X-2dd-4k-8p-4096M-100M:10-3.1.0-ioless-full-next-20111102+
211242.67 -1.2% 208643.45 TOTAL io_wkB_s
I'm currently rewriting the test scripts to make them easier for others
to understand and use. They will also gain the ability to run each test
2+ times, to get a better idea of the fluctuations :)
Thanks,
Fengguang