From: Wu Fengguang <fengguang.wu@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH 1/2] writeback: Improve busyloop prevention
Date: Thu, 3 Nov 2011 22:52:22 +0800
Message-ID: <20111103145222.GA9681@localhost>
In-Reply-To: <20111103015136.GB13266@quack.suse.cz>

On Thu, Nov 03, 2011 at 09:51:36AM +0800, Jan Kara wrote:
> On Thu 03-11-11 02:56:03, Wu Fengguang wrote:
> > On Fri, Oct 28, 2011 at 04:31:04AM +0800, Jan Kara wrote:
> > > On Thu 27-10-11 14:31:33, Wu Fengguang wrote:
> > > > On Fri, Oct 21, 2011 at 06:26:16AM +0800, Jan Kara wrote:
> > > > > On Thu 20-10-11 21:39:38, Wu Fengguang wrote:
> > > > > > On Thu, Oct 20, 2011 at 08:33:00PM +0800, Wu Fengguang wrote:
> > > > > > > On Thu, Oct 20, 2011 at 08:09:09PM +0800, Wu Fengguang wrote:
> > > > > > > > Jan,
> > > > > > > > 
> > > > > > > > I tried the below combined patch over the ioless one, and find some
> > > > > > > > minor regressions. I studied the thresh=1G/ext3-1dd case in particular
> > > > > > > > and find that nr_writeback and the iostat avgrq-sz drops from time to time.
> > > > > > > > 
> > > > > > > > I'll try to bisect the changeset.
> > > > > > 
> > > > > > This is interesting, the culprit is found to be patch 1, which is
> > > > > > simply
> > > > > >                 if (work->for_kupdate) {
> > > > > >                         oldest_jif = jiffies -
> > > > > >                                 msecs_to_jiffies(dirty_expire_interval * 10);
> > > > > > -                       work->older_than_this = &oldest_jif;
> > > > > > -               }
> > > > > > +               } else if (work->for_background)
> > > > > > +                       oldest_jif = jiffies;
> > > > >   Yeah. I had a look into the trace and you can notice that during the
> > > > > whole dd run, we were running a single background writeback work (you can
> > > > > verify that by work->nr_pages decreasing steadily). Without refreshing
> > > > > oldest_jif, we'd write block device inode for /dev/sda (you can identify
> > > > > that by bdi=8:0, ino=0) only once. When refreshing oldest_jif, we write it
> > > > > every 5 seconds (kjournald dirties the device inode after committing a
> > > > > transaction by dirtying metadata buffers which were just committed and can
> > > > > now be checkpointed either by kjournald or flusher thread). So although the
> > > > > performance is slightly reduced, I'd say that the behavior is a desired
> > > > > one.
> > > > > 
> > > > > Also if you observed the performance on a really long run, the difference
> > > > > should get smaller because eventually, kjournald has to flush the metadata
> > > > > blocks when the journal fills up and we need to free some journal space and
> > > > > at that point flushing is even more expensive because we have to do a
> > > > > blocking write during which all transaction operations, thus effectively
> > > > > the whole filesystem, are blocked.
> > > > 
> > > > Jan, I got figures for test case
> > > > 
> > > > ext3-1dd-4k-8p-2941M-1000M:10-3.1.0-rc9-ioless-full-nfs-wq5-next-20111014+
> > > > 
> > > > There is no single drop of nr_writeback in the longer 1200s run, which
> > > > wrote ~60GB data.
> > >   I did some calculations. Default journal size for a filesystem of your
> > > size is 128 MB which allows recording of around 128 GB of data. So your
> > > test probably didn't hit the point where the journal is recycled yet. An
> > > easy way to make sure journal gets recycled is to set its size to a lower
> > > value when creating the filesystem by
> > >   mke2fs -J size=8
> > > 
> > >   Then, after writing 8 GB at the latest, the effect of journal recycling
> > > should be visible (I suggest writing at least 16 GB or so, so that we can see
> > > some pattern). Also note that without the patch altering background writeback,
> > > kjournald will do all the writeback of the metadata, and kjournald works with
> > > buffer heads. Thus the IO it does is *not* accounted in mm statistics. You will
> > > observe its effects only by a sudden increase in await or svctm because the
> > > disk got busy by IO you don't see. Also secondarily you could probably
> > > observe that as a hiccup in the number of dirtied/written pages.
> > 
> > Jan, finally the `correct' results for "-J size=8" w/o the patch
> > altering background writeback.
> > 
> > I noticed the periodic small drops of nr_writeback in
> > global_dirty_state.png, other than that it looks pretty good.
>   If you look at iostat graphs, you'll notice periodic increases in await
> time in roughly 100 s intervals. I believe this could be checkpointing

Yes, it is. And there are frequent drops in the IO queue size.

> that's going on in the background. Also there are (negative) peaks in the
> "paused" graph.

Yeah, this happens on all ext3/4 workloads and I'm kind of used to it ;)

> Anyway, the main question is - do you see any throughput
> difference with/without the background writeback patch with the small
> journal?

Here are the comparisons w/o the patch. The results w/ the patch
should be available tomorrow.

wfg@bee /export/writeback% ./compare.rb -g ext4 -c fs -e io_wkB_s thresh*/*-20111102+
                    ext4              ext4:jsize=8
------------------------  ------------------------
                47684.35        -0.3%     47546.62  thresh=1000M/X-100dd-4k-8p-4096M-1000M:10-3.1.0-ioless-full-next-20111102+
                54015.86        -1.6%     53166.76  thresh=1000M/X-10dd-4k-8p-4096M-1000M:10-3.1.0-ioless-full-next-20111102+
                55320.03        +0.6%     55657.48  thresh=1000M/X-1dd-4k-8p-4096M-1000M:10-3.1.0-ioless-full-next-20111102+
                44271.29        +2.6%     45443.23  thresh=100M/X-10dd-4k-8p-4096M-100M:10-3.1.0-ioless-full-next-20111102+
                54334.22        -1.0%     53801.15  thresh=100M/X-1dd-4k-8p-4096M-100M:10-3.1.0-ioless-full-next-20111102+
                52563.67        -0.7%     52207.05  thresh=100M/X-2dd-4k-8p-4096M-100M:10-3.1.0-ioless-full-next-20111102+
               308189.41        -0.1%    307822.30  TOTAL io_wkB_s

wfg@bee /export/writeback% ./compare.rb -g ext3 -c fs -e io_wkB_s thresh*/*-20111102+
                    ext3              ext3:jsize=8
------------------------  ------------------------
                36231.89        -1.6%     35659.34  thresh=1000M/X-100dd-4k-8p-4096M-1000M:10-3.1.0-ioless-full-next-20111102+
                41115.07        -6.2%     38564.52  thresh=1000M/X-10dd-4k-8p-4096M-1000M:10-3.1.0-ioless-full-next-20111102+
                48025.75        -3.8%     46213.55  thresh=1000M/X-1dd-4k-8p-4096M-1000M:10-3.1.0-ioless-full-next-20111102+
                45317.31        +1.6%     46023.21  thresh=100M/X-1dd-4k-8p-4096M-100M:10-3.1.0-ioless-full-next-20111102+
                40552.64        +4.0%     42182.84  thresh=100M/X-2dd-4k-8p-4096M-100M:10-3.1.0-ioless-full-next-20111102+
               211242.67        -1.2%    208643.45  TOTAL io_wkB_s
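
For anyone who wants to re-check the middle column, here is a minimal
sketch of how the percentages can be reproduced from the TOTAL rows. It
assumes the column is the relative change of the jsize=8 run against the
default-journal run; the internals of compare.rb are not shown here, and
the helper name percent_change is made up for illustration.

```python
def percent_change(baseline: float, variant: float) -> float:
    """Relative change of variant versus baseline, in percent.

    Assumption: this mirrors the middle column of the tables above;
    it is not taken from compare.rb itself.
    """
    return (variant - baseline) / baseline * 100.0


# (default journal, jsize=8) io_wkB_s values from the TOTAL rows above
totals = {
    "ext4": (308189.41, 307822.30),
    "ext3": (211242.67, 208643.45),
}

for fs, (base, small_journal) in totals.items():
    print(f"{fs}: {percent_change(base, small_journal):+.1f}%")
```

The same helper applied to a single row, e.g. the ext3 10dd case
(41115.07 vs 38564.52), gives the -6.2% shown in the table.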

I'm currently rewriting the test scripts to make them easier for others
to understand and use. They will also gain the ability to run each test
2+ times to get a better idea of the fluctuations :)

Thanks,
Fengguang
