linux-fsdevel.vger.kernel.org archive mirror
From: Wu Fengguang <fengguang.wu@intel.com>
To: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Minchan Kim <minchan.kim@gmail.com>,
	Boaz Harrosh <bharrosh@panasas.com>,
	Sorin Faibish <sfaibish@emc.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: IO-less dirty throttling V6 results available
Date: Fri, 25 Feb 2011 22:44:12 +0800
Message-ID: <20110225144412.GA19448@localhost>
In-Reply-To: <20110224185632.GJ23042@quack.suse.cz>

On Fri, Feb 25, 2011 at 02:56:32AM +0800, Jan Kara wrote:
> On Thu 24-02-11 23:25:09, Wu Fengguang wrote:
> > The bdi base throttle bandwidth is updated based on three classes of
> > parameters.
> > 
> > (1) level of dirty pages
> > 
> > We try to avoid updating the base bandwidth whenever possible. The
> > main update criteria are based on the level of dirty pages: it's time
> > to update the base bandwidth when
> > - the dirty pages are near the upper or lower bound of the control
> >   scope, or
> > - the dirty pages are departing from the global/bdi dirty goals.
> > 
> > Because the dirty pages fluctuate constantly, we try to avoid
> > disturbing the base bandwidth when the smoothed number of dirty pages
> > is within (write bandwidth / 8) distance of the goal, based on the
> > fact that fluctuations are typically bounded by the write bandwidth.
> > 
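As a reading aid, the "within (write bandwidth / 8)" rule can be sketched
as below. This is only an illustration, not the patch code: the function
name is made up, dirty counts are assumed to be in pages, and write_bw is
assumed to be in pages per second so that write_bw / 8 can be read as a
page count. It covers only the quiet zone around the goal, not the
control scope criterion.

```python
def outside_quiet_zone(smoothed_dirty, goal, write_bw):
    """Return True when the smoothed dirty count has left the quiet zone.

    Hypothetical helper: all three arguments are assumed to be page
    counts (write_bw being pages written per second).
    """
    quiet_zone = write_bw // 8  # tolerated distance from the goal
    return abs(smoothed_dirty - goal) >= quiet_zone


# 99 pages away from the goal with a 100-page quiet zone: leave base bw alone
print(outside_quiet_zone(1099, 1000, 800))
# 300 pages away: the base bandwidth becomes a candidate for updating
print(outside_quiet_zone(1300, 1000, 800))
```

Whether the boundary itself counts as inside or outside the zone is a
choice made here for the sketch.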
> > (2) the position bandwidth
> > 
> > The position bandwidth is equal to the base bandwidth if the number
> > of dirty pages is equal to the dirty goal. It is scaled down when the
> > dirty pages grow above the goal and scaled up when they drop below it.
> > 
> > When it's decided to update the base bandwidth, the delta between
> > base bandwidth and position bandwidth will be calculated. The delta
> > will be scaled down by a factor of at least 8, and the smaller the
> > delta, the more it will be shrunk. It's then added to the base
> > bandwidth. In this way, the base bandwidth will adapt to the position
> > bandwidth fast when there are large gaps, and remain stable when the
> > gap is small enough.
> > 
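A minimal sketch of that update step follows. The text above only says
that the delta is divided by at least 8 and that smaller deltas are
shrunk more; the exact damping curve is not given, so the 25% threshold
and the function name here are assumptions chosen for illustration.

```python
def step_base_bw(base_bw, pos_bw):
    """Move base_bw a damped step towards pos_bw (illustrative only).

    base_bw must be positive. The divisor is never below 8, and it grows
    as the relative gap shrinks below a hypothetical 25% threshold.
    """
    delta = pos_bw - base_bw
    if delta == 0:
        return base_bw
    gap = abs(delta) / base_bw              # relative size of the gap
    divisor = 8.0 * max(1.0, 0.25 / gap)    # small gaps get extra damping
    return base_bw + delta / divisor


print(step_base_bw(100.0, 200.0))  # large gap: closes 1/8 of it per step
print(step_base_bw(100.0, 101.0))  # small gap: base bandwidth barely moves
```

With this shape, a large gap is closed at 1/8 per update while a 1% gap
is closed at only 1/200 per update, which gives the "fast when far,
stable when near" behaviour described.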
> > The delta is scaled down considerably because the position bandwidth
> > is not very reliable. It fluctuates sharply when the dirty pages hit
> > the upper/lower limits. And it takes time for the dirty pages to
> > return to the goal even when the base bandwidth has been adjusted to
> > the right value. So if the base bandwidth tracked the position
> > bandwidth closely, it could overshoot.
> > 
> > (3) the reference bandwidth
> > 
> > It's the theoretical base bandwidth! I took the time to calculate it
> > as a reference value for the base bandwidth, to eliminate the
> > fast-convergence vs. steady-state-stability dilemma in pure
> > position-based control. It would be optimal control if used directly.
> > However, the reference bandwidth is not used directly as the base
> > bandwidth, because the numbers for calculating it are all fluctuating
> > and it's not acceptable for the base bandwidth to fluctuate in the
> > plateau state. So the roughly accurate calculated value is now used
> > as a very useful double limit when updating the base bandwidth.
> > 
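The text does not spell out what "double limit" means in code. One
possible reading, and it is only an assumption, is that the reference
bandwidth bounds each update from both sides: the base bandwidth may move
towards the reference value but never past it, and never away from it.
The function name and arguments below are hypothetical.

```python
def limit_by_reference(base_bw, proposed_bw, ref_bw):
    """Clamp a proposed base bandwidth so it never crosses ref_bw.

    One interpretation of the "double limit", not the patch code.
    """
    if proposed_bw > base_bw:
        # Raising: allowed up to ref_bw, refused if base is already above it.
        return max(base_bw, min(proposed_bw, ref_bw))
    if proposed_bw < base_bw:
        # Lowering: allowed down to ref_bw, refused if base is already below it.
        return min(base_bw, max(proposed_bw, ref_bw))
    return base_bw


print(limit_by_reference(100, 120, 110))  # raise is capped at the reference
print(limit_by_reference(100, 120, 90))   # raise refused: reference is lower
```

Under this reading the noisy reference value never drives the base
bandwidth directly; it only vetoes or caps steps suggested by the
position bandwidth.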
> > Now you should be able to understand the information-rich
> > balance_dirty_pages-pages.png graph. Here are two nice ones:
> > 
> > http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/4G-60%25/btrfs-16dd-1M-8p-3927M-60%-2.6.38-rc6-dt6+-2011-02-24-23-14/balance_dirty_pages-pages.png
> > 
> > http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/10HDD-JBOD-6G-6%25/xfs-1dd-1M-16p-5904M-6%25-2.6.38-rc5-dt6+-2011-02-21-20-00/balance_dirty_pages-pages.png
>   Thanks for the update on your patch series :). As you probably noted,
> I've created patches which implement IO-less balance_dirty_pages()
> differently so we have two implementations to compare (which is a good
> thing I believe). The question is how to do the comparison...

Yeah :)

> I have implemented the comments Peter had on my patches, and I have
> finished scripts for gathering mm statistics, processing trace output,
> and plotting them. Looking at your test scripts, I can probably use some
> of your workloads, as mine are currently simpler. Currently I have some
> simple dd tests running; I'll run something over NFS, SATA+USB and
> hopefully several SATA drives next week.

The tests are pretty time-consuming. Reusing the test scripts will
save time and make the results easier to compare.

> The question is how to compare results? Any idea? Obvious metrics are
> overall throughput and fairness for IO bound tasks. But then there are

I guess there will be little difference in throughput, as long as the
iostat output shows 100% disk util and full IO size in all cases.

As for fairness, I have the "ls-files" output for comparing the
sizes of the files created by each dd task. For example,

wfg ~/bee% cat xfs-4dd-1M-8p-970M-20%-2.6.38-rc6-dt6+-2011-02-25-21-55/ls-files
131 -rw-r--r-- 1 root root 2783969280 Feb 25 21:58 /fs/sda7/zero-1
132 -rw-r--r-- 1 root root 2772434944 Feb 25 21:58 /fs/sda7/zero-2
133 -rw-r--r-- 1 root root 2733637632 Feb 25 21:58 /fs/sda7/zero-3
134 -rw-r--r-- 1 root root 2735734784 Feb 25 21:58 /fs/sda7/zero-4
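
One way to turn such a listing into a single fairness number is sketched
below. It is not part of the test scripts: the column layout is assumed
from the sample above (size in the sixth field), and both the max/min
ratio and Jain's fairness index are metrics picked here for illustration.

```python
LS_FILES = """\
131 -rw-r--r-- 1 root root 2783969280 Feb 25 21:58 /fs/sda7/zero-1
132 -rw-r--r-- 1 root root 2772434944 Feb 25 21:58 /fs/sda7/zero-2
133 -rw-r--r-- 1 root root 2733637632 Feb 25 21:58 /fs/sda7/zero-3
134 -rw-r--r-- 1 root root 2735734784 Feb 25 21:58 /fs/sda7/zero-4
"""


def file_sizes(ls_output):
    """Extract the byte size (sixth whitespace-separated field) per line."""
    return [int(line.split()[5]) for line in ls_output.splitlines() if line.strip()]


def max_min_ratio(sizes):
    """1.0 means every dd task wrote exactly the same amount."""
    return max(sizes) / min(sizes)


def jain_index(sizes):
    """Jain's fairness index: 1.0 is perfectly fair, 1/n is worst case."""
    total = sum(sizes)
    return total * total / (len(sizes) * sum(s * s for s in sizes))


sizes = file_sizes(LS_FILES)
print(f"max/min ratio: {max_min_ratio(sizes):.4f}")
print(f"Jain index:    {jain_index(sizes):.6f}")
```

For the four files above, the largest is a bit under 2% bigger than the
smallest.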

> more subtle things like how the algorithm behaves for tasks that are not IO
> bound for most of the time (or do less IO). Any good metrics here? More
> things we could compare?

For non-IO-bound tasks, there are fio job files that dirty pages at
different rates.  I have not run them though, as the bandwidth-based
algorithm obviously assigns higher bandwidth to light dirtiers :)

Thanks,
Fengguang

Thread overview: 9+ messages
2011-02-22 14:25 IO-less dirty throttling V6 results available Wu Fengguang
2011-02-23 15:13 ` Wu Fengguang
2011-02-24 15:25   ` Wu Fengguang
2011-02-24 18:56     ` Jan Kara
2011-02-25 14:44       ` Wu Fengguang [this message]
2011-02-28 17:22         ` Jan Kara
2011-03-01  9:55           ` Wu Fengguang
2011-03-01 13:51             ` Wu Fengguang
2011-03-01 13:52     ` Wu Fengguang
