From: Wu Fengguang <fengguang.wu@intel.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Jan Kara <jack@suse.cz>, Christoph Hellwig <hch@lst.de>,
Dave Chinner <david@fromorbit.com>,
Greg Thelen <gthelen@google.com>,
Minchan Kim <minchan.kim@gmail.com>,
Andrea Righi <arighi@develer.com>, linux-mm <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/5] writeback: dirty position control
Date: Fri, 26 Aug 2011 09:56:10 +0800
Message-ID: <20110826015610.GA10320@localhost>
In-Reply-To: <20110825222001.GG27162@redhat.com>
On Fri, Aug 26, 2011 at 06:20:01AM +0800, Vivek Goyal wrote:
> On Thu, Aug 25, 2011 at 11:19:34AM +0800, Wu Fengguang wrote:
>
> [..]
> > > So you are trying to make one feedback loop aware of the second loop, so
> > > that if the second loop is unbalanced, the first loop reacts to that as
> > > well instead of just looking at dirty_rate and write_bw. So refining the
> > > new balanced rate by pos_ratio helps:
> > >
> > >   bdi->dirty_ratelimit_(i+1) = bdi->dirty_ratelimit_i * (write_bw / dirty_bw) * pos_ratio
> > >
> > > Now if the global dirty pages are imbalanced, the balanced rate will still
> > > go down even though dirty_bw == write_bw. This will further reduce the
> > > task dirty rate, which in turn will reduce the number of dirty pages and
> > > should eventually lead to pos_ratio=1.
> >
> > Right, that's a good alternative viewpoint to the below one.
> >
> >   bdi->dirty_ratelimit_(i+1) = task_ratelimit_i * (write_bw / dirty_bw)
> >
> > (1) the periodic rate estimation uses that to refresh the balanced rate every 200ms
> > (2) as long as the rate estimation is correct, pos_ratio is able to drive itself to 1.0
>
> Personally I found the other representation much easier to understand.
> Once you have come up with the equation
>
> balance_rate_(i+1) = balance_rate_(i) * write_bw/dirty_bw
>
> can you please add a few lines of comments explaining why the above
> alone is not sufficient and why we also need to take pos_ratio into
> account to keep the number of dirty pages in check? And then go on to
>
> balance_rate_(i+1) = balance_rate_(i) * write_bw/dirty_bw * pos_ratio
>
> This kind of maintains the continuity of the explanation and explains
> why we are deviating from the theory we discussed so far.
Good point. Here is the commented code:
	/*
	 * task_ratelimit reflects each dd's dirty rate for the past 200ms.
	 */
	task_ratelimit = (u64)dirty_ratelimit *
					pos_ratio >> RATELIMIT_CALC_SHIFT;

	/*
	 * A linear estimation of the "balanced" throttle rate. The theory is,
	 * if there are N dd tasks, each throttled at task_ratelimit, the bdi's
	 * dirty_rate will be measured to be (N * task_ratelimit). So the below
	 * formula will yield the balanced rate limit (write_bw / N).
	 *
	 * Note that the expanded form is not a pure rate feedback:
	 *     rate_(i+1) = rate_(i) * (write_bw / dirty_rate)              (1)
	 * but also takes pos_ratio into account:
	 *     rate_(i+1) = rate_(i) * (write_bw / dirty_rate) * pos_ratio  (2)
	 *
	 * (1) is not realistic because pos_ratio also takes part in balancing
	 * the dirty rate. Consider the state
	 *     pos_ratio = 0.5                                              (3)
	 *     rate = 2 * (write_bw / N)                                    (4)
	 * If (1) is used, it will get stuck in that state, because each dd
	 * will be throttled at
	 *     task_ratelimit = pos_ratio * rate = (write_bw / N)           (5)
	 * yielding
	 *     dirty_rate = N * task_ratelimit = write_bw                   (6)
	 * Putting (6) into (1), we get
	 *     rate_(i+1) = rate_(i)                                        (7)
	 *
	 * So we end up using (2) to always keep
	 *     rate_(i+1) ~= (write_bw / N)                                 (8)
	 * regardless of the value of pos_ratio. As long as (8) is satisfied,
	 * pos_ratio is able to drive itself to 1.0, which is not only where
	 * the dirty count meets the setpoint, but also where the slope of
	 * pos_ratio is flattest and hence task_ratelimit fluctuates least.
	 */
	balanced_dirty_ratelimit = div_u64((u64)task_ratelimit * write_bw,
					   dirty_rate | 1);
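The fixed-point argument in the comment above can be checked numerically. The following is a minimal sketch, not kernel code: it assumes N identical dd tasks whose measured dirty_rate is exactly N * task_ratelimit, holds pos_ratio constant (in the real loop pos_ratio moves with the dirty count), and uses doubles instead of the RATELIMIT_CALC_SHIFT fixed-point arithmetic. The helper names (measured_dirty_rate, update_pure, update_with_pos_ratio, iterate) are hypothetical.

```c
#include <assert.h>

/*
 * Toy model (an assumption, not kernel code): N identical dd tasks,
 * each throttled at task_ratelimit = rate * pos_ratio, so the bdi's
 * measured dirty_rate is exactly N * task_ratelimit.
 */
static double measured_dirty_rate(int n_tasks, double rate, double pos_ratio)
{
	return n_tasks * rate * pos_ratio;
}

/* Formula (1): pure rate feedback, ignoring pos_ratio. */
static double update_pure(double rate, double write_bw, double dirty_rate)
{
	return rate * (write_bw / dirty_rate);
}

/*
 * Formula (2): equivalent to task_ratelimit * write_bw / dirty_rate,
 * the quantity the kernel computes as balanced_dirty_ratelimit.
 */
static double update_with_pos_ratio(double rate, double write_bw,
				    double dirty_rate, double pos_ratio)
{
	return rate * (write_bw / dirty_rate) * pos_ratio;
}

/*
 * Run @steps estimation periods with pos_ratio held constant and return
 * the final rate. use_pos_ratio selects formula (2) over formula (1).
 */
static double iterate(int use_pos_ratio, int n_tasks, double write_bw,
		      double rate, double pos_ratio, int steps)
{
	for (int i = 0; i < steps; i++) {
		double dirty_rate = measured_dirty_rate(n_tasks, rate, pos_ratio);

		/* The kernel guards the divisor with "dirty_rate | 1" instead. */
		assert(dirty_rate > 0.0);
		rate = use_pos_ratio ?
			update_with_pos_ratio(rate, write_bw, dirty_rate, pos_ratio) :
			update_pure(rate, write_bw, dirty_rate);
	}
	return rate;
}
```

Starting from state (3)/(4) with N = 4 and write_bw = 100, iterate(0, 4, 100.0, 50.0, 0.5, 10) stays at 50, the stuck state of (7), while iterate(1, 4, 100.0, 50.0, 0.5, 10) settles at 25 = write_bw / N, as in (8). With pos_ratio = 1.0 both formulas agree, which is why (1) alone only works once the dirty count already sits at the setpoint.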
> >
> > > A related question, though I should have asked you this long ago: how
> > > does throttling based on rate help? Why couldn't we just work with two
> > > pos_ratios, one the global position ratio and the other the bdi position
> > > ratio, and then throttle tasks gradually to achieve smooth throttling?
> > > IOW, what property does rate provide which is not available just by
> > > looking at per-bdi dirty pages? Can't we come up with a bdi setpoint and
> > > limit the way you have done for the global setpoint and throttle tasks
> > > accordingly?
> >
> > Good question. If we have no idea of the balanced rate at all, but
> > still want to limit dirty pages within the range [freerun, limit],
> > all we can do is to throttle the task at e.g. 1TB/s at @freerun and
> > 0 at @limit. Then you get a really sharp control line which will make
> > task_ratelimit fluctuate like mad...
> >
> > So the balanced rate estimation is the key to get smooth task_ratelimit,
> > while pos_ratio is the ultimate guarantee for the dirty pages range.
>
> OK, that makes sense. By keeping an estimate of the rate at which the bdi
> can write, our range of throttling shrinks: say 0 to 300MB/s instead
> of 0 to 1TB/s, and that can lead to smoother behavior.
Yeah exactly, and even better, we can make the slope much flatter
around the setpoint to achieve excellent smoothness in the stable state :)
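The smoothness argument can be made concrete with a toy comparison of how much the task rate moves when the dirty count wobbles. This is a minimal sketch under stated assumptions, not the patch's code: both controllers, their names, and all numbers (freerun = 0, setpoint = 500 and limit = 1000 pages, a 1TB/s ceiling written as 1000000 MB/s, a 100 MB/s balanced rate) are hypothetical, and pos_ratio is taken as a simple linear line rather than the control line used in the patch.

```c
#include <assert.h>

/*
 * Hypothetical controller A: no rate estimate at all. Map the dirty
 * count linearly from max_rate at freerun down to 0 at limit.
 */
static double rate_position_only(double dirty, double freerun, double limit,
				 double max_rate)
{
	assert(limit > freerun);
	if (dirty <= freerun)
		return max_rate;
	if (dirty >= limit)
		return 0.0;
	return max_rate * (limit - dirty) / (limit - freerun);
}

/*
 * Hypothetical controller B: an estimated balanced rate scaled by a
 * linear pos_ratio that is 1.0 at setpoint and 0 at limit. This linear
 * line is a simplification chosen for illustration only.
 */
static double rate_balanced(double dirty, double setpoint, double limit,
			    double balanced_rate)
{
	double pos_ratio;

	assert(limit > setpoint);
	pos_ratio = 1.0 + (setpoint - dirty) / (limit - setpoint);
	if (pos_ratio < 0.0)
		pos_ratio = 0.0;
	return balanced_rate * pos_ratio;
}
```

With these illustrative numbers, moving the dirty count from 500 to 510 pages changes controller A's rate from 500000 to 490000 MB/s, but changes controller B's rate only from 100 to about 98 MB/s. Knowing the balanced rate lets the control line span a realistic range instead of [0, 1TB/s], so the same wobble in dirty pages causes a far smaller swing in task_ratelimit.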
Thanks,
Fengguang