From: Wu Fengguang <fengguang.wu@intel.com>
To: linux-fsdevel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
Wu Fengguang <fengguang.wu@intel.com>, Jan Kara <jack@suse.cz>,
Christoph Hellwig <hch@lst.de>,
Dave Chinner <david@fromorbit.com>,
Greg Thelen <gthelen@google.com>,
Minchan Kim <minchan.kim@gmail.com>,
Vivek Goyal <vgoyal@redhat.com>,
Andrea Righi <arighi@develer.com>, linux-mm <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: [PATCH 3/5] writeback: dirty rate control
Date: Sat, 06 Aug 2011 16:44:50 +0800
Message-ID: <20110806094526.878435971@intel.com>
In-Reply-To: 20110806084447.388624428@intel.com
It's all about bdi->dirty_ratelimit, which aims to be (write_bw / N)
when there are N dd tasks.
On write() syscall, use bdi->dirty_ratelimit
============================================

	balance_dirty_pages(pages_dirtied)
	{
		pos_bw = bdi->dirty_ratelimit * bdi_position_ratio();
		pause = pages_dirtied / pos_bw;
		sleep(pause);
	}

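The pause computation above can be sketched in userspace C. This is only an illustration, not the kernel code: pause_ms() is a hypothetical helper, rates are assumed to be in pages per second, and the position ratio is assumed to be a fixed-point value scaled by 1 << CALC_SHIFT, with the shift value of 10 chosen for the example.

```c
#include <assert.h>

/* Assumption: pos_ratio is fixed point, 1.0 == 1 << CALC_SHIFT. */
#define CALC_SHIFT 10

/*
 * Hypothetical userspace helper, not a kernel function.
 * dirty_ratelimit is in pages per second; the result is in milliseconds.
 */
unsigned long pause_ms(unsigned long pages_dirtied,
		       unsigned long dirty_ratelimit,
		       unsigned long pos_ratio)
{
	unsigned long long pos_bw;

	assert(dirty_ratelimit > 0);

	/* pos_bw = dirty_ratelimit * pos_ratio, in pages per second */
	pos_bw = ((unsigned long long)dirty_ratelimit * pos_ratio) >> CALC_SHIFT;
	if (pos_bw == 0)
		pos_bw = 1;	/* never divide by zero */

	/* pause = pages_dirtied / pos_bw, converted to milliseconds */
	return (unsigned long)(1000ULL * pages_dirtied / pos_bw);
}
```

With a ratelimit of 1000 pages/s and 32 pages dirtied, a position ratio of 1.0 (1024) gives a 32ms pause, 0.5 (512) gives 64ms, and 2.0 (2048) gives 16ms.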
Every 200ms, update bdi->dirty_ratelimit
========================================

	bdi_update_dirty_ratelimit()
	{
		bw = bdi->dirty_ratelimit;
		ref_bw = bw * bdi_position_ratio() * write_bw / dirty_bw;
		if (dirty pages unbalanced)
			bdi->dirty_ratelimit = (bw * 3 + ref_bw) / 4;
	}

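The 3:1 weighted update in the pseudocode above can be tried out in isolation. The sketch below is a userspace illustration only: smooth_step() and smooth_n() are hypothetical names, and ref_bw is assumed to stay constant across updates, which it would not in a real system.

```c
/* One update step from the pseudocode: move a quarter of the way to ref_bw. */
unsigned long smooth_step(unsigned long bw, unsigned long ref_bw)
{
	return (bw * 3 + ref_bw) / 4;
}

/* Apply the step n times, as n consecutive 200ms updates would. */
unsigned long smooth_n(unsigned long bw, unsigned long ref_bw, unsigned int n)
{
	while (n--)
		bw = smooth_step(bw, ref_bw);
	return bw;
}
```

Starting from bw = 100 with ref_bw = 200, the first two updates give 125 and 143. With integer division the value settles at 197 rather than exactly 200, so the rounding error stays small relative to the target.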
Estimation of balanced bdi->dirty_ratelimit
===========================================

When N dd tasks are started, throttle each dd at

	task_ratelimit = pos_bw (any non-zero initial value is OK)

After 200ms, we get

	dirty_bw = # of pages dirtied by app / 200ms
	write_bw = # of pages written to disk / 200ms

For aggressive dirtiers, the equality holds

	dirty_bw == N * task_ratelimit
		 == N * pos_bw				(1)

The balanced throttle bandwidth can be estimated by

	ref_bw = pos_bw * write_bw / dirty_bw		(2)

From (1) and (2), we get the equality

	ref_bw == write_bw / N				(3)

If the N dd's are all throttled at ref_bw, the dirty/writeback rates
will match. So ref_bw is the balanced dirty rate.

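Equations (1) to (3) can be checked numerically. The following is a small userspace sketch with hypothetical helper names; it assumes all N tasks are aggressive dirtiers that run exactly at their rate limit, as the derivation does.

```c
#include <assert.h>

/* Equation (1): aggregate dirty rate of N aggressive dirtiers. */
unsigned long total_dirty_bw(unsigned long n, unsigned long task_ratelimit)
{
	return n * task_ratelimit;
}

/* Equation (2): ref_bw = pos_bw * write_bw / dirty_bw. */
unsigned long balanced_ref_bw(unsigned long pos_bw, unsigned long write_bw,
			      unsigned long dirty_bw)
{
	assert(dirty_bw > 0);
	return (unsigned long)((unsigned long long)pos_bw * write_bw / dirty_bw);
}
```

For example, with N = 4, write_bw = 100 and pos_bw = 50, dirty_bw is 200 and ref_bw comes out as 25, which equals write_bw / N as in (3). Starting from pos_bw = 10 or pos_bw = 80 yields the same 25, which is why any non-zero initial value works.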
In practice, the ref_bw calculated by (2) may fluctuate and carry
estimation errors. So the bdi->dirty_ratelimit update policy is to
follow it only when both pos_bw and ref_bw point in the same direction
(indicating not only that the dirty position has deviated from the
global/bdi setpoints, but also that it is still moving away from them).

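The same-direction policy can be exercised on its own. The sketch below mirrors the comparison done in bdi_update_dirty_ratelimit() in this patch, but as a standalone userspace function; follow_same_side() is a hypothetical name and the kernel's max()/min() macros are spelled out by hand.

```c
/*
 * Move bw toward the nearer of pos_bw and ref_bw, but only when both
 * lie on the same side of bw. Otherwise leave bw unchanged.
 */
unsigned long follow_same_side(unsigned long bw, unsigned long pos_bw,
			       unsigned long ref_bw)
{
	if (pos_bw < bw) {
		if (ref_bw < bw)	/* both below: take the larger one */
			bw = (ref_bw > pos_bw) ? ref_bw : pos_bw;
	} else {
		if (ref_bw > bw)	/* both above: take the smaller one */
			bw = (ref_bw < pos_bw) ? ref_bw : pos_bw;
	}
	return bw;
}
```

With bw = 100, the pair (pos_bw, ref_bw) = (80, 90) lowers bw to 90 and (120, 150) raises it to 120, while the disagreeing pair (80, 120) leaves it at 100.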
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 include/linux/backing-dev.h |    7 +++
 mm/backing-dev.c            |    1
 mm/page-writeback.c         |   69 +++++++++++++++++++++++++++++++++-
 3 files changed, 75 insertions(+), 2 deletions(-)

--- linux-next.orig/include/linux/backing-dev.h 2011-08-05 18:05:36.000000000 +0800
+++ linux-next/include/linux/backing-dev.h 2011-08-05 18:05:36.000000000 +0800
@@ -75,10 +75,17 @@ struct backing_dev_info {
 	struct percpu_counter bdi_stat[NR_BDI_STAT_ITEMS];
 	unsigned long bw_time_stamp;	/* last time write bw is updated */
+	unsigned long dirtied_stamp;
 	unsigned long written_stamp;	/* pages written at bw_time_stamp */
 	unsigned long write_bandwidth;	/* the estimated write bandwidth */
 	unsigned long avg_write_bandwidth; /* further smoothed write bw */
+	/*
+	 * The base throttle bandwidth, re-calculated on every 200ms.
+	 * All the bdi tasks' dirty rate will be curbed under it.
+	 */
+	unsigned long dirty_ratelimit;
+
 	struct prop_local_percpu completions;
 	int dirty_exceeded;
--- linux-next.orig/mm/backing-dev.c 2011-08-05 18:05:36.000000000 +0800
+++ linux-next/mm/backing-dev.c 2011-08-05 18:05:36.000000000 +0800
@@ -674,6 +674,7 @@ int bdi_init(struct backing_dev_info *bd
 	bdi->bw_time_stamp = jiffies;
 	bdi->written_stamp = 0;
+	bdi->dirty_ratelimit = INIT_BW;
 	bdi->write_bandwidth = INIT_BW;
 	bdi->avg_write_bandwidth = INIT_BW;
--- linux-next.orig/mm/page-writeback.c 2011-08-05 18:05:36.000000000 +0800
+++ linux-next/mm/page-writeback.c 2011-08-06 09:08:35.000000000 +0800
@@ -736,6 +736,66 @@ static void global_update_bandwidth(unsi
 	spin_unlock(&dirty_lock);
 }
+/*
+ * Maintain bdi->dirty_ratelimit, the base throttle bandwidth.
+ *
+ * Normal bdi tasks will be curbed at or below it in long term.
+ * Obviously it should be around (write_bw / N) when there are N dd tasks.
+ */
+static void bdi_update_dirty_ratelimit(struct backing_dev_info *bdi,
+				       unsigned long thresh,
+				       unsigned long dirty,
+				       unsigned long bdi_thresh,
+				       unsigned long bdi_dirty,
+				       unsigned long dirtied,
+				       unsigned long elapsed)
+{
+	unsigned long bw = bdi->dirty_ratelimit;
+	unsigned long dirty_bw;
+	unsigned long pos_bw;
+	unsigned long ref_bw;
+	unsigned long long pos_ratio;
+
+	/*
+	 * The dirty rate will match the writeback rate in long term, except
+	 * when dirty pages are truncated by userspace or re-dirtied by FS.
+	 */
+	dirty_bw = (dirtied - bdi->dirtied_stamp) * HZ / elapsed;
+
+	pos_ratio = bdi_position_ratio(bdi, thresh, dirty,
+				       bdi_thresh, bdi_dirty);
+	/*
+	 * pos_bw reflects each dd's dirty rate enforced for the past 200ms.
+	 */
+	pos_bw = bw * pos_ratio >> BANDWIDTH_CALC_SHIFT;
+	pos_bw++;  /* this avoids bdi->dirty_ratelimit get stuck in 0 */
+
+	/*
+	 * ref_bw = pos_bw * write_bw / dirty_bw
+	 *
+	 * It's a linear estimation of the "balanced" throttle bandwidth.
+	 */
+	pos_ratio *= bdi->avg_write_bandwidth;
+	do_div(pos_ratio, dirty_bw | 1);
+	ref_bw = bw * pos_ratio >> BANDWIDTH_CALC_SHIFT;
+
+	/*
+	 * dirty_ratelimit will follow ref_bw/pos_bw conservatively iff they
+	 * are on the same side of dirty_ratelimit. Which not only makes it
+	 * more stable, but also is essential for preventing it being driven
+	 * away by possible systematic errors in ref_bw.
+	 */
+	if (pos_bw < bw) {
+		if (ref_bw < bw)
+			bw = max(ref_bw, pos_bw);
+	} else {
+		if (ref_bw > bw)
+			bw = min(ref_bw, pos_bw);
+	}
+
+	bdi->dirty_ratelimit = bw;
+}
+
 void __bdi_update_bandwidth(struct backing_dev_info *bdi,
 			    unsigned long thresh,
 			    unsigned long dirty,
@@ -745,6 +805,7 @@ void __bdi_update_bandwidth(struct backi
 {
 	unsigned long now = jiffies;
 	unsigned long elapsed = now - bdi->bw_time_stamp;
+	unsigned long dirtied;
 	unsigned long written;
 	/*
@@ -753,6 +814,7 @@ void __bdi_update_bandwidth(struct backi
 	if (elapsed < BANDWIDTH_INTERVAL)
 		return;
+	dirtied = percpu_counter_read(&bdi->bdi_stat[BDI_DIRTIED]);
 	written = percpu_counter_read(&bdi->bdi_stat[BDI_WRITTEN]);
 	/*
@@ -762,12 +824,15 @@ void __bdi_update_bandwidth(struct backi
 	if (elapsed > HZ && time_before(bdi->bw_time_stamp, start_time))
 		goto snapshot;
-	if (thresh)
+	if (thresh) {
 		global_update_bandwidth(thresh, dirty, now);
-
+		bdi_update_dirty_ratelimit(bdi, thresh, dirty, bdi_thresh,
+					   bdi_dirty, dirtied, elapsed);
+	}
 	bdi_update_write_bandwidth(bdi, elapsed, written);
 snapshot:
+	bdi->dirtied_stamp = dirtied;
 	bdi->written_stamp = written;
 	bdi->bw_time_stamp = now;
 }
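
The integer arithmetic used for dirty_bw and ref_bw in the patch can be reproduced in userspace to see how rounding behaves. This is a rough sketch under stated assumptions: rate_per_sec() and ref_bw_fixed() are hypothetical names, the timer frequency is passed in as a parameter instead of using HZ, plain 64-bit division stands in for do_div(), and the fixed-point shift of 10 is assumed for the example.

```c
#include <assert.h>

#define CALC_SHIFT 10	/* assumed fixed-point shift, 1.0 == 1 << 10 */

/* Pages per second from a page-count delta over elapsed jiffies. */
unsigned long rate_per_sec(unsigned long delta_pages, unsigned long elapsed,
			   unsigned long hz)
{
	assert(elapsed > 0);
	return delta_pages * hz / elapsed;
}

/* ref_bw = bw * pos_ratio * write_bw / dirty_bw, pos_ratio in fixed point. */
unsigned long ref_bw_fixed(unsigned long bw, unsigned long long pos_ratio,
			   unsigned long write_bw, unsigned long dirty_bw)
{
	pos_ratio *= write_bw;
	pos_ratio /= (dirty_bw | 1);	/* "| 1" guards against dividing by zero */
	return (unsigned long)((bw * pos_ratio) >> CALC_SHIFT);
}
```

With a timer frequency of 100, 40 pages dirtied over 20 jiffies is 200 pages/s. With bw = 100, pos_ratio = 1.0 and write_bw = 100, a dirty_bw of 199 gives ref_bw = 50, while a dirty_bw of 200 is treated as 201 by the "| 1" and gives 49, so the guard costs at most a small bias on even values.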