From: Wu Fengguang <fengguang.wu@intel.com>
To: <linux-fsdevel@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Wu Fengguang <fengguang.wu@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Andrea Righi <arighi@develer.com>
Cc: linux-mm <linux-mm@kvack.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Subject: [PATCH 2/5] writeback: dirty position control
Date: Sat, 06 Aug 2011 16:44:49 +0800
Message-ID: <20110806094526.733282037@intel.com>
In-Reply-To: 20110806084447.388624428@intel.com

[-- Attachment #1: writeback-control-algorithms.patch --]
[-- Type: text/plain, Size: 7533 bytes --]

The old scheme is:
                                          |
                           free run area  |  throttle area
  ----------------------------------------+---------------------------->
                                    thresh^                  dirty pages

The new scheme is:

  ^ task rate limit
  |
  |            *
  |             *
  |              *
  |[free run]      *      [smooth throttled]
  |                  *
  |                     *
  |                         *
  ..bdi->dirty_ratelimit..........*
  |                               .     *
  |                               .          *
  |                               .              *
  |                               .                 *
  |                               .                    *
  +-------------------------------.-----------------------*------------>
                          setpoint^                  limit^  dirty pages

For simplicity, only the global/bdi setpoint control lines are
implemented here, so the [*] curve is straighter than the ideal one
shown in the figure above.

bdi_position_ratio() provides a scale factor for bdi->dirty_ratelimit, so
that the resulting task rate limit can drive the dirty pages back to the
global/bdi setpoints.
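
As an illustration only, here is a minimal user-space sketch of a single
linear control line and of how its fixed-point result could scale a base
rate limit. The helper names line_ratio() and scale_ratelimit(), the
CALC_SHIFT macro and the page counts used with them are assumptions made
for this sketch; they are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define CALC_SHIFT 10	/* mirrors BANDWIDTH_CALC_SHIFT; 1.0 == 1 << 10 */

/*
 * Hypothetical helper: one linear control line that is about 1.0 (in
 * fixed point) at @goal and falls to 0 at @origin.  The "+ 1" in the
 * divisor follows the patch and avoids a division by zero.
 */
static uint64_t line_ratio(uint64_t origin, uint64_t goal, uint64_t dirty)
{
	assert(origin > goal);
	if (dirty >= origin)
		return 0;
	return ((origin - dirty) << CALC_SHIFT) / (origin - goal + 1);
}

/* Hypothetical helper: apply a fixed-point ratio to a base rate limit. */
static uint64_t scale_ratelimit(uint64_t base, uint64_t ratio)
{
	return (base * ratio) >> CALC_SHIFT;
}
```

With made-up values goal = 1000 and origin = 4000 pages, the ratio is
slightly below 1.0 at the setpoint (because of the "+ 1"), above 1.0 when
there are fewer dirty pages than the setpoint, and below 1.0 when there are
more, which is the direction needed to pull the dirty count back.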

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/page-writeback.c |  143 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 143 insertions(+)

--- linux-next.orig/mm/page-writeback.c	2011-08-06 10:31:32.000000000 +0800
+++ linux-next/mm/page-writeback.c	2011-08-06 11:17:07.000000000 +0800
@@ -46,6 +46,8 @@
  */
 #define BANDWIDTH_INTERVAL	max(HZ/5, 1)
 
+#define BANDWIDTH_CALC_SHIFT	10
+
 /*
  * After a CPU has dirtied this many pages, balance_dirty_pages_ratelimited
  * will look to see if it needs to force writeback or throttling.
@@ -495,6 +497,147 @@ unsigned long bdi_dirty_limit(struct bac
 	return bdi_dirty;
 }
 
+/*
+ * Dirty position control.
+ *
+ * (o) global/bdi setpoints
+ *
+ *  When the number of dirty pages goes higher/lower than the setpoint, the
+ *  dirty position ratio (and hence the dirty rate limit) will be
+ *  decreased/increased to bring the dirty pages back to the setpoint.
+ *
+ *                              setpoint
+ *                                 v
+ * |-------------------------------*-------------------------------|-----------|
+ * ^                               ^                               ^           ^
+ * (thresh + background_thresh)/2  thresh - thresh/DIRTY_SCOPE     thresh  limit
+ *
+ *                          bdi setpoint
+ *                                 v
+ * |-------------------------------*-------------------------------------------|
+ * ^                               ^                                           ^
+ * 0                               bdi_thresh - bdi_thresh/DIRTY_SCOPE     limit
+ *
+ * (o) pseudo code
+ *
+ *     pos_ratio = 1 << BANDWIDTH_CALC_SHIFT
+ *
+ *     if (dirty < setpoint) scale up   pos_ratio
+ *     if (dirty > setpoint) scale down pos_ratio
+ *
+ *     if (bdi_dirty < bdi_setpoint) scale up   pos_ratio
+ *     if (bdi_dirty > bdi_setpoint) scale down pos_ratio
+ *
+ * (o) global/bdi control lines
+ *
+ * Based on the number of dirty pages (the X), pos_ratio (the Y) is scaled by
+ * several control lines in turn.
+ *
+ * The control lines for the global/bdi setpoints both stretch up to @limit.
+ * If any control line drops below Y=0 before reaching @limit, an auxiliary
+ * line will be set up to connect them. The figure below illustrates the main
+ * bdi control line with an auxiliary line extending it to @limit.
+ *
+ * This allows smoothly throttling bdi_dirty down to normal if it starts high
+ * in situations like
+ * - start writing to a slow SD card and a fast disk at the same time. The SD
+ *   card's bdi_dirty may rush to 5 times higher than bdi setpoint.
+ * - the bdi dirty thresh goes down quickly due to change of JBOD workload
+ *
+ *   o
+ *     o
+ *       o                                      [o] main control line
+ *         o                                    [*] auxiliary control line
+ *           o
+ *             o
+ *               o
+ *                 o
+ *                   o
+ *                     o
+ *                       o--------------------- balance point, bw scale = 1
+ *                       | o
+ *                       |   o
+ *                       |     o
+ *                       |       o
+ *                       |         o
+ *                       |           o
+ *                       |             o------- connect point, bw scale = 1/2
+ *                       |               .*
+ *                       |                 .   *
+ *                       |                   .      *
+ *                       |                     .         *
+ *                       |                       .           *
+ *                       |                         .              *
+ *                       |                           .                 *
+ *  [--------------------+-----------------------------.--------------------*]
+ *  0                 bdi setpoint                 bdi origin            limit
+ *
+ * The bdi control line: if (origin < limit), an auxiliary control line (*)
+ * will be set up to extend the main control line (o) to @limit.
+ */
+static unsigned long bdi_position_ratio(struct backing_dev_info *bdi,
+					unsigned long thresh,
+					unsigned long dirty,
+					unsigned long bdi_thresh,
+					unsigned long bdi_dirty)
+{
+	unsigned long limit = hard_dirty_limit(thresh);
+	unsigned long origin;
+	unsigned long goal;
+	unsigned long long span;
+	unsigned long long pos_ratio;	/* for scaling up/down the rate limit */
+
+	if (unlikely(dirty >= limit))
+		return 0;
+
+	/*
+	 * global setpoint
+	 */
+	goal = thresh - thresh / DIRTY_SCOPE;
+	origin = 4 * thresh;
+
+	pos_ratio = 1 << BANDWIDTH_CALC_SHIFT;
+	if (unlikely(origin < limit && dirty > (goal + origin) / 2)) {
+		origin = limit;			/* auxiliary control line */
+		goal = (goal + origin) / 2;
+		pos_ratio >>= 1;		/* bw scale = 1/2 at connect point */
+	}
+	pos_ratio *= origin - dirty;
+	do_div(pos_ratio, origin - goal + 1);
+
+	/*
+	 * bdi setpoint
+	 */
+	if (unlikely(bdi_thresh > thresh))
+		bdi_thresh = thresh;
+	goal = bdi_thresh - bdi_thresh / DIRTY_SCOPE;
+	/*
+	 * Use span=(4*bw) in the single disk case and transition to bdi_thresh
+	 * in the JBOD case.  For JBOD, bdi_thresh could fluctuate up to its own
+	 * size.  Otherwise the bdi write bandwidth is good for limiting the
+	 * floating area, which makes the bdi control line a good backup when
+	 * the global control line is too flat/weak in large memory systems.
+	 */
+	span = (u64) bdi_thresh * (thresh - bdi_thresh) +
+		(4 * bdi->avg_write_bandwidth) * bdi_thresh;
+	do_div(span, thresh + 1);
+	origin = goal + 2 * span;
+
+	if (unlikely(bdi_dirty > goal + span)) {
+		if (bdi_dirty > limit)
+			return 0;
+		if (origin < limit) {
+			origin = limit;		/* auxiliary control line */
+			goal += span;
+			pos_ratio >>= 1;
+		}
+	}
+	pos_ratio *= origin - bdi_dirty;
+	do_div(pos_ratio, origin - goal + 1);
+
+	return pos_ratio;
+}
+
 static void bdi_update_write_bandwidth(struct backing_dev_info *bdi,
 				       unsigned long elapsed,
 				       unsigned long written)

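To make the span calculation in the bdi setpoint section easier to follow,
here is a small user-space sketch of the same arithmetic. It reads the
formula as a weighted average of bdi_thresh and 4 * avg_write_bandwidth,
with the weight shifting toward 4 * bw as bdi_thresh approaches thresh.
The function name bdi_span() and the numbers used with it are assumptions
for illustration, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-alone version of the span computation:
 *
 *   span = (bdi_thresh * (thresh - bdi_thresh) + 4 * bw * bdi_thresh)
 *          / (thresh + 1)
 *
 * All arguments are plain numbers here; in the patch they are page
 * counts and a per-bdi average write bandwidth.
 */
static uint64_t bdi_span(uint64_t thresh, uint64_t bdi_thresh,
			 uint64_t avg_write_bw)
{
	assert(bdi_thresh <= thresh);	/* the patch clamps bdi_thresh first */

	uint64_t span = bdi_thresh * (thresh - bdi_thresh) +
			4 * avg_write_bw * bdi_thresh;
	return span / (thresh + 1);
}
```

With thresh = 100000 and bw = 1000, a single disk that owns the whole
threshold (bdi_thresh = 100000) gets a span of about 4 * bw, while a disk
holding only 1% of it (bdi_thresh = 1000) gets a span close to its own
bdi_thresh.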

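The main/auxiliary bdi control lines from the figure can be sketched in
user space as below. This is a simplified reading of the bdi half of
bdi_position_ratio(), assuming the scale starts at 1.0 rather than at the
result of the global control line. The name bdi_line_ratio(), the
CALC_SHIFT macro and the sample values are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CALC_SHIFT 10	/* mirrors BANDWIDTH_CALC_SHIFT; 1.0 == 1 << 10 */

/*
 * Hypothetical helper: the main line runs from (goal, 1.0) to
 * (goal + 2 * span, 0).  If that origin lies below @limit and bdi_dirty
 * is past the connect point (goal + span, 1/2), an auxiliary line from
 * the connect point to (limit, 0) is used instead.
 */
static uint64_t bdi_line_ratio(uint64_t goal, uint64_t span, uint64_t limit,
			       uint64_t bdi_dirty)
{
	uint64_t origin = goal + 2 * span;
	uint64_t ratio = 1ULL << CALC_SHIFT;

	if (bdi_dirty > limit)
		return 0;
	if (bdi_dirty > goal + span && origin < limit) {
		origin = limit;		/* auxiliary control line */
		goal += span;		/* start from the connect point */
		ratio >>= 1;		/* where the scale is already 1/2 */
	}
	if (bdi_dirty >= origin)
		return 0;
	ratio *= origin - bdi_dirty;
	return ratio / (origin - goal + 1);
}
```

With goal = 1000, span = 500 and limit = 4000, the result stays continuous
across the connect point at 1500 and reaches 0 only at the limit, instead
of hitting 0 at the main line's origin of 2000.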