linux-fsdevel.vger.kernel.org archive mirror
* [PATCH 0/3] bdi write bandwidth estimation
@ 2011-06-12 15:18 Wu Fengguang
  2011-06-12 15:18 ` [PATCH 1/3] writeback: account per-bdi accumulated written pages Wu Fengguang
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Wu Fengguang @ 2011-06-12 15:18 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jan Kara, Dave Chinner, Christoph Hellwig, Andrew Morton,
	Wu Fengguang, LKML


Do bdi write bandwidth estimation in the flusher thread at 200ms intervals,
and in case the flusher is blocked syncing large files, the throttled dirtier
tasks will back it up.

To get an idea of the adaptation speed and fluctuation range, here are
some real examples (check the red dots and the yellow line):

http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v8/3G/xfs-1dd-4k-8p-2948M-20:10-3.0.0-rc2-next-20110610+-2011-06-12.21:51/balance_dirty_pages-bandwidth.png
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v8/3G/ext3-1dd-4k-8p-2948M-20:10-3.0.0-rc2-next-20110610+-2011-06-12.22:02/balance_dirty_pages-bandwidth.png
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v8/3G/ext4-1dd-4k-8p-2948M-20:10-3.0.0-rc2-next-20110610+-2011-06-12.21:57/balance_dirty_pages-bandwidth.png
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v8/3G/btrfs-1dd-4k-8p-2948M-20:10-3.0.0-rc2-next-20110610+-2011-06-12.22:07/balance_dirty_pages-bandwidth.png

Outputs from the old version, for reference:

http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/4G-60%25/ext3-1dd-1M-8p-3911M-60%25-2.6.38-rc5-dt6+-2011-02-22-11-51/balance_dirty_pages-bandwidth.png
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/4G-60%25/xfs-1dd-1M-8p-3911M-60%25-2.6.38-rc5-dt6+-2011-02-22-11-10/balance_dirty_pages-bandwidth.png
http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/NFS/nfs-1dd-1M-8p-2945M-20%25-2.6.38-rc6-dt6+-2011-02-22-21-09/balance_dirty_pages-bandwidth.png

This is merely the estimation part. The in-kernel users of the estimated
bandwidth will follow in the coming series.

Thanks,
Fengguang


* [PATCH 1/3] writeback: account per-bdi accumulated written pages
  2011-06-12 15:18 [PATCH 0/3] bdi write bandwidth estimation Wu Fengguang
@ 2011-06-12 15:18 ` Wu Fengguang
  2011-06-12 15:18 ` [PATCH 2/3] writeback: bdi write bandwidth estimation Wu Fengguang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Wu Fengguang @ 2011-06-12 15:18 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jan Kara, Michael Rubin, Wu Fengguang, Dave Chinner,
	Christoph Hellwig, Andrew Morton, LKML

[-- Attachment #1: writeback-bdi-written.patch --]
[-- Type: text/plain, Size: 2376 bytes --]

From: Jan Kara <jack@suse.cz>

Introduce the BDI_WRITTEN counter. It will be used for estimating the
bdi's write bandwidth.

Peter Zijlstra <a.p.zijlstra@chello.nl>:
Move BDI_WRITTEN accounting into __bdi_writeout_inc().
This will cover and fix fuse, which only calls bdi_writeout_inc().

CC: Michael Rubin <mrubin@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 include/linux/backing-dev.h |    1 +
 mm/backing-dev.c            |   10 ++++++++--
 mm/page-writeback.c         |    1 +
 3 files changed, 10 insertions(+), 2 deletions(-)

--- linux-next.orig/include/linux/backing-dev.h	2011-06-12 13:29:02.000000000 +0800
+++ linux-next/include/linux/backing-dev.h	2011-06-12 20:48:47.000000000 +0800
@@ -40,6 +40,7 @@ typedef int (congested_fn)(void *, int);
 enum bdi_stat_item {
 	BDI_RECLAIMABLE,
 	BDI_WRITEBACK,
+	BDI_WRITTEN,
 	NR_BDI_STAT_ITEMS
 };
 
--- linux-next.orig/mm/backing-dev.c	2011-06-12 13:29:02.000000000 +0800
+++ linux-next/mm/backing-dev.c	2011-06-12 20:49:51.000000000 +0800
@@ -97,6 +97,7 @@ static int bdi_debug_stats_show(struct s
 		   "BdiDirtyThresh:   %8lu kB\n"
 		   "DirtyThresh:      %8lu kB\n"
 		   "BackgroundThresh: %8lu kB\n"
+		   "BdiWritten:       %8lu kB\n"
 		   "b_dirty:          %8lu\n"
 		   "b_io:             %8lu\n"
 		   "b_more_io:        %8lu\n"
@@ -104,8 +105,13 @@ static int bdi_debug_stats_show(struct s
 		   "state:            %8lx\n",
 		   (unsigned long) K(bdi_stat(bdi, BDI_WRITEBACK)),
 		   (unsigned long) K(bdi_stat(bdi, BDI_RECLAIMABLE)),
-		   K(bdi_thresh), K(dirty_thresh),
-		   K(background_thresh), nr_dirty, nr_io, nr_more_io,
+		   K(bdi_thresh),
+		   K(dirty_thresh),
+		   K(background_thresh),
+		   (unsigned long) K(bdi_stat(bdi, BDI_WRITTEN)),
+		   nr_dirty,
+		   nr_io,
+		   nr_more_io,
 		   !list_empty(&bdi->bdi_list), bdi->state);
 #undef K
 
--- linux-next.orig/mm/page-writeback.c	2011-06-12 13:29:06.000000000 +0800
+++ linux-next/mm/page-writeback.c	2011-06-12 20:48:47.000000000 +0800
@@ -219,6 +219,7 @@ int dirty_bytes_handler(struct ctl_table
  */
 static inline void __bdi_writeout_inc(struct backing_dev_info *bdi)
 {
+	__inc_bdi_stat(bdi, BDI_WRITTEN);
 	__prop_inc_percpu_max(&vm_completions, &bdi->completions,
 			      bdi->max_prop_frac);
 }




* [PATCH 2/3] writeback: bdi write bandwidth estimation
  2011-06-12 15:18 [PATCH 0/3] bdi write bandwidth estimation Wu Fengguang
  2011-06-12 15:18 ` [PATCH 1/3] writeback: account per-bdi accumulated written pages Wu Fengguang
@ 2011-06-12 15:18 ` Wu Fengguang
  2011-06-12 15:18 ` [PATCH 3/3] writeback: show bdi write bandwidth in debugfs Wu Fengguang
  2011-06-13 22:23 ` [PATCH 0/3] bdi write bandwidth estimation Andrew Morton
  3 siblings, 0 replies; 6+ messages in thread
From: Wu Fengguang @ 2011-06-12 15:18 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jan Kara, Li Shaohua, Peter Zijlstra, Wu Fengguang, Dave Chinner,
	Christoph Hellwig, Andrew Morton, LKML

[-- Attachment #1: writeback-write-bandwidth.patch --]
[-- Type: text/plain, Size: 7073 bytes --]

The estimation value will start from 100MB/s and adapt to the real
bandwidth in seconds.
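
The unit arithmetic behind the 100MB/s starting value can be sketched in
userspace as below. This is only an illustration: it assumes 4 KiB pages
(PAGE_SHIFT of 12), and the SKETCH_* macro and helper names are hypothetical,
not part of the patch.

```c
#include <assert.h>

/* Assumed page size: 4 KiB. Other architectures may use a different shift. */
#define SKETCH_PAGE_SHIFT 12

/* Convert MB/s to pages/s, the unit in which the estimate is kept. */
static unsigned long mbps_to_pages_per_sec(unsigned long mbps)
{
	/* Guard against overflow of the left shift. */
	assert(mbps <= (~0UL >> (20 - SKETCH_PAGE_SHIFT)));
	return mbps << (20 - SKETCH_PAGE_SHIFT);
}

/* Convert pages to kB, in the same way as K() in mm/backing-dev.c. */
static unsigned long pages_to_kb(unsigned long pages)
{
	return pages << (SKETCH_PAGE_SHIFT - 10);
}
```

Under these assumptions, mbps_to_pages_per_sec(100) gives the same value as
INIT_BW, and pages_to_kb() of that value is what a kB/s readout would show.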

It tries to update the bandwidth only when the disk is fully utilized.
Any inactive period of more than one second will be skipped.

The estimation is not done purely in the flusher thread because
there is no guarantee for write_cache_pages() to return timely
to update the bandwidth.

The bdi->avg_write_bandwidth smoothing is very effective at filtering
out sudden spikes, at the cost of possibly being biased slightly
low.

The overheads are low because the bdi bandwidth update only occurs
at >200ms intervals.
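
The gating described above (rate limit plus skipping of idle periods) can be
sketched in userspace as follows. This is a minimal illustration rather than
the patch itself: the tick rate is assumed to be 1000, and bw_decide,
bw_action and sketch_time_before are hypothetical names.

```c
#include <assert.h>

#define SKETCH_HZ        1000UL            /* assumed tick rate */
#define SKETCH_MAX_PAUSE (SKETCH_HZ / 5)   /* 200ms expressed in jiffies */

enum bw_action {
	BW_RATELIMITED,    /* less than 200ms since the last update: do nothing */
	BW_SNAPSHOT_ONLY,  /* idle gap: refresh the stamps, keep the estimate */
	BW_UPDATE          /* fold the new sample into the estimate */
};

/* Wraparound-safe "a is earlier than b", in the style of time_before(). */
static int sketch_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * now:        current time in jiffies
 * stamp:      time of the previous update (bw_time_stamp)
 * start_time: when the current writeback or throttling run began
 */
static enum bw_action bw_decide(unsigned long now, unsigned long stamp,
				unsigned long start_time)
{
	unsigned long elapsed = now - stamp;

	assert(SKETCH_MAX_PAUSE >= 1);  /* mirrors max(HZ/5, 1) */

	if (elapsed < SKETCH_MAX_PAUSE)
		return BW_RATELIMITED;
	/* More than 1s passed and the last update predates this run. */
	if (elapsed > SKETCH_HZ && sketch_time_before(stamp, start_time))
		return BW_SNAPSHOT_ONLY;
	return BW_UPDATE;
}
```

The snapshot-only case is what keeps a long idle gap from being averaged in
as if the disk had been writing slowly the whole time.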

CC: Li Shaohua <shaohua.li@intel.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 fs/fs-writeback.c           |   13 +++++
 include/linux/backing-dev.h |    5 ++
 include/linux/writeback.h   |    3 +
 mm/backing-dev.c            |   12 +++++
 mm/page-writeback.c         |   81 ++++++++++++++++++++++++++++++++++
 5 files changed, 114 insertions(+)

--- linux-next.orig/fs/fs-writeback.c	2011-06-12 20:57:01.000000000 +0800
+++ linux-next/fs/fs-writeback.c	2011-06-12 20:59:33.000000000 +0800
@@ -629,6 +629,16 @@ static inline bool over_bground_thresh(v
 }
 
 /*
+ * Called under &wb->list_lock. If there are multiple wb per bdi,
+ * only the flusher working on the first wb should do it.
+ */
+static void wb_update_bandwidth(struct bdi_writeback *wb,
+				unsigned long start_time)
+{
+	__bdi_update_bandwidth(wb->bdi, start_time);
+}
+
+/*
  * Explicit flushing or periodic writeback of "old" data.
  *
  * Define "old": the first time one of an inode's pages is dirtied, we mark the
@@ -658,6 +668,7 @@ static long wb_writeback(struct bdi_writ
 	long wrote = 0;
 	long write_chunk = MAX_WRITEBACK_PAGES;
 	struct inode *inode;
+	unsigned long wb_start = jiffies;
 
 	if (!wbc.range_cyclic) {
 		wbc.range_start = 0;
@@ -727,6 +738,8 @@ static long wb_writeback(struct bdi_writ
 			__writeback_inodes_wb(wb, &wbc);
 		trace_wbc_writeback_written(&wbc, wb->bdi);
 
+		wb_update_bandwidth(wb, wb_start);
+
 		work->nr_pages -= write_chunk - wbc.nr_to_write;
 		wrote += write_chunk - wbc.nr_to_write;
 
--- linux-next.orig/include/linux/backing-dev.h	2011-06-12 20:57:03.000000000 +0800
+++ linux-next/include/linux/backing-dev.h	2011-06-12 20:59:32.000000000 +0800
@@ -73,6 +73,11 @@ struct backing_dev_info {
 
 	struct percpu_counter bdi_stat[NR_BDI_STAT_ITEMS];
 
+	unsigned long bw_time_stamp;
+	unsigned long written_stamp;
+	unsigned long write_bandwidth;
+	unsigned long avg_write_bandwidth;
+
 	struct prop_local_percpu completions;
 	int dirty_exceeded;
 
--- linux-next.orig/include/linux/writeback.h	2011-06-12 20:57:01.000000000 +0800
+++ linux-next/include/linux/writeback.h	2011-06-12 20:59:33.000000000 +0800
@@ -122,6 +122,9 @@ void global_dirty_limits(unsigned long *
 unsigned long bdi_dirty_limit(struct backing_dev_info *bdi,
 			       unsigned long dirty);
 
+void __bdi_update_bandwidth(struct backing_dev_info *bdi,
+			    unsigned long start_time);
+
 void page_writeback_init(void);
 void balance_dirty_pages_ratelimited_nr(struct address_space *mapping,
 					unsigned long nr_pages_dirtied);
--- linux-next.orig/mm/backing-dev.c	2011-06-12 20:57:03.000000000 +0800
+++ linux-next/mm/backing-dev.c	2011-06-12 21:06:11.000000000 +0800
@@ -649,6 +649,11 @@ static void bdi_wb_init(struct bdi_write
 	setup_timer(&wb->wakeup_timer, wakeup_timer_fn, (unsigned long)bdi);
 }
 
+/*
+ * Initial write bandwidth: 100 MB/s
+ */
+#define INIT_BW		(100 << (20 - PAGE_SHIFT))
+
 int bdi_init(struct backing_dev_info *bdi)
 {
 	int i, err;
@@ -671,6 +676,13 @@ int bdi_init(struct backing_dev_info *bd
 	}
 
 	bdi->dirty_exceeded = 0;
+
+	bdi->bw_time_stamp = jiffies;
+	bdi->written_stamp = 0;
+
+	bdi->write_bandwidth = INIT_BW;
+	bdi->avg_write_bandwidth = INIT_BW;
+
 	err = prop_local_init_percpu(&bdi->completions);
 
 	if (err) {
--- linux-next.orig/mm/page-writeback.c	2011-06-12 20:57:03.000000000 +0800
+++ linux-next/mm/page-writeback.c	2011-06-12 21:14:57.000000000 +0800
@@ -37,6 +37,11 @@
 #include <trace/events/writeback.h>
 
 /*
+ * Sleep at most 200ms at a time in balance_dirty_pages().
+ */
+#define MAX_PAUSE	max(HZ/5, 1)
+
+/*
  * After a CPU has dirtied this many pages, balance_dirty_pages_ratelimited
  * will look to see if it needs to force writeback or throttling.
  */
@@ -472,6 +477,79 @@ unsigned long bdi_dirty_limit(struct bac
 	return bdi_dirty;
 }
 
+static void bdi_update_write_bandwidth(struct backing_dev_info *bdi,
+				       unsigned long elapsed,
+				       unsigned long written)
+{
+	const unsigned long period = roundup_pow_of_two(3 * HZ);
+	unsigned long avg = bdi->avg_write_bandwidth;
+	unsigned long old = bdi->write_bandwidth;
+	unsigned long cur;
+	u64 bw;
+
+	bw = written - bdi->written_stamp;
+	bw *= HZ;
+	if (unlikely(elapsed > period / 2)) {
+		do_div(bw, elapsed);
+		elapsed = period / 2;
+		bw *= elapsed;
+	}
+	bw += (u64)bdi->write_bandwidth * (period - elapsed);
+	cur = bw >> ilog2(period);
+	bdi->write_bandwidth = cur;
+
+	/*
+	 * one more level of smoothing
+	 */
+	if (avg > old && old > cur)
+		avg -= (avg - old) >> 3;
+
+	if (avg < old && old < cur)
+		avg += (old - avg) >> 3;
+
+	bdi->avg_write_bandwidth = avg;
+}
+
+void __bdi_update_bandwidth(struct backing_dev_info *bdi,
+			    unsigned long start_time)
+{
+	unsigned long now = jiffies;
+	unsigned long elapsed = now - bdi->bw_time_stamp;
+	unsigned long written;
+
+	/*
+	 * rate-limit, only update once every 200ms.
+	 */
+	if (elapsed < MAX_PAUSE)
+		return;
+
+	written = percpu_counter_read(&bdi->bdi_stat[BDI_WRITTEN]);
+
+	/*
+	 * Skip quiet periods when disk bandwidth is under-utilized.
+	 * (at least 1s idle time between two flusher runs)
+	 */
+	if (elapsed > HZ && time_before(bdi->bw_time_stamp, start_time))
+		goto snapshot;
+
+	bdi_update_write_bandwidth(bdi, elapsed, written);
+
+snapshot:
+	bdi->written_stamp = written;
+	bdi->bw_time_stamp = now;
+}
+
+static void bdi_update_bandwidth(struct backing_dev_info *bdi,
+				 unsigned long start_time)
+{
+	if (jiffies - bdi->bw_time_stamp <= MAX_PAUSE + MAX_PAUSE / 10)
+		return;
+	if (spin_trylock(&bdi->wb.list_lock)) {
+		__bdi_update_bandwidth(bdi, start_time);
+		spin_unlock(&bdi->wb.list_lock);
+	}
+}
+
 /*
  * balance_dirty_pages() must be called by processes which are generating dirty
  * data.  It looks at the number of dirty pages in the machine and will force
@@ -491,6 +569,7 @@ static void balance_dirty_pages(struct a
 	unsigned long pause = 1;
 	bool dirty_exceeded = false;
 	struct backing_dev_info *bdi = mapping->backing_dev_info;
+	unsigned long start_time = jiffies;
 
 	for (;;) {
 		struct writeback_control wbc = {
@@ -552,6 +631,8 @@ static void balance_dirty_pages(struct a
 		if (!bdi->dirty_exceeded)
 			bdi->dirty_exceeded = 1;
 
+		bdi_update_bandwidth(bdi, start_time);
+
 		/* Note: nr_reclaimable denotes nr_dirty + nr_unstable.
 		 * Unstable writes are a feature of certain networked
 		 * filesystems (i.e. NFS) in which data may have been


* [PATCH 3/3] writeback: show bdi write bandwidth in debugfs
  2011-06-12 15:18 [PATCH 0/3] bdi write bandwidth estimation Wu Fengguang
  2011-06-12 15:18 ` [PATCH 1/3] writeback: account per-bdi accumulated written pages Wu Fengguang
  2011-06-12 15:18 ` [PATCH 2/3] writeback: bdi write bandwidth estimation Wu Fengguang
@ 2011-06-12 15:18 ` Wu Fengguang
  2011-06-13 22:23 ` [PATCH 0/3] bdi write bandwidth estimation Andrew Morton
  3 siblings, 0 replies; 6+ messages in thread
From: Wu Fengguang @ 2011-06-12 15:18 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Jan Kara, Theodore Tso, Peter Zijlstra, Wu Fengguang,
	Dave Chinner, Christoph Hellwig, Andrew Morton, LKML

[-- Attachment #1: writeback-bandwidth-show.patch --]
[-- Type: text/plain, Size: 2003 bytes --]

Add a "BdiWriteBandwidth" entry and indent others in /debug/bdi/*/stats.

Also increase the numeric field width to 10, to keep the possibly
huge BdiWritten number aligned, at least on desktop systems.

Impact: this could break user space tools if they are dumb enough to
depend on the exact amount of whitespace.
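
For illustration, a user space reader can avoid depending on the column
width by matching on the key and letting strtoul() skip the blanks. The
sketch below is an assumption about how such a tool might look, not an
existing utility; parse_stat_line is a hypothetical name, and it handles
only the decimal fields (the "state" field is printed in hex).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a line of the form "Key:   <decimal> [unit]" regardless of how
 * many blanks follow the colon.
 * Returns 1 and stores the number in *value on a match, 0 otherwise.
 */
static int parse_stat_line(const char *line, const char *key,
			   unsigned long *value)
{
	size_t klen = strlen(key);
	const char *num;
	char *end;

	assert(line && key && value);

	/* The key must match exactly and be followed by a colon. */
	if (strncmp(line, key, klen) != 0 || line[klen] != ':')
		return 0;

	num = line + klen + 1;
	*value = strtoul(num, &end, 10);  /* skips leading whitespace itself */
	return end != num;                /* 0 if no digits were consumed */
}
```

A parser written this way reads both the old 8-wide and the new 10-wide
layout without changes.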

CC: Theodore Ts'o <tytso@mit.edu>
CC: Jan Kara <jack@suse.cz>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/backing-dev.c |   24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

--- linux-next.orig/mm/backing-dev.c	2011-06-12 20:57:03.000000000 +0800
+++ linux-next/mm/backing-dev.c	2011-06-12 20:58:28.000000000 +0800
@@ -92,23 +92,25 @@ static int bdi_debug_stats_show(struct s
 
 #define K(x) ((x) << (PAGE_SHIFT - 10))
 	seq_printf(m,
-		   "BdiWriteback:     %8lu kB\n"
-		   "BdiReclaimable:   %8lu kB\n"
-		   "BdiDirtyThresh:   %8lu kB\n"
-		   "DirtyThresh:      %8lu kB\n"
-		   "BackgroundThresh: %8lu kB\n"
-		   "BdiWritten:       %8lu kB\n"
-		   "b_dirty:          %8lu\n"
-		   "b_io:             %8lu\n"
-		   "b_more_io:        %8lu\n"
-		   "bdi_list:         %8u\n"
-		   "state:            %8lx\n",
+		   "BdiWriteback:       %10lu kB\n"
+		   "BdiReclaimable:     %10lu kB\n"
+		   "BdiDirtyThresh:     %10lu kB\n"
+		   "DirtyThresh:        %10lu kB\n"
+		   "BackgroundThresh:   %10lu kB\n"
+		   "BdiWritten:         %10lu kB\n"
+		   "BdiWriteBandwidth:  %10lu kBps\n"
+		   "b_dirty:            %10lu\n"
+		   "b_io:               %10lu\n"
+		   "b_more_io:          %10lu\n"
+		   "bdi_list:           %10u\n"
+		   "state:              %10lx\n",
 		   (unsigned long) K(bdi_stat(bdi, BDI_WRITEBACK)),
 		   (unsigned long) K(bdi_stat(bdi, BDI_RECLAIMABLE)),
 		   K(bdi_thresh),
 		   K(dirty_thresh),
 		   K(background_thresh),
 		   (unsigned long) K(bdi_stat(bdi, BDI_WRITTEN)),
+		   (unsigned long) K(bdi->write_bandwidth),
 		   nr_dirty,
 		   nr_io,
 		   nr_more_io,


* Re: [PATCH 0/3] bdi write bandwidth estimation
  2011-06-12 15:18 [PATCH 0/3] bdi write bandwidth estimation Wu Fengguang
                   ` (2 preceding siblings ...)
  2011-06-12 15:18 ` [PATCH 3/3] writeback: show bdi write bandwidth in debugfs Wu Fengguang
@ 2011-06-13 22:23 ` Andrew Morton
  2011-06-14  3:45   ` Wu Fengguang
  3 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2011-06-13 22:23 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: linux-fsdevel, Jan Kara, Dave Chinner, Christoph Hellwig, LKML

On Sun, 12 Jun 2011 23:18:21 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

> Do bdi write bandwidth estimation in the flusher thread at 200ms intervals,

stdrant: anything which is paced using "seconds" is basically always
wrong.  The bandwidth of storage systems varies by who-knows-how-many
orders of magnitude.  If 200ms is correct for one system then it is
vastly incorrect for another.

A more suitable clock for this estimate would be "per 200 requests",
for a block-based BDI.

Also of course the bandwidth of a particular BDI varies vastly
depending on workload.  For the purpose of this work, that's probably
a desirable thing.


* Re: [PATCH 0/3] bdi write bandwidth estimation
  2011-06-13 22:23 ` [PATCH 0/3] bdi write bandwidth estimation Andrew Morton
@ 2011-06-14  3:45   ` Wu Fengguang
  0 siblings, 0 replies; 6+ messages in thread
From: Wu Fengguang @ 2011-06-14  3:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-fsdevel@vger.kernel.org, Jan Kara, Dave Chinner,
	Christoph Hellwig, LKML

On Tue, Jun 14, 2011 at 06:23:30AM +0800, Andrew Morton wrote:
> On Sun, 12 Jun 2011 23:18:21 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > Do bdi write bandwidth estimation in the flusher thread at 200ms intervals,
> 
> stdrant: anything which is paced using "seconds" is basically always
> wrong.  The bandwidth of storage systems varies by who-knows-how-many
> orders of magnitude.  If 200ms is correct for one system then it is
> vastly incorrect for another.
> 
> A more suitable clock for this estimate would be "per 200 requests",
> for a block-based BDI.
> 
> Also of course the bandwidth of a particular BDI varies vastly
> depending on workload.  For the purpose of this work, that's probably
> a desirable thing.

It would be good to get a more timely estimation for fast devices.
However, we have to balance "timely" against "fluctuations".

The main problem is that IO completions may come in bursts. An NFS commit
can be as large as seconds' worth of data. XFS completions may be
half a second's worth of data if we are going to increase the write chunk
size to half a second's worth of data.

Looking at the other filesystems, e.g. ext4:

http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v8/3G/ext4-1dd-4k-8p-2948M-20:10-3.0.0-rc2-next-20110610+-2011-06-12.21:57/balance_dirty_pages-bandwidth.png

You'll notice fluctuations with a period of around 5 seconds.

Here is another pattern with irregular periods of up to 20 seconds on SSD:

http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/1SSD-64G/ext4-1dd-1M-64p-64288M-20%25-2.6.38-rc6-dt6+-2011-03-01-16-19/balance_dirty_pages-bandwidth.png

That's why I'm not only doing the estimation at 200ms intervals, but
also averaging the samples over a period of 3 seconds and then applying
another level of smoothing (the avg_write_bandwidth).
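
A userspace sketch of these two averaging levels, for illustration only. It
follows the arithmetic of bdi_update_write_bandwidth() in patch 2/3, but it
assumes HZ is 1000 (so the period rounds up to 4096 jiffies), takes the
number of pages written since the last sample directly, and uses
hypothetical names (struct bw_state, bw_update).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_HZ           1000UL  /* assumed tick rate */
#define SKETCH_PERIOD       4096UL  /* roundup_pow_of_two(3 * HZ) for HZ=1000 */
#define SKETCH_PERIOD_SHIFT 12      /* ilog2(SKETCH_PERIOD) */

struct bw_state {
	unsigned long write_bandwidth;      /* pages/s, ~3s weighted average */
	unsigned long avg_write_bandwidth;  /* pages/s, second smoothing level */
};

/*
 * elapsed: jiffies since the previous sample (must be non-zero)
 * pages:   pages written during that interval
 */
static void bw_update(struct bw_state *s, unsigned long elapsed,
		      unsigned long pages)
{
	unsigned long avg = s->avg_write_bandwidth;
	unsigned long old = s->write_bandwidth;
	unsigned long cur;
	uint64_t bw = (uint64_t)pages * SKETCH_HZ;

	assert(elapsed > 0);

	/* Cap the weight of a single long interval at half the period. */
	if (elapsed > SKETCH_PERIOD / 2) {
		bw /= elapsed;
		elapsed = SKETCH_PERIOD / 2;
		bw *= elapsed;
	}
	/* Weighted average: new sample for `elapsed`, old value for the rest. */
	bw += (uint64_t)old * (SKETCH_PERIOD - elapsed);
	cur = (unsigned long)(bw >> SKETCH_PERIOD_SHIFT);
	s->write_bandwidth = cur;

	/* Second level: move avg only while the first level keeps its trend. */
	if (avg > old && old > cur)
		avg -= (avg - old) >> 3;
	if (avg < old && old < cur)
		avg += (old - avg) >> 3;
	s->avg_write_bandwidth = avg;
}
```

With 200-jiffy samples each one carries a weight of roughly 5%, so a single
burst moves write_bandwidth only slightly, and avg_write_bandwidth trails it
by moving 1/8 of the remaining gap per sample.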

Since it's a reasonable optimization for filesystems to do IO
completions in batches, a time-based interval is suitable for
averaging out the bursts while being efficient enough for both fast
and slow storage.


Another important fact: the estimation is carried out every
200ms only while the flusher thread is _already busy_.

This way, it won't lead to pointless CPU wakeups at idle time.

The estimated bandwidth reflects how fast the device can write out
when fully utilized, so it won't drop to 0 when the device goes idle.
The value remains constant while the disk is idle. During busy writes,
fluctuations aside, it also remains high unless it is knocked down by
concurrent reads that take some disk time and bandwidth away.

Thanks,
Fengguang

