From: Jan Kara <jack@suse.cz>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: linux-fsdevel@vger.kernel.org, Jan Kara <jack@suse.cz>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Christoph Hellwig <hch@lst.de>,
Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 6/7] writeback: dirty ratelimit - think time compensation
Date: Wed, 7 Dec 2011 17:14:40 +0100 [thread overview]
Message-ID: <20111207161440.GL4622@quack.suse.cz> (raw)
In-Reply-To: <20111128140513.653000555@intel.com>
On Mon 28-11-11 21:53:44, Wu Fengguang wrote:
> Compensate the task's think time when computing the final pause time,
> so that ->dirty_ratelimit can be executed accurately.
>
> think time := time spend outside of balance_dirty_pages()
>
> In the rare case that the task slept longer than the 200ms period time
> (result in negative pause time), the sleep time will be compensated in
> the following periods, too, if it's less than 1 second.
>
> Accumulated errors are carefully avoided as long as the max pause area
> is not hitted.
>
> Pseudo code:
>
> period = pages_dirtied / task_ratelimit;
> think = jiffies - dirty_paused_when;
> pause = period - think;
>
> 1) normal case: period > think
>
> pause = period - think
> dirty_paused_when = jiffies + pause
> nr_dirtied = 0
>
> period time
> |===============================>|
> think time pause time
> |===============>|==============>|
> ------|----------------|---------------|------------------------
> dirty_paused_when jiffies
>
>
> 2) no pause case: period <= think
>
> don't pause; reduce future pause time by:
> dirty_paused_when += period
> nr_dirtied = 0
>
> period time
> |===============================>|
> think time
> |===================================================>|
> ------|--------------------------------+-------------------|----
> dirty_paused_when jiffies
>
> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Looks good. You can add:
Acked-by: Jan Kara <jack@suse.cz>
Honza
> ---
> include/linux/sched.h | 1
> include/trace/events/writeback.h | 14 ++++++++---
> kernel/fork.c | 1
> mm/page-writeback.c | 36 +++++++++++++++++++++++++----
> 4 files changed, 45 insertions(+), 7 deletions(-)
>
> --- linux-next.orig/include/linux/sched.h 2011-11-17 20:12:22.000000000 +0800
> +++ linux-next/include/linux/sched.h 2011-11-17 20:12:35.000000000 +0800
> @@ -1527,6 +1527,7 @@ struct task_struct {
> */
> int nr_dirtied;
> int nr_dirtied_pause;
> + unsigned long dirty_paused_when; /* start of a write-and-pause period */
>
> #ifdef CONFIG_LATENCYTOP
> int latency_record_count;
> --- linux-next.orig/mm/page-writeback.c 2011-11-17 20:12:22.000000000 +0800
> +++ linux-next/mm/page-writeback.c 2011-11-17 20:12:38.000000000 +0800
> @@ -1010,6 +1010,7 @@ static void balance_dirty_pages(struct a
> unsigned long background_thresh;
> unsigned long dirty_thresh;
> unsigned long bdi_thresh;
> + long period;
> long pause = 0;
> long uninitialized_var(max_pause);
> bool dirty_exceeded = false;
> @@ -1020,6 +1021,8 @@ static void balance_dirty_pages(struct a
> unsigned long start_time = jiffies;
>
> for (;;) {
> + unsigned long now = jiffies;
> +
> /*
> * Unstable writes are a feature of certain networked
> * filesystems (i.e. NFS) in which data may have been
> @@ -1039,8 +1042,11 @@ static void balance_dirty_pages(struct a
> */
> freerun = dirty_freerun_ceiling(dirty_thresh,
> background_thresh);
> - if (nr_dirty <= freerun)
> + if (nr_dirty <= freerun) {
> + current->dirty_paused_when = now;
> + current->nr_dirtied = 0;
> break;
> + }
>
> if (unlikely(!writeback_in_progress(bdi)))
> bdi_start_background_writeback(bdi);
> @@ -1098,10 +1104,21 @@ static void balance_dirty_pages(struct a
> task_ratelimit = ((u64)dirty_ratelimit * pos_ratio) >>
> RATELIMIT_CALC_SHIFT;
> if (unlikely(task_ratelimit == 0)) {
> + period = max_pause;
> pause = max_pause;
> goto pause;
> }
> - pause = HZ * pages_dirtied / task_ratelimit;
> + period = HZ * pages_dirtied / task_ratelimit;
> + pause = period;
> + if (current->dirty_paused_when)
> + pause -= now - current->dirty_paused_when;
> + /*
> + * For less than 1s think time (ext3/4 may block the dirtier
> + * for up to 800ms from time to time on 1-HDD; so does xfs,
> + * however at much less frequency), try to compensate it in
> + * future periods by updating the virtual time; otherwise just
> + * do a reset, as it may be a light dirtier.
> + */
> if (unlikely(pause <= 0)) {
> trace_balance_dirty_pages(bdi,
> dirty_thresh,
> @@ -1112,8 +1129,16 @@ static void balance_dirty_pages(struct a
> dirty_ratelimit,
> task_ratelimit,
> pages_dirtied,
> + period,
> pause,
> start_time);
> + if (pause < -HZ) {
> + current->dirty_paused_when = now;
> + current->nr_dirtied = 0;
> + } else if (period) {
> + current->dirty_paused_when += period;
> + current->nr_dirtied = 0;
> + }
> pause = 1; /* avoid resetting nr_dirtied_pause below */
> break;
> }
> @@ -1129,11 +1154,15 @@ pause:
> dirty_ratelimit,
> task_ratelimit,
> pages_dirtied,
> + period,
> pause,
> start_time);
> __set_current_state(TASK_KILLABLE);
> io_schedule_timeout(pause);
>
> + current->dirty_paused_when = now + pause;
> + current->nr_dirtied = 0;
> +
> /*
> * This is typically equal to (nr_dirty < dirty_thresh) and can
> * also keep "1000+ dd on a slow USB stick" under control.
> @@ -1148,11 +1177,10 @@ pause:
> if (!dirty_exceeded && bdi->dirty_exceeded)
> bdi->dirty_exceeded = 0;
>
> - current->nr_dirtied = 0;
> if (pause == 0) { /* in freerun area */
> current->nr_dirtied_pause =
> dirty_poll_interval(nr_dirty, dirty_thresh);
> - } else if (pause <= max_pause / 4 &&
> + } else if (period <= max_pause / 4 &&
> pages_dirtied >= current->nr_dirtied_pause) {
> current->nr_dirtied_pause = clamp_val(
> dirty_ratelimit * (max_pause / 2) / HZ,
> --- linux-next.orig/kernel/fork.c 2011-11-17 20:12:23.000000000 +0800
> +++ linux-next/kernel/fork.c 2011-11-17 20:12:35.000000000 +0800
> @@ -1296,6 +1296,7 @@ static struct task_struct *copy_process(
>
> p->nr_dirtied = 0;
> p->nr_dirtied_pause = 128 >> (PAGE_SHIFT - 10);
> + p->dirty_paused_when = 0;
>
> /*
> * Ok, make it visible to the rest of the system.
> --- linux-next.orig/include/trace/events/writeback.h 2011-11-17 19:13:41.000000000 +0800
> +++ linux-next/include/trace/events/writeback.h 2011-11-17 20:12:35.000000000 +0800
> @@ -289,12 +289,13 @@ TRACE_EVENT(balance_dirty_pages,
> unsigned long dirty_ratelimit,
> unsigned long task_ratelimit,
> unsigned long dirtied,
> + unsigned long period,
> long pause,
> unsigned long start_time),
>
> TP_ARGS(bdi, thresh, bg_thresh, dirty, bdi_thresh, bdi_dirty,
> dirty_ratelimit, task_ratelimit,
> - dirtied, pause, start_time),
> + dirtied, period, pause, start_time),
>
> TP_STRUCT__entry(
> __array( char, bdi, 32)
> @@ -309,6 +310,8 @@ TRACE_EVENT(balance_dirty_pages,
> __field(unsigned int, dirtied_pause)
> __field(unsigned long, paused)
> __field( long, pause)
> + __field(unsigned long, period)
> + __field( long, think)
> ),
>
> TP_fast_assign(
> @@ -325,6 +328,9 @@ TRACE_EVENT(balance_dirty_pages,
> __entry->task_ratelimit = KBps(task_ratelimit);
> __entry->dirtied = dirtied;
> __entry->dirtied_pause = current->nr_dirtied_pause;
> + __entry->think = current->dirty_paused_when == 0 ? 0 :
> + (long)(jiffies - current->dirty_paused_when) * 1000/HZ;
> + __entry->period = period * 1000 / HZ;
> __entry->pause = pause * 1000 / HZ;
> __entry->paused = (jiffies - start_time) * 1000 / HZ;
> ),
> @@ -335,7 +341,7 @@ TRACE_EVENT(balance_dirty_pages,
> "bdi_setpoint=%lu bdi_dirty=%lu "
> "dirty_ratelimit=%lu task_ratelimit=%lu "
> "dirtied=%u dirtied_pause=%u "
> - "paused=%lu pause=%ld",
> + "paused=%lu pause=%ld period=%lu think=%ld",
> __entry->bdi,
> __entry->limit,
> __entry->setpoint,
> @@ -347,7 +353,9 @@ TRACE_EVENT(balance_dirty_pages,
> __entry->dirtied,
> __entry->dirtied_pause,
> __entry->paused, /* ms */
> - __entry->pause /* ms */
> + __entry->pause, /* ms */
> + __entry->period, /* ms */
> + __entry->think /* ms */
> )
> );
>
>
>
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
next prev parent reply other threads:[~2011-12-07 16:14 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-11-28 13:53 [PATCH 0/7] dirty throttling bits for 3.3 (v2) Wu Fengguang
2011-11-28 13:53 ` [PATCH 1/7] writeback: balanced_rate cannot exceed write bandwidth Wu Fengguang
2011-12-07 10:21 ` Jan Kara
2011-11-28 13:53 ` [PATCH 2/7] writeback: charge leaked page dirties to active tasks Wu Fengguang
2011-12-07 10:23 ` Jan Kara
2011-11-28 13:53 ` [PATCH 3/7] writeback: fix dirtied pages accounting on sub-page writes Wu Fengguang
2011-12-07 10:53 ` Jan Kara
2011-12-07 12:08 ` Wu Fengguang
2011-12-07 16:07 ` Jan Kara
2011-12-08 2:44 ` Wu Fengguang
2011-11-28 13:53 ` [PATCH 4/7] writeback: fix dirtied pages accounting on redirty Wu Fengguang
2011-12-07 16:09 ` Jan Kara
2011-11-28 13:53 ` [PATCH 5/7] btrfs: fix dirtied pages accounting on sub-page writes Wu Fengguang
2011-11-28 14:16 ` Wu Fengguang
2011-11-28 13:53 ` [PATCH 6/7] writeback: dirty ratelimit - think time compensation Wu Fengguang
2011-12-07 16:14 ` Jan Kara [this message]
2011-11-28 13:53 ` [PATCH 7/7] writeback: comment on the bdi dirty threshold Wu Fengguang
2011-12-07 10:57 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20111207161440.GL4622@quack.suse.cz \
--to=jack@suse.cz \
--cc=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=fengguang.wu@intel.com \
--cc=hch@lst.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.