All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wu Fengguang <fengguang.wu@intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>, Jan Kara <jack@suse.cz>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/6] writeback: avoid unnecessary calculation of bdi dirty thresholds
Date: Thu, 5 Aug 2010 00:41:59 +0800	[thread overview]
Message-ID: <20100804164159.GA22189@localhost> (raw)
In-Reply-To: <1280847822.1923.597.camel@laptop>

On Tue, Aug 03, 2010 at 11:03:42PM +0800, Peter Zijlstra wrote:
> On Sun, 2010-07-11 at 10:06 +0800, Wu Fengguang wrote:
> > plain text document attachment (writeback-less-bdi-calc.patch)
> > Split get_dirty_limits() into global_dirty_limits()+bdi_dirty_limit(),
> > so that the latter can be avoided when under global dirty background
> > threshold (which is the normal state for most systems).
> 
> The patch looks OK, although esp with the proposed comments in the
> follow up email, bdi_dirty_limit() gets a bit confusing wrt to how and
> what the limit is.
> 
> Maybe its clearer to not call task_dirty_limit() from bdi_dirty_limit(),
> that way the comment can focus on the device write request completion
> proportion thing.

Done, thanks.

> > +unsigned long bdi_dirty_limit(struct backing_dev_info *bdi,
> > +			       unsigned long dirty)
> > +{
> > +	u64 bdi_dirty;
> > +	long numerator, denominator;
> >  
> > +	/*
> > +	 * Calculate this BDI's share of the dirty ratio.
> > +	 */
> > +	bdi_writeout_fraction(bdi, &numerator, &denominator);
> >  
> > +	bdi_dirty = (dirty * (100 - bdi_min_ratio)) / 100;
> > +	bdi_dirty *= numerator;
> > +	do_div(bdi_dirty, denominator);
> >  
> > +	bdi_dirty += (dirty * bdi->min_ratio) / 100;
> > +	if (bdi_dirty > (dirty * bdi->max_ratio) / 100)
> > +		bdi_dirty = dirty * bdi->max_ratio / 100;
> > +
>   +       return bdi_dirty;
> >  }
> 
> And then add the call to task_dirty_limit() here:

Done. I omitted adding task_dirty_limit() to the bdi_dirty_limit()
inside bdi_debug_stats_show() -- looks unnecessary there.

> > +++ linux-next/mm/backing-dev.c	2010-07-11 08:53:44.000000000 +0800
> > @@ -83,7 +83,8 @@ static int bdi_debug_stats_show(struct s
> >  		nr_more_io++;
> >  	spin_unlock(&inode_lock);
> >  
> > -	get_dirty_limits(&background_thresh, &dirty_thresh, &bdi_thresh, bdi);
> > +	global_dirty_limits(&background_thresh, &dirty_thresh);
> > +	bdi_thresh = bdi_dirty_limit(bdi, dirty_thresh);
>   +       bdi_thresh = task_dirty_limit(current, bdi_thresh);
> 
> And add a comment to task_dirty_limit() as well, explaining its reason
> for existence (protecting light/slow dirtying tasks from heavier/fast
> ones).

Comments updated as below. Any suggestions/corrections?

Thanks,
Fengguang

Subject: writeback: add comment to the dirty limits functions
From: Wu Fengguang <fengguang.wu@intel.com>
Date: Thu Jul 15 09:54:25 CST 2010

Document global_dirty_limits(), bdi_dirty_limit() and task_dirty_limit().

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/page-writeback.c |   31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

--- linux-next.orig/mm/page-writeback.c	2010-08-03 23:14:19.000000000 +0800
+++ linux-next/mm/page-writeback.c	2010-08-05 00:37:17.000000000 +0800
@@ -261,11 +261,18 @@ static inline void task_dirties_fraction
 }
 
 /*
- * scale the dirty limit
+ * task_dirty_limit - scale down dirty throttling threshold for one task
  *
  * task specific dirty limit:
  *
  *   dirty -= (dirty/8) * p_{t}
+ *
+ * To protect light/slow dirtying tasks from heavier/fast ones, we start
+ * throttling individual tasks before reaching the bdi dirty limit.
+ * Relatively low thresholds will be allocated to heavy dirtiers. So when
+ * dirty pages grow large, heavy dirtiers will be throttled first, which will
+ * effectively curb the growth of dirty pages. Light dirtiers with high enough
+ * dirty threshold may never get throttled.
  */
 static unsigned long task_dirty_limit(struct task_struct *tsk,
 				       unsigned long bdi_dirty)
@@ -390,6 +397,15 @@ unsigned long determine_dirtyable_memory
 	return x + 1;	/* Ensure that we never return 0 */
 }
 
+/**
+ * global_dirty_limits - background-writeback and dirty-throttling thresholds
+ *
+ * Calculate the dirty thresholds based on sysctl parameters
+ * - vm.dirty_background_ratio  or  vm.dirty_background_bytes
+ * - vm.dirty_ratio             or  vm.dirty_bytes
+ * The dirty limits will be lifted by 1/4 for PF_LESS_THROTTLE (ie. nfsd) and
+ * runtime tasks.
+ */
 void global_dirty_limits(unsigned long *pbackground, unsigned long *pdirty)
 {
 	unsigned long background;
@@ -424,8 +440,17 @@ void global_dirty_limits(unsigned long *
 	*pdirty = dirty;
 }
 
-unsigned long bdi_dirty_limit(struct backing_dev_info *bdi,
-			       unsigned long dirty)
+/**
+ * bdi_dirty_limit - @bdi's share of dirty throttling threshold
+ *
+ * Allocate high/low dirty limits to fast/slow devices, in order to prevent
+ * - starving fast devices
+ * - piling up dirty pages (that will take long time to sync) on slow devices
+ *
+ * The bdi's share of dirty limit will be adapting to its throughput and
+ * bounded by the bdi->min_ratio and/or bdi->max_ratio parameters, if set.
+ */
+unsigned long bdi_dirty_limit(struct backing_dev_info *bdi, unsigned long dirty)
 {
 	u64 bdi_dirty;
 	long numerator, denominator;

WARNING: multiple messages have this Message-ID (diff)
From: Wu Fengguang <fengguang.wu@intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>, Jan Kara <jack@suse.cz>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/6] writeback: avoid unnecessary calculation of bdi dirty thresholds
Date: Thu, 5 Aug 2010 00:41:59 +0800	[thread overview]
Message-ID: <20100804164159.GA22189@localhost> (raw)
In-Reply-To: <1280847822.1923.597.camel@laptop>

On Tue, Aug 03, 2010 at 11:03:42PM +0800, Peter Zijlstra wrote:
> On Sun, 2010-07-11 at 10:06 +0800, Wu Fengguang wrote:
> > plain text document attachment (writeback-less-bdi-calc.patch)
> > Split get_dirty_limits() into global_dirty_limits()+bdi_dirty_limit(),
> > so that the latter can be avoided when under global dirty background
> > threshold (which is the normal state for most systems).
> 
> The patch looks OK, although esp with the proposed comments in the
> follow up email, bdi_dirty_limit() gets a bit confusing wrt to how and
> what the limit is.
> 
> Maybe its clearer to not call task_dirty_limit() from bdi_dirty_limit(),
> that way the comment can focus on the device write request completion
> proportion thing.

Done, thanks.

> > +unsigned long bdi_dirty_limit(struct backing_dev_info *bdi,
> > +			       unsigned long dirty)
> > +{
> > +	u64 bdi_dirty;
> > +	long numerator, denominator;
> >  
> > +	/*
> > +	 * Calculate this BDI's share of the dirty ratio.
> > +	 */
> > +	bdi_writeout_fraction(bdi, &numerator, &denominator);
> >  
> > +	bdi_dirty = (dirty * (100 - bdi_min_ratio)) / 100;
> > +	bdi_dirty *= numerator;
> > +	do_div(bdi_dirty, denominator);
> >  
> > +	bdi_dirty += (dirty * bdi->min_ratio) / 100;
> > +	if (bdi_dirty > (dirty * bdi->max_ratio) / 100)
> > +		bdi_dirty = dirty * bdi->max_ratio / 100;
> > +
>   +       return bdi_dirty;
> >  }
> 
> And then add the call to task_dirty_limit() here:

Done. I omitted adding task_dirty_limit() to the bdi_dirty_limit()
inside bdi_debug_stats_show() -- looks unnecessary there.

> > +++ linux-next/mm/backing-dev.c	2010-07-11 08:53:44.000000000 +0800
> > @@ -83,7 +83,8 @@ static int bdi_debug_stats_show(struct s
> >  		nr_more_io++;
> >  	spin_unlock(&inode_lock);
> >  
> > -	get_dirty_limits(&background_thresh, &dirty_thresh, &bdi_thresh, bdi);
> > +	global_dirty_limits(&background_thresh, &dirty_thresh);
> > +	bdi_thresh = bdi_dirty_limit(bdi, dirty_thresh);
>   +       bdi_thresh = task_dirty_limit(current, bdi_thresh);
> 
> And add a comment to task_dirty_limit() as well, explaining its reason
> for existence (protecting light/slow dirtying tasks from heavier/fast
> ones).

Comments updated as below. Any suggestions/corrections?

Thanks,
Fengguang

Subject: writeback: add comment to the dirty limits functions
From: Wu Fengguang <fengguang.wu@intel.com>
Date: Thu Jul 15 09:54:25 CST 2010

Document global_dirty_limits(), bdi_dirty_limit() and task_dirty_limit().

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/page-writeback.c |   31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

--- linux-next.orig/mm/page-writeback.c	2010-08-03 23:14:19.000000000 +0800
+++ linux-next/mm/page-writeback.c	2010-08-05 00:37:17.000000000 +0800
@@ -261,11 +261,18 @@ static inline void task_dirties_fraction
 }
 
 /*
- * scale the dirty limit
+ * task_dirty_limit - scale down dirty throttling threshold for one task
  *
  * task specific dirty limit:
  *
  *   dirty -= (dirty/8) * p_{t}
+ *
+ * To protect light/slow dirtying tasks from heavier/fast ones, we start
+ * throttling individual tasks before reaching the bdi dirty limit.
+ * Relatively low thresholds will be allocated to heavy dirtiers. So when
+ * dirty pages grow large, heavy dirtiers will be throttled first, which will
+ * effectively curb the growth of dirty pages. Light dirtiers with high enough
+ * dirty threshold may never get throttled.
  */
 static unsigned long task_dirty_limit(struct task_struct *tsk,
 				       unsigned long bdi_dirty)
@@ -390,6 +397,15 @@ unsigned long determine_dirtyable_memory
 	return x + 1;	/* Ensure that we never return 0 */
 }
 
+/**
+ * global_dirty_limits - background-writeback and dirty-throttling thresholds
+ *
+ * Calculate the dirty thresholds based on sysctl parameters
+ * - vm.dirty_background_ratio  or  vm.dirty_background_bytes
+ * - vm.dirty_ratio             or  vm.dirty_bytes
+ * The dirty limits will be lifted by 1/4 for PF_LESS_THROTTLE (ie. nfsd) and
+ * runtime tasks.
+ */
 void global_dirty_limits(unsigned long *pbackground, unsigned long *pdirty)
 {
 	unsigned long background;
@@ -424,8 +440,17 @@ void global_dirty_limits(unsigned long *
 	*pdirty = dirty;
 }
 
-unsigned long bdi_dirty_limit(struct backing_dev_info *bdi,
-			       unsigned long dirty)
+/**
+ * bdi_dirty_limit - @bdi's share of dirty throttling threshold
+ *
+ * Allocate high/low dirty limits to fast/slow devices, in order to prevent
+ * - starving fast devices
+ * - piling up dirty pages (that will take long time to sync) on slow devices
+ *
+ * The bdi's share of dirty limit will be adapting to its throughput and
+ * bounded by the bdi->min_ratio and/or bdi->max_ratio parameters, if set.
+ */
+unsigned long bdi_dirty_limit(struct backing_dev_info *bdi, unsigned long dirty)
 {
 	u64 bdi_dirty;
 	long numerator, denominator;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2010-08-04 16:42 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-11  2:06 [PATCH 0/6] writeback cleanups and trivial fixes Wu Fengguang
2010-07-11  2:06 ` Wu Fengguang
2010-07-11  2:06 ` Wu Fengguang
2010-07-11  2:06 ` [PATCH 1/6] writeback: take account of NR_WRITEBACK_TEMP in balance_dirty_pages() Wu Fengguang
2010-07-11  2:06   ` Wu Fengguang
2010-07-12 21:52   ` Andrew Morton
2010-07-12 21:52     ` Andrew Morton
2010-07-12 21:52     ` Andrew Morton
2010-07-13  8:58     ` Miklos Szeredi
2010-07-13  8:58       ` Miklos Szeredi
2010-07-15 14:50       ` Wu Fengguang
2010-07-15 14:50         ` Wu Fengguang
2010-07-11  2:06 ` [PATCH 2/6] writeback: reduce calls to global_page_state " Wu Fengguang
2010-07-11  2:06   ` Wu Fengguang
2010-07-11  2:06   ` Wu Fengguang
2010-07-26 15:19   ` Jan Kara
2010-07-26 15:19     ` Jan Kara
2010-07-27  3:59     ` Wu Fengguang
2010-07-27  3:59       ` Wu Fengguang
2010-07-27  9:12       ` Jan Kara
2010-07-27  9:12         ` Jan Kara
2010-07-28  2:04         ` Wu Fengguang
2010-07-28  2:04           ` Wu Fengguang
2010-08-03 14:55   ` Peter Zijlstra
2010-08-03 14:55     ` Peter Zijlstra
2010-07-11  2:06 ` [PATCH 3/6] writeback: avoid unnecessary calculation of bdi dirty thresholds Wu Fengguang
2010-07-11  2:06   ` Wu Fengguang
2010-07-11  2:06   ` Wu Fengguang
2010-07-12 21:56   ` Andrew Morton
2010-07-12 21:56     ` Andrew Morton
2010-07-12 21:56     ` Andrew Morton
2010-07-15 14:55     ` Wu Fengguang
2010-07-15 14:55       ` Wu Fengguang
2010-07-19 21:35   ` Andrew Morton
2010-07-19 21:35     ` Andrew Morton
2010-07-19 21:35     ` Andrew Morton
2010-07-20  3:34     ` Wu Fengguang
2010-07-20  3:34       ` Wu Fengguang
2010-07-20  3:34       ` Wu Fengguang
2010-07-20  4:14       ` Andrew Morton
2010-07-20  4:14         ` Andrew Morton
2010-08-03 15:03   ` Peter Zijlstra
2010-08-03 15:03     ` Peter Zijlstra
2010-08-03 15:10     ` Wu Fengguang
2010-08-03 15:10       ` Wu Fengguang
2010-08-04 16:41     ` Wu Fengguang [this message]
2010-08-04 16:41       ` Wu Fengguang
2010-08-04 17:10       ` Peter Zijlstra
2010-08-04 17:10         ` Peter Zijlstra
2010-07-11  2:07 ` [PATCH 4/6] writeback: dont redirty tail an inode with dirty pages Wu Fengguang
2010-07-11  2:07   ` Wu Fengguang
2010-07-12  2:01   ` Dave Chinner
2010-07-12  2:01     ` Dave Chinner
2010-07-12 15:31     ` Wu Fengguang
2010-07-12 15:31       ` Wu Fengguang
2010-07-12 22:13       ` Andrew Morton
2010-07-12 22:13         ` Andrew Morton
2010-07-15 15:35         ` Wu Fengguang
2010-07-15 15:35           ` Wu Fengguang
2010-07-11  2:07 ` [PATCH 5/6] writeback: fix queue_io() ordering Wu Fengguang
2010-07-11  2:07   ` Wu Fengguang
2010-07-11  2:07   ` Wu Fengguang
2010-07-12 22:15   ` Andrew Morton
2010-07-12 22:15     ` Andrew Morton
2010-07-12 22:15     ` Andrew Morton
2010-07-11  2:07 ` [PATCH 6/6] writeback: merge for_kupdate and !for_kupdate cases Wu Fengguang
2010-07-11  2:07   ` Wu Fengguang
2010-07-12  2:08   ` Dave Chinner
2010-07-12  2:08     ` Dave Chinner
2010-07-12 15:52     ` Wu Fengguang
2010-07-12 15:52       ` Wu Fengguang
2010-07-12 22:06       ` Dave Chinner
2010-07-12 22:06         ` Dave Chinner
2010-07-12 22:22       ` Andrew Morton
2010-07-12 22:22         ` Andrew Morton
2010-08-05 16:01         ` Wu Fengguang
2010-08-05 16:01           ` Wu Fengguang
2010-07-11  2:44 ` [PATCH 0/6] writeback cleanups and trivial fixes Christoph Hellwig
2010-07-11  2:44   ` Christoph Hellwig
2010-07-11  2:50   ` Wu Fengguang
2010-07-11  2:50     ` Wu Fengguang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100804164159.GA22189@localhost \
    --to=fengguang.wu@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.