linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Wu Fengguang <fengguang.wu@intel.com>
Cc: Jan Kara <jack@suse.cz>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Hellwig <hch@lst.de>,
	Dave Chinner <david@fromorbit.com>,
	Greg Thelen <gthelen@google.com>,
	Minchan Kim <minchan.kim@gmail.com>,
	Vivek Goyal <vgoyal@redhat.com>,
	Andrea Righi <arighi@develer.com>, linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/5] writeback: dirty position control
Date: Wed, 17 Aug 2011 22:24:14 +0200	[thread overview]
Message-ID: <20110817202414.GK9959@quack.suse.cz> (raw)
In-Reply-To: <20110817132347.GA6628@localhost>

  Hi Fengguang,

On Wed 17-08-11 21:23:47, Wu Fengguang wrote:
> On Wed, Aug 17, 2011 at 03:41:12AM +0800, Jan Kara wrote:
> > > +static unsigned long bdi_position_ratio(struct backing_dev_info *bdi,
> > > +					unsigned long thresh,
> > > +					unsigned long bg_thresh,
> > > +					unsigned long dirty,
> > > +					unsigned long bdi_thresh,
> > > +					unsigned long bdi_dirty)
> > > +{
> > > +	unsigned long freerun = dirty_freerun_ceiling(thresh, bg_thresh);
> > > +	unsigned long limit = hard_dirty_limit(thresh);
> > > +	unsigned long x_intercept;
> > > +	unsigned long setpoint;		/* the target balance point */
> > > +	unsigned long span;
> > > +	long long pos_ratio;		/* for scaling up/down the rate limit */
> > > +	long x;
> > > +
> > > +	if (unlikely(dirty >= limit))
> > > +		return 0;
> > > +
> > > +	/*
> > > +	 * global setpoint
> > > +	 *
> > > +	 *                         setpoint - dirty 3
> > > +	 *        f(dirty) := 1 + (----------------)
> > > +	 *                         limit - setpoint
> > > +	 *
> > > +	 * it's a 3rd order polynomial that subjects to
> > > +	 *
> > > +	 * (1) f(freerun)  = 2.0 => rampup base_rate reasonably fast
> > > +	 * (2) f(setpoint) = 1.0 => the balance point
> > > +	 * (3) f(limit)    = 0   => the hard limit
> > > +	 * (4) df/dx       < 0	 => negative feedback control
                          ^^^ Strictly speaking this is <= 0

> > > +	 * (5) the closer to setpoint, the smaller |df/dx| (and the reverse)
> > > +	 *     => fast response on large errors; small oscillation near setpoint
> > > +	 */
> > > +	setpoint = (freerun + limit) / 2;
> > > +	x = div_s64((setpoint - dirty) << RATELIMIT_CALC_SHIFT,
> > > +		    limit - setpoint + 1);
> > > +	pos_ratio = x;
> > > +	pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
> > > +	pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
> > > +	pos_ratio += 1 << RATELIMIT_CALC_SHIFT;
> > > +
> > > +	/*
> > > +	 * bdi setpoint
  OK, so if I understand the code right, we now have basic pos_ratio based
on global situation. Now, in the following code, we might scale pos_ratio
further down, if bdi_dirty is too much over bdi's share, right? Do we also
want to scale pos_ratio up, if we are under bdi's share? If yes, do we
really want to do it even if global pos_ratio < 1 (i.e. we are over global
setpoint)?

Maybe we could update the comment with something like:
 * We have computed basic pos_ratio above based on global situation. If the
 * bdi is over its share of dirty pages, we want to scale pos_ratio further
 * down. That is done by the following mechanism:
and now describe how updating works.

> > > +	 *
> > > +	 *        f(dirty) := 1.0 + k * (dirty - setpoint)
                  ^^^^^^^ bdi_dirty?             ^^^ maybe I'd name it
bdi_setpoint to distinguish clearly from the global value.

> > > +	 *
> > > +	 * The main bdi control line is a linear function that subjects to
> > > +	 *
> > > +	 * (1) f(setpoint) = 1.0
> > > +	 * (2) k = - 1 / (8 * write_bw)  (in single bdi case)
> > > +	 *     or equally: x_intercept = setpoint + 8 * write_bw
> > > +	 *
> > > +	 * For single bdi case, the dirty pages are observed to fluctuate
> > > +	 * regularly within range
> > > +	 *        [setpoint - write_bw/2, setpoint + write_bw/2]
> > > +	 * for various filesystems, where (2) can yield in a reasonable 12.5%
> > > +	 * fluctuation range for pos_ratio.
> > > +	 *
> > > +	 * For JBOD case, bdi_thresh (not bdi_dirty!) could fluctuate up to its
> > > +	 * own size, so move the slope over accordingly.
> > > +	 */
> > > +	if (unlikely(bdi_thresh > thresh))
> > > +		bdi_thresh = thresh;
> > > +	/*
> > > +	 * scale global setpoint to bdi's:  setpoint *= bdi_thresh / thresh
> > > +	 */
> > > +	x = div_u64((u64)bdi_thresh << 16, thresh | 1);
> > > +	setpoint = setpoint * (u64)x >> 16;
> > > +	/*
> > > +	 * Use span=(4*write_bw) in single bdi case as indicated by
> > > +	 * (thresh - bdi_thresh ~= 0) and transit to bdi_thresh in JBOD case.
> > > +	 */
> > > +	span = div_u64((u64)bdi_thresh * (thresh - bdi_thresh) +
> > > +		       (u64)(4 * bdi->avg_write_bandwidth) * bdi_thresh,
> > > +		       thresh + 1);
> >   I think you can slightly simplify this to:
> > (thresh - bdi_thresh + 4 * bdi->avg_write_bandwidth) * (u64)x >> 16;
> 
> Good idea!
> 
> > > +	x_intercept = setpoint + 2 * span;
   ^^ BTW, why do you have 2*span here? It can result in x_intercept being
~3*bdi_thresh... So maybe you should use bdi_thresh/2 in the computation of
span?

> >   What if x_intercept >  bdi_thresh? Since 8*bdi->avg_write_bandwidth is
> > easily 500 MB, that happens quite often I imagine?
> 
> That's fine because I no longer target "bdi_thresh" as some limiting
> factor as the global "thresh". Due to it being unstable in small
> memory JBOD systems, which is the big and unique problem in JBOD.
  I see. Given the control mechanism below, I think we can try this idea
and see whether it makes problems in practice or not. But the fact that
bdi_thresh is no longer treated as limit should be noted in a changelog -
probably of the last patch (although that is already too long for my taste
so I'll look into how we could make it shorter so that average developer
has enough patience to read it ;).

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2011-08-17 20:24 UTC|newest]

Thread overview: 92+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-08-16  2:20 [PATCH 0/5] IO-less dirty throttling v9 Wu Fengguang
2011-08-16  2:20 ` [PATCH 1/5] writeback: account per-bdi accumulated dirtied pages Wu Fengguang
2011-08-16  2:20 ` [PATCH 2/5] writeback: dirty position control Wu Fengguang
2011-08-16 19:41   ` Jan Kara
2011-08-17 13:23     ` Wu Fengguang
2011-08-17 13:49       ` Wu Fengguang
2011-08-17 20:24       ` Jan Kara [this message]
2011-08-18  4:18         ` Wu Fengguang
2011-08-18  4:41           ` Wu Fengguang
2011-08-18 19:16           ` Jan Kara
2011-08-24  3:16         ` Wu Fengguang
2011-08-19  2:53   ` Vivek Goyal
2011-08-19  3:25     ` Wu Fengguang
2011-08-16  2:20 ` [PATCH 3/5] writeback: dirty rate control Wu Fengguang
2011-08-16  2:20 ` [PATCH 4/5] writeback: per task dirty rate limit Wu Fengguang
2011-08-16  7:17   ` Andrea Righi
2011-08-16  7:22     ` Wu Fengguang
2011-08-16  2:20 ` [PATCH 5/5] writeback: IO-less balance_dirty_pages() Wu Fengguang
2011-08-19  2:06   ` Vivek Goyal
2011-08-19  2:54     ` Wu Fengguang
2011-08-19 19:00       ` Vivek Goyal
2011-08-21  3:46         ` Wu Fengguang
2011-08-22 17:22           ` Vivek Goyal
2011-08-23  1:07             ` Wu Fengguang
2011-08-23  3:53               ` Wu Fengguang
2011-08-23 13:53               ` Vivek Goyal
2011-08-24  3:09                 ` Wu Fengguang
  -- strict thread matches above, loose matches on Subject: below --
2011-08-06  8:44 [PATCH 0/5] IO-less dirty throttling v8 Wu Fengguang
2011-08-06  8:44 ` [PATCH 2/5] writeback: dirty position control Wu Fengguang
2011-08-08 13:46   ` Peter Zijlstra
2011-08-08 14:11     ` Wu Fengguang
2011-08-08 14:31       ` Peter Zijlstra
2011-08-08 22:47         ` Wu Fengguang
2011-08-09  9:31           ` Peter Zijlstra
2011-08-10 12:28             ` Wu Fengguang
2011-08-08 14:41       ` Peter Zijlstra
2011-08-08 23:05         ` Wu Fengguang
2011-08-09 10:32           ` Peter Zijlstra
2011-08-09 17:20           ` Peter Zijlstra
2011-08-10 22:34             ` Jan Kara
2011-08-11  2:29               ` Wu Fengguang
2011-08-11 11:14                 ` Jan Kara
2011-08-16  8:35                   ` Wu Fengguang
2011-08-12 13:19             ` Wu Fengguang
2011-08-10 21:40           ` Vivek Goyal
2011-08-16  8:55             ` Wu Fengguang
2011-08-11 22:56           ` Peter Zijlstra
2011-08-12  2:43             ` Wu Fengguang
2011-08-12  3:18               ` Wu Fengguang
2011-08-12  5:45               ` Wu Fengguang
2011-08-12  9:45                 ` Peter Zijlstra
2011-08-12 11:07                   ` Wu Fengguang
2011-08-12 12:17                     ` Peter Zijlstra
2011-08-12  9:47               ` Peter Zijlstra
2011-08-12 11:11                 ` Wu Fengguang
2011-08-12 12:54           ` Peter Zijlstra
2011-08-12 12:59             ` Wu Fengguang
2011-08-12 13:08               ` Peter Zijlstra
2011-08-12 13:04           ` Peter Zijlstra
2011-08-12 14:20             ` Wu Fengguang
2011-08-22 15:38               ` Peter Zijlstra
2011-08-23  3:40                 ` Wu Fengguang
2011-08-23 10:01                   ` Peter Zijlstra
2011-08-23 14:15                     ` Wu Fengguang
2011-08-23 17:47                       ` Vivek Goyal
2011-08-24  0:12                         ` Wu Fengguang
2011-08-24 16:12                           ` Peter Zijlstra
2011-08-26  0:18                             ` Wu Fengguang
2011-08-26  9:04                               ` Peter Zijlstra
2011-08-26 10:04                                 ` Wu Fengguang
2011-08-26 10:42                                   ` Peter Zijlstra
2011-08-26 10:52                                     ` Wu Fengguang
2011-08-26 11:26                                   ` Wu Fengguang
2011-08-26 12:11                                     ` Peter Zijlstra
2011-08-26 12:20                                       ` Wu Fengguang
2011-08-26 13:13                                         ` Wu Fengguang
2011-08-26 13:18                                           ` Peter Zijlstra
2011-08-26 13:24                                             ` Wu Fengguang
2011-08-24 18:00                           ` Vivek Goyal
2011-08-25  3:19                             ` Wu Fengguang
2011-08-25 22:20                               ` Vivek Goyal
2011-08-26  1:56                                 ` Wu Fengguang
2011-08-26  8:56                                   ` Peter Zijlstra
2011-08-26  9:53                                     ` Wu Fengguang
2011-08-29 13:12                             ` Peter Zijlstra
2011-08-29 13:37                               ` Wu Fengguang
2011-09-02 12:16                                 ` Peter Zijlstra
2011-09-06 12:40                                 ` Peter Zijlstra
2011-08-24 15:57                       ` Peter Zijlstra
2011-08-25  5:30                         ` Wu Fengguang
2011-08-23 14:36                     ` Vivek Goyal
2011-08-09  2:08   ` Vivek Goyal
2011-08-16  8:59     ` Wu Fengguang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110817202414.GK9959@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=arighi@develer.com \
    --cc=david@fromorbit.com \
    --cc=fengguang.wu@intel.com \
    --cc=gthelen@google.com \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan.kim@gmail.com \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).