From: Wu Fengguang <fengguang.wu@intel.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Jan Kara <jack@suse.cz>, Christoph Hellwig <hch@lst.de>,
	Trond Myklebust <Trond.Myklebust@netapp.com>,
	Dave Chinner <david@fromorbit.com>, Theodore Ts'o <tytso@mit.edu>,
	Chris Mason <chris.mason@oracle.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Mel Gorman <mel@csn.ul.ie>, Rik van Riel <riel@redhat.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Greg Thelen <gthelen@google.com>,
	Minchan Kim <minchan.kim@gmail.com>,
	Andrea Righi <arighi@develer.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	linux-mm <linux-mm@kvack.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 00/12] IO-less dirty throttling v7
Date: Thu, 28 Apr 2011 22:27:49 +0800
Message-ID: <20110428142749.GA11068@localhost>
In-Reply-To: <20110426171954.GD9414@redhat.com>

Hi Vivek,

On Wed, Apr 27, 2011 at 01:19:54AM +0800, Vivek Goyal wrote:
> On Sat, Apr 16, 2011 at 09:25:46PM +0800, Wu Fengguang wrote:
> > Andrew,
> > 
> > This revision undergoes a number of simplifications, cleanups and fixes.
> > Independent patches are separated out. The core patches (07, 08) now have
> > easier-to-understand changelogs. Detailed rationales can be found in patch 08.
> > 
> > In response to the complexity complaints, an introduction document is
> > written explaining the rationales, algorithm and visual case studies:
> > 
> > http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/slides/smooth-dirty-throttling.pdf
> > 
> 
> Hi Fenguang,
> 
> I quickly browsed through the above document and am trying to understand
> the meaning of the following lines and to see how they fit into the
> framework of the existing IO controller.

Thanks for taking a look! Regarding this diff:

http://git.kernel.org/?p=linux/kernel/git/wfg/writeback.git;a=blobdiff;f=mm/page-writeback.c;h=0b579e7fd338fd1f59cc36bf15fda06ff6260634;hp=34dff9f0d28d0f4f0794eb41187f71b4ade6b8a2;hb=1a58ad99ce1f6a9df6618a4b92fa4859cc3e7e90;hpb=5b6fcb3125ea52ff04a2fad27a51307842deb1a0

> - task IO controller endogenous

Normally the bandwidth at which the current task is throttled (referred
to as task_bw below) is calculated at runtime. However, if there is an
interface for setting it explicitly (the patch reuses
current->signal->rlim[RLIMIT_RSS].rlim_cur), the code can simply
throttle the current task at that bandwidth. No extra code is needed.
In this sense, it has the endogenous capability to act as a per-task
async write IO controller.
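
To illustrate the selection described above, here is a minimal userspace
sketch, not code from the patch. The function name pick_task_bw() and the
TASK_BW_UNLIMITED sentinel are hypothetical; the sentinel stands in for an
"unset" limit, assuming the reused rlimit is left at its unlimited default
when no explicit bandwidth is configured.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical sentinel: no explicit per-task bandwidth was configured. */
#define TASK_BW_UNLIMITED ULONG_MAX

/*
 * Pick the bandwidth to throttle the current task at.
 *
 * computed_bw: the value calculated at runtime by the throttling code
 * user_limit:  an explicitly configured bandwidth, or TASK_BW_UNLIMITED
 *
 * Both are in the same (assumed) unit, e.g. pages per second.
 */
static unsigned long pick_task_bw(unsigned long computed_bw,
				  unsigned long user_limit)
{
	/* The runtime estimate is expected to be a finite value. */
	assert(computed_bw != TASK_BW_UNLIMITED);

	/* An explicit limit simply replaces the calculated bandwidth. */
	if (user_limit != TASK_BW_UNLIMITED)
		return user_limit;

	return computed_bw;
}
```

The rest of the throttling path stays the same either way, which is the
point: only the source of task_bw changes.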

> - proportional IO controller endogenous

Sorry, "priority" would be more accurate than "proportional".
Once task_bw is calculated in the normal way, you may further do

        task_bw *= 2;

to grant it double the bandwidth of the other tasks. Or do

        task_bw *= current->async_write_priority;

to give it whatever async write priority is configured. When you do
this, the base bandwidth adapts automatically to the new balance
point.  In this sense, exact priority control is also endogenous.
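
A toy steady-state model may help to see where the new balance point
lands. This is a sketch under a stated assumption, not the feedback loop
in the patch: it assumes that at the balance point the tasks together
dirty pages exactly as fast as the device writes them out, so that
base_bw * sum(priority) == write_bw. The names balanced_base_bw() and
task_bw_for() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumed balance condition: base_bw * sum(priority) == write_bw.
 * Returns the base bandwidth that satisfies it (integer division).
 */
static unsigned long balanced_base_bw(unsigned long write_bw,
				      const unsigned int *priority,
				      size_t ntasks)
{
	unsigned long total = 0;
	size_t i;

	for (i = 0; i < ntasks; i++)
		total += priority[i];

	/* At least one task must have a non-zero priority. */
	assert(total > 0);

	return write_bw / total;
}

/* The "task_bw *= priority" step, applied to the base bandwidth. */
static unsigned long task_bw_for(unsigned long base_bw, unsigned int priority)
{
	return base_bw * priority;
}
```

With a write bandwidth of 100 and priorities {1, 1, 2}, the base
bandwidth in this model settles at 25, so the tasks get 25, 25 and 50:
the third task has double the bandwidth of the others and the total
still matches what the device can write out.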

> - cgroup IO controller well integrated

The async write cgroup IO controller is implemented in the same way as
the "global IO controller", in that it's also based on the "base
bandwidth" concept and is calculated with the same algorithm.

> You had sent me a link where you had prepared a patch to control the
> async IO completely. So because this code is all about measuring the
> bdi writeback rate and then coming up with a task ratelimit accordingly,
> it will never know about other IO going on in the cgroup: READS and
> direct IO.

Right.

> So IIUC, to make use of the above logic for cgroup throttling, one would
> have to come up with an explicit notion of async bandwidth per cgroup,
> which does not control other writes. Currently we have the following
> when it comes to throttling.
> 
> blkio.throttle_read_bps
> blkio.throttle_write_bps
> 
> The intention is to be able to control the WRITE bandwidth of cgroup and
> it could be any kind of WRITE (be it buffered WRITE or direct WRITES). 
> Currently we control only direct WRITES and question of how to also
> control buffered writes is still on the table.
> 
> Because your patch does not know about other WRITES happening in the
> system, one needs to create a way so that buffered WRITES and direct
> WRITES can be accounted together against a group and throttled
> accordingly.

Basically it is now possible to also send DIRECT writes to the new
balance_dirty_pages(), because it's RATE based rather than THRESHOLD
based. DIRECT writes have nothing to do with the dirty THRESHOLD, so
the legacy balance_dirty_pages() was not able to handle them at all.

Then there is the danger that DIRECT writes get throttled twice:
explicitly in balance_dirty_pages() and implicitly in
get_request_wait().  But as long as the latter does not sleep for too
long (< 500ms for now), that sleep will be compensated for in
balance_dirty_pages() (a.k.a. think time compensation).
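
The compensation idea can be sketched roughly as follows. This is an
illustration only, not code from the patch: remaining_pause_ms() and
MAX_COMPENSATED_MS are hypothetical names, and two details are
assumptions of this sketch, namely that time blocked beyond the 500ms
bound is simply not credited, and that the resulting pause is clamped at
zero.

```c
#include <assert.h>

/* The "< 500ms for now" bound on how much sleep gets compensated. */
#define MAX_COMPENSATED_MS 500L

/*
 * How long the task should still sleep in the explicit throttle.
 *
 * pages_dirtied: pages written since the last pause
 * task_bw:       target bandwidth in pages per second
 * blocked_ms:    time already spent sleeping elsewhere, e.g. waiting
 *                for a free request
 */
static long remaining_pause_ms(unsigned long pages_dirtied,
			       unsigned long task_bw,
			       long blocked_ms)
{
	long period_ms;
	long pause_ms;

	assert(task_bw > 0);
	assert(blocked_ms >= 0);

	/* Time these pages should take at the target rate. */
	period_ms = (long)(pages_dirtied * 1000 / task_bw);

	/* Assumption: only sleep up to the bound is credited. */
	if (blocked_ms > MAX_COMPENSATED_MS)
		blocked_ms = MAX_COMPENSATED_MS;

	/* Time already slept counts towards the period. */
	pause_ms = period_ms - blocked_ms;

	return pause_ms > 0 ? pause_ms : 0;
}
```

For example, 32 pages at 160 pages/s give a 200ms period. A task that
already blocked for 50ms only needs to pause another 150ms, so the two
sleeps add up to the intended 200ms rather than 250ms.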

Or even safer, we may let DIRECT writes enter balance_dirty_pages()
only if they are to be cgroup throttled. The cgroup IO controller can
be enhanced to do "leak" control that can effectively account for all
get_request_wait() latencies.

> What does "proportional IO controller endogenous" mean? Currently we do
> all proportional IO division in CFQ. So are you proposing that for
> buffered WRITES we come up with a different policy altogether in the
> writeback layer, or is it somehow integrating with the CFQ mechanism?

See above. It's not related to CFQ and is entirely within the scope of
(async) writes.

Thanks,
Fengguang


