Re: [ATTEND] [LSF/MM TOPIC] Buffered writes throttling


From: Vivek Goyal <vgoyal@redhat.com>
To: Andrea Righi <andrea@betterlinux.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>,
	Suresh Jayaraman <sjayaraman@suse.com>,
	lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, Jan Kara <jack@suse.cz>,
	Greg Thelen <gthelen@google.com>
Subject: Re: [ATTEND] [LSF/MM TOPIC] Buffered writes throttling
Date: Wed, 7 Mar 2012 15:52:09 -0500
Message-ID: <20120307205209.GK13430@redhat.com>
In-Reply-To: <20120305225801.GB7545@thinkpad>

On Mon, Mar 05, 2012 at 11:58:01PM +0100, Andrea Righi wrote:

[..]
> What about this scenario? (Sorry, I've not followed some of the recent
> discussions on this topic, so I'm sure I'm oversimplifying a bit or
> ignoring some details):
> 
>  - track inodes per-memcg for writeback IO (provided Greg's patch)
>  - provide per-memcg dirty limit (global, not per-device); when this
>    limit is exceeded flusher threads are awakened and all tasks that
>    continue to generate new dirty pages inside the memcg are put to
>    sleep
>  - flusher threads start to write some dirty inodes of this memcg (using
>    the inode tracking feature), let's say they start with a chunk of N
>    pages of the first dirty inode
>  - flusher threads can't flush in this way more than N pages / sec
>    (where N * PAGE_SIZE / sec is the blkcg "buffered write rate limit"
>    on the inode's block device); if a flusher thread exceeds this limit
>    it won't be blocked directly, it just stops flushing pages for this
>    memcg after the first chunk and it can continue to flush dirty pages
>    of a different memcg.
> 

So, IIUC, the only thing that is a little different here is that throttling
is implemented by the flusher thread. But it is still per device, per
cgroup. I think it is just an implementation detail whether we implement
it in the block layer, in writeback, or somewhere else. We can very well
implement it in the block layer and provide a per-bdi/per-group congestion
flag in the bdi so that the flusher will stop pushing more IO if a group on
a bdi is congested (because its IO is throttled).
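
To make the congestion-flag idea a bit more concrete, here is a rough
userspace sketch (not kernel code). All names in it (struct bdi_group,
throttle_account(), throttle_new_period(), flusher_pass()) are made up for
illustration, and the per-period page budget is only an assumed stand-in
for whatever limit the block layer would actually enforce. The flusher
writes one chunk per group and skips any group that the throttle has
marked congested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-(bdi, group) state; not a real kernel structure. */
struct bdi_group {
	unsigned long limit_pages;	/* pages allowed per period (assumed) */
	unsigned long issued_pages;	/* pages submitted in this period */
	unsigned long dirty_pages;	/* dirty pages waiting for writeback */
	bool congested;			/* set once the budget is used up */
};

/* Block-layer side: account submitted pages, flag the group at its limit. */
void throttle_account(struct bdi_group *g, unsigned long pages)
{
	g->issued_pages += pages;
	if (g->issued_pages >= g->limit_pages)
		g->congested = true;
}

/* Block-layer side: start a new period and clear the congestion flag. */
void throttle_new_period(struct bdi_group *g)
{
	g->issued_pages = 0;
	g->congested = false;
}

/*
 * Flusher side: one pass over the groups of a bdi. Each uncongested group
 * gets at most one chunk written; congested groups are skipped, so the
 * flusher itself never blocks on a throttled group.
 * Returns the number of pages written in this pass.
 */
unsigned long flusher_pass(struct bdi_group *groups, size_t n,
			   unsigned long chunk)
{
	unsigned long written = 0;

	assert(chunk > 0);
	for (size_t i = 0; i < n; i++) {
		struct bdi_group *g = &groups[i];
		unsigned long nr;

		if (g->congested || g->dirty_pages == 0)
			continue;
		nr = g->dirty_pages < chunk ? g->dirty_pages : chunk;
		g->dirty_pages -= nr;
		throttle_account(g, nr);
		written += nr;
	}
	return written;
}
```

Whether the flag lives in the bdi or elsewhere is exactly the kind of
implementation detail that can be decided later; the sketch only shows
that the flusher can move on to another group instead of sleeping.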

I think the first important thing is to figure out the minimal set of
requirements (as Jan said in another mail) that will solve a wide
variety of cases. I am trying to list some of the points.


- Throttling for buffered writes
	- Do we want per-device throttling limits or global throttling
	  limits?

	- Existing direct write limits are per device and implemented in
	  the block layer.

	- I personally think that both kinds of limits might make sense.
	  But a global limit for async writes might make more sense, at
	  least for workloads like backup which can run at a throttled
	  speed.

	- Absolute IO throttling will make the most sense on the top-level
	  device in the IO stack.

	- For per-device rate throttling, do we want a common limit for
	  direct writes and buffered writes, or a separate limit just for
	  buffered writes?
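
The difference between a common limit and a separate limit can be shown
with a small userspace sketch (not kernel code). struct wr_limit,
enum wr_class and wr_charge() are hypothetical names, and the byte budget
per time slice is an assumption made only for the example.

```c
#include <assert.h>
#include <stdbool.h>

enum wr_class { WR_DIRECT, WR_BUFFERED, WR_NR_CLASSES };

/* Hypothetical per-device, per-group write budget for one time slice. */
struct wr_limit {
	bool shared;	/* true: one common budget for both classes */
	/* Bytes left in this slice; slot WR_DIRECT is the common one if shared. */
	unsigned long budget[WR_NR_CLASSES];
};

/*
 * Charge a write against the budget. Returns false if the write does not
 * fit and has to wait for the next slice.
 */
bool wr_charge(struct wr_limit *l, enum wr_class c, unsigned long bytes)
{
	unsigned long *b;

	assert(c < WR_NR_CLASSES);
	b = &l->budget[l->shared ? WR_DIRECT : c];
	if (*b < bytes)
		return false;
	*b -= bytes;
	return true;
}
```

With a common budget, buffered writeback can use up the slice and delay
direct writes of the same group; with separate budgets each class is only
limited by its own slot.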

- Proportional IO for async writes
	- Will probably make the most sense on the bottom-most devices in
	  the IO stack (if we are able to somehow retain the submitter's
	  context).
	
	- Logically it makes sense to keep sync and async writes in the
	  same group and try to provide a fair share of the disk between
	  groups. Technically CFQ can do that, but in practice I think it
	  will be problematic: writes of one group will take precedence
	  over reads of another group. Currently any read is prioritized
	  over buffered writes. So by splitting buffered writes into their
	  own cgroups, they can severely impact the latency of reads in
	  another group. Not sure how many people really want to do
	  that in practice.

	- Do we really need proportional IO for async writes? CFQ
	  tried implementing ioprio for async writes but it does not
	  work. Should we just care about groups of sync IO, let
	  all the async IO on a device go into a single queue, and
	  make sure it is not starved while sync IO is going on?


	- I thought that most people cared about not impacting
	  sync latencies badly while buffered writes are happening. Not
	  many complained that buffered writes of one application should
	  happen faster than those of another application.

	- If we agree that not many people require service differentiation
	  between buffered writes, then we probably don't have to do
	  anything in this space and we can keep things simple. I
	  personally prefer this option. Trying to provide proportional
	  IO for async writes will make things complicated and we might
	  not achieve much.

	- CFQ already does a very good job of prioritizing sync over async
	  (at the cost of reduced throughput on fast devices). So what is
	  the use case for proportional IO for async writes?
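
For the simpler option above (sync IO in groups, all async IO on a device
in a single queue that must not be starved), a minimal userspace sketch
could look like the following. This is not how CFQ is implemented;
struct dispatcher, dispatch_one() and the max_sync_run bound are
hypothetical and only illustrate "prefer sync, but let async make
progress".

```c
#include <assert.h>
#include <stdbool.h>

enum io_kind { IO_NONE, IO_SYNC, IO_ASYNC };

/* Hypothetical per-device dispatcher state. */
struct dispatcher {
	unsigned int sync_pending;	/* queued sync requests (all groups) */
	unsigned int async_pending;	/* queued async requests (one queue) */
	unsigned int sync_run;		/* sync dispatches since async last ran */
	unsigned int max_sync_run;	/* assumed starvation bound, must be > 0 */
};

/*
 * Pick the next request: sync is preferred, but once max_sync_run sync
 * requests have been dispatched while async IO was waiting, one async
 * request is let through.
 */
enum io_kind dispatch_one(struct dispatcher *d)
{
	bool async_starved;

	assert(d->max_sync_run > 0);
	async_starved = d->async_pending && d->sync_run >= d->max_sync_run;

	if (d->sync_pending && !async_starved) {
		d->sync_pending--;
		if (d->async_pending)
			d->sync_run++;	/* count only while async is waiting */
		return IO_SYNC;
	}
	if (d->async_pending) {
		d->async_pending--;
		d->sync_run = 0;
		return IO_ASYNC;
	}
	return IO_NONE;
}
```

Here no service differentiation is attempted between the buffered writes
of different groups; the only guarantee is a bound on how long the async
queue waits behind sync IO.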

Once we figure out what the requirements are, we can discuss the
implementation details.

Thanks
Vivek

