From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: linux kernel mailing list <linux-kernel@vger.kernel.org>,
	Jens Axboe <axboe@kernel.dk>, Nauman Rafique <nauman@google.com>,
	Gui Jianfeng <guijianfeng@cn.fujitsu.com>,
	Divyesh Shah <dpshah@google.com>,
	Heinz Mauelshagen <heinzm@redhat.com>,
	arighi@develer.com,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Subject: Re: [RFC PATCH] Bio Throttling support for block IO controller
Date: Thu, 2 Sep 2010 23:02:45 +0530
Message-ID: <20100902173245.GC18218@balbir.in.ibm.com>
In-Reply-To: <20100902151824.GA2702@redhat.com>

* Vivek Goyal <vgoyal@redhat.com> [2010-09-02 11:18:24]:

> On Wed, Sep 01, 2010 at 04:07:56PM -0400, Vivek Goyal wrote:
> > On Wed, Sep 01, 2010 at 01:58:30PM -0400, Vivek Goyal wrote:
> > > Hi,
> > > 
> > > Currently CFQ provides the weight based proportional division of bandwidth.
> > > People also have been looking at extending block IO controller to provide
> > > throttling/max bandwidth control.
> > > 
> > > I have started to write the support for throttling in block layer on 
> > > request queue so that it can be used both for higher level logical
> > > devices as well as leaf nodes. This patch is still work in progress but
> > > I wanted to post it for early feedback.
> > > 
> > > Basically, I have currently hooked into the __make_request() function to
> > > check which cgroup a bio belongs to and whether it is exceeding the specified
> > > BW rate. If not, the thread can continue to dispatch the bio as is; otherwise
> > > the bio is queued internally and dispatched later with the help of a worker
> > > thread.
> > > 
> > > HOWTO
> > > =====
> > > - Mount blkio controller
> > > 	mount -t cgroup -o blkio none /cgroup/blkio
> > > 
> > > - Specify a bandwidth rate on a particular device for the root group. The format
> > >   for policy is "<major>:<minor>  <bytes_per_second>".
> > > 
> > > 	echo "8:16  1048576" > /cgroup/blkio/blkio.read_bps_device
> > > 
> > >   Above will put a limit of 1MB/second on reads happening for root group
> > >   on device having major/minor number 8:16.
> > > 
> > > - Run dd to read a file and see if rate is throttled to 1MB/s or not.
> > > 
> > > 	# dd if=/mnt/common/zerofile of=/dev/null bs=4K count=1024 iflag=direct
> > > 	1024+0 records in
> > > 	1024+0 records out
> > > 	4194304 bytes (4.2 MB) copied, 4.0001 s, 1.0 MB/s
> > >  
> > >  Limits for writes can be put using blkio.write_bps_device file.
> > > 
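
As a sanity check on the dd numbers quoted above: 1024 direct reads of 4 KiB
come to 4194304 bytes, so with a 1048576 bytes/second limit the transfer
should take roughly 4 seconds, which is in line with the 4.0001 s reported.
A minimal sketch of that arithmetic (the function name is made up for
illustration and is not part of the patch):

```python
def expected_seconds(block_size: int, count: int, bps_limit: int) -> float:
    """Minimum transfer time when reads are capped at bps_limit bytes/second."""
    total_bytes = block_size * count
    return total_bytes / bps_limit


# Values from the dd run: bs=4K, count=1024, limit of 1048576 bytes/second.
print(expected_seconds(4096, 1024, 1048576))  # 4.0
```
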
> > > Open Issues
> > > ===========
> > > - Do we need to provide additional queue congestion semantics as we are
> > >   throttling and queuing bios at request queue and probably we don't want
> > >   a user space application to consume all the memory allocating bios
> > >   and bombarding request queue with those bios.
> > > 
> > > - How to handle the current blkio cgroup stats file and two policies
> > >   in the background. If for some reason both throttling and proportional
> > >   BW policies are operating on request queue, then stats will be very
> > >   confusing.
> > > 
> > >   Maybe we can allow activating either throttling or proportional BW
> > >   policy per request queue and we can create a /sys tunable to list and
> > >   choose between policies (something like choosing IO scheduler). The
> > >   only downside of this approach is that the user also needs to be aware of
> > >   the storage hierarchy and activate the right policy at each node/request
> > >   queue.
> > 
> > Thinking more about it. The issue of stats from proportional bandwidth
> > controller and max bandwidth controller clobbering each other can 
> > probably be solved by also specifying policy name with the stat. For 
> > example, currently blkio.io_serviced, looks as follows.
> > 
> > # cat blkio.io_serviced
> > 253:2 Read 61
> > 253:2 Write 0
> > 253:2 Sync 61
> > 253:2 Async 0
> > 253:2 Total 61
> > 
> > > We can introduce one more field to specify the policy these stats belong to,
> > > as follows.
> > 
> > # cat blkio.io_serviced
> > 253:2 Read 61	throttle
> > 253:2 Write 0	throttle
> > 253:2 Sync 61	throttle
> > 253:2 Async 0	throttle
> > 253:2 Total 61	throttle
> > 
> > > 253:2 Read 61	proportional
> > > 253:2 Write 0	proportional
> > > 253:2 Sync 61	proportional
> > > 253:2 Async 0	proportional
> > > 253:2 Total 61	proportional
> > 
> 
> Option 1
> ========
> I was looking at the blkio stat code more. It seems to be a key/value pair
> thing. So it looks like I shall have to change the format of the file and
> use the second field for the policy name, and that will break any existing tools
> parsing these blkio cgroup files.

We could go this way, marking the current stats as
deprecated and to be removed in, say, 2.6.39 or so.
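
To make the cost of a deprecation window concrete, here is a rough sketch of
what a user space parser might have to do if it needs to cope with both the
current three-field lines and the four-field Option 1 lines. The type and
function names are hypothetical, and the line layouts are taken only from the
examples quoted in this thread, not from any merged kernel format.

```python
from typing import NamedTuple, Optional


class Stat(NamedTuple):
    device: str            # "<major>:<minor>", e.g. "253:2"
    policy: Optional[str]  # None for a legacy line with no policy field
    op: str                # Read, Write, Sync, Async or Total
    value: int


def parse_stat_line(line: str) -> Stat:
    """Parse one stat line in either the legacy or the Option 1 layout."""
    fields = line.split()
    if len(fields) == 3:
        # Legacy layout: "<dev> <op> <value>"
        dev, op, value = fields
        return Stat(dev, None, op, int(value))
    if len(fields) == 4:
        # Option 1 layout: "<dev> <policy> <op> <value>"
        dev, policy, op, value = fields
        return Stat(dev, policy, op, int(value))
    raise ValueError(f"unrecognized stat line: {line!r}")
```

A tool written only for the three-field layout would hit the error path (or
worse, misread the policy name as the operation) on the new lines, which is
the breakage being discussed.
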

> 
> # cat blkio.io_serviced
> 253:2 throttle Read 61
> 253:2 throttle Write 0
> 253:2 throttle Sync 61
> 253:2 throttle Async 0
> 253:2 throttle Total 61
> 
> 253:2 proportional Read 61
> 253:2 proportional Write 0
> 253:2 proportional Sync 61
> 253:2 proportional Async 0
> 253:2 proportional Total 61
> 
> Option 2
> ========
> Introduce policy column only for new policy. 
> 
> 253:2 Read 61
> 253:2 Write 0
> 253:2 Sync 61
> 253:2 Async 0
> 253:2 Total 61
> 
> 253:2 throttle Read 61
> 253:2 throttle Write 0
> 253:2 throttle Sync 61
> 253:2 throttle Async 0
> 253:2 throttle Total 61
> 
> Here old lines continue to represent proportional weight policy stats and
> new lines with "throttle" key word represent throttling stats.
> 
> This is just like adding new fields to the "stat" file. I guess it still
> might break some scripts which get stumped by the new lines. But if scripts
> are not parsing all the lines and are just selectively picking data, then they
> should be fine.
> 
> Option 3
> ========
> The other option is that I introduce new cgroup files for the new
> policy. Something like what memory cgroup has done for swap accounting
> files.
> 
> blkio.throttle.io_serviced
> blkio.throttle.io_service_bytes
> 
> That will make sure the ABI is not broken, but the number of files per cgroup
> increases and there are already a significant number of files in the group.
> 
> Actually I think I should at least rename the read and write bw files so that
> they explicitly tell that they belong to the throttling policy.
> 
> blkio.throttle.read_bps_device
> blkio.throttle.write_bps_device
> 
> Any thoughts on what is the best way forward?
>

I'd prefer option 3; if not, fall back to option 1. The problem is that
with ABI changes, tools always have to figure out what version they
are dealing with.
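
For comparison, a minimal sketch of how a tool might cope under option 3:
it only has to check which files exist in the cgroup directory, and the
contents of each file keep a single layout. The helper name is hypothetical,
and treating the unprefixed file as the proportional policy's stats is an
assumption carried over from the discussion above.

```python
from typing import Dict, Set


def pick_stat_files(present: Set[str], name: str) -> Dict[str, str]:
    """Map policy name to stat file name, based only on which files exist."""
    files = {}
    legacy = f"blkio.{name}"
    throttle = f"blkio.throttle.{name}"
    if legacy in present:
        # Assumption: the existing file keeps reporting proportional BW stats.
        files["proportional"] = legacy
    if throttle in present:
        files["throttle"] = throttle
    return files


# A kernel without throttling support only exposes the existing file.
print(pick_stat_files({"blkio.io_serviced"}, "io_serviced"))
```

No line-format sniffing is needed here; a missing blkio.throttle.* file
simply means the policy is not available.
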
 
-- 
	Three Cheers,
	Balbir
