From: Vivek Goyal <vgoyal@redhat.com>
To: Munehiro Ikeda <m-ikeda@ds.jp.nec.com>
Cc: linux-kernel@vger.kernel.org, jens.axboe@oracle.com,
	Ryo Tsuruta <ryov@valinux.co.jp>,
	taka@valinux.co.jp, kamezawa.hiroyu@jp.fujitsu.com,
	Andrea Righi <righi.andrea@gmail.com>,
	Gui Jianfeng <guijianfeng@cn.fujitsu.com>,
	akpm@linux-foundation.org, balbir@linux.vnet.ibm.com
Subject: Re: [RFC][PATCH 00/11] blkiocg async support
Date: Fri, 9 Jul 2010 09:45:46 -0400
Message-ID: <20100709134546.GC3672@redhat.com>
In-Reply-To: <4C369009.80503@ds.jp.nec.com>

On Thu, Jul 08, 2010 at 10:57:13PM -0400, Munehiro Ikeda wrote:
> These RFC patches are trial to add async (cached) write support on blkio
> controller.
> 
> Only test which has been done is to compile, boot, and that write bandwidth
> seems prioritized when pages which were dirtied by two different processes in
> different cgroups are written back to a device simultaneously.  I know this
> is the minimum (or less) test but I posted this as RFC because I would like
> to hear your opinions about the design direction in the early stage.
> 
> Patches are for 2.6.35-rc4.
> 
> This patch series consists of two chunks.
> 
> (1) iotrack (patch 01/11 -- 06/11)
> 
> This is a functionality to track who dirtied a page, in exact which cgroup a
> process which dirtied a page belongs to.  Blkio controller will read the info
> later and prioritize when the page is actually written to a block device.
> This work is originated from Ryo Tsuruta and Hirokazu Takahashi and includes
> Andrea Righi's idea.  It was posted as a part of dm-ioband which was one of
> proposals for IO controller.
> 
> 
> (2) blkio controller modification (07/11 -- 11/11)
> 
> The main part of blkio controller async write support.
> Currently async queues are device-wide and async write IOs are always treated
> as root group.
> These patches make async queues per a cfq_group per a device to control them.
> Async write is handled by flush kernel thread.  Because queue pointers are
> stored in cfq_io_context, io_context of the thread has to have multiple
> cfq_io_contexts per a device.  So these patches make cfq_io_context per an
> io_context per a cfq_group, which means per an io_context per a cgroup per a
> device.
> 
> 
> This might be a piece of puzzle for complete async write support of blkio
> controller.  One of other pieces in my head is page dirtying ratio control.
> I believe Andrea Righi was working on it...how about the situation?

Thanks Muuh. I will look into the patches in detail. 

In my initial patches I had implemented support for ASYNC control
(including Ryo's IO tracking patches), but it did not work well and
the results were unpredictable. I realized that unless we implement
some kind of per-group dirty ratio/page cache share at the VM level and
create parallel paths for ASYNC IO, writes often get serialized.

So writes belonging to a high priority group get stuck behind those of
a low priority group, and you don't get any service differentiation.
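
As a rough illustration of that serialization, here is a toy model (not
kernel code; the function names, group names, weights and page counts
are all made up for the example). It compares a single writeback stream
that issues pages in the order they were dirtied with hypothetical
per-group queues that are dispatched by weight.

```python
def shared_fifo(dirty_order, budget):
    """Single writeback stream: pages go out in the order they were dirtied."""
    served = dict.fromkeys(dirty_order, 0)
    for group in dirty_order[:budget]:
        served[group] += 1
    return served


def per_group_queues(dirty_order, weights, budget):
    """Hypothetical parallel paths: one queue per group, weighted round robin."""
    backlog = {g: dirty_order.count(g) for g in weights}
    served = {g: 0 for g in weights}
    while budget > 0 and any(backlog.values()):
        for g, w in weights.items():
            # Each round a group may issue up to `w` pages, limited by its
            # own backlog and by what is left of the device budget.
            n = min(w, backlog[g], budget)
            served[g] += n
            backlog[g] -= n
            budget -= n
    return served


# The low priority group dirties 900 pages first, then the high priority
# group dirties 100. Assume the device can write 200 pages in the interval.
dirty_order = ["low"] * 900 + ["high"] * 100
weights = {"high": 8, "low": 1}

fifo = shared_fifo(dirty_order, budget=200)
split = per_group_queues(dirty_order, weights, budget=200)

print("shared FIFO:     ", fifo)
print("per-group queues:", split)
```

In the shared stream the high priority group gets nothing in this
interval, whatever its weight is, because its pages sit behind the 900
low priority ones. With separate queues it gets all 100 of its pages
written.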

So IMHO, this piece should go into the kernel only after we have fixed
the problem at the VM level (read: memory controller) with some kind of
per-cgroup dirty ratio.

> 
> And also, I'm thinking that async write support is required by bandwidth
> capping policy of blkio controller.  Bandwidth capping can be done in upper
> layer than elevator.

I think we should implement the capping facility in higher layers;
otherwise it is not useful for higher level logical devices (dm/md).

It was ok to implement proportional bandwidth division at the CFQ level
because one can do proportional BW division at each leaf node and still
get overall service differentiation at the higher level logical node.
But the same cannot be done for max BW control.
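
One way to see the difference is with some arithmetic on a made-up
setup: a logical device striped over two disks, two groups g1 and g2,
and a desired 60 MB/s limit for g1 on the logical device. The function
names and all numbers below are assumptions for the sketch, not anything
from CFQ or dm.

```python
def leaf_proportional(disk_bw, weights):
    """Split one disk's bandwidth among backlogged groups by weight."""
    total = sum(weights.values())
    return {g: disk_bw * w / total for g, w in weights.items()}


def logical_total(per_disk_shares):
    """Add up what each group gets across all leaf disks."""
    out = {}
    for shares in per_disk_shares:
        for g, bw in shares.items():
            out[g] = out.get(g, 0) + bw
    return out


def leaf_capped(demand, cap):
    """Per-disk max BW limit: a group never gets more than `cap` on a disk."""
    return min(demand, cap)


# Proportional division: each leaf splits 2:1, so the logical device
# also ends up at 2:1 even though the disks have different speeds.
weights = {"g1": 2, "g2": 1}
disks = [90, 150]  # MB/s per leaf disk, made-up numbers
prop = logical_total([leaf_proportional(bw, weights) for bw in disks])
print("proportional:", prop)

# Max BW control: suppose g1 should be limited to 60 MB/s on the
# logical device.
limit = 60

# Setting the full limit on every leaf lets a striped workload exceed it.
over = sum(leaf_capped(100, limit) for _ in disks)

# Setting limit/2 on every leaf starves a workload that hits one disk only.
under = leaf_capped(100, limit // 2) + leaf_capped(0, limit // 2)

print("per-leaf cap = limit:  ", over, "MB/s")
print("per-leaf cap = limit/2:", under, "MB/s")
```

The ratios compose across leaves, but an absolute limit does not: no
fixed per-leaf setting gives 60 MB/s for both IO patterns, because the
right per-leaf value depends on how the group's IO is spread over the
disks.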
 
>  However I think it should be also done in elevator layer
> in my opinion.  Elevator buffers and sort requests.  If there is another
> buffering functionality in upper layer, it is doubled buffering and it can be
> harmful for elevator's prediction.

I don't mind doing it at the elevator layer as well, because then if
somebody is not using dm/md, they don't have to load a max BW control
module and can simply enable max BW control in CFQ.

Thinking more about it, we are now suggesting implementing max BW
control in two places. I think that would mean duplicated code and
increased complexity in CFQ. It is probably better to implement max BW
control with the help of a dm module and use the same for CFQ as well.
There is pain associated with configuring a dm device, but I guess that
is easier than maintaining two max BW control schemes in the kernel.

Thanks
Vivek
