From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner+w=401wt.eu-S1756528AbYIROF3@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756528AbYIROF3 (ORCPT <rfc822;w@1wt.eu>);
	Thu, 18 Sep 2008 10:05:29 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1753599AbYIROFU
	(ORCPT <rfc822;linux-kernel-outgoing>);
	Thu, 18 Sep 2008 10:05:20 -0400
Received: from mx1.redhat.com ([66.187.233.31]:34112 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754093AbYIROFT (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
	Thu, 18 Sep 2008 10:05:19 -0400
Date: Thu, 18 Sep 2008 09:55:13 -0400
From: Vivek Goyal <vgoyal@redhat.com>
To: Andrea Righi <righi.andrea@gmail.com>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>, randy.dunlap@oracle.com,
       menage@google.com, chlunde@ping.uio.no, dpshah@google.com,
       eric.rannaud@gmail.com, balbir@linux.vnet.ibm.com,
       fernando@oss.ntt.co.jp, akpm@linux-foundation.org, agk@sourceware.org,
       subrata@linux.vnet.ibm.com, axboe@kernel.dk, m.innocenti@cineca.it,
       containers@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
       dave@linux.vnet.ibm.com, matt@bluehost.com, roberto@unbit.it,
       ngupta@google.com
Subject: Re: [RFC][PATCH -mm 0/5] cgroup: block device i/o controller (v9)
Message-ID: <20080918135513.GE20640@redhat.com>
References: <1219853257-11052-1-git-send-email-righi.andrea@gmail.com> <20080917.161811.27257227.taka@valinux.co.jp> <48D0C43A.2010102@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <48D0C43A.2010102@gmail.com>
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Sep 17, 2008 at 10:47:54AM +0200, Andrea Righi wrote:
> Hirokazu Takahashi wrote:
> > Hi,
> > 
> >> TODO:
> >>
> >> * Try to push down the throttling and implement it directly in the I/O
> >>   schedulers, using bio-cgroup (http://people.valinux.co.jp/~ryov/bio-cgroup/)
> >>   to keep track of the right cgroup context. This approach could lead to more
> >>   memory consumption and increases the number of dirty pages (hard/slow to
> >>   reclaim pages) in the system, since dirty-page ratio in memory is not
> >>   limited. This could even lead to potential OOM conditions, but these problems
> >>   can be resolved directly into the memory cgroup subsystem
> >>
> >> * Handle I/O generated by kswapd: at the moment there's no control on the I/O
> >>   generated by kswapd; try to use the page_cgroup functionality of the memory
> >>   cgroup controller to track this kind of I/O and charge the right cgroup when
> >>   pages are swapped in/out
> > 
> > FYI, this also can be done with bio-cgroup, which determine the owner cgroup
> > of a given anonymous page.
> > 
> > Thanks,
> > Hirokazu Takahashi
> 
> That would be great! FYI here is how I would like to proceed:
> 
> - today I'll post a new version of my cgroup-io-throttle patch rebased
>   to 2.6.27-rc5-mm1 (it's well tested and seems to be stable enough).
>   To keep the things light and simpler I've implemented custom
>   get_cgroup_from_page() / put_cgroup_from_page() in the memory
>   controller to retrieve the owner of a page, holding a reference to the
>   corresponding memcg, during async writes in submit_bio(); this is not
>   probably the best way to proceed, and a more generic framework like
>   bio-cgroup sounds better, but it seems to work quite well. The only
>   problem I've found is that during swap_writepage() the page is not
>   assigned to any page_cgroup (page_get_page_cgroup() returns NULL), and
>   so I'm not able to charge the cost of this I/O operation to the right
>   cgroup. Does bio-cgroup address or even resolve this issue?
> - begin to implement a new branch of cgroup-io-throttle on top of
>   bio-cgroup
> - also start to implement an additional request queue to provide first a
>   control at the cgroup level and a dispatcher to pass the request to
>   the elevator (as suggested by Vivek)
> 

Hi Andrea,

So if we maintain an rb-tree per request queue and implement the cgroup
rules there, then that will take care of io-throttling as well. (One can
control the release of bios/requests to the elevator based on any kind
of rule: proportional weight, max bandwidth, etc.)
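
To make the idea concrete, here is a rough user-space sketch (not kernel
code) of a per-queue dispatcher that releases bios by proportional
weight. All names (io_queue, io_group, ioq_*) are made up for
illustration, the virtual-time rule is just one possible policy, and a
linear scan over a small array stands in for the rb-tree that would be
keyed by vtime in a real implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GROUPS 8
#define VT_SCALE   1000u	/* fixed-point scale for virtual time */

/* Hypothetical per-cgroup state kept by one request queue. */
struct io_group {
	unsigned int weight;	/* proportional share, must be > 0 */
	unsigned int pending;	/* bios queued, not yet released */
	uint64_t vtime;		/* weighted service received so far */
	uint64_t dispatched;	/* sectors released to the elevator */
};

/* Hypothetical per-request-queue container for the groups. */
struct io_queue {
	struct io_group groups[MAX_GROUPS];
	size_t nr_groups;
};

/* Register a group; returns its index, or -1 on a full table or zero weight. */
static int ioq_add_group(struct io_queue *q, unsigned int weight)
{
	struct io_group *g;

	if (q->nr_groups == MAX_GROUPS || weight == 0)
		return -1;
	g = &q->groups[q->nr_groups];
	g->weight = weight;
	g->pending = 0;
	g->vtime = 0;
	g->dispatched = 0;
	return (int)q->nr_groups++;
}

/* Account nr_bios newly submitted bios to group idx. */
static void ioq_enqueue(struct io_queue *q, int idx, unsigned int nr_bios)
{
	assert(idx >= 0 && (size_t)idx < q->nr_groups);
	q->groups[idx].pending += nr_bios;
}

/*
 * Release one bio of the given size from the backlogged group with the
 * smallest vtime (ties go to the lowest index). Returns the index of the
 * group that was served, or -1 if nothing is pending. With an rb-tree
 * keyed by vtime, the scan would become a leftmost-node lookup.
 */
static int ioq_dispatch_one(struct io_queue *q, unsigned int sectors)
{
	struct io_group *g;
	int best = -1;
	size_t i;

	for (i = 0; i < q->nr_groups; i++) {
		if (!q->groups[i].pending)
			continue;
		if (best < 0 || q->groups[i].vtime < q->groups[best].vtime)
			best = (int)i;
	}
	if (best < 0)
		return -1;

	g = &q->groups[best];
	g->pending--;
	g->dispatched += sectors;
	/* Heavier groups advance more slowly, so they get served more often. */
	g->vtime += (uint64_t)sectors * VT_SCALE / g->weight;
	return best;
}
```

With two backlogged groups of weight 200 and 100, repeated calls to
ioq_dispatch_one() release roughly twice as many sectors for the first
group. A max-bandwidth rule could be plugged into the same place by
refusing to pick a group that has exhausted its budget for the current
time slice.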

If that's the case, I was wondering what you mean by "begin to
implement a new branch of cgroup-io-throttle on top of bio-cgroup".

Thanks
Vivek