From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <48D26BA3.40009@gmail.com>
Date: Thu, 18 Sep 2008 16:54:27 +0200
From: Andrea Righi <righi.andrea@gmail.com>
Reply-To: righi.andrea@gmail.com
User-Agent: Thunderbird 2.0.0.16 (X11/20080724)
MIME-Version: 1.0
To: Vivek Goyal <vgoyal@redhat.com>
CC: Hirokazu Takahashi <taka@valinux.co.jp>, randy.dunlap@oracle.com,
       menage@google.com, chlunde@ping.uio.no, dpshah@google.com,
       eric.rannaud@gmail.com, balbir@linux.vnet.ibm.com,
       fernando@oss.ntt.co.jp, akpm@linux-foundation.org, agk@sourceware.org,
       subrata@linux.vnet.ibm.com, axboe@kernel.dk, m.innocenti@cineca.it,
       containers@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
       dave@linux.vnet.ibm.com, matt@bluehost.com, roberto@unbit.it,
       ngupta@google.com
Subject: Re: [RFC][PATCH -mm 0/5] cgroup: block device i/o controller (v9)
References: <1219853257-11052-1-git-send-email-righi.andrea@gmail.com> <20080917.161811.27257227.taka@valinux.co.jp> <48D0C43A.2010102@gmail.com> <20080918135513.GE20640@redhat.com>
In-Reply-To: <20080918135513.GE20640@redhat.com>
X-Enigmail-Version: 0.95.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Vivek Goyal wrote:
> On Wed, Sep 17, 2008 at 10:47:54AM +0200, Andrea Righi wrote:
>> Hirokazu Takahashi wrote:
>>> Hi,
>>>
>>>> TODO:
>>>>
>>>> * Try to push down the throttling and implement it directly in the I/O
>>>>   schedulers, using bio-cgroup (http://people.valinux.co.jp/~ryov/bio-cgroup/)
>>>>   to keep track of the right cgroup context. This approach could lead to
>>>>   higher memory consumption and increase the number of dirty pages
>>>>   (hard/slow to reclaim) in the system, since the dirty-page ratio in
>>>>   memory is not limited. This could even lead to OOM conditions, but
>>>>   these problems can be resolved directly in the memory cgroup subsystem
>>>>
>>>> * Handle I/O generated by kswapd: at the moment there's no control over
>>>>   the I/O generated by kswapd; try to use the page_cgroup functionality
>>>>   of the memory cgroup controller to track this kind of I/O and charge
>>>>   the right cgroup when pages are swapped in/out
>>> FYI, this can also be done with bio-cgroup, which determines the owner
>>> cgroup of a given anonymous page.
>>>
>>> Thanks,
>>> Hirokazu Takahashi
>> That would be great! FYI here is how I would like to proceed:
>>
>> - today I'll post a new version of my cgroup-io-throttle patch rebased
>>   to 2.6.27-rc5-mm1 (it's well tested and seems to be stable enough).
>>   To keep things light and simple, I've implemented custom
>>   get_cgroup_from_page() / put_cgroup_from_page() helpers in the memory
>>   controller to retrieve the owner of a page, holding a reference to the
>>   corresponding memcg, during async writes in submit_bio(); this is
>>   probably not the best way to proceed, and a more generic framework like
>>   bio-cgroup sounds better, but it seems to work quite well. The only
>>   problem I've found is that during swap_writepage() the page is not
>>   assigned to any page_cgroup (page_get_page_cgroup() returns NULL), and
>>   so I'm not able to charge the cost of this I/O operation to the right
>>   cgroup. Does bio-cgroup address or even resolve this issue?
>> - begin to implement a new branch of cgroup-io-throttle on top of
>>   bio-cgroup
>> - also start implementing an additional request queue that first
>>   provides control at the cgroup level, plus a dispatcher that passes
>>   requests to the elevator (as suggested by Vivek)
>>
> 
> Hi Andrea,
> 
> So if we maintain an rb-tree per request queue and implement the cgroup
> rules there, then that will take care of io-throttling as well. (One can
> control the release of bios/requests to the elevator based on any kind
> of rule: proportional weight, max bandwidth.)
> 
> If that's the case, I was wondering what you mean by "begin to
> implement a new branch of cgroup-io-throttle on top of bio-cgroup".

Correct: with the rb-tree per request queue solution there's no need to
keep track of the context in struct bio, since the i/o control based on
per-cgroup rules has already been performed by the first-level i/o
dispatcher. I would really like to dedicate all my efforts to moving in
this direction, but it would also be interesting to test the bio-cgroup
functionality, since it already works, it's a generic framework, and
it's used by another project (dm-ioband). That's why I listed it as a
separate branch: it would be an alternative to the solution in the
following point (the additional request queue plus dispatcher).

-Andrea