From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1756558AbYIROym (ORCPT );
        Thu, 18 Sep 2008 10:54:42 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
        id S1754018AbYIROye (ORCPT );
        Thu, 18 Sep 2008 10:54:34 -0400
Received: from rv-out-0506.google.com ([209.85.198.239]:5163 "EHLO
        rv-out-0506.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1753988AbYIROyd (ORCPT );
        Thu, 18 Sep 2008 10:54:33 -0400
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=gamma;
        h=message-id:date:from:reply-to:user-agent:mime-version:to:cc:subject
         :references:in-reply-to:x-enigmail-version:content-type
         :content-transfer-encoding;
        b=u1C0seje/zHp8i2iDNdVObeez4YaBS1eWRaszIX/zlYmjm1UEiCwEGFMWJGae3av4/
         +6HBBt262j4u9DsG5QboGn1vtF225uNQ4UzkV0JHv4mhJdtct46NYFPOe8PeXOXQMlvx
         bBFCNMBWugL0D1rUli7mAR0M1oRIWyn42EC+I=
Message-ID: <48D26BA3.40009@gmail.com>
Date: Thu, 18 Sep 2008 16:54:27 +0200
From: Andrea Righi
Reply-To: righi.andrea@gmail.com
User-Agent: Thunderbird 2.0.0.16 (X11/20080724)
MIME-Version: 1.0
To: Vivek Goyal
CC: Hirokazu Takahashi, randy.dunlap@oracle.com, menage@google.com,
        chlunde@ping.uio.no, dpshah@google.com, eric.rannaud@gmail.com,
        balbir@linux.vnet.ibm.com, fernando@oss.ntt.co.jp,
        akpm@linux-foundation.org, agk@sourceware.org,
        subrata@linux.vnet.ibm.com, axboe@kernel.dk, m.innocenti@cineca.it,
        containers@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
        dave@linux.vnet.ibm.com, matt@bluehost.com, roberto@unbit.it,
        ngupta@google.com
Subject: Re: [RFC][PATCH -mm 0/5] cgroup: block device i/o controller (v9)
References: <1219853257-11052-1-git-send-email-righi.andrea@gmail.com>
        <20080917.161811.27257227.taka@valinux.co.jp>
        <48D0C43A.2010102@gmail.com>
        <20080918135513.GE20640@redhat.com>
In-Reply-To: <20080918135513.GE20640@redhat.com>
X-Enigmail-Version: 0.95.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:
        linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Vivek Goyal wrote:
> On Wed, Sep 17, 2008 at 10:47:54AM +0200, Andrea Righi wrote:
>> Hirokazu Takahashi wrote:
>>> Hi,
>>>
>>>> TODO:
>>>>
>>>> * Try to push down the throttling and implement it directly in the I/O
>>>>   schedulers, using bio-cgroup (http://people.valinux.co.jp/~ryov/bio-cgroup/)
>>>>   to keep track of the right cgroup context. This approach could lead to more
>>>>   memory consumption and increases the number of dirty pages (hard/slow to
>>>>   reclaim pages) in the system, since dirty-page ratio in memory is not
>>>>   limited. This could even lead to potential OOM conditions, but these problems
>>>>   can be resolved directly into the memory cgroup subsystem
>>>>
>>>> * Handle I/O generated by kswapd: at the moment there's no control on the I/O
>>>>   generated by kswapd; try to use the page_cgroup functionality of the memory
>>>>   cgroup controller to track this kind of I/O and charge the right cgroup when
>>>>   pages are swapped in/out
>>> FYI, this also can be done with bio-cgroup, which determine the owner cgroup
>>> of a given anonymous page.
>>>
>>> Thanks,
>>> Hirokazu Takahashi
>> That would be great! FYI here is how I would like to proceed:
>>
>> - today I'll post a new version of my cgroup-io-throttle patch rebased
>>   to 2.6.27-rc5-mm1 (it's well tested and seems to be stable enough).
>>   To keep the things light and simpler I've implemented custom
>>   get_cgroup_from_page() / put_cgroup_from_page() in the memory
>>   controller to retrieve the owner of a page, holding a reference to the
>>   corresponding memcg, during async writes in submit_bio(); this is not
>>   probably the best way to proceed, and a more generic framework like
>>   bio-cgroup sounds better, but it seems to work quite well.
>>   The only problem I've found is that during swap_writepage() the page
>>   is not assigned to any page_cgroup (page_get_page_cgroup() returns
>>   NULL), and so I'm not able to charge the cost of this I/O operation
>>   to the right cgroup. Does bio-cgroup address or even resolve this
>>   issue?
>> - begin to implement a new branch of cgroup-io-throttle on top of
>>   bio-cgroup
>> - also start to implement an additional request queue to provide first a
>>   control at the cgroup level and a dispatcher to pass the request to
>>   the elevator (as suggested by Vivek)
>>
>
> Hi Andrea,
>
> So if we maintain and rb-tree per request queue and implement the cgroup
> rules there, then that will take care of io-throttling also. (One can
> control the release of bio/requests to elevator based on any kind of
> rules. proportional weight/max-bandwidth).
>
> If that's the case, I was wondering what do you mean by "begin to
> implement new branch of cgroup-io-throttle" on top of bio-cgroup".

Correct, with the rb-tree per request queue solution there's no need to
keep track of the context in the struct bio, since the i/o control based
on per-cgroup rules has already been performed by the first i/o
dispatcher.

I would really like to dedicate all my efforts to moving in this
direction, but it would also be interesting to test the bio-cgroup
functionality, since it already works, it's a generic framework and it's
used by another project (dm-ioband). That is why I listed it there as a
separate new branch: it would be an alternative solution to the point
that follows it in the list.

-Andrea
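
To make the two-level idea discussed above more concrete, here is a
minimal userspace sketch in Python. It is not kernel code and is not
taken from any posted patch. It assumes a max-bandwidth rule only, and
every name in it (CgroupQueue, Dispatcher, submit, dispatch) is
hypothetical. A heap ordered by release time stands in for the rb-tree
per request queue. The point it illustrates is that the cgroup is only
needed when a request enters the first dispatcher; what comes out the
other side towards the elevator carries no cgroup context.

```python
import heapq
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CgroupQueue:
    """Per-cgroup state kept by the first-level dispatcher (hypothetical)."""
    name: str
    max_bps: float                      # max-bandwidth rule, bytes per second
    next_free: float = 0.0              # earliest time the next request may go
    pending: deque = field(default_factory=deque)


class Dispatcher:
    """Holds requests per cgroup and releases them to the elevator.

    A heap keyed by release time stands in for the per-request-queue
    rb-tree; both give cheap access to the earliest entry.
    """

    def __init__(self):
        self.groups = {}
        self.tree = []                  # entries: (release_time, seq, name)
        self.seq = 0                    # tie-breaker keeps ordering stable

    def add_cgroup(self, name, max_bps):
        self.groups[name] = CgroupQueue(name, max_bps)

    def submit(self, name, nbytes, now):
        """Queue a request; the cgroup is known here, at submission time."""
        grp = self.groups[name]
        grp.pending.append(nbytes)
        if len(grp.pending) == 1:       # group was idle: put it in the tree
            self._insert(grp, max(now, grp.next_free))

    def _insert(self, grp, when):
        heapq.heappush(self.tree, (when, self.seq, grp.name))
        self.seq += 1

    def dispatch(self, now):
        """Return the request sizes released to the elevator up to `now`."""
        released = []
        while self.tree and self.tree[0][0] <= now:
            when, _, name = heapq.heappop(self.tree)
            grp = self.groups[name]
            nbytes = grp.pending.popleft()
            # Charge the group: it may not dispatch again until the
            # bandwidth consumed by this request has been paid for.
            grp.next_free = when + nbytes / grp.max_bps
            released.append(nbytes)     # no cgroup tag travels further down
            if grp.pending:
                self._insert(grp, grp.next_free)
        return released
```

For example, with one cgroup limited to 1000 bytes/s and another to
4000 bytes/s, each submitting two requests at time 0, the first request
of each group is released immediately and the second one only after the
first has been paid for at the group's rate. A proportional-weight rule
could probably reuse the same structure with a different key, but that
is not shown here.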