From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1760613Ab2D0Pkq (ORCPT ); Fri, 27 Apr 2012 11:40:46 -0400
Received: from mx1.redhat.com ([209.132.183.28]:52502 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1760394Ab2D0Pkp (ORCPT ); Fri, 27 Apr 2012 11:40:45 -0400
Date: Fri, 27 Apr 2012 11:40:34 -0400
From: Vivek Goyal
To: Tejun Heo
Cc: Jeff Moyer, axboe@kernel.dk, ctalbott@google.com, rni@google.com,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	containers@lists.linux-foundation.org, fengguang.wu@intel.com,
	hughd@google.com, akpm@linux-foundation.org
Subject: Re: [PATCH 11/11] blkcg: implement per-blkg request allocation
Message-ID: <20120427154033.GJ10579@redhat.com>
References: <1335477561-11131-1-git-send-email-tj@kernel.org>
	<1335477561-11131-12-git-send-email-tj@kernel.org>
	<20120427150217.GK27486@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20120427150217.GK27486@google.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Apr 27, 2012 at 08:02:17AM -0700, Tejun Heo wrote:
> Hello,
>
> On Fri, Apr 27, 2012 at 10:54:01AM -0400, Jeff Moyer wrote:
> > > This patch implements per-blkg request_list. Each blkg has its own
> > > request_list and any IO allocates its request from the matching blkg
> > > making blkcgs completely isolated in terms of request allocation.
> >
> > So, nr_requests is now actually nr_requests * # of blk cgroups. Is that
> > right? Are you at all concerned about the amount of memory that can be
> > tied up as the number of cgroups increases?
>
> Yeah, I thought about it and I don't think there's a single good
> solution here. The other extreme would be splitting nr_requests by
> the number of cgroups but that seems even worse - each cgroup should
> be able to hit maximum throughput. Given that a lot of workloads tend
> to regulate themselves before hitting nr_requests, I think it's best
> to leave it as-is and treat each cgroup as having separate channel for
> now. It's a configurable parameter after all.

So on a slow device a malicious application can easily create thousands
of groups, queue up tons of IO and tie up unreclaimable memory? Sounds a
little scary.

I had used two separate limits: a per-queue limit and a per-group limit
(nr_requests and nr_group_requests). That made the implementation
complex, and it relied on the user getting the configuration right so
that one cgroup does not get serialized behind another once we hit
nr_requests. I am not advocating that solution, as it was not very nice
either.

Hmm.., tricky...

Thanks
Vivek
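
To make the trade-off concrete, here is a rough toy model in Python. It is only a sketch and not kernel code: nr_requests and nr_group_requests come from the discussion above, while worst_case_requests, TwoLimitQueue and try_alloc are hypothetical names made up for illustration, and the numbers are arbitrary example values.

```python
# Toy model of the two schemes discussed in this thread. Not kernel code.
# Only nr_requests and nr_group_requests are names from the thread; every
# other name here is hypothetical.
from collections import defaultdict


def worst_case_requests(nr_requests: int, nr_cgroups: int) -> int:
    """Upper bound on outstanding requests if each blkg has its own list."""
    return nr_requests * nr_cgroups


class TwoLimitQueue:
    """Admission check with a per-queue limit and a per-group limit."""

    def __init__(self, nr_requests: int, nr_group_requests: int):
        self.nr_requests = nr_requests              # limit for the whole queue
        self.nr_group_requests = nr_group_requests  # limit for a single group
        self.total = 0
        self.per_group = defaultdict(int)

    def try_alloc(self, group: str) -> bool:
        # Refuse if this group is at its own limit.
        if self.per_group[group] >= self.nr_group_requests:
            return False
        # Refuse if the queue as a whole is full, even when this group
        # holds nothing; this is where groups start waiting on each other.
        if self.total >= self.nr_requests:
            return False
        self.per_group[group] += 1
        self.total += 1
        return True


# Per-blkg lists: the bound grows linearly with the number of groups.
print(worst_case_requests(128, 1000))

# Two limits: groups "a" and "b" together fill the queue, so "c" is refused.
q = TwoLimitQueue(nr_requests=128, nr_group_requests=64)
for _ in range(64):
    q.try_alloc("a")
    q.try_alloc("b")
print(q.try_alloc("c"))
```

With these example values, two groups at their per-group limit already exhaust the queue-wide limit, so a third group cannot allocate until one of them completes IO. Avoiding that requires keeping nr_group_requests times the number of active groups at or below nr_requests, which is the configuration burden mentioned above.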