From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vivek Goyal
Subject: Re: [PATCH 11/11] blkcg: implement per-blkg request allocation
Date: Fri, 27 Apr 2012 11:40:34 -0400
Message-ID: <20120427154033.GJ10579@redhat.com>
References: <1335477561-11131-1-git-send-email-tj@kernel.org>
 <1335477561-11131-12-git-send-email-tj@kernel.org>
 <20120427150217.GK27486@google.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <20120427150217.GK27486-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Sender: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Errors-To: containers-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
To: Tejun Heo
Cc: axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org,
 ctalbott-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
 rni-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
 containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
 hughd-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
 linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 Jeff Moyer,
 cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org,
 fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org

On Fri, Apr 27, 2012 at 08:02:17AM -0700, Tejun Heo wrote:
> Hello,
>
> On Fri, Apr 27, 2012 at 10:54:01AM -0400, Jeff Moyer wrote:
> > > This patch implements per-blkg request_list. Each blkg has its own
> > > request_list and any IO allocates its request from the matching blkg
> > > making blkcgs completely isolated in terms of request allocation.
> >
> > So, nr_requests is now actually nr_requests * # of blk cgroups. Is that
> > right? Are you at all concerned about the amount of memory that can be
> > tied up as the number of cgroups increases?
>
> Yeah, I thought about it and I don't think there's a single good
> solution here.
> The other extreme would be splitting nr_requests by
> the number of cgroups but that seems even worse - each cgroup should
> be able to hit maximum throughput. Given that a lot of workloads tend
> to regulate themselves before hitting nr_requests, I think it's best
> to leave it as-is and treat each cgroup as having separate channel for
> now. It's a configurable parameter after all.

So on a slow device a malicious application can easily create thousands
of groups, queue up tons of IO and tie up a lot of unreclaimable memory?
Sounds a little scary.

I had used two separate limits: a per-queue limit and a per-group limit
(nr_requests and nr_group_requests). That made the implementation complex
and relied on the user doing the right configuration so that one cgroup
does not get serialized behind another once we hit nr_requests. I am not
advocating that solution as it was not very nice either.

Hmm.., tricky...

Thanks
Vivek
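
For illustration, the two-limit scheme mentioned above (nr_requests per
queue, nr_group_requests per group) could be sketched roughly as below.
This is only a sketch under assumptions, not code from either patch set:
the struct and function names are hypothetical, and locking, the
read/write split and sleeping on a full list are all left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical types; only the two limit names come from the mail above. */
struct rq_queue {
	unsigned int nr_requests;       /* cap across all groups on the queue */
	unsigned int nr_group_requests; /* cap for any single group */
	unsigned int allocated;         /* requests currently allocated, all groups */
};

struct rq_group {
	struct rq_queue *q;
	unsigned int allocated;         /* requests currently held by this group */
};

/* Account one request to @grp. Returns false if either limit is hit. */
static bool rq_group_try_get(struct rq_group *grp)
{
	struct rq_queue *q = grp->q;

	if (grp->allocated >= q->nr_group_requests)
		return false;   /* group has used up its own share */
	if (q->allocated >= q->nr_requests)
		return false;   /* queue-wide cap hit: groups serialize here */

	grp->allocated++;
	q->allocated++;
	return true;
}

/* Release one request previously accounted to @grp. */
static void rq_group_put(struct rq_group *grp)
{
	assert(grp->allocated > 0 && grp->q->allocated > 0);
	grp->allocated--;
	grp->q->allocated--;
}
```

The second check is where the configuration burden shows up: unless
nr_requests is at least nr_group_requests times the number of active
groups, a group that is still under its own limit can be refused because
other groups have filled the queue.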