Date: Tue, 6 Jun 2006 16:17:29 +0530
From: Srivatsa Vaddagiri
To: Matt Helsley
Cc: Peter Williams, LKML, Andrew Morton, dev@openvz.org,
	ckrm-tech@lists.sourceforge.net, balbir@in.ibm.com, Balbir Singh,
	Mike Galbraith, Con Kolivas, Sam Vilain, Kingsley Cheung,
	"Eric W. Biederman", Ingo Molnar, Rene Herman,
	"Chandra S. Seetharaman"
Subject: Re: [RFC 3/5] sched: Add CPU rate hard caps
Message-ID: <20060606104728.GB4394@in.ibm.com>
Reply-To: vatsa@in.ibm.com
References: <200606020003.51504.a1426z@gawab.com> <447F956B.3090402@bigpond.net.au> <1149247384.28649.691.camel@stark>
In-Reply-To: <1149247384.28649.691.camel@stark>

On Fri, Jun 02, 2006 at 04:23:04AM -0700, Matt Helsley wrote:
> There are two problems, as I see it:
>
> 1) If X1 grows to use 35%, then X2's usage can't grow back from 15%
> until X1 relents. This seems unpleasantly like cooperative scheduling
> within group X, because if we take this to its limit, X2 gets 0% and
> X1 gets 50% -- effectively starving X2. What little I know about nice
> suggests this wouldn't really happen. However, I think it may
> highlight one case where fiddling with nice can't effectively control
> CPU usage.

I would expect task Z to adjust the limits of X1 and X2 again when it
notices that X2 is "hungry". Until Z gets around to doing that, the
situation you describe will hold. If Z is configured to run quite
frequently (every 5 seconds?) to monitor and adjust limits, then this
starvation of X2 should last only for short periods?

> 2) Suppose we add group Y with tasks Y1-YM, Y's CPU usage is limited
> to 49%, each task of Y uses its limit of (49/M)% CPU, and the
> remaining 1% is left for Z (i.e. the single CPU is being used
> heavily). Z must use this 1% to read accounting information and
> adjust nice values as described above. If X1 spawns X3, we're likely
> in trouble -- Z might not get to run for a while, but X3 has
> inherited X1's nice value. If we return to our initial assumption
> that X1 and X2 are each using their limit of 25%, then X3 will be
> limited to 25% too. The sum of the Xi can now exceed 50% until Z is
> scheduled next. This only gets worse if there is an imbalance between
> X1 and X2 as described earlier. In that case group X could use 100%
> of the CPU until Z is scheduled! It also probably gets worse as load
> increases and the number of scheduling opportunities for Z decreases.
>
> I don't see how task Z could solve the second problem. As with UP, in
> SMP I think it depends on when Z (or one Z fixed to each CPU) is
> scheduled.

Wouldn't it help if Z is made to run at nice -20 (or with RT priority,
maybe), so that when Z wants to run (every 5 or 10 seconds) it runs
immediately? This assumes that Z can do its job of adjusting limits for
all tasks "quickly" (in maybe 100-200 ms).
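Something like the (untested) sketch below is what I have in mind for
Z. Note that adjust_limits() is just a stand-in for reading the
accounting information and rewriting the per-task limits -- it is not
an existing interface:

#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Placeholder for the real work: read the per-task accounting info
 * and rewrite the limits/nice values of X1, X2, ... through whatever
 * interface the caps patch ends up exporting.  Purely illustrative.
 * Should complete "quickly" (~100-200 ms).
 */
static void adjust_limits(void)
{
}

int main(void)
{
        struct sched_param sp;

        sp.sched_priority = 1;

        /*
         * Make Z an RT task so it gets the CPU promptly at every
         * interval, even when groups X and Y have the CPU saturated.
         */
        if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0) {
                perror("sched_setscheduler");
                return 1;
        }

        for (;;) {
                adjust_limits();
                sleep(5);       /* monitoring interval */
        }
}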
> I think these are simple scenarios that demonstrate the problem with
> splitting resource management into accounting and control with
> userspace in between.
>
> Cheers,
> 	-Matt Helsley

--
Regards,
vatsa