Date: Fri, 23 May 2008 13:14:31 +0530
From: Srivatsa Vaddagiri
Reply-To: vatsa@linux.vnet.ibm.com
To: "Chris Friesen"
Cc: Peter Zijlstra, "Li, Tong N", linux-kernel@vger.kernel.org, mingo@elte.hu, pj@sgi.com
Subject: Re: fair group scheduler not so fair?
Message-ID: <20080523074431.GJ3780@linux.vnet.ibm.com>
In-Reply-To: <48360D21.9060102@nortel.com>

On Thu, May 22, 2008 at 06:17:37PM -0600, Chris Friesen wrote:
> Peter Zijlstra wrote:
>> Given the following:
>>
>>          root
>>         / | \
>>       _A_ 1  2
>>       /| |\
>>      3 4 5 B
>>           / \
>>          6   7
>>
>>    CPU0          CPU1
>>    root          root
>>    /  \          /  \
>>   A    1        A    2
>>  / \           / \
>> 4   B         3   5
>>    / \
>>   6   7
>
> How do you move specific groups to different cpus.  Is this simply
> using cpusets?

No. Moving groups to different cpus is just a group-aware extension to
move_tasks() that is invoked as part of the regular load balance
operation.
move_tasks()->sched_fair_class.load_balance() has been modified to
understand how much the various task groups at various levels (e.g. A
at level 1, B at level 2) contribute to cpu load. It moves tasks
between cpus using this knowledge.

For example, if all the tasks shown above were on the same cpu, CPU0,
it would look like this:

     CPU0          CPU1
     root          root
    / | \
  _A_ 1  2
  /| |\
 3 4 5 B
      / \
     6   7

Then:

  cpu0 load = weight of A + weight of 1 + weight of 2
            = 1024 + 1024 + 1024 = 3072
  cpu1 load = 0

  load to be moved to cut down this imbalance = 3072/2 = 1536

move_tasks() running on CPU1 would iteratively try to pull tasks such
that the total weight moved is <= 1536:

  Task moved	Total weight moved
  ----------	------------------
  2		1024
  3		1024 + 256 = 1280
  5		1280 + 256 = 1536

resulting in:

   CPU0          CPU1
   root          root
   /  \          /  \
  A    1        A    2
 / \           / \
4   B         3   5
   / \
  6   7

>> Numerical examples given the above scenario, assuming everybody's
>> weight is 1024:
>>
>> s_(0,A) = s_(1,A) = 512
>
> Just to make sure I understand what's going on... this is half of 1024
> because it shows up on both cpus?

Not exactly. As Peter put it:

  s_(i,g) = W_g * rw_(i,g) / \Sum_j rw_(j,g)

In this case:

  s_(0,A) = W_A * rw_(0,A) / \Sum_j rw_(j,A)

  W_A = shares given to A by the admin = 1024
  rw_(0,A) = weight of 4 + weight of B = 1024 + 1024 = 2048
  rw_(1,A) = weight of 3 + weight of 5 = 1024 + 1024 = 2048
  \Sum_j rw_(j,A) = 4096

So:

  s_(0,A) = 1024 * 2048 / 4096 = 512

>> s_(0,B) = 1024, s_(1,B) = 0
>
> This gets the full 1024 because it's only on one cpu.

Not exactly. rw_(0,B) = \Sum_j rw_(j,B), which is why s_(0,B) = 1024.

>> rw_(0,A) = rw_(1,A) = 2048
>> rw_(0,B) = 2048, rw_(1,B) = 0
>
> How do we get 2048?  Shouldn't this be 1024?

Hope this is clarified by the above.

>> h_load_(0,A) = h_load_(1,A) = 512
>> h_load_(0,B) = 256, h_load_(1,B) = 0
>
> At this point the numbers make sense, but I'm not sure how the formula
> for h_load_ works given that I'm not sure what's going on for rw_.

-- 
Regards,
vatsa