From: Peter Zijlstra <peterz@infradead.org>
To: balbir@linux.vnet.ibm.com
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>,
	Ingo Molnar <mingo@elte.hu>,
	linux-kernel@vger.kernel.org, Gautham Shenoy <ego@in.ibm.com>,
	"svaidy@linux.vnet.ibm.com" <svaidy@linux.vnet.ibm.com>
Subject: Re: [PATCH 11/15] sched: Pass unlimited __cpu_power information to upper domain level groups
Date: Tue, 25 Aug 2009 10:30:17 +0200
Message-ID: <1251189017.7538.1114.camel@twins>
In-Reply-To: <20090825080433.GZ29572@balbir.in.ibm.com>

On Tue, 2009-08-25 at 13:34 +0530, Balbir Singh wrote:
> * Peter Zijlstra <peterz@infradead.org> [2009-08-25 09:11:14]:
> 
> > On Mon, 2009-08-24 at 23:49 +0530, Balbir Singh wrote:
> > 
> > > That reminds me, accounting is currently broken and should be based on
> > > APERF/MPERF (Power gets it right - based on SPURR).
> > 
> > What accounting?
> > 
> 
> 
> We need scaled time accounting for x86 (see *timescaled). By scaled
> accounting I mean ratio of APERF/MPERF

Runtime accounting? I don't see why that would need to be scaled by a/m,
we're accounting wall-time, not a virtual time quantity that represents
work.
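
For reference, the scaled accounting being asked about can be sketched as
below. This is only an illustration of the APERF/MPERF ratio mentioned in
the quote, not kernel code: the function name, the nanosecond units and
the fallback for an empty sample window are all assumptions made here.

```python
def scaled_runtime(wall_delta_ns, aperf_delta, mperf_delta):
    """Scale a wall-time delta by the APERF/MPERF ratio over the same window.

    APERF advances at the frequency the core actually ran at, MPERF at a
    fixed reference frequency, so their ratio approximates how much work
    was possible relative to nominal speed.
    """
    if mperf_delta == 0:
        # Assumption: with no counter movement, fall back to plain wall time.
        return wall_delta_ns
    return wall_delta_ns * aperf_delta // mperf_delta


# A task that ran for 1 ms of wall time while the core was clocked at half
# its reference frequency is charged 0.5 ms of scaled time.
print(scaled_runtime(1_000_000, 500, 1000))
```

Plain runtime accounting corresponds to returning wall_delta_ns unchanged.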

> > > > The trouble is that cpu_power is now abused for placement decisions too,
> > > > and that needs to be taken out.
> > > 
> > > OK.. so you propose extending the static cpu_power to dynamic
> > > cpu_power but based on current topology?
> > 
> > Right, so cpu_power is primarily used to normalize domain weight in the
> > load-balancer.
> > 
> > Suppose a 4 core machine with 1 unplugged core:
> > 
> >  0,1,3
> > 
> > 0,1  3
> > 
> > The sd-0,1 will have cpu_power 2048, while the sd-3 will have 1024; this
> > allows find_busiest_group() for sd-0,1,3 to pick the one which is
> > relatively most overloaded.
> > 
> > Supposing 3, 2, 2 (nice0) tasks on these cores, the domain weight of
> > sd-0,1 is 5*1024 and sd-3 is 2*1024, normalized that becomes 5/2 and 2
> > resp. which clearly shows sd-0,1 to be the busiest of the pair.
> > 
> > Now back in the days Nick wrote all this, he did the cpu_power hack for
> > SMT which sets the combined cpu_power of 2 threads (that's all we had
> > back then) to 1024, because two threads share 1 core, and are roughly as
> > fast.
> > 
> > He then also used this to influence task placement, preferring to move
> > tasks to another sibling domain before getting the second thread active,
> > this worked.
> > 
> > Then multi-core with shared caches came along and people did the same
> > trick for mc power save in order to get that placement stuff, but that
> > horribly broke the load-balancer normalization.
> > 
> > Now comes multi-node, and people asking for more elaborate placement
> > strategies and all this starts creaking like a ghost house about to
> > collapse.
> > 
> > Therefore I want cpu_power back to load normalization only, and do the
> > placement stuff with something else.
> > 

> What do you have in mind for the something else? Aren't normalization
> and placement two sides of the same coin? My concern is that load
> normalization might give different recommendations from the placement
> stuff, then what do we do?

They are related but not the same. People have been asking for placement
policies that exceed the relation.

Also the current ties between them are already strained on multi-level
placement policies.

So what I'd like to see is move all placement decisions to SD_flags and
restore cpu_power to a straight sum of work capacity.
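
The normalization arithmetic from the quoted 0,1 / 3 example can be
reproduced with a small sketch. It is a simplified model, not the
scheduler's code: normalized_load() and busiest() are names made up here,
and the only things taken from the thread are the 1024 weight of a nice0
task, the cpu_power values 2048 and 1024, and the 3, 2, 2 task counts.

```python
SCHED_LOAD_SCALE = 1024  # weight of one nice0 task; also cpu_power of one core


def normalized_load(group_load, cpu_power):
    """Group load divided by capacity, kept in SCHED_LOAD_SCALE units."""
    return group_load * SCHED_LOAD_SCALE // cpu_power


def busiest(groups):
    """Name of the group with the highest normalized load (hypothetical helper)."""
    return max(groups, key=lambda name: normalized_load(*groups[name]))


# name -> (domain weight, cpu_power), for 3, 2, 2 nice0 tasks on cores 0, 1, 3
groups = {
    "sd-0,1": (5 * SCHED_LOAD_SCALE, 2 * SCHED_LOAD_SCALE),
    "sd-3": (2 * SCHED_LOAD_SCALE, 1 * SCHED_LOAD_SCALE),
}

for name, (load, power) in groups.items():
    print(name, normalized_load(load, power) / SCHED_LOAD_SCALE)
print("busiest:", busiest(groups))
```

With cpu_power as a straight sum of capacity, sd-0,1 comes out at 5/2 and
sd-3 at 2, matching the numbers in the quote.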
