* [PATCH/RFC 1/5] [PATCH] sched: merge cpu_to_core_group functions
2010-08-12 17:25 [PATCH/RFC 0/5] sched: add new 'book' scheduling domain Heiko Carstens
@ 2010-08-12 17:25 ` Heiko Carstens
2010-08-13 21:11 ` Suresh Siddha
2010-08-12 17:25 ` [PATCH/RFC 2/5] [PATCH] sched: pass sched_domain_level to sched_power_savings_store Heiko Carstens
` (4 subsequent siblings)
5 siblings, 1 reply; 17+ messages in thread
From: Heiko Carstens @ 2010-08-12 17:25 UTC (permalink / raw)
To: Peter Zijlstra, Mike Galbraith, Ingo Molnar, Suresh Siddha,
Andreas Herrmann
Cc: linux-kernel, Martin Schwidefsky, Heiko Carstens
[-- Attachment #1: 01-sched-cputocore.diff --]
[-- Type: text/plain, Size: 1634 bytes --]
From: Heiko Carstens <heiko.carstens@de.ibm.com>
Merge and simplify the two cpu_to_core_group variants so that the
resulting function follows the same pattern as cpu_to_phys_group.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
kernel/sched.c | 18 +++++-------------
1 file changed, 5 insertions(+), 13 deletions(-)
diff -urpN linux-2.6/kernel/sched.c linux-2.6-patched/kernel/sched.c
--- linux-2.6/kernel/sched.c 2010-08-11 13:47:16.000000000 +0200
+++ linux-2.6-patched/kernel/sched.c 2010-08-11 13:47:22.000000000 +0200
@@ -6546,31 +6546,23 @@ cpu_to_cpu_group(int cpu, const struct c
#ifdef CONFIG_SCHED_MC
static DEFINE_PER_CPU(struct static_sched_domain, core_domains);
static DEFINE_PER_CPU(struct static_sched_group, sched_group_core);
-#endif /* CONFIG_SCHED_MC */
-#if defined(CONFIG_SCHED_MC) && defined(CONFIG_SCHED_SMT)
static int
cpu_to_core_group(int cpu, const struct cpumask *cpu_map,
struct sched_group **sg, struct cpumask *mask)
{
int group;
-
+#ifdef CONFIG_SCHED_SMT
cpumask_and(mask, topology_thread_cpumask(cpu), cpu_map);
group = cpumask_first(mask);
+#else
+ group = cpu;
+#endif
if (sg)
*sg = &per_cpu(sched_group_core, group).sg;
return group;
}
-#elif defined(CONFIG_SCHED_MC)
-static int
-cpu_to_core_group(int cpu, const struct cpumask *cpu_map,
- struct sched_group **sg, struct cpumask *unused)
-{
- if (sg)
- *sg = &per_cpu(sched_group_core, cpu).sg;
- return cpu;
-}
-#endif
+#endif /* CONFIG_SCHED_MC */
static DEFINE_PER_CPU(struct static_sched_domain, phys_domains);
static DEFINE_PER_CPU(struct static_sched_group, sched_group_phys);
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 1/5] [PATCH] sched: merge cpu_to_core_group functions
2010-08-12 17:25 ` [PATCH/RFC 1/5] [PATCH] sched: merge cpu_to_core_group functions Heiko Carstens
@ 2010-08-13 21:11 ` Suresh Siddha
2010-08-31 8:26 ` Heiko Carstens
0 siblings, 1 reply; 17+ messages in thread
From: Suresh Siddha @ 2010-08-13 21:11 UTC (permalink / raw)
To: Heiko Carstens
Cc: Peter Zijlstra, Mike Galbraith, Ingo Molnar, Andreas Herrmann,
linux-kernel@vger.kernel.org, Martin Schwidefsky
On Thu, 2010-08-12 at 10:25 -0700, Heiko Carstens wrote:
> From: Heiko Carstens <heiko.carstens@de.ibm.com>
>
> Merge and simplify the two cpu_to_core_group variants so that the
> resulting function follows the same pattern as cpu_to_phys_group.
>
> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
The reason this code was structured like this was feedback from Andrew
Morton: http://lkml.org/lkml/2006/1/27/308
Maybe we can clean all this code up further as part of your new
proposal. I can help with some of this. Thanks.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 1/5] [PATCH] sched: merge cpu_to_core_group functions
2010-08-13 21:11 ` Suresh Siddha
@ 2010-08-31 8:26 ` Heiko Carstens
0 siblings, 0 replies; 17+ messages in thread
From: Heiko Carstens @ 2010-08-31 8:26 UTC (permalink / raw)
To: Suresh Siddha
Cc: Peter Zijlstra, Mike Galbraith, Ingo Molnar, Andreas Herrmann,
linux-kernel@vger.kernel.org, Martin Schwidefsky
On Fri, Aug 13, 2010 at 02:11:54PM -0700, Suresh Siddha wrote:
> The reason this code was structured like this was feedback from
> Andrew Morton: http://lkml.org/lkml/2006/1/27/308
Well, if I didn't merge this, the upcoming cpu_to_book_group function
would be horribly long and unreadable. I think merging it so it looks
the same as cpu_to_phys_group is the right thing to do; otherwise
cpu_to_book_group would be a big mess instead of a quite simple
function.
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH/RFC 2/5] [PATCH] sched: pass sched_domain_level to sched_power_savings_store
2010-08-12 17:25 [PATCH/RFC 0/5] sched: add new 'book' scheduling domain Heiko Carstens
2010-08-12 17:25 ` [PATCH/RFC 1/5] [PATCH] sched: merge cpu_to_core_group functions Heiko Carstens
@ 2010-08-12 17:25 ` Heiko Carstens
2010-08-13 21:13 ` Suresh Siddha
2010-08-16 8:29 ` Peter Zijlstra
2010-08-12 17:25 ` [PATCH/RFC 3/5] [PATCH] sched: add book scheduling domain Heiko Carstens
` (3 subsequent siblings)
5 siblings, 2 replies; 17+ messages in thread
From: Heiko Carstens @ 2010-08-12 17:25 UTC (permalink / raw)
To: Peter Zijlstra, Mike Galbraith, Ingo Molnar, Suresh Siddha,
Andreas Herrmann
Cc: linux-kernel, Martin Schwidefsky, Heiko Carstens
[-- Attachment #1: 02-sched-powersavings.diff --]
[-- Type: text/plain, Size: 2037 bytes --]
From: Heiko Carstens <heiko.carstens@de.ibm.com>
Pass the corresponding sched domain level to sched_power_savings_store instead
of a yes/no flag that indicates whether the level is SMT or MC.
This is needed to easily extend the function so it can be used for a third
level.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
kernel/sched.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff -urpN linux-2.6/kernel/sched.c linux-2.6-patched/kernel/sched.c
--- linux-2.6/kernel/sched.c 2010-08-11 13:47:22.000000000 +0200
+++ linux-2.6-patched/kernel/sched.c 2010-08-11 13:47:22.000000000 +0200
@@ -7380,7 +7380,8 @@ static void arch_reinit_sched_domains(vo
put_online_cpus();
}
-static ssize_t sched_power_savings_store(const char *buf, size_t count, int smt)
+static ssize_t sched_power_savings_store(const char *buf, size_t count,
+ enum sched_domain_level sd_level)
{
unsigned int level = 0;
@@ -7397,10 +7398,16 @@ static ssize_t sched_power_savings_store
if (level >= MAX_POWERSAVINGS_BALANCE_LEVELS)
return -EINVAL;
- if (smt)
+ switch (sd_level) {
+ case SD_LV_SIBLING:
sched_smt_power_savings = level;
- else
+ break;
+ case SD_LV_MC:
sched_mc_power_savings = level;
+ break;
+ default:
+ break;
+ }
arch_reinit_sched_domains();
@@ -7418,7 +7425,7 @@ static ssize_t sched_mc_power_savings_st
struct sysdev_class_attribute *attr,
const char *buf, size_t count)
{
- return sched_power_savings_store(buf, count, 0);
+ return sched_power_savings_store(buf, count, SD_LV_MC);
}
static SYSDEV_CLASS_ATTR(sched_mc_power_savings, 0644,
sched_mc_power_savings_show,
@@ -7436,7 +7443,7 @@ static ssize_t sched_smt_power_savings_s
struct sysdev_class_attribute *attr,
const char *buf, size_t count)
{
- return sched_power_savings_store(buf, count, 1);
+ return sched_power_savings_store(buf, count, SD_LV_SIBLING);
}
static SYSDEV_CLASS_ATTR(sched_smt_power_savings, 0644,
sched_smt_power_savings_show,
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/5] [PATCH] sched: pass sched_domain_level to sched_power_savings_store
2010-08-12 17:25 ` [PATCH/RFC 2/5] [PATCH] sched: pass sched_domain_level to sched_power_savings_store Heiko Carstens
@ 2010-08-13 21:13 ` Suresh Siddha
2010-08-19 11:36 ` Andreas Herrmann
2010-08-16 8:29 ` Peter Zijlstra
1 sibling, 1 reply; 17+ messages in thread
From: Suresh Siddha @ 2010-08-13 21:13 UTC (permalink / raw)
To: Heiko Carstens
Cc: Peter Zijlstra, Mike Galbraith, Ingo Molnar, Andreas Herrmann,
linux-kernel@vger.kernel.org, Martin Schwidefsky
On Thu, 2010-08-12 at 10:25 -0700, Heiko Carstens wrote:
> From: Heiko Carstens <heiko.carstens@de.ibm.com>
>
> Pass the corresponding sched domain level to sched_power_savings_store instead
> of a yes/no flag that indicates whether the level is SMT or MC.
> This is needed to easily extend the function so it can be used for a third
> level.
>
> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
^ permalink raw reply [flat|nested] 17+ messages in thread* Re: [PATCH/RFC 2/5] [PATCH] sched: pass sched_domain_level to sched_power_savings_store
2010-08-13 21:13 ` Suresh Siddha
@ 2010-08-19 11:36 ` Andreas Herrmann
0 siblings, 0 replies; 17+ messages in thread
From: Andreas Herrmann @ 2010-08-19 11:36 UTC (permalink / raw)
To: Suresh Siddha
Cc: Heiko Carstens, Peter Zijlstra, Mike Galbraith, Ingo Molnar,
linux-kernel@vger.kernel.org, Martin Schwidefsky
On Fri, Aug 13, 2010 at 05:13:40PM -0400, Suresh Siddha wrote:
> On Thu, 2010-08-12 at 10:25 -0700, Heiko Carstens wrote:
> > From: Heiko Carstens <heiko.carstens@de.ibm.com>
> >
> > Pass the corresponding sched domain level to sched_power_savings_store instead
> > of a yes/no flag that indicates whether the level is SMT or MC.
> > This is needed to easily extend the function so it can be used for a third
> > level.
> >
> > Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
>
> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/5] [PATCH] sched: pass sched_domain_level to sched_power_savings_store
2010-08-12 17:25 ` [PATCH/RFC 2/5] [PATCH] sched: pass sched_domain_level to sched_power_savings_store Heiko Carstens
2010-08-13 21:13 ` Suresh Siddha
@ 2010-08-16 8:29 ` Peter Zijlstra
2010-08-19 11:41 ` Andreas Herrmann
1 sibling, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2010-08-16 8:29 UTC (permalink / raw)
To: Heiko Carstens
Cc: Mike Galbraith, Ingo Molnar, Suresh Siddha, Andreas Herrmann,
linux-kernel, Martin Schwidefsky
On Thu, 2010-08-12 at 19:25 +0200, Heiko Carstens wrote:
> Pass the corresponding sched domain level to sched_power_savings_store instead
> of a yes/no flag that indicates whether the level is SMT or MC.
> This is needed to easily extend the function so it can be used for a third
> level.
Ah, so the plan is to reduce the number of knobs, not create more.
Sysadmins really aren't interested in having a powersavings knob per
topology level.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/5] [PATCH] sched: pass sched_domain_level to sched_power_savings_store
2010-08-16 8:29 ` Peter Zijlstra
@ 2010-08-19 11:41 ` Andreas Herrmann
2010-08-19 12:35 ` Peter Zijlstra
0 siblings, 1 reply; 17+ messages in thread
From: Andreas Herrmann @ 2010-08-19 11:41 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Heiko Carstens, Mike Galbraith, Ingo Molnar, Suresh Siddha,
linux-kernel@vger.kernel.org, Martin Schwidefsky
On Mon, Aug 16, 2010 at 04:29:57AM -0400, Peter Zijlstra wrote:
> On Thu, 2010-08-12 at 19:25 +0200, Heiko Carstens wrote:
> > Pass the corresponding sched domain level to sched_power_savings_store instead
> > of a yes/no flag that indicates whether the level is SMT or MC.
> > This is needed to easily extend the function so it can be used for a third
> > level.
>
> Ah, so the plan is to reduce the number of knobs, not create more.
Don't think so.
> Sysadmins really aren't interested in having a powersavings knob per
> topology level.
It just allows using the same store function for three different knobs
instead of two.
Andreas
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/5] [PATCH] sched: pass sched_domain_level to sched_power_savings_store
2010-08-19 11:41 ` Andreas Herrmann
@ 2010-08-19 12:35 ` Peter Zijlstra
2010-08-19 12:32 ` Andreas Herrmann
0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2010-08-19 12:35 UTC (permalink / raw)
To: Andreas Herrmann
Cc: Heiko Carstens, Mike Galbraith, Ingo Molnar, Suresh Siddha,
linux-kernel@vger.kernel.org, Martin Schwidefsky
On Thu, 2010-08-19 at 13:41 +0200, Andreas Herrmann wrote:
> It just allows to use the same store functions for three instead of
> two different knobs.
Creating more knobs for powersave scheduling is a fail.
We already have 2^3 powersave scheduling states; that should be decreased
to 2 (namely on/off), not increased to 3^3.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 2/5] [PATCH] sched: pass sched_domain_level to sched_power_savings_store
2010-08-19 12:35 ` Peter Zijlstra
@ 2010-08-19 12:32 ` Andreas Herrmann
0 siblings, 0 replies; 17+ messages in thread
From: Andreas Herrmann @ 2010-08-19 12:32 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Heiko Carstens, Mike Galbraith, Ingo Molnar, Suresh Siddha,
linux-kernel@vger.kernel.org, Martin Schwidefsky
On Thu, Aug 19, 2010 at 08:35:11AM -0400, Peter Zijlstra wrote:
> On Thu, 2010-08-19 at 13:41 +0200, Andreas Herrmann wrote:
> > It just allows to use the same store functions for three instead of
> > two different knobs.
>
> Creating more knobs for powersave scheduling is a fail.
>
> We already have 2^3 powersave scheduling states, it should be decreased
> to 2 (namely on/off), not increased to 3^3.
I think it should be possible to select a domain level at which power
saving scheduling should happen (this would result in 3 states in the
z196 case).
Andreas
--
Operating | Advanced Micro Devices GmbH
System | Einsteinring 24, 85609 Dornach b. München, Germany
Research | Geschäftsführer: Alberto Bozzo, Andrew Bowd
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
(OSRC) | Registergericht München, HRB Nr. 43632
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH/RFC 3/5] [PATCH] sched: add book scheduling domain
2010-08-12 17:25 [PATCH/RFC 0/5] sched: add new 'book' scheduling domain Heiko Carstens
2010-08-12 17:25 ` [PATCH/RFC 1/5] [PATCH] sched: merge cpu_to_core_group functions Heiko Carstens
2010-08-12 17:25 ` [PATCH/RFC 2/5] [PATCH] sched: pass sched_domain_level to sched_power_savings_store Heiko Carstens
@ 2010-08-12 17:25 ` Heiko Carstens
2010-08-13 21:22 ` Suresh Siddha
2010-08-12 17:25 ` [PATCH/RFC 4/5] [PATCH] topology/sysfs: provide book id and siblings attributes Heiko Carstens
` (2 subsequent siblings)
5 siblings, 1 reply; 17+ messages in thread
From: Heiko Carstens @ 2010-08-12 17:25 UTC (permalink / raw)
To: Peter Zijlstra, Mike Galbraith, Ingo Molnar, Suresh Siddha,
Andreas Herrmann
Cc: linux-kernel, Martin Schwidefsky, Heiko Carstens
[-- Attachment #1: 03-sched-book.diff --]
[-- Type: text/plain, Size: 12431 bytes --]
From: Heiko Carstens <heiko.carstens@de.ibm.com>
On top of the SMT and MC scheduling domains this adds the BOOK scheduling
domain. This is useful for machines that have a four-level cache hierarchy
but do not fall into the NUMA category.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
arch/s390/defconfig | 1
include/linux/sched.h | 19 +++++++
include/linux/topology.h | 6 ++
kernel/sched.c | 112 ++++++++++++++++++++++++++++++++++++++++++++---
kernel/sched_fair.c | 11 ++--
5 files changed, 137 insertions(+), 12 deletions(-)
diff -urpN linux-2.6/arch/s390/defconfig linux-2.6-patched/arch/s390/defconfig
--- linux-2.6/arch/s390/defconfig 2010-08-02 00:11:14.000000000 +0200
+++ linux-2.6-patched/arch/s390/defconfig 2010-08-11 13:47:23.000000000 +0200
@@ -248,6 +248,7 @@ CONFIG_64BIT=y
CONFIG_SMP=y
CONFIG_NR_CPUS=32
CONFIG_HOTPLUG_CPU=y
+# CONFIG_SCHED_BOOK is not set
CONFIG_COMPAT=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_AUDIT_ARCH=y
diff -urpN linux-2.6/include/linux/sched.h linux-2.6-patched/include/linux/sched.h
--- linux-2.6/include/linux/sched.h 2010-08-11 13:47:16.000000000 +0200
+++ linux-2.6-patched/include/linux/sched.h 2010-08-11 13:47:23.000000000 +0200
@@ -807,7 +807,9 @@ enum powersavings_balance_level {
MAX_POWERSAVINGS_BALANCE_LEVELS
};
-extern int sched_mc_power_savings, sched_smt_power_savings;
+extern int sched_smt_power_savings;
+extern int sched_mc_power_savings;
+extern int sched_book_power_savings;
static inline int sd_balance_for_mc_power(void)
{
@@ -820,11 +822,23 @@ static inline int sd_balance_for_mc_powe
return 0;
}
-static inline int sd_balance_for_package_power(void)
+static inline int sd_balance_for_book_power(void)
{
if (sched_mc_power_savings | sched_smt_power_savings)
return SD_POWERSAVINGS_BALANCE;
+ if (!sched_book_power_savings)
+ return SD_PREFER_SIBLING;
+
+ return 0;
+}
+
+static inline int sd_balance_for_package_power(void)
+{
+ if (sched_book_power_savings | sched_mc_power_savings |
+ sched_smt_power_savings)
+ return SD_POWERSAVINGS_BALANCE;
+
return SD_PREFER_SIBLING;
}
@@ -875,6 +889,7 @@ enum sched_domain_level {
SD_LV_NONE = 0,
SD_LV_SIBLING,
SD_LV_MC,
+ SD_LV_BOOK,
SD_LV_CPU,
SD_LV_NODE,
SD_LV_ALLNODES,
diff -urpN linux-2.6/include/linux/topology.h linux-2.6-patched/include/linux/topology.h
--- linux-2.6/include/linux/topology.h 2010-08-11 13:47:16.000000000 +0200
+++ linux-2.6-patched/include/linux/topology.h 2010-08-11 13:47:23.000000000 +0200
@@ -201,6 +201,12 @@ int arch_update_cpu_topology(void);
.balance_interval = 64, \
}
+#ifdef CONFIG_SCHED_BOOK
+#ifndef SD_BOOK_INIT
+#error Please define an appropriate SD_BOOK_INIT in include/asm/topology.h!!!
+#endif
+#endif /* CONFIG_SCHED_BOOK */
+
#ifdef CONFIG_NUMA
#ifndef SD_NODE_INIT
#error Please define an appropriate SD_NODE_INIT in include/asm/topology.h!!!
diff -urpN linux-2.6/kernel/sched.c linux-2.6-patched/kernel/sched.c
--- linux-2.6/kernel/sched.c 2010-08-11 13:47:23.000000000 +0200
+++ linux-2.6-patched/kernel/sched.c 2010-08-11 13:47:23.000000000 +0200
@@ -6472,7 +6472,9 @@ static void sched_domain_node_span(int n
}
#endif /* CONFIG_NUMA */
-int sched_smt_power_savings = 0, sched_mc_power_savings = 0;
+int sched_smt_power_savings;
+int sched_mc_power_savings;
+int sched_book_power_savings;
/*
* The cpus mask in sched_group and sched_domain hangs off the end.
@@ -6500,6 +6502,7 @@ struct s_data {
cpumask_var_t nodemask;
cpumask_var_t this_sibling_map;
cpumask_var_t this_core_map;
+ cpumask_var_t this_book_map;
cpumask_var_t send_covered;
cpumask_var_t tmpmask;
struct sched_group **sched_group_nodes;
@@ -6511,6 +6514,7 @@ enum s_alloc {
sa_rootdomain,
sa_tmpmask,
sa_send_covered,
+ sa_this_book_map,
sa_this_core_map,
sa_this_sibling_map,
sa_nodemask,
@@ -6564,6 +6568,31 @@ cpu_to_core_group(int cpu, const struct
}
#endif /* CONFIG_SCHED_MC */
+/*
+ * book sched-domains:
+ */
+#ifdef CONFIG_SCHED_BOOK
+static DEFINE_PER_CPU(struct static_sched_domain, book_domains);
+static DEFINE_PER_CPU(struct static_sched_group, sched_group_book);
+
+static int
+cpu_to_book_group(int cpu, const struct cpumask *cpu_map,
+ struct sched_group **sg, struct cpumask *mask)
+{
+ int group = cpu;
+#ifdef CONFIG_SCHED_MC
+ cpumask_and(mask, cpu_coregroup_mask(cpu), cpu_map);
+ group = cpumask_first(mask);
+#elif defined(CONFIG_SCHED_SMT)
+ cpumask_and(mask, topology_thread_cpumask(cpu), cpu_map);
+ group = cpumask_first(mask);
+#endif
+ if (sg)
+ *sg = &per_cpu(sched_group_book, group).sg;
+ return group;
+}
+#endif /* CONFIG_SCHED_BOOK */
+
static DEFINE_PER_CPU(struct static_sched_domain, phys_domains);
static DEFINE_PER_CPU(struct static_sched_group, sched_group_phys);
@@ -6572,7 +6601,10 @@ cpu_to_phys_group(int cpu, const struct
struct sched_group **sg, struct cpumask *mask)
{
int group;
-#ifdef CONFIG_SCHED_MC
+#ifdef CONFIG_SCHED_BOOK
+ cpumask_and(mask, cpu_book_mask(cpu), cpu_map);
+ group = cpumask_first(mask);
+#elif defined(CONFIG_SCHED_MC)
cpumask_and(mask, cpu_coregroup_mask(cpu), cpu_map);
group = cpumask_first(mask);
#elif defined(CONFIG_SCHED_SMT)
@@ -6833,6 +6865,9 @@ SD_INIT_FUNC(CPU)
#ifdef CONFIG_SCHED_MC
SD_INIT_FUNC(MC)
#endif
+#ifdef CONFIG_SCHED_BOOK
+ SD_INIT_FUNC(BOOK)
+#endif
static int default_relax_domain_level = -1;
@@ -6882,6 +6917,8 @@ static void __free_domain_allocs(struct
free_cpumask_var(d->tmpmask); /* fall through */
case sa_send_covered:
free_cpumask_var(d->send_covered); /* fall through */
+ case sa_this_book_map:
+ free_cpumask_var(d->this_book_map); /* fall through */
case sa_this_core_map:
free_cpumask_var(d->this_core_map); /* fall through */
case sa_this_sibling_map:
@@ -6928,8 +6965,10 @@ static enum s_alloc __visit_domain_alloc
return sa_nodemask;
if (!alloc_cpumask_var(&d->this_core_map, GFP_KERNEL))
return sa_this_sibling_map;
- if (!alloc_cpumask_var(&d->send_covered, GFP_KERNEL))
+ if (!alloc_cpumask_var(&d->this_book_map, GFP_KERNEL))
return sa_this_core_map;
+ if (!alloc_cpumask_var(&d->send_covered, GFP_KERNEL))
+ return sa_this_book_map;
if (!alloc_cpumask_var(&d->tmpmask, GFP_KERNEL))
return sa_send_covered;
d->rd = alloc_rootdomain();
@@ -6987,6 +7026,23 @@ static struct sched_domain *__build_cpu_
return sd;
}
+static struct sched_domain *__build_book_sched_domain(struct s_data *d,
+ const struct cpumask *cpu_map, struct sched_domain_attr *attr,
+ struct sched_domain *parent, int i)
+{
+ struct sched_domain *sd = parent;
+#ifdef CONFIG_SCHED_BOOK
+ sd = &per_cpu(book_domains, i).sd;
+ SD_INIT(sd, BOOK);
+ set_domain_attribute(sd, attr);
+ cpumask_and(sched_domain_span(sd), cpu_map, cpu_book_mask(i));
+ sd->parent = parent;
+ parent->child = sd;
+ cpu_to_book_group(i, cpu_map, &sd->groups, d->tmpmask);
+#endif
+ return sd;
+}
+
static struct sched_domain *__build_mc_sched_domain(struct s_data *d,
const struct cpumask *cpu_map, struct sched_domain_attr *attr,
struct sched_domain *parent, int i)
@@ -7044,6 +7100,15 @@ static void build_sched_groups(struct s_
d->send_covered, d->tmpmask);
break;
#endif
+#ifdef CONFIG_SCHED_BOOK
+ case SD_LV_BOOK: /* set up book groups */
+ cpumask_and(d->this_book_map, cpu_map, cpu_book_mask(cpu));
+ if (cpu == cpumask_first(d->this_book_map))
+ init_sched_build_groups(d->this_book_map, cpu_map,
+ &cpu_to_book_group,
+ d->send_covered, d->tmpmask);
+ break;
+#endif
case SD_LV_CPU: /* set up physical groups */
cpumask_and(d->nodemask, cpumask_of_node(cpu), cpu_map);
if (!cpumask_empty(d->nodemask))
@@ -7091,12 +7156,14 @@ static int __build_sched_domains(const s
sd = __build_numa_sched_domains(&d, cpu_map, attr, i);
sd = __build_cpu_sched_domain(&d, cpu_map, attr, sd, i);
+ sd = __build_book_sched_domain(&d, cpu_map, attr, sd, i);
sd = __build_mc_sched_domain(&d, cpu_map, attr, sd, i);
sd = __build_smt_sched_domain(&d, cpu_map, attr, sd, i);
}
for_each_cpu(i, cpu_map) {
build_sched_groups(&d, SD_LV_SIBLING, cpu_map, i);
+ build_sched_groups(&d, SD_LV_BOOK, cpu_map, i);
build_sched_groups(&d, SD_LV_MC, cpu_map, i);
}
@@ -7127,6 +7194,12 @@ static int __build_sched_domains(const s
init_sched_groups_power(i, sd);
}
#endif
+#ifdef CONFIG_SCHED_BOOK
+ for_each_cpu(i, cpu_map) {
+ sd = &per_cpu(book_domains, i).sd;
+ init_sched_groups_power(i, sd);
+ }
+#endif
for_each_cpu(i, cpu_map) {
sd = &per_cpu(phys_domains, i).sd;
@@ -7152,6 +7225,8 @@ static int __build_sched_domains(const s
sd = &per_cpu(cpu_domains, i).sd;
#elif defined(CONFIG_SCHED_MC)
sd = &per_cpu(core_domains, i).sd;
+#elif defined(CONFIG_SCHED_BOOK)
+ sd = &per_cpu(book_domains, i).sd;
#else
sd = &per_cpu(phys_domains, i).sd;
#endif
@@ -7368,7 +7443,8 @@ match2:
mutex_unlock(&sched_domains_mutex);
}
-#if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
+#if defined(CONFIG_SCHED_BOOK) || defined(CONFIG_SCHED_MC) || \
+ defined(CONFIG_SCHED_SMT)
static void arch_reinit_sched_domains(void)
{
get_online_cpus();
@@ -7405,6 +7481,9 @@ static ssize_t sched_power_savings_store
case SD_LV_MC:
sched_mc_power_savings = level;
break;
+ case SD_LV_BOOK:
+ sched_book_power_savings = level;
+ break;
default:
break;
}
@@ -7414,6 +7493,24 @@ static ssize_t sched_power_savings_store
return count;
}
+#ifdef CONFIG_SCHED_BOOK
+static ssize_t sched_book_power_savings_show(struct sysdev_class *class,
+ struct sysdev_class_attribute *attr,
+ char *page)
+{
+ return sprintf(page, "%u\n", sched_book_power_savings);
+}
+static ssize_t sched_book_power_savings_store(struct sysdev_class *class,
+ struct sysdev_class_attribute *attr,
+ const char *buf, size_t count)
+{
+ return sched_power_savings_store(buf, count, SD_LV_BOOK);
+}
+static SYSDEV_CLASS_ATTR(sched_book_power_savings, 0644,
+ sched_book_power_savings_show,
+ sched_book_power_savings_store);
+#endif
+
#ifdef CONFIG_SCHED_MC
static ssize_t sched_mc_power_savings_show(struct sysdev_class *class,
struct sysdev_class_attribute *attr,
@@ -7464,9 +7561,14 @@ int __init sched_create_sysfs_power_savi
err = sysfs_create_file(&cls->kset.kobj,
&attr_sched_mc_power_savings.attr);
#endif
+#ifdef CONFIG_SCHED_BOOK
+ if (!err && book_capable())
+ err = sysfs_create_file(&cls->kset.kobj,
+ &attr_sched_book_power_savings.attr);
+#endif
return err;
}
-#endif /* CONFIG_SCHED_MC || CONFIG_SCHED_SMT */
+#endif /* CONFIG_SCHED_BOOK || CONFIG_SCHED_MC || CONFIG_SCHED_SMT */
/*
* Update cpusets according to cpu_active mask. If cpusets are
diff -urpN linux-2.6/kernel/sched_fair.c linux-2.6-patched/kernel/sched_fair.c
--- linux-2.6/kernel/sched_fair.c 2010-08-11 13:47:16.000000000 +0200
+++ linux-2.6-patched/kernel/sched_fair.c 2010-08-11 13:47:23.000000000 +0200
@@ -2039,7 +2039,8 @@ struct sd_lb_stats {
unsigned long busiest_group_capacity;
int group_imb; /* Is there imbalance in this sd */
-#if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
+#if defined(CONFIG_SCHED_BOOK) || defined(CONFIG_SCHED_MC) || \
+ defined(CONFIG_SCHED_SMT)
int power_savings_balance; /* Is powersave balance needed for this sd */
struct sched_group *group_min; /* Least loaded group in sd */
struct sched_group *group_leader; /* Group which relieves group_min */
@@ -2096,8 +2097,8 @@ static inline int get_sd_load_idx(struct
return load_idx;
}
-
-#if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
+#if defined(CONFIG_SCHED_BOOK) || defined(CONFIG_SCHED_MC) || \
+ defined(CONFIG_SCHED_SMT)
/**
* init_sd_power_savings_stats - Initialize power savings statistics for
* the given sched_domain, during load balancing.
@@ -2217,7 +2218,7 @@ static inline int check_power_save_busie
return 1;
}
-#else /* CONFIG_SCHED_MC || CONFIG_SCHED_SMT */
+#else /* CONFIG_SCHED_BOOK || CONFIG_SCHED_MC || CONFIG_SCHED_SMT */
static inline void init_sd_power_savings_stats(struct sched_domain *sd,
struct sd_lb_stats *sds, enum cpu_idle_type idle)
{
@@ -2235,7 +2236,7 @@ static inline int check_power_save_busie
{
return 0;
}
-#endif /* CONFIG_SCHED_MC || CONFIG_SCHED_SMT */
+#endif /* CONFIG_SCHED_BOOK || CONFIG_SCHED_MC || CONFIG_SCHED_SMT */
unsigned long default_scale_freq_power(struct sched_domain *sd, int cpu)
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH/RFC 3/5] [PATCH] sched: add book scheduling domain
2010-08-12 17:25 ` [PATCH/RFC 3/5] [PATCH] sched: add book scheduling domain Heiko Carstens
@ 2010-08-13 21:22 ` Suresh Siddha
2010-08-16 8:48 ` Peter Zijlstra
0 siblings, 1 reply; 17+ messages in thread
From: Suresh Siddha @ 2010-08-13 21:22 UTC (permalink / raw)
To: Heiko Carstens
Cc: Peter Zijlstra, Mike Galbraith, Ingo Molnar, Andreas Herrmann,
linux-kernel@vger.kernel.org, Martin Schwidefsky
On Thu, 2010-08-12 at 10:25 -0700, Heiko Carstens wrote:
> From: Heiko Carstens <heiko.carstens@de.ibm.com>
>
> On top of the SMT and MC scheduling domains this adds the BOOK scheduling
> > domain. This is useful for machines that have a four-level cache hierarchy
> > but do not fall into the NUMA category.
>
> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
PeterZ had some ideas about cleaning up the sched domain setup to avoid
this maze of #ifdef's. I will let him comment on this.
thanks,
suresh
> ---
>
> arch/s390/defconfig | 1
> include/linux/sched.h | 19 +++++++
> include/linux/topology.h | 6 ++
> kernel/sched.c | 112 ++++++++++++++++++++++++++++++++++++++++++++---
> kernel/sched_fair.c | 11 ++--
> 5 files changed, 137 insertions(+), 12 deletions(-)
>
> diff -urpN linux-2.6/arch/s390/defconfig linux-2.6-patched/arch/s390/defconfig
> --- linux-2.6/arch/s390/defconfig 2010-08-02 00:11:14.000000000 +0200
> +++ linux-2.6-patched/arch/s390/defconfig 2010-08-11 13:47:23.000000000 +0200
> @@ -248,6 +248,7 @@ CONFIG_64BIT=y
> CONFIG_SMP=y
> CONFIG_NR_CPUS=32
> CONFIG_HOTPLUG_CPU=y
> +# CONFIG_SCHED_BOOK is not set
> CONFIG_COMPAT=y
> CONFIG_SYSVIPC_COMPAT=y
> CONFIG_AUDIT_ARCH=y
> diff -urpN linux-2.6/include/linux/sched.h linux-2.6-patched/include/linux/sched.h
> --- linux-2.6/include/linux/sched.h 2010-08-11 13:47:16.000000000 +0200
> +++ linux-2.6-patched/include/linux/sched.h 2010-08-11 13:47:23.000000000 +0200
> @@ -807,7 +807,9 @@ enum powersavings_balance_level {
> MAX_POWERSAVINGS_BALANCE_LEVELS
> };
>
> -extern int sched_mc_power_savings, sched_smt_power_savings;
> +extern int sched_smt_power_savings;
> +extern int sched_mc_power_savings;
> +extern int sched_book_power_savings;
>
> static inline int sd_balance_for_mc_power(void)
> {
> @@ -820,11 +822,23 @@ static inline int sd_balance_for_mc_powe
> return 0;
> }
>
> -static inline int sd_balance_for_package_power(void)
> +static inline int sd_balance_for_book_power(void)
> {
> if (sched_mc_power_savings | sched_smt_power_savings)
> return SD_POWERSAVINGS_BALANCE;
>
> + if (!sched_book_power_savings)
> + return SD_PREFER_SIBLING;
> +
> + return 0;
> +}
> +
> +static inline int sd_balance_for_package_power(void)
> +{
> + if (sched_book_power_savings | sched_mc_power_savings |
> + sched_smt_power_savings)
> + return SD_POWERSAVINGS_BALANCE;
> +
> return SD_PREFER_SIBLING;
> }
>
> @@ -875,6 +889,7 @@ enum sched_domain_level {
> SD_LV_NONE = 0,
> SD_LV_SIBLING,
> SD_LV_MC,
> + SD_LV_BOOK,
> SD_LV_CPU,
> SD_LV_NODE,
> SD_LV_ALLNODES,
> diff -urpN linux-2.6/include/linux/topology.h linux-2.6-patched/include/linux/topology.h
> --- linux-2.6/include/linux/topology.h 2010-08-11 13:47:16.000000000 +0200
> +++ linux-2.6-patched/include/linux/topology.h 2010-08-11 13:47:23.000000000 +0200
> @@ -201,6 +201,12 @@ int arch_update_cpu_topology(void);
> .balance_interval = 64, \
> }
>
> +#ifdef CONFIG_SCHED_BOOK
> +#ifndef SD_BOOK_INIT
> +#error Please define an appropriate SD_BOOK_INIT in include/asm/topology.h!!!
> +#endif
> +#endif /* CONFIG_SCHED_BOOK */
> +
> #ifdef CONFIG_NUMA
> #ifndef SD_NODE_INIT
> #error Please define an appropriate SD_NODE_INIT in include/asm/topology.h!!!
> diff -urpN linux-2.6/kernel/sched.c linux-2.6-patched/kernel/sched.c
> --- linux-2.6/kernel/sched.c 2010-08-11 13:47:23.000000000 +0200
> +++ linux-2.6-patched/kernel/sched.c 2010-08-11 13:47:23.000000000 +0200
> @@ -6472,7 +6472,9 @@ static void sched_domain_node_span(int n
> }
> #endif /* CONFIG_NUMA */
>
> -int sched_smt_power_savings = 0, sched_mc_power_savings = 0;
> +int sched_smt_power_savings;
> +int sched_mc_power_savings;
> +int sched_book_power_savings;
>
> /*
> * The cpus mask in sched_group and sched_domain hangs off the end.
> @@ -6500,6 +6502,7 @@ struct s_data {
> cpumask_var_t nodemask;
> cpumask_var_t this_sibling_map;
> cpumask_var_t this_core_map;
> + cpumask_var_t this_book_map;
> cpumask_var_t send_covered;
> cpumask_var_t tmpmask;
> struct sched_group **sched_group_nodes;
> @@ -6511,6 +6514,7 @@ enum s_alloc {
> sa_rootdomain,
> sa_tmpmask,
> sa_send_covered,
> + sa_this_book_map,
> sa_this_core_map,
> sa_this_sibling_map,
> sa_nodemask,
> @@ -6564,6 +6568,31 @@ cpu_to_core_group(int cpu, const struct
> }
> #endif /* CONFIG_SCHED_MC */
>
> +/*
> + * book sched-domains:
> + */
> +#ifdef CONFIG_SCHED_BOOK
> +static DEFINE_PER_CPU(struct static_sched_domain, book_domains);
> +static DEFINE_PER_CPU(struct static_sched_group, sched_group_book);
> +
> +static int
> +cpu_to_book_group(int cpu, const struct cpumask *cpu_map,
> + struct sched_group **sg, struct cpumask *mask)
> +{
> + int group = cpu;
> +#ifdef CONFIG_SCHED_MC
> + cpumask_and(mask, cpu_coregroup_mask(cpu), cpu_map);
> + group = cpumask_first(mask);
> +#elif defined(CONFIG_SCHED_SMT)
> + cpumask_and(mask, topology_thread_cpumask(cpu), cpu_map);
> + group = cpumask_first(mask);
> +#endif
> + if (sg)
> + *sg = &per_cpu(sched_group_book, group).sg;
> + return group;
> +}
> +#endif /* CONFIG_SCHED_BOOK */
> +
> static DEFINE_PER_CPU(struct static_sched_domain, phys_domains);
> static DEFINE_PER_CPU(struct static_sched_group, sched_group_phys);
>
> @@ -6572,7 +6601,10 @@ cpu_to_phys_group(int cpu, const struct
> struct sched_group **sg, struct cpumask *mask)
> {
> int group;
> -#ifdef CONFIG_SCHED_MC
> +#ifdef CONFIG_SCHED_BOOK
> + cpumask_and(mask, cpu_book_mask(cpu), cpu_map);
> + group = cpumask_first(mask);
> +#elif defined(CONFIG_SCHED_MC)
> cpumask_and(mask, cpu_coregroup_mask(cpu), cpu_map);
> group = cpumask_first(mask);
> #elif defined(CONFIG_SCHED_SMT)
> @@ -6833,6 +6865,9 @@ SD_INIT_FUNC(CPU)
> #ifdef CONFIG_SCHED_MC
> SD_INIT_FUNC(MC)
> #endif
> +#ifdef CONFIG_SCHED_BOOK
> + SD_INIT_FUNC(BOOK)
> +#endif
>
> static int default_relax_domain_level = -1;
>
> @@ -6882,6 +6917,8 @@ static void __free_domain_allocs(struct
> free_cpumask_var(d->tmpmask); /* fall through */
> case sa_send_covered:
> free_cpumask_var(d->send_covered); /* fall through */
> + case sa_this_book_map:
> + free_cpumask_var(d->this_book_map); /* fall through */
> case sa_this_core_map:
> free_cpumask_var(d->this_core_map); /* fall through */
> case sa_this_sibling_map:
> @@ -6928,8 +6965,10 @@ static enum s_alloc __visit_domain_alloc
> return sa_nodemask;
> if (!alloc_cpumask_var(&d->this_core_map, GFP_KERNEL))
> return sa_this_sibling_map;
> - if (!alloc_cpumask_var(&d->send_covered, GFP_KERNEL))
> + if (!alloc_cpumask_var(&d->this_book_map, GFP_KERNEL))
> return sa_this_core_map;
> + if (!alloc_cpumask_var(&d->send_covered, GFP_KERNEL))
> + return sa_this_book_map;
> if (!alloc_cpumask_var(&d->tmpmask, GFP_KERNEL))
> return sa_send_covered;
> d->rd = alloc_rootdomain();
> @@ -6987,6 +7026,23 @@ static struct sched_domain *__build_cpu_
> return sd;
> }
>
> +static struct sched_domain *__build_book_sched_domain(struct s_data *d,
> + const struct cpumask *cpu_map, struct sched_domain_attr *attr,
> + struct sched_domain *parent, int i)
> +{
> + struct sched_domain *sd = parent;
> +#ifdef CONFIG_SCHED_BOOK
> + sd = &per_cpu(book_domains, i).sd;
> + SD_INIT(sd, BOOK);
> + set_domain_attribute(sd, attr);
> + cpumask_and(sched_domain_span(sd), cpu_map, cpu_book_mask(i));
> + sd->parent = parent;
> + parent->child = sd;
> + cpu_to_book_group(i, cpu_map, &sd->groups, d->tmpmask);
> +#endif
> + return sd;
> +}
> +
> static struct sched_domain *__build_mc_sched_domain(struct s_data *d,
> const struct cpumask *cpu_map, struct sched_domain_attr *attr,
> struct sched_domain *parent, int i)
> @@ -7044,6 +7100,15 @@ static void build_sched_groups(struct s_
> d->send_covered, d->tmpmask);
> break;
> #endif
> +#ifdef CONFIG_SCHED_BOOK
> + case SD_LV_BOOK: /* set up book groups */
> + cpumask_and(d->this_book_map, cpu_map, cpu_book_mask(cpu));
> + if (cpu == cpumask_first(d->this_book_map))
> + init_sched_build_groups(d->this_book_map, cpu_map,
> + &cpu_to_book_group,
> + d->send_covered, d->tmpmask);
> + break;
> +#endif
> case SD_LV_CPU: /* set up physical groups */
> cpumask_and(d->nodemask, cpumask_of_node(cpu), cpu_map);
> if (!cpumask_empty(d->nodemask))
> @@ -7091,12 +7156,14 @@ static int __build_sched_domains(const s
>
> sd = __build_numa_sched_domains(&d, cpu_map, attr, i);
> sd = __build_cpu_sched_domain(&d, cpu_map, attr, sd, i);
> + sd = __build_book_sched_domain(&d, cpu_map, attr, sd, i);
> sd = __build_mc_sched_domain(&d, cpu_map, attr, sd, i);
> sd = __build_smt_sched_domain(&d, cpu_map, attr, sd, i);
> }
>
> for_each_cpu(i, cpu_map) {
> build_sched_groups(&d, SD_LV_SIBLING, cpu_map, i);
> + build_sched_groups(&d, SD_LV_BOOK, cpu_map, i);
> build_sched_groups(&d, SD_LV_MC, cpu_map, i);
> }
>
> @@ -7127,6 +7194,12 @@ static int __build_sched_domains(const s
> init_sched_groups_power(i, sd);
> }
> #endif
> +#ifdef CONFIG_SCHED_BOOK
> + for_each_cpu(i, cpu_map) {
> + sd = &per_cpu(book_domains, i).sd;
> + init_sched_groups_power(i, sd);
> + }
> +#endif
>
> for_each_cpu(i, cpu_map) {
> sd = &per_cpu(phys_domains, i).sd;
> @@ -7152,6 +7225,8 @@ static int __build_sched_domains(const s
> sd = &per_cpu(cpu_domains, i).sd;
> #elif defined(CONFIG_SCHED_MC)
> sd = &per_cpu(core_domains, i).sd;
> +#elif defined(CONFIG_SCHED_BOOK)
> + sd = &per_cpu(book_domains, i).sd;
> #else
> sd = &per_cpu(phys_domains, i).sd;
> #endif
> @@ -7368,7 +7443,8 @@ match2:
> mutex_unlock(&sched_domains_mutex);
> }
>
> -#if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
> +#if defined(CONFIG_SCHED_BOOK) || defined(CONFIG_SCHED_MC) || \
> + defined(CONFIG_SCHED_SMT)
> static void arch_reinit_sched_domains(void)
> {
> get_online_cpus();
> @@ -7405,6 +7481,9 @@ static ssize_t sched_power_savings_store
> case SD_LV_MC:
> sched_mc_power_savings = level;
> break;
> + case SD_LV_BOOK:
> + sched_book_power_savings = level;
> + break;
> default:
> break;
> }
> @@ -7414,6 +7493,24 @@ static ssize_t sched_power_savings_store
> return count;
> }
>
> +#ifdef CONFIG_SCHED_BOOK
> +static ssize_t sched_book_power_savings_show(struct sysdev_class *class,
> + struct sysdev_class_attribute *attr,
> + char *page)
> +{
> + return sprintf(page, "%u\n", sched_book_power_savings);
> +}
> +static ssize_t sched_book_power_savings_store(struct sysdev_class *class,
> + struct sysdev_class_attribute *attr,
> + const char *buf, size_t count)
> +{
> + return sched_power_savings_store(buf, count, SD_LV_BOOK);
> +}
> +static SYSDEV_CLASS_ATTR(sched_book_power_savings, 0644,
> + sched_book_power_savings_show,
> + sched_book_power_savings_store);
> +#endif
> +
> #ifdef CONFIG_SCHED_MC
> static ssize_t sched_mc_power_savings_show(struct sysdev_class *class,
> struct sysdev_class_attribute *attr,
> @@ -7464,9 +7561,14 @@ int __init sched_create_sysfs_power_savi
> err = sysfs_create_file(&cls->kset.kobj,
> &attr_sched_mc_power_savings.attr);
> #endif
> +#ifdef CONFIG_SCHED_BOOK
> + if (!err && book_capable())
> + err = sysfs_create_file(&cls->kset.kobj,
> + &attr_sched_book_power_savings.attr);
> +#endif
> return err;
> }
> -#endif /* CONFIG_SCHED_MC || CONFIG_SCHED_SMT */
> +#endif /* CONFIG_SCHED_BOOK || CONFIG_SCHED_MC || CONFIG_SCHED_SMT */
>
> /*
> * Update cpusets according to cpu_active mask. If cpusets are
> diff -urpN linux-2.6/kernel/sched_fair.c linux-2.6-patched/kernel/sched_fair.c
> --- linux-2.6/kernel/sched_fair.c 2010-08-11 13:47:16.000000000 +0200
> +++ linux-2.6-patched/kernel/sched_fair.c 2010-08-11 13:47:23.000000000 +0200
> @@ -2039,7 +2039,8 @@ struct sd_lb_stats {
> unsigned long busiest_group_capacity;
>
> int group_imb; /* Is there imbalance in this sd */
> -#if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
> +#if defined(CONFIG_SCHED_BOOK) || defined(CONFIG_SCHED_MC) || \
> + defined(CONFIG_SCHED_SMT)
> int power_savings_balance; /* Is powersave balance needed for this sd */
> struct sched_group *group_min; /* Least loaded group in sd */
> struct sched_group *group_leader; /* Group which relieves group_min */
> @@ -2096,8 +2097,8 @@ static inline int get_sd_load_idx(struct
> return load_idx;
> }
>
> -
> -#if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
> +#if defined(CONFIG_SCHED_BOOK) || defined(CONFIG_SCHED_MC) || \
> + defined(CONFIG_SCHED_SMT)
> /**
> * init_sd_power_savings_stats - Initialize power savings statistics for
> * the given sched_domain, during load balancing.
> @@ -2217,7 +2218,7 @@ static inline int check_power_save_busie
> return 1;
>
> }
> -#else /* CONFIG_SCHED_MC || CONFIG_SCHED_SMT */
> +#else /* CONFIG_SCHED_BOOK || CONFIG_SCHED_MC || CONFIG_SCHED_SMT */
> static inline void init_sd_power_savings_stats(struct sched_domain *sd,
> struct sd_lb_stats *sds, enum cpu_idle_type idle)
> {
> @@ -2235,7 +2236,7 @@ static inline int check_power_save_busie
> {
> return 0;
> }
> -#endif /* CONFIG_SCHED_MC || CONFIG_SCHED_SMT */
> +#endif /* CONFIG_SCHED_BOOK || CONFIG_SCHED_MC || CONFIG_SCHED_SMT */
>
>
> unsigned long default_scale_freq_power(struct sched_domain *sd, int cpu)
>
* Re: [PATCH/RFC 3/5] [PATCH] sched: add book scheduling domain
2010-08-13 21:22 ` Suresh Siddha
@ 2010-08-16 8:48 ` Peter Zijlstra
0 siblings, 0 replies; 17+ messages in thread
From: Peter Zijlstra @ 2010-08-16 8:48 UTC (permalink / raw)
To: Suresh Siddha
Cc: Heiko Carstens, Mike Galbraith, Ingo Molnar, Andreas Herrmann,
linux-kernel@vger.kernel.org, Martin Schwidefsky
On Fri, 2010-08-13 at 14:22 -0700, Suresh Siddha wrote:
> On Thu, 2010-08-12 at 10:25 -0700, Heiko Carstens wrote:
> > From: Heiko Carstens <heiko.carstens@de.ibm.com>
> >
> > On top of the SMT and MC scheduling domains this adds the BOOK scheduling
> > domain. This is useful for machines that have a four level cache hierarchy
> > but do not fall into the NUMA category.
> >
> > Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
>
> PeterZ had some ideas in cleaning up the sched domain setup to avoid
> this maze of #ifdef's. I will let him comment on this.
http://lkml.org/lkml/2009/8/18/169
More information in this thread: http://lkml.org/lkml/2009/8/20/190
* [PATCH/RFC 4/5] [PATCH] topology/sysfs: provide book id and siblings attributes
2010-08-12 17:25 [PATCH/RFC 0/5] sched: add new 'book' scheduling domain Heiko Carstens
` (2 preceding siblings ...)
2010-08-12 17:25 ` [PATCH/RFC 3/5] [PATCH] sched: add book scheduling domain Heiko Carstens
@ 2010-08-12 17:25 ` Heiko Carstens
2010-08-12 17:25 ` [PATCH/RFC 5/5] [PATCH] topology: add z196 cpu topology support Heiko Carstens
2010-08-19 12:22 ` [PATCH/RFC 0/5] sched: add new 'book' scheduling domain Andreas Herrmann
5 siblings, 0 replies; 17+ messages in thread
From: Heiko Carstens @ 2010-08-12 17:25 UTC (permalink / raw)
To: Peter Zijlstra, Mike Galbraith, Ingo Molnar, Suresh Siddha,
Andreas Herrmann
Cc: linux-kernel, Martin Schwidefsky, Heiko Carstens
[-- Attachment #1: 04-topology-sysfs-book.diff --]
[-- Type: text/plain, Size: 4483 bytes --]
From: Heiko Carstens <heiko.carstens@de.ibm.com>
Create attributes
/sys/devices/system/cpu/cpuX/topology/book_id
/sys/devices/system/cpu/cpuX/topology/book_siblings
which show the book id and the book siblings of a cpu.
Unlike the attributes for SMT and MC these attributes are only present if
CONFIG_SCHED_BOOK is set. There is no reason to pollute sysfs for every
architecture with unused attributes.
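As an aside, the siblings attributes use the two standard sysfs cpumask
formats: a comma-separated hex mask, and a range list for the *_list
variant. A small userspace sketch for decoding them follows; the example
strings are made up for illustration, not taken from a real machine:

```python
# Decode the two sysfs cpumask formats used by the topology siblings
# attributes. The hex form is a comma-separated sequence of 32-bit words,
# most significant word first; the "_list" form uses ranges like "0-3,6".

def parse_mask(text):
    """Turn e.g. '00000000,0000003f' into a sorted list of CPU numbers."""
    value = int(text.strip().replace(",", ""), 16)
    cpus = []
    bit = 0
    while value:
        if value & 1:
            cpus.append(bit)
        value >>= 1
        bit += 1
    return cpus

def parse_list(text):
    """Turn e.g. '0-3,6' into a sorted list of CPU numbers."""
    cpus = []
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        elif part:
            cpus.append(int(part))
    return cpus

print(parse_mask("00000000,0000003f"))  # [0, 1, 2, 3, 4, 5]
print(parse_list("0-3,6"))              # [0, 1, 2, 3, 6]
```

The same helpers apply to thread_siblings and core_siblings, which share
these formats.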
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
Documentation/cputopology.txt | 23 ++++++++++++++++++++---
drivers/base/topology.c | 16 +++++++++++++++-
2 files changed, 35 insertions(+), 4 deletions(-)
diff -urpN linux-2.6/Documentation/cputopology.txt linux-2.6-patched/Documentation/cputopology.txt
--- linux-2.6/Documentation/cputopology.txt 2010-08-02 00:11:14.000000000 +0200
+++ linux-2.6-patched/Documentation/cputopology.txt 2010-08-11 13:47:23.000000000 +0200
@@ -14,25 +14,39 @@ to /proc/cpuinfo.
identifier (rather than the kernel's). The actual value is
architecture and platform dependent.
-3) /sys/devices/system/cpu/cpuX/topology/thread_siblings:
+3) /sys/devices/system/cpu/cpuX/topology/book_id:
+
+ the book ID of cpuX. Typically it is the hardware platform's
+ identifier (rather than the kernel's). The actual value is
+ architecture and platform dependent.
+
+4) /sys/devices/system/cpu/cpuX/topology/thread_siblings:
internel kernel map of cpuX's hardware threads within the same
core as cpuX
-4) /sys/devices/system/cpu/cpuX/topology/core_siblings:
+5) /sys/devices/system/cpu/cpuX/topology/core_siblings:
internal kernel map of cpuX's hardware threads within the same
physical_package_id.
+6) /sys/devices/system/cpu/cpuX/topology/book_siblings:
+
+ internal kernel map of cpuX's hardware threads within the same
+ book_id.
+
To implement it in an architecture-neutral way, a new source file,
-drivers/base/topology.c, is to export the 4 attributes.
+drivers/base/topology.c, is to export the 4 or 6 attributes. The two book
+related sysfs files will only be created if CONFIG_SCHED_BOOK is selected.
For an architecture to support this feature, it must define some of
these macros in include/asm-XXX/topology.h:
#define topology_physical_package_id(cpu)
#define topology_core_id(cpu)
+#define topology_book_id(cpu)
#define topology_thread_cpumask(cpu)
#define topology_core_cpumask(cpu)
+#define topology_book_cpumask(cpu)
The type of **_id is int.
The type of siblings is (const) struct cpumask *.
@@ -45,6 +59,9 @@ not defined by include/asm-XXX/topology.
3) thread_siblings: just the given CPU
4) core_siblings: just the given CPU
+For architectures that don't support books (CONFIG_SCHED_BOOK) there are no
+default definitions for topology_book_id() and topology_book_cpumask().
+
Additionally, CPU topology information is provided under
/sys/devices/system/cpu and includes these files. The internal
source for the output is in brackets ("[]").
diff -urpN linux-2.6/drivers/base/topology.c linux-2.6-patched/drivers/base/topology.c
--- linux-2.6/drivers/base/topology.c 2010-08-02 00:11:14.000000000 +0200
+++ linux-2.6-patched/drivers/base/topology.c 2010-08-11 13:47:23.000000000 +0200
@@ -45,7 +45,8 @@ static ssize_t show_##name(struct sys_de
return sprintf(buf, "%d\n", topology_##name(cpu)); \
}
-#if defined(topology_thread_cpumask) || defined(topology_core_cpumask)
+#if defined(topology_thread_cpumask) || defined(topology_core_cpumask) || \
+ defined(topology_book_cpumask)
static ssize_t show_cpumap(int type, const struct cpumask *mask, char *buf)
{
ptrdiff_t len = PTR_ALIGN(buf + PAGE_SIZE - 1, PAGE_SIZE) - buf;
@@ -114,6 +115,14 @@ define_siblings_show_func(core_cpumask);
define_one_ro_named(core_siblings, show_core_cpumask);
define_one_ro_named(core_siblings_list, show_core_cpumask_list);
+#ifdef CONFIG_SCHED_BOOK
+define_id_show_func(book_id);
+define_one_ro(book_id);
+define_siblings_show_func(book_cpumask);
+define_one_ro_named(book_siblings, show_book_cpumask);
+define_one_ro_named(book_siblings_list, show_book_cpumask_list);
+#endif
+
static struct attribute *default_attrs[] = {
&attr_physical_package_id.attr,
&attr_core_id.attr,
@@ -121,6 +130,11 @@ static struct attribute *default_attrs[]
&attr_thread_siblings_list.attr,
&attr_core_siblings.attr,
&attr_core_siblings_list.attr,
+#ifdef CONFIG_SCHED_BOOK
+ &attr_book_id.attr,
+ &attr_book_siblings.attr,
+ &attr_book_siblings_list.attr,
+#endif
NULL
};
* [PATCH/RFC 5/5] [PATCH] topology: add z196 cpu topology support
2010-08-12 17:25 [PATCH/RFC 0/5] sched: add new 'book' scheduling domain Heiko Carstens
` (3 preceding siblings ...)
2010-08-12 17:25 ` [PATCH/RFC 4/5] [PATCH] topology/sysfs: provide book id and siblings attributes Heiko Carstens
@ 2010-08-12 17:25 ` Heiko Carstens
2010-08-19 12:22 ` [PATCH/RFC 0/5] sched: add new 'book' scheduling domain Andreas Herrmann
5 siblings, 0 replies; 17+ messages in thread
From: Heiko Carstens @ 2010-08-12 17:25 UTC (permalink / raw)
To: Peter Zijlstra, Mike Galbraith, Ingo Molnar, Suresh Siddha,
Andreas Herrmann
Cc: linux-kernel, Martin Schwidefsky, Heiko Carstens
[-- Attachment #1: 05-topology-z196.diff --]
[-- Type: text/plain, Size: 9114 bytes --]
From: Heiko Carstens <heiko.carstens@de.ibm.com>
Use the extended cpu topology information that z196 machines provide
in order to make use of the new book scheduling domain.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
---
arch/s390/Kconfig | 7 +
arch/s390/include/asm/topology.h | 28 ++++++-
arch/s390/kernel/topology.c | 150 ++++++++++++++++++++++++---------------
3 files changed, 124 insertions(+), 61 deletions(-)
diff -urpN linux-2.6/arch/s390/include/asm/topology.h linux-2.6-patched/arch/s390/include/asm/topology.h
--- linux-2.6/arch/s390/include/asm/topology.h 2010-08-11 13:47:13.000000000 +0200
+++ linux-2.6-patched/arch/s390/include/asm/topology.h 2010-08-11 13:47:24.000000000 +0200
@@ -3,15 +3,33 @@
#include <linux/cpumask.h>
-#define mc_capable() (1)
-
-const struct cpumask *cpu_coregroup_mask(unsigned int cpu);
-
extern unsigned char cpu_core_id[NR_CPUS];
extern cpumask_t cpu_core_map[NR_CPUS];
+static inline const struct cpumask *cpu_coregroup_mask(unsigned int cpu)
+{
+ return &cpu_core_map[cpu];
+}
+
#define topology_core_id(cpu) (cpu_core_id[cpu])
#define topology_core_cpumask(cpu) (&cpu_core_map[cpu])
+#define mc_capable() (1)
+
+#ifdef CONFIG_SCHED_BOOK
+
+extern unsigned char cpu_book_id[NR_CPUS];
+extern cpumask_t cpu_book_map[NR_CPUS];
+
+static inline const struct cpumask *cpu_book_mask(unsigned int cpu)
+{
+ return &cpu_book_map[cpu];
+}
+
+#define topology_book_id(cpu) (cpu_book_id[cpu])
+#define topology_book_cpumask(cpu) (&cpu_book_map[cpu])
+#define book_capable() (1)
+
+#endif /* CONFIG_SCHED_BOOK */
int topology_set_cpu_management(int fc);
void topology_schedule_update(void);
@@ -30,6 +48,8 @@ static inline void s390_init_cpu_topolog
};
#endif
+#define SD_BOOK_INIT SD_CPU_INIT
+
#include <asm-generic/topology.h>
#endif /* _ASM_S390_TOPOLOGY_H */
diff -urpN linux-2.6/arch/s390/Kconfig linux-2.6-patched/arch/s390/Kconfig
--- linux-2.6/arch/s390/Kconfig 2010-08-11 13:47:13.000000000 +0200
+++ linux-2.6-patched/arch/s390/Kconfig 2010-08-11 13:47:24.000000000 +0200
@@ -198,6 +198,13 @@ config HOTPLUG_CPU
can be controlled through /sys/devices/system/cpu/cpu#.
Say N if you want to disable CPU hotplug.
+config SCHED_BOOK
+ bool "Book scheduler support"
+ depends on SMP
+ help
+ Book scheduler support improves the CPU scheduler's decision making
+ when dealing with machines that have several books.
+
config MATHEMU
bool "IEEE FPU emulation"
depends on MARCH_G5
diff -urpN linux-2.6/arch/s390/kernel/topology.c linux-2.6-patched/arch/s390/kernel/topology.c
--- linux-2.6/arch/s390/kernel/topology.c 2010-08-02 00:11:14.000000000 +0200
+++ linux-2.6-patched/arch/s390/kernel/topology.c 2010-08-11 13:47:24.000000000 +0200
@@ -57,8 +57,8 @@ struct tl_info {
union tl_entry tle[0];
};
-struct core_info {
- struct core_info *next;
+struct mask_info {
+ struct mask_info *next;
unsigned char id;
cpumask_t mask;
};
@@ -66,7 +66,6 @@ struct core_info {
static int topology_enabled;
static void topology_work_fn(struct work_struct *work);
static struct tl_info *tl_info;
-static struct core_info core_info;
static int machine_has_topology;
static struct timer_list topology_timer;
static void set_topology_timer(void);
@@ -74,38 +73,37 @@ static DECLARE_WORK(topology_work, topol
/* topology_lock protects the core linked list */
static DEFINE_SPINLOCK(topology_lock);
+static struct mask_info core_info;
cpumask_t cpu_core_map[NR_CPUS];
unsigned char cpu_core_id[NR_CPUS];
-static cpumask_t cpu_coregroup_map(unsigned int cpu)
+#ifdef CONFIG_SCHED_BOOK
+static struct mask_info book_info;
+cpumask_t cpu_book_map[NR_CPUS];
+unsigned char cpu_book_id[NR_CPUS];
+#endif
+
+static cpumask_t cpu_group_map(struct mask_info *info, unsigned int cpu)
{
- struct core_info *core = &core_info;
- unsigned long flags;
cpumask_t mask;
cpus_clear(mask);
if (!topology_enabled || !machine_has_topology)
return cpu_possible_map;
- spin_lock_irqsave(&topology_lock, flags);
- while (core) {
- if (cpu_isset(cpu, core->mask)) {
- mask = core->mask;
+ while (info) {
+ if (cpu_isset(cpu, info->mask)) {
+ mask = info->mask;
break;
}
- core = core->next;
+ info = info->next;
}
- spin_unlock_irqrestore(&topology_lock, flags);
if (cpus_empty(mask))
mask = cpumask_of_cpu(cpu);
return mask;
}
-const struct cpumask *cpu_coregroup_mask(unsigned int cpu)
-{
- return &cpu_core_map[cpu];
-}
-
-static void add_cpus_to_core(struct tl_cpu *tl_cpu, struct core_info *core)
+static void add_cpus_to_mask(struct tl_cpu *tl_cpu, struct mask_info *book,
+ struct mask_info *core)
{
unsigned int cpu;
@@ -117,23 +115,35 @@ static void add_cpus_to_core(struct tl_c
rcpu = CPU_BITS - 1 - cpu + tl_cpu->origin;
for_each_present_cpu(lcpu) {
- if (cpu_logical_map(lcpu) == rcpu) {
- cpu_set(lcpu, core->mask);
- cpu_core_id[lcpu] = core->id;
- smp_cpu_polarization[lcpu] = tl_cpu->pp;
- }
+ if (cpu_logical_map(lcpu) != rcpu)
+ continue;
+#ifdef CONFIG_SCHED_BOOK
+ cpu_set(lcpu, book->mask);
+ cpu_book_id[lcpu] = book->id;
+#endif
+ cpu_set(lcpu, core->mask);
+ cpu_core_id[lcpu] = core->id;
+ smp_cpu_polarization[lcpu] = tl_cpu->pp;
}
}
}
-static void clear_cores(void)
+static void clear_masks(void)
{
- struct core_info *core = &core_info;
+ struct mask_info *info;
- while (core) {
- cpus_clear(core->mask);
- core = core->next;
+ info = &core_info;
+ while (info) {
+ cpus_clear(info->mask);
+ info = info->next;
+ }
+#ifdef CONFIG_SCHED_BOOK
+ info = &book_info;
+ while (info) {
+ cpus_clear(info->mask);
+ info = info->next;
}
+#endif
}
static union tl_entry *next_tle(union tl_entry *tle)
@@ -146,29 +156,36 @@ static union tl_entry *next_tle(union tl
static void tl_to_cores(struct tl_info *info)
{
+#ifdef CONFIG_SCHED_BOOK
+ struct mask_info *book = &book_info;
+#else
+ struct mask_info *book = NULL;
+#endif
+ struct mask_info *core = &core_info;
union tl_entry *tle, *end;
- struct core_info *core = &core_info;
+
spin_lock_irq(&topology_lock);
- clear_cores();
+ clear_masks();
tle = info->tle;
end = (union tl_entry *)((unsigned long)info + info->length);
while (tle < end) {
switch (tle->nl) {
- case 5:
- case 4:
- case 3:
+#ifdef CONFIG_SCHED_BOOK
case 2:
+ book = book->next;
+ book->id = tle->container.id;
break;
+#endif
case 1:
core = core->next;
core->id = tle->container.id;
break;
case 0:
- add_cpus_to_core(&tle->cpu, core);
+ add_cpus_to_mask(&tle->cpu, book, core);
break;
default:
- clear_cores();
+ clear_masks();
machine_has_topology = 0;
goto out;
}
@@ -221,10 +238,29 @@ int topology_set_cpu_management(int fc)
static void update_cpu_core_map(void)
{
+ unsigned long flags;
int cpu;
- for_each_possible_cpu(cpu)
- cpu_core_map[cpu] = cpu_coregroup_map(cpu);
+ spin_lock_irqsave(&topology_lock, flags);
+ for_each_possible_cpu(cpu) {
+ cpu_core_map[cpu] = cpu_group_map(&core_info, cpu);
+#ifdef CONFIG_SCHED_BOOK
+ cpu_book_map[cpu] = cpu_group_map(&book_info, cpu);
+#endif
+ }
+ spin_unlock_irqrestore(&topology_lock, flags);
+}
+
+static void store_topology(struct tl_info *info)
+{
+#ifdef CONFIG_SCHED_BOOK
+ int rc;
+
+ rc = stsi(info, 15, 1, 3);
+ if (rc != -ENOSYS)
+ return;
+#endif
+ stsi(info, 15, 1, 2);
}
int arch_update_cpu_topology(void)
@@ -238,7 +274,7 @@ int arch_update_cpu_topology(void)
topology_update_polarization_simple();
return 0;
}
- stsi(info, 15, 1, 2);
+ store_topology(info);
tl_to_cores(info);
update_cpu_core_map();
for_each_online_cpu(cpu) {
@@ -299,12 +335,24 @@ out:
}
__initcall(init_topology_update);
+static void alloc_masks(struct tl_info *info, struct mask_info *mask, int offset)
+{
+ int i, nr_masks;
+
+ nr_masks = info->mag[NR_MAG - offset];
+ for (i = 0; i < info->mnest - offset; i++)
+ nr_masks *= info->mag[NR_MAG - offset - 1 - i];
+ nr_masks = max(nr_masks, 1);
+ for (i = 0; i < nr_masks; i++) {
+ mask->next = alloc_bootmem(sizeof(struct mask_info));
+ mask = mask->next;
+ }
+}
+
void __init s390_init_cpu_topology(void)
{
unsigned long long facility_bits;
struct tl_info *info;
- struct core_info *core;
- int nr_cores;
int i;
if (stfle(&facility_bits, 1) <= 0)
@@ -315,25 +363,13 @@ void __init s390_init_cpu_topology(void)
tl_info = alloc_bootmem_pages(PAGE_SIZE);
info = tl_info;
- stsi(info, 15, 1, 2);
-
- nr_cores = info->mag[NR_MAG - 2];
- for (i = 0; i < info->mnest - 2; i++)
- nr_cores *= info->mag[NR_MAG - 3 - i];
-
+ store_topology(info);
pr_info("The CPU configuration topology of the machine is:");
for (i = 0; i < NR_MAG; i++)
printk(" %d", info->mag[i]);
printk(" / %d\n", info->mnest);
-
- core = &core_info;
- for (i = 0; i < nr_cores; i++) {
- core->next = alloc_bootmem(sizeof(struct core_info));
- core = core->next;
- if (!core)
- goto error;
- }
- return;
-error:
- machine_has_topology = 0;
+ alloc_masks(info, &core_info, 2);
+#ifdef CONFIG_SCHED_BOOK
+ alloc_masks(info, &book_info, 3);
+#endif
}
* Re: [PATCH/RFC 0/5] sched: add new 'book' scheduling domain
2010-08-12 17:25 [PATCH/RFC 0/5] sched: add new 'book' scheduling domain Heiko Carstens
` (4 preceding siblings ...)
2010-08-12 17:25 ` [PATCH/RFC 5/5] [PATCH] topology: add z196 cpu topology support Heiko Carstens
@ 2010-08-19 12:22 ` Andreas Herrmann
5 siblings, 0 replies; 17+ messages in thread
From: Andreas Herrmann @ 2010-08-19 12:22 UTC (permalink / raw)
To: Heiko Carstens
Cc: Peter Zijlstra, Mike Galbraith, Ingo Molnar, Suresh Siddha,
linux-kernel@vger.kernel.org, Martin Schwidefsky
On Thu, Aug 12, 2010 at 01:25:44PM -0400, Heiko Carstens wrote:
> This patch set adds (yet) another scheduling domain to the scheduler.
All that stuff reminds me of quite similar patches to introduce a
multi-node scheduling domain for Magny-Cours CPUs.
I am afraid that this stuff won't make it upstream and that we both will
have to review Peter's suggestions from last year to come up with a more
generalized/flexible way to handle different scheduling domains.
> The reason for this is that the recent (s390) z196 architecture has
> four cache levels and uniform memory access (sort of -- see below).
> The cpu/cache/memory hierarchy is as follows:
> Each cpu has its private L1 (64KB I-cache + 128KB D-cache) and L2 (1.5MB)
> cache.
> A core consists of four cpus with a 24MB shared L3 cache.
> A book consists of six cores with a 192MB shared L4 cache.
> The z196 architecture has no SMT.
[...]
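For illustration, the quoted hierarchy can be modeled with a couple of
helpers. The linear cpu numbering is an assumption made for the sketch;
the real enumeration depends on the machine and partition:

```python
# Model of the described z196 geometry: 4 cpus per core share the 24MB L3,
# 6 cores per book share the 192MB L4. Cpus are numbered linearly here
# purely for illustration.

CPUS_PER_CORE = 4   # share the L3
CORES_PER_BOOK = 6  # share the L4

def core_of(cpu):
    return cpu // CPUS_PER_CORE

def book_of(cpu):
    return cpu // (CPUS_PER_CORE * CORES_PER_BOOK)

def core_siblings(cpu):
    """Cpus sharing an L3 with 'cpu' (cf. cpu_core_map)."""
    first = core_of(cpu) * CPUS_PER_CORE
    return list(range(first, first + CPUS_PER_CORE))

def book_siblings(cpu):
    """Cpus sharing an L4 with 'cpu' (cf. cpu_book_map)."""
    per_book = CPUS_PER_CORE * CORES_PER_BOOK
    first = book_of(cpu) * per_book
    return list(range(first, first + per_book))

print(core_siblings(5))          # [4, 5, 6, 7]
print(book_of(23), book_of(24))  # 0 1
```

So each book spans 24 cpus, which is the kind of grouping the new BOOK
domain level is meant to expose between MC and CPU.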
> A boot of a logical partition with 20 cpus, spread over two books, gives this
> initialization output on the console:
The output below shows that there is some odd distribution of your CPUs in
the different domain levels. Is this caused by the fact that not all
CPUs of a core and book were assigned to your logical partition?
For better understanding: is the following CPU-to-core/book mapping correct
for your example?
Book | Core | CPU
------+--------+---------
0 | 0 | 0,1,2,3
0 | 1 | 4,5
1 | 0 | 6,9
1 | 1 | 10,11
1 | 2 | 12,13
1 | 3 | 14,15,16
1 | 4 | 17,18,19
> Brought up 20 CPUs
> CPU0 attaching sched-domain:
> domain 0: span 0-5 level BOOK
> groups: 0 1-3 (cpu_power = 3072) 4-5 (cpu_power = 2048)
Why isn't there a range 0-3 instead of "0 1-3"?
And why isn't cpu_power=4096?
Ah, I think that for CPU 0 just the power information is
missing, so we have 3 groups:
0 (cpu_power=1024) 1-3 (cpu_power=3072) 4-5 (cpu_power=2048)
And the MC level is folded because it doesn't add anything in this
case.
So the mapping is in fact
Book | Core | CPU
------+--------+---------
0 | 0 | 0
0 | 1 | 1,2,3
0 | 2 | 4,5
1 | 0 | 6,9
1 | 1 | 10,11
1 | 2 | 12,13
1 | 3 | 14,15,16
1 | 4 | 17,18,19
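A quick sketch to double-check the arithmetic, assuming the scheduler's
default of 1024 units of cpu_power per cpu (SCHED_LOAD_SCALE), so a
group's power is simply 1024 times its cpu count:

```python
# Group cpu_power as 1024 per cpu; book-0 groups follow the corrected
# table above, the CPU-level spans (0-5 and 6-19) follow the console output.

SCHED_LOAD_SCALE = 1024  # default per-cpu power

def group_power(cpus):
    return SCHED_LOAD_SCALE * len(cpus)

# BOOK-level groups on book 0:
print(group_power([0]))          # 1024 (the value missing from the output)
print(group_power([1, 2, 3]))    # 3072
print(group_power([4, 5]))       # 2048

# CPU-level groups, one per book:
print(group_power(range(0, 6)))   # 6144
print(group_power(range(6, 20)))  # 14336
```

Note that the 6-19 span covers 14 cpus, which is consistent with the
14336 in the console output.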
> domain 1: span 0-19 level CPU
> groups: 0-5 (cpu_power = 6144) 6-19 (cpu_power = 14336)
> CPU1 attaching sched-domain:
> domain 0: span 1-3 level MC
> groups: 1 2 3
> domain 1: span 0-5 level BOOK
> groups: 1-3 (cpu_power = 3072) 4-5 (cpu_power = 2048) 0
> domain 2: span 0-19 level CPU
> groups: 0-5 (cpu_power = 6144) 6-19 (cpu_power = 14336)
It's odd that for CPU 1 the BOOK domain groups differ from those shown
for CPU0.
> CPU2 attaching sched-domain:
> domain 0: span 1-3 level MC
> groups: 2 3 1
> domain 1: span 0-5 level BOOK
> groups: 1-3 (cpu_power = 3072) 4-5 (cpu_power = 2048) 0
Again for CPU 0 the cpu_power is missing. I think that is confusing.
For better readability that should also be displayed (if a group
consists of only 1 CPU).
> domain 2: span 0-19 level CPU
> groups: 0-5 (cpu_power = 6144) 6-19 (cpu_power = 14336)
[snip the rest]
Andreas
--
Operating | Advanced Micro Devices GmbH
System | Einsteinring 24, 85609 Dornach b. München, Germany
Research | Geschäftsführer: Alberto Bozzo, Andrew Bowd
Center | Sitz: Dornach, Gemeinde Aschheim, Landkreis München
(OSRC) | Registergericht München, HRB Nr. 43632