* [PATCH 2.6.16-mm1 2/2] sched_domains: Allocate sched_groups dynamically
@ 2006-03-25 8:28 Srivatsa Vaddagiri
2006-03-27 21:18 ` Siddha, Suresh B
2006-07-07 0:01 ` Paul Jackson
0 siblings, 2 replies; 7+ messages in thread
From: Srivatsa Vaddagiri @ 2006-03-25 8:28 UTC (permalink / raw)
To: Nick Piggin, Ingo Molnar, pj
Cc: hawkes, Dinakar Guniguntala, Andrew Morton, linux-kernel,
suresh.b.siddha
I hit a kernel lockup problem while making a CPUset exclusive. This was
also reported before:
http://lkml.org/lkml/2006/3/16/128 (Ref 1)
Dinakar (CCed here) said that it may be similar to the problem reported
here:
http://lkml.org/lkml/2005/8/20/40 (Ref 2)
Upon code inspection, I found that it is indeed a similar problem for the
SCHED_SMT case. The problem is recreated if SCHED_SMT is enabled in the
kernel and one tries to make an exclusive CPUset with just one thread in
it. This causes the two threads of the same CPU to be in different
dynamic sched domains, although they share the same sched_group
structure at the 'phys_domains' level. This leads to the same kind of
corruption described in Ref 2 above.
Patch below, against 2.6.16-mm1, fixes the problem for both SCHED_SMT and
SCHED_MC cases.
I have tested the patch on an 8-way (with HT) Intel Xeon machine and
found that the lockups I was facing earlier went away with the patch.
I couldn't test this on a multi-core machine, since I don't think we
have one in our lab.
Suresh, would you mind testing the patch on a multi-core machine, in case you
have access to one?
Basically, you would need to create an exclusive CPUset with one CPU in it
(ensure that its sibling in the same core is not part of the same
CPUset). As soon as you make the CPUset exclusive, you would hit some
kind of hang. With this patch, the hang should go away.
Signed-off-by: Srivatsa Vaddagiri <vatsa@in.ibm.com>
kernel/sched.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++--------
1 files changed, 58 insertions(+), 9 deletions(-)
diff -puN kernel/sched.c~sd_dynschedgroups kernel/sched.c
--- linux-2.6.16-mm1/kernel/sched.c~sd_dynschedgroups 2006-03-25 11:45:05.000000000 +0530
+++ linux-2.6.16-mm1-root/kernel/sched.c 2006-03-25 13:30:01.000000000 +0530
@@ -5884,7 +5884,7 @@ static int cpu_to_cpu_group(int cpu)
#ifdef CONFIG_SCHED_MC
static DEFINE_PER_CPU(struct sched_domain, core_domains);
-static struct sched_group sched_group_core[NR_CPUS];
+static struct sched_group *sched_group_core_bycpu[NR_CPUS];
#endif
#if defined(CONFIG_SCHED_MC) && defined(CONFIG_SCHED_SMT)
@@ -5900,7 +5900,7 @@ static int cpu_to_core_group(int cpu)
#endif
static DEFINE_PER_CPU(struct sched_domain, phys_domains);
-static struct sched_group sched_group_phys[NR_CPUS];
+static struct sched_group *sched_group_phys_bycpu[NR_CPUS];
static int cpu_to_phys_group(int cpu)
{
#if defined(CONFIG_SCHED_MC)
@@ -5963,7 +5963,12 @@ next_sg:
*/
void build_sched_domains(const cpumask_t *cpu_map)
{
- int i;
+ int i, alloc_phys_failed = 0;
+ struct sched_group *sched_group_phys = NULL;
+#ifdef CONFIG_SCHED_MC
+ int alloc_core_failed = 0;
+ struct sched_group *sched_group_core = NULL;
+#endif
#ifdef CONFIG_NUMA
int alloc_failed = 0;
struct sched_group **sched_group_nodes = NULL;
@@ -6026,15 +6031,41 @@ void build_sched_domains(const cpumask_t
cpus_and(sd->span, sd->span, *cpu_map);
#endif
+ if (!sched_group_phys && !alloc_phys_failed) {
+ sched_group_phys
+ = kmalloc(sizeof(struct sched_group) * NR_CPUS,
+ GFP_KERNEL);
+ if (!sched_group_phys) {
+ printk (KERN_WARNING
+ "Can not alloc phys sched group\n");
+ alloc_phys_failed = 1;
+ }
+ sched_group_phys_bycpu[i] = sched_group_phys;
+ }
+
p = sd;
sd = &per_cpu(phys_domains, i);
group = cpu_to_phys_group(i);
*sd = SD_CPU_INIT;
sd->span = nodemask;
sd->parent = p;
- sd->groups = &sched_group_phys[group];
+ sd->groups = sched_group_phys ? &sched_group_phys[group] : NULL;
+ if (!sd->groups)
+ sd->flags = 0; /* No load balancing */
#ifdef CONFIG_SCHED_MC
+ if (!sched_group_core && !alloc_core_failed) {
+ sched_group_core
+ = kmalloc(sizeof(struct sched_group) * NR_CPUS,
+ GFP_KERNEL);
+ if (!sched_group_core) {
+ printk (KERN_WARNING
+ "Can not alloc core sched group\n");
+ alloc_core_failed = 1;
+ }
+ sched_group_core_bycpu[i] = sched_group_core;
+ }
+
p = sd;
sd = &per_cpu(core_domains, i);
group = cpu_to_core_group(i);
@@ -6042,7 +6073,9 @@ void build_sched_domains(const cpumask_t
sd->span = cpu_coregroup_map(i);
cpus_and(sd->span, sd->span, *cpu_map);
sd->parent = p;
- sd->groups = &sched_group_core[group];
+ sd->groups = sched_group_core ? &sched_group_core[group] : NULL;
+ if (!sd->groups)
+ sd->flags = 0; /* No load balancing */
#endif
#ifdef CONFIG_SCHED_SMT
@@ -6077,8 +6110,9 @@ void build_sched_domains(const cpumask_t
cpus_and(this_core_map, this_core_map, *cpu_map);
if (i != first_cpu(this_core_map))
continue;
- init_sched_build_groups(sched_group_core, this_core_map,
- &cpu_to_core_group);
+ if (sched_group_core)
+ init_sched_build_groups(sched_group_core, this_core_map,
+ &cpu_to_core_group);
}
#endif
@@ -6091,7 +6125,8 @@ void build_sched_domains(const cpumask_t
if (cpus_empty(nodemask))
continue;
- init_sched_build_groups(sched_group_phys, nodemask,
+ if (sched_group_phys)
+ init_sched_build_groups(sched_group_phys, nodemask,
&cpu_to_phys_group);
}
@@ -6249,9 +6284,9 @@ static void arch_init_sched_domains(cons
static void arch_destroy_sched_domains(const cpumask_t *cpu_map)
{
+ int cpu;
#ifdef CONFIG_NUMA
int i;
- int cpu;
for_each_cpu_mask(cpu, *cpu_map) {
struct sched_group *sched_group_allnodes
@@ -6289,6 +6324,20 @@ next_sg:
sched_group_nodes_bycpu[cpu] = NULL;
}
#endif
+ for_each_cpu_mask(cpu, *cpu_map) {
+ if (sched_group_phys_bycpu[cpu]) {
+ kfree(sched_group_phys_bycpu[cpu]);
+ sched_group_phys_bycpu[cpu] = NULL;
+ }
+ }
+#ifdef CONFIG_SCHED_MC
+ for_each_cpu_mask(cpu, *cpu_map) {
+ if (sched_group_core_bycpu[cpu]) {
+ kfree(sched_group_core_bycpu[cpu]);
+ sched_group_core_bycpu[cpu] = NULL;
+ }
+ }
+#endif
}
/*
_
--
Regards,
vatsa
* Re: [PATCH 2.6.16-mm1 2/2] sched_domains: Allocate sched_groups dynamically
2006-03-25 8:28 [PATCH 2.6.16-mm1 2/2] sched_domains: Allocate sched_groups dynamically Srivatsa Vaddagiri
@ 2006-03-27 21:18 ` Siddha, Suresh B
2006-07-07 0:01 ` Paul Jackson
1 sibling, 0 replies; 7+ messages in thread
From: Siddha, Suresh B @ 2006-03-27 21:18 UTC (permalink / raw)
To: Srivatsa Vaddagiri
Cc: Nick Piggin, Ingo Molnar, pj, hawkes, Dinakar Guniguntala,
Andrew Morton, linux-kernel, suresh.b.siddha
On Sat, Mar 25, 2006 at 01:58:04PM +0530, Srivatsa Vaddagiri wrote:
> + if (!sched_group_phys && !alloc_phys_failed) {
> + sched_group_phys
> + = kmalloc(sizeof(struct sched_group) * NR_CPUS,
> + GFP_KERNEL);
> + if (!sched_group_phys) {
> + printk (KERN_WARNING
> + "Can not alloc phys sched group\n");
> + alloc_phys_failed = 1;
> + }
> + sched_group_phys_bycpu[i] = sched_group_phys;
> + }
We can move this allocation outside the for loop and avoid the complexities
of alloc_phys_failed, alloc_core_failed, etc.
thanks,
suresh
* Re: [PATCH 2.6.16-mm1 2/2] sched_domains: Allocate sched_groups dynamically
2006-03-25 8:28 [PATCH 2.6.16-mm1 2/2] sched_domains: Allocate sched_groups dynamically Srivatsa Vaddagiri
2006-03-27 21:18 ` Siddha, Suresh B
@ 2006-07-07 0:01 ` Paul Jackson
2006-07-07 0:08 ` Siddha, Suresh B
1 sibling, 1 reply; 7+ messages in thread
From: Paul Jackson @ 2006-07-07 0:01 UTC (permalink / raw)
To: vatsa
Cc: nickpiggin, mingo, hawkes, dino, akpm, linux-kernel,
suresh.b.siddha, ak
Several months ago, Srivatsa wrote:
> I couldn't test this on a multi-core machine, since I don't think we
> have one in our lab.
>
> Suresh, would you mind testing the patch on a multi-core machine, in case you
> have access to one?
>
> Basically, you would need to create an exclusive CPUset with one CPU in it
> (ensure that its sibling in the same core is not part of the same
> CPUset). As soon as you make the CPUset exclusive, you would hit some
> kind of hang. With this patch, the hang should go away.
Summary: Where do we stand with multi-core and this bug?
I don't see a reply from Suresh on whether he could test on multi-core.
I finally happened to be running on a hyper-threaded box last week,
and stumbled over this bug that Srivatsa's patch fixes. Hawkes
remembered Srivatsa's patch, I tried it, and it worked. Thanks!
But now I'm quite confused as to the situation with multi-core.
Details of my confusions, for the bored:
From Srivatsa's remark, I would have guessed that multi-core was
at risk for this bug too, but Srivatsa was hopeful that his patch
would fix that too.
Early this week, a couple of people who shall remain anonymous here
raised the question of whether we had the same problem with multi-core.
One of them believed that multi-core did have the same problem.
I got a little time on a multi-core system this morning to test it,
and while running what I -thought- was a kernel -without- Srivatsa's
patch, I could not find any problem. I made a cpuset with just a
single logical cpu in it, and marked it cpu_exclusive, and the
system did not hang.
It will be another day before I can get on that multi-core system
again to verify my findings.
I was hoping that someone could actually -read- this code and state
with confidence that one of the following held:
* it was already working ok on multi-core (a one CPU cpu_exclusive cpuset),
* it was broken, but Srivatsa's patch fixes it, or
* it's still broken, even with Srivatsa's patch.
I tried a couple of times to read the code myself, but could not
make any headway there.
So ... what's up with multi-core and this bug?
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
* Re: [PATCH 2.6.16-mm1 2/2] sched_domains: Allocate sched_groups dynamically
2006-07-07 0:01 ` Paul Jackson
@ 2006-07-07 0:08 ` Siddha, Suresh B
2006-07-07 0:34 ` Paul Jackson
0 siblings, 1 reply; 7+ messages in thread
From: Siddha, Suresh B @ 2006-07-07 0:08 UTC (permalink / raw)
To: Paul Jackson
Cc: vatsa, nickpiggin, mingo, hawkes, dino, akpm, linux-kernel,
suresh.b.siddha, ak
I reviewed Srivatsa's code back then and made sure it fixes
the problem in the presence of multi-core too. Based on my review,
I provided feedback to Srivatsa then.
In short, multi-core was broken too and Srivatsa's patch fixed it.
thanks,
suresh
On Thu, Jul 06, 2006 at 05:01:51PM -0700, Paul Jackson wrote:
> Several months ago, Srivatsa wrote:
> > I couldn't test this on a multi-core machine, since I don't think we
> > have one in our lab.
> >
> > Suresh, would you mind testing the patch on a multi-core machine, in case you
> > have access to one?
> >
> > Basically, you would need to create an exclusive CPUset with one CPU in it
> > (ensure that its sibling in the same core is not part of the same
> > CPUset). As soon as you make the CPUset exclusive, you would hit some
> > kind of hang. With this patch, the hang should go away.
>
>
> Summary: Where do we stand with multi-core and this bug?
>
>
> I don't see a reply from Suresh on whether he could test on multi-core.
>
> I finally happened to be running on a hyper-threaded box last week,
> and stumbled over this bug that Srivatsa's patch fixes. Hawkes
> remembered Srivatsa's patch, I tried it, and it worked. Thanks!
>
> But now I'm quite confused as to the situation with multi-core.
>
> Details of my confusions, for the bored:
>
> From Srivatsa's remark, I would have guessed that multi-core was
> at risk for this bug too, but Srivatsa was hopeful that his patch
> would fix that too.
>
> Early this week, a couple of people who shall remain anonymous here
> raised the question of whether we had the same problem with multi-core.
> One of them believed that multi-core did have the same problem.
>
> I got a little time on a multi-core system this morning to test it,
> and while running what I -thought- was a kernel -without- Srivatsa's
> patch, I could not find any problem. I made a cpuset with just a
> single logical cpu in it, and marked it cpu_exclusive, and the
> system did not hang.
>
> It will be another day before I can get on that multi-core system
> again to verify my findings.
>
> I was hoping that someone could actually -read- this code and state
> with confidence that one of the following held:
> * it was already working ok on multi-core (a one CPU cpu_exclusive cpuset),
> * it was broken, but Srivatsa's patch fixes it, or
> * it's still broken, even with Srivatsa's patch.
>
> I tried a couple of times to read the code myself, but could not
> make any headway there.
>
> So ... what's up with multi-core and this bug?
>
> --
> I won't rest till it's the best ...
> Programmer, Linux Scalability
> Paul Jackson <pj@sgi.com> 1.925.600.0401
* Re: [PATCH 2.6.16-mm1 2/2] sched_domains: Allocate sched_groups dynamically
2006-07-07 0:08 ` Siddha, Suresh B
@ 2006-07-07 0:34 ` Paul Jackson
2006-07-07 0:36 ` Siddha, Suresh B
0 siblings, 1 reply; 7+ messages in thread
From: Paul Jackson @ 2006-07-07 0:34 UTC (permalink / raw)
To: Siddha, Suresh B
Cc: vatsa, nickpiggin, mingo, hawkes, dino, akpm, linux-kernel,
suresh.b.siddha, ak
> In short, multi-core was broken too and Srivatsa's patch fixed it.
Thanks for your quick response, Suresh.
My test earlier today that showed multi-core -not- broken must
have been flawed.
I will rerun it tomorrow, carefully.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401
* Re: [PATCH 2.6.16-mm1 2/2] sched_domains: Allocate sched_groups dynamically
2006-07-07 0:34 ` Paul Jackson
@ 2006-07-07 0:36 ` Siddha, Suresh B
2006-07-07 5:26 ` Paul Jackson
0 siblings, 1 reply; 7+ messages in thread
From: Siddha, Suresh B @ 2006-07-07 0:36 UTC (permalink / raw)
To: Paul Jackson
Cc: Siddha, Suresh B, vatsa, nickpiggin, mingo, hawkes, dino, akpm,
linux-kernel, ak
On Thu, Jul 06, 2006 at 05:34:17PM -0700, Paul Jackson wrote:
> > In short, multi-core was broken too and Srivatsa's patch fixed it.
>
> Thanks for your quick response, Suresh.
>
> My test earlier today that showed multi-core -not- broken must
> have been flawed.
>
> I will rerun them tomorrow, carefully.
It is quite possible that the kernel you are testing doesn't have the multi-core
scheduler domain. If so, then you may not run into this issue.
thanks,
suresh
* Re: [PATCH 2.6.16-mm1 2/2] sched_domains: Allocate sched_groups dynamically
2006-07-07 0:36 ` Siddha, Suresh B
@ 2006-07-07 5:26 ` Paul Jackson
0 siblings, 0 replies; 7+ messages in thread
From: Paul Jackson @ 2006-07-07 5:26 UTC (permalink / raw)
To: Siddha, Suresh B
Cc: suresh.b.siddha, vatsa, nickpiggin, mingo, hawkes, dino, akpm,
linux-kernel, ak
Suresh wrote:
> It is quite possible that the kernel you are testing doesn't have multi-core
> scheduler domain. If so, then you may not run into this issue.
Aha - we have a winner.
CONFIG_SCHED_MC was not enabled in this kernel.
Now what I see matches what it should be.
On a Hyper-Thread (but not Multi-Core) x86_64 system that I
tested with CONFIG_SCHED_MC enabled, your patch was required to
keep single-cpu cpu_exclusive cpusets from instantly locking
up the system.
On a Multi-Core (but not Hyper-Thread) IA64 Montecito system
that did -not- have CONFIG_SCHED_MC enabled, there was no
such problem with single-cpu cpu_exclusive cpusets in the
first place. It worked ok, even without the patch.
Thank-you.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@sgi.com> 1.925.600.0401