* Re: Re: [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
[not found] ` <4EC2DF93.2050904@codeaurora.org>
@ 2011-11-15 22:00 ` Thomas Gleixner
2011-12-14 0:13 ` Dima Zavin
0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2011-11-15 22:00 UTC (permalink / raw)
To: Stepan Moskovchenko
Cc: Dima Zavin, Kukjin Kim, Vincent Guittot, Frank Rowand,
amit kachhap, Colin Cross, Russell King - ARM Linux, chaos.youn,
LAK, Peter Zijlstra, LKML
On Tue, 15 Nov 2011, Stepan Moskovchenko wrote:
> I am seeing a deadlock when executing hotplug operations with this patch
> applied. When the secondary CPU gets brought up in _cpu_up, the cpu is turned
> on
> and then the online notifier gets called, which is what marks the secondary
> CPU
> as active. If _cpu_up on the primary CPU is preempted before the secondary CPU
> is marked active, it is possible that the primary CPU will want to call
> smp_call_function (or send an IPI) to the secondary CPU because it is marked
> online. However, with this patch, the secondary CPU is still spinning on
> !cpu_active(cpu)
> with interrupts disabled. So, the primary CPU is now stuck in csd_lock_wait(),
> waiting for the secondary CPU to respond, while the secondary CPU spins with
> interrupts disabled, waiting for the primary CPU to mark it as active. So,
> while
> your approach to not call smp_function_single may work for you in your
> specific
> case, I believe there is still a problem in the general case.
>
> One suggestion for resolving this might be making smp_call_function look at
> the
> active CPUs rather than online CPUs, or to just let the secondary CPU mark
> itself as active rather than having the primary CPU do this, though this might
> defeat the original intended purpose of the active mask.
What a mess. I'll have a look tomorrow.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Re: [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
2011-11-15 22:00 ` Re: [patch] ARM: smpboot: Enable interrupts after marking CPU online/active Thomas Gleixner
@ 2011-12-14 0:13 ` Dima Zavin
2011-12-14 0:26 ` Thomas Gleixner
0 siblings, 1 reply; 5+ messages in thread
From: Dima Zavin @ 2011-12-14 0:13 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Stepan Moskovchenko, Kukjin Kim, Vincent Guittot, Frank Rowand,
amit kachhap, Colin Cross, Russell King - ARM Linux, chaos.youn,
LAK, Peter Zijlstra, LKML
Thomas,
Did you ever get a chance to look into this further?
Thanks in advance.
--Dima
On Tue, Nov 15, 2011 at 2:00 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 15 Nov 2011, Stepan Moskovchenko wrote:
>> I am seeing a deadlock when executing hotplug operations with this patch
>> applied. When the secondary CPU gets brought up in _cpu_up, the cpu is turned
>> on
>> and then the online notifier gets called, which is what marks the secondary
>> CPU
>> as active. If _cpu_up on the primary CPU is preempted before the secondary CPU
>> is marked active, it is possible that the primary CPU will want to call
>> smp_call_function (or send an IPI) to the secondary CPU because it is marked
>> online. However, with this patch, the secondary CPU is still spinning on
>> !cpu_active(cpu)
>> with interrupts disabled. So, the primary CPU is now stuck in csd_lock_wait(),
>> waiting for the secondary CPU to respond, while the secondary CPU spins with
>> interrupts disabled, waiting for the primary CPU to mark it as active. So,
>> while
>> your approach to not call smp_function_single may work for you in your
>> specific
>> case, I believe there is still a problem in the general case.
>>
>> One suggestion for resolving this might be making smp_call_function look at
>> the
>> active CPUs rather than online CPUs, or to just let the secondary CPU mark
>> itself as active rather than having the primary CPU do this, though this might
>> defeat the original intended purpose of the active mask.
>
> What a mess. I'll have a look tomorrow.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Re: [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
2011-12-14 0:13 ` Dima Zavin
@ 2011-12-14 0:26 ` Thomas Gleixner
2011-12-15 16:09 ` Peter Zijlstra
0 siblings, 1 reply; 5+ messages in thread
From: Thomas Gleixner @ 2011-12-14 0:26 UTC (permalink / raw)
To: Dima Zavin
Cc: Stepan Moskovchenko, Kukjin Kim, Vincent Guittot, Frank Rowand,
amit kachhap, Colin Cross, Russell King - ARM Linux, chaos.youn,
LAK, Peter Zijlstra, LKML
On Tue, 13 Dec 2011, Dima Zavin wrote:
> Thomas,
>
> Did you ever get a chance to look into this further?
Yes, Peter and I are still debating the proper solution as this does
not only affect ARM. It looks like most of arch/* implementations are
hosed in one way or the other. Funny enough Peter and I put this on to
the tomorrow todo list a few hours ago.
Thanks,
tglx
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Re: [patch] ARM: smpboot: Enable interrupts after marking CPU online/active
2011-12-14 0:26 ` Thomas Gleixner
@ 2011-12-15 16:09 ` Peter Zijlstra
2012-03-13 4:45 ` [tip:sched/core] sched: Cleanup cpu_active madness tip-bot for Peter Zijlstra
0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2011-12-15 16:09 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Dima Zavin, Stepan Moskovchenko, Kukjin Kim, Vincent Guittot,
Frank Rowand, amit kachhap, Colin Cross, Russell King - ARM Linux,
chaos.youn, LAK, LKML
With the note that we really should clean up the cpu hotplug code,
there's far too much code duplication in the arch/ implementations. The
below patch looks like it might actually work.
Thomas, others, please have a very close look because I might have
thought too hard and missed the obvious :-)
---
Subject: hotplug, sched: cpu_active vs __cpu_up
Stepan found:
CPU0 CPUn
_cpu_up()
__cpu_up()
boostrap()
notify_cpu_starting()
set_cpu_online()
while (!cpu_active())
cpu_relax()
<PREEMPT-out>
smp_call_function(.wait=1)
/* we find cpu_online() is true */
arch_send_call_function_ipi_mask()
/* wait-forever-more */
<PREEMPT-in>
local_irq_enable()
cpu_notify(CPU_ONLINE)
sched_cpu_active()
set_cpu_active()
Now the purpose of cpu_active is mostly with bringing down a cpu, where
we mark it !active to avoid the load-balancer from moving tasks to it
while we tear down the cpu. This is required because we only update the
sched_domain tree after we brought the cpu-down. And this is needed so
that some tasks can still run while we bring it down, we just don't want
new tasks to appear.
On cpu-up however the sched_domain tree doesn't yet include the new cpu,
so its invisible to the load-balancer, regardless of the active state.
So instead of setting the active state after we boot the new cpu (and
consequently having to wait for it before enabling interrupts) set the
cpu active before we set it online and avoid the whole mess.
Reported-by: Stepan Moskovchenko <stepanm@codeaurora.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
arch/arm/kernel/smp.c | 7 -------
arch/hexagon/kernel/smp.c | 2 --
arch/s390/kernel/smp.c | 6 ------
arch/x86/kernel/smpboot.c | 13 -------------
kernel/sched/core.c | 2 +-
5 files changed, 1 insertion(+), 29 deletions(-)
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -337,13 +337,6 @@ asmlinkage void __cpuinit secondary_star
*/
percpu_timer_setup();
- while (!cpu_active(cpu))
- cpu_relax();
-
- /*
- * cpu_active bit is set, so it's safe to enalbe interrupts
- * now.
- */
local_irq_enable();
local_fiq_enable();
--- a/arch/hexagon/kernel/smp.c
+++ b/arch/hexagon/kernel/smp.c
@@ -179,8 +179,6 @@ void __cpuinit start_secondary(void)
printk(KERN_INFO "%s cpu %d\n", __func__, current_thread_info()->cpu);
set_cpu_online(cpu, true);
- while (!cpumask_test_cpu(cpu, cpu_active_mask))
- cpu_relax();
local_irq_enable();
cpu_idle();
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -518,12 +518,6 @@ int __cpuinit start_secondary(void *cpuv
S390_lowcore.restart_psw.addr =
PSW_ADDR_AMODE | (unsigned long) psw_restart_int_handler;
__ctl_set_bit(0, 28); /* Enable lowcore protection */
- /*
- * Wait until the cpu which brought this one up marked it
- * active before enabling interrupts.
- */
- while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
- cpu_relax();
local_irq_enable();
/* cpu_idle will call schedule for us */
cpu_idle();
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -291,19 +291,6 @@ notrace static void __cpuinit start_seco
per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
x86_platform.nmi_init();
- /*
- * Wait until the cpu which brought this one up marked it
- * online before enabling interrupts. If we don't do that then
- * we can end up waking up the softirq thread before this cpu
- * reached the active state, which makes the scheduler unhappy
- * and schedule the softirq thread on the wrong cpu. This is
- * only observable with forced threaded interrupts, but in
- * theory it could also happen w/o them. It's just way harder
- * to achieve.
- */
- while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
- cpu_relax();
-
/* enable local interrupts */
local_irq_enable();
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5390,7 +5390,7 @@ static int __cpuinit sched_cpu_active(st
unsigned long action, void *hcpu)
{
switch (action & ~CPU_TASKS_FROZEN) {
- case CPU_ONLINE:
+ case CPU_STARTING:
case CPU_DOWN_FAILED:
set_cpu_active((long)hcpu, true);
return NOTIFY_OK;
^ permalink raw reply [flat|nested] 5+ messages in thread
* [tip:sched/core] sched: Cleanup cpu_active madness
2011-12-15 16:09 ` Peter Zijlstra
@ 2012-03-13 4:45 ` tip-bot for Peter Zijlstra
0 siblings, 0 replies; 5+ messages in thread
From: tip-bot for Peter Zijlstra @ 2012-03-13 4:45 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, a.p.zijlstra, stepanm, peterz, tglx,
mingo
Commit-ID: 5fbd036b552f633abb394a319f7c62a5c86a9cd7
Gitweb: http://git.kernel.org/tip/5fbd036b552f633abb394a319f7c62a5c86a9cd7
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Thu, 15 Dec 2011 17:09:22 +0100
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 12 Mar 2012 20:43:15 +0100
sched: Cleanup cpu_active madness
Stepan found:
CPU0 CPUn
_cpu_up()
__cpu_up()
boostrap()
notify_cpu_starting()
set_cpu_online()
while (!cpu_active())
cpu_relax()
<PREEMPT-out>
smp_call_function(.wait=1)
/* we find cpu_online() is true */
arch_send_call_function_ipi_mask()
/* wait-forever-more */
<PREEMPT-in>
local_irq_enable()
cpu_notify(CPU_ONLINE)
sched_cpu_active()
set_cpu_active()
Now the purpose of cpu_active is mostly with bringing down a cpu, where
we mark it !active to avoid the load-balancer from moving tasks to it
while we tear down the cpu. This is required because we only update the
sched_domain tree after we brought the cpu-down. And this is needed so
that some tasks can still run while we bring it down, we just don't want
new tasks to appear.
On cpu-up however the sched_domain tree doesn't yet include the new cpu,
so its invisible to the load-balancer, regardless of the active state.
So instead of setting the active state after we boot the new cpu (and
consequently having to wait for it before enabling interrupts) set the
cpu active before we set it online and avoid the whole mess.
Reported-by: Stepan Moskovchenko <stepanm@codeaurora.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1323965362.18942.71.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/arm/kernel/smp.c | 7 -------
arch/hexagon/kernel/smp.c | 2 --
arch/s390/kernel/smp.c | 6 ------
arch/x86/kernel/smpboot.c | 13 -------------
kernel/sched/core.c | 2 +-
5 files changed, 1 insertions(+), 29 deletions(-)
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index cdeb727..d616ed5 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -295,13 +295,6 @@ asmlinkage void __cpuinit secondary_start_kernel(void)
*/
percpu_timer_setup();
- while (!cpu_active(cpu))
- cpu_relax();
-
- /*
- * cpu_active bit is set, so it's safe to enalbe interrupts
- * now.
- */
local_irq_enable();
local_fiq_enable();
diff --git a/arch/hexagon/kernel/smp.c b/arch/hexagon/kernel/smp.c
index c871a2c..0123c63 100644
--- a/arch/hexagon/kernel/smp.c
+++ b/arch/hexagon/kernel/smp.c
@@ -179,8 +179,6 @@ void __cpuinit start_secondary(void)
printk(KERN_INFO "%s cpu %d\n", __func__, current_thread_info()->cpu);
set_cpu_online(cpu, true);
- while (!cpumask_test_cpu(cpu, cpu_active_mask))
- cpu_relax();
local_irq_enable();
cpu_idle();
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 2398ce6..b0e28c4 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -550,12 +550,6 @@ int __cpuinit start_secondary(void *cpuvoid)
S390_lowcore.restart_psw.addr =
PSW_ADDR_AMODE | (unsigned long) psw_restart_int_handler;
__ctl_set_bit(0, 28); /* Enable lowcore protection */
- /*
- * Wait until the cpu which brought this one up marked it
- * active before enabling interrupts.
- */
- while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
- cpu_relax();
local_irq_enable();
/* cpu_idle will call schedule for us */
cpu_idle();
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 66d250c..58f7816 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -291,19 +291,6 @@ notrace static void __cpuinit start_secondary(void *unused)
per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
x86_platform.nmi_init();
- /*
- * Wait until the cpu which brought this one up marked it
- * online before enabling interrupts. If we don't do that then
- * we can end up waking up the softirq thread before this cpu
- * reached the active state, which makes the scheduler unhappy
- * and schedule the softirq thread on the wrong cpu. This is
- * only observable with forced threaded interrupts, but in
- * theory it could also happen w/o them. It's just way harder
- * to achieve.
- */
- while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
- cpu_relax();
-
/* enable local interrupts */
local_irq_enable();
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9554512..b1ccce8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5410,7 +5410,7 @@ static int __cpuinit sched_cpu_active(struct notifier_block *nfb,
unsigned long action, void *hcpu)
{
switch (action & ~CPU_TASKS_FROZEN) {
- case CPU_ONLINE:
+ case CPU_STARTING:
case CPU_DOWN_FAILED:
set_cpu_active((long)hcpu, true);
return NOTIFY_OK;
^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2012-03-13 4:45 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20110908215314.829452535@linutronix.de>
[not found] ` <CADGdYn6V_eQJZ4=irMbSS7XJj4MJVeALxcrWdD7o2bGs1sEJZQ@mail.gmail.com>
[not found] ` <20110913133258.GA6267@n2100.arm.linux.org.uk>
[not found] ` <CAKfTPtCWuiOM5+sOwdh1FzXZb1EFT__tVEpRPY5suGOxpYTdQQ@mail.gmail.com>
[not found] ` <20110913175312.GB6267@n2100.arm.linux.org.uk>
[not found] ` <20110923084001.GP17169@n2100.arm.linux.org.uk>
[not found] ` <01d501cc84d6$62720890$275619b0$%kim@samsung.com>
[not found] ` <alpine.LFD.2.02.1110071354310.3240@ionos>
[not found] ` <CAPz4a6CkOXp6LHX2hFYVBhXWT4rhCUzLyYTLbO7g8u_jYcHt7Q@mail.gmail.com>
[not found] ` <CAPz4a6C7md+k8pCz6mJtB-v_1h-faGxQZkOqU9vHUtcG-jrYVQ@mail.gmail.com>
[not found] ` <4EC2DF93.2050904@codeaurora.org>
2011-11-15 22:00 ` Re: [patch] ARM: smpboot: Enable interrupts after marking CPU online/active Thomas Gleixner
2011-12-14 0:13 ` Dima Zavin
2011-12-14 0:26 ` Thomas Gleixner
2011-12-15 16:09 ` Peter Zijlstra
2012-03-13 4:45 ` [tip:sched/core] sched: Cleanup cpu_active madness tip-bot for Peter Zijlstra
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox