* Re: Re: [patch] ARM: smpboot: Enable interrupts after marking CPU online/active [not found] ` <4EC2DF93.2050904@codeaurora.org> @ 2011-11-15 22:00 ` Thomas Gleixner 2011-12-14 0:13 ` Dima Zavin 0 siblings, 1 reply; 5+ messages in thread From: Thomas Gleixner @ 2011-11-15 22:00 UTC (permalink / raw) To: Stepan Moskovchenko Cc: Dima Zavin, Kukjin Kim, Vincent Guittot, Frank Rowand, amit kachhap, Colin Cross, Russell King - ARM Linux, chaos.youn, LAK, Peter Zijlstra, LKML On Tue, 15 Nov 2011, Stepan Moskovchenko wrote: > I am seeing a deadlock when executing hotplug operations with this patch > applied. When the secondary CPU gets brought up in _cpu_up, the cpu is turned > on > and then the online notifier gets called, which is what marks the secondary > CPU > as active. If _cpu_up on the primary CPU is preempted before the secondary CPU > is marked active, it is possible that the primary CPU will want to call > smp_call_function (or send an IPI) to the secondary CPU because it is marked > online. However, with this patch, the secondary CPU is still spinning on > !cpu_active(cpu) > with interrupts disabled. So, the primary CPU is now stuck in csd_lock_wait(), > waiting for the secondary CPU to respond, while the secondary CPU spins with > interrupts disabled, waiting for the primary CPU to mark it as active. So, > while > your approach to not call smp_function_single may work for you in your > specific > case, I believe there is still a problem in the general case. > > One suggestion for resolving this might be making smp_call_function look at > the > active CPUs rather than online CPUs, or to just let the secondary CPU mark > itself as active rather than having the primary CPU do this, though this might > defeat the original intended purpose of the active mask. What a mess. I'll have a look tomorrow. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Re: [patch] ARM: smpboot: Enable interrupts after marking CPU online/active 2011-11-15 22:00 ` Re: [patch] ARM: smpboot: Enable interrupts after marking CPU online/active Thomas Gleixner @ 2011-12-14 0:13 ` Dima Zavin 2011-12-14 0:26 ` Thomas Gleixner 0 siblings, 1 reply; 5+ messages in thread From: Dima Zavin @ 2011-12-14 0:13 UTC (permalink / raw) To: Thomas Gleixner Cc: Stepan Moskovchenko, Kukjin Kim, Vincent Guittot, Frank Rowand, amit kachhap, Colin Cross, Russell King - ARM Linux, chaos.youn, LAK, Peter Zijlstra, LKML Thomas, Did you ever get a chance to look into this further? Thanks in advance. --Dima On Tue, Nov 15, 2011 at 2:00 PM, Thomas Gleixner <tglx@linutronix.de> wrote: > On Tue, 15 Nov 2011, Stepan Moskovchenko wrote: >> I am seeing a deadlock when executing hotplug operations with this patch >> applied. When the secondary CPU gets brought up in _cpu_up, the cpu is turned >> on >> and then the online notifier gets called, which is what marks the secondary >> CPU >> as active. If _cpu_up on the primary CPU is preempted before the secondary CPU >> is marked active, it is possible that the primary CPU will want to call >> smp_call_function (or send an IPI) to the secondary CPU because it is marked >> online. However, with this patch, the secondary CPU is still spinning on >> !cpu_active(cpu) >> with interrupts disabled. So, the primary CPU is now stuck in csd_lock_wait(), >> waiting for the secondary CPU to respond, while the secondary CPU spins with >> interrupts disabled, waiting for the primary CPU to mark it as active. So, >> while >> your approach to not call smp_function_single may work for you in your >> specific >> case, I believe there is still a problem in the general case. >> >> One suggestion for resolving this might be making smp_call_function look at >> the >> active CPUs rather than online CPUs, or to just let the secondary CPU mark >> itself as active rather than having the primary CPU do this, though this might >> defeat the original intended purpose of the active mask. > > What a mess. I'll have a look tomorrow. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Re: [patch] ARM: smpboot: Enable interrupts after marking CPU online/active 2011-12-14 0:13 ` Dima Zavin @ 2011-12-14 0:26 ` Thomas Gleixner 2011-12-15 16:09 ` Peter Zijlstra 0 siblings, 1 reply; 5+ messages in thread From: Thomas Gleixner @ 2011-12-14 0:26 UTC (permalink / raw) To: Dima Zavin Cc: Stepan Moskovchenko, Kukjin Kim, Vincent Guittot, Frank Rowand, amit kachhap, Colin Cross, Russell King - ARM Linux, chaos.youn, LAK, Peter Zijlstra, LKML On Tue, 13 Dec 2011, Dima Zavin wrote: > Thomas, > > Did you ever get a chance to look into this further? Yes, Peter and I are still debating the proper solution as this does not only affect ARM. It looks like most of arch/* implementations are hosed in one way or the other. Funny enough Peter and I put this on to the tomorrow todo list a few hours ago. Thanks, tglx ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Re: [patch] ARM: smpboot: Enable interrupts after marking CPU online/active 2011-12-14 0:26 ` Thomas Gleixner @ 2011-12-15 16:09 ` Peter Zijlstra 2012-03-13 4:45 ` [tip:sched/core] sched: Cleanup cpu_active madness tip-bot for Peter Zijlstra 0 siblings, 1 reply; 5+ messages in thread From: Peter Zijlstra @ 2011-12-15 16:09 UTC (permalink / raw) To: Thomas Gleixner Cc: Dima Zavin, Stepan Moskovchenko, Kukjin Kim, Vincent Guittot, Frank Rowand, amit kachhap, Colin Cross, Russell King - ARM Linux, chaos.youn, LAK, LKML With the note that we really should clean up the cpu hotplug code, there's far too much code duplication in the arch/ implementations. The below patch looks like it might actually work. Thomas, others, please have a very close look because I might have thought too hard and missed the obvious :-) --- Subject: hotplug, sched: cpu_active vs __cpu_up Stepan found: CPU0 CPUn _cpu_up() __cpu_up() boostrap() notify_cpu_starting() set_cpu_online() while (!cpu_active()) cpu_relax() <PREEMPT-out> smp_call_function(.wait=1) /* we find cpu_online() is true */ arch_send_call_function_ipi_mask() /* wait-forever-more */ <PREEMPT-in> local_irq_enable() cpu_notify(CPU_ONLINE) sched_cpu_active() set_cpu_active() Now the purpose of cpu_active is mostly with bringing down a cpu, where we mark it !active to avoid the load-balancer from moving tasks to it while we tear down the cpu. This is required because we only update the sched_domain tree after we brought the cpu-down. And this is needed so that some tasks can still run while we bring it down, we just don't want new tasks to appear. On cpu-up however the sched_domain tree doesn't yet include the new cpu, so its invisible to the load-balancer, regardless of the active state. So instead of setting the active state after we boot the new cpu (and consequently having to wait for it before enabling interrupts) set the cpu active before we set it online and avoid the whole mess. Reported-by: Stepan Moskovchenko <stepanm@codeaurora.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> --- arch/arm/kernel/smp.c | 7 ------- arch/hexagon/kernel/smp.c | 2 -- arch/s390/kernel/smp.c | 6 ------ arch/x86/kernel/smpboot.c | 13 ------------- kernel/sched/core.c | 2 +- 5 files changed, 1 insertion(+), 29 deletions(-) --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -337,13 +337,6 @@ asmlinkage void __cpuinit secondary_star */ percpu_timer_setup(); - while (!cpu_active(cpu)) - cpu_relax(); - - /* - * cpu_active bit is set, so it's safe to enalbe interrupts - * now. - */ local_irq_enable(); local_fiq_enable(); --- a/arch/hexagon/kernel/smp.c +++ b/arch/hexagon/kernel/smp.c @@ -179,8 +179,6 @@ void __cpuinit start_secondary(void) printk(KERN_INFO "%s cpu %d\n", __func__, current_thread_info()->cpu); set_cpu_online(cpu, true); - while (!cpumask_test_cpu(cpu, cpu_active_mask)) - cpu_relax(); local_irq_enable(); cpu_idle(); --- a/arch/s390/kernel/smp.c +++ b/arch/s390/kernel/smp.c @@ -518,12 +518,6 @@ int __cpuinit start_secondary(void *cpuv S390_lowcore.restart_psw.addr = PSW_ADDR_AMODE | (unsigned long) psw_restart_int_handler; __ctl_set_bit(0, 28); /* Enable lowcore protection */ - /* - * Wait until the cpu which brought this one up marked it - * active before enabling interrupts. - */ - while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask)) - cpu_relax(); local_irq_enable(); /* cpu_idle will call schedule for us */ cpu_idle(); --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -291,19 +291,6 @@ notrace static void __cpuinit start_seco per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE; x86_platform.nmi_init(); - /* - * Wait until the cpu which brought this one up marked it - * online before enabling interrupts. If we don't do that then - * we can end up waking up the softirq thread before this cpu - * reached the active state, which makes the scheduler unhappy - * and schedule the softirq thread on the wrong cpu. This is - * only observable with forced threaded interrupts, but in - * theory it could also happen w/o them. It's just way harder - * to achieve. - */ - while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask)) - cpu_relax(); - /* enable local interrupts */ local_irq_enable(); --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5390,7 +5390,7 @@ static int __cpuinit sched_cpu_active(st unsigned long action, void *hcpu) { switch (action & ~CPU_TASKS_FROZEN) { - case CPU_ONLINE: + case CPU_STARTING: case CPU_DOWN_FAILED: set_cpu_active((long)hcpu, true); return NOTIFY_OK; ^ permalink raw reply [flat|nested] 5+ messages in thread
* [tip:sched/core] sched: Cleanup cpu_active madness 2011-12-15 16:09 ` Peter Zijlstra @ 2012-03-13 4:45 ` tip-bot for Peter Zijlstra 0 siblings, 0 replies; 5+ messages in thread From: tip-bot for Peter Zijlstra @ 2012-03-13 4:45 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, hpa, mingo, a.p.zijlstra, stepanm, peterz, tglx, mingo Commit-ID: 5fbd036b552f633abb394a319f7c62a5c86a9cd7 Gitweb: http://git.kernel.org/tip/5fbd036b552f633abb394a319f7c62a5c86a9cd7 Author: Peter Zijlstra <peterz@infradead.org> AuthorDate: Thu, 15 Dec 2011 17:09:22 +0100 Committer: Ingo Molnar <mingo@elte.hu> CommitDate: Mon, 12 Mar 2012 20:43:15 +0100 sched: Cleanup cpu_active madness Stepan found: CPU0 CPUn _cpu_up() __cpu_up() boostrap() notify_cpu_starting() set_cpu_online() while (!cpu_active()) cpu_relax() <PREEMPT-out> smp_call_function(.wait=1) /* we find cpu_online() is true */ arch_send_call_function_ipi_mask() /* wait-forever-more */ <PREEMPT-in> local_irq_enable() cpu_notify(CPU_ONLINE) sched_cpu_active() set_cpu_active() Now the purpose of cpu_active is mostly with bringing down a cpu, where we mark it !active to avoid the load-balancer from moving tasks to it while we tear down the cpu. This is required because we only update the sched_domain tree after we brought the cpu-down. And this is needed so that some tasks can still run while we bring it down, we just don't want new tasks to appear. On cpu-up however the sched_domain tree doesn't yet include the new cpu, so its invisible to the load-balancer, regardless of the active state. So instead of setting the active state after we boot the new cpu (and consequently having to wait for it before enabling interrupts) set the cpu active before we set it online and avoid the whole mess. Reported-by: Stepan Moskovchenko <stepanm@codeaurora.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1323965362.18942.71.camel@twins Signed-off-by: Ingo Molnar <mingo@elte.hu> --- arch/arm/kernel/smp.c | 7 ------- arch/hexagon/kernel/smp.c | 2 -- arch/s390/kernel/smp.c | 6 ------ arch/x86/kernel/smpboot.c | 13 ------------- kernel/sched/core.c | 2 +- 5 files changed, 1 insertions(+), 29 deletions(-) diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c index cdeb727..d616ed5 100644 --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -295,13 +295,6 @@ asmlinkage void __cpuinit secondary_start_kernel(void) */ percpu_timer_setup(); - while (!cpu_active(cpu)) - cpu_relax(); - - /* - * cpu_active bit is set, so it's safe to enalbe interrupts - * now. - */ local_irq_enable(); local_fiq_enable(); diff --git a/arch/hexagon/kernel/smp.c b/arch/hexagon/kernel/smp.c index c871a2c..0123c63 100644 --- a/arch/hexagon/kernel/smp.c +++ b/arch/hexagon/kernel/smp.c @@ -179,8 +179,6 @@ void __cpuinit start_secondary(void) printk(KERN_INFO "%s cpu %d\n", __func__, current_thread_info()->cpu); set_cpu_online(cpu, true); - while (!cpumask_test_cpu(cpu, cpu_active_mask)) - cpu_relax(); local_irq_enable(); cpu_idle(); diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c index 2398ce6..b0e28c4 100644 --- a/arch/s390/kernel/smp.c +++ b/arch/s390/kernel/smp.c @@ -550,12 +550,6 @@ int __cpuinit start_secondary(void *cpuvoid) S390_lowcore.restart_psw.addr = PSW_ADDR_AMODE | (unsigned long) psw_restart_int_handler; __ctl_set_bit(0, 28); /* Enable lowcore protection */ - /* - * Wait until the cpu which brought this one up marked it - * active before enabling interrupts. - */ - while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask)) - cpu_relax(); local_irq_enable(); /* cpu_idle will call schedule for us */ cpu_idle(); diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 66d250c..58f7816 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -291,19 +291,6 @@ notrace static void __cpuinit start_secondary(void *unused) per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE; x86_platform.nmi_init(); - /* - * Wait until the cpu which brought this one up marked it - * online before enabling interrupts. If we don't do that then - * we can end up waking up the softirq thread before this cpu - * reached the active state, which makes the scheduler unhappy - * and schedule the softirq thread on the wrong cpu. This is - * only observable with forced threaded interrupts, but in - * theory it could also happen w/o them. It's just way harder - * to achieve. - */ - while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask)) - cpu_relax(); - /* enable local interrupts */ local_irq_enable(); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 9554512..b1ccce8 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5410,7 +5410,7 @@ static int __cpuinit sched_cpu_active(struct notifier_block *nfb, unsigned long action, void *hcpu) { switch (action & ~CPU_TASKS_FROZEN) { - case CPU_ONLINE: + case CPU_STARTING: case CPU_DOWN_FAILED: set_cpu_active((long)hcpu, true); return NOTIFY_OK; ^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2012-03-13 4:45 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20110908215314.829452535@linutronix.de>
[not found] ` <CADGdYn6V_eQJZ4=irMbSS7XJj4MJVeALxcrWdD7o2bGs1sEJZQ@mail.gmail.com>
[not found] ` <20110913133258.GA6267@n2100.arm.linux.org.uk>
[not found] ` <CAKfTPtCWuiOM5+sOwdh1FzXZb1EFT__tVEpRPY5suGOxpYTdQQ@mail.gmail.com>
[not found] ` <20110913175312.GB6267@n2100.arm.linux.org.uk>
[not found] ` <20110923084001.GP17169@n2100.arm.linux.org.uk>
[not found] ` <01d501cc84d6$62720890$275619b0$%kim@samsung.com>
[not found] ` <alpine.LFD.2.02.1110071354310.3240@ionos>
[not found] ` <CAPz4a6CkOXp6LHX2hFYVBhXWT4rhCUzLyYTLbO7g8u_jYcHt7Q@mail.gmail.com>
[not found] ` <CAPz4a6C7md+k8pCz6mJtB-v_1h-faGxQZkOqU9vHUtcG-jrYVQ@mail.gmail.com>
[not found] ` <4EC2DF93.2050904@codeaurora.org>
2011-11-15 22:00 ` Re: [patch] ARM: smpboot: Enable interrupts after marking CPU online/active Thomas Gleixner
2011-12-14 0:13 ` Dima Zavin
2011-12-14 0:26 ` Thomas Gleixner
2011-12-15 16:09 ` Peter Zijlstra
2012-03-13 4:45 ` [tip:sched/core] sched: Cleanup cpu_active madness tip-bot for Peter Zijlstra
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox