* [PATCH] x86/smpboot: Check for cpu_active on cpu initialization
  From: Joerg Roedel @ 2015-07-16  9:17 UTC
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: x86, Borislav Petkov, linux-kernel, Joerg Roedel

From: Joerg Roedel <jroedel@suse.de>

Currently the code to bring up secondary CPUs only checks for
cpu_online before it proceeds with launching the per-cpu threads for
the freshly booted remote CPU.

But the code that moves these threads to the new CPU checks for
cpu_active to do so. If this check fails, the threads end up on the
wrong CPU, causing warnings and bugs like:

	WARNING: CPU: 0 PID: 1 at ../kernel/workqueue.c:4417 workqueue_cpu_up_callback

and/or:

	kernel BUG at ../kernel/smpboot.c:135!

The reason is that the cpu_active bit for the new CPU becomes visible
significantly later than the cpu_online bit. One reason can be that the
kernel runs in a KVM guest, where the vCPU thread gets preempted after
the cpu_online bit is set but while cpu_active is still clear.

But this can also happen on bare-metal systems with lots of CPUs; we
have observed the issue on a bare-metal 88-core x86 system.

To fix this issue, wait until the remote CPU is online *and* active
before launching the per-cpu threads.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/smpboot.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index d3010aa..30b7b8b 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1006,7 +1006,7 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 	check_tsc_sync_source(cpu);
 	local_irq_restore(flags);

-	while (!cpu_online(cpu)) {
+	while (!cpu_online(cpu) || !cpu_active(cpu)) {
 		cpu_relax();
 		touch_nmi_watchdog();
 	}
--
1.9.1
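For context, a sketch (not from the thread) of the two checks behind the
splats quoted above, reconstructed roughly from kernels of that era; the
line numbers and exact wording are approximate, not verbatim source:

	/* kernel/smpboot.c, smpboot_thread_fn(): every smpboot per-cpu
	 * thread must run on the CPU it was created for; if the unparked
	 * thread was placed on the wrong CPU, this is the BUG that fires. */
	BUG_ON(td->cpu != smp_processor_id());

	/* kernel/workqueue.c, rebind_workers(): re-affine the pool's
	 * workers to the newly onlined CPU; set_cpus_allowed_ptr()
	 * rejects a CPU that is not yet active, which is the WARNING. */
	for_each_pool_worker(worker, pool)
		WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task,
						  pool->attrs->cpumask) < 0);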
* Re: [PATCH] x86/smpboot: Check for cpu_active on cpu initialization
  From: Borislav Petkov @ 2015-07-20 14:46 UTC
  To: Joerg Roedel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-kernel,
      Joerg Roedel

On Thu, Jul 16, 2015 at 11:17:17AM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
>
> Currently the code to bring up secondary CPUs only checks for
> cpu_online before it proceeds with launching the per-cpu threads for
> the freshly booted remote CPU.
>
> But the code that moves these threads to the new CPU checks for
> cpu_active to do so. If this check fails, the threads end up on the
> wrong CPU, causing warnings and bugs like:
>
> 	WARNING: CPU: 0 PID: 1 at ../kernel/workqueue.c:4417 workqueue_cpu_up_callback
>
> and/or:
>
> 	kernel BUG at ../kernel/smpboot.c:135!
>
> The reason is that the cpu_active bit for the new CPU becomes visible
> significantly later than the cpu_online bit.

I see

	void set_cpu_online(unsigned int cpu, bool online)
	{
		if (online) {
			cpumask_set_cpu(cpu, to_cpumask(cpu_online_bits));
			cpumask_set_cpu(cpu, to_cpumask(cpu_active_bits));
		} else {

which is called in start_secondary().

Do you mean that setting the bit in cpu_active_mask gets delayed soo
much? Because it comes right after setting the bit in cpu_online_mask.

> One reason can be that the kernel runs in a KVM guest, where the vCPU
> thread gets preempted after the cpu_online bit is set but while
> cpu_active is still clear.
>
> But this can also happen on bare-metal systems with lots of CPUs; we
> have observed the issue on a bare-metal 88-core x86 system.
>
> To fix this issue, wait until the remote CPU is online *and* active
> before launching the per-cpu threads.
>
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  arch/x86/kernel/smpboot.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index d3010aa..30b7b8b 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1006,7 +1006,7 @@ int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
>  	check_tsc_sync_source(cpu);
>  	local_irq_restore(flags);
>
> -	while (!cpu_online(cpu)) {
> +	while (!cpu_online(cpu) || !cpu_active(cpu)) {
>  		cpu_relax();
>  		touch_nmi_watchdog();

Maybe we should just swap the calls in set_cpu_online() instead? I.e.,

	if (online) {
		cpumask_set_cpu(cpu, to_cpumask(cpu_active_bits));
		cpumask_set_cpu(cpu, to_cpumask(cpu_online_bits));
	}

?

I see cpu_online() being called much more than cpu_active()...

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
* Re: [PATCH] x86/smpboot: Check for cpu_active on cpu initialization
  From: Joerg Roedel @ 2015-07-20 15:02 UTC
  To: Borislav Petkov
  Cc: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
      linux-kernel

On Mon, Jul 20, 2015 at 04:46:19PM +0200, Borislav Petkov wrote:
> On Thu, Jul 16, 2015 at 11:17:17AM +0200, Joerg Roedel wrote:
> > The reason is that the cpu_active bit for the new CPU becomes visible
> > significantly later than the cpu_online bit.
>
> I see
>
> 	void set_cpu_online(unsigned int cpu, bool online)
> 	{
> 		if (online) {
> 			cpumask_set_cpu(cpu, to_cpumask(cpu_online_bits));
> 			cpumask_set_cpu(cpu, to_cpumask(cpu_active_bits));
> 		} else {
>
> which is called in start_secondary().
>
> Do you mean that setting the bit in cpu_active_mask gets delayed soo
> much? Because it comes right after setting the bit in cpu_online_mask.

Yes. Either cpu_active is set a lot later in a KVM guest, when the host
decides to preempt the vCPU right after setting cpu_online but before
cpu_active is set, or its visibility is delayed on bare metal.

I have seen a report where this happens on bare metal, when the change
to the cpu_active bit becomes visible on the other CPU significantly
later than the cpu_online bit. This happened on a pretty big machine
with 88 cores.


	Joerg
* Re: [PATCH] x86/smpboot: Check for cpu_active on cpu initialization
  From: Borislav Petkov @ 2015-07-20 15:10 UTC
  To: Joerg Roedel
  Cc: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
      linux-kernel

On Mon, Jul 20, 2015 at 05:02:40PM +0200, Joerg Roedel wrote:
> I have seen a report where this happens on bare metal, when the change
> to the cpu_active bit becomes visible on the other CPU significantly
> later than the cpu_online bit. This happened on a pretty big machine
> with 88 cores.

So how about what I proposed at the end of my previous mail?

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
* Re: [PATCH] x86/smpboot: Check for cpu_active on cpu initialization
  From: Joerg Roedel @ 2015-07-20 15:18 UTC
  To: Borislav Petkov
  Cc: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
      linux-kernel

On Mon, Jul 20, 2015 at 05:10:00PM +0200, Borislav Petkov wrote:
> On Mon, Jul 20, 2015 at 05:02:40PM +0200, Joerg Roedel wrote:
> > I have seen a report where this happens on bare metal, when the change
> > to the cpu_active bit becomes visible on the other CPU significantly
> > later than the cpu_online bit. This happened on a pretty big machine
> > with 88 cores.
>
> So how about what I proposed at the end of my previous mail?

Oh sorry, I missed that. Setting cpu_active first should work on x86,
where writes become visible in the order they are executed. But this
function is in generic code, and I have no idea what such a change might
break on other architectures.

In the end, cpu_active means that the scheduler can push tasks to the
CPU; I don't know whether some arch code breaks when the scheduler is
already working on a CPU before it becomes visibly online.


	Joerg
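For illustration only (this sketch is not from the thread): the
alternative under discussion -- setting the active bit before the online
bit, plus the memory barriers such a reordering would need on weakly
ordered architectures -- might look roughly like the following. The
barrier placement here is an assumption for illustration, and whether
generic code may publish the bits in this order at all is exactly the
open question raised above:

	void set_cpu_online(unsigned int cpu, bool online)
	{
		if (online) {
			cpumask_set_cpu(cpu, to_cpumask(cpu_active_bits));
			/* make sure a CPU that observes the online bit also
			 * observes the active bit; readers would need a
			 * matching smp_rmb() */
			smp_wmb();
			cpumask_set_cpu(cpu, to_cpumask(cpu_online_bits));
		} else {
			cpumask_clear_cpu(cpu, to_cpumask(cpu_online_bits));
		}
	}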
* Re: [PATCH] x86/smpboot: Check for cpu_active on cpu initialization
  From: Borislav Petkov @ 2015-07-20 15:27 UTC
  To: Joerg Roedel, Peter Zijlstra
  Cc: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
      linux-kernel

On Mon, Jul 20, 2015 at 05:18:31PM +0200, Joerg Roedel wrote:
> On Mon, Jul 20, 2015 at 05:10:00PM +0200, Borislav Petkov wrote:
> > On Mon, Jul 20, 2015 at 05:02:40PM +0200, Joerg Roedel wrote:
> > > I have seen a report where this happens on bare metal, when the change
> > > to the cpu_active bit becomes visible on the other CPU significantly
> > > later than the cpu_online bit. This happened on a pretty big machine
> > > with 88 cores.
> >
> > So how about what I proposed at the end of my previous mail?
>
> Oh sorry, I missed that. Setting cpu_active first should work on x86,
> where writes become visible in the order they are executed. But this
> function is in generic code, and I have no idea what such a change might
> break on other architectures.
>
> In the end, cpu_active means that the scheduler can push tasks to the
> CPU; I don't know whether some arch code breaks when the scheduler is
> already working on a CPU before it becomes visibly online.

Hmm... Let's run it by Peter.

@Peter: see the first patch in the mail for the problem of which cpumask
bit to test wrt scheduler and migrating tasks to newly appearing cores...

Thanks.

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--
* [PATCH] sched: fix cpu_active_mask/cpu_online_mask race
  From: Jan H. Schönherr @ 2015-07-27 18:21 UTC
  To: Ingo Molnar, Peter Zijlstra
  Cc: Thomas Gleixner, H. Peter Anvin, x86, Joerg Roedel, Borislav Petkov,
      linux-kernel, Lai Jiangshan, Michael Ellerman, Anton Blanchard,
      Matt Wilson, Jan H. Schönherr

From: Jan H. Schönherr <jschoenh@amazon.de>

There is a race condition in SMP bootup code, which may result in

	WARNING: CPU: 0 PID: 1 at kernel/workqueue.c:4418
	         workqueue_cpu_up_callback()

or

	kernel BUG at kernel/smpboot.c:135!

It can be triggered with a bit of luck in Linux guests running on busy
hosts.

	CPU0                              CPUn
	====                              ====

	_cpu_up()
	  __cpu_up()
	                                  start_secondary()
	                                    set_cpu_online()
	                                      cpumask_set_cpu(cpu,
	                                          to_cpumask(cpu_online_bits));
	  cpu_notify(CPU_ONLINE)
	    <do stuff, see below>
	                                      cpumask_set_cpu(cpu,
	                                          to_cpumask(cpu_active_bits));

During the various CPU_ONLINE callbacks CPUn is online but not active.
Several things can go wrong at that point, depending on the scheduling
of tasks on CPU0.

Variant 1:

	cpu_notify(CPU_ONLINE)
	  workqueue_cpu_up_callback()
	    rebind_workers()
	      set_cpus_allowed_ptr()

	This call fails because it requires an active CPU; rebind_workers()
	ends with a warning:

	WARNING: CPU: 0 PID: 1 at kernel/workqueue.c:4418
	         workqueue_cpu_up_callback()

Variant 2:

	cpu_notify(CPU_ONLINE)
	  smpboot_thread_call()
	    smpboot_unpark_threads()
	     ..
	      __kthread_unpark()
	        __kthread_bind()
	        wake_up_state()
	         ..
	          select_task_rq()
	            select_fallback_rq()

	The ->wake_cpu of the unparked thread is not allowed, making a call
	to select_fallback_rq() necessary. Then, select_fallback_rq() cannot
	find an allowed, active CPU and promptly resets the allowed CPUs, so
	that the task in question ends up on CPU0.

	When those unparked tasks are eventually executed, they run
	immediately into a BUG:

	kernel BUG at kernel/smpboot.c:135!

Just changing the order in which the online/active bits are set (and
adding some memory barriers) would solve the two issues above. However,
it would change the order of operations back to the one before commit
6acbfb96976f ("sched: Fix hotplug vs. set_cpus_allowed_ptr()"), thus
reintroducing that particular problem.

Going further back into history, we have at least the following commits
touching this topic:
- commit 2baab4e90495 ("sched: Fix select_fallback_rq() vs
  cpu_active/cpu_online")
- commit 5fbd036b552f ("sched: Cleanup cpu_active madness")

Together, these give us the following non-working solutions:
- secondary CPU sets active before online, because active is assumed to
  be a subset of online;
- secondary CPU sets online before active, because the primary CPU
  assumes that an online CPU is also active;
- secondary CPU sets online and waits for primary CPU to set active,
  because it might deadlock.

Commit 875ebe940d77 ("powerpc/smp: Wait until secondaries are active &
online") introduces an arch-specific solution to this arch-independent
problem.

Now, go for a more general solution without explicit waiting and simply
set active twice: once on the secondary CPU after online was set and
once on the primary CPU after online was seen.

Fixes: 6acbfb96976f ("sched: Fix hotplug vs. set_cpus_allowed_ptr()")
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 kernel/sched/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 78b4bad10..155bb4a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5433,6 +5433,7 @@ static int sched_cpu_active(struct notifier_block *nfb,
 	case CPU_STARTING:
 		set_cpu_rq_start_time();
 		return NOTIFY_OK;
+	case CPU_ONLINE:
 	case CPU_DOWN_FAILED:
 		set_cpu_active((long)hcpu, true);
 		return NOTIFY_OK;
* Re: [PATCH] sched: fix cpu_active_mask/cpu_online_mask race
  From: Peter Zijlstra @ 2015-07-30 16:00 UTC
  To: Jan H. Schönherr
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, x86, Joerg Roedel,
      Borislav Petkov, linux-kernel, Lai Jiangshan, Michael Ellerman,
      Anton Blanchard, Matt Wilson

On Mon, Jul 27, 2015 at 08:21:28PM +0200, Jan H. Schönherr wrote:
> From: Jan H. Schönherr <jschoenh@amazon.de>
>
> Together, these give us the following non-working solutions:
> - secondary CPU sets active before online, because active is assumed to
>   be a subset of online;
> - secondary CPU sets online before active, because the primary CPU
>   assumes that an online CPU is also active;
> - secondary CPU sets online and waits for primary CPU to set active,
>   because it might deadlock.
>
> Commit 875ebe940d77 ("powerpc/smp: Wait until secondaries are active &
> online") introduces an arch-specific solution to this arch-independent
> problem.
>
> Now, go for a more general solution without explicit waiting and simply
> set active twice: once on the secondary CPU after online was set and
> once on the primary CPU after online was seen.

Very vile indeed ;-)

> Fixes: 6acbfb96976f ("sched: Fix hotplug vs. set_cpus_allowed_ptr()")
> Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  kernel/sched/core.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 78b4bad10..155bb4a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5433,6 +5433,7 @@ static int sched_cpu_active(struct notifier_block *nfb,
>  	case CPU_STARTING:
>  		set_cpu_rq_start_time();
>  		return NOTIFY_OK;
> +	case CPU_ONLINE:

Not without a big honking comment right there :-)

>  	case CPU_DOWN_FAILED:
>  		set_cpu_active((long)hcpu, true);
>  		return NOTIFY_OK;
* [PATCH v2] sched: fix cpu_active_mask/cpu_online_mask race
  From: Jan H. Schönherr @ 2015-07-30 19:17 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, Thomas Gleixner, H. Peter Anvin, x86, Joerg Roedel,
      Borislav Petkov, linux-kernel, Michael Ellerman, Anton Blanchard,
      Matt Wilson, Jan H. Schönherr

From: Jan H. Schönherr <jschoenh@amazon.de>

There is a race condition in SMP bootup code, which may result in

	WARNING: CPU: 0 PID: 1 at kernel/workqueue.c:4418
	         workqueue_cpu_up_callback()

or

	kernel BUG at kernel/smpboot.c:135!

It can be triggered with a bit of luck in Linux guests running on busy
hosts.

	CPU0                              CPUn
	====                              ====

	_cpu_up()
	  __cpu_up()
	                                  start_secondary()
	                                    set_cpu_online()
	                                      cpumask_set_cpu(cpu,
	                                          to_cpumask(cpu_online_bits));
	  cpu_notify(CPU_ONLINE)
	    <do stuff, see below>
	                                      cpumask_set_cpu(cpu,
	                                          to_cpumask(cpu_active_bits));

During the various CPU_ONLINE callbacks CPUn is online but not active.
Several things can go wrong at that point, depending on the scheduling
of tasks on CPU0.

Variant 1:

	cpu_notify(CPU_ONLINE)
	  workqueue_cpu_up_callback()
	    rebind_workers()
	      set_cpus_allowed_ptr()

	This call fails because it requires an active CPU; rebind_workers()
	ends with a warning:

	WARNING: CPU: 0 PID: 1 at kernel/workqueue.c:4418
	         workqueue_cpu_up_callback()

Variant 2:

	cpu_notify(CPU_ONLINE)
	  smpboot_thread_call()
	    smpboot_unpark_threads()
	     ..
	      __kthread_unpark()
	        __kthread_bind()
	        wake_up_state()
	         ..
	          select_task_rq()
	            select_fallback_rq()

	The ->wake_cpu of the unparked thread is not allowed, making a call
	to select_fallback_rq() necessary. Then, select_fallback_rq() cannot
	find an allowed, active CPU and promptly resets the allowed CPUs, so
	that the task in question ends up on CPU0.

	When those unparked tasks are eventually executed, they run
	immediately into a BUG:

	kernel BUG at kernel/smpboot.c:135!

Just changing the order in which the online/active bits are set (and
adding some memory barriers) would solve the two issues above. However,
it would change the order of operations back to the one before commit
6acbfb96976f ("sched: Fix hotplug vs. set_cpus_allowed_ptr()"), thus
reintroducing that particular problem.

Going further back into history, we have at least the following commits
touching this topic:
- commit 2baab4e90495 ("sched: Fix select_fallback_rq() vs
  cpu_active/cpu_online")
- commit 5fbd036b552f ("sched: Cleanup cpu_active madness")

Together, these give us the following non-working solutions:
- secondary CPU sets active before online, because active is assumed to
  be a subset of online;
- secondary CPU sets online before active, because the primary CPU
  assumes that an online CPU is also active;
- secondary CPU sets online and waits for primary CPU to set active,
  because it might deadlock.

Commit 875ebe940d77 ("powerpc/smp: Wait until secondaries are active &
online") introduces an arch-specific solution to this arch-independent
problem.

Now, go for a more general solution without explicit waiting and simply
set active twice: once on the secondary CPU after online was set and
once on the primary CPU after online was seen.

Fixes: 6acbfb96976f ("sched: Fix hotplug vs. set_cpus_allowed_ptr()")
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
v2:
- Big honking comment in the CPU_ONLINE path for Peter. :)

 kernel/sched/core.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 78b4bad10..e967343 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5433,6 +5433,14 @@ static int sched_cpu_active(struct notifier_block *nfb,
 	case CPU_STARTING:
 		set_cpu_rq_start_time();
 		return NOTIFY_OK;
+	case CPU_ONLINE:
+		/*
+		 * At this point a starting CPU has marked itself as online via
+		 * set_cpu_online(). But it might not yet have marked itself
+		 * as active, which is essential from here on.
+		 *
+		 * Thus, fall-through and help the starting CPU along.
+		 */
 	case CPU_DOWN_FAILED:
 		set_cpu_active((long)hcpu, true);
 		return NOTIFY_OK;
--
2.4.6.1.gea4e83c
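One reason the "set active twice" approach is safe to apply from both
sides: set_cpu_active() only performs an atomic, idempotent cpumask
operation, so the extra call from the CPU_ONLINE notifier is a no-op
whenever the secondary CPU has already set the bit itself. A sketch of
the helper as it looked around that time (reconstructed for context, not
quoted from the thread; details may differ):

	void set_cpu_active(unsigned int cpu, bool active)
	{
		if (active)
			cpumask_set_cpu(cpu, to_cpumask(cpu_active_bits));
		else
			cpumask_clear_cpu(cpu, to_cpumask(cpu_active_bits));
	}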