From: Anton Blanchard <anton@samba.org>
To: Ingo Molnar <mingo@kernel.org>
Cc: peterz@infradead.org, yuanhan.liu@linux.intel.com,
bsegall@google.com, paulus@samba.org, rafael.j.wysocki@intel.com,
torvalds@linux-foundation.org, mingo@redhat.com, pjt@google.com,
yuyang.du@intel.com, daniel@numascale.com, rostedt@goodmis.org,
subbaram@codeaurora.org, tglx@linutronix.de,
fengguang.wu@intel.com, linuxppc-dev@lists.ozlabs.org,
linux-kernel@vger.kernel.org, sp@datera.io, tj@kernel.org,
akpm@linux-foundation.org, computersforpeace@gmail.com,
lkp@01.org
Subject: [PATCH] powerpc: secondary CPUs signal to master before setting active and online (fixes kernel BUG at kernel/smpboot.c:134!)
Date: Tue, 9 Dec 2014 10:58:19 +1100 [thread overview]
Message-ID: <20141209105819.0e847b4b@kryten> (raw)
In-Reply-To: <20141208211859.6e81ec81@kryten>
Hi Ingo,
> At that point I thought the previous task_cpu() was somewhat ingrained
> in the scheduler and came up with the patch. If not, we could go on a
> hunt to see what else needs fixing.
I had another look. The scheduled does indeed make assumptions about the
previous task_cpu, but we have a hammer to fix it up called
select_fallback_rq.
I annotated select_fallback_rq, and did hit a case where the CPU was
not active. ppc64 patch below.
I think x86 have a similar (although harder to hit) issue. While it
does wait for the cpu_online bit to be set:
while (!cpu_online(cpu)) {
cpu_relax();
touch_nmi_watchdog();
}
The cpu_active bit is set after the cpu_online bit:
void set_cpu_online(unsigned int cpu, bool online)
{
if (online) {
cpumask_set_cpu(cpu, to_cpumask(cpu_online_bits));
cpumask_set_cpu(cpu, to_cpumask(cpu_active_bits));
If the CPU got delayed between the two stores (eg a KVM guest had the CPU
scheduled out), then we'd end up with cpu_active unset and hit the same
issue in select_fallback_rq.
Anton
--
I have a busy ppc64le KVM box where guests sometimes hit the infamous
"kernel BUG at kernel/smpboot.c:134!" issue during boot:
BUG_ON(td->cpu != smp_processor_id());
Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops
output confirms it:
CPU: 0
Comm: watchdog/130
The problem is that we aren't ensuring the CPU active and online bits are set
before allowing the master to continue on. The master unparks the secondary
CPUs kthreads and the scheduler looks for a CPU to run on. It calls
select_task_rq and realises the suggested CPU is not in the cpus_allowed
mask. It then ends up in select_fallback_rq, and since the active and
online bits aren't set we choose some other CPU to run on.
Cc: stable@vger.kernel.org
Signed-off-by: Anton Blanchard <anton@samba.org>
---
arch/powerpc/kernel/smp.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 71e186d..d40e46e 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -700,7 +700,6 @@ void start_secondary(void *unused)
smp_store_cpu_info(cpu);
set_dec(tb_ticks_per_jiffy);
preempt_disable();
- cpu_callin_map[cpu] = 1;
if (smp_ops->setup_cpu)
smp_ops->setup_cpu(cpu);
@@ -739,6 +738,14 @@ void start_secondary(void *unused)
notify_cpu_starting(cpu);
set_cpu_online(cpu, true);
+ /*
+ * CPU must be marked active and online before we signal back to the
+ * master, because the scheduler needs to see the cpu_online and
+ * cpu_active bits set.
+ */
+ smp_wmb();
+ cpu_callin_map[cpu] = 1;
+
local_irq_enable();
cpu_startup_entry(CPUHP_ONLINE);
--
2.1.0
next prev parent reply other threads:[~2014-12-08 23:58 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-12-08 3:27 [PATCH] kthread: kthread_bind fails to enforce CPU affinity (fixes kernel BUG at kernel/smpboot.c:134!) Anton Blanchard
2014-12-08 4:28 ` Linus Torvalds
2014-12-08 4:46 ` Anton Blanchard
2014-12-08 8:34 ` Ingo Molnar
2014-12-08 10:18 ` Anton Blanchard
2014-12-08 23:58 ` Anton Blanchard [this message]
2014-12-09 20:54 ` [PATCH] powerpc: secondary CPUs signal to master before setting active and online " Linus Torvalds
2014-12-10 14:08 ` Thomas Gleixner
2014-12-10 23:06 ` Michael Ellerman
2014-12-08 13:54 ` [PATCH] kthread: kthread_bind fails to enforce CPU affinity " Steven Rostedt
2014-12-09 2:24 ` Lai Jiangshan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20141209105819.0e847b4b@kryten \
--to=anton@samba.org \
--cc=akpm@linux-foundation.org \
--cc=bsegall@google.com \
--cc=computersforpeace@gmail.com \
--cc=daniel@numascale.com \
--cc=fengguang.wu@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=lkp@01.org \
--cc=mingo@kernel.org \
--cc=mingo@redhat.com \
--cc=paulus@samba.org \
--cc=peterz@infradead.org \
--cc=pjt@google.com \
--cc=rafael.j.wysocki@intel.com \
--cc=rostedt@goodmis.org \
--cc=sp@datera.io \
--cc=subbaram@codeaurora.org \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=yuanhan.liu@linux.intel.com \
--cc=yuyang.du@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).