From: Lai Jiangshan <laijs@cn.fujitsu.com>
To: lkp@lists.01.org
Subject: Re: [PATCH] kthread: kthread_bind fails to enforce CPU affinity (fixes kernel BUG at kernel/smpboot.c:134!)
Date: Tue, 09 Dec 2014 10:24:37 +0800
Message-ID: <54865D65.8030906@cn.fujitsu.com>
In-Reply-To: <20141208085405.730577a3@gandalf.local.home>
On 12/08/2014 09:54 PM, Steven Rostedt wrote:
> On Mon, 8 Dec 2014 14:27:01 +1100
> Anton Blanchard <anton@samba.org> wrote:
>
>> I have a busy ppc64le KVM box where guests sometimes hit the infamous
>> "kernel BUG at kernel/smpboot.c:134!" issue during boot:
>>
>> BUG_ON(td->cpu != smp_processor_id());
>>
>> Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops
>> output confirms it:
>>
>> CPU: 0
>> Comm: watchdog/130
>>
>> The issue is in kthread_bind where we set the cpus_allowed mask, but do
>> not touch task_thread_info(p)->cpu. The scheduler assumes the previously
>> scheduled CPU is in the cpus_allowed mask, but in this case we are
>> moving a thread to another CPU so it is not.
>>
>
> Does this happen always on boot up, and always with the watchdog thread?
>
> I followed the logic that starts the watchdog threads.
>
> watchdog_enable_all_cpus()
>   smpboot_register_percpu_thread() {
>
>     for_each_online_cpu(cpu) { ... }
>
> Where watchdog_enable_all_cpus() can be called by
> lockup_detector_init() before SMP is started, but also by
> proc_dowatchdog() which is called by the sysctl commands (after SMP is
> up and running).
>
> I noticed there's no "get_online_cpus()" anywhere, although
> smpboot_unregister_percpu_thread() has it. Is it possible that we created a
> thread on a CPU that wasn't fully online yet?
>
> Perhaps the following patch is needed? Even if this isn't the solution
> to this bug, it is probably needed as watchdog_enable_all_cpus() can be
> called after boot up too.
>
> -- Steve
Hi Steven, tglx,
See this earlier patch: https://lkml.org/lkml/2014/7/30/804
"[PATCH] smpboot: add missing get_online_cpus() when register"
Thanks,
Lai
>
> diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> index eb89e1807408..60d35ac5d3f1 100644
> --- a/kernel/smpboot.c
> +++ b/kernel/smpboot.c
> @@ -279,6 +279,7 @@ int smpboot_register_percpu_thread(struct smp_hotplug_thread *plug_thread)
>  	unsigned int cpu;
>  	int ret = 0;
>
> +	get_online_cpus();
>  	mutex_lock(&smpboot_threads_lock);
>  	for_each_online_cpu(cpu) {
>  		ret = __smpboot_create_thread(plug_thread, cpu);
> @@ -291,6 +292,7 @@ int smpboot_register_percpu_thread(struct smp_hotplug_thread *plug_thread)
>  	list_add(&plug_thread->list, &hotplug_threads);
>  out:
>  	mutex_unlock(&smpboot_threads_lock);
> +	put_online_cpus();
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(smpboot_register_percpu_thread);
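
For readers following the quoted report, here is a minimal userspace sketch of the failure mode Anton describes: a bind that updates only the allowed-CPU mask, while the wakeup path trusts the CPU recorded for the task. It is a toy model, not kernel code. Every name in it (toy_task, toy_bind_mask_only, toy_bind_and_set_cpu, toy_wake_cpu, toy_cpu_allowed) is hypothetical, the mask is limited to 64 CPUs for brevity, and it does not claim to show the actual root cause of the BUG_ON.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; none of these names exist in the kernel. */
struct toy_task {
	unsigned int cpu;      /* CPU recorded for the task, standing in
	                        * for task_thread_info(p)->cpu */
	uint64_t cpus_allowed; /* one bit per CPU the task may run on */
};

/* Models a bind that updates the mask but leaves t->cpu stale. */
static void toy_bind_mask_only(struct toy_task *t, unsigned int cpu)
{
	assert(cpu < 64); /* the toy mask only covers CPUs 0..63 */
	t->cpus_allowed = UINT64_C(1) << cpu;
}

/* Models a bind that also moves the recorded CPU to the target. */
static void toy_bind_and_set_cpu(struct toy_task *t, unsigned int cpu)
{
	toy_bind_mask_only(t, cpu);
	t->cpu = cpu;
}

/* Models a wakeup path that assumes the recorded CPU is allowed. */
static unsigned int toy_wake_cpu(const struct toy_task *t)
{
	return t->cpu;
}

/* True if 'cpu' is set in the task's allowed mask. */
static bool toy_cpu_allowed(const struct toy_task *t, unsigned int cpu)
{
	return cpu < 64 && ((t->cpus_allowed >> cpu) & 1);
}
```

In this model, a task created on CPU 0 and bound to CPU 5 with toy_bind_mask_only() is still woken on CPU 0, which is outside its mask; with toy_bind_and_set_cpu() the wakeup CPU and the mask agree. That mismatch is the condition the quoted BUG_ON(td->cpu != smp_processor_id()) would catch.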