From: Ingo Molnar <mingo@kernel.org>
To: lkp@lists.01.org
Subject: Re: [PATCH] kthread: kthread_bind fails to enforce CPU affinity (fixes kernel BUG at kernel/smpboot.c:134!)
Date: Mon, 08 Dec 2014 09:34:08 +0100
Message-ID: <20141208083408.GA8023@gmail.com>
In-Reply-To: <1418009221-12719-1-git-send-email-anton@samba.org>

* Anton Blanchard <anton@samba.org> wrote:

> I have a busy ppc64le KVM box where guests sometimes hit the 
> infamous "kernel BUG at kernel/smpboot.c:134!" issue during 
> boot:
> 
> BUG_ON(td->cpu != smp_processor_id());
> 
> Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops
> output confirms it:
> 
> CPU: 0
> Comm: watchdog/130
> 
> The issue is in kthread_bind where we set the cpus_allowed 
> mask, but do not touch task_thread_info(p)->cpu. The scheduler 
> assumes the previously scheduled CPU is in the cpus_allowed 
> mask, but in this case we are moving a thread to another CPU so 
> it is not.
> 
> We used to call set_task_cpu which sets 
> task_thread_info(p)->cpu (in fact kthread_bind still has a 
> comment suggesting this). That was removed in e2912009fb7b 
> ("sched: Ensure set_task_cpu() is never called on blocked 
> tasks").
> 
> Since we cannot call set_task_cpu (the task is in a sleeping 
> state), just do an explicit set of task_thread_info(p)->cpu.

So we cannot call set_task_cpu() because in the normal lifetime 
of a task the ->cpu value gets set on wakeup. So if a task is 
blocked right now, and its affinity changes, it ought to get a 
correct ->cpu selected on wakeup. The affinity mask and the 
current value of ->cpu getting out of sync is thus 'normal'.

(Check for example how set_cpus_allowed_ptr() works: we first set 
the new allowed mask, then we migrate the task away if 
necessary.)

In the kthread_bind() case this is explicitly assumed: it only 
calls do_set_cpus_allowed().
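
To make the "mask and ->cpu can be out of sync while blocked" point
concrete, here is a minimal user-space sketch. It is only a toy model
under stated assumptions: struct toy_task, toy_set_allowed() and
toy_wake_up() are hypothetical names invented for illustration, not
kernel APIs, and the CPU selection (lowest allowed CPU) is a
simplification of what the real wakeup path does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy user-space model; none of these names are kernel APIs. */
struct toy_task {
	uint64_t cpus_allowed;	/* bit n set => task may run on CPU n */
	int cpu;		/* CPU the task last ran on (0..63 here) */
	bool blocked;		/* true while the task is sleeping */
};

/* Return the lowest-numbered CPU present in the mask. */
static int toy_first_cpu(uint64_t mask)
{
	int cpu = 0;

	assert(mask != 0);	/* an empty mask has no valid CPU */
	while (!(mask & 1)) {
		mask >>= 1;
		cpu++;
	}
	return cpu;
}

/* Affinity change on a blocked task: only the mask is updated. */
static void toy_set_allowed(struct toy_task *t, uint64_t mask)
{
	t->cpus_allowed = mask;
	/* t->cpu is deliberately left stale; wakeup is expected to fix it. */
}

/* Wakeup: keep the old CPU if still allowed, otherwise pick a new one. */
static void toy_wake_up(struct toy_task *t)
{
	if (!(t->cpus_allowed & (UINT64_C(1) << t->cpu)))
		t->cpu = toy_first_cpu(t->cpus_allowed);
	t->blocked = false;
}
```

In this model a blocked task whose mask changes from CPU 0 to CPU 5
still reports cpu == 0 until toy_wake_up() runs, and only then moves
to CPU 5.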

But obviously the bug triggers in kernel/smpboot.c, and that 
assert shows a real bug - and your patch makes the assert go 
away, so the question is, how did the kthread get woken up and 
put on a runqueue without its ->cpu getting set?
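
As a rough illustration of that question, the sketch below models the
shape of the quoted check (td->cpu != smp_processor_id()) in user
space. Everything here is hypothetical: the struct and function names
are invented for this example, and "running on the stale ->cpu" is
just one assumed way the thread could end up on the wrong CPU, not a
claim about where the actual bug is.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU thread descriptor. */
struct toy_hotplug_thread {
	int bound_cpu;	/* CPU this per-CPU thread was created for */
};

/* Hypothetical stand-in for the task's scheduling state. */
struct toy_task {
	int cpu;	/* CPU whose runqueue the task would be placed on */
};

/* Same shape as the quoted check: true means the BUG_ON would fire. */
static bool toy_check_would_fire(const struct toy_hotplug_thread *td,
				 int running_cpu)
{
	return td->bound_cpu != running_cpu;
}

/* Expected path: the wakeup reselects ->cpu before the task runs. */
static int toy_run_after_wakeup(struct toy_task *t,
				const struct toy_hotplug_thread *td)
{
	t->cpu = td->bound_cpu;
	return t->cpu;
}

/* Assumed faulty path: the task runs wherever the stale ->cpu points. */
static int toy_run_without_reselect(const struct toy_task *t)
{
	return t->cpu;
}
```

With bound_cpu == 130 and a stale cpu == 0 (the values from the oops
above), the check fires on the path without reselection and stays
quiet on the path that goes through the wakeup.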

One possibility is a generic scheduler bug in ttwu(), resulting 
in ->cpu not getting set properly. If this was the case then 
other places would be blowing up as well, and I don't think we 
are seeing this currently, especially not over such a long 
timespan.

Another possibility would be that kthread_bind()'s assumption 
that the task is inactive is false: if the task activates when we 
think it's blocked and we just hotplug-migrate it away while it's 
running (setting its td->cpu?), the assert could trigger, I think 
- and the patch would make the assert go away.

A third possibility would be, if this is a freshly created 
thread, some sort of initialization race - either in the kthread 
or in the scheduler code.

Weird.

Thanks,

	Ingo

Thread overview: 33+ messages
2014-12-08  3:27 [PATCH] kthread: kthread_bind fails to enforce CPU affinity (fixes kernel BUG at kernel/smpboot.c:134!) Anton Blanchard
2014-12-08  4:28 ` Linus Torvalds
2014-12-08  4:46   ` Anton Blanchard
2014-12-08  8:34 ` Ingo Molnar [this message]
2014-12-08 10:18   ` Anton Blanchard
2014-12-08 23:58     ` [PATCH] powerpc: secondary CPUs signal to master before setting active and online " Anton Blanchard
2014-12-09 20:54       ` Linus Torvalds
2014-12-10 14:08         ` Thomas Gleixner
2014-12-10 23:06         ` Michael Ellerman
2014-12-08 13:54 ` [PATCH] kthread: kthread_bind fails to enforce CPU affinity " Steven Rostedt
2014-12-09  2:24   ` Lai Jiangshan
