From: Anton Blanchard <anton@samba.org>
To: lkp@lists.01.org
Subject: Re: [PATCH] kthread: kthread_bind fails to enforce CPU affinity (fixes kernel BUG at kernel/smpboot.c:134!)
Date: Mon, 08 Dec 2014 21:18:59 +1100
Message-ID: <20141208211859.6e81ec81@kryten>
In-Reply-To: <20141208083408.GA8023@gmail.com>
Hi Ingo,
> So we cannot call set_task_cpu() because in the normal life time
> of a task the ->cpu value gets set on wakeup. So if a task is
> blocked right now, and its affinity changes, it ought to get a
> correct ->cpu selected on wakeup. The affinity mask and the
> current value of ->cpu getting out of sync is thus 'normal'.
>
> (Check for example how set_cpus_allowed_ptr() works: we first set
> the new allowed mask, then we migrate the task away if
> necessary.)
>
> In the kthread_bind() case this is explicitly assumed: it only
> calls do_set_cpus_allowed().
>
> But obviously the bug triggers in kernel/smpboot.c, and that
> assert shows a real bug - and your patch makes the assert go
> away, so the question is, how did the kthread get woken up and
> put on a runqueue without its ->cpu getting set?
I started going down this line earlier today, and found things like:
select_task_rq_fair:
	if (p->nr_cpus_allowed == 1)
		return prev_cpu;
I tried returning cpumask_first(tsk_cpus_allowed()) instead, and while
I couldn't hit the BUG I did manage to get a scheduler lockup during
testing.
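For illustration, the difference between the two return values can be
modelled in userspace. This is a toy sketch, not kernel code: struct
toy_task and every toy_* helper below are invented here, a 64-bit word
stands in for the cpumask, and it says nothing about why the real
change led to a lockup.

```c
#include <stdint.h>

/* Toy stand-in for the two task fields under discussion (hypothetical). */
struct toy_task {
	int cpu;               /* CPU the task last ran on, like task_cpu() */
	uint64_t cpus_allowed; /* bit n set means CPU n is allowed */
};

/* Count the CPUs present in the mask. */
static inline int toy_nr_cpus_allowed(const struct toy_task *p)
{
	int n = 0;

	for (uint64_t m = p->cpus_allowed; m; m &= m - 1)
		n++;
	return n;
}

/* Lowest allowed CPU, or -1 if the mask is empty. */
static inline int toy_first_allowed(const struct toy_task *p)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (p->cpus_allowed & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}

/* Models a bind that only rewrites the mask and leaves ->cpu alone. */
static inline void toy_bind(struct toy_task *p, int cpu)
{
	p->cpus_allowed = UINT64_C(1) << cpu;
}

/* Models the quoted shortcut: one allowed CPU, so trust prev_cpu. */
static inline int toy_select_prev(const struct toy_task *p, int prev_cpu)
{
	if (toy_nr_cpus_allowed(p) == 1)
		return prev_cpu;
	return prev_cpu; /* stand-in for the real balancing logic */
}

/* Models the experiment: one allowed CPU, so take it from the mask. */
static inline int toy_select_mask(const struct toy_task *p, int prev_cpu)
{
	if (toy_nr_cpus_allowed(p) == 1)
		return toy_first_allowed(p);
	return prev_cpu; /* stand-in for the real balancing logic */
}
```

With a toy task whose cpu is 0 and which is then bound to CPU 3 via
toy_bind(), toy_select_prev(&t, t.cpu) hands back 0, which is outside
the mask, while toy_select_mask(&t, t.cpu) hands back 3.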
At that point I thought the previous task_cpu() was somewhat ingrained
in the scheduler and came up with the patch. If not, we could go on a
hunt to see what else needs fixing.
Anton