From: Nelson Elhage <nelhage@ksplice.com>
To: Paul Menage <menage@google.com>, Li Zefan <lizf@cn.fujitsu.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>, linux-kernel@vger.kernel.org
Subject: cgroup scheduling: Adding kthreadd to a non-RT cgroup can deadlock the kernel
Date: Tue, 4 Jan 2011 23:54:47 -0500
Message-ID: <20110105045447.GN23414@ksplice.com>
Hi,
I've found a bug where, on CONFIG_RT_GROUP_SCHED systems, adding the kthreadd
task to a cgroup with cpu.rt_runtime_us = 0 (as some cgroup configuration
scripts do when they move all processes into a default cgroup) can deadlock
the kernel.
On 2.6.37, the problem can be triggered via CPU hotplug. The following sequence
of events will deadlock on an SMP system:
1. Add kthreadd to a cpu cgroup with rt_runtime_us = 0
2. echo 0 > /sys/devices/system/cpu/cpu1/online
3. echo 1 > /sys/devices/system/cpu/cpu1/online
4. echo 0 > /sys/devices/system/cpu/cpu1/online
5. echo 1 > /sys/devices/system/cpu/cpu1/online
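The sequence above can be sketched as a script. This is only a rough sketch: the cgroup mount point (/cgroup/cpu) and the group name ("nort") are assumptions not taken from this report, and kthreadd is assumed to be PID 2, as on typical systems. By default it only prints the writes it would perform.

```shell
#!/bin/sh
# Sketch of the reproduction sequence. ASSUMPTIONS (not from the report):
#   - the cpu cgroup hierarchy is mounted at $CGROUP_ROOT
#   - the group name "nort" is hypothetical
#   - kthreadd is PID 2
# DRY_RUN=1 (the default) only prints the commands. With DRY_RUN=0, as root,
# an affected kernel is expected to hang.
CGROUP_ROOT=${CGROUP_ROOT:-/cgroup/cpu}
GROUP=${GROUP:-nort}
CPU=${CPU:-1}
DRY_RUN=${DRY_RUN:-1}

# write VALUE FILE: print the write, and perform it unless in dry-run mode.
write() {
    echo "echo $1 > $2"
    if [ "$DRY_RUN" = 0 ]; then
        echo "$1" > "$2"
    fi
}

repro() {
    if [ "$DRY_RUN" = 0 ]; then
        mkdir -p "$CGROUP_ROOT/$GROUP"
    fi
    # Step 1: put kthreadd into a group with no realtime budget.
    write 0 "$CGROUP_ROOT/$GROUP/cpu.rt_runtime_us"
    write 2 "$CGROUP_ROOT/$GROUP/tasks"
    # Steps 2-5: take the CPU offline and online, twice.
    for state in 0 1 0 1; do
        write "$state" "/sys/devices/system/cpu/cpu$CPU/online"
    done
}
```

Calling `repro` with the defaults lists the six writes without touching the system; set DRY_RUN=0 only on a machine you are prepared to reboot.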
In step (3), bringing the CPU back online will cause us to create a new
ksoftirqd/1 thread. Since that thread is forked from kthreadd, it will end up
in the same cgroup, also without any realtime access.
In step (4), cpu_callback in softirq.c will attempt to kill ksoftirqd by setting
it to SCHED_FIFO and using kthread_stop(). It does this with
'sched_setscheduler_nocheck', which bypasses the usual checks that prevent
setting a process to SCHED_FIFO if it is in a cgroup that would prevent it
from running.
Thus, ksoftirqd ends up at SCHED_FIFO but with a zero rt_runtime_us, so it is
never scheduled again, and kthread_stop blocks forever waiting on it.
In step (5), we try to call the CPU notifier chain again, but it is still
locked from step (4), and we deadlock.
For reasons I don't fully understand, just adding ksoftirqd/1 to a cgroup and
then taking CPU 1 offline doesn't result in a hang, so there may be some
detail of this situation that I'm missing, but I'm pretty confident in
the general analysis.
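To check whether a given kernel thread is in the configuration described above, something like the following could be used. It is a minimal sketch: the mount point in CGROUP_ROOT is an assumption, and the helper names are made up for illustration. It relies only on the "ID:controllers:path" line format of /proc/PID/cgroup.

```shell
#!/bin/sh
# ASSUMPTION: the cpu cgroup hierarchy is mounted here; adjust as needed.
CGROUP_ROOT=${CGROUP_ROOT:-/cgroup/cpu}

# cpu_cgroup_path: read /proc/PID/cgroup-format lines on stdin and print the
# path of the first hierarchy whose controller list contains "cpu".
cpu_cgroup_path() {
    awk -F: '{
        n = split($2, ctl, ",")
        for (i = 1; i <= n; i++)
            if (ctl[i] == "cpu") {
                sub(/^[^:]*:[^:]*:/, "")   # strip "ID:controllers:", keep path
                print
                exit
            }
    }'
}

# rt_runtime_of PID: print cpu.rt_runtime_us for the cpu cgroup containing PID.
rt_runtime_of() {
    path=$(cpu_cgroup_path < "/proc/$1/cgroup") || return 1
    [ -n "$path" ] || return 1
    cat "$CGROUP_ROOT$path/cpu.rt_runtime_us"
}
```

For example, `rt_runtime_of 2` (kthreadd is PID 2 on typical systems) printing 0 would indicate that threads forked from kthreadd inherit a group with no realtime budget.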
Before 2.6.34, a similar problem could be triggered just by adding kthreadd to
such a cgroup and then calling stop_machine (e.g. by removing a module), since
stop_machine created a new RT workqueue on each invocation. This is how I first
found this problem: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/693594
- Nelson