From: Glauber Costa <glommer@parallels.com>
To: Tejun Heo <tj@kernel.org>, Cgroups <cgroups@vger.kernel.org>,
linux-kernel <linux-kernel@vger.kernel.org>
Subject: "Regression" with cd3d09527537
Date: Tue, 26 Jun 2012 16:43:03 +0400
Message-ID: <4FE9AE57.4090007@parallels.com>
Hi,
I've recently started seeing a lockdep warning at the end of *every*
"init 0" issued on my machine. Reboots are fine, which is probably why
I never saw it earlier. The log is quite extensive, but it shows the
following dependency chain:
[ 83.982111] -> #4 (cpu_hotplug.lock){+.+.+.}:
[...]
[ 83.982111] -> #3 (jump_label_mutex){+.+...}:
[...]
[ 83.982111] -> #2 (sk_lock-AF_INET){+.+.+.}:
[...]
[ 83.982111] -> #1 (&sig->cred_guard_mutex){+.+.+.}:
[...]
[ 83.982111] -> #0 (cgroup_mutex){+.+.+.}:
I've recently fixed bugs with the lock ordering imposed by cpusets
on cpu_hotplug.lock through jump_label_mutex, and initially thought it
to be the same kind of issue. But that was not the case.
I've omitted the full backtrace for readability, but I ran this with all
cgroups disabled except cpuset, so it can't be sock memcg (after my
initial reaction of "oh, fuck, not again"). That jump_label has been
there for years, and it comes from the code that disables socket
timestamps (net_enable_timestamp).
After a couple of days of extensive debugging, with git bisect failing
to pinpoint a culprit, I arrived at the patch
"cgroup: always lock threadgroup during migration" as the one that
triggers the bug.
The problem is that all this patch does is call threadgroup_lock
every time, instead of conditionally. In that sense, it of course did not
create the bug, only made it (fortunately) always visible.
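As an illustration only (this is not kernel code, and not how lockdep is
implemented), the effect can be modelled with a toy lock-order graph. The
sketch assumes the usual lockdep convention that the chain is printed in
reverse order, so #0 (cgroup_mutex) is the lock being acquired while #4
(cpu_hotplug.lock) is held, and it assumes that threadgroup_lock is what
takes cred_guard_mutex under cgroup_mutex. The class and function names
are hypothetical.

```python
class LockOrderGraph:
    """Toy lock-order validator: records 'B taken while A is held' edges."""

    def __init__(self):
        self.edges = {}  # lock name -> set of locks taken while it was held

    def _reachable(self, src, dst):
        # Depth-first search over the recorded ordering edges.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges.get(node, ()))
        return False

    def acquire(self, held, new):
        """Record held -> new. Return False if that would close a cycle."""
        if self._reachable(new, held):
            return False
        self.edges.setdefault(held, set()).add(new)
        return True


# Chain from the log, read from #0 up to #4 (assumed direction, see above).
CHAIN = [
    "cgroup_mutex",
    "cred_guard_mutex",
    "sk_lock-AF_INET",
    "jump_label_mutex",
    "cpu_hotplug.lock",
]


def replay(with_threadgroup_lock):
    """Replay the chain, then take cgroup_mutex under cpu_hotplug.lock."""
    graph = LockOrderGraph()
    pairs = list(zip(CHAIN, CHAIN[1:]))
    if not with_threadgroup_lock:
        # Model the old conditional path: the cgroup_mutex ->
        # cred_guard_mutex edge is never recorded.
        pairs = pairs[1:]
    for held, new in pairs:
        graph.acquire(held, new)
    return graph.acquire("cpu_hotplug.lock", "cgroup_mutex")


print(replay(with_threadgroup_lock=True))   # False: the cycle is reported
print(replay(with_threadgroup_lock=False))  # True: no cycle is seen
```

In this model the final acquisition is only flagged once the
cgroup_mutex -> cred_guard_mutex edge has been recorded, which matches the
observation that taking threadgroup_lock unconditionally makes the warning
show up every time.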
Thing is, I honestly don't know what would be a fix for this bug.
We could hold the threadgroup_lock before the cgroup_lock, but that
would hold it for way too long.
This is just another incarnation of the cgroup_lock creating nasty
dependencies with virtually everything else, because we hold it for
everything we do. I fear we'll fix this, and another one will just show
up at any time.
What do you think, Tejun?