cgroups.vger.kernel.org archive mirror
* "Regression" with cd3d09527537
@ 2012-06-26 12:43 Glauber Costa
From: Glauber Costa @ 2012-06-26 12:43 UTC (permalink / raw)
  To: Tejun Heo, Cgroups, linux-kernel

Hi,

I've recently started seeing a lockdep warning at the end of *every* 
"init 0" issued on my machine. Reboots are fine, which is probably why 
I hadn't noticed it earlier. The log is quite extensive, but it shows 
the following dependency chain:

[   83.982111] -> #4 (cpu_hotplug.lock){+.+.+.}:
[...]
[   83.982111] -> #3 (jump_label_mutex){+.+...}:
[...]
[   83.982111] -> #2 (sk_lock-AF_INET){+.+.+.}:
[...]
[   83.982111] -> #1 (&sig->cred_guard_mutex){+.+.+.}:
[...]
[   83.982111] -> #0 (cgroup_mutex){+.+.+.}:

I've recently fixed bugs in the lock ordering that cpusets impose on
cpu_hotplug.lock through jump_label_mutex, and initially thought this 
was the same kind of issue. It was not.

I've omitted the full backtrace for readability. I ran this with every 
cgroup controller disabled except cpuset, so it can't be sock memcg 
(after my initial reaction of "oh, fuck, not again"). That jump label 
has been there for years; it comes from the code that enables and 
disables socket timestamps (net_enable_timestamp).

After a couple of days of extensive debugging, with git bisect failing 
to pinpoint a culprit, I narrowed it down to the patch
"cgroup: always lock threadgroup during migration" as the one that
triggers the bug.

The catch is that this patch makes us call threadgroup_lock every time
instead of conditionally. In that sense it did not create the bug; it 
only made it (fortunately) always visible.

Thing is, I honestly don't know what the fix for this bug would be.
We could take threadgroup_lock before cgroup_lock, but then we would 
hold it for way too long.

This is just another incarnation of cgroup_lock creating nasty 
dependencies with virtually everything else, because we hold it for 
everything we do. I fear we'll fix this one, and another will just pop 
up at any time.

What do you think, Tejun?


Thread overview: 3+ messages
2012-06-26 12:43 "Regression" with cd3d09527537 Glauber Costa
2012-06-27 23:08   ` Tejun Heo
2012-06-27 23:07       ` Glauber Costa
