"Regression" with cd3d09527537

From: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
To: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Cgroups <cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	linux-kernel
	<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: "Regression" with cd3d09527537
Date: Tue, 26 Jun 2012 16:43:03 +0400
Message-ID: <4FE9AE57.4090007@parallels.com>

Hi,

I've recently started seeing a lockdep warning at the end of *every*
"init 0" issued on my machine. Reboots are fine, which is probably why
I hadn't seen it earlier. The log is quite extensive, but it shows the
following dependency chain:

[   83.982111] -> #4 (cpu_hotplug.lock){+.+.+.}:
[...]
[   83.982111] -> #3 (jump_label_mutex){+.+...}:
[...]
[   83.982111] -> #2 (sk_lock-AF_INET){+.+.+.}:
[...]
[   83.982111] -> #1 (&sig->cred_guard_mutex){+.+.+.}:
[...]
[   83.982111] -> #0 (cgroup_mutex){+.+.+.}:
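
For readers less familiar with lockdep, the following minimal Python sketch models the check it performs on a chain like the one above. It assumes the usual reading of the splat: each `-> #n` entry records lock n being taken while lock n-1 was held, so the existing chain runs from cgroup_mutex (#0) to cpu_hotplug.lock (#4), and the new acquisition is cgroup_mutex while cpu_hotplug.lock is held. Lock names are abbreviated from the log, the helper names are hypothetical, and this is not lockdep's actual implementation.

```python
# Hypothetical model of lockdep's circular-dependency check.
# An edge (a, b) means "b was acquired while a was held".
from collections import defaultdict

# Existing chain from the splat, read from #0 to #4 (assumed ordering).
CHAIN = [
    "cgroup_mutex",
    "cred_guard_mutex",
    "sk_lock-AF_INET",
    "jump_label_mutex",
    "cpu_hotplug.lock",
]


def build_graph(chain):
    """Turn an ordered chain of lock names into a dependency graph."""
    graph = defaultdict(set)
    for held, taken in zip(chain, chain[1:]):
        graph[held].add(taken)
    return graph


def reaches(graph, src, dst):
    """Depth-first search: is there a path src -> ... -> dst?"""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return False


def would_deadlock(graph, held, acquiring):
    """Adding held -> acquiring closes a cycle if acquiring already reaches held."""
    return reaches(graph, acquiring, held)


graph = build_graph(CHAIN)
# Assumed new acquisition: cgroup_mutex taken while holding cpu_hotplug.lock.
print(would_deadlock(graph, "cpu_hotplug.lock", "cgroup_mutex"))  # True
```

In this model, taking cpu_hotplug.lock while holding cgroup_mutex follows the existing chain and is harmless; only the reverse order closes the loop.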

I've recently fixed bugs in the lock ordering that cpusets impose on
cpu_hotplug.lock through jump_label_mutex, and initially thought this
was the same kind of issue. It was not.

I've omitted the full backtrace for readability. I ran this with every
cgroup controller disabled except cpuset, so it can't be sock memcg
(after my initial reaction of "oh, fuck, not again"). That jump_label
has been there for years, and it comes from the code that toggles
socket timestamps (net_enable_timestamp).

After a couple of days of extensive debugging, with git bisect failing
to pinpoint a culprit, I narrowed it down to commit cd3d09527537,
"cgroup: always lock threadgroup during migration", as the one that
triggers the bug.

What this patch does is call threadgroup_lock every time instead of
only conditionally. In that sense it did not create the bug; it only
made it (fortunately) always visible.
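
The observation that the patch exposed the bug rather than created it follows from how lockdep works: it only records an ordering when some code path actually takes both locks. Below is a hypothetical Python sketch of that effect. `always_lock` stands in for the unconditional threadgroup_lock call introduced by cd3d09527537, `condition` for the old gating condition (not modelled), and threadgroup_lock is represented by cred_guard_mutex, matching the #1 entry in the chain (an assumption of this sketch).

```python
# Hypothetical sketch: a tracker that, like lockdep, only learns an
# ordering when a code path actually takes both locks at run time.


class Tracker:
    def __init__(self):
        self.held = []      # locks currently held, in acquisition order
        self.edges = set()  # (held_lock, new_lock) pairs observed so far

    def acquire(self, name):
        # Record "name taken while h held" for every lock already held.
        for h in self.held:
            self.edges.add((h, name))
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)


def migrate(tracker, always_lock, condition):
    """Toy migration path: cgroup_mutex, then (maybe) threadgroup_lock.

    threadgroup_lock is modelled as taking cred_guard_mutex (assumption);
    `condition` stands in for whatever gated the call before the patch.
    """
    tracker.acquire("cgroup_mutex")
    if always_lock or condition:
        tracker.acquire("cred_guard_mutex")
        tracker.release("cred_guard_mutex")
    tracker.release("cgroup_mutex")


EDGE = ("cgroup_mutex", "cred_guard_mutex")

before = Tracker()
migrate(before, always_lock=False, condition=False)
after = Tracker()
migrate(after, always_lock=True, condition=False)

print(EDGE in before.edges)  # False
print(EDGE in after.edges)   # True
```

The lock order is the same whenever the lock is taken; what changes is how often the path runs, which is consistent with the warning now showing up reliably.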

Honestly, I don't know what the right fix for this bug would be.
We could take threadgroup_lock before cgroup_lock, but then we would
hold it for far too long.
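
The reordering suggested above can be checked mechanically on a small, self-contained model. The sketch below assumes threadgroup_lock is represented by cred_guard_mutex, includes only the five locks from the splat, and treats taking cgroup_mutex under cpu_hotplug.lock as the new acquisition; the helper names are hypothetical. It says nothing about the hold-time cost, and a real kernel has many more edges than this graph.

```python
# Hypothetical check: does reversing the cgroup_mutex -> cred_guard_mutex
# ordering remove the cycle from the five-lock chain in the splat?
from collections import defaultdict


def has_cycle(edges):
    """Detect a cycle in a directed graph with an iterative DFS colouring."""
    graph = defaultdict(list)
    for a, b in edges:
        graph[a].append(b)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)  # every node starts WHITE (unvisited)
    for start in list(graph):
        if colour[start] != WHITE:
            continue
        colour[start] = GREY
        stack = [(start, iter(graph[start]))]
        while stack:
            node, children = stack[-1]
            child = next(children, None)
            if child is None:
                colour[node] = BLACK  # all descendants explored
                stack.pop()
            elif colour[child] == GREY:
                return True  # back edge: child is on the current path
            elif colour[child] == WHITE:
                colour[child] = GREY
                stack.append((child, iter(graph[child])))
    return False


# Edges from the splat (a -> b: b taken while a held), plus the assumed
# new acquisition of cgroup_mutex under cpu_hotplug.lock.
CURRENT = [
    ("cgroup_mutex", "cred_guard_mutex"),
    ("cred_guard_mutex", "sk_lock-AF_INET"),
    ("sk_lock-AF_INET", "jump_label_mutex"),
    ("jump_label_mutex", "cpu_hotplug.lock"),
    ("cpu_hotplug.lock", "cgroup_mutex"),
]

# Same graph, but threadgroup_lock (cred_guard_mutex) is taken first.
REORDERED = [("cred_guard_mutex", "cgroup_mutex")] + CURRENT[1:]

print(has_cycle(CURRENT))    # True
print(has_cycle(REORDERED))  # False
```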

This is just another incarnation of cgroup_lock creating nasty
dependencies with virtually everything else, because we hold it for
everything we do. I fear we'll fix this one and another will pop up
at any time.

What do you think, Tejun?

