public inbox for linux-kernel@vger.kernel.org
From: Max Krasnyansky <maxk@qualcomm.com>
To: Paul Menage <menage@google.com>
Cc: a.p.zijlstra@chello.nl, pj@sgi.com, vegard.nossum@gmail.com,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH] CGroups: Add a per-subsystem hierarchy lock
Date: Tue, 01 Jul 2008 20:55:31 -0700	[thread overview]
Message-ID: <486AFC33.7010901@qualcomm.com> (raw)
In-Reply-To: <6599ad830806271702y2c1df2edh3a75dda75336ddad@mail.gmail.com>

Hi Paul,

Sorry for the delay.

Paul Menage wrote:
> On Fri, Jun 27, 2008 at 12:59 AM, Paul Menage <menage@google.com> wrote:
>> This patch adds a hierarchy_mutex to the cgroup_subsys object that
>> protects changes to the hierarchy observed by that subsystem. It is
>> taken by the cgroup subsystem for the following operations:
>>
>> - linking a cgroup into that subsystem's cgroup tree
>> - unlinking a cgroup from that subsystem's cgroup tree
>> - moving the subsystem to/from a hierarchy (including across the
>>  bind() callback)
>>
>> Thus if the subsystem holds its own hierarchy_mutex, it can safely
>> traverse its own hierarchy.
>>
>
> It struck me that there's a small race in this code now that we're not
> doing cgroup_lock() in the hotplug path.
>
> - we start to attach a task T to cpuset C, with a single CPU "X" in
> its "cpus" list
> - cpuset_can_attach() returns successfully since the cpuset has a cpu
> - CPU X gets hot-unplugged; any tasks in C are moved to their parent
> cpuset and C loses its cpu.
> - we update T->cgroups to point to C, which is broken since C has no cpus.
>
> So we'll need some additional locking work on top of this - but I
> think this patch is still a step in the right direction.
I was about to say "yeah, looks good", but then I tried a couple of
different hot-plug scenarios. We still have circular locking even with
your patch:

[ INFO: possible circular locking dependency detected ]
2.6.26-rc8 #4
-------------------------------------------------------
bash/2779 is trying to acquire lock:
 (&cpu_hotplug.lock){--..}, at: [<ffffffff8025e024>] get_online_cpus+0x24/0x40

but task is already holding lock:
 (sched_domains_mutex){--..}, at: [<ffffffff8022f5e9>]
partition_sched_domains+0x29/0x2b0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (sched_domains_mutex){--..}:
       [<ffffffff8025988f>] __lock_acquire+0x9cf/0xe50
       [<ffffffff80259d6b>] lock_acquire+0x5b/0x80
       [<ffffffff804d12c4>] mutex_lock_nested+0x94/0x250
       [<ffffffff8022f5e9>] partition_sched_domains+0x29/0x2b0
       [<ffffffff80268f9d>] rebuild_sched_domains+0x9d/0x3f0
       [<ffffffff80269f05>] cpuset_handle_cpuhp+0x205/0x220
       [<ffffffff804d688f>] notifier_call_chain+0x3f/0x80
       [<ffffffff80250679>] __raw_notifier_call_chain+0x9/0x10
       [<ffffffff804c1748>] _cpu_down+0xa8/0x290
       [<ffffffff804c196b>] cpu_down+0x3b/0x60
       [<ffffffff804c2c68>] store_online+0x48/0xa0
       [<ffffffff803a46c4>] sysdev_store+0x24/0x30
       [<ffffffff802eebba>] sysfs_write_file+0xca/0x140
       [<ffffffff8029cb3b>] vfs_write+0xcb/0x170
       [<ffffffff8029ccd0>] sys_write+0x50/0x90
       [<ffffffff8020b92b>] system_call_after_swapgs+0x7b/0x80
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #1 (&ss->hierarchy_mutex){--..}:
       [<ffffffff8025988f>] __lock_acquire+0x9cf/0xe50
       [<ffffffff80259d6b>] lock_acquire+0x5b/0x80
       [<ffffffff804d12c4>] mutex_lock_nested+0x94/0x250
       [<ffffffff80269d39>] cpuset_handle_cpuhp+0x39/0x220
       [<ffffffff804d688f>] notifier_call_chain+0x3f/0x80
       [<ffffffff80250679>] __raw_notifier_call_chain+0x9/0x10
       [<ffffffff804c1748>] _cpu_down+0xa8/0x290
       [<ffffffff804c196b>] cpu_down+0x3b/0x60
       [<ffffffff804c2c68>] store_online+0x48/0xa0
       [<ffffffff803a46c4>] sysdev_store+0x24/0x30
       [<ffffffff802eebba>] sysfs_write_file+0xca/0x140
       [<ffffffff8029cb3b>] vfs_write+0xcb/0x170
       [<ffffffff8029ccd0>] sys_write+0x50/0x90
       [<ffffffff8020b92b>] system_call_after_swapgs+0x7b/0x80
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #0 (&cpu_hotplug.lock){--..}:
       [<ffffffff80259913>] __lock_acquire+0xa53/0xe50
       [<ffffffff80259d6b>] lock_acquire+0x5b/0x80
       [<ffffffff804d12c4>] mutex_lock_nested+0x94/0x250
       [<ffffffff8025e024>] get_online_cpus+0x24/0x40
       [<ffffffff8022fee1>] sched_getaffinity+0x11/0x80
       [<ffffffff8026e6d9>] __synchronize_sched+0x19/0x90
       [<ffffffff8022ed46>] detach_destroy_domains+0x46/0x50
       [<ffffffff8022f6b9>] partition_sched_domains+0xf9/0x2b0
       [<ffffffff80268f9d>] rebuild_sched_domains+0x9d/0x3f0
       [<ffffffff8026a858>] cpuset_common_file_write+0x2b8/0x5c0
       [<ffffffff8026657c>] cgroup_file_write+0x7c/0x1a0
       [<ffffffff8029cb3b>] vfs_write+0xcb/0x170
       [<ffffffff8029ccd0>] sys_write+0x50/0x90
       [<ffffffff8020b92b>] system_call_after_swapgs+0x7b/0x80
       [<ffffffffffffffff>] 0xffffffffffffffff

other info that might help us debug this:

2 locks held by bash/2779:
 #0:  (cgroup_mutex){--..}, at: [<ffffffff802653e2>] cgroup_lock+0x12/0x20
 #1:  (sched_domains_mutex){--..}, at: [<ffffffff8022f5e9>]
partition_sched_domains+0x29/0x2b0

stack backtrace:
Pid: 2779, comm: bash Not tainted 2.6.26-rc8 #4

Call Trace:
 [<ffffffff80258c0c>] print_circular_bug_tail+0x8c/0x90
 [<ffffffff802589c4>] ? print_circular_bug_entry+0x54/0x60
 [<ffffffff80259913>] __lock_acquire+0xa53/0xe50
 [<ffffffff8025e024>] ? get_online_cpus+0x24/0x40
 [<ffffffff80259d6b>] lock_acquire+0x5b/0x80
 [<ffffffff8025e024>] ? get_online_cpus+0x24/0x40
 [<ffffffff804d12c4>] mutex_lock_nested+0x94/0x250
 [<ffffffff8025867d>] ? mark_held_locks+0x4d/0x90
 [<ffffffff8025e024>] get_online_cpus+0x24/0x40
 [<ffffffff8022fee1>] sched_getaffinity+0x11/0x80
 [<ffffffff8026e6d9>] __synchronize_sched+0x19/0x90
 [<ffffffff8022ed46>] detach_destroy_domains+0x46/0x50
 [<ffffffff8022f6b9>] partition_sched_domains+0xf9/0x2b0
 [<ffffffff80258801>] ? trace_hardirqs_on+0xc1/0xe0
 [<ffffffff80268f9d>] rebuild_sched_domains+0x9d/0x3f0
 [<ffffffff8026a858>] cpuset_common_file_write+0x2b8/0x5c0
 [<ffffffff80268c00>] ? cpuset_test_cpumask+0x0/0x20
 [<ffffffff80269f20>] ? cpuset_change_cpumask+0x0/0x20
 [<ffffffff80265260>] ? started_after+0x0/0x50
 [<ffffffff8026657c>] cgroup_file_write+0x7c/0x1a0
 [<ffffffff8029cb3b>] vfs_write+0xcb/0x170
 [<ffffffff8029ccd0>] sys_write+0x50/0x90
 [<ffffffff8020b92b>] system_call_after_swapgs+0x7b/0x80

CPU3 attaching NULL sched-domain.





Thread overview: 8+ messages
2008-06-27  7:59 [RFC][PATCH] CGroups: Add a per-subsystem hierarchy lock Paul Menage
2008-06-28  0:02 ` Paul Menage
2008-07-02  3:55   ` Max Krasnyansky [this message]
2008-07-02 22:31     ` Paul Menage
2008-07-03  4:46       ` Max Krasnyansky
2008-07-10 17:26         ` Max Krasnyansky
2008-07-11  0:32         ` Paul Menage
2008-07-12 22:48           ` Max Krasnyansky
