Re: [RFC][PATCH] CGroups: Add a per-subsystem hierarchy lock

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Max Krasnyansky <maxk@qualcomm.com>
To: Paul Menage <menage@google.com>
Cc: a.p.zijlstra@chello.nl, pj@sgi.com, vegard.nossum@gmail.com,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH] CGroups: Add a per-subsystem hierarchy lock
Date: Tue, 01 Jul 2008 20:55:31 -0700	[thread overview]
Message-ID: <486AFC33.7010901@qualcomm.com> (raw)
In-Reply-To: <6599ad830806271702y2c1df2edh3a75dda75336ddad@mail.gmail.com>

Hi Paul,

Sorry for the delay.

Paul Menage wrote:
> On Fri, Jun 27, 2008 at 12:59 AM, Paul Menage <menage@google.com> wrote:
>> This patch adds a hierarchy_mutex to the cgroup_subsys object that
>> protects changes to the hierarchy observed by that subsystem. It is
>> taken by the cgroup subsystem for the following operations:
>>
>> - linking a cgroup into that subsystem's cgroup tree
>> - unlinking a cgroup from that subsystem's cgroup tree
>> - moving the subsystem to/from a hierarchy (including across the
>>  bind() callback)
>>
>> Thus if the subsystem holds its own hierarchy_mutex, it can safely
>> traverse its ts own hierarchy.
>>
>
> It struck me that there's a small race in this code now that we're not
> doing cgroup_lock() in the hotplug path.
>
> - we start to attach a task T to cpuset C, with a single CPU "X" in
> its "cpus" list
> - cpuset_can_attach() returns successfully since the cpuset has a cpu
> - CPU X gets hot-unplugged; any tasks in C are moved to their parent
> cpuset and C loses its cpu.
> - we update T->cgroups to point to C, which is broken since C has no cpus.
>
> So we'll need some additional locking work on top of this - but I
> think this patch is still a step in the right direction.
I was about to say "yeah, looks good" and then tried a couple of
different hot-plug scenarious.
We still have circular locking even with your patch

[ INFO: possible circular locking dependency detected ]
2.6.26-rc8 #4
-------------------------------------------------------
bash/2779 is trying to acquire lock:
 (&cpu_hotplug.lock){--..}, at: [<ffffffff8025e024>] get_online_cpus+0x24/0x40

but task is already holding lock:
 (sched_domains_mutex){--..}, at: [<ffffffff8022f5e9>]
partition_sched_domains+0x29/0x2b0

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (sched_domains_mutex){--..}:
       [<ffffffff8025988f>] __lock_acquire+0x9cf/0xe50
       [<ffffffff80259d6b>] lock_acquire+0x5b/0x80
       [<ffffffff804d12c4>] mutex_lock_nested+0x94/0x250
       [<ffffffff8022f5e9>] partition_sched_domains+0x29/0x2b0
       [<ffffffff80268f9d>] rebuild_sched_domains+0x9d/0x3f0
       [<ffffffff80269f05>] cpuset_handle_cpuhp+0x205/0x220
       [<ffffffff804d688f>] notifier_call_chain+0x3f/0x80
       [<ffffffff80250679>] __raw_notifier_call_chain+0x9/0x10
       [<ffffffff804c1748>] _cpu_down+0xa8/0x290
       [<ffffffff804c196b>] cpu_down+0x3b/0x60
       [<ffffffff804c2c68>] store_online+0x48/0xa0
       [<ffffffff803a46c4>] sysdev_store+0x24/0x30
       [<ffffffff802eebba>] sysfs_write_file+0xca/0x140
       [<ffffffff8029cb3b>] vfs_write+0xcb/0x170
       [<ffffffff8029ccd0>] sys_write+0x50/0x90
       [<ffffffff8020b92b>] system_call_after_swapgs+0x7b/0x80
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #1 (&ss->hierarchy_mutex){--..}:
       [<ffffffff8025988f>] __lock_acquire+0x9cf/0xe50
       [<ffffffff80259d6b>] lock_acquire+0x5b/0x80
       [<ffffffff804d12c4>] mutex_lock_nested+0x94/0x250
       [<ffffffff80269d39>] cpuset_handle_cpuhp+0x39/0x220
       [<ffffffff804d688f>] notifier_call_chain+0x3f/0x80
       [<ffffffff80250679>] __raw_notifier_call_chain+0x9/0x10
       [<ffffffff804c1748>] _cpu_down+0xa8/0x290
       [<ffffffff804c196b>] cpu_down+0x3b/0x60
       [<ffffffff804c2c68>] store_online+0x48/0xa0
       [<ffffffff803a46c4>] sysdev_store+0x24/0x30
       [<ffffffff802eebba>] sysfs_write_file+0xca/0x140
       [<ffffffff8029cb3b>] vfs_write+0xcb/0x170
       [<ffffffff8029ccd0>] sys_write+0x50/0x90
       [<ffffffff8020b92b>] system_call_after_swapgs+0x7b/0x80
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #0 (&cpu_hotplug.lock){--..}:
       [<ffffffff80259913>] __lock_acquire+0xa53/0xe50
       [<ffffffff80259d6b>] lock_acquire+0x5b/0x80
       [<ffffffff804d12c4>] mutex_lock_nested+0x94/0x250
       [<ffffffff8025e024>] get_online_cpus+0x24/0x40
       [<ffffffff8022fee1>] sched_getaffinity+0x11/0x80
       [<ffffffff8026e6d9>] __synchronize_sched+0x19/0x90
       [<ffffffff8022ed46>] detach_destroy_domains+0x46/0x50
       [<ffffffff8022f6b9>] partition_sched_domains+0xf9/0x2b0
       [<ffffffff80268f9d>] rebuild_sched_domains+0x9d/0x3f0
       [<ffffffff8026a858>] cpuset_common_file_write+0x2b8/0x5c0
       [<ffffffff8026657c>] cgroup_file_write+0x7c/0x1a0
       [<ffffffff8029cb3b>] vfs_write+0xcb/0x170
       [<ffffffff8029ccd0>] sys_write+0x50/0x90
       [<ffffffff8020b92b>] system_call_after_swapgs+0x7b/0x80
       [<ffffffffffffffff>] 0xffffffffffffffff

other info that might help us debug this:

2 locks held by bash/2779:
 #0:  (cgroup_mutex){--..}, at: [<ffffffff802653e2>] cgroup_lock+0x12/0x20
 #1:  (sched_domains_mutex){--..}, at: [<ffffffff8022f5e9>]
partition_sched_domains+0x29/0x2b0

stack backtrace:
Pid: 2779, comm: bash Not tainted 2.6.26-rc8 #4

Call Trace:
 [<ffffffff80258c0c>] print_circular_bug_tail+0x8c/0x90
 [<ffffffff802589c4>] ? print_circular_bug_entry+0x54/0x60
 [<ffffffff80259913>] __lock_acquire+0xa53/0xe50
 [<ffffffff8025e024>] ? get_online_cpus+0x24/0x40
 [<ffffffff80259d6b>] lock_acquire+0x5b/0x80
 [<ffffffff8025e024>] ? get_online_cpus+0x24/0x40
 [<ffffffff804d12c4>] mutex_lock_nested+0x94/0x250
 [<ffffffff8025867d>] ? mark_held_locks+0x4d/0x90
 [<ffffffff8025e024>] get_online_cpus+0x24/0x40
 [<ffffffff8022fee1>] sched_getaffinity+0x11/0x80
 [<ffffffff8026e6d9>] __synchronize_sched+0x19/0x90
 [<ffffffff8022ed46>] detach_destroy_domains+0x46/0x50
 [<ffffffff8022f6b9>] partition_sched_domains+0xf9/0x2b0
 [<ffffffff80258801>] ? trace_hardirqs_on+0xc1/0xe0
 [<ffffffff80268f9d>] rebuild_sched_domains+0x9d/0x3f0
 [<ffffffff8026a858>] cpuset_common_file_write+0x2b8/0x5c0
 [<ffffffff80268c00>] ? cpuset_test_cpumask+0x0/0x20
 [<ffffffff80269f20>] ? cpuset_change_cpumask+0x0/0x20
 [<ffffffff80265260>] ? started_after+0x0/0x50
 [<ffffffff8026657c>] cgroup_file_write+0x7c/0x1a0
 [<ffffffff8029cb3b>] vfs_write+0xcb/0x170
 [<ffffffff8029ccd0>] sys_write+0x50/0x90
 [<ffffffff8020b92b>] system_call_after_swapgs+0x7b/0x80

CPU3 attaching NULL sched-domain.

next prev parent reply	other threads:[~2008-07-02  3:55 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-27  7:59 [RFC][PATCH] CGroups: Add a per-subsystem hierarchy lock Paul Menage
2008-06-28  0:02 ` Paul Menage
2008-07-02  3:55   ` Max Krasnyansky [this message]
2008-07-02 22:31     ` Paul Menage
2008-07-03  4:46       ` Max Krasnyansky
2008-07-10 17:26         ` Max Krasnyansky
2008-07-11  0:32         ` Paul Menage
2008-07-12 22:48           ` Max Krasnyansky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=486AFC33.7010901@qualcomm.com \
    --to=maxk@qualcomm.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=linux-kernel@vger.kernel.org \
    --cc=menage@google.com \
    --cc=pj@sgi.com \
    --cc=vegard.nossum@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.