Re: [RFC] [PATCH 1/5] cgroups: revamp subsys array

From: Li Zefan <lizf-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
Cc: Ben Blum <bblum-OM76b2Iv3yLQjUSlxSEPGw@public.gmane.org>,
	Paul Menage <menage-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org
Subject: Re: [RFC] [PATCH 1/5] cgroups: revamp subsys array
Date: Thu, 10 Dec 2009 14:00:29 +0800
Message-ID: <4B208E7D.8020306@cn.fujitsu.com>
In-Reply-To: <20091210051912.GA11893-OM76b2Iv3yLQjUSlxSEPGw@public.gmane.org>

>>> How does this sound as a possible solution, in cgroup_get_sb:
>>>
>>> 1) Take subsys_mutex
>>> 2) Call parse_cgroupfs_options()
>>> 3) Drop subsys_mutex
>>> 4) Call sget(), which gets sb->s_umount without subsys_mutex held
>>> 5) Take subsys_mutex
>>> 6) Call verify_cgroupfs_options()
>>> 7) Proceed as normal
>>>
>>> In which verify_cgroupfs_options will be a new function that ensures the
>>> invariants that rebind_subsystems expects are still there; if not, bail
>>> out by jumping to drop_new_super just as if parse_cgroupfs_options had
>>> failed in the first place.
>>>
>> The current code doesn't need this verify_cgroupfs_options, so why it
>> will become necessary? I think what we need is grab module refcnt in
>> parse_cgroupfs_options, and then we can drop subsys_mutex.
> 
> Oh, good point. I thought pinning the modules had to happen in rebinding
> since there's a case where rebind_subsystems is called without parsing,
> but that's just in kill_sb where no new subsystems are added. So, better
> would be to make sure we can't get owned while we drop the lock instead
> of checking afterwards if we got owned and bailing if so.
> 
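
To make the pin-at-parse idea concrete, here is a rough userspace model
(not kernel code). The struct and helper names are made up for
illustration; subsys_tryget() only imitates what taking a module
reference would do, and the caller is assumed to hold whatever lock
protects the array while parse_and_pin() runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a loadable subsystem and its module refcount. */
struct subsys {
	const char *name;
	bool loaded;
	int refcnt;
};

/* Imitates taking a module reference: fails once the module is gone. */
static bool subsys_tryget(struct subsys *ss)
{
	if (!ss->loaded)
		return false;
	ss->refcnt++;
	return true;
}

static void subsys_put(struct subsys *ss)
{
	assert(ss->refcnt > 0);
	ss->refcnt--;
}

/* Imitates module unload: refused while anyone holds a reference. */
static bool subsys_unload(struct subsys *ss)
{
	if (ss->refcnt > 0)
		return false;
	ss->loaded = false;
	return true;
}

/*
 * Imitates the parse step: pin every requested subsystem, or none.
 * On failure, references taken so far are dropped in reverse order.
 */
static bool parse_and_pin(struct subsys **wanted, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!subsys_tryget(wanted[i])) {
			while (i-- > 0)
				subsys_put(wanted[i]);
			return false;
		}
	}
	return true;
}
```

Once parse_and_pin() succeeds, the lock can be dropped: an unload
attempt on a pinned subsystem fails until the reference is put, so
there is nothing left to verify afterwards.
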
>> But why you are using a rw semaphore? I think a mutex is fine.
> 
> The "most of cgroups wants to look at the subsys array" versus "module
> loading/unloading modifies the array" is clearly a readers/writers case.
> 

Yes, but that doesn't mean an rw lock or rw semaphore is preferable
to a plain mutex.

- The read side of subsys_mutex is taken mainly at mount/remount/umount;
  the write side is taken in cgroup_load_subsys() and
  cgroup_unload_subsys(). Neither is on a critical path.

- In most callsites, cgroup_mutex is taken right after acquiring
  subsys_mutex.

So what do we gain by using this rw_sem?

>> And why not just use cgroup_mutex to protect the subsys[] array?
>> The adding and spreading of subsys_mutex looks ugly to me.
> 
> The reasoning for this is that there are various chunks of code that
> need to be protected by a mutex guarding subsys[] that aren't already
> under cgroup_mutex - like parse_cgroupfs_options, or the first stage
> of cgroup_load_subsys. Do you think those critical sections are small
> enough that sacrificing reentrancy for simplicity of code is worth it?
> 

Except for parse_cgroupfs_options(), which is called without cgroup_mutex
held, every other callsite takes cgroup_mutex right after acquiring
subsys_mutex.

So yes, I don't think using cgroup_mutex will harm scalability.

In contrast, this subsys_mutex is quite ugly and deadlock-prone.
For example, see this:

static int cgroup_remount(struct super_block *sb, int *flags, char *data)
{
	...
	lock_kernel();
	mutex_lock(&cgrp->dentry->d_inode->i_mutex);
	down_read(&subsys_mutex);
	mutex_lock(&cgroup_mutex);
	...
}

Four locks here!
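
As a rough way to see why every extra lock makes this worse, here is a
toy order checker in plain C. It is only a sketch: the enum names mirror
the locks above, the checker catches direct pairwise inversions only
(nothing like full cycle detection), and any conflicting path fed to it
is hypothetical, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical identifiers named after the locks in cgroup_remount(). */
enum lock_id { BKL, I_MUTEX, SUBSYS_MUTEX, CGROUP_MUTEX, NR_LOCKS };

/* before[a][b] is set once some path has taken b while holding a. */
struct order_graph {
	bool before[NR_LOCKS][NR_LOCKS];
};

/*
 * Record one code path's acquisition order. Returns false if the path
 * takes any pair of locks in the opposite order to an earlier path
 * (the classic ABBA pattern).
 */
static bool record_path(struct order_graph *g, const enum lock_id *path,
			size_t n)
{
	bool ok = true;

	for (size_t i = 0; i < n; i++) {
		assert(path[i] < NR_LOCKS);
		for (size_t j = i + 1; j < n; j++) {
			if (g->before[path[j]][path[i]])
				ok = false;
			g->before[path[i]][path[j]] = true;
		}
	}
	return ok;
}

/* Pairwise orderings that a path nesting n locks commits the code to. */
static size_t ordering_constraints(size_t n)
{
	return n * (n - 1) / 2;
}
```

A path nesting four locks fixes six pairwise orderings that every other
path must respect; with two locks there is only one.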
