From: "Serge E. Hallyn" <serue@us.ibm.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org, containers@lists.osdl.org
Subject: Re: [PATCH 2/3] ipc namespaces: implement support for posix msqueues
Date: Tue, 27 Jan 2009 15:56:08 -0600 [thread overview]
Message-ID: <20090127215608.GA13210@us.ibm.com> (raw)
In-Reply-To: <20090126152839.b35dbb45.akpm@linux-foundation.org>
Quoting Andrew Morton (akpm@linux-foundation.org):
> On Fri, 16 Jan 2009 20:03:32 -0600
> "Serge E. Hallyn" <serue@us.ibm.com> wrote:
>
> > Implement multiple mounts of the mqueue file system, and
> > link it to usage of CLONE_NEWIPC.
> >
> > Each ipc ns has a corresponding mqueuefs superblock. When
> > a user does clone(CLONE_NEWIPC) or unshare(CLONE_NEWIPC), the
> > unshare will cause an internal mount of a new mqueuefs sb
> > linked to the new ipc ns.
> >
> > When a user does 'mount -t mqueue mqueue /dev/mqueue', he
> > mounts the mqueuefs superblock.
> >
> > Posix message queues can be worked with both through the
> > mq_* system calls (see mq_overview(7)), and through the VFS
> > through the mqueue mount. Any usage of mq_open() and friends
> > will work with the acting task's ipc namespace. Any actions
> > through the VFS will work with the mqueuefs in which the
> > file was created. So if a user doesn't remount mqueuefs
> > after unshare(CLONE_NEWIPC), mq_open("/ab") will not be
> > reflected in "ls /dev/mqueue".
> >
> > If task a mounts mqueue for ipc_ns:1, then clones task b with
> > a new ipcns, ipcns:2, and then task a is the last task in
> > ipc_ns:1 to exit, then (1) ipc_ns:1 will be freed, (2) it's
> > superblock will live on until task b umounts the corresponding
> > mqueuefs, and vfs actions will continue to succeed, but (3)
> > sb->s_fs_info will be NULL for the sb corresponding to the
> > deceased ipc_ns:1.
>
> I suppose that stuff like this should find its way into a manpage one
> day. That'll be fun for someone.
Nadia wrote an update for the mq_overview.7 page. But the
semantics have changed a bit since then so I'll need to
revise that.
> > +static int get_sb_single_ns(struct file_system_type *fs_type,
> > + int flags, void *data,
> > + int (*fill_super)(struct super_block *, void *, int),
> > + struct vfsmount *mnt)
> > +{
> > + struct super_block *s;
> > + int error;
> > +
> > + s = sget(fs_type, compare_sb_single_ns, set_sb_single_ns, data);
> > + if (IS_ERR(s))
> > + return PTR_ERR(s);
> > + if (!s->s_root) {
> > + s->s_flags = flags;
> > + error = fill_super(s, data, flags & MS_SILENT ? 1 : 0);
> > + if (error) {
> > + up_write(&s->s_umount);
> > + deactivate_super(s);
> > + return error;
> > + }
> > + s->s_flags |= MS_ACTIVE;
> > + }
> > + do_remount_sb(s, flags, data, 0);
> > + return simple_set_mnt(mnt, s);
> > }
>
> The above doesn't seem specific to mqueue. Is it in the best place?
No, I think there is commonality with devpts and perhaps also proc. Now
that devpts namespaces are upstream i should first move it to using a
more generic helper, then introduce posixmq namespaces as another user.
> > @@ -266,10 +317,12 @@ static void mqueue_delete_inode(struct inode *inode)
> > if (user) {
> > spin_lock(&mq_lock);
> > user->mq_bytes -= mq_bytes;
> > - ipc_ns->mq_queues_count--;
> > + if (ipc_ns)
>
> The reader might be wondering why ipc_ns==NULL is an acceptable state,
> and what that state actually means. This reader is wondering that.
That's actually a central part of how we resolved the dependency
between the mqfs mount (pinned by vfs) and mqns (pinned by nsproxy).
So the mqns pins its mqfs vfsmount, but the mnt->mnt_sb->s_fs_info
is not pinned by the mqns. Rather, s_fs_info is protected by the
mq_lock and either non-NULL and valid, or NULL. The mqns takes
the mq_lock to set it to NULL, and any reader of s_fs_info must
dereference it and pin the mqns under mq_lock.
> > + * put_ipc_ns - drop a reference to an ipc namespace.
> > + * @ns: the namespace to put
> > + *
> > + * If this is the last task in the namespace exiting, and
> > + * it is dropping the refcount to 0, then it can race with
> > + * a task in another ipc namespace but in a mounts namespace
> > + * which has this ipcns's mqueuefs mounted, doing some action
> > + * with one of the mqueuefs files. That can raise the refcount.
> > + * So dropping the refcount, and raising the refcount when
> > + * accessing it through the VFS, are protected with mq_lock.
> > + *
> > + * (Clearly, a task raising the refcount on its own ipc_ns
> > + * needn't take mq_lock since it can't race with the last task
> > + * in the ipcns exiting).
> > + */
> > +void put_ipc_ns(struct ipc_namespace *ns)
> > {
> > - struct ipc_namespace *ns;
> > + if (ns && atomic_dec_and_lock(&ns->count, &mq_lock)) {
> > + mq_clear_sbinfo(ns);
> > + spin_unlock(&mq_lock);
> > + mq_put_mnt(ns);
> > + free_ipc_ns(ns);
> > + }
> > +}
>
> Why are we supporting calls with a NULL arg here?
Hmm, I'm not sure that's necessary. Will look into it.
Every other comment I'll fix in the next version. Thanks
very much.
-serge
next prev parent reply other threads:[~2009-01-27 21:56 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-01-17 2:02 [Patch 0/3] posix mqueue namespace (v14) Serge E. Hallyn
2009-01-17 2:03 ` [PATCH 1/3] mqueue ns: move mqueue_mnt into struct ipc_namespace Serge E. Hallyn
2009-01-26 23:28 ` Andrew Morton
2009-01-17 2:03 ` [PATCH 2/3] ipc namespaces: implement support for posix msqueues Serge E. Hallyn
2009-01-26 23:28 ` Andrew Morton
2009-01-27 21:56 ` Serge E. Hallyn [this message]
2009-01-17 2:05 ` [PATCH 3/3] mqueue namespace: adapt sysctl Serge E. Hallyn
2009-01-26 23:28 ` [Patch 0/3] posix mqueue namespace (v14) Andrew Morton
2009-01-27 21:20 ` Serge E. Hallyn
-- strict thread matches above, loose matches on Subject: below --
2008-12-19 17:19 [RFC patch 0/3] posix mqueue namespace (v13) Serge E. Hallyn
2008-12-19 17:20 ` [PATCH 2/3] ipc namespaces: implement support for posix msqueues Serge E. Hallyn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090127215608.GA13210@us.ibm.com \
--to=serue@us.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=containers@lists.osdl.org \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox