All of lore.kernel.org
 help / color / mirror / Atom feed
From: Serge Hallyn <serge.hallyn@canonical.com>
To: "Bruno Prémont" <bonbons@linux-vserver.org>
Cc: Daniel Lezcano <daniel.lezcano@free.fr>,
	containers@lists.linux-foundation.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	LXC Development <Lxc-devel@lists.sourceforge.net>
Subject: Re: [RFC] catching sys_reboot syscall
Date: Thu, 11 Aug 2011 13:10:23 -0500	[thread overview]
Message-ID: <20110811181022.GA12307@peqn> (raw)
In-Reply-To: <20110811190456.77ff9280@neptune.home>

Quoting Bruno Prémont (bonbons@linux-vserver.org):
> On Thu, 11 August 2011 Daniel Lezcano <daniel.lezcano@free.fr> wrote:
> > On 08/11/2011 06:30 PM, Bruno Prémont wrote:
> > > On Wed, 10 August 2011 Daniel Lezcano <daniel.lezcano@free.fr> wrote:
> > >> On 08/10/2011 10:10 PM, Bruno Prémont wrote:
> > >>> Hi Daniel,
> > >>>
> > >>> [I'm adding containers ml as we had a discussion there some time ago
> > >>>  for this feature]
> > >> [ ... ]
> > >>
> > >>>> +    if (cmd == LINUX_REBOOT_CMD_RESTART2)
> > >>>> +        if (strncpy_from_user(&buffer[0], arg, sizeof(buffer) - 1) < 0)
> > >>>> +            return -EFAULT;
> > >>>> +
> > >>>> +    /* If we are not in the initial pid namespace, we send a signal
> > >>>> +     * to the parent of this init pid namespace, notifying a shutdown
> > >>>> +     * occured */
> > >>>> +    if (pid_ns != &init_pid_ns)
> > >>>> +        pid_namespace_reboot(pid_ns, cmd, buffer);
> > >>> Should there be a return here?
> > >>> Or does pid_namespace_reboot() never return by submitting signal to
> > >>> parent?
> > >> Yes, it does not return a value, like 'do_notify_parent_cldstop'
> > > So execution flow continues reaching the whole "host reboot code"?
> > >
> > > That's not so good as it then prevents using CAP_SYS_BOOT inside PID namespace
> > > to limit access to rebooting the container from inside as giving a process
> > > inside container CAP_SYS_BOOT would cause host to reboot (and when not given
> > > process inside container would get -EPERM in all cases).
> > >
> > > Wouldn't the following be better?:
> > > ...
> > > +
> > > +    /* We only trust the superuser with rebooting the system. */
> > > +    if (!capable(CAP_SYS_BOOT))
> > > +        return -EPERM;
> > > +
> > > +    /* If we are not in the initial pid namespace, we send a signal
> > > +     * to the parent of this init pid namespace, notifying a shutdown
> > > +     * occured */
> > > +    if (pid_ns != &init_pid_ns) {
> > > +        pid_namespace_reboot(pid_ns, cmd, buffer);
> > > +        return 0;
> > > +    }
> > > +
> > >      mutex_lock(&reboot_mutex);
> > >      switch (cmd) {
> > > ...
> > >
> > >
> > > If I misunderstood, please correct me.
> > 
> > Yep, this is what I did at the beginning but I realized I was closing
> > the door for future applications using the pid namespaces. The pid
> > namespace could be used by another kind of application, not a container,
> > running some administrative tasks so they may want to shutdown the host
> > from a different pid namespace.
> > 
> > For this reason, to prevent this execution flow, the container has to
> > drop the CAP_SYS_BOOT in addition of taking care of the SIGCHLD signal
> > with CLDREBOOT.
> 
> Ok, though for later source code readers to know adding/extending comment
> would be nice.
> Maybe something like
> 
> +    /* If we are not in the initial pid namespace, we send a signal
> +     * to the parent of this init pid namespace, notifying a shutdown
> +     * occured
> +     * NOTE: if process has CAP_SYS_BOOT it will additionally have the
> +     * same effect as if it was not namespaced */
> 
> 
> How would all of this integrate with the ongoing work on user namespaces?
> Maybe that one should later be the differentiator for who may or may not
> trigger the host reboot.

Right, then you'll be able to do:

	if (ns_capable(current_pid_ns()->user_ns, CAP_SYS_BOOT)) {
		// do container reboot stuff
	}

	if (!capable(CAP_SYS_BOOT)
		return;

	// do host reboot stuff

The first checks whether the task has privilege against the user ns
which owns his pid_ns.

The second one is a synonym for

	if (!ns_capable(&init_user_ns, CAP_SYS_BOOT))

where init_user_ns is the owner of all physical resources.

Right now pid_ns doesn't yet have a user_ns owner (except in my patch over
here: http://kernel.ubuntu.com/git?p=serge/linux-syslogns.git;a=commit;h=63556e9a39bcd75ec4a88333425800905013c73e )
and, if it did, well you can't yet do enough in a user namespace to run
a container.  But that'll be the ideal.  Hopefully soon...

-serge

  parent reply	other threads:[~2011-08-11 18:10 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-08-08 21:14 [RFC] catching sys_reboot syscall Daniel Lezcano
     [not found] ` <4E4051A0.8030009-GANU6spQydw@public.gmane.org>
2011-08-10 20:10   ` Bruno Prémont
2011-08-10 20:10     ` Bruno Prémont
2011-08-10 20:49     ` Daniel Lezcano
     [not found]       ` <4E42EEE3.9050608-GANU6spQydw@public.gmane.org>
2011-08-11 16:30         ` Bruno Prémont
2011-08-11 16:30       ` Bruno Prémont
     [not found]         ` <20110811183027.49275b2d-hY15tx4IgV39zxVx7UNMDg@public.gmane.org>
2011-08-11 16:49           ` Daniel Lezcano
2011-08-11 16:49         ` Daniel Lezcano
2011-08-11 17:04           ` Bruno Prémont
     [not found]             ` <20110811190456.77ff9280-hY15tx4IgV39zxVx7UNMDg@public.gmane.org>
2011-08-11 18:10               ` [lxc-devel] " Daniel Lezcano
2011-08-11 18:10               ` Serge Hallyn
2011-08-11 18:10             ` [lxc-devel] " Daniel Lezcano
2011-08-11 18:10             ` Serge Hallyn [this message]
2011-08-11 18:40               ` [PATCH] add pid->user_ns Serge Hallyn
2011-08-11 18:40               ` Serge Hallyn
     [not found]           ` <4E44082F.6040606-GANU6spQydw@public.gmane.org>
2011-08-11 17:04             ` [RFC] catching sys_reboot syscall Bruno Prémont
     [not found]     ` <20110810221028.2e0c8590-hY15tx4IgV39zxVx7UNMDg@public.gmane.org>
2011-08-10 20:49       ` Daniel Lezcano
2011-08-20 11:03 ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110811181022.GA12307@peqn \
    --to=serge.hallyn@canonical.com \
    --cc=Lxc-devel@lists.sourceforge.net \
    --cc=bonbons@linux-vserver.org \
    --cc=containers@lists.linux-foundation.org \
    --cc=daniel.lezcano@free.fr \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.