From: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
To: Andy Lutomirski <luto@amacapital.net>
Cc: mtk.manpages@gmail.com,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	lkml <linux-kernel@vger.kernel.org>,
	"linux-man@vger.kernel.org" <linux-man@vger.kernel.org>,
	Linux Containers <containers@lists.linux-foundation.org>,
	richard -rw- weinberger <richard.weinberger@gmail.com>,
	"Serge E. Hallyn" <serge@hallyn.com>
Subject: Re: For review: user_namespace(7) man page
Date: Tue, 09 Sep 2014 07:00:48 -0700
Message-ID: <540F0810.7030408@gmail.com>
In-Reply-To: <CALCETrX2qwvzmeoVcLFLxEK=1Fv+f0Ri0TouzzvbN_rgDjka4A@mail.gmail.com>

Hi Andy, and Eric,

On 09/01/2014 01:57 PM, Andy Lutomirski wrote:
> On Wed, Aug 20, 2014 at 4:36 PM, Michael Kerrisk (man-pages)
> <mtk.manpages@gmail.com> wrote:
>> Hello Eric et al.,
>>
>> For various reasons, my work on the namespaces man pages
>> fell off the table a while back. Nevertheless, the pages have
>> been close to completion for a while now, and I recently restarted,
>> in an effort to finish them. As you also noted to me f2f, there have
>> recently been some small namespace changes that may affect
>> the content of the pages. Therefore, I'll take the opportunity to
>> send the namespace-related pages out for further (final?) review.
>>
>> So, here, I start with the user_namespaces(7) page, which is shown
>> in rendered form below, with source attached to this mail. I'll
>> send various other pages in follow-on mails.
>>
>> Review comments/suggestions for improvements / bug fixes welcome.
>>
>> Cheers,
>>
>> Michael
>>
>> ==
>>
>> NAME
>>        user_namespaces - overview of Linux user namespaces
>>
>> DESCRIPTION
>>        For an overview of namespaces, see namespaces(7).
>>
>>        User   namespaces   isolate   security-related   identifiers  and
>>        attributes, in particular, user IDs and group  IDs  (see  creden‐
>>        tials(7)), the root directory, keys (see keyctl(2)), and capabili‐
> 
> Putting "root directory" here is odd -- that's really part of a
> different namespace.  But user namespaces sort of isolate the other
> namespaces from each other.

I'm trying to remember the details here. I think this piece originally 
came after a discussion with Eric, but I am not sure. Eric?

> Also, ugh, keys.  How did keyctl(2) ever make it through any kind of review?
> 
>>        ties (see capabilities(7)).  A process's user and group  IDs  can
>>        be different inside and outside a user namespace.  In particular,
>>        a process can have a normal unprivileged user ID outside  a  user
>>        namespace while at the same time having a user ID of 0 inside the
>>        namespace; in other words, the process has  full  privileges  for
>>        operations  inside  the  user  namespace, but is unprivileged for
>>        operations outside the namespace.
>>
>>    Nested namespaces, namespace membership
>>        User namespaces can be nested;  that  is,  each  user  namespace—
>>        except  the  initial  ("root") namespace—has a parent user names‐
>>        pace, and can have zero or more child user namespaces.  The  par‐
>>        ent user namespace is the user namespace of the process that cre‐
>>        ates the user namespace via a call to unshare(2) or clone(2) with
>>        the CLONE_NEWUSER flag.
>>
>>        The kernel imposes (since version 3.11) a limit of 32 nested lev‐
>>        els of user namespaces.  Calls to  unshare(2)  or  clone(2)  that
>>        would cause this limit to be exceeded fail with the error EUSERS.
>>
>>        Each  process  is  a  member  of  exactly  one user namespace.  A
>>        process created via fork(2) or clone(2) without the CLONE_NEWUSER
>>        flag  is  a  member  of the same user namespace as its parent.  A
>>        process can join another user namespace with setns(2) if  it  has
>>        the  CAP_SYS_ADMIN  in  that namespace; upon doing so, it gains a
>>        full set of capabilities in that namespace.
>>
>>        A call to clone(2) or  unshare(2)  with  the  CLONE_NEWUSER  flag
>>        makes  the  new  child  process (for clone(2)) or the caller (for
>>        unshare(2)) a member of the new user  namespace  created  by  the
>>        call.
>>
>>    Capabilities
>>        The child process created by clone(2) with the CLONE_NEWUSER flag
>>        starts out with a complete set of capabilities in  the  new  user
>>        namespace.  Likewise, a process that creates a new user namespace
>>        using unshare(2)  or  joins  an  existing  user  namespace  using
>>        setns(2)  gains a full set of capabilities in that namespace.  On
>>        the other hand, that process has no capabilities  in  the  parent
>>        (in  the case of clone(2)) or previous (in the case of unshare(2)
>>        and setns(2)) user namespace, even if the new namespace  is  cre‐
>>        ated  or  joined by the root user (i.e., a process with user ID 0
>>        in the root namespace).
>>
>>        Note that a call to execve(2) will cause a process  to  lose  any
>>        capabilities that it has, unless it has a user ID of 0 within the
>>        namespace.
> 
> Or unless file capabilities have a non-empty inheritable mask.
> 
> It may be worth mentioning that execve in a user namespace works
> exactly like execve outside a userns.


I've reworded that para to say:

       Note that a call to execve(2) will cause a process's  capabili‐
       ties to be recalculated in the usual way (see capabilities(7)),
       so that usually, unless it has a user ID of 0 within the names‐
       pace or the executable file has a nonempty inheritable capabil‐
       ities mask, it will lose all capabilities.  See the  discussion
       of user and group ID mappings, below.

Okay?

> 
>>            $ cat /proc/$$/uid_map
>>                     0          0 4294967295
>>
>>        This mapping tells us that the range starting at  user  ID  0  in
>>        this namespace maps to a range starting at 0 in the (nonexistent)
>>        parent namespace, and the length of  the  range  is  the  largest
>>        32-bit unsigned integer.
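As a rough illustration of how such a mapping line is read, here is a small userspace sketch. It is not kernel code, and the names Extent, parse_map and to_parent are made up for this example. Each line is taken as (first ID in this namespace, first ID in the parent namespace, length). Since 4294967295 is 2^32 - 1, the initial-namespace mapping covers IDs 0 through 4294967294.

```python
from typing import List, NamedTuple, Optional


class Extent(NamedTuple):
    """One line of a uid_map or gid_map file (hypothetical helper type)."""
    inside_start: int   # field 1: first ID inside this namespace
    outside_start: int  # field 2: first ID in the parent namespace
    length: int         # field 3: number of consecutive IDs covered


def parse_map(text: str) -> List[Extent]:
    """Parse the text of a uid_map/gid_map file into a list of extents."""
    extents = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(f) for f in line.split())
            extents.append(Extent(inside, outside, length))
    return extents


def to_parent(extents: List[Extent], inside_id: int) -> Optional[int]:
    """Translate an ID inside the namespace to the parent's ID, or None if unmapped."""
    for e in extents:
        if e.inside_start <= inside_id < e.inside_start + e.length:
            return e.outside_start + (inside_id - e.inside_start)
    return None


# The mapping shown above for a process in the initial namespace.
initial = parse_map("         0          0 4294967295\n")
```

With this mapping, to_parent(initial, 1000) gives 1000, because the identity range starts at 0 on both sides.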
>>
>>    Defining user and group ID mappings: writing to uid_map and gid_map
>>        After  the  creation of a new user namespace, the uid_map file of
>>        one of the processes in the namespace may be written to  once  to
>>        define  the  mapping  of  user IDs in the new user namespace.  An
>>        attempt to write more than once to  a  uid_map  file  in  a  user
>>        namespace  fails  with  the error EPERM.  Similar rules apply for
>>        gid_map files.
>>
>>        The lines written to uid_map (gid_map) must conform to  the  fol‐
>>        lowing rules:
>>
>>        *  The  three  fields  must  be valid numbers, and the last field
>>           must be greater than 0.
>>
>>        *  Lines are terminated by newline characters.
>>
>>        *  There is an (arbitrary) limit on the number of  lines  in  the
>>           file.  As at Linux 3.8, the limit is five lines.  In addition,
>>           the number of bytes written to the file must be less than  the
>>           system page size, and the write must be performed at the start
>>           of the file (i.e., lseek(2) and pwrite(2)  can't  be  used  to
>>           write to nonzero offsets in the file).
>>
>>        *  The  range of user IDs (group IDs) specified in each line can‐
>>           not overlap with the ranges in any other lines.  In  the  ini‐
>>           tial  implementation  (Linux 3.8), this requirement was satis‐
>>           fied by a simplistic implementation that imposed  the  further
>>           requirement  that  the  values  in both field 1 and field 2 of
>>           successive lines must be in ascending numerical  order,  which
>>           prevented some otherwise valid maps from being created.  Linux
>>           3.9 and later fix this limitation, allowing any valid  set  of
>>           nonoverlapping maps.
>>
>>        *  At least one line must be written to the file.
>>
>>        Writes that violate the above rules fail with the error EINVAL.
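The rules above can be sketched as a userspace pre-check before attempting the write. This is only an approximation of the text, not the kernel's logic: check_map, overlaps and MAX_LINES are hypothetical names, the five-line limit is simply the Linux 3.8 figure quoted above, and the overlap test is assumed to apply to both the field 1 ranges and the field 2 ranges. The ascending-order restriction of Linux 3.8 is deliberately not modelled.

```python
import mmap

MAX_LINES = 5  # assumption: the Linux 3.8 limit quoted above, hard-coded here


def overlaps(start_a, len_a, start_b, len_b):
    """True if [start_a, start_a+len_a) and [start_b, start_b+len_b) intersect."""
    return start_a < start_b + len_b and start_b < start_a + len_a


def check_map(text, page_size=mmap.PAGESIZE):
    """Return a list of rule violations found in text (empty if none)."""
    problems = []
    if len(text.encode()) >= page_size:
        problems.append("write must be smaller than the page size")
    lines = [line for line in text.split("\n") if line.strip()]
    if not lines:
        problems.append("at least one line is required")
    if len(lines) > MAX_LINES:
        problems.append("too many lines")
    ranges = []
    for n, line in enumerate(lines, 1):
        fields = line.split()
        if len(fields) != 3 or not all(f.isascii() and f.isdigit() for f in fields):
            problems.append(f"line {n}: need three numeric fields")
            continue
        inside, outside, length = map(int, fields)
        if length == 0:
            problems.append(f"line {n}: length must be greater than 0")
            continue
        ranges.append((n, inside, outside, length))
    # Compare every pair of lines, on both sides of the mapping.
    for i, (n1, in1, out1, len1) in enumerate(ranges):
        for n2, in2, out2, len2 in ranges[i + 1:]:
            if overlaps(in1, len1, in2, len2) or overlaps(out1, len1, out2, len2):
                problems.append(f"lines {n1} and {n2}: ranges overlap")
    return problems
```

An empty result only means the text passed these checks. The write can still fail for the permission reasons listed below.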
>>
>>        In  order  for  a  process  to  write  to the /proc/[pid]/uid_map
>>        (/proc/[pid]/gid_map) file, all  of  the  following  requirements
>>        must be met:
>>
>>        1. The  writing  process  must  have  the CAP_SETUID (CAP_SETGID)
>>           capability in the user namespace of the process pid.
> 
> This is checked for the opening process (and I don't actually remember
> whether it's checked for the writing process).

Eric, can you comment?

>>
>>        2. The writing process must be in either the  user  namespace  of
>>           the  process  pid  or  inside the parent user namespace of the
>>           process pid.
>>
>>        3. The mapped user IDs (group IDs) must in turn have a mapping in
>>           the parent user namespace.
>>
>>        4. One of the following is true:
>>
>>           *  The  data written to uid_map (gid_map) consists of a single
>>              line that maps the writing  process's  filesystem  user  ID
>>              (group ID) in the parent user namespace to a user ID (group
>>              ID) in the user namespace.  The usual  case  here  is  that
>>              this  single  line  provides  a  mapping for the user ID of the
>>              process that created the namespace.
>>
>>           *  The process has the CAP_SETUID (CAP_SETGID)  capability  in
>>              the  parent user namespace.  Thus, a privileged process can
>>              make mappings to arbitrary user IDs (group IDs) in the par‐
>>              ent user namespace.
> 
> The opening process.

Fixed.

> One other thing that could be worth mentioning: any non-user
> namespace that's created is owned by the user namespace of the process
> that created it at the time of creation.  Actions on those namespaces
> require capabilities in the corresponding user namespace.

I added:

[[
When a non-user-namespace is created,
it is owned by the user namespace in which the creating process
was a member at the time of the creation of the namespace.
Actions on the non-user-namespace
require capabilities in the corresponding user namespace.
]]

> Thanks for doing this!

You're welcome. Thanks for the review!

Cheers,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
