linux-api.vger.kernel.org archive mirror
* [PATCH 0/4] namespace man page updates for 3.8
@ 2012-11-26 22:57 Eric W. Biederman
       [not found] ` <87a9u4rmz0.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2012-11-26 22:57 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages); +Cc: Linux API, Serge E. Hallyn


The following patches document the user namespace, the pid namespace,
and the mount namespace changes that are currently sitting in my
for-next branch of:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git

Except for uid_map and gid_map, which should have been documented for
Linux 3.6, I am a bit early for these changes to be merged, but it
seems a good idea to get the patches out there so things will be
documented, reviewed, and thought about in a timely manner.

Eric


 man2/clone.2 |   39 ++++++++++++++++++++++
 man2/setns.2 |   41 +++++++++++++++++++----
 man5/proc.5  |  102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 174 insertions(+), 8 deletions(-)

Eric W. Biederman (4):
      proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map
      clone.2:  Describe the user namespace
      proc.5:  Document the proc files for the user, mount, and pid namespaces.
      setns.2: Document the pid, user, and mount namespace support.


* [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map
       [not found] ` <87a9u4rmz0.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-11-27  0:46   ` Eric W. Biederman
       [not found]     ` <874nkbrhyv.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2012-11-27  0:46   ` [PATCH 2/4] clone.2: Describe the user namespace Eric W. Biederman
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2012-11-27  0:46 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages); +Cc: Linux API, Linux Containers


Document the user namespace files that report the mapping of uids
and gids between user namespaces.

Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 man5/proc.5 |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/man5/proc.5 b/man5/proc.5
index fb70d2b..840480d 100644
--- a/man5/proc.5
+++ b/man5/proc.5
@@ -317,6 +317,31 @@ The files in this directory are readable only by the owner of the process.
 .\" .TP
 .\" .IR /proc/[pid]/io " (since kernel 2.6.20)"
 .TP
+.IR /proc/[pid]/gid_map " (since kernel 3.6)"
+This file reports the mapping of gids from the user namespace of the process specified by
+.IR pid
+to the user namespace of the process that opened
+.IR /proc/[pid]/gid_map .
+
+Each line specifies a 1 to 1 mapping of a range of contiguous gids from
+the user namespace of the process specified by
+.IR pid
+to the user namespace of the process that opened
+.IR /proc/[pid]/gid_map .
+
+Each line contains three numbers.  The start of the range of gids in
+the user namespace of the process specified by
+.IR pid .
+The start of the range of gids in the user namespace of the process that
+opened
+.IR /proc/[pid]/gid_map .
+The number of gids in the range of numbers that is mapped between the two
+user namespaces.
+
+After the creation of a new user namespace this file may be written to
+exactly once to specify the mapping of gids in the new user namespace.
+
+.TP
 .IR /proc/[pid]/limits " (since kernel 2.6.24)"
 This file displays the soft limit, hard limit, and units of measurement
 for each of the process's resource limits (see
@@ -1169,6 +1194,31 @@ directory are not available if the main thread has already terminated
 (typically by calling
 .BR pthread_exit (3)).
 .TP
+.IR /proc/[pid]/uid_map " (since kernel 3.6)"
+This file reports the mapping of uids from the user namespace of the process specified by
+.IR pid
+to the user namespace of the process that opened
+.IR /proc/[pid]/uid_map .
+
+Each line specifies a 1 to 1 mapping of a range of contiguous uids from
+the user namespace of the process specified by
+.IR pid
+to the user namespace of the process that opened
+.IR /proc/[pid]/uid_map .
+
+Each line contains three numbers.  The start of the range of uids in
+the user namespace of the process specified by
+.IR pid .
+The start of the range of uids in the user namespace of the process that
+opened
+.IR /proc/[pid]/uid_map .
+The number of uids in the range of numbers that is mapped between the two
+user namespaces.
+
+After the creation of a new user namespace this file may be written to
+exactly once to specify the mapping of uids in the new user namespace.
+
+.TP
 .I /proc/apm
 Advanced power management version and battery information when
 .B CONFIG_APM
-- 
1.7.5.4


* [PATCH 2/4] clone.2:  Describe the user namespace
       [not found] ` <87a9u4rmz0.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2012-11-27  0:46   ` [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map Eric W. Biederman
@ 2012-11-27  0:46   ` Eric W. Biederman
       [not found]     ` <87y5hnq3d5.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2012-11-27  0:47   ` [PATCH 3/4] proc.5: Document the proc files for the user, mount, and pid namespaces Eric W. Biederman
  2012-11-27  0:48   ` [PATCH 4/4] setns.2: Document the pid, user, and mount namespace support Eric W. Biederman
  3 siblings, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2012-11-27  0:46 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages); +Cc: Linux API, Serge E. Hallyn, Linux Containers


Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 man2/clone.2 |   39 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/man2/clone.2 b/man2/clone.2
index 0582057..4566677 100644
--- a/man2/clone.2
+++ b/man2/clone.2
@@ -366,6 +366,45 @@ in the same
 .BR clone ()
 call.
 .TP
+.BR CLONE_NEWUSER " (since Linux 3.6)"
+If
+.B CLONE_NEWUSER
+is set, then create the process in a new user namespace.  If this flag is not set, then (as with
+.BR fork (2)),
+the process is created in the same user namespace as the calling process.
+
+A user namespace provides an isolated environment for security-related identifiers, in particular
+uids, gids, keys (see
+.BR keyctl (2)),
+and capabilities.
+
+When a user namespace is created, it initially starts out without a mapping of uids and gids
+to the parent user namespace.  The desired mapping of uids to the parent user namespace
+may be set by writing into
+.IR /proc/[pid]/uid_map .
+The desired mapping of gids to the parent user namespace may be set by writing into
+.IR /proc/[pid]/gid_map .
+
+The first process in a user namespace starts out with a complete set of capabilities with
+respect to the new user namespace.
+
+System calls that return uids and gids will return either the uid or gid mapped into the current
+user namespace, if there is a mapping, or, depending on the context, either
+the overflowuid (default 65534) or the overflowgid (default 65534).  See
+.IR /proc/sys/kernel/overflowuid " and " /proc/sys/kernel/overflowgid .
+
+As of Linux 3.8, no privileges are needed to create a user namespace,
+and mount, pid, ipc, net, and uts namespaces can be created with just
+the CAP_SYS_ADMIN capability in your current user namespace.
+
+Over the years, a lot of features have been added
+to the Linux kernel that are only available to privileged users
+because of their potential to confuse setuid root applications.  In
+general, it becomes safe to allow the root user in a user namespace to
+use those features because it is impossible, while in a user namespace,
+to gain more privilege than the root user of a user namespace has.
+
+.TP
 .BR CLONE_NEWPID " (since Linux 2.6.24)"
 .\" This explanation draws a lot of details from
 .\" http://lwn.net/Articles/259217/
-- 
1.7.5.4


* [PATCH 3/4] proc.5:  Document the proc files for the user, mount, and pid namespaces.
       [not found] ` <87a9u4rmz0.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2012-11-27  0:46   ` [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map Eric W. Biederman
  2012-11-27  0:46   ` [PATCH 2/4] clone.2: Describe the user namespace Eric W. Biederman
@ 2012-11-27  0:47   ` Eric W. Biederman
       [not found]     ` <87pq2zq3b6.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2012-11-27  0:48   ` [PATCH 4/4] setns.2: Document the pid, user, and mount namespace support Eric W. Biederman
  3 siblings, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2012-11-27  0:47 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages); +Cc: Linux API, Linux Containers


Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 man5/proc.5 |   52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/man5/proc.5 b/man5/proc.5
index 840480d..eb612b9 100644
--- a/man5/proc.5
+++ b/man5/proc.5
@@ -581,6 +581,58 @@ even if all processes in the namespace terminate.
 The file descriptor can be passed to
 .BR setns (2).
 .TP
+.IR /proc/[pid]/ns/user " (since Linux 3.8)"
+Bind mounting this file (see
+.BR mount (2))
+to somewhere else in the filesystem keeps
+the user namespace of the process specified by
+.I pid
+alive even if all processes currently in the namespace terminate.
+
+Opening this file returns a file handle for the user namespace
+of the process specified by
+.IR pid .
+As long as this file descriptor remains open,
+the user namespace will remain alive,
+even if all processes in the namespace terminate.
+The file descriptor can be passed to
+.BR setns (2).
+.TP
+.IR /proc/[pid]/ns/pid " (since Linux 3.8)"
+Bind mounting this file (see
+.BR mount (2))
+to somewhere else in the filesystem keeps
+the PID namespace of the process specified by
+.I pid
+alive even if all processes currently in the namespace terminate.
+
+Opening this file returns a file handle for the PID namespace
+of the process specified by
+.IR pid .
+As long as this file descriptor remains open,
+the PID namespace will remain alive,
+even if all processes in the namespace terminate.
+The file descriptor can be passed to
+.BR setns (2).
+.TP
+.IR /proc/[pid]/ns/mnt " (since Linux 3.8)"
+Bind mounting this file (see
+.BR mount (2))
+to somewhere else in the filesystem keeps
+the mount namespace of the process specified by
+.I pid
+alive even if all processes currently in the namespace terminate.
+
+Opening this file returns a file handle for the mount namespace
+of the process specified by
+.IR pid .
+As long as this file descriptor remains open,
+the mount namespace will remain alive,
+even if all processes in the namespace terminate.
+The file descriptor can be passed to
+.BR setns (2).
+
+.TP
 .IR /proc/[pid]/numa_maps " (since Linux 2.6.14)"
 See
 .BR numa (7).
-- 
1.7.5.4


* [PATCH 4/4] setns.2: Document the pid, user, and mount namespace support.
       [not found] ` <87a9u4rmz0.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
                     ` (2 preceding siblings ...)
  2012-11-27  0:47   ` [PATCH 3/4] proc.5: Document the proc files for the user, mount, and pid namespaces Eric W. Biederman
@ 2012-11-27  0:48   ` Eric W. Biederman
       [not found]     ` <87k3t7q39u.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  3 siblings, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2012-11-27  0:48 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages); +Cc: Linux API, Serge E. Hallyn, Linux Containers


Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 man2/setns.2 |   41 +++++++++++++++++++++++++++++++++--------
 1 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/man2/setns.2 b/man2/setns.2
index 6aa01e1..63b04dc 100644
--- a/man2/setns.2
+++ b/man2/setns.2
@@ -48,6 +48,18 @@ must refer to a network namespace.
 .BR CLONE_NEWUTS
 .I fd
 must refer to a UTS namespace.
+.TP
+.BR CLONE_NEWPID
+.I fd
+must refer to a PID namespace.
+.TP
+.BR CLONE_NEWUSER
+.I fd
+must refer to a user namespace.
+.TP
+.BR CLONE_NEWNS
+.I fd
+must refer to a mount namespace.
 .PP
 Specifying
 .I nstype
@@ -63,6 +75,25 @@ and wants to ensure that the namespace is of a particular type.
 .IR fd
 if the file descriptor was opened by another process and, for example,
 passed to the caller via a UNIX domain socket.)
+
+The pid namespace is a little different.  Reassociating the calling
+thread with a pid namespace only changes the pid namespace that the
+child processes will be created in.
+
+Changing the pid namespace for child processes is only allowed if the
+pid namespace specified by
+.IR fd
+is a child pid namespace of the pid namespace of the current thread.
+
+A multi-threaded process may not change user namespace with setns.  A
+process may not reassociate the thread with the current user
+namespace.  The process reassociating itself with a user namespace
+must have CAP_SYS_ADMIN privileges in the target user namespace.
+
+A process may not be reassociated with a new mount namespace if it is
+multi-threaded or it does not possess both CAP_SYS_CHROOT privileges
+and CAP_SYS_ADMIN rights over the target mount namespace.
+
 .SH RETURN VALUE
 On success,
 .IR setns ()
@@ -94,7 +125,8 @@ for this operation.
 The
 .BR setns ()
 system call first appeared in Linux in kernel 3.0;
-library support was added to glibc in version 2.14.
+library support was added to glibc in version 2.14;
+support for PID, user, and mount namespaces first appeared in Linux in kernel 3.8.
 .SH CONFORMING TO
 The
 .BR setns ()
@@ -106,13 +138,6 @@ a new thread is created using
 can be changed using
 .BR setns ().
 .SH BUGS
-The PID namespace and the mount namespace are not currently supported.
-(See the descriptions of
-.BR CLONE_NEWPID
-and
-.BR CLONE_NEWNS
-in
-.BR clone (2).)
 .SH SEE ALSO
 .BR clone (2),
 .BR fork (2),
-- 
1.7.5.4


* Re: [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map
       [not found]     ` <874nkbrhyv.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-12-27  9:03       ` Michael Kerrisk (man-pages)
       [not found]         ` <CAKgNAkixXmtvQUbwyv=a8mU=gdf-x+w-ou_4N=cNaau+hVoy4Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2012-12-27  9:03 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linux API, Serge E. Hallyn, Linux Containers

Hi Eric,

Thanks for this patch. I have one question and a revised version of the
text that I'd like you to review.

On Tue, Nov 27, 2012 at 1:46 AM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>
> Document the user namespace files that report the mapping of uids
> and gids between user namespaces.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> ---
>  man5/proc.5 |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 50 insertions(+), 0 deletions(-)
>
> diff --git a/man5/proc.5 b/man5/proc.5
> index fb70d2b..840480d 100644
> --- a/man5/proc.5
> +++ b/man5/proc.5
> @@ -317,6 +317,31 @@ The files in this directory are readable only by the owner of the process.
>  .\" .TP
>  .\" .IR /proc/[pid]/io " (since kernel 2.6.20)"
>  .TP
> +.IR /proc/[pid]/gid_map " (since kernel 3.6)"
> +This file reports the mapping of gids from the user namespace of the process specified by
> +.IR pid
> +to the user namespace of the process that opened
> +.IR /proc/[pid]/gid_map .
> +
> +Each line specifies a 1 to 1 mapping of a range of contiguous gids from
> +the user namespace of the process specified by
> +.IR pid
> +to the user namespace of the process that opened
> +.IR /proc/[pid]/gid_map .

I want to check the above point. What do you mean by "the process that
opened uid_map"? Does that mean the process that opened uid_map to do
the one-time write of the UID map? I had assumed that uid_map actually
provided a mapping between the namespace of 'pid' and the 'parent'
namespace, where the parent namespace is the namespace of the process
that created this namespace via clone(CLONE_NEWUSER).

> +
> +Each line contains three numbers.  The start of the range of gids in
> +the user namespace of the process specified by
> +.IR pid .
> +The start of the range of gids in the user namespace of the process that
> +opened
> +.IR /proc/[pid]/gid_map .
> +The number of gids in the range of numbers that is mapped between the two
> +user namespaces.
> +
> +After the creation of a new user namespace this file may be written to
> +exactly once to specify the mapping of gids in the new user namespace.
> +
> +.TP
>  .IR /proc/[pid]/limits " (since kernel 2.6.24)"
>  This file displays the soft limit, hard limit, and units of measurement
>  for each of the process's resource limits (see
> @@ -1169,6 +1194,31 @@ directory are not available if the main thread has already terminated
>  (typically by calling
>  .BR pthread_exit (3)).
>  .TP
> +.IR /proc/[pid]/uid_map " (since kernel 3.6)"
> +This file reports the mapping of uids from the user namespace of the process specified by
> +.IR pid
> +to the user namespace of the process that opened
> +.IR /proc/[pid]/uid_map .
> +
> +Each line specifies a 1 to 1 mapping of a range of contiguous uids from
> +the user namespace of the process specified by
> +.IR pid
> +to the user namespace of the process that opened
> +.IR /proc/[pid]/uid_map .
> +
> +Each line contains three numbers.  The start of the range of uids in
> +the user namespace of the process specified by
> +.IR pid .
> +The start of the range of uids in the user namespace of the process that
> +opened
> +.IR /proc/[pid]/uid_map .
> +The number of uids in the range of numbers that is mapped between the two
> +user namespaces.
> +
> +After the creation of a new user namespace this file may be written to
> +exactly once to specify the mapping of uids in the new user namespace.
> +
> +.TP
>  .I /proc/apm
>  Advanced power management version and battery information when
>  .B CONFIG_APM

I revised your text quite a bit, and added a piece on the format of
the uid_map files. Could you please read the following and let me know
of any errors:

[[
       /proc/[pid]/uid_map, /proc/[pid]/gid_map (since Linux 3.6)
              These files expose the mappings for user and group IDs
              inside the user namespace for the process pid.  The
              description here explains the details for uid_map;
              gid_map is exactly the same, but each instance of "user
              ID" is replaced by "group ID".

              The uid_map file exposes the mapping of user IDs from
              the user namespace of the process pid to the user
              namespace of the process that opened uid_map.

              Each line in the file specifies a 1-to-1 mapping of a
              range of contiguous user IDs from the user namespace of
              the process pid to the user namespace of the process
              that opened uid_map.

              Each line contains three numbers delimited by white
              space:

              (1) The start of the range of user IDs in the user
                  namespace of the process pid.

              (2) The start of the range of user IDs in the user
                  namespace of the process that opened uid_map.

              (3) The length of the range of user IDs that is mapped
                  between the two user namespaces.

              After the creation of a new user namespace, this file
              may be written to exactly once to specify the mapping of
              user IDs in the new user namespace.  (An attempt to
              write more than once to the file fails with the error
              EPERM.)

              The lines written to uid_map must conform to the
              following rules:

              *  The three fields must be valid numbers, and the last
                 field must be greater than 0.

              *  Lines are terminated by newline characters.

              *  The file can contain a maximum of five lines.

              *  The values in both field 1 and field 2 of each line
                 must be in ascending numerical order.

              *  The range of user IDs specified in each line cannot
                 overlap with the ranges in any other lines.

              Writes that violate the above rules fail with the error
              EINVAL.
]]
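
To make the rules in the draft concrete, here is a minimal userspace sketch in
Python that checks a candidate uid_map body before it is written. It is an
illustration of the rules as worded above, not the kernel's actual parser; the
function name parse_id_map() is hypothetical, and treating "ascending order"
plus "no overlap" as a single check on adjacent lines is my reading of the
text. Errors are raised as ValueError where the kernel would return EINVAL.

```python
import re

_NUMBER = re.compile(r"[0-9]+")


def parse_id_map(text, max_lines=5):
    """Validate uid_map/gid_map text and return a list of
    (inside_start, outside_start, length) tuples.

    Raises ValueError where, per the draft text, a write would fail.
    """
    lines = text.splitlines()
    if not 1 <= len(lines) <= max_lines:
        raise ValueError(f"expected 1 to {max_lines} lines, got {len(lines)}")

    entries = []
    for line in lines:
        fields = line.split()  # three numbers delimited by white space
        if len(fields) != 3 or not all(_NUMBER.fullmatch(f) for f in fields):
            raise ValueError(f"expected three numbers: {line!r}")
        inside, outside, length = (int(f) for f in fields)
        if length <= 0:
            raise ValueError(f"length must be greater than 0: {line!r}")
        entries.append((inside, outside, length))

    # Assumption: with both fields ascending, ranges cannot overlap as long
    # as each line starts at or after the end of the previous line's range.
    for prev, cur in zip(entries, entries[1:]):
        if cur[0] < prev[0] + prev[2] or cur[1] < prev[1] + prev[2]:
            raise ValueError("ranges must be ascending and must not overlap")

    return entries
```

For example, parse_id_map("0 1000 1\n") returns [(0, 1000, 1)], which maps
user ID 0 inside the namespace to user ID 1000 in the opener's namespace.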

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/


* Re: [PATCH 2/4] clone.2: Describe the user namespace
       [not found]     ` <87y5hnq3d5.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-12-27 10:16       ` Michael Kerrisk (man-pages)
       [not found]         ` <CAKgNAkgXWp49wXKom9hMm9fajKVOAwOmFzPdKWBesbBhfZEssA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2012-12-27 10:16 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linux API, Linux Containers

Hi Eric,

On Tue, Nov 27, 2012 at 1:46 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>  man2/clone.2 |   39 +++++++++++++++++++++++++++++++++++++++
>  1 files changed, 39 insertions(+), 0 deletions(-)
>
> diff --git a/man2/clone.2 b/man2/clone.2
> index 0582057..4566677 100644
> --- a/man2/clone.2
> +++ b/man2/clone.2
> @@ -366,6 +366,45 @@ in the same
>  .BR clone ()
>  call.
>  .TP
> +.BR CLONE_NEWUSER " (since Linux 3.6)"

Why "since Linux 3.6"? As far as I can see, CLONE_NEWUSER first gained
some meaning in 2.6.29.

> +If
> +.B CLONE_NEWUSER
> +is set, then create the process in a new user namespace.  If this flag is not set, then (as with
> +.BR fork (2)),
> +the process is created in the same user namespace as the calling process.
> +
> +A user namespace provides an isolated environment for security-related identifiers, in particular
> +uids, gids, keys (see
> +.BR keyctl (2)),
> +and capabilities.
> +
> +When a user namespace is created, it initially starts out without a mapping of uids and gids
> +to the parent user namespace.  The desired mapping of uids to the parent user namespace
> +may be set by writing into
> +.IR /proc/[pid]/uid_map .
> +The desired mapping of gids to the parent user namespace may be set by writing into
> +.IR /proc/[pid]/gid_map .
> +
> +The first process in a user namespace starts out with a complete set of capabilities with
> +respect to the new user namespace.
> +
> +System calls that return uids and gids will return either the uid or gid mapped into the current
> +user namespace, if there is a mapping, or, depending on the context, either
> +the overflowuid (default 65534) or the overflowgid (default 65534).  See
> +.IR /proc/sys/kernel/overflowuid " and " /proc/sys/kernel/overflowgid .
> +
> +As of Linux 3.8, no privileges are needed to create a user namespace,
> +and mount, pid, ipc, net, and uts namespaces can be created with just
> +the CAP_SYS_ADMIN capability in your current user namespace.
> +
> +Over the years, a lot of features have been added
> +to the Linux kernel that are only available to privileged users
> +because of their potential to confuse setuid root applications.  In
> +general, it becomes safe to allow the root user in a user namespace to
> +use those features because it is impossible, while in a user namespace,
> +to gain more privilege than the root user of a user namespace has.
> +
> +.TP
>  .BR CLONE_NEWPID " (since Linux 2.6.24)"
>  .\" This explanation draws a lot of details from
>  .\" http://lwn.net/Articles/259217/

I reworked your text somewhat. Could you please review the following:

[[
       CLONE_NEWUSER
              (This flag first became meaningful for clone() in Linux
              2.6.29, but the implementation of user namespaces was
              only completed in Linux 3.8.)  If CLONE_NEWUSER is set,
              then create the process in a new user namespace.  If
              this flag is not set, then (as with fork(2)) the process
              is created in the same user namespace as the calling
              process.

              A user namespace provides an isolated environment for
              security related identifiers, in particular, user IDs,
              group IDs, keys (see keyctl(2)), and capabilities.

              When a user namespace is created, it starts out without
              a mapping of user IDs (group IDs) to the parent user
              namespace.  The desired mapping of user IDs (group IDs)
              to the parent user namespace may be set by writing into
              /proc/[pid]/uid_map (/proc/[pid]/gid_map); see proc(5).

              The first process in a user namespace starts out with a
              complete set of capabilities with respect to the new
              user namespace.

              System calls that return user IDs (group IDs) will
              return either the user ID (group ID) mapped into the
              current user namespace if there is a mapping, or the
              overflow user ID (group ID); the default value for the
              overflow user ID (group ID) is 65534.  See the
              descriptions of /proc/sys/kernel/overflowuid and
              /proc/sys/kernel/overflowgid in proc(5).

              Starting with Linux 3.8, no privileges are needed to
              create a user namespace, and mount, PID, IPC, net, and
              UTS namespaces can be created with just the
              CAP_SYS_ADMIN capability in the caller's user namespace.

              Over the years, there have been a lot of features that
              have been added to the Linux kernel that are only
              available to privileged users because of their potential
              to confuse set-user-ID-root applications.  In general, it
              becomes safe to allow the root user in a user namespace
              to use those features because it is impossible, while in
              a user namespace, to gain more privilege than the root
              user of a user namespace has.
]]
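
The fallback to the overflow ID described in the draft can be pictured with a
small Python sketch. This is a simplified model, not kernel code: it assumes a
single level of nesting, so that the second field of each map entry is an ID
in the parent namespace, and the function name id_seen_inside() is
hypothetical. The default of 65534 is the one given in the text.

```python
OVERFLOW_ID = 65534  # default overflow user ID (group ID) per the text


def id_seen_inside(entries, outside_id, overflow=OVERFLOW_ID):
    """Translate an ID from the parent namespace into the new namespace.

    entries is a list of (inside_start, outside_start, length) tuples,
    one per line of uid_map (or gid_map).  If no entry covers outside_id,
    the overflow ID is returned instead.
    """
    for inside_start, outside_start, length in entries:
        if outside_start <= outside_id < outside_start + length:
            return inside_start + (outside_id - outside_start)
    return overflow
```

With a map of [(0, 1000, 1)], a file owned by user ID 1000 in the parent
namespace appears as owned by user ID 0 inside the namespace, while every
other owner appears as 65534.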

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/


* Re: [PATCH 3/4] proc.5: Document the proc files for the user, mount, and pid namespaces.
       [not found]     ` <87pq2zq3b6.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-12-27 10:28       ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2012-12-27 10:28 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linux API, Linux Containers

Hi Eric,

On Tue, Nov 27, 2012 at 1:47 AM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>
> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Thanks. Applied.

Cheers,

Michael


> ---
>  man5/proc.5 |   52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 52 insertions(+), 0 deletions(-)
>
> diff --git a/man5/proc.5 b/man5/proc.5
> index 840480d..eb612b9 100644
> --- a/man5/proc.5
> +++ b/man5/proc.5
> @@ -581,6 +581,58 @@ even if all processes in the namespace terminate.
>  The file descriptor can be passed to
>  .BR setns (2).
>  .TP
> +.IR /proc/[pid]/ns/user " (since Linux 3.8)"
> +Bind mounting this file (see
> +.BR mount (2))
> +to somewhere else in the filesystem keeps
> +the user namespace of the process specified by
> +.I pid
> +alive even if all processes currently in the namespace terminate.
> +
> +Opening this file returns a file handle for the user namespace
> +of the process specified by
> +.IR pid .
> +As long as this file descriptor remains open,
> +the user namespace will remain alive,
> +even if all processes in the namespace terminate.
> +The file descriptor can be passed to
> +.BR setns (2).
> +.TP
> +.IR /proc/[pid]/ns/pid " (since Linux 3.8)"
> +Bind mounting this file (see
> +.BR mount (2))
> +to somewhere else in the filesystem keeps
> +the PID namespace of the process specified by
> +.I pid
> +alive even if all processes currently in the namespace terminate.
> +
> +Opening this file returns a file handle for the PID namespace
> +of the process specified by
> +.IR pid .
> +As long as this file descriptor remains open,
> +the PID namespace will remain alive,
> +even if all processes in the namespace terminate.
> +The file descriptor can be passed to
> +.BR setns (2).
> +.TP
> +.IR /proc/[pid]/ns/mnt " (since Linux 3.8)"
> +Bind mounting this file (see
> +.BR mount (2))
> +to somewhere else in the filesystem keeps
> +the mount namespace of the process specified by
> +.I pid
> +alive even if all processes currently in the namespace terminate.
> +
> +Opening this file returns a file handle for the mount namespace
> +of the process specified by
> +.IR pid .
> +As long as this file descriptor remains open,
> +the mount namespace will remain alive,
> +even if all processes in the namespace terminate.
> +The file descriptor can be passed to
> +.BR setns (2).
> +
> +.TP
>  .IR /proc/[pid]/numa_maps " (since Linux 2.6.14)"
>  See
>  .BR numa (7).
> --
> 1.7.5.4
>



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/


* Re: [PATCH 4/4] setns.2: Document the pid, user, and mount namespace support.
       [not found]     ` <87k3t7q39u.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-12-27 11:08       ` Michael Kerrisk (man-pages)
       [not found]         ` <CAKgNAkiaw5L_oNE8NENjmoBS8Hq_uj+iaEdhyXc1+hje4HdnNQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2012-12-27 11:08 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linux API, Serge E. Hallyn, Linux Containers

Hi Eric,

Some questions below.

On Tue, Nov 27, 2012 at 1:48 AM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>
> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> ---
>  man2/setns.2 |   41 +++++++++++++++++++++++++++++++++--------
>  1 files changed, 33 insertions(+), 8 deletions(-)
>
> diff --git a/man2/setns.2 b/man2/setns.2
> index 6aa01e1..63b04dc 100644
> --- a/man2/setns.2
> +++ b/man2/setns.2
> @@ -48,6 +48,18 @@ must refer to a network namespace.
>  .BR CLONE_NEWUTS
>  .I fd
>  must refer to a UTS namespace.
> +.TP
> +.BR CLONE_NEWPID
> +.I fd
> +must refer to a PID namespace.
> +.TP
> +.BR CLONE_NEWUSER
> +.I fd
> +must refer to a user namespace.
> +.TP
> +.BR CLONE_NEWNS
> +.I fd
> +must refer to a mount namespace.
>  .PP
>  Specifying
>  .I nstype
> @@ -63,6 +75,25 @@ and wants to ensure that the namespace is of a particular type.
>  .IR fd
>  if the file descriptor was opened by another process and, for example,
>  passed to the caller via a UNIX domain socket.)
> +
> +The pid namespace is a little different.  Reassociating the calling
> +thread with a pid namespace only changes the pid namespace that the
> +child processes will be created in.
> +
> +Changing the pid namespace for child processes is only allowed if the
> +pid namespace specified by
> +.IR fd
> +is a child pid namespace of the pid namespace of the current thread.

I assume "current thread" above should be "calling thread", right?

> +
> +A multi-threaded process may not change user namespace with setns.  A
> +process may not reassociate the thread with the current user
> +namespace.

What do you mean by "the current user namespace"?

> The process reassociating itself with a user namespace
> +must have CAP_SYS_ADMIN privileges in the target user namespace.
> +
> +A process may not be reassociated with a new mount namespace if it is
> +multi-threaded

I tried to verify the preceding two lines from the kernel source, but
could not work out where this check is made. Where is it?

> or it does not possess both CAP_SYS_CHROOT privileges
> +and CAP_SYS_ADMIN rights over the target mount namespace.

Could you please expand/clarify the preceding two lines. As they
stand, I don't really understand them.

>  .SH RETURN VALUE
>  On success,
>  .IR setns ()
> @@ -94,7 +125,8 @@ for this operation.
>  The
>  .BR setns ()
>  system call first appeared in Linux in kernel 3.0;
> -library support was added to glibc in version 2.14.
> +library support was added to glibc in version 2.14;
> +Support for PID, user and mount namespaces first appeard in Linux in kernel 3.8.
>  .SH CONFORMING TO
>  The
>  .BR setns ()
> @@ -106,13 +138,6 @@ a new thread is created using
>  can be changed using
>  .BR setns ().
>  .SH BUGS
> -The PID namespace and the mount namespace are not currently supported.
> -(See the descriptions of
> -.BR CLONE_NEWPID
> -and
> -.BR CLONE_NEWNS
> -in
> -.BR clone (2).)
>  .SH SEE ALSO
>  .BR clone (2),
>  .BR fork (2),

Cheers,

Michael






* Re: [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map
       [not found]         ` <CAKgNAkixXmtvQUbwyv=a8mU=gdf-x+w-ou_4N=cNaau+hVoy4Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-12-27 16:58           ` Eric W. Biederman
       [not found]             ` <87obhfxwhb.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2012-12-27 17:23           ` Eric W. Biederman
  1 sibling, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2012-12-27 16:58 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Linux API, Serge E. Hallyn, Linux Containers

"Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> Hi Eric,
>
> Thanks for this patch. I have one question and a revised version f the
> text that I'd like you to review.
>
> On Tue, Nov 27, 2012 at 1:46 AM, Eric W. Biederman
> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>
>> Document the user namespace files that report the mapping of uids
>> and gids between user namespaces.
>>
>> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> ---
>>  man5/proc.5 |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 50 insertions(+), 0 deletions(-)
>>
>> diff --git a/man5/proc.5 b/man5/proc.5
>> index fb70d2b..840480d 100644
>> --- a/man5/proc.5
>> +++ b/man5/proc.5
>> @@ -317,6 +317,31 @@ The files in this directory are readable only by the owner of the process.
>>  .\" .TP
>>  .\" .IR /proc/[pid]/io " (since kernel 2.6.20)"
>>  .TP
>> +.IR /proc/[pid]/gid_map " (since kernel 3.6)"
>> +This file reports the mapping of gids from the user namespace of the process specified by
>> +.IR pid
>> +to the user namespace of the process that opened
>> +.IR /proc/[pid]/gid_map .
>> +
>> +Each line specifies a 1 to 1 mapping of a range of contiguous gids from
>> +the user namespace of the process specified by
>> +.IR pid
>> +to the user namespace of the process that opened
>> +.IR /proc/[pid]/gid_map.
>
> I want to check the above point. What do you mean by "the process that
> opened uid_map"? Does that mean the process that opened uid_map to do
> the one-time write of the UID map? I had assumed that uid_map actually
> provided a mapping between the namespace of 'pid' and the 'parent'
> namespace, where the parent namespace is the namespace of the process
> that created this namespace via clone(CLONE_NEWUSER).

I mean the process that opens uid_map for read or write.

For writing, you are correct about the mapping to the parent (but that is
not an exception; it is a restriction on who can write to the file).

The complete rule for the user namespace of the second value is:

- If the user namespace of the opener of the file and the user namespace
  of the process do not match, the user namespace of the opener of the
  file is used.

- If the user namespace of the opener of the file and the user namespace
  of the process are the same, the parent user namespace of the process
  is used for the second value.

While very wordy, I think the rule makes a lot of intuitive and practical
sense, especially since it is non-trivial to come up with the chain of
user namespaces a process is in.
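
As a rough illustration of the two-case rule, here is a minimal C sketch.
The struct user_ns type and the second_field_ns() name are hypothetical
stand-ins invented for this example, not kernel interfaces.  The handling
of the initial user namespace (which has no parent) is an assumption of
the sketch and is not spelled out in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a user namespace; only the parent link matters. */
struct user_ns {
    const struct user_ns *parent;   /* NULL for the initial user namespace */
};

/*
 * Pick the user namespace in which the second field of a uid_map/gid_map
 * line is expressed, following the two-case rule described above.
 */
static const struct user_ns *
second_field_ns(const struct user_ns *opener_ns, const struct user_ns *target_ns)
{
    assert(opener_ns != NULL && target_ns != NULL);

    if (opener_ns != target_ns)
        return opener_ns;            /* namespaces differ: use the opener's */

    /* Same namespace: use the parent of the process's namespace. */
    if (target_ns->parent != NULL)
        return target_ns->parent;

    /* Assumption: with no parent to fall back on, keep the namespace itself. */
    return target_ns;
}
```

With an initial namespace and one child namespace, a reader in the initial
namespace and a reader inside the child both end up with IDs expressed in
the initial namespace.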

>> +Each line contains three numbers.  The start of the range of gids in
>> +the user namespace of the process specifed by
>> +.IR pid.
>> +The start of the range of gids in the user namespace of the process that
>> +opened
>> +.IR /proc/[pid]/gid_map.
>> +The number of gids in the range of numbers that is mapped between to two
>> +user namespaces.
>> +
>> +After the creation of a new user namespace this file may be written to
>> +exactly once to specify the mapping of gids in the new user namespace.
>> +
>> +.TP
>>  .IR /proc/[pid]/limits " (since kernel 2.6.24)"
>>  This file displays the soft limit, hard limit, and units of measurement
>>  for each of the process's resource limits (see
>> @@ -1169,6 +1194,31 @@ directory are not available if the main thread has already terminated
>>  (typically by calling
>>  .BR pthread_exit (3)).
>>  .TP
>> +.IR /proc/[pid]/uid_map " (since kernel 3.6)"
>> +This file reports the mapping of uids from the user namespace of the process specified by
>> +.IR pid
>> +to the user namespace of the process that opened
>> +.IR /proc/[pid]/uid_map .
>> +
>> +Each line specifies a 1 to 1 mapping of a range of contiguous uids from
>> +the user namespace of the process specified by
>> +.IR pid
>> +to the user namespace of the process that opened
>> +.IR /proc/[pid]/uid_map.
>> +
>> +Each line contains three numbers.  The start of the range of uids in
>> +the user namespace of the process specifed by
>> +.IR pid.
>> +The start of the range of uids in the user namespace of the process that
>> +opened
>> +.IR /proc/[pid]/uid_map.
>> +The number of uids in the range of numbers that is mapped between to two
>> +user namespaces.
>> +
>> +After the creation of a new user namespace this file may be written to
>> +exactly once to specify the mapping of uids in the new user namespace.
>> +
>> +.TP
>>  .I /proc/apm
>>  Advanced power management version and battery information when
>>  .B CONFIG_APM
>
> I revised your text quite a bit, and added a piece on the format od
> the uid_map files. Could you please read the following and let me know
> of errors:
>
> [[
>        /proc/[pid]/uid_map, /proc/[pid]/gid_map (since Linux 3.6)
>               These  files  expose the mappings for user and group IDs
>               inside the user namespace  for  the  process  pid.   The
>               description  here  explains  the  details  for  uid_map;
>               gid_map is exactly the same, but each instance of  "user
>               ID" is replaced by "group ID".
>
>               The  uid_map  file  exposes the mapping of user IDs from
>               the user namespace of the process pid to the user names‐
>               pace of the process that opened uid_map.
>
>               Each  line  in  the file specifies a 1-to-1 mapping of a
>               range of contiguous user IDs from the user namespace  of
>               the  process  pid  to  the user namespace of the process
>               that opened uid_map.
>
>               Each line contains  three  numbers  delimited  by  white
>               space:
>
>               (1) The  start  of  the  range  of  user IDs in the user
>                   namespace of the process pid.
>
>               (2) The start of the range  of  user  IDs  in  the  user
>                   namespace of the process that opened uid_map.
>
>               (3) The  length  of the range of user IDs that is mapped
>                   between the two user namespaces.
>
>               After the creation of a new user  namespace,  this  file
>               may be written to exactly once to specify the mapping of
>               user IDs in the new  user  namespace.   (An  attempt  to
>               write  more  than  once to the file fails with the error
>               EPERM.)
>
>               The lines written to uid_map must conform to the follow‐
>               ing rules:
>
>               *  The  three fields must be valid numbers, and the last
>                  field must be greater than 0.
>
>               *  Lines are terminated by newline characters.
>
>               *  The file can contain a maximum of five lines.

A maximum of 5 lines is important to document, but it is currently an
arbitrary limit that may be changed in the future.  Right now 5 extents
are more than enough for any conceivable use case, and fit nicely within
a single cache line.

It is probably better to say that writes that exceed an arbitrary maximum
length fail with -EINVAL.  Currently the arbitrary maximum length is
five lines.

>               *  The values in both field 1 and field 2 of  each  line
>                  must be in ascending numerical order.

The rule is that the extents need to be non-overlapping.  Ascending
numerical order is how that is implemented but that is a misfeature,
and there has already been one request to fix that.  Removing the
ascending numerical order limitation is on my todo list.

>               *  The  range  of user IDs specified in each line cannot
>                  overlap with the ranges in any other lines.
>
>               Writes that violate the above rules fail with the  error
>               EINVAL.
> ]]
>
> Thanks,
>
> Michael
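
The format rules discussed in this message (three numbers per line, a
length greater than 0, non-overlapping extents, and the current five-line
limit) can be sketched in C roughly as follows.  The struct id_extent
type, its field names, and the function names are hypothetical and exist
only for this example.  The sketch deliberately does not enforce ascending
order, and it checks for overlap in both fields, which is one reading of
the rule rather than a statement of what the kernel does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EXTENTS 5   /* current arbitrary limit mentioned in the thread */

/* One line of uid_map: hypothetical field names for the three numbers. */
struct id_extent {
    uint32_t inside_start;   /* field 1: first ID in the namespace of pid */
    uint32_t outside_start;  /* field 2: first ID in the other namespace  */
    uint32_t count;          /* field 3: length of the range              */
};

/* True if [a, a+alen) and [b, b+blen) share at least one ID. */
static bool ranges_overlap(uint32_t a, uint32_t alen, uint32_t b, uint32_t blen)
{
    uint64_t a_end = (uint64_t)a + alen;   /* 64-bit sums avoid wraparound */
    uint64_t b_end = (uint64_t)b + blen;
    return a < b_end && b < a_end;
}

/* Check a proposed map against the rules above (order is not enforced). */
static bool extents_valid(const struct id_extent *ext, size_t n)
{
    if (n > MAX_EXTENTS)
        return false;
    for (size_t i = 0; i < n; i++) {
        if (ext[i].count == 0)
            return false;
        for (size_t j = 0; j < i; j++) {
            if (ranges_overlap(ext[i].inside_start, ext[i].count,
                               ext[j].inside_start, ext[j].count) ||
                ranges_overlap(ext[i].outside_start, ext[i].count,
                               ext[j].outside_start, ext[j].count))
                return false;
        }
    }
    return true;
}

/* Translate an ID from field-1 space to field-2 space; false if unmapped. */
static bool map_id(const struct id_extent *ext, size_t n,
                   uint32_t id, uint32_t *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (id >= ext[i].inside_start &&
            id - ext[i].inside_start < ext[i].count) {
            *out = ext[i].outside_start + (id - ext[i].inside_start);
            return true;
        }
    }
    return false;
}
```

For example, with the two lines "0 100000 1000" and "1000 200000 10", ID 5
translates to 100005 and ID 1010 has no mapping.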


* Re: [PATCH 2/4] clone.2: Describe the user namespace
       [not found]         ` <CAKgNAkgXWp49wXKom9hMm9fajKVOAwOmFzPdKWBesbBhfZEssA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-12-27 17:20           ` Eric W. Biederman
       [not found]             ` <87r4mbv2c9.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  2012-12-27 17:47           ` Eric W. Biederman
  1 sibling, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2012-12-27 17:20 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w; +Cc: Linux API, Linux Containers

"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:

> Hi Eric,
>
> On Tue, Nov 27, 2012 at 1:46 AM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>>
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>  man2/clone.2 |   39 +++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>
>> diff --git a/man2/clone.2 b/man2/clone.2
>> index 0582057..4566677 100644
>> --- a/man2/clone.2
>> +++ b/man2/clone.2
>> @@ -366,6 +366,45 @@ in the same
>>  .BR clone ()
>>  call.
>>  .TP
>> +.BR CLONE_NEWUSER " (since Linux 3.6)"
>
> Why "since Linux 3.6"? As fas as I can see, CLONE_NEWUSER first gained
> some meaning in 2.6.29.

Looking at it where I have said 3.6 that is wrong.  I meant 3.5.

I think I made the same mistake in one or two other manpages.  Nothing
was merged in 3.6 unfortunately.

My intent was these are the semantics of user namespaces since 3.5,
when my rework/refocusing of them was merged.

Since 3.5 all that has really happened with user namespaces is the
uid/gid to kuid/kgid conversion, permission checks have been relaxed,
and a few bugs have been fixed.

3.8 is huge from a usability standpoint.  3.8 is huge because setns(),
and unshare() are now complete from a namespace perspective, and because
enough permission checks have been relaxed in user namespaces that you
can really start using them.

But semantically from a user namespace perspective nothing really has
changed in 3.8.

>> +If
>> +.B CLONE_NEWUSER
>> +is set, the create the process in a new user namespace.  If this flag is not set, then (as with
>> +.BR fork (2)),
>> +the process is created in the same user namespace as the calling process.
>> +
>> +A user namespace provides an isolated environment for security related identifiers in particular
>> +uids, gids, keys (see
>> +.BR keyctl (2)),
>> +and capabilities.
>> +
>> +When a user namespace is created it initially starts out without a mapping of uids and gids
>> +to the parent user namespace.  The desired mapping of uids to the parent user namespace
>> +may be set by writting into
>> +.IR /proc/[pid]/uid_map.
>> +The desired mapping of gids to the parent user namespace may be set by writinng into
>> +.IR /proc/[pid]/gid_map.
>> +
>> +The first process in a user namespace starts out with a complete set of capabilities with
>> +respect to the new user namespace.
>> +
>> +syscalls that return uids and gids will either return the uid or gid mapped into the current
>> +user namespace if there is a mapping or depending on the context will return either
>> +the overflowuid (default 65534) or the overflowgid (default 65534). See
>> +.IR /proc/sys/kernel/overflowuid, /proc/sys/kernel/overflowgid
>> +
>> +As of Linux 3.8 no priviliges are needed to create a user namespace,
>> +and mount, pid, ipc, net, uts namespaces can be created with just
>> +CAP_SYS_ADMIN privileges in your current user namespace.
>> +
>> +Over the years there have been a lot of features that have been added
>> +to the linux kernel that are only available to privileged users
>> +because of their potential to confuse setuid root applications.  In
>> +general it becomes safe to allow the root user in a user namespace to
>> +use those features because it is impossible while in a user namespace
>> +to gain more privilege than the root user of a user namespace has.
>> +
>> +.TP
>>  .BR CLONE_NEWPID " (since Linux 2.6.24)"
>>  .\" This explanation draws a lot of details from
>>  .\" http://lwn.net/Articles/259217/
>
> I reworked your text somewhat. Could you please review the following:
>
> [[
>        CLONE_NEWUSER
>               (This  flag first became meaningful for clone() in Linux
>               2.6.29, but the implementation of  user  namespaces  was
>               only  completed in Linux 3.8.)

Long rant about 2.6.29 vs 3.8 above.  I think what we need to say is:

                (This  flag first became meaningful for clone() in Linux
                2.6.29, the current semantics were merged in 3.5, and
                user namespaces only really became usable in 3.8.)

>                                               If CLONE_NEWUSER is set,
>               then create the process in a  new  user  namespace.   If
>               this flag is not set, then (as with fork(2)) the process
>               is created in the same user  namespace  as  the  calling
>               process.
>
>               A  user  namespace  provides an isolated environment for
>               security related identifiers, in particular,  user  IDs,
>               group IDs, keys (see keyctl(2)), and capabilities.
>
>               When  a user namespace is created, it starts out without
>               a mapping of user IDs (group IDs)  to  the  parent  user
>               namespace.   The desired mapping of user IDs (group IDs)
>               to the parent user namespace may be set by writing  into
>               /proc/[pid]/uid_map (/proc/[pid]/gid_map); see proc(5).

		/proc/[pid]/projid_map deserves a mention.  Not that
                I am a fan of project IDs, or that xfs, where they are
                implemented, has been converted yet, but....

>               The  first process in a user namespace starts out with a
>               complete set of capabilities with  respect  to  the  new
>               user namespace.
>
>               System  calls  that  return  user  IDs  (group IDs) will
>               return either the user ID (group  ID)  mapped  into  the
>               current  user  namespace  if  there is a mapping, or the
>               overflow user ID (group ID); the default value  for  the
>               overflow  user ID (group ID) is 65534.  See the descrip‐
>               tions of /proc/sys/kernel/overflowuid and /proc/sys/ker‐
>               nel/overflowgid in proc(5).
>
>               Starting  with  Linux  3.8,  no privileges are needed to
>               create a user namespace, and mount, PID, IPC,  net,  and
>               UTS   namespaces   can   be   created   with   just  the
>               CAP_SYS_ADMIN capability in the caller's user namespace.
>
>               Over the years, there have been a lot of  features  that
>               have been added to the Linux kernel that are only avail‐
>               able to privileged users because of their  potential  to
>               confuse  set-user-ID-root  applications.  In general, it
>               becomes safe to allow the root user in a user  namespace
>               to use those features because it is impossible, while in
>               a user namespace, to gain more privilege than  the  root
>               user of a user namespace has.

I don't have any problems with this bit of text.

It occurs to me that what is going on with capabilities and user
namespaces needs to be documented better.  There was a minor bug with
them this release cycle, and I realized that while the current definition
makes sense and isn't hard to understand in general, in detail the
interaction of capabilities and user namespaces is hard to describe.

I think capabilities and user namespaces are the work of a future patch
however.

> ]]
>
> Thanks,
>
> Michael

* Re: [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map
       [not found]         ` <CAKgNAkixXmtvQUbwyv=a8mU=gdf-x+w-ou_4N=cNaau+hVoy4Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2012-12-27 16:58           ` Eric W. Biederman
@ 2012-12-27 17:23           ` Eric W. Biederman
       [not found]             ` <87licjv276.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2012-12-27 17:23 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Linux API, Serge E. Hallyn, Linux Containers

"Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> Hi Eric,
>
> Thanks for this patch. I have one question and a revised version f the
> text that I'd like you to review.

In this patch where I said 3.6 it should have been 3.5

Eric


* Re: [PATCH 4/4] setns.2: Document the pid, user, and mount namespace support.
       [not found]         ` <CAKgNAkiaw5L_oNE8NENjmoBS8Hq_uj+iaEdhyXc1+hje4HdnNQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-12-27 17:40           ` Eric W. Biederman
       [not found]             ` <87bodftmv0.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2012-12-27 17:40 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w; +Cc: Linux API, Linux Containers

"Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> Hi Eric,
>
> Some questions below.

A quick note.  Getting the permission checks correct has been a little
more interesting than I would have preferred.

I had to add a nsown_capable(CAP_SYS_ADMIN) check to all of the setns()
install methods except the user namespace.  Not a change in pre 3.8
behavior but a change to my patch, and possibly a documentation change
below.

> On Tue, Nov 27, 2012 at 1:48 AM, Eric W. Biederman
> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>
>> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> ---
>>  man2/setns.2 |   41 +++++++++++++++++++++++++++++++++--------
>>  1 files changed, 33 insertions(+), 8 deletions(-)
>>
>> diff --git a/man2/setns.2 b/man2/setns.2
>> index 6aa01e1..63b04dc 100644
>> --- a/man2/setns.2
>> +++ b/man2/setns.2
>> @@ -48,6 +48,18 @@ must refer to a network namespace.
>>  .BR CLONE_NEWUTS
>>  .I fd
>>  must refer to a UTS namespace.
>> +.TP
>> +.BR CLONE_NEWPID
>> +.I fd
>> +must refer to a PID namespace.
>> +.TP
>> +.BR CLONE_NEWUSER
>> +.I fd
>> +must refer to a user namespace.
>> +.TP
>> +.BR CLONE_NEWNS
>> +.I fd
>> +must refer to a mount namespace.
>>  .PP
>>  Specifying
>>  .I nstype
>> @@ -63,6 +75,25 @@ and wants to ensure that the namespace is of a particular type.
>>  .IR fd
>>  if the file descriptor was opened by another process and, for example,
>>  passed to the caller via a UNIX domain socket.)
>> +
>> +The pid namespace is a little different.  Reassociating the calling
>> +thread with a pid namespace only changes the pid namespace that the
>> +child processes will be created in.
>> +
>> +Changing the pid namespace for child processes is only allowed if the
>> +pid namespace specified by
>> +.IR fd
>> +is a child pid namespace of the pid namespace of the current thread.
>
> I assume "current thread" above should be "calling thread", right?

What I mean is "current" from a kernel perspective.

It should be just "caller".

Threads must share a pid namespace so mentioning threads seems wrong.

>> +
>> +A multi-threaded process may not change user namespace with setns.  A
>> +process may not reassociate the thread with the current user
>> +namespace.
>
> What do you mean by "the current user nsamesapce"?

fd = open("/proc/self/ns/user");
setns(fd) -> -EINVAL.

So from a userspace perspective I mean "the caller's user namespace".
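
A fuller version of that two-line example might look like the sketch
below.  The try_setns() helper name is made up for this example; it opens
a namespace file, calls setns(), and returns the errno of whichever step
failed.  For /proc/self/ns/user the expected result is EINVAL as described
above, though a sandboxed or older system may fail earlier with a
different error (for example if the file does not exist).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/*
 * Open a namespace file and try to join it.
 * Returns 0 if setns() succeeds, otherwise the errno of the failing step
 * (open or setns).
 */
static int try_setns(const char *ns_path, int nstype)
{
    assert(ns_path != NULL);

    int fd = open(ns_path, O_RDONLY);
    if (fd < 0)
        return errno;

    int err = 0;
    if (setns(fd, nstype) != 0)
        err = errno;   /* saved before close() can overwrite it */

    close(fd);
    return err;
}
```

Calling try_setns("/proc/self/ns/user", CLONE_NEWUSER) should therefore
never return 0.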

>> The process reassociating itself with a user namespace
>> +must have CAP_SYS_ADMIN privileges in the target user namespace.
>>
>> +A process may not be reassociated with a new mount namespace if it is
>> +multi-threaded
>
> I tried to verify the precdeing two lines from the kernel source, but
> did not work out where this check is made. Where is it?

kernel/user_namespace.c:userns_install()
fs/namespace.c:mntns_install()

A couple of the security checks have been pushed down into a per
namespace context, because the exact check that makes sense depends on
the namespace.

>> or it does not possess both CAP_SYS_CHROOT privileges
>> +and CAP_SYS_ADMIN rights over the target mount namespace.
>
> Could you please expand/clarify the preceding two lines. As they
> stand, I don't really understand them.

Ugh.  The text is slightly wrong.

The code is:
	if (!ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) ||
	    !nsown_capable(CAP_SYS_CHROOT) ||
	    !nsown_capable(CAP_SYS_ADMIN))
		return -EPERM;

Basically you aren't allowed to change your mount namespace into
a mount namespace that doesn't see you as the all powerful root
able to mount and unmount filesystems.

You aren't allowed to change your mount namespace unless you possess
CAP_SYS_CHROOT and CAP_SYS_ADMIN.

>>  .SH RETURN VALUE
>>  On success,
>>  .IR setns ()
>> @@ -94,7 +125,8 @@ for this operation.
>>  The
>>  .BR setns ()
>>  system call first appeared in Linux in kernel 3.0;
>> -library support was added to glibc in version 2.14.
>> +library support was added to glibc in version 2.14;
>> +Support for PID, user and mount namespaces first appeard in Linux in kernel 3.8.
>>  .SH CONFORMING TO
>>  The
>>  .BR setns ()
>> @@ -106,13 +138,6 @@ a new thread is created using
>>  can be changed using
>>  .BR setns ().
>>  .SH BUGS
>> -The PID namespace and the mount namespace are not currently supported.
>> -(See the descriptions of
>> -.BR CLONE_NEWPID
>> -and
>> -.BR CLONE_NEWNS
>> -in
>> -.BR clone (2).)
>>  .SH SEE ALSO
>>  .BR clone (2),
>>  .BR fork (2),
>
> Cheers,
>
> Michael


* Re: [PATCH 2/4] clone.2: Describe the user namespace
       [not found]         ` <CAKgNAkgXWp49wXKom9hMm9fajKVOAwOmFzPdKWBesbBhfZEssA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2012-12-27 17:20           ` Eric W. Biederman
@ 2012-12-27 17:47           ` Eric W. Biederman
       [not found]             ` <87sj6rs7zc.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2012-12-27 17:47 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w; +Cc: Linux API, Linux Containers


There is one other bit that needs to be documented in clone, although
I am not certain where/how.

The sequences:

unshare(CLONE_NEWPID);
clone(CLONE_VM);

and

setns(fd, CLONE_NEWPID);
clone(CLONE_VM);

now fail.

Basically the rule is that all threads must be in the same pid namespace.

The joy of reviews with good comments that come much later than hoped.

Eric


* Re: [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map
       [not found]             ` <87licjv276.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-12-27 18:39               ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2012-12-27 18:39 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linux API, Serge E. Hallyn, Linux Containers

On Thu, Dec 27, 2012 at 6:23 PM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>
>> Hi Eric,
>>
>> Thanks for this patch. I have one question and a revised version f the
>> text that I'd like you to review.
>
> In this patch where I said 3.6 it should have been 3.5

Fixed!



* Re: [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map
       [not found]             ` <87obhfxwhb.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-12-28 19:20               ` Michael Kerrisk (man-pages)
       [not found]                 ` <CAKgNAkjs9T-s8SG-EgTT0O-Uj8S98Q_zfnMqnZ1ROrcYqh7Z5w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2012-12-28 19:20 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linux API, Linux Containers

Hi Eric,

On Thu, Dec 27, 2012 at 5:58 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>
>> Hi Eric,
>>
>> Thanks for this patch. I have one question and a revised version f the
>> text that I'd like you to review.
>>
>> On Tue, Nov 27, 2012 at 1:46 AM, Eric W. Biederman
>> <ebiederm@xmission.com> wrote:
>>>
>>> Document the user namespace files that report the mapping of uids
>>> and gids between user namespaces.
>>>
>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>> ---
>>>  man5/proc.5 |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>  1 files changed, 50 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/man5/proc.5 b/man5/proc.5
>>> index fb70d2b..840480d 100644
>>> --- a/man5/proc.5
>>> +++ b/man5/proc.5
>>> @@ -317,6 +317,31 @@ The files in this directory are readable only by the owner of the process.
>>>  .\" .TP
>>>  .\" .IR /proc/[pid]/io " (since kernel 2.6.20)"
>>>  .TP
>>> +.IR /proc/[pid]/gid_map " (since kernel 3.6)"
>>> +This file reports the mapping of gids from the user namespace of the process specified by
>>> +.IR pid
>>> +to the user namespace of the process that opened
>>> +.IR /proc/[pid]/gid_map .
>>> +
>>> +Each line specifies a 1 to 1 mapping of a range of contiguous gids from
>>> +the user namespace of the process specified by
>>> +.IR pid
>>> +to the user namespace of the process that opened
>>> +.IR /proc/[pid]/gid_map.
>>
>> I want to check the above point. What do you mean by "the process that
>> opened uid_map"? Does that mean the process that opened uid_map to do
>> the one-time write of the UID map? I had assumed that uid_map actually
>> provided a mapping between the namespace of 'pid' and the 'parent'
>> namespace, where the parent namespace is the namespace of the process
>> that created this namespace via clone(CLONE_NEWUSER).
>
> I mean the process that opens uid_map for read or write.

Thanks for the confirmation.

> For writing you are correct about the mapping to the parent (but that is
> not an exception that is a restriction on who can write to the file).

So, by the way, I added this sentence to the page:

              In   order   to   write   to   the   /proc/[pid]/uid_map
              (/proc/[pid]/gid_map) file,  a  process  must  have  the
              CAP_SETUID (CAP_SETGID) capability in the user namespace
              of the process pid.

Is that correct?

But, there appear to be more rules than this governing whether a
process can write to the file (i.e., various other -EPERM cases). What
are the rules?

> The complete rule is for the user namespace of the second value is:
>
> - If the user namespace of the opener of the file and the user namespace
>   of the process do not match.  The user namespace of the opener of the
>   file is used.
>
> - If the user namespace of the opener of the file and the user namespace
>   of the process are the same.  The parent user namespace of the process
>   is used for the second value.

Could you give an example of the last case? (What I'm really seeking,
I think, is clarification of "parent user namespace". Does that mean
"user namespace of the process that created the user namespace of this
process"?)


> While very wordy I think the rule makes a lot of intuitive and practical
> sense.  Especially since it is non-trivial to come up with the chain of
> user namespaces a process is in.
>
>>> +Each line contains three numbers.  The start of the range of gids in
>>> +the user namespace of the process specifed by
>>> +.IR pid.
>>> +The start of the range of gids in the user namespace of the process that
>>> +opened
>>> +.IR /proc/[pid]/gid_map.
>>> +The number of gids in the range of numbers that is mapped between to two
>>> +user namespaces.
>>> +
>>> +After the creation of a new user namespace this file may be written to
>>> +exactly once to specify the mapping of gids in the new user namespace.
>>> +
>>> +.TP
>>>  .IR /proc/[pid]/limits " (since kernel 2.6.24)"
>>>  This file displays the soft limit, hard limit, and units of measurement
>>>  for each of the process's resource limits (see
>>> @@ -1169,6 +1194,31 @@ directory are not available if the main thread has already terminated
>>>  (typically by calling
>>>  .BR pthread_exit (3)).
>>>  .TP
>>> +.IR /proc/[pid]/uid_map " (since kernel 3.6)"
>>> +This file reports the mapping of uids from the user namespace of the process specified by
>>> +.IR pid
>>> +to the user namespace of the process that opened
>>> +.IR /proc/[pid]/uid_map .
>>> +
>>> +Each line specifies a 1 to 1 mapping of a range of contiguous uids from
>>> +the user namespace of the process specified by
>>> +.IR pid
>>> +to the user namespace of the process that opened
>>> +.IR /proc/[pid]/uid_map.
>>> +
>>> +Each line contains three numbers.  The start of the range of uids in
>>> +the user namespace of the process specifed by
>>> +.IR pid.
>>> +The start of the range of uids in the user namespace of the process that
>>> +opened
>>> +.IR /proc/[pid]/uid_map.
>>> +The number of uids in the range of numbers that is mapped between to two
>>> +user namespaces.
>>> +
>>> +After the creation of a new user namespace this file may be written to
>>> +exactly once to specify the mapping of uids in the new user namespace.
>>> +
>>> +.TP
>>>  .I /proc/apm
>>>  Advanced power management version and battery information when
>>>  .B CONFIG_APM
>>
>> I revised your text quite a bit, and added a piece on the format od
>> the uid_map files. Could you please read the following and let me know
>> of errors:
>>
>> [[
>>        /proc/[pid]/uid_map, /proc/[pid]/gid_map (since Linux 3.6)
>>               These  files  expose the mappings for user and group IDs
>>               inside the user namespace  for  the  process  pid.   The
>>               description  here  explains  the  details  for  uid_map;
>>               gid_map is exactly the same, but each instance of  "user
>>               ID" is replaced by "group ID".
>>
>>               The  uid_map  file  exposes the mapping of user IDs from
>>               the user namespace of the process pid to the user names‐
>>               pace of the process that opened uid_map.
>>
>>               Each  line  in  the file specifies a 1-to-1 mapping of a
>>               range of contiguous user IDs from the user namespace  of
>>               the  process  pid  to  the user namespace of the process
>>               that opened uid_map.
>>
>>               Each line contains  three  numbers  delimited  by  white
>>               space:
>>
>>               (1) The  start  of  the  range  of  user IDs in the user
>>                   namespace of the process pid.
>>
>>               (2) The start of the range  of  user  IDs  in  the  user
>>                   namespace of the process that opened uid_map.
>>
>>               (3) The  length  of the range of user IDs that is mapped
>>                   between the two user namespaces.
>>
>>               After the creation of a new user  namespace,  this  file
>>               may be written to exactly once to specify the mapping of
>>               user IDs in the new  user  namespace.   (An  attempt  to
>>               write  more  than  once to the file fails with the error
>>               EPERM.)
>>
>>               The lines written to uid_map must conform to the follow‐
>>               ing rules:
>>
>>               *  The  three fields must be valid numbers, and the last
>>                  field must be greater than 0.
>>
>>               *  Lines are terminated by newline characters.
>>
>>               *  The file can contain a maximum of five lines.
>
> A maximum of 5 lines is important to Document but it is a current
> arbitrary limit that may be changed in the future.  Right now 5 extents
> are more than enough for any conceivable use case, and fit nicely within
> a single cache line.
>
> It is probably better to say writes that exceed an arbitrary maximum
> length fail with -EINVAL.  Currently the arbitrary maximum length is
> five lines.

Okay -- reworded.

>
>>               *  The values in both field 1 and field 2 of  each  line
>>                  must be in ascending numerical order.
>
> The rule is that the extents need to be non-overlapping.  Ascending
> numerical order is how that is implemented but that is a misfeature,
> and there has already been one request to fix that.  Removing the
> ascending numerical order limitation is on my todo list.

Okay -- I've reworded some text here.
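
For anyone following along, the write rules discussed so far can be
sketched as a user-space pre-check. This is only an illustration in
Python, not the kernel's parser: the names check_id_map, ranges_overlap
and MAX_EXTENTS are made up here, the overlap test is applied to both
field 1 and field 2 as an assumption, and the kernel reports failures
as write() errors rather than Python exceptions.

```python
MAX_EXTENTS = 5  # current arbitrary limit mentioned above; may change


def ranges_overlap(start_a, len_a, start_b, len_b):
    """True if [start_a, start_a+len_a) and [start_b, start_b+len_b) intersect."""
    return start_a < start_b + len_b and start_b < start_a + len_a


def check_id_map(text):
    """Return the extents in text as (field1, field2, length) tuples.

    Raises ValueError if one of the rules sketched here is violated.
    """
    if not text.endswith("\n"):
        raise ValueError("each line must be terminated by a newline")
    extents = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            raise ValueError("expected three fields, got %r" % line)
        # int() raises ValueError for fields that are not valid numbers.
        first, second, length = (int(f) for f in fields)
        if first < 0 or second < 0 or length <= 0:
            raise ValueError("IDs must be non-negative and the length greater than 0")
        for o_first, o_second, o_length in extents:
            if (ranges_overlap(first, length, o_first, o_length)
                    or ranges_overlap(second, length, o_second, o_length)):
                raise ValueError("extents overlap")
        extents.append((first, second, length))
    if len(extents) > MAX_EXTENTS:
        raise ValueError("too many lines")
    return extents
```

Note that the sketch checks only for overlap and does not require
ascending order, following the rule as Eric states it rather than the
current implementation.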

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map
       [not found]                 ` <CAKgNAkjs9T-s8SG-EgTT0O-Uj8S98Q_zfnMqnZ1ROrcYqh7Z5w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-12-28 21:20                   ` Eric W. Biederman
       [not found]                     ` <87vcbldgbj.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2012-12-28 21:20 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w; +Cc: Linux API, Linux Containers

"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:

> Hi Eric,
>
> On Thu, Dec 27, 2012 at 5:58 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>>
>>> Hi Eric,
>>>
>>> Thanks for this patch. I have one question and a revised version f the
>>> text that I'd like you to review.
>>>
>>> On Tue, Nov 27, 2012 at 1:46 AM, Eric W. Biederman
>>> <ebiederm@xmission.com> wrote:
>>>>
>>>> Document the user namespace files that report the mapping of uids
>>>> and gids between user namespaces.
>>>>
>>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>>> ---
>>>>  man5/proc.5 |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  1 files changed, 50 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/man5/proc.5 b/man5/proc.5
>>>> index fb70d2b..840480d 100644
>>>> --- a/man5/proc.5
>>>> +++ b/man5/proc.5
>>>> @@ -317,6 +317,31 @@ The files in this directory are readable only by the owner of the process.
>>>>  .\" .TP
>>>>  .\" .IR /proc/[pid]/io " (since kernel 2.6.20)"
>>>>  .TP
>>>> +.IR /proc/[pid]/gid_map " (since kernel 3.6)"
>>>> +This file reports the mapping of gids from the user namespace of the process specified by
>>>> +.IR pid
>>>> +to the user namespace of the process that opened
>>>> +.IR /proc/[pid]/gid_map .
>>>> +
>>>> +Each line specifies a 1 to 1 mapping of a range of contiguous gids from
>>>> +the user namespace of the process specified by
>>>> +.IR pid
>>>> +to the user namespace of the process that opened
>>>> +.IR /proc/[pid]/gid_map.
>>>
>>> I want to check the above point. What do you mean by "the process that
>>> opened uid_map"? Does that mean the process that opened uid_map to do
>>> the one-time write of the UID map? I had assumed that uid_map actually
>>> provided a mapping between the namespace of 'pid' and the 'parent'
>>> namespace, where the parent namespace is the namespace of the process
>>> that created this namespace via clone(CLONE_NEWUSER).
>>
>> I mean the process that opens uid_map for read or write.
>
> Thanks for the confirmation.
>
>> For writing you are correct about the mapping to the parent (but that is
>> not an exception that is a restriction on who can write to the file).
>
> So, by the way, I added this sentence to the page:
>
>               In   order   to   write   to   the   /proc/[pid]/uid_map
>               (/proc/[pid]/gid_map) file,  a  process  must  have  the
>               CAP_SETUID (CAP_SETGID) capability in the user namespace
>               of the process pid.
>
> Is that correct?

Yes.

> But, there appear to be more rules than this governing whether a
> process can write to the file (i.e., various other -EPERM cases). What
> are the rules?

In general you must also have CAP_SETUID (CAP_SETGID) in the parent user
namespace.  The one exception is if you are mapping your current uid and
gid.  A rose by any other name will smell as sweet.  In practice this
means you must be root to map to uids or gids other than your own, which
preserves the current limits on setuid and setgid.

Additionally, the writer must see the map file with the lower user
namespace being the parent user namespace, which means you must be
inside the user namespace itself or in the parent user namespace to
write to the user namespace's mapping file.

For /proc/[pid]/projid_map, which will be interesting once xfs has
kuid/kgid support, there are no capability checks because xfs lets
anyone have any projid.

This is one of the few cases where it almost matters to understand
how ns_capable works when you are not in the user namespace in question,
and that goes to what is a parent user namespace.  If you would like
some more detail on that please ask.

>> The complete rule is for the user namespace of the second value is:
>>
>> - If the user namespace of the opener of the file and the user namespace
>>   of the process do not match.  The user namespace of the opener of the
>>   file is used.
>>
>> - If the user namespace of the opener of the file and the user namespace
>>   of the process are the same.  The parent user namespace of the process
>>   is used for the second value.
>
> Could you give an example of the last case? (What I'm really seeking,
> I think, is clarification of "parent user namespace". Does that mean
> "user namespace of the process that created the user namespace of this
> process"?)

User namespaces form a tree.  What you can do in one user namespace is a
subset of what you can do in the parent user namespace.

The parent user namespace is the user namespace of the process that
calls unshare or clone with CLONE_NEWUSER.
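
As a rough illustration of the two-case rule quoted above, here is a
small Python sketch.  The dictionary parent_of and the function
reference_namespace are hypothetical names used only for this mail, and
the fallback for a namespace with no parent (the initial one) is my
assumption, not something the rule states.

```python
def reference_namespace(opener_ns, target_ns, parent_of):
    """Pick the user namespace that the second field of uid_map refers to.

    opener_ns: user namespace of the process that opened the file.
    target_ns: user namespace of the process named by [pid].
    parent_of: maps each user namespace to its parent (the tree).
    """
    if opener_ns != target_ns:
        # Case 1: the namespaces differ, so the opener's namespace is used.
        return opener_ns
    # Case 2: same namespace, so the parent of the target is used.
    # Assumption: a namespace without a parent falls back to itself.
    return parent_of.get(target_ns, target_ns)


# "child" was created by a process living in "init".
tree = {"child": "init"}
print(reference_namespace("init", "child", tree))
print(reference_namespace("child", "child", tree))
```

Both calls print "init": a reader in the parent sees the map relative
to its own namespace, and a reader inside the child sees it relative to
the parent.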


The last case is the common case of /proc/self/uid_map: you see how
your uids map into the user namespace of the creator of your user
namespace.

With the default being just:         0          0 4294967295
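
To make the three fields concrete, a minimal Python sketch that parses
such lines and translates one ID.  parse_id_map and map_id are
illustrative helper names, not kernel or libc interfaces.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map text into (inside_start, outside_start, length)."""
    extents = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore empty lines
        inside, outside, length = (int(field) for field in line.split())
        extents.append((inside, outside, length))
    return extents


def map_id(extents, inside_id):
    """Translate an ID from the namespace of [pid]; None if it is unmapped."""
    for inside, outside, length in extents:
        if inside <= inside_id < inside + length:
            return outside + (inside_id - inside)
    return None


default_map = "         0          0 4294967295\n"
extents = parse_id_map(default_map)
print(extents)
print(map_id(extents, 1000))
```

With the default line every ID from 0 up to 4294967294 maps to itself;
4294967295 lies outside the range and comes back as None in this sketch.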

>> While very wordy I think the rule makes a lot of intuitive and practical
>> sense.  Especially since it is non-trivial to come up with the chain of
>> user namespaces a process is in.
>>
>>>> +Each line contains three numbers.  The start of the range of gids in
>>>> +the user namespace of the process specifed by
>>>> +.IR pid.
>>>> +The start of the range of gids in the user namespace of the process that
>>>> +opened
>>>> +.IR /proc/[pid]/gid_map.
>>>> +The number of gids in the range of numbers that is mapped between to two
>>>> +user namespaces.
>>>> +
>>>> +After the creation of a new user namespace this file may be written to
>>>> +exactly once to specify the mapping of gids in the new user namespace.
>>>> +
>>>> +.TP
>>>>  .IR /proc/[pid]/limits " (since kernel 2.6.24)"
>>>>  This file displays the soft limit, hard limit, and units of measurement
>>>>  for each of the process's resource limits (see
>>>> @@ -1169,6 +1194,31 @@ directory are not available if the main thread has already terminated
>>>>  (typically by calling
>>>>  .BR pthread_exit (3)).
>>>>  .TP
>>>> +.IR /proc/[pid]/uid_map " (since kernel 3.6)"
>>>> +This file reports the mapping of uids from the user namespace of the process specified by
>>>> +.IR pid
>>>> +to the user namespace of the process that opened
>>>> +.IR /proc/[pid]/uid_map .
>>>> +
>>>> +Each line specifies a 1 to 1 mapping of a range of contiguous uids from
>>>> +the user namespace of the process specified by
>>>> +.IR pid
>>>> +to the user namespace of the process that opened
>>>> +.IR /proc/[pid]/uid_map.
>>>> +
>>>> +Each line contains three numbers.  The start of the range of uids in
>>>> +the user namespace of the process specifed by
>>>> +.IR pid.
>>>> +The start of the range of uids in the user namespace of the process that
>>>> +opened
>>>> +.IR /proc/[pid]/uid_map.
>>>> +The number of uids in the range of numbers that is mapped between to two
>>>> +user namespaces.
>>>> +
>>>> +After the creation of a new user namespace this file may be written to
>>>> +exactly once to specify the mapping of uids in the new user namespace.
>>>> +
>>>> +.TP
>>>>  .I /proc/apm
>>>>  Advanced power management version and battery information when
>>>>  .B CONFIG_APM
>>>
>>> I revised your text quite a bit, and added a piece on the format od
>>> the uid_map files. Could you please read the following and let me know
>>> of errors:
>>>
>>> [[
>>>        /proc/[pid]/uid_map, /proc/[pid]/gid_map (since Linux 3.6)
>>>               These  files  expose the mappings for user and group IDs
>>>               inside the user namespace  for  the  process  pid.   The
>>>               description  here  explains  the  details  for  uid_map;
>>>               gid_map is exactly the same, but each instance of  "user
>>>               ID" is replaced by "group ID".
>>>
>>>               The  uid_map  file  exposes the mapping of user IDs from
>>>               the user namespace of the process pid to the user names‐
>>>               pace of the process that opened uid_map.
>>>
>>>               Each  line  in  the file specifies a 1-to-1 mapping of a
>>>               range of contiguous user IDs from the user namespace  of
>>>               the  process  pid  to  the user namespace of the process
>>>               that opened uid_map.
>>>
>>>               Each line contains  three  numbers  delimited  by  white
>>>               space:
>>>
>>>               (1) The  start  of  the  range  of  user IDs in the user
>>>                   namespace of the process pid.
>>>
>>>               (2) The start of the range  of  user  IDs  in  the  user
>>>                   namespace of the process that opened uid_map.
>>>
>>>               (3) The  length  of the range of user IDs that is mapped
>>>                   between the two user namespaces.
>>>
>>>               After the creation of a new user  namespace,  this  file
>>>               may be written to exactly once to specify the mapping of
>>>               user IDs in the new  user  namespace.   (An  attempt  to
>>>               write  more  than  once to the file fails with the error
>>>               EPERM.)
>>>
>>>               The lines written to uid_map must conform to the follow‐
>>>               ing rules:
>>>
>>>               *  The  three fields must be valid numbers, and the last
>>>                  field must be greater than 0.
>>>
>>>               *  Lines are terminated by newline characters.
>>>
>>>               *  The file can contain a maximum of five lines.
>>
>> A maximum of 5 lines is important to Document but it is a current
>> arbitrary limit that may be changed in the future.  Right now 5 extents
>> are more than enough for any conceivable use case, and fit nicely within
>> a single cache line.
>>
>> It is probably better to say writes that exceed an arbitrary maximum
>> length fail with -EINVAL.  Currently the arbitrary maximum length is
>> five lines.
>
> Okay -- reworded.
>
>>
>>>               *  The values in both field 1 and field 2 of  each  line
>>>                  must be in ascending numerical order.
>>
>> The rule is that the extents need to be non-overlapping.  Ascending
>> numerical order is how that is implemented but that is a misfeature,
>> and there has already been one request to fix that.  Removing the
>> ascending numerical order limitation is on my todo list.
>
> Okay -- I've reworded some text here.

Thank you very much for your time and patience in getting a good
description of the user namespace.

Eric


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] clone.2: Describe the user namespace
       [not found]             ` <87sj6rs7zc.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-01-01  9:29               ` Michael Kerrisk (man-pages)
       [not found]                 ` <CAKgNAkgRQXn0-x6CXxvW94eeG19dOAOEx78iNC0+w08uX+Sg1w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-01-01  9:29 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linux API, Linux Containers

Hi Eric,

On Thu, Dec 27, 2012 at 6:47 PM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>
> There is one other bit that needs to be documented in clone, although
> I am not certain where/how.
>
> The sequences:
>
> unshare(CLONE_NEWPID).
> clone(CLONE_VM)
>
> setns(fd, CLONE_NEWPID).
> clone(CLONE_VM).
>
> Now fail.

Can you define "now" please. Which kernel version?

> Basically the rule is all threads must be in the same pid namespace.
>
> The joy of reviews with good comments that come much later than hoped.

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] clone.2: Describe the user namespace
       [not found]             ` <87r4mbv2c9.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-01-01  9:30               ` Michael Kerrisk (man-pages)
       [not found]                 ` <CAKgNAkgPET9jex1DO=1Z3HRQqO_WVD8qmG-UaH1DQB6wDGqO5A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-01-01  9:30 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linux API, Serge E. Hallyn, Linux Containers

Hi Eric,

On Thu, Dec 27, 2012 at 6:20 PM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>
>> Hi Eric,
>>
>> On Tue, Nov 27, 2012 at 1:46 AM, Eric W. Biederman
>> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>>
>>> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>>> ---
>>>  man2/clone.2 |   39 +++++++++++++++++++++++++++++++++++++++
>>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/man2/clone.2 b/man2/clone.2
>>> index 0582057..4566677 100644
>>> --- a/man2/clone.2
>>> +++ b/man2/clone.2
>>> @@ -366,6 +366,45 @@ in the same
>>>  .BR clone ()
>>>  call.
>>>  .TP
>>> +.BR CLONE_NEWUSER " (since Linux 3.6)"
>>
>> Why "since Linux 3.6"? As fas as I can see, CLONE_NEWUSER first gained
>> some meaning in 2.6.29.
>
> Looking at it where I have said 3.6 that is wrong.  I meant 3.5.

Okay.

> I think I made the same mistake in one or two other manpages.  Nothing
> was merged in 3.6 unfortunately.

I think the other cases have been fixed by now.

> My intent was these are the semantics of user namespaces since 3.5,
> when my rework/refocusing of them was merged.
>
> Since 3.5 all that has really happened with user namespaces is the
> uid/gid to kuid/kgid conversion, permission checks have been relaxed,
> and a few bugs have been fixed.
>
> 3.8 is huge from a usability standpoint.  3.8 is huge because setns(),
> and unshare() are now complete from a namespace perspective, and because
> enough permission checks have been relaxed in user namespaces that you
> can really start using them.
>
> But semantically from a user namespace perspective nothing really has
> changed in 3.8.
>
[...]

>> I reworked your text somewhat. Could you please review the following:
>>
>> [[
>>        CLONE_NEWUSER
>>               (This  flag first became meaningful for clone() in Linux
>>               2.6.29, but the implementation of  user  namespaces  was
>>               only  completed in Linux 3.8.)
>
> Long rant about 2.6.29 vs 3.8 above.  I think what we need to say is:
>
>                 (This  flag first became meaningful for clone() in Linux
>                 2.6.29, the current semantics were merged present in
>                 3.5, and user namespaces only really became usable in 3.8.)

Yup. I've done something like that now.

>>                                               If CLONE_NEWUSER is set,
>>               then create the process in a  new  user  namespace.   If
>>               this flag is not set, then (as with fork(2)) the process
>>               is created in the same user  namespace  as  the  calling
>>               process.
>>
>>               A  user  namespace  provides an isolated environment for
>>               security related identifiers, in particular,  user  IDs,
>>               group IDs, keys (see keyctl(2)), and capabilities.
>>
>>               When  a user namespace is created, it starts out without
>>               a mapping of user IDs (group IDs)  to  the  parent  user
>>               namespace.   The desired mapping of user IDs (group IDs)
>>               to the parent user namespace may be set by writing  into
>>               /proc/[pid]/uid_map (/proc/[pid]/gid_map); see proc(5).
>
>                 /proc/[pid]/projid_map deserves a mention.  Not that
>                 I am a fan of project is or that xfs where the are
>                 implemented has been converted yet but....

Would you be able to send a patch documenting this in proc(5)?

>>               The  first process in a user namespace starts out with a
>>               complete set of capabilities with  respect  to  the  new
>>               user namespace.
>>
>>               System  calls  that  return  user  IDs  (group IDs) will
>>               return either the user ID (group  ID)  mapped  into  the
>>               current  user  namespace  if  there is a mapping, or the
>>               overflow user ID (group ID); the default value  for  the
>>               overflow  user ID (group ID) is 65534.  See the descrip‐
>>               tions of /proc/sys/kernel/overflowuid and /proc/sys/ker‐
>>               nel/overflowgid in proc(5).
>>
>>               Starting  with  Linux  3.8,  no privileges are needed to
>>               create a user namespace, and mount, PID, IPC,  net,  and
>>               UTS   namespaces   can   be   created   with   just  the
>>               CAP_SYS_ADMIN capability in the caller's user namespace.
>>
>>               Over the years, there have been a lot of  features  that
>>               have been added to the Linux kernel that are only avail‐
>>               able to privileged users because of their  potential  to
>>               confuse  set-user-ID-root  applications.  In general, it
>>               becomes safe to allow the root user in a user  namespace
>>               to use those features because it is impossible, while in
>>               a user namespace, to gain more privilege than  the  root
>>               user of a user namespace has.
>
> I don't have any problems with this bit of text.
>
> It occurs to me that what is going on with capabilities and user
> namespaces needs to be documented better.  There was a minor bug with
> them this release cycle and I realized while the current definition
> makes sense and isn't hard to understand in general.  In detail the
> interaction of capabilities and user namespaces are hard to describe.
>
> I think capabilities and user namespaces are the work of a future patch
> however.

Okay. So, below, a new iteration of the text. Could you please check
it over, and note any errors to be fixed or improvements to be made.

Thanks,

Michael

       CLONE_NEWUSER
              (This  flag first became meaningful for clone() in Linux
              2.6.23, the current clone()  semantics  were  merged  in
              Linux  3.5, and the final pieces to make the user names‐
              paces completely usable were merged in Linux 3.8.)

              If CLONE_NEWUSER is set, then create the  process  in  a
              new  user  namespace.  If this flag is not set, then (as
              with fork(2)) the process is created in  the  same  user
              namespace as the calling process.

              A  user  namespace  provides an isolated environment for
              security related identifiers, in particular,  user  IDs,
              group IDs, keys (see keyctl(2)), and capabilities.

              When  a user namespace is created, it starts out without
              a mapping of user IDs (group IDs)  to  the  parent  user
              namespace.   The desired mapping of user IDs (group IDs)
              to the parent user namespace may be set by writing  into
              /proc/[pid]/uid_map (/proc/[pid]/gid_map); see proc(5).

              The  first process in a user namespace starts out with a
              complete set of capabilities with  respect  to  the  new
              user namespace.

              System  calls  that  return  user  IDs  (group IDs) will
              return either the user ID (group  ID)  mapped  into  the
              current  user  namespace  if  there is a mapping, or the
              overflow user ID (group ID); the default value  for  the
              overflow  user ID (group ID) is 65534.  See the descrip‐
              tions of /proc/sys/kernel/overflowuid and /proc/sys/ker‐
              nel/overflowgid in proc(5).

              Use  of  this flag requires a kernel configured with the
              CONFIG_USER_NS  option.   Before  Linux  3.8,   use   of
              CLONE_NEWUSER  required that the caller have three capa‐
              bilities:  CAP_SYS_ADMIN,  CAP_SETUID,  and  CAP_SETGID.
              Starting  with  Linux  3.8,  no privileges are needed to
              create a user namespace, and mount, PID, IPC,  net,  and
              UTS   namespaces   can   be   created   with   just  the
              CAP_SYS_ADMIN capability in the caller's user namespace.

              Over the years, there have been a lot of  features  that
              have been added to the Linux kernel that are only avail‐
              able to privileged users because of their  potential  to
              confuse  set-user-ID-root  applications.  In general, it
              becomes safe to allow the root user in a user  namespace
              to use those features because it is impossible, while in
              a user namespace, to gain more privilege than  the  root
              user of a user namespace has.
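
The overflow paragraph above can be pictured with a small Python
sketch.  It is a simplification that looks at just one namespace and
its parent; id_seen_in_namespace and DEFAULT_OVERFLOW_ID are names made
up for this illustration, and the value 65534 is only the default
stated in the text.

```python
DEFAULT_OVERFLOW_ID = 65534  # default per the text; tunable via /proc/sys/kernel/overflowuid


def id_seen_in_namespace(extents, parent_id, overflow_id=DEFAULT_OVERFLOW_ID):
    """Return the ID a process in the namespace sees for parent_id.

    extents: (inside_start, outside_start, length) tuples as in uid_map.
    parent_id: an ID expressed in the parent user namespace.
    """
    for inside, outside, length in extents:
        if outside <= parent_id < outside + length:
            return inside + (parent_id - outside)
    # No mapping covers this ID, so the overflow ID is reported instead.
    return overflow_id


mapping = [(0, 1000, 1)]                  # uid 1000 outside appears as 0 inside
print(id_seen_in_namespace(mapping, 1000))
print(id_seen_in_namespace(mapping, 0))
print(id_seen_in_namespace([], 1000))     # new namespace, nothing written yet
```

The last call models a freshly created user namespace whose map has not
been written: every ID shows up as the overflow ID.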


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/4] setns.2: Document the pid, user, and mount namespace support.
       [not found]             ` <87bodftmv0.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-01-01  9:30               ` Michael Kerrisk (man-pages)
       [not found]                 ` <CAKgNAkjJR02rKOBh98n7HJwXqAwywHY=Ef35t9tW7wOuyo86NQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-01-01  9:30 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linux API, Linux Containers

Hi Eric,

On Thu, Dec 27, 2012 at 6:40 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>
>> Hi Eric,
>>
>> Some questions below.
>
> A quick note.  Getting the permission checks correct has been a little
> more interesting that I would have preferred.
>
> I had to add a nsown_capable(CAP_SYS_ADMIN) check to all of the setns()
> install methods except the user namespace.  Not a change in pre 3.8
> behavior but a change to my patch, and possibly a documentation change
> below.
>
>> On Tue, Nov 27, 2012 at 1:48 AM, Eric W. Biederman
>> <ebiederm@xmission.com> wrote:
>>>
>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>> ---
>>>  man2/setns.2 |   41 +++++++++++++++++++++++++++++++++--------
>>>  1 files changed, 33 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/man2/setns.2 b/man2/setns.2
>>> index 6aa01e1..63b04dc 100644
>>> --- a/man2/setns.2
>>> +++ b/man2/setns.2
>>> @@ -48,6 +48,18 @@ must refer to a network namespace.
>>>  .BR CLONE_NEWUTS
>>>  .I fd
>>>  must refer to a UTS namespace.
>>> +.TP
>>> +.BR CLONE_NEWPID
>>> +.I fd
>>> +must refer to a PID namespace.
>>> +.TP
>>> +.BR CLONE_NEWUSER
>>> +.I fd
>>> +must refer to a user namespace.
>>> +.TP
>>> +.BR CLONE_NEWNS
>>> +.I fd
>>> +must refer to a mount namespace.
>>>  .PP
>>>  Specifying
>>>  .I nstype
>>> @@ -63,6 +75,25 @@ and wants to ensure that the namespace is of a particular type.
>>>  .IR fd
>>>  if the file descriptor was opened by another process and, for example,
>>>  passed to the caller via a UNIX domain socket.)
>>> +
>>> +The pid namespace is a little different.  Reassociating the calling
>>> +thread with a pid namespace only changes the pid namespace that the
>>> +child processes will be created in.
>>> +
>>> +Changing the pid namespace for child processes is only allowed if the
>>> +pid namespace specified by
>>> +.IR fd
>>> +is a child pid namespace of the pid namespace of the current thread.
>>
>> I assume "current thread" above should be "calling thread", right?
>
> What I mean in "current" from a kernel perspective.
>
> It should be just "caller".

Okay. Changed.

> Threads must share a pid namespace so mentioning threads seems wrong.
>
>>> +
>>> +A multi-threaded process may not change user namespace with setns.  A
>>> +process may not reassociate the thread with the current user
>>> +namespace.
>>
>> What do you mean by "the current user nsamesapce"?
>
> fd = open("/proc/self/ns/user");
> setns(fd) -> -EINVAL.
>
> So from a userspace perspective I mean "the callers user namespace".
>
>>> The process reassociating itself with a user namespace
>>> +must have CAP_SYS_ADMIN privileges in the target user namespace.
>>>
>>> +A process may not be reassociated with a new mount namespace if it is
>>> +multi-threaded
>>
>> I tried to verify the precdeing two lines from the kernel source, but
>> did not work out where this check is made. Where is it?
>
> kernel/user_namespace.c:userns_install()
> fs/namespace.c:mntns_install()

Thanks.

> A couple of the security checks have been pushed down into a per
> namespace context, because the exact check that makes sense depends on
> the namespace.
>
>>> or it does not possess both CAP_SYS_CHROOT privileges
>>> +and CAP_SYS_ADMIN rights over the target mount namespace.
>>
>> Could you please expand/clarify the preceding two lines. As they
>> stand, I don't really understand them.
>
> Ugh.  The text is slightly wrong.
>
> The code is:
>         if (!ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) ||
>             !nsown_capable(CAP_SYS_CHROOT) ||
>             !nsown_capable(CAP_SYS_ADMIN))
>                 return -EPERM;
>
> Basically you aren't allowed change your mount namespace into
> a mount namespace that doesn't see you as the all powerful root
> able to mount and unmount filesystems.
>
> You aren't allowed to change your mount namespace unless you possesses
> CAP_SYS_CHROOT and CAP_SYS_ADMIN.

Okay -- reworded.

So, I've done some more reworking of the text, which now reads as
follows. Could you please check this (and see my questions below).

       CLONE_NEWPID  behaves somewhat differently from the other
       nstype values: reassociating the calling  thread  with  a
       PID  namespace  only changes the PID namespace that child
       processes of the caller will be created in; it  does  not
       change the PID namespace of the caller itself.

I reworked the preceding piece a lot. Is it correct still?

       Reassoci‐
       ating with a PID namespace is only  allowed  if  the  PID
       namespace  specified by fd is a descendant (child, grand‐
       child, etc.)

Is the preceding sentence correct? (You talked only of children in
your original patch, but I believe it's more general than that.)

       PID namespace of the PID namespace  of  the
       caller.
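
To pin down what "descendant" would mean here, a Python sketch of the
test as the draft wording reads.  The parent_of dictionary and
is_descendant are hypothetical; the sketch treats "descendant" as
strict, and whether the caller's own PID namespace is also accepted is
not settled by this text.

```python
def is_descendant(parent_of, candidate, ancestor):
    """True if candidate is a strict descendant of ancestor in the tree."""
    node = parent_of.get(candidate)
    while node is not None:
        if node == ancestor:
            return True
        node = parent_of.get(node)  # walk one level up
    return False


# Namespace tree: init -> child -> grandchild, and init -> sibling.
parent_of = {"child": "init", "grandchild": "child", "sibling": "init"}
print(is_descendant(parent_of, "grandchild", "init"))
print(is_descendant(parent_of, "init", "child"))
```

Under this reading a caller in "init" could reassociate with
"grandchild", while a caller in "child" could not reassociate with
"init" or "sibling".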

       A  multi-threaded  process  may not change user namespace
       with setns().  A process may not reassociate  the  thread
       with  the caller's user namespace.

What does the last sentence above *mean*? I don't understand it.

       A process reassociat‐
       ing itself with a user namespace must have  CAP_SYS_ADMIN
       privileges in the target user namespace.

       A process may not be reassociated with a new mount names‐
       pace if it is multi-threaded.  Changing the mount  names‐
       pace requires that the caller possess both CAP_SYS_CHROOT
       and CAP_SYS_ADMIN capabilities.

Re the last sentence: are those capabilities required in (1) the
target namespace, or (2) the source namespace, or (3) both? I suspect
(1), but please confirm.

Thanks,

Michael



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map
       [not found]                     ` <87vcbldgbj.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-01-01  9:37                       ` Michael Kerrisk (man-pages)
       [not found]                         ` <CAKgNAkjf=KS5FnP0L-TPTCjQuTDAMs-N4cadAP89L4Mb3KubzQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-01-01  9:37 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linux API, Linux Containers

Hi Eric,

On Fri, Dec 28, 2012 at 10:20 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:

[...]

>>> For writing you are correct about the mapping to the parent (but that is
>>> not an exception that is a restriction on who can write to the file).
>>
>> So, by the way, I added this sentence to the page:
>>
>>               In   order   to   write   to   the   /proc/[pid]/uid_map
>>               (/proc/[pid]/gid_map) file,  a  process  must  have  the
>>               CAP_SETUID (CAP_SETGID) capability in the user namespace
>>               of the process pid.
>>
>> Is that correct?
>
> Yes.
>
>> But, there appear to be more rules than this governing whether a
>> process can write to the file (i.e., various other -EPERM cases). What
>> are the rules?
>
> In general you must also have CAP_SETUID (CAP_SETGID) in the parent user
> namespace as well.  The one exception to that is if you are mapping
> your current uid and gid.

Can you clarify what you mean by "mapping your own UID and GID" please
(i.e., who is "you" in that sentence).

> A rose by any other name will smell as
> sweet.  In practice this means you must be root to map to uids or gids
> other than your own, which preserves the current limits on setuid and
> setgid.
>
> Additionally the writer must see the map file with the lower user
> namespace being the parent user namespace.  Which means you must be
> inside the user namespace itself or in the parent user namespace to
> write to the user namespace's mapping file.

Okay -- I added some words on this point.

> For /proc/[pid]/projid_map, which will be interesting once xfs
> has kuid/kgid support, there are no capability checks because xfs lets
> anyone have any projid.
>
> This is one of the few cases where it almost matters to understand
> how ns_capable works when you are not in the user namespace in question,
> and that goes to what is a parent user namespace.  If you would like
> some more detail on that please ask.
>
>>> The complete rule for the user namespace of the second value is:
>>>
>>> - If the user namespace of the opener of the file and the user namespace
>>>   of the process do not match, the user namespace of the opener of the
>>>   file is used.
>>>
>>> - If the user namespace of the opener of the file and the user namespace
>>>   of the process are the same, the parent user namespace of the process
>>>   is used for the second value.
>>
>> Could you give an example of the last case? (What I'm really seeking,
>> I think, is clarification of "parent user namespace". Does that mean
>> "user namespace of the process that created the user namespace of this
>> process"?)
>
> User namespaces form a tree.  What you can do in one user namespace is a
> subset of what you can do in the parent user namespace.
>
> The parent user namespace is the user namespace of the process that
> calls unshare or clone with CLONE_NEWUSER.

Thanks.

> The last case is the common case of /proc/self/uid_map.  And you see how
> your uids map into the user namespace of the creator of your user
> namespace.

Okay -- got it now.

> With the default being just:         0          0 4294967295

Right.

>>> While very wordy I think the rule makes a lot of intuitive and practical
>>> sense.  Especially since it is non-trivial to come up with the chain of
>>> user namespaces a process is in.

Yes, I see what you mean.

[...]

> Thank you very much for your time and patience in getting a good
> description of the user namespace.

Well, we're not done yet, but we're getting there. Below, I've pasted
the current text from proc(5). Could you please take a look, and let
me know of any errors or improvements.

Cheers,

Michael

       /proc/[pid]/uid_map, /proc/[pid]/gid_map (since Linux 3.5)
              These  files  expose the mappings for user and group IDs
              inside the user namespace  for  the  process  pid.   The
              description  here  explains  the  details  for  uid_map;
              gid_map is exactly the same, but each instance of  "user
              ID" is replaced by "group ID".

              The  uid_map  file  exposes the mapping of user IDs from
              the user namespace of the process pid to the user names‐
              pace of the process that opened uid_map (but see a qual‐
              ification to this point below).  In  other  words,  pro‐
              cesses that are in different user namespaces will poten‐
              tially see different values when reading from a particu‐
              lar  uid_map file, depending on the user ID mappings for
              the user namespaces of the reading processes.

              Each line in the file specifies a 1-to-1  mapping  of  a
              range  of  contiguous user IDs between two user namespaces.  The
              specification in each line takes the form of three  num‐
              bers  delimited  by  white space.  The first two numbers
              specify the starting user ID  in  each  user  namespace.
              The  third  number  specifies  the  length of the mapped
              range.  In detail, the fields are  interpreted  as  fol‐
              lows:

              (1) The  start  of  the  range  of  user IDs in the user
                  namespace of the process pid.

              (2) The start of the range of user IDs to which the user
                  IDs  specified  by  field one map.  How field two is
                  interpreted depends  on  whether  the  process  that
                  opened  uid_map  and the process pid are in the same
                  user namespace, as follows:

                  a) If the two processes are in different user names‐
                     paces:  field two is the start of a range of user
                     IDs in the user namespace  of  the  process  that
                     opened uid_map.

                  b) If  the two processes are in the same user names‐
                     pace: field two is the start of the range of user
                     IDs  in  the parent user namespace of the process
                     pid.  (The "parent user namespace"  is  the  user
                     namespace  of  the  process  that  created a user
                     namespace via a call to  unshare(2)  or  clone(2)
                     with  the CLONE_NEWUSER flag.)  This case enables
                     the opener of uid_map (the common  case  here  is
                     opening /proc/self/uid_map) to see the mapping of
                     user IDs into the user namespace of  the  process
                     that created this user namespace.

              (3) The  length  of the range of user IDs that is mapped
                  between the two user namespaces.
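
As a rough illustration of how the three fields combine, the lookup can be
sketched in user-space C.  This is not the kernel's implementation; the
struct and function names (id_extent, map_id_out) are invented for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One line of uid_map (hypothetical user-space representation). */
struct id_extent {
    uint32_t inside_start;   /* field (1): start in the namespace of process pid */
    uint32_t outside_start;  /* field (2): start in the other user namespace */
    uint32_t length;         /* field (3): number of IDs in the range */
};

/*
 * Translate `id` from the inside namespace to the outside one.
 * Returns 1 and stores the result in *out when some line covers `id`,
 * or 0 when `id` is unmapped.
 */
static int map_id_out(const struct id_extent *map, size_t n,
                      uint32_t id, uint32_t *out)
{
    assert(map != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        /* Comparing the offset avoids overflow in start + length. */
        if (id >= map[i].inside_start &&
            id - map[i].inside_start < map[i].length) {
            *out = map[i].outside_start + (id - map[i].inside_start);
            return 1;
        }
    }
    return 0;
}
```

With the single line "0 0 4294967295", every ID except 4294967295 maps to
itself; an ID that no line covers is reported as unmapped.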

              After the creation of a new user namespace, the  uid_map
              file  may be written to exactly once to specify the map‐
              ping of user IDs in the new user namespace.  (An attempt
              to write more than once to the file fails with the error
              EPERM.)

              The lines written to uid_map must conform to the follow‐
              ing rules:

              *  The  three fields must be valid numbers, and the last
                 field must be greater than 0.

              *  Lines are terminated by newline characters.

              *  There is an (arbitrary) limit on the number of  lines
                 in  the  file.   As  at  Linux 3.8, the limit is five
                 lines.

              *  The range of user IDs specified in each  line  cannot
                 overlap  with  the ranges in any other lines.  In the
                 current implementation (Linux 3.8), this  requirement
                 is  satisfied  by  a  simplistic implementation that
                 imposes the further requirement that  the  values  in
                 both  field 1 and field 2 of successive lines must be
                 in ascending numerical order.

              Writes that violate the above rules fail with the  error
              EINVAL.

              In    order    for   a   process   to   write   to   the
              /proc/[pid]/uid_map (/proc/[pid]/gid_map) file, the fol‐
              lowing requirements must be met:

              *  The  process  must  have  the CAP_SETUID (CAP_SETGID)
                 capability in the user namespace of the process pid.

              *  The process must  have  the  CAP_SETUID  (CAP_SETGID)
                 capability in the parent user namespace.

              *  The  process  must be in either the user namespace of
                 the process pid or inside the parent  user  namespace
                 of the process pid.

==== end ====

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] clone.2: Describe the user namespace
       [not found]                 ` <CAKgNAkgRQXn0-x6CXxvW94eeG19dOAOEx78iNC0+w08uX+Sg1w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-01-01  9:39                   ` Eric W. Biederman
       [not found]                     ` <87a9st5jj4.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2013-01-01  9:39 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Linux API, Serge E. Hallyn, Linux Containers

"Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> Hi Eric,
>
> On Thu, Dec 27, 2012 at 6:47 PM, Eric W. Biederman
> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>
>> There is one other bit that needs to be documented in clone, although
>> I am not certain where/how.
>>
>> The sequences:
>>
>> unshare(CLONE_NEWPID).
>> clone(CLONE_VM)
>>
>> setns(fd, CLONE_NEWPID).
>> clone(CLONE_VM).
>>
>> Now fail.
>
> Can you define "now" please. Which kernel version?

3.8

The sequence was impossible in 3.7.

I think the change that made that impossible happened in the 3.8-rc1 to
3.8-rc2 window.

Eric

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] clone.2: Describe the user namespace
       [not found]                 ` <CAKgNAkgPET9jex1DO=1Z3HRQqO_WVD8qmG-UaH1DQB6wDGqO5A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-01-01  9:45                   ` Eric W. Biederman
  0 siblings, 0 replies; 30+ messages in thread
From: Eric W. Biederman @ 2013-01-01  9:45 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w; +Cc: Linux API, Linux Containers

"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:

> Hi Eric,
>
> On Thu, Dec 27, 2012 at 6:20 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>>
>>> Hi Eric,
>>>
>>> On Tue, Nov 27, 2012 at 1:46 AM, Eric W. Biederman
>>> <ebiederm@xmission.com> wrote:
>>>>
>>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>>> ---
>>>>  man2/clone.2 |   39 +++++++++++++++++++++++++++++++++++++++
>>>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/man2/clone.2 b/man2/clone.2
>>>> index 0582057..4566677 100644
>>>> --- a/man2/clone.2
>>>> +++ b/man2/clone.2
>>>> @@ -366,6 +366,45 @@ in the same
>>>>  .BR clone ()
>>>>  call.
>>>>  .TP
>>>> +.BR CLONE_NEWUSER " (since Linux 3.6)"
>>>
>>> Why "since Linux 3.6"? As far as I can see, CLONE_NEWUSER first gained
>>> some meaning in 2.6.29.
>>
>> Looking at it where I have said 3.6 that is wrong.  I meant 3.5.
>
> Okay.
>
>> I think I made the same mistake in one or two other manpages.  Nothing
>> was merged in 3.6 unfortunately.
>
> I think the other cases have been fixed by now.
>
>> My intent was these are the semantics of user namespaces since 3.5,
>> when my rework/refocusing of them was merged.
>>
>> Since 3.5 all that has really happened with user namespaces is the
>> uid/gid to kuid/kgid conversion, permission checks have been relaxed,
>> and a few bugs have been fixed.
>>
>> 3.8 is huge from a usability standpoint.  3.8 is huge because setns(),
>> and unshare() are now complete from a namespace perspective, and because
>> enough permission checks have been relaxed in user namespaces that you
>> can really start using them.
>>
>> But semantically from a user namespace perspective nothing really has
>> changed in 3.8.
>>
> [...]
>
>>> I reworked your text somewhat. Could you please review the following:
>>>
>>> [[
>>>        CLONE_NEWUSER
>>>               (This  flag first became meaningful for clone() in Linux
>>>               2.6.29, but the implementation of  user  namespaces  was
>>>               only  completed in Linux 3.8.)
>>
>> Long rant about 2.6.29 vs 3.8 above.  I think what we need to say is:
>>
>>                 (This  flag first became meaningful for clone() in Linux
>>                 2.6.29, the current semantics were merged in
>>                 3.5, and user namespaces only really became usable in 3.8.)
>
> Yup. I've done something like that now.
>
>>>                                               If CLONE_NEWUSER is set,
>>>               then create the process in a  new  user  namespace.   If
>>>               this flag is not set, then (as with fork(2)) the process
>>>               is created in the same user  namespace  as  the  calling
>>>               process.
>>>
>>>               A  user  namespace  provides an isolated environment for
>>>               security related identifiers, in particular,  user  IDs,
>>>               group IDs, keys (see keyctl(2)), and capabilities.
>>>
>>>               When  a user namespace is created, it starts out without
>>>               a mapping of user IDs (group IDs)  to  the  parent  user
>>>               namespace.   The desired mapping of user IDs (group IDs)
>>>               to the parent user namespace may be set by writing  into
>>>               /proc/[pid]/uid_map (/proc/[pid]/gid_map); see proc(5).
>>
>>                 /proc/[pid]/projid_map deserves a mention.  Not that
>>                 I am a fan of project IDs, or that xfs, where they are
>>                 implemented, has been converted yet, but....
>
> Would you be able to send a patch documenting this in proc(5)?

Sure.  I don't know why I didn't mention projid in my earlier patch.
Same story fewer permission checks.  Silly me.

>>>               The  first process in a user namespace starts out with a
>>>               complete set of capabilities with  respect  to  the  new
>>>               user namespace.
>>>
>>>               System  calls  that  return  user  IDs  (group IDs) will
>>>               return either the user ID (group  ID)  mapped  into  the
>>>               current  user  namespace  if  there is a mapping, or the
>>>               overflow user ID (group ID); the default value  for  the
>>>               overflow  user ID (group ID) is 65534.  See the descrip‐
>>>               tions of /proc/sys/kernel/overflowuid and /proc/sys/ker‐
>>>               nel/overflowgid in proc(5).
>>>
>>>               Starting  with  Linux  3.8,  no privileges are needed to
>>>               create a user namespace, and mount, PID, IPC,  net,  and
>>>               UTS   namespaces   can   be   created   with   just  the
>>>               CAP_SYS_ADMIN capability in the caller's user namespace.
>>>
>>>               Over the years, there have been a lot of  features  that
>>>               have been added to the Linux kernel that are only avail‐
>>>               able to privileged users because of their  potential  to
>>>               confuse  set-user-ID-root  applications.  In general, it
>>>               becomes safe to allow the root user in a user  namespace
>>>               to use those features because it is impossible, while in
>>>               a user namespace, to gain more privilege than  the  root
>>>               user of a user namespace has.
>>
>> I don't have any problems with this bit of text.
>>
>> It occurs to me that what is going on with capabilities and user
>> namespaces needs to be documented better.  There was a minor bug with
>> them this release cycle and I realized that, while the current definition
>> makes sense and isn't hard to understand in general, in detail the
>> interaction of capabilities and user namespaces is hard to describe.
>>
>> I think capabilities and user namespaces are the work of a future patch
>> however.
>
> Okay. So, below, a new iteration of the text. Could you please check
> it over, and note any errors to be fixed or improvements to be made.
>
> Thanks,
>
> Michael
>
>        CLONE_NEWUSER
>               (This  flag first became meaningful for clone() in Linux
>               2.6.23, the current clone()  semantics  were  merged  in
>               Linux  3.5, and the final pieces to make the user names‐
>               paces completely usable were merged in Linux 3.8.)
>
>               If CLONE_NEWUSER is set, then create the  process  in  a
>               new  user  namespace.  If this flag is not set, then (as
>               with fork(2)) the process is created in  the  same  user
>               namespace as the calling process.
>
>               A  user  namespace  provides an isolated environment for
>               security related identifiers, in particular,  user  IDs,
>               group IDs, keys (see keyctl(2)), and capabilities.
>
>               When  a user namespace is created, it starts out without
>               a mapping of user IDs (group IDs)  to  the  parent  user
>               namespace.   The desired mapping of user IDs (group IDs)
>               to the parent user namespace may be set by writing  into
>               /proc/[pid]/uid_map (/proc/[pid]/gid_map); see proc(5).
>
>               The  first process in a user namespace starts out with a
>               complete set of capabilities with  respect  to  the  new
>               user namespace.
>
>               System  calls  that  return  user  IDs  (group IDs) will
>               return either the user ID (group  ID)  mapped  into  the
>               current  user  namespace  if  there is a mapping, or the
>               overflow user ID (group ID); the default value  for  the
>               overflow  user ID (group ID) is 65534.  See the descrip‐
>               tions of /proc/sys/kernel/overflowuid and /proc/sys/ker‐
>               nel/overflowgid in proc(5).
>
>               Use  of  this flag requires a kernel configured with the
>               CONFIG_USER_NS  option.   Before  Linux  3.8,   use   of
>               CLONE_NEWUSER  required that the caller have three capa‐
>               bilities:  CAP_SYS_ADMIN,  CAP_SETUID,  and  CAP_SETGID.
>               Starting  with  Linux  3.8,  no privileges are needed to
>               create a user namespace, and mount, PID, IPC,  net,  and
>               UTS   namespaces   can   be   created   with   just  the
>               CAP_SYS_ADMIN capability in the caller's user namespace.
>
>               Over the years, there have been a lot of  features  that
>               have been added to the Linux kernel that are only avail‐
>               able to privileged users because of their  potential  to
>               confuse  set-user-ID-root  applications.  In general, it
>               becomes safe to allow the root user in a user  namespace
>               to use those features because it is impossible, while in
>               a user namespace, to gain more privilege than  the  root
>               user of a user namespace has.


I don't see anything wrong with that text.

Happy New Year.

Eric


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/4] setns.2: Document the pid, user, and mount namespace support.
       [not found]                 ` <CAKgNAkjJR02rKOBh98n7HJwXqAwywHY=Ef35t9tW7wOuyo86NQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-01-01  9:58                   ` Eric W. Biederman
       [not found]                     ` <87mwwt2pj8.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2013-01-01  9:58 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w; +Cc: Linux API, Linux Containers

"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:

> Hi Eric,
>
> On Thu, Dec 27, 2012 at 6:40 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>>
>>> Hi Eric,
>>>
>>> Some questions below.
>>
>> A quick note.  Getting the permission checks correct has been a little
>> more interesting than I would have preferred.
>>
>> I had to add a nsown_capable(CAP_SYS_ADMIN) check to all of the setns()
>> install methods except the user namespace.  Not a change in pre 3.8
>> behavior but a change to my patch, and possibly a documentation change
>> below.
>>
>>> On Tue, Nov 27, 2012 at 1:48 AM, Eric W. Biederman
>>> <ebiederm@xmission.com> wrote:
>>>>
>>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>>> ---
>>>>  man2/setns.2 |   41 +++++++++++++++++++++++++++++++++--------
>>>>  1 files changed, 33 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/man2/setns.2 b/man2/setns.2
>>>> index 6aa01e1..63b04dc 100644
>>>> --- a/man2/setns.2
>>>> +++ b/man2/setns.2
>>>> @@ -48,6 +48,18 @@ must refer to a network namespace.
>>>>  .BR CLONE_NEWUTS
>>>>  .I fd
>>>>  must refer to a UTS namespace.
>>>> +.TP
>>>> +.BR CLONE_NEWPID
>>>> +.I fd
>>>> +must refer to a PID namespace.
>>>> +.TP
>>>> +.BR CLONE_NEWUSER
>>>> +.I fd
>>>> +must refer to a user namespace.
>>>> +.TP
>>>> +.BR CLONE_NEWNS
>>>> +.I fd
>>>> +must refer to a mount namespace.
>>>>  .PP
>>>>  Specifying
>>>>  .I nstype
>>>> @@ -63,6 +75,25 @@ and wants to ensure that the namespace is of a particular type.
>>>>  .IR fd
>>>>  if the file descriptor was opened by another process and, for example,
>>>>  passed to the caller via a UNIX domain socket.)
>>>> +
>>>> +The pid namespace is a little different.  Reassociating the calling
>>>> +thread with a pid namespace only changes the pid namespace that the
>>>> +child processes will be created in.
>>>> +
>>>> +Changing the pid namespace for child processes is only allowed if the
>>>> +pid namespace specified by
>>>> +.IR fd
>>>> +is a child pid namespace of the pid namespace of the current thread.
>>>
>>> I assume "current thread" above should be "calling thread", right?
>>
>> What I mean is "current" from a kernel perspective.
>>
>> It should be just "caller".
>
> Okay. Changed.
>
>> Threads must share a pid namespace so mentioning threads seems wrong.
>>
>>>> +
>>>> +A multi-threaded process may not change user namespace with setns.  A
>>>> +process may not reassociate the thread with the current user
>>>> +namespace.
>>>
>>> What do you mean by "the current user namespace"?
>>
>> fd = open("/proc/self/ns/user", O_RDONLY);
>> setns(fd, CLONE_NEWUSER) -> -EINVAL.
>>
>> So from a userspace perspective I mean "the caller's user namespace".
>>
>>>> The process reassociating itself with a user namespace
>>>> +must have CAP_SYS_ADMIN privileges in the target user namespace.
>>>>
>>>> +A process may not be reassociated with a new mount namespace if it is
>>>> +multi-threaded
>>>
>>> I tried to verify the preceding two lines from the kernel source, but
>>> did not work out where this check is made. Where is it?
>>
>> kernel/user_namespace.c:userns_install()
>> fs/namespace.c:mntns_install()
>
> Thanks.
>
>> A couple of the security checks have been pushed down into a per
>> namespace context, because the exact check that makes sense depends on
>> the namespace.
>>
>>>> or it does not possess both CAP_SYS_CHROOT privileges
>>>> +and CAP_SYS_ADMIN rights over the target mount namespace.
>>>
>>> Could you please expand/clarify the preceding two lines. As they
>>> stand, I don't really understand them.
>>
>> Ugh.  The text is slightly wrong.
>>
>> The code is:
>>         if (!ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) ||
>>             !nsown_capable(CAP_SYS_CHROOT) ||
>>             !nsown_capable(CAP_SYS_ADMIN))
>>                 return -EPERM;
>>
>> Basically you aren't allowed to change your mount namespace into
>> a mount namespace that doesn't see you as the all powerful root
>> able to mount and unmount filesystems.
>>
>> You aren't allowed to change your mount namespace unless you possess
>> CAP_SYS_CHROOT and CAP_SYS_ADMIN.
>
> Okay -- reworded.
>
> So, I've done some more reworking of the text, which now reads as
> follows. Could you please check this (and see my questions below).
>
>        CLONE_NEWPID  behaves somewhat differently from the other
>        nstype values: reassociating the calling  thread  with  a
>        PID  namespace  only changes the PID namespace that child
>        processes of the caller will be created in; it  does  not
>        change the PID namespace of the caller itself.


> I reworked the preceding piece a lot. Is it correct still?
>
>        Reassoci‐
>        ating with a PID namespace is only  allowed  if  the  PID
>        namespace  specified by fd is a descendant (child, grand‐
>        child, etc.)
>
> Is the preceding sentence correct? (You talked only of children in
> your original patch, but I believe it's more general than that.)

Yes.  That is correct.

>        PID namespace of the PID namespace  of  the
>        caller.
>
>        A  multi-threaded  process  may not change user namespace
>        with setns().  A process may not reassociate  the  thread
>        with  the caller's user namespace.
>
> What does the last sentence above *mean*? I don't understand it.

So the set of checks are:

	/* Don't allow gaining capabilities by reentering
	 * the same user namespace.
	 */
	if (user_ns == current_user_ns())
		return -EINVAL;

	/* Threaded processes may not enter a different user namespace */
	if (atomic_read(&current->mm->mm_users) > 1)
		return -EINVAL;

	if (!ns_capable(user_ns, CAP_SYS_ADMIN))
		return -EPERM;

Rereading it looks like I was going fast and suffered from dropping
important words.

A  multi-threaded  process  may not change its user namespace
with setns().

That is, if you have threads, setns() for a user namespace will fail.


    A process may not change the user namespace to the caller's user
    namespace via setns.  This is important because changing to a
    user namespace via setns implies gaining all caps, and you should
    not be able to gain all caps over your current user namespace.

Hopefully that clears it up.
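
From userspace, the first of those checks can be observed with something
like the following sketch.  The helper name try_setns_user is made up for
illustration; it assumes a kernel that provides /proc/self/ns/user and a
libc that exposes setns().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/*
 * Try to setns() into the user namespace referred to by ns_path.
 * Returns 0 if the call succeeds, otherwise the errno value from
 * whichever step failed.
 */
static int try_setns_user(const char *ns_path)
{
    int fd, saved;

    assert(ns_path != NULL);
    fd = open(ns_path, O_RDONLY);
    if (fd == -1)
        return errno;           /* e.g. ENOENT when the file does not exist */

    if (setns(fd, CLONE_NEWUSER) == 0) {
        close(fd);
        return 0;
    }
    saved = errno;              /* save before close() can disturb errno */
    close(fd);
    return saved;
}
```

Calling try_setns_user("/proc/self/ns/user") is expected to return EINVAL
on a kernel with the checks quoted above, since the target is the caller's
own user namespace.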

>        A process reassociat‐
>        ing itself with a user namespace must have  CAP_SYS_ADMIN
>        privileges in the target user namespace.
>
>        A process may not be reassociated with a new mount names‐
>        pace if it is multi-threaded.  Changing the mount  names‐
>        pace requires that the caller possess both CAP_SYS_CHROOT
>        and CAP_SYS_ADMIN capabilities.
>
> Re the last sentence: are those capabilities required in (1) the
> target namespace, or (2) the source namespace, or (3) both? I suspect
> (1), but please confirm.

CAP_SYS_ADMIN is required in the current user namespace.
CAP_SYS_ADMIN is required over the target mount namespace.

CAP_SYS_CHROOT is required in the current user namespace.
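
The two "current user namespace" requirements can be checked from
userspace by looking at the effective capability mask in
/proc/[pid]/status.  The sketch below is only an illustration: the
function names are hypothetical, and it says nothing about
CAP_SYS_ADMIN over the target mount namespace, which the kernel evaluates
against the user namespace that owns that mount namespace.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Capability numbers as defined in <linux/capability.h>. */
#define CAP_NR_SYS_CHROOT 18
#define CAP_NR_SYS_ADMIN  21

/*
 * Extract the effective capability mask from the text of /proc/[pid]/status.
 * Returns 0 on success, -1 if no line starts with "CapEff:".
 */
static int parse_cap_eff(const char *status_text, unsigned long long *mask)
{
    const char *line = status_text;

    assert(status_text != NULL && mask != NULL);
    while (line != NULL && *line != '\0') {
        if (strncmp(line, "CapEff:", 7) == 0)
            return sscanf(line + 7, "%llx", mask) == 1 ? 0 : -1;
        line = strchr(line, '\n');
        if (line != NULL)
            line++;                     /* step past the newline */
    }
    return -1;
}

/* Nonzero if `mask` contains both CAP_SYS_CHROOT and CAP_SYS_ADMIN. */
static int has_mntns_caller_caps(unsigned long long mask)
{
    unsigned long long need = (1ULL << CAP_NR_SYS_CHROOT) |
                              (1ULL << CAP_NR_SYS_ADMIN);

    return (mask & need) == need;
}
```

Reading /proc/self/status into a buffer and passing it to parse_cap_eff()
gives the mask for the calling process.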

Eric

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map
       [not found]                         ` <CAKgNAkjf=KS5FnP0L-TPTCjQuTDAMs-N4cadAP89L4Mb3KubzQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-01-01 10:12                           ` Eric W. Biederman
       [not found]                             ` <87r4m51abp.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2013-01-01 10:12 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w; +Cc: Linux API, Linux Containers

"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:

> Hi Eric,
>
> On Fri, Dec 28, 2012 at 10:20 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>
> [...]
>
>>>> For writing you are correct about the mapping to the parent (but that is
>>>> not an exception that is a restriction on who can write to the file).
>>>
>>> So, by the way, I added this sentence to the page:
>>>
>>>               In   order   to   write   to   the   /proc/[pid]/uid_map
>>>               (/proc/[pid]/gid_map) file,  a  process  must  have  the
>>>               CAP_SETUID (CAP_SETGID) capability in the user namespace
>>>               of the process pid.
>>>
>>> Is that correct?
>>
>> Yes.
>>
>>> But, there appear to be more rules than this governing whether a
>>> process can write to the file (i.e., various other -EPERM cases). What
>>> are the rules?
>>
>> In general you must also have CAP_SETUID (CAP_SETGID) in the parent user
>> namespace as well.  The one exception to that is if you are mapping
>> your current uid and gid.
>
> Can you clarify what you mean by "mapping your own UID and GID" please
> (i.e., who is "you" in that sentence).

At the time of the clone() or unshare() that creates a new user namespace,
the kuid and the kgid of the process do not change.

setuid and setgid fail before any mappings are set up.

Therefore the caller is allowed to map any single uid to the uid of the
caller in the parent user namespace.  Likewise the caller is allowed to
map any single gid to the gid of the caller in the parent user
namespace.
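
As a sketch of that single-ID case (helper names are invented for the
example, and only uid_map is covered), the line to write can be built and
written like this:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Format a one-ID mapping "<inside> <outside> 1\n" into buf.
 * Returns the line length, or -1 if buf is too small.
 */
static int format_single_id_map(char *buf, size_t size,
                                unsigned int inside, unsigned int outside)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, size, "%u %u 1\n", inside, outside);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/*
 * Write `line` to a map file such as /proc/self/uid_map in one write() call.
 * Returns 0 on success or an errno value on failure.
 */
static int write_map_line(const char *path, const char *line)
{
    size_t len = strlen(line);
    ssize_t w;
    int fd, saved = 0;

    fd = open(path, O_WRONLY);
    if (fd == -1)
        return errno;
    w = write(fd, line, len);
    if (w == -1)
        saved = errno;
    else if ((size_t)w != len)
        saved = EIO;            /* short write: treat as failure */
    close(fd);
    return saved;
}
```

A caller whose uid is 1000 in the parent user namespace would format
"0 1000 1" to appear as uid 0 inside the new namespace.  The outside uid
should be captured with getuid() before unshare(CLONE_NEWUSER), because
afterwards getuid() reports the overflow uid until a mapping exists.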

>> A rose by any other name will smell as
>> sweet.  In practice this means you must be root to map to uids or gids
>> other than your own, which preserves the current limits on setuid and
>> setgid.
>>
>> Additionally the writer must see the map file with the lower user
>> namespace being the parent user namespace.  Which means you must be
>> inside the user namespace itself or in the parent user namespace to
>> write to the user namespace's mapping file.
>
> Okay -- I added some words on this point.
>
>> For /proc/[pid]/projid_map, which will be interesting once xfs
>> has kuid/kgid support, there are no capability checks because xfs lets
>> anyone have any projid.
>>
>> This is one of the few cases where it almost matters to understand
>> how ns_capable works when you are not in the user namespace in question,
>> and that goes to what is a parent user namespace.  If you would like
>> some more detail on that please ask.
>>
>>>> The complete rule for the user namespace of the second value is:
>>>>
>>>> - If the user namespace of the opener of the file and the user namespace
>>>>   of the process do not match, the user namespace of the opener of the
>>>>   file is used.
>>>>
>>>> - If the user namespace of the opener of the file and the user namespace
>>>>   of the process are the same, the parent user namespace of the process
>>>>   is used for the second value.
>>>
>>> Could you give an example of the last case? (What I'm really seeking,
>>> I think, is clarification of "parent user namespace". Does that mean
>>> "user namespace of the process that created the user namespace of this
>>> process"?)
>>
>> User namespaces form a tree.  What you can do in one user namespace is a
>> subset of what you can do in the parent user namespace.
>>
>> The parent user namespace is the user namespace of the process that
>> calls unshare or clone with CLONE_NEWUSER.
>
> Thanks.
>
>> The last case is the common case of /proc/self/uid_map.  And you see how
>> your uids map into the user namespace of the creator of your user
>> namespace.
>
> Okay -- got it now.
>
>> With the default being just:         0          0 4294967295
>
> Right.
>
>>>> While very wordy I think the rule makes a lot of intuitive and practical
>>>> sense.  Especially since it is non-trivial to come up with the chain of
>>>> user namespaces a process is in.
>
> Yes, I see what you mean.
>
> [...]
>
>> Thank you very much for your time and patience in getting a good
>> description of the user namespace.
>
> Well, we're not done yet, but we're getting there. Below, I've pasted
> the current text from proc(5). Could you please take a look, and let
> me know of any errors or improvements.
>
> Cheers,
>
> Michael
>
>        /proc/[pid]/uid_map, /proc/[pid]/gid_map (since Linux 3.5)
>               These  files  expose the mappings for user and group IDs
>               inside the user namespace  for  the  process  pid.   The
>               description  here  explains  the  details  for  uid_map;
>               gid_map is exactly the same, but each instance of  "user
>               ID" is replaced by "group ID".
>
>               The  uid_map  file  exposes the mapping of user IDs from
>               the user namespace of the process pid to the user names‐
>               pace of the process that opened uid_map (but see a qual‐
>               ification to this point below).  In  other  words,  pro‐
>               cesses that are in different user namespaces will poten‐
>               tially see different values when reading from a particu‐
>               lar  uid_map file, depending on the user ID mappings for
>               the user namespaces of the reading processes.
>
>               Each line in the file specifies a 1-to-1  mapping  of  a
>               range of contiguous user IDs between two user namespaces.  The
>               specification in each line takes the form of three  num‐
>               bers  delimited  by  white space.  The first two numbers
>               specify the starting user ID  in  each  user  namespace.
>               The  third  number  specifies  the  length of the mapped
>               range.  In detail, the fields are  interpreted  as  fol‐
>               lows:
>
>               (1) The  start  of  the  range  of  user IDs in the user
>                   namespace of the process pid.
>
>               (2) The start of the range of user IDs to which the user
>                   IDs  specified  by  field one map.  How field two is
>                   interpreted depends  on  whether  the  process  that
>                   opened  uid_map  and the process pid are in the same
>                   user namespace, as follows:
>
>                   a) If the two processes are in different user names‐
>                      paces:  field two is the start of a range of user
>                      IDs in the user namespace  of  the  process  that
>                      opened uid_map.
>
>                   b) If  the two processes are in the same user names‐
>                      pace: field two is the start of the range of user
>                      IDs  in  the parent user namespace of the process
>                      pid.  (The "parent user namespace"  is  the  user
>                      namespace  of  the  process  that  created a user
>                      namespace via a call to  unshare(2)  or  clone(2)
>                      with  the CLONE_NEWUSER flag.)  This case enables
>                      the opener of uid_map (the common  case  here  is
>                      opening /proc/self/uid_map) to see the mapping of
>                      user IDs into the user namespace of  the  process
>                      that created this user namespace.
>
>               (3) The  length  of the range of user IDs that is mapped
>                   between the two user namespaces.
>
>               After the creation of a new user namespace, the  uid_map
>               file  may be written to exactly once to specify the map‐
>               ping of user IDs in the new user namespace.  (An attempt
>               to write more than once to the file fails with the error
>               EPERM.)
>
>               The lines written to uid_map must conform to the follow‐
>               ing rules:
>
>               *  The  three fields must be valid numbers, and the last
>                  field must be greater than 0.
>
>               *  Lines are terminated by newline characters.
>
>               *  There is an (arbitrary) limit on the number of  lines
>                  in  the  file.   As  at  Linux 3.8, the limit is five
>                  lines.
>
>               *  The range of user IDs specified in each  line  cannot
>                  overlap  with  the ranges in any other lines.  In the
>                  current implementation (Linux 3.8), this  requirement
>                  is  satisfied  by  a  simplistic  implementation that
>                  imposes the further requirement that  the  values  in
>                  both  field 1 and field 2 of successive lines must be
>                  in ascending numerical order.
>
>               Writes that violate the above rules fail with the  error
>               EINVAL.
>
>               In    order    for   a   process   to   write   to   the
>               /proc/[pid]/uid_map (/proc/[pid]/gid_map) file, the fol‐
>               lowing requirements must be met:
>
>               *  The  process  must  have  the CAP_SETUID (CAP_SETGID)
>                  capability in the user namespace of the process pid.
>
>               *  The process must  have  the  CAP_SETUID  (CAP_SETGID)
>                  capability in the parent user namespace.
>
>               *  The  process  must be in either the user namespace of
>                  the process pid or inside the parent  user  namespace
>                  of the process pid.

That sounds right.
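
The line rules and field meanings in the quoted draft can be sketched as
a small parser.  This is a user-space model of the text as worded above,
not the kernel's parser: the names parse_id_map and map_id are invented
for the example, and rejecting an empty map is an assumption, since the
draft does not say what happens in that case.

```python
MAX_LINES = 5  # limit given in the draft for Linux 3.8


def parse_id_map(text):
    """Parse uid_map-style text into (inside, outside, length) tuples.

    Raises ValueError where the draft says a write fails with EINVAL.
    """
    lines = text.splitlines()
    # Assumption: an empty map is rejected (not stated in the draft).
    if not lines or len(lines) > MAX_LINES:
        raise ValueError("map must have between 1 and 5 lines")
    extents = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            raise ValueError("each line needs exactly three fields")
        inside, outside, length = (int(f) for f in fields)
        if inside < 0 or outside < 0 or length <= 0:
            raise ValueError("fields must be valid numbers, length > 0")
        if extents:
            prev_in, prev_out, prev_len = extents[-1]
            # Both fields must ascend past the previous range, which
            # also rules out overlapping ranges.
            if inside < prev_in + prev_len or outside < prev_out + prev_len:
                raise ValueError("ranges must ascend and not overlap")
        extents.append((inside, outside, length))
    return extents


def map_id(extents, inside_id):
    """Translate an ID from field one's namespace to field two's."""
    for inside, outside, length in extents:
        if inside <= inside_id < inside + length:
            return outside + (inside_id - inside)
    return None  # the ID has no mapping
```

With the default map "0 0 4294967295" every mapped ID translates to
itself; with "0 1000 1" only ID 0 is mapped, and it becomes 1000.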

In addition /proc/[pid]/projid_map was added in 3.7, and obeys the same
rules except that there are no capabilities required to set the mapping.

I suspect it is probably easier to add a quick mention of projid_map
instead of repeating all of the text, but I could be wrong.  In any event
I will leave off with projid_map until we get the uid_map and gid_map
text solid.

Eric
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] clone.2: Describe the user namespace
       [not found]                     ` <87a9st5jj4.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-01-07  8:33                       ` Michael Kerrisk (man-pages)
       [not found]                         ` <CAKgNAkggMKib5v4ND9UR1jH=CrK-viM5hhfmc0Rw=mP5GbenSg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-01-07  8:33 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linux API, Serge E. Hallyn, Linux Containers

Hi Eric,

On Tue, Jan 1, 2013 at 10:39 AM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>
>> Hi Eric,
>>
>> On Thu, Dec 27, 2012 at 6:47 PM, Eric W. Biederman
>> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>>
>>> There is one other bit that needs to be documented in clone, although
>>> I am not certain where/how.
>>>
>>> The sequences:
>>>
>>> unshare(CLONE_NEWPID).
>>> clone(CLONE_VM)
>>>
>>> setns(fd, CLONE_NEWPID).
>>> clone(CLONE_VM).
>>>
>>> Now fail.
>>
>> Can you define "now" please. Which kernel version?
>
> 3.8
>
> The sequence was impossible in 3.7.
>
> I think that change that made that impossible happened in the 3.8-rc1 to
> 3.8-rc2 window.

Adding something along these lines to the man page would be fine, but
we need some text to explain *why* these sequences fail. Could you
send me a sentence or two about that?

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/4] clone.2: Describe the user namespace
       [not found]                         ` <CAKgNAkggMKib5v4ND9UR1jH=CrK-viM5hhfmc0Rw=mP5GbenSg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-01-07  8:59                           ` Eric W. Biederman
  0 siblings, 0 replies; 30+ messages in thread
From: Eric W. Biederman @ 2013-01-07  8:59 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Linux API, Serge E. Hallyn, Linux Containers

"Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> Hi Eric,
>
> On Tue, Jan 1, 2013 at 10:39 AM, Eric W. Biederman
> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>>
>>> Hi Eric,
>>>
>>> On Thu, Dec 27, 2012 at 6:47 PM, Eric W. Biederman
>>> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>>>
>>>> There is one other bit that needs to be documented in clone, although
>>>> I am not certain where/how.
>>>>
>>>> The sequences:
>>>>
>>>> unshare(CLONE_NEWPID).
>>>> clone(CLONE_VM)
>>>>
>>>> setns(fd, CLONE_NEWPID).
>>>> clone(CLONE_VM).
>>>>
>>>> Now fail.
>>>
>>> Can you define "now" please. Which kernel version?
>>
>> 3.8
>>
>> The sequence was impossible in 3.7.
>>
>> I think that change that made that impossible happened in the 3.8-rc1 to
                                       ^^^^^^^^^ illegal 3.8-rc1 made the sequence possible.
>> 3.8-rc2 window.
>
> Adding something along these lines to the man page would be fine, but
> we need some text to explain *why* these sequences fail. Could you
> send me a sentence or two about that?

The basic principle is that every thread in a process must be in the same
pid namespace.  As unshare(CLONE_NEWPID) and setns(fd, CLONE_NEWPID) only
change the pid namespace for created children, creating a child process
that is a thread would put that thread in a different pid namespace.

Creating a multithreaded application and then calling setns(fd,
CLONE_NEWPID) or clone(CLONE_NEWPID) was outlawed because it was too
bizarre and no one cared.  Oleg noticed you could create the threads
afterwards and get into a bizarre state that no one wanted to support.

Eric

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/4] setns.2: Document the pid, user, and mount namespace support.
       [not found]                     ` <87mwwt2pj8.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-01-07  9:51                       ` Michael Kerrisk (man-pages)
       [not found]                         ` <CAKgNAkggEOV0dXVzr4Zf3n_-it5SXfvjJ1ooYxiVNWaYzQgRLg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-01-07  9:51 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linux API, Linux Containers

On Tue, Jan 1, 2013 at 10:58 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>

[...]

>>        PID namespace of the PID namespace  of  the
>>        caller.
>>
>>        A  multi-threaded  process  may not change user namespace
>>        with setns().  A process may not reassociate  the  thread
>>        with  the caller's user namespace.
>>
>> What does the last sentence above *mean*? I don't understand it.
>
> So the set of checks are:
>
>         /* Don't allow gaining capabilities by reentering
>          * the same user namespace.
>          */
>         if (user_ns == current_user_ns())
>                 return -EINVAL;
>
>         /* Threaded processes may not enter a different user namespace */
>         if (atomic_read(&current->mm->mm_users) > 1)
>                 return -EINVAL;
>
>         if (!ns_capable(user_ns, CAP_SYS_ADMIN))
>                 return -EPERM;
>
> Rereading it looks like I was going fast and suffered from dropping
> important words.
>
> A  multi-threaded  process  may not change its user namespace
> with setns().
>
> aka if you have threads, setns for a user namespace will fail.
>
>
>     A process may not change the user namespace to the caller's user
>     namespace via setns.  This is important because changing to a
>     user namespace via setns implies gaining all caps, and you should
>     not be able to gain all caps over your current user namespace.
>
> Hopefully that clears it up.

Well, I worded it rather differently, but I hope I got it right. See below.

>>        A process reassociat‐
>>        ing itself with a user namespace must have  CAP_SYS_ADMIN
>>        privileges in the target user namespace.
>>
>>        A process may not be reassociated with a new mount names‐
>>        pace if it is multi-threaded.  Changing the mount  names‐
>>        pace requires that the caller possess both CAP_SYS_CHROOT
>>        and CAP_SYS_ADMIN capabilities.
>>
>> Re the last sentence: are those capabilities required in (1) the
>> target namespace, or (2) the source namespace, or (3) both? I suspect
>> (1), but please confirm.
>
> CAP_SYS_ADMIN is required in the current user namespace.
> CAP_SYS_ADMIN is required over the target mount namespace.
>
> CAP_SYS_CHROOT is required in the current user namespace.

Okay. See below.

So, let's take one more pass. How does the following look:

       A multi-threaded process may not  change  user  namespace  with
       setns().   It  is  not  permitted to use setns() to reenter the
       caller's current user namespace.  This prevents a  caller  that
       has  dropped capabilities from regaining those capabilities via
       a call to setns().  A process reassociating itself with a user
       namespace must have CAP_SYS_ADMIN privileges in the target user
       namespace.

       A process may not be reassociated with a new mount namespace if
       it  is  multi-threaded.   Changing the mount namespace requires
       that the caller possess both CAP_SYS_CHROOT  and  CAP_SYS_ADMIN
       capabilities in its own user namespace and CAP_SYS_ADMIN in the
       target mount namespace.
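
For the user namespace case, the order of the checks quoted earlier in
this message can be mirrored in a small model.  It is a sketch only: the
function name check_setns_user and its parameters are invented here, and
the capability test is reduced to a boolean supplied by the caller.

```python
import errno


def check_setns_user(target_ns, current_ns, mm_users, capable_in_target):
    """Mirror the three quoted checks; return 0 or a negative errno.

    mm_users stands in for the quoted mm_users counter, and
    capable_in_target for ns_capable(user_ns, CAP_SYS_ADMIN).
    """
    # Re-entering the current user namespace would regain capabilities.
    if target_ns == current_ns:
        return -errno.EINVAL
    # Threaded processes may not enter a different user namespace.
    if mm_users > 1:
        return -errno.EINVAL
    # CAP_SYS_ADMIN is needed in the target user namespace.
    if not capable_in_target:
        return -errno.EPERM
    return 0
```

Note that the order matters for the error a caller sees: a threaded
process without the capability gets EINVAL rather than EPERM.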

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/4] setns.2: Document the pid, user, and mount namespace support.
       [not found]                         ` <CAKgNAkggEOV0dXVzr4Zf3n_-it5SXfvjJ1ooYxiVNWaYzQgRLg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-01-07 23:58                           ` Eric W. Biederman
  0 siblings, 0 replies; 30+ messages in thread
From: Eric W. Biederman @ 2013-01-07 23:58 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Linux API, Serge E. Hallyn, Linux Containers

"Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> Okay. See below.
>
> So, let's take one more pass. How does the following look:
>
>        A multi-threaded process may not  change  user  namespace  with
>        setns().   It  is  not  permitted to use setns() to reenter the
>        caller's current user namespace.  This prevents a  caller  that
>        has  dropped capabilities from regaining those capabilities via
>        a call to setns().  A process reassociating itself with a user
>        namespace must have CAP_SYS_ADMIN privileges in the target user
>        namespace.
>
>        A process may not be reassociated with a new mount namespace if
>        it  is  multi-threaded.   Changing the mount namespace requires
>        that the caller possess both CAP_SYS_CHROOT  and  CAP_SYS_ADMIN
>        capabilities in its own user namespace and CAP_SYS_ADMIN in the
>        target mount namespace.

That wording looks correct.

Eric

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map
       [not found]                             ` <87r4m51abp.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-01-14  8:59                               ` Michael Kerrisk (man-pages)
  0 siblings, 0 replies; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-01-14  8:59 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linux API, Linux Containers

Hi Eric,

On Tue, Jan 1, 2013 at 11:12 AM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>
>> Hi Eric,
>>
>> On Fri, Dec 28, 2012 at 10:20 PM, Eric W. Biederman
>> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>> "Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>>
>> [...]
>>
>>>>> For writing you are correct about the mapping to the parent (but that is
>>>>> not an exception that is a restriction on who can write to the file).
>>>>
>>>> So, by the way, I added this sentence to the page:
>>>>
>>>>               In   order   to   write   to   the   /proc/[pid]/uid_map
>>>>               (/proc/[pid]/gid_map) file,  a  process  must  have  the
>>>>               CAP_SETUID (CAP_SETGID) capability in the user namespace
>>>>               of the process pid.
>>>>
>>>> Is that correct?
>>>
>>> Yes.
>>>
>>>> But, there appear to be more rules than this governing whether a
>>>> process can write to the file (i.e., various other -EPERM cases). What
>>>> are the rules?
>>>
>>> In general you must also have CAP_SETUID (CAP_SETGID) in the parent user
>>> namespace as well.  The one exception to that is if you are mapping
>>> your current uid and gid.
>>
>> Can you clarify what you mean by "mapping your own UID and GID" please
>> (i.e., who is "you" in that sentence).
>
> At the time of clone() or unshare() that creates a new user namespace,
> the kuid and the kgid of the process do not change.
>
> setuid and setgid fail before any mappings are set up.
>
> Therefore the caller is allowed to map any single uid to the uid of the
> caller in the parent user namespace.  Likewise the caller is allowed to
> map any single gid to the gid of the caller in the parent user
> namespace.

So, then is the following text now correct and complete:

       In  order  for  a  process  to write to the /proc/[pid]/uid_map
       (/proc/[pid]/gid_map) file, the following requirements must  be
       met:

       *  The process must have the CAP_SETUID (CAP_SETGID) capability
          in the user namespace of the process pid.

       *  The process must have the CAP_SETUID (CAP_SETGID) capability
          in the parent user namespace.  There is an exception to this
          requirement: a  process  writing  to  uid_map  (gid_map)  is
          allowed  to  map any single UID (GID) to the file system UID
          (GID) of the caller in the parent user namespace.

       *  The process must be in either  the  user  namespace  of  the
          process  pid  or  inside  the  parent  user namespace of the
          process pid.
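
The three requirements and the single-ID exception, as worded in the
draft above, can be written down as a predicate.  This models the draft
text only and makes no claim about the kernel's actual logic: the Writer
class, its fields, and may_write_map are hypothetical names, and extents
are (inside, outside, length) tuples.

```python
from dataclasses import dataclass


@dataclass
class Writer:
    user_ns: str         # user namespace the writing process is in
    cap_in_target: bool  # CAP_SETUID (CAP_SETGID) in the target namespace
    cap_in_parent: bool  # the same capability in the parent namespace
    fs_id: int           # file system UID (GID) in the parent namespace


def may_write_map(writer, target_ns, parent_ns, extents):
    """Return True if the draft's rules would allow the write."""
    # Requirement 1: capability in the user namespace of process pid.
    if not writer.cap_in_target:
        return False
    # Requirement 3: writer is in the target namespace or its parent.
    if writer.user_ns not in (target_ns, parent_ns):
        return False
    # Requirement 2: capability in the parent user namespace ...
    if writer.cap_in_parent:
        return True
    # ... or the exception: exactly one ID, mapped to the writer's own.
    return (len(extents) == 1
            and extents[0][2] == 1
            and extents[0][1] == writer.fs_id)
```

For an unprivileged writer with fs_id 1000, the map [(0, 1000, 1)] is
accepted by this predicate, while [(0, 1000, 2)] and [(0, 0, 1)] are not.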

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2013-01-14  8:59 UTC | newest]

Thread overview: 30+ messages
2012-11-26 22:57 [PATCH 0/4] namespace man page updates for 3.8 Eric W. Biederman
     [not found] ` <87a9u4rmz0.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-11-27  0:46   ` [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map Eric W. Biederman
     [not found]     ` <874nkbrhyv.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-12-27  9:03       ` Michael Kerrisk (man-pages)
     [not found]         ` <CAKgNAkixXmtvQUbwyv=a8mU=gdf-x+w-ou_4N=cNaau+hVoy4Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-12-27 16:58           ` Eric W. Biederman
     [not found]             ` <87obhfxwhb.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-12-28 19:20               ` Michael Kerrisk (man-pages)
     [not found]                 ` <CAKgNAkjs9T-s8SG-EgTT0O-Uj8S98Q_zfnMqnZ1ROrcYqh7Z5w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-12-28 21:20                   ` Eric W. Biederman
     [not found]                     ` <87vcbldgbj.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-01-01  9:37                       ` Michael Kerrisk (man-pages)
     [not found]                         ` <CAKgNAkjf=KS5FnP0L-TPTCjQuTDAMs-N4cadAP89L4Mb3KubzQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-01-01 10:12                           ` Eric W. Biederman
     [not found]                             ` <87r4m51abp.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-01-14  8:59                               ` Michael Kerrisk (man-pages)
2012-12-27 17:23           ` Eric W. Biederman
     [not found]             ` <87licjv276.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-12-27 18:39               ` Michael Kerrisk (man-pages)
2012-11-27  0:46   ` [PATCH 2/4] clone.2: Describe the user namespace Eric W. Biederman
     [not found]     ` <87y5hnq3d5.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-12-27 10:16       ` Michael Kerrisk (man-pages)
     [not found]         ` <CAKgNAkgXWp49wXKom9hMm9fajKVOAwOmFzPdKWBesbBhfZEssA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-12-27 17:20           ` Eric W. Biederman
     [not found]             ` <87r4mbv2c9.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-01-01  9:30               ` Michael Kerrisk (man-pages)
     [not found]                 ` <CAKgNAkgPET9jex1DO=1Z3HRQqO_WVD8qmG-UaH1DQB6wDGqO5A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-01-01  9:45                   ` Eric W. Biederman
2012-12-27 17:47           ` Eric W. Biederman
     [not found]             ` <87sj6rs7zc.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-01-01  9:29               ` Michael Kerrisk (man-pages)
     [not found]                 ` <CAKgNAkgRQXn0-x6CXxvW94eeG19dOAOEx78iNC0+w08uX+Sg1w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-01-01  9:39                   ` Eric W. Biederman
     [not found]                     ` <87a9st5jj4.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-01-07  8:33                       ` Michael Kerrisk (man-pages)
     [not found]                         ` <CAKgNAkggMKib5v4ND9UR1jH=CrK-viM5hhfmc0Rw=mP5GbenSg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-01-07  8:59                           ` Eric W. Biederman
2012-11-27  0:47   ` [PATCH 3/4] proc.5: Document the proc files for the user, mount, and pid namespaces Eric W. Biederman
     [not found]     ` <87pq2zq3b6.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-12-27 10:28       ` Michael Kerrisk (man-pages)
2012-11-27  0:48   ` [PATCH 4/4] setns.2: Document the pid, user, and mount namespace support Eric W. Biederman
     [not found]     ` <87k3t7q39u.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-12-27 11:08       ` Michael Kerrisk (man-pages)
     [not found]         ` <CAKgNAkiaw5L_oNE8NENjmoBS8Hq_uj+iaEdhyXc1+hje4HdnNQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-12-27 17:40           ` Eric W. Biederman
     [not found]             ` <87bodftmv0.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-01-01  9:30               ` Michael Kerrisk (man-pages)
     [not found]                 ` <CAKgNAkjJR02rKOBh98n7HJwXqAwywHY=Ef35t9tW7wOuyo86NQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-01-01  9:58                   ` Eric W. Biederman
     [not found]                     ` <87mwwt2pj8.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2013-01-07  9:51                       ` Michael Kerrisk (man-pages)
     [not found]                         ` <CAKgNAkggEOV0dXVzr4Zf3n_-it5SXfvjJ1ooYxiVNWaYzQgRLg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-01-07 23:58                           ` Eric W. Biederman
