* [PATCH 0/4] namespace man page updates for 3.8
@ 2012-11-26 22:57 Eric W. Biederman
[not found] ` <87a9u4rmz0.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2012-11-26 22:57 UTC (permalink / raw)
To: Michael Kerrisk (man-pages); +Cc: Linux API, Serge E. Hallyn
The following patches document the user namespace, the pid
namespace, and the mount namespace changes that are currently sitting in my
for next-next branch of:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git
Except for uid_map and gid_map, which should have been documented
for Linux 3.6, I am a bit early for these changes to be merged, but it
seems a good idea to get the patches out there so things will be
documented, reviewed, and thought about in a timely manner.
Eric
man2/clone.2 | 39 ++++++++++++++++++++++
man2/setns.2 | 41 +++++++++++++++++++----
man5/proc.5 | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 174 insertions(+), 8 deletions(-)
Eric W. Biederman (4):
proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map
clone.2: Describe the user namespace
proc.5: Document the proc files for the user, mount, and pid namespaces.
setns.2: Document the pid, user, and mount namespace support.
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map
[not found] ` <87a9u4rmz0.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-11-27 0:46 ` Eric W. Biederman
[not found] ` <874nkbrhyv.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-11-27 0:46 ` [PATCH 2/4] clone.2: Describe the user namespace Eric W. Biederman
` (2 subsequent siblings)
3 siblings, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2012-11-27 0:46 UTC (permalink / raw)
To: Michael Kerrisk (man-pages); +Cc: Linux API, Linux Containers
Document the user namespace files that report the mapping of uids
and gids between user namespaces.
Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
man5/proc.5 | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 50 insertions(+), 0 deletions(-)
diff --git a/man5/proc.5 b/man5/proc.5
index fb70d2b..840480d 100644
--- a/man5/proc.5
+++ b/man5/proc.5
@@ -317,6 +317,31 @@ The files in this directory are readable only by the owner of the process.
.\" .TP
.\" .IR /proc/[pid]/io " (since kernel 2.6.20)"
.TP
+.IR /proc/[pid]/gid_map " (since kernel 3.6)"
+This file reports the mapping of gids from the user namespace of the process specified by
+.IR pid
+to the user namespace of the process that opened
+.IR /proc/[pid]/gid_map .
+
+Each line specifies a 1 to 1 mapping of a range of contiguous gids from
+the user namespace of the process specified by
+.IR pid
+to the user namespace of the process that opened
+.IR /proc/[pid]/gid_map .
+
+Each line contains three numbers. The start of the range of gids in
+the user namespace of the process specified by
+.IR pid .
+The start of the range of gids in the user namespace of the process that
+opened
+.IR /proc/[pid]/gid_map .
+The number of gids in the range of numbers that is mapped between the two
+user namespaces.
+
+After the creation of a new user namespace this file may be written to
+exactly once to specify the mapping of gids in the new user namespace.
+
+.TP
.IR /proc/[pid]/limits " (since kernel 2.6.24)"
This file displays the soft limit, hard limit, and units of measurement
for each of the process's resource limits (see
@@ -1169,6 +1194,31 @@ directory are not available if the main thread has already terminated
(typically by calling
.BR pthread_exit (3)).
.TP
+.IR /proc/[pid]/uid_map " (since kernel 3.6)"
+This file reports the mapping of uids from the user namespace of the process specified by
+.IR pid
+to the user namespace of the process that opened
+.IR /proc/[pid]/uid_map .
+
+Each line specifies a 1 to 1 mapping of a range of contiguous uids from
+the user namespace of the process specified by
+.IR pid
+to the user namespace of the process that opened
+.IR /proc/[pid]/uid_map .
+
+Each line contains three numbers. The start of the range of uids in
+the user namespace of the process specified by
+.IR pid .
+The start of the range of uids in the user namespace of the process that
+opened
+.IR /proc/[pid]/uid_map .
+The number of uids in the range of numbers that is mapped between the two
+user namespaces.
+
+After the creation of a new user namespace this file may be written to
+exactly once to specify the mapping of uids in the new user namespace.
+
+.TP
.I /proc/apm
Advanced power management version and battery information when
.B CONFIG_APM
--
1.7.5.4
* [PATCH 2/4] clone.2: Describe the user namespace
[not found] ` <87a9u4rmz0.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-11-27 0:46 ` [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map Eric W. Biederman
@ 2012-11-27 0:46 ` Eric W. Biederman
[not found] ` <87y5hnq3d5.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-11-27 0:47 ` [PATCH 3/4] proc.5: Document the proc files for the user, mount, and pid namespaces Eric W. Biederman
2012-11-27 0:48 ` [PATCH 4/4] setns.2: Document the pid, user, and mount namespace support Eric W. Biederman
3 siblings, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2012-11-27 0:46 UTC (permalink / raw)
To: Michael Kerrisk (man-pages); +Cc: Linux API, Serge E. Hallyn, Linux Containers
Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
man2/clone.2 | 39 +++++++++++++++++++++++++++++++++++++++
1 files changed, 39 insertions(+), 0 deletions(-)
diff --git a/man2/clone.2 b/man2/clone.2
index 0582057..4566677 100644
--- a/man2/clone.2
+++ b/man2/clone.2
@@ -366,6 +366,45 @@ in the same
.BR clone ()
call.
.TP
+.BR CLONE_NEWUSER " (since Linux 3.6)"
+If
+.B CLONE_NEWUSER
+is set, then create the process in a new user namespace. If this flag is not set, then (as with
+.BR fork (2)),
+the process is created in the same user namespace as the calling process.
+
+A user namespace provides an isolated environment for security-related identifiers, in particular
+uids, gids, keys (see
+.BR keyctl (2)),
+and capabilities.
+
+When a user namespace is created it initially starts out without a mapping of uids and gids
+to the parent user namespace. The desired mapping of uids to the parent user namespace
+may be set by writing into
+.IR /proc/[pid]/uid_map .
+The desired mapping of gids to the parent user namespace may be set by writing into
+.IR /proc/[pid]/gid_map .
+
+The first process in a user namespace starts out with a complete set of capabilities with
+respect to the new user namespace.
+
+System calls that return uids and gids will either return the uid or gid mapped into the current
+user namespace if there is a mapping or, depending on the context, will return either
+the overflowuid (default 65534) or the overflowgid (default 65534). See
+.IR /proc/sys/kernel/overflowuid " and " /proc/sys/kernel/overflowgid .
+
+As of Linux 3.8 no privileges are needed to create a user namespace,
+and mount, pid, ipc, net, and uts namespaces can be created with just
+CAP_SYS_ADMIN privileges in your current user namespace.
+
+Over the years there have been a lot of features that have been added
+to the Linux kernel that are only available to privileged users
+because of their potential to confuse setuid root applications. In
+general it becomes safe to allow the root user in a user namespace to
+use those features because it is impossible while in a user namespace
+to gain more privilege than the root user of a user namespace has.
+
+.TP
.BR CLONE_NEWPID " (since Linux 2.6.24)"
.\" This explanation draws a lot of details from
.\" http://lwn.net/Articles/259217/
--
1.7.5.4
* [PATCH 3/4] proc.5: Document the proc files for the user, mount, and pid namespaces.
[not found] ` <87a9u4rmz0.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-11-27 0:46 ` [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map Eric W. Biederman
2012-11-27 0:46 ` [PATCH 2/4] clone.2: Describe the user namespace Eric W. Biederman
@ 2012-11-27 0:47 ` Eric W. Biederman
[not found] ` <87pq2zq3b6.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-11-27 0:48 ` [PATCH 4/4] setns.2: Document the pid, user, and mount namespace support Eric W. Biederman
3 siblings, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2012-11-27 0:47 UTC (permalink / raw)
To: Michael Kerrisk (man-pages); +Cc: Linux API, Linux Containers
Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
man5/proc.5 | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 52 insertions(+), 0 deletions(-)
diff --git a/man5/proc.5 b/man5/proc.5
index 840480d..eb612b9 100644
--- a/man5/proc.5
+++ b/man5/proc.5
@@ -581,6 +581,58 @@ even if all processes in the namespace terminate.
The file descriptor can be passed to
.BR setns (2).
.TP
+.IR /proc/[pid]/ns/user " (since Linux 3.8)"
+Bind mounting this file (see
+.BR mount (2))
+to somewhere else in the filesystem keeps
+the user namespace of the process specified by
+.I pid
+alive even if all processes currently in the namespace terminate.
+
+Opening this file returns a file handle for the user namespace
+of the process specified by
+.IR pid .
+As long as this file descriptor remains open,
+the user namespace will remain alive,
+even if all processes in the namespace terminate.
+The file descriptor can be passed to
+.BR setns (2).
+.TP
+.IR /proc/[pid]/ns/pid " (since Linux 3.8)"
+Bind mounting this file (see
+.BR mount (2))
+to somewhere else in the filesystem keeps
+the PID namespace of the process specified by
+.I pid
+alive even if all processes currently in the namespace terminate.
+
+Opening this file returns a file handle for the PID namespace
+of the process specified by
+.IR pid .
+As long as this file descriptor remains open,
+the PID namespace will remain alive,
+even if all processes in the namespace terminate.
+The file descriptor can be passed to
+.BR setns (2).
+.TP
+.IR /proc/[pid]/ns/mnt " (since Linux 3.8)"
+Bind mounting this file (see
+.BR mount (2))
+to somewhere else in the filesystem keeps
+the mount namespace of the process specified by
+.I pid
+alive even if all processes currently in the namespace terminate.
+
+Opening this file returns a file handle for the mount namespace
+of the process specified by
+.IR pid .
+As long as this file descriptor remains open,
+the mount namespace will remain alive,
+even if all processes in the namespace terminate.
+The file descriptor can be passed to
+.BR setns (2).
+
+.TP
.IR /proc/[pid]/numa_maps " (since Linux 2.6.14)"
See
.BR numa (7).
--
1.7.5.4
* [PATCH 4/4] setns.2: Document the pid, user, and mount namespace support.
[not found] ` <87a9u4rmz0.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
` (2 preceding siblings ...)
2012-11-27 0:47 ` [PATCH 3/4] proc.5: Document the proc files for the user, mount, and pid namespaces Eric W. Biederman
@ 2012-11-27 0:48 ` Eric W. Biederman
[not found] ` <87k3t7q39u.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
3 siblings, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2012-11-27 0:48 UTC (permalink / raw)
To: Michael Kerrisk (man-pages); +Cc: Linux API, Serge E. Hallyn, Linux Containers
Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
man2/setns.2 | 41 +++++++++++++++++++++++++++++++++--------
1 files changed, 33 insertions(+), 8 deletions(-)
diff --git a/man2/setns.2 b/man2/setns.2
index 6aa01e1..63b04dc 100644
--- a/man2/setns.2
+++ b/man2/setns.2
@@ -48,6 +48,18 @@ must refer to a network namespace.
.BR CLONE_NEWUTS
.I fd
must refer to a UTS namespace.
+.TP
+.BR CLONE_NEWPID
+.I fd
+must refer to a PID namespace.
+.TP
+.BR CLONE_NEWUSER
+.I fd
+must refer to a user namespace.
+.TP
+.BR CLONE_NEWNS
+.I fd
+must refer to a mount namespace.
.PP
Specifying
.I nstype
@@ -63,6 +75,25 @@ and wants to ensure that the namespace is of a particular type.
.IR fd
if the file descriptor was opened by another process and, for example,
passed to the caller via a UNIX domain socket.)
+
+The pid namespace is a little different. Reassociating the calling
+thread with a pid namespace only changes the pid namespace that the
+child processes will be created in.
+
+Changing the pid namespace for child processes is only allowed if the
+pid namespace specified by
+.IR fd
+is a child pid namespace of the pid namespace of the current thread.
+
+A multi-threaded process may not change user namespace with setns. A
+process may not reassociate the thread with the current user
+namespace. The process reassociating itself with a user namespace
+must have CAP_SYS_ADMIN privileges in the target user namespace.
+
+A process may not be reassociated with a new mount namespace if it is
+multi-threaded or it does not possess both CAP_SYS_CHROOT privileges
+and CAP_SYS_ADMIN rights over the target mount namespace.
+
.SH RETURN VALUE
On success,
.IR setns ()
@@ -94,7 +125,8 @@ for this operation.
The
.BR setns ()
system call first appeared in Linux in kernel 3.0;
-library support was added to glibc in version 2.14.
+library support was added to glibc in version 2.14;
+support for PID, user, and mount namespaces first appeared in Linux in kernel 3.8.
.SH CONFORMING TO
The
.BR setns ()
@@ -106,13 +138,6 @@ a new thread is created using
can be changed using
.BR setns ().
.SH BUGS
-The PID namespace and the mount namespace are not currently supported.
-(See the descriptions of
-.BR CLONE_NEWPID
-and
-.BR CLONE_NEWNS
-in
-.BR clone (2).)
.SH SEE ALSO
.BR clone (2),
.BR fork (2),
--
1.7.5.4
* Re: [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map
[not found] ` <874nkbrhyv.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-12-27 9:03 ` Michael Kerrisk (man-pages)
[not found] ` <CAKgNAkixXmtvQUbwyv=a8mU=gdf-x+w-ou_4N=cNaau+hVoy4Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2012-12-27 9:03 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Linux API, Serge E. Hallyn, Linux Containers
Hi Eric,
Thanks for this patch. I have one question and a revised version of the
text that I'd like you to review.
On Tue, Nov 27, 2012 at 1:46 AM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>
> Document the user namespace files that report the mapping of uids
> and gids between user namespaces.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> ---
> man5/proc.5 | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 50 insertions(+), 0 deletions(-)
>
> diff --git a/man5/proc.5 b/man5/proc.5
> index fb70d2b..840480d 100644
> --- a/man5/proc.5
> +++ b/man5/proc.5
> @@ -317,6 +317,31 @@ The files in this directory are readable only by the owner of the process.
> .\" .TP
> .\" .IR /proc/[pid]/io " (since kernel 2.6.20)"
> .TP
> +.IR /proc/[pid]/gid_map " (since kernel 3.6)"
> +This file reports the mapping of gids from the user namespace of the process specified by
> +.IR pid
> +to the user namespace of the process that opened
> +.IR /proc/[pid]/gid_map .
> +
> +Each line specifies a 1 to 1 mapping of a range of contiguous gids from
> +the user namespace of the process specified by
> +.IR pid
> +to the user namespace of the process that opened
> +.IR /proc/[pid]/gid_map .
I want to check the above point. What do you mean by "the process that
opened uid_map"? Does that mean the process that opened uid_map to do
the one-time write of the UID map? I had assumed that uid_map actually
provided a mapping between the namespace of 'pid' and the 'parent'
namespace, where the parent namespace is the namespace of the process
that created this namespace via clone(CLONE_NEWUSER).
> +
> +Each line contains three numbers. The start of the range of gids in
> +the user namespace of the process specified by
> +.IR pid .
> +The start of the range of gids in the user namespace of the process that
> +opened
> +.IR /proc/[pid]/gid_map .
> +The number of gids in the range of numbers that is mapped between the two
> +user namespaces.
> +
> +After the creation of a new user namespace this file may be written to
> +exactly once to specify the mapping of gids in the new user namespace.
> +
> +.TP
> .IR /proc/[pid]/limits " (since kernel 2.6.24)"
> This file displays the soft limit, hard limit, and units of measurement
> for each of the process's resource limits (see
> @@ -1169,6 +1194,31 @@ directory are not available if the main thread has already terminated
> (typically by calling
> .BR pthread_exit (3)).
> .TP
> +.IR /proc/[pid]/uid_map " (since kernel 3.6)"
> +This file reports the mapping of uids from the user namespace of the process specified by
> +.IR pid
> +to the user namespace of the process that opened
> +.IR /proc/[pid]/uid_map .
> +
> +Each line specifies a 1 to 1 mapping of a range of contiguous uids from
> +the user namespace of the process specified by
> +.IR pid
> +to the user namespace of the process that opened
> +.IR /proc/[pid]/uid_map .
> +
> +Each line contains three numbers. The start of the range of uids in
> +the user namespace of the process specified by
> +.IR pid .
> +The start of the range of uids in the user namespace of the process that
> +opened
> +.IR /proc/[pid]/uid_map .
> +The number of uids in the range of numbers that is mapped between the two
> +user namespaces.
> +
> +After the creation of a new user namespace this file may be written to
> +exactly once to specify the mapping of uids in the new user namespace.
> +
> +.TP
> .I /proc/apm
> Advanced power management version and battery information when
> .B CONFIG_APM
I revised your text quite a bit, and added a piece on the format of
the uid_map files. Could you please read the following and let me know
of errors:
[[
/proc/[pid]/uid_map, /proc/[pid]/gid_map (since Linux 3.6)
These files expose the mappings for user and group IDs
inside the user namespace for the process pid. The
description here explains the details for uid_map;
gid_map is exactly the same, but each instance of "user
ID" is replaced by "group ID".
The uid_map file exposes the mapping of user IDs from
the user namespace of the process pid to the user
namespace of the process that opened uid_map.
Each line in the file specifies a 1-to-1 mapping of a
range of contiguous user IDs from the user namespace of
the process pid to the user namespace of the process
that opened uid_map.
Each line contains three numbers delimited by white
space:
(1) The start of the range of user IDs in the user
namespace of the process pid.
(2) The start of the range of user IDs in the user
namespace of the process that opened uid_map.
(3) The length of the range of user IDs that is mapped
between the two user namespaces.
After the creation of a new user namespace, this file
may be written to exactly once to specify the mapping of
user IDs in the new user namespace. (An attempt to
write more than once to the file fails with the error
EPERM.)
The lines written to uid_map must conform to the
following rules:
* The three fields must be valid numbers, and the last
field must be greater than 0.
* Lines are terminated by newline characters.
* The file can contain a maximum of five lines.
* The values in both field 1 and field 2 of each line
must be in ascending numerical order.
* The range of user IDs specified in each line cannot
overlap with the ranges in any other lines.
Writes that violate the above rules fail with the error
EINVAL.
]]
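The rules in the draft above can be checked mechanically. What follows is only a sketch in Python, not the kernel's implementation: the names parse_id_map, ranges_overlap, and check_id_map are hypothetical, the limit of five lines is taken from the draft text, and the overlap check is applied to both the field 1 and field 2 ranges, which is an assumption about how the overlap rule should be read.

```python
MAX_LINES = 5  # limit stated in the draft text above


def parse_id_map(text):
    """Parse uid_map/gid_map text into (field1, field2, length) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"expected three fields: {line!r}")
        first, second, length = (int(field) for field in fields)
        entries.append((first, second, length))
    return entries


def ranges_overlap(start_a, len_a, start_b, len_b):
    """True if [start_a, start_a+len_a) intersects [start_b, start_b+len_b)."""
    return start_a < start_b + len_b and start_b < start_a + len_a


def check_id_map(entries):
    """Return a list of rule violations; an empty list means the map passes."""
    problems = []
    if len(entries) > MAX_LINES:
        problems.append(f"more than {MAX_LINES} lines")
    for index, (first, second, length) in enumerate(entries):
        # Fields must be valid (non-negative) numbers; the length must be > 0.
        if first < 0 or second < 0 or length <= 0:
            problems.append(f"line {index + 1}: bad value")
    for index in range(1, len(entries)):
        prev, cur = entries[index - 1], entries[index]
        # Fields 1 and 2 must each be in ascending order from line to line.
        if cur[0] <= prev[0] or cur[1] <= prev[1]:
            problems.append(f"line {index + 1}: fields not in ascending order")
    for i in range(len(entries)):
        for j in range(i + 1, len(entries)):
            a, b = entries[i], entries[j]
            # Assumption: neither the field 1 nor the field 2 ranges may overlap.
            if (ranges_overlap(a[0], a[2], b[0], b[2])
                    or ranges_overlap(a[1], a[2], b[1], b[2])):
                problems.append(f"lines {i + 1} and {j + 1}: ranges overlap")
    return problems
```

A caller would run check_id_map(parse_id_map(text)) on the proposed contents before the single permitted write; an empty result means none of the listed rules is violated, which does not guarantee that the kernel will accept the write.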
Thanks,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
* Re: [PATCH 2/4] clone.2: Describe the user namespace
[not found] ` <87y5hnq3d5.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-12-27 10:16 ` Michael Kerrisk (man-pages)
[not found] ` <CAKgNAkgXWp49wXKom9hMm9fajKVOAwOmFzPdKWBesbBhfZEssA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2012-12-27 10:16 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Linux API, Linux Containers
Hi Eric,
On Tue, Nov 27, 2012 at 1:46 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
> man2/clone.2 | 39 +++++++++++++++++++++++++++++++++++++++
> 1 files changed, 39 insertions(+), 0 deletions(-)
>
> diff --git a/man2/clone.2 b/man2/clone.2
> index 0582057..4566677 100644
> --- a/man2/clone.2
> +++ b/man2/clone.2
> @@ -366,6 +366,45 @@ in the same
> .BR clone ()
> call.
> .TP
> +.BR CLONE_NEWUSER " (since Linux 3.6)"
Why "since Linux 3.6"? As far as I can see, CLONE_NEWUSER first gained
some meaning in 2.6.29.
> +If
> +.B CLONE_NEWUSER
> +is set, then create the process in a new user namespace. If this flag is not set, then (as with
> +.BR fork (2)),
> +the process is created in the same user namespace as the calling process.
> +
> +A user namespace provides an isolated environment for security-related identifiers, in particular
> +uids, gids, keys (see
> +.BR keyctl (2)),
> +and capabilities.
> +
> +When a user namespace is created it initially starts out without a mapping of uids and gids
> +to the parent user namespace. The desired mapping of uids to the parent user namespace
> +may be set by writing into
> +.IR /proc/[pid]/uid_map .
> +The desired mapping of gids to the parent user namespace may be set by writing into
> +.IR /proc/[pid]/gid_map .
> +
> +The first process in a user namespace starts out with a complete set of capabilities with
> +respect to the new user namespace.
> +
> +System calls that return uids and gids will either return the uid or gid mapped into the current
> +user namespace if there is a mapping or, depending on the context, will return either
> +the overflowuid (default 65534) or the overflowgid (default 65534). See
> +.IR /proc/sys/kernel/overflowuid " and " /proc/sys/kernel/overflowgid .
> +
> +As of Linux 3.8 no privileges are needed to create a user namespace,
> +and mount, pid, ipc, net, and uts namespaces can be created with just
> +CAP_SYS_ADMIN privileges in your current user namespace.
> +
> +Over the years there have been a lot of features that have been added
> +to the Linux kernel that are only available to privileged users
> +because of their potential to confuse setuid root applications. In
> +general it becomes safe to allow the root user in a user namespace to
> +use those features because it is impossible while in a user namespace
> +to gain more privilege than the root user of a user namespace has.
> +
> +.TP
> .BR CLONE_NEWPID " (since Linux 2.6.24)"
> .\" This explanation draws a lot of details from
> .\" http://lwn.net/Articles/259217/
I reworked your text somewhat. Could you please review the following:
[[
CLONE_NEWUSER
(This flag first became meaningful for clone() in Linux
2.6.29, but the implementation of user namespaces was
only completed in Linux 3.8.) If CLONE_NEWUSER is set,
then create the process in a new user namespace. If
this flag is not set, then (as with fork(2)) the process
is created in the same user namespace as the calling
process.
A user namespace provides an isolated environment for
security related identifiers, in particular, user IDs,
group IDs, keys (see keyctl(2)), and capabilities.
When a user namespace is created, it starts out without
a mapping of user IDs (group IDs) to the parent user
namespace. The desired mapping of user IDs (group IDs)
to the parent user namespace may be set by writing into
/proc/[pid]/uid_map (/proc/[pid]/gid_map); see proc(5).
The first process in a user namespace starts out with a
complete set of capabilities with respect to the new
user namespace.
System calls that return user IDs (group IDs) will
return either the user ID (group ID) mapped into the
current user namespace if there is a mapping, or the
overflow user ID (group ID); the default value for the
overflow user ID (group ID) is 65534. See the
descriptions of /proc/sys/kernel/overflowuid and
/proc/sys/kernel/overflowgid in proc(5).
Starting with Linux 3.8, no privileges are needed to
create a user namespace, and mount, PID, IPC, net, and
UTS namespaces can be created with just the
CAP_SYS_ADMIN capability in the caller's user namespace.
Over the years, there have been a lot of features that
have been added to the Linux kernel that are only
available to privileged users because of their potential to
confuse set-user-ID-root applications. In general, it
becomes safe to allow the root user in a user namespace
to use those features because it is impossible, while in
a user namespace, to gain more privilege than the root
user of a user namespace has.
]]
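The fallback to the overflow ID described above can be illustrated with a small lookup. This is a sketch only: to_namespace_id is a hypothetical helper, the entries use the three-field layout from the proc(5) draft (start inside the namespace, start in the other namespace, length), and 65534 is the default overflow value quoted in the text rather than a value read from /proc/sys/kernel/overflowuid.

```python
OVERFLOW_ID = 65534  # default overflow user/group ID quoted in the text


def to_namespace_id(outside_id, entries, overflow=OVERFLOW_ID):
    """Translate an ID from the outer namespace into the mapped namespace.

    entries is a list of (inside_start, outside_start, length) tuples.
    If no entry covers outside_id, the overflow ID is returned instead.
    """
    for inside_start, outside_start, length in entries:
        if outside_start <= outside_id < outside_start + length:
            # Offset within the range is preserved by the 1-to-1 mapping.
            return inside_start + (outside_id - outside_start)
    return overflow
```

With entries [(0, 1000, 1), (1, 100000, 65536)], outer ID 1000 is seen as 0 inside the namespace, outer ID 100005 is seen as 6, and outer ID 0, which has no mapping, is reported as 65534.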
Thanks,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
* Re: [PATCH 3/4] proc.5: Document the proc files for the user, mount, and pid namespaces.
[not found] ` <87pq2zq3b6.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-12-27 10:28 ` Michael Kerrisk (man-pages)
0 siblings, 0 replies; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2012-12-27 10:28 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Linux API, Linux Containers
Hi Eric,
On Tue, Nov 27, 2012 at 1:47 AM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>
> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Thanks. Applied.
Cheers,
Michael
> ---
> man5/proc.5 | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 52 insertions(+), 0 deletions(-)
>
> diff --git a/man5/proc.5 b/man5/proc.5
> index 840480d..eb612b9 100644
> --- a/man5/proc.5
> +++ b/man5/proc.5
> @@ -581,6 +581,58 @@ even if all processes in the namespace terminate.
> The file descriptor can be passed to
> .BR setns (2).
> .TP
> +.IR /proc/[pid]/ns/user " (since Linux 3.8)"
> +Bind mounting this file (see
> +.BR mount (2))
> +to somewhere else in the filesystem keeps
> +the user namespace of the process specified by
> +.I pid
> +alive even if all processes currently in the namespace terminate.
> +
> +Opening this file returns a file handle for the user namespace
> +of the process specified by
> +.IR pid .
> +As long as this file descriptor remains open,
> +the user namespace will remain alive,
> +even if all processes in the namespace terminate.
> +The file descriptor can be passed to
> +.BR setns (2).
> +.TP
> +.IR /proc/[pid]/ns/pid " (since Linux 3.8)"
> +Bind mounting this file (see
> +.BR mount (2))
> +to somewhere else in the filesystem keeps
> +the PID namespace of the process specified by
> +.I pid
> +alive even if all processes currently in the namespace terminate.
> +
> +Opening this file returns a file handle for the PID namespace
> +of the process specified by
> +.IR pid .
> +As long as this file descriptor remains open,
> +the PID namespace will remain alive,
> +even if all processes in the namespace terminate.
> +The file descriptor can be passed to
> +.BR setns (2).
> +.TP
> +.IR /proc/[pid]/ns/mnt " (since Linux 3.8)"
> +Bind mounting this file (see
> +.BR mount (2))
> +to somewhere else in the filesystem keeps
> +the mount namespace of the process specified by
> +.I pid
> +alive even if all processes currently in the namespace terminate.
> +
> +Opening this file returns a file handle for the mount namespace
> +of the process specified by
> +.IR pid .
> +As long as this file descriptor remains open,
> +the mount namespace will remain alive,
> +even if all processes in the namespace terminate.
> +The file descriptor can be passed to
> +.BR setns (2).
> +
> +.TP
> .IR /proc/[pid]/numa_maps " (since Linux 2.6.14)"
> See
> .BR numa (7).
> --
> 1.7.5.4
>
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
* Re: [PATCH 4/4] setns.2: Document the pid, user, and mount namespace support.
[not found] ` <87k3t7q39u.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-12-27 11:08 ` Michael Kerrisk (man-pages)
[not found] ` <CAKgNAkiaw5L_oNE8NENjmoBS8Hq_uj+iaEdhyXc1+hje4HdnNQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2012-12-27 11:08 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Linux API, Serge E. Hallyn, Linux Containers
Hi Eric,
Some questions below.
On Tue, Nov 27, 2012 at 1:48 AM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>
> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> ---
> man2/setns.2 | 41 +++++++++++++++++++++++++++++++++--------
> 1 files changed, 33 insertions(+), 8 deletions(-)
>
> diff --git a/man2/setns.2 b/man2/setns.2
> index 6aa01e1..63b04dc 100644
> --- a/man2/setns.2
> +++ b/man2/setns.2
> @@ -48,6 +48,18 @@ must refer to a network namespace.
> .BR CLONE_NEWUTS
> .I fd
> must refer to a UTS namespace.
> +.TP
> +.BR CLONE_NEWPID
> +.I fd
> +must refer to a PID namespace.
> +.TP
> +.BR CLONE_NEWUSER
> +.I fd
> +must refer to a user namespace.
> +.TP
> +.BR CLONE_NEWNS
> +.I fd
> +must refer to a mount namespace.
> .PP
> Specifying
> .I nstype
> @@ -63,6 +75,25 @@ and wants to ensure that the namespace is of a particular type.
> .IR fd
> if the file descriptor was opened by another process and, for example,
> passed to the caller via a UNIX domain socket.)
> +
> +The pid namespace is a little different. Reassociating the calling
> +thread with a pid namespace only changes the pid namespace that the
> +child processes will be created in.
> +
> +Changing the pid namespace for child processes is only allowed if the
> +pid namespace specified by
> +.IR fd
> +is a child pid namespace of the pid namespace of the current thread.
I assume "current thread" above should be "calling thread", right?
> +
> +A multi-threaded process may not change user namespace with setns. A
> +process may not reassociate the thread with the current user
> +namespace.
What do you mean by "the current user namespace"?
> The process reassociating itself with a user namespace
> +must have CAP_SYS_ADMIN privileges in the target user namespace.
> +
> +A process may not be reassociated with a new mount namespace if it is
> +multi-threaded
I tried to verify the preceding two lines from the kernel source, but
did not work out where this check is made. Where is it?
> or it does not possess both CAP_SYS_CHROOT privileges
> +and CAP_SYS_ADMIN rights over the target mount namespace.
Could you please expand/clarify the preceding two lines. As they
stand, I don't really understand them.
> .SH RETURN VALUE
> On success,
> .IR setns ()
> @@ -94,7 +125,8 @@ for this operation.
> The
> .BR setns ()
> system call first appeared in Linux in kernel 3.0;
> -library support was added to glibc in version 2.14.
> +library support was added to glibc in version 2.14;
> +support for PID, user, and mount namespaces first appeared in Linux in kernel 3.8.
> .SH CONFORMING TO
> The
> .BR setns ()
> @@ -106,13 +138,6 @@ a new thread is created using
> can be changed using
> .BR setns ().
> .SH BUGS
> -The PID namespace and the mount namespace are not currently supported.
> -(See the descriptions of
> -.BR CLONE_NEWPID
> -and
> -.BR CLONE_NEWNS
> -in
> -.BR clone (2).)
> .SH SEE ALSO
> .BR clone (2),
> .BR fork (2),
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map
[not found] ` <CAKgNAkixXmtvQUbwyv=a8mU=gdf-x+w-ou_4N=cNaau+hVoy4Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-12-27 16:58 ` Eric W. Biederman
[not found] ` <87obhfxwhb.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-12-27 17:23 ` Eric W. Biederman
1 sibling, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2012-12-27 16:58 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
Cc: Linux API, Serge E. Hallyn, Linux Containers
"Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
> Hi Eric,
>
> Thanks for this patch. I have one question and a revised version of the
> text that I'd like you to review.
>
> On Tue, Nov 27, 2012 at 1:46 AM, Eric W. Biederman
> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>
>> Document the user namespace files that report the mapping of uids
>> and gids between user namespaces.
>>
>> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> ---
>> man5/proc.5 | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 files changed, 50 insertions(+), 0 deletions(-)
>>
>> diff --git a/man5/proc.5 b/man5/proc.5
>> index fb70d2b..840480d 100644
>> --- a/man5/proc.5
>> +++ b/man5/proc.5
>> @@ -317,6 +317,31 @@ The files in this directory are readable only by the owner of the process.
>> .\" .TP
>> .\" .IR /proc/[pid]/io " (since kernel 2.6.20)"
>> .TP
>> +.IR /proc/[pid]/gid_map " (since kernel 3.6)"
>> +This file reports the mapping of gids from the user namespace of the process specified by
>> +.IR pid
>> +to the user namespace of the process that opened
>> +.IR /proc/[pid]/gid_map .
>> +
>> +Each line specifies a 1 to 1 mapping of a range of contiguous gids from
>> +the user namespace of the process specified by
>> +.IR pid
>> +to the user namespace of the process that opened
>> +.IR /proc/[pid]/gid_map.
>
> I want to check the above point. What do you mean by "the process that
> opened uid_map"? Does that mean the process that opened uid_map to do
> the one-time write of the UID map? I had assumed that uid_map actually
> provided a mapping between the namespace of 'pid' and the 'parent'
> namespace, where the parent namespace is the namespace of the process
> that created this namespace via clone(CLONE_NEWUSER).
I mean the process that opens uid_map for read or write.
For writing you are correct about the mapping to the parent (but that is
not an exception; it is a restriction on who can write to the file).
The complete rule for the user namespace of the second value is:
- If the user namespace of the opener of the file and the user namespace
of the process do not match, the user namespace of the opener of the
file is used.
- If the user namespace of the opener of the file and the user namespace
of the process are the same, the parent user namespace of the process
is used for the second value.
While very wordy, I think the rule makes a lot of intuitive and practical
sense, especially since it is non-trivial to come up with the chain of
user namespaces a process is in.
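To make the two cases concrete, here is a minimal sketch in C. The struct and the function name are hypothetical stand-ins for illustration, not kernel types, and the initial user namespace (which has no parent) is left outside the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a user namespace; not a kernel type. */
struct user_ns {
    const struct user_ns *parent;   /* NULL for the initial namespace */
};

/*
 * Pick the namespace whose IDs appear in the second field of each
 * uid_map line, following the two cases described above.
 */
static const struct user_ns *
second_field_ns(const struct user_ns *opener_ns,
                const struct user_ns *target_ns)
{
    assert(opener_ns != NULL && target_ns != NULL);
    if (opener_ns != target_ns)
        return opener_ns;       /* namespaces differ: use the opener's */
    return target_ns->parent;   /* same namespace: use the parent's */
}
```

So a process reading its own uid_map sees IDs of the parent namespace in field 2, while a process in any other namespace sees its own IDs there.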
>> +Each line contains three numbers. The start of the range of gids in
>> +the user namespace of the process specifed by
>> +.IR pid.
>> +The start of the range of gids in the user namespace of the process that
>> +opened
>> +.IR /proc/[pid]/gid_map.
>> +The number of gids in the range of numbers that is mapped between to two
>> +user namespaces.
>> +
>> +After the creation of a new user namespace this file may be written to
>> +exactly once to specify the mapping of gids in the new user namespace.
>> +
>> +.TP
>> .IR /proc/[pid]/limits " (since kernel 2.6.24)"
>> This file displays the soft limit, hard limit, and units of measurement
>> for each of the process's resource limits (see
>> @@ -1169,6 +1194,31 @@ directory are not available if the main thread has already terminated
>> (typically by calling
>> .BR pthread_exit (3)).
>> .TP
>> +.IR /proc/[pid]/uid_map " (since kernel 3.6)"
>> +This file reports the mapping of uids from the user namespace of the process specified by
>> +.IR pid
>> +to the user namespace of the process that opened
>> +.IR /proc/[pid]/uid_map .
>> +
>> +Each line specifies a 1 to 1 mapping of a range of contiguous uids from
>> +the user namespace of the process specified by
>> +.IR pid
>> +to the user namespace of the process that opened
>> +.IR /proc/[pid]/uid_map.
>> +
>> +Each line contains three numbers. The start of the range of uids in
>> +the user namespace of the process specifed by
>> +.IR pid.
>> +The start of the range of uids in the user namespace of the process that
>> +opened
>> +.IR /proc/[pid]/uid_map.
>> +The number of uids in the range of numbers that is mapped between to two
>> +user namespaces.
>> +
>> +After the creation of a new user namespace this file may be written to
>> +exactly once to specify the mapping of uids in the new user namespace.
>> +
>> +.TP
>> .I /proc/apm
>> Advanced power management version and battery information when
>> .B CONFIG_APM
>
> I revised your text quite a bit, and added a piece on the format of
> the uid_map files. Could you please read the following and let me know
> of errors:
>
> [[
> /proc/[pid]/uid_map, /proc/[pid]/gid_map (since Linux 3.6)
> These files expose the mappings for user and group IDs
> inside the user namespace for the process pid. The
> description here explains the details for uid_map;
> gid_map is exactly the same, but each instance of "user
> ID" is replaced by "group ID".
>
> The uid_map file exposes the mapping of user IDs from
> the user namespace of the process pid to the user names‐
> pace of the process that opened uid_map.
>
> Each line in the file specifies a 1-to-1 mapping of a
> range of contiguous user IDs from the user namespace of
> the process pid to the user namespace of the process
> that opened uid_map.
>
> Each line contains three numbers delimited by white
> space:
>
> (1) The start of the range of user IDs in the user
> namespace of the process pid.
>
> (2) The start of the range of user IDs in the user
> namespace of the process that opened uid_map.
>
> (3) The length of the range of user IDs that is mapped
> between the two user namespaces.
>
> After the creation of a new user namespace, this file
> may be written to exactly once to specify the mapping of
> user IDs in the new user namespace. (An attempt to
> write more than once to the file fails with the error
> EPERM.)
>
> The lines written to uid_map must conform to the follow‐
> ing rules:
>
> * The three fields must be valid numbers, and the last
> field must be greater than 0.
>
> * Lines are terminated by newline characters.
>
> * The file can contain a maximum of five lines.
A maximum of 5 lines is important to document, but it is a current
arbitrary limit that may be changed in the future. Right now 5 extents
are more than enough for any conceivable use case, and they fit nicely
within a single cache line.
It is probably better to say that writes that exceed an arbitrary maximum
length fail with -EINVAL. Currently the arbitrary maximum length is
five lines.
> * The values in both field 1 and field 2 of each line
> must be in ascending numerical order.
The rule is that the extents need to be non-overlapping. Ascending
numerical order is how that is implemented, but that is a misfeature,
and there has already been one request to fix it. Removing the
ascending numerical order limitation is on my todo list.
> * The range of user IDs specified in each line cannot
> overlap with the ranges in any other lines.
>
> Writes that violate the above rules fail with the error
> EINVAL.
> ]]
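The write rules discussed above can be sketched as a small userspace validator. This is only an illustration under assumptions: the names (struct extent, parse_id_map, MAX_EXTENTS) are made up here, the overlap rule is checked directly rather than through the ascending-order implementation mentioned above, and the parser is stricter than necessary about trailing white space. It does not handle ranges that wrap around.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MAX_EXTENTS 5   /* current limit per this thread; may change */

struct extent {
    unsigned long first;        /* field 1: start in the process's namespace */
    unsigned long lower_first;  /* field 2: start in the other namespace */
    unsigned long count;        /* field 3: length of the range */
};

/* [a, a+n) and [b, b+m) overlap when each starts before the other ends. */
static int ranges_overlap(unsigned long a, unsigned long n,
                          unsigned long b, unsigned long m)
{
    return a < b + m && b < a + n;
}

/*
 * Parse newline-terminated "first lower_first count" lines.
 * Returns the number of extents stored in out[], or -EINVAL.
 */
static int parse_id_map(const char *text, struct extent out[MAX_EXTENTS])
{
    int n = 0;

    assert(text != NULL && out != NULL);
    while (*text != '\0') {
        const char *eol = strchr(text, '\n');
        struct extent e;
        int used = 0;

        if (eol == NULL || n == MAX_EXTENTS)
            return -EINVAL;     /* unterminated line, or too many lines */
        if (sscanf(text, "%lu %lu %lu%n",
                   &e.first, &e.lower_first, &e.count, &used) != 3 ||
            text + used != eol || e.count == 0)
            return -EINVAL;     /* not three numbers, or an empty range */
        for (int i = 0; i < n; i++) {
            if (ranges_overlap(e.first, e.count,
                               out[i].first, out[i].count) ||
                ranges_overlap(e.lower_first, e.count,
                               out[i].lower_first, out[i].count))
                return -EINVAL; /* extents overlap on one side */
        }
        out[n++] = e;
        text = eol + 1;
    }
    return n > 0 ? n : -EINVAL;
}
```

Checking overlap directly means a map written in descending order passes this sketch, even though the kernel discussed here would still reject it until the ordering limitation is removed.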
>
> Thanks,
>
> Michael
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/4] clone.2: Describe the user namespace
[not found] ` <CAKgNAkgXWp49wXKom9hMm9fajKVOAwOmFzPdKWBesbBhfZEssA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-12-27 17:20 ` Eric W. Biederman
[not found] ` <87r4mbv2c9.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-12-27 17:47 ` Eric W. Biederman
1 sibling, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2012-12-27 17:20 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w; +Cc: Linux API, Linux Containers
"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
> Hi Eric,
>
> On Tue, Nov 27, 2012 at 1:46 AM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>>
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>> man2/clone.2 | 39 +++++++++++++++++++++++++++++++++++++++
>> 1 files changed, 39 insertions(+), 0 deletions(-)
>>
>> diff --git a/man2/clone.2 b/man2/clone.2
>> index 0582057..4566677 100644
>> --- a/man2/clone.2
>> +++ b/man2/clone.2
>> @@ -366,6 +366,45 @@ in the same
>> .BR clone ()
>> call.
>> .TP
>> +.BR CLONE_NEWUSER " (since Linux 3.6)"
>
> Why "since Linux 3.6"? As far as I can see, CLONE_NEWUSER first gained
> some meaning in 2.6.29.
Looking at it, where I have said 3.6, that is wrong. I meant 3.5.
I think I made the same mistake in one or two other manpages. Nothing
was merged in 3.6, unfortunately.
My intent was that these are the semantics of user namespaces since 3.5,
when my rework/refocusing of them was merged.
Since 3.5 all that has really happened with user namespaces is the
uid/gid to kuid/kgid conversion, permission checks have been relaxed,
and a few bugs have been fixed.
3.8 is huge from a usability standpoint, because setns() and unshare()
are now complete from a namespace perspective, and because
enough permission checks have been relaxed in user namespaces that you
can really start using them.
But semantically, from a user namespace perspective, nothing really has
changed in 3.8.
>> +If
>> +.B CLONE_NEWUSER
>> +is set, the create the process in a new user namespace. If this flag is not set, then (as with
>> +.BR fork (2)),
>> +the process is created in the same user namespace as the calling process.
>> +
>> +A user namespace provides an isolated environment for security related identifiers in particular
>> +uids, gids, keys (see
>> +.BR keyctl (2)),
>> +and capabilities.
>> +
>> +When a user namespace is created it initially starts out without a mapping of uids and gids
>> +to the parent user namespace. The desired mapping of uids to the parent user namespace
>> +may be set by writting into
>> +.IR /proc/[pid]/uid_map.
>> +The desired mapping of gids to the parent user namespace may be set by writinng into
>> +.IR /proc/[pid]/gid_map.
>> +
>> +The first process in a user namespace starts out with a complete set of capabilities with
>> +respect to the new user namespace.
>> +
>> +syscalls that return uids and gids will either return the uid or gid mapped into the current
>> +user namespace if there is a mapping or depending on the context will return either
>> +the overflowuid (default 65534) or the overflowgid (default 65534). See
>> +.IR /proc/sys/kernel/overflowuid, /proc/sys/kernel/overflowgid
>> +
>> +As of Linux 3.8 no priviliges are needed to create a user namespace,
>> +and mount, pid, ipc, net, uts namespaces can be created with just
>> +CAP_SYS_ADMIN privileges in your current user namespace.
>> +
>> +Over the years there have been a lot of features that have been added
>> +to the linux kernel that are only available to privileged users
>> +because of their potential to confuse setuid root applications. In
>> +general it becomes safe to allow the root user in a user namespace to
>> +use those features because it is impossible while in a user namespace
>> +to gain more privilege than the root user of a user namespace has.
>> +
>> +.TP
>> .BR CLONE_NEWPID " (since Linux 2.6.24)"
>> .\" This explanation draws a lot of details from
>> .\" http://lwn.net/Articles/259217/
>
> I reworked your text somewhat. Could you please review the following:
>
> [[
> CLONE_NEWUSER
> (This flag first became meaningful for clone() in Linux
> 2.6.29, but the implementation of user namespaces was
> only completed in Linux 3.8.)
Long rant about 2.6.29 vs 3.8 above. I think what we need to say is:
(This flag first became meaningful for clone() in Linux
2.6.29, the current semantics were merged in 3.5, and
user namespaces only really became usable in 3.8.)
> If CLONE_NEWUSER is set,
> then create the process in a new user namespace. If
> this flag is not set, then (as with fork(2)) the process
> is created in the same user namespace as the calling
> process.
>
> A user namespace provides an isolated environment for
> security related identifiers, in particular, user IDs,
> group IDs, keys (see keyctl(2)), and capabilities.
>
> When a user namespace is created, it starts out without
> a mapping of user IDs (group IDs) to the parent user
> namespace. The desired mapping of user IDs (group IDs)
> to the parent user namespace may be set by writing into
> /proc/[pid]/uid_map (/proc/[pid]/gid_map); see proc(5).
/proc/[pid]/projid_map deserves a mention. Not that
I am a fan of project IDs, or that XFS, where they are
implemented, has been converted yet, but....
> The first process in a user namespace starts out with a
> complete set of capabilities with respect to the new
> user namespace.
>
> System calls that return user IDs (group IDs) will
> return either the user ID (group ID) mapped into the
> current user namespace if there is a mapping, or the
> overflow user ID (group ID); the default value for the
> overflow user ID (group ID) is 65534. See the descrip‐
> tions of /proc/sys/kernel/overflowuid and /proc/sys/ker‐
> nel/overflowgid in proc(5).
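As an illustration of the "mapped ID or overflow ID" behaviour described in this paragraph, here is a minimal sketch of the lookup for a single level of nesting. The struct, the function name, and the OVERFLOW_ID macro are assumptions made for the example; only the default value 65534 comes from the text.

```c
#include <assert.h>
#include <stddef.h>

#define OVERFLOW_ID 65534UL  /* default overflowuid/overflowgid per the text */

/* One uid_map/gid_map line: "first lower_first count". */
struct extent {
    unsigned long first;        /* start of the range inside the namespace */
    unsigned long lower_first;  /* start of the range outside it */
    unsigned long count;        /* length of the range */
};

/*
 * Translate an ID from outside the namespace to the value a process
 * inside the namespace would be shown, or the overflow ID if no
 * extent covers it.
 */
static unsigned long
id_seen_in_ns(const struct extent *map, size_t n, unsigned long outer_id)
{
    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (outer_id >= map[i].lower_first &&
            outer_id - map[i].lower_first < map[i].count)
            return map[i].first + (outer_id - map[i].lower_first);
    }
    return OVERFLOW_ID;         /* unmapped: report the overflow ID */
}
```

With the map "0 1000 1", outer ID 1000 is reported as 0 inside the namespace, and every other outer ID is reported as 65534.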
>
> Starting with Linux 3.8, no privileges are needed to
> create a user namespace, and mount, PID, IPC, net, and
> UTS namespaces can be created with just the
> CAP_SYS_ADMIN capability in the caller's user namespace.
>
> Over the years, there have been a lot of features that
> have been added to the Linux kernel that are only avail‐
> able to privileged users because of their potential to
> confuse set-user-ID-root applications. In general, it
> becomes safe to allow the root user in a user namespace
> to use those features because it is impossible, while in
> a user namespace, to gain more privilege than the root
> user of a user namespace has.
I don't have any problems with this bit of text.
It occurs to me that what is going on with capabilities and user
namespaces needs to be documented better. There was a minor bug with
them this release cycle, and I realized that while the current definition
makes sense and isn't hard to understand in general, in detail the
interaction of capabilities and user namespaces is hard to describe.
I think capabilities and user namespaces are the work of a future patch,
however.
> ]]
>
> Thanks,
>
> Michael
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/containers
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map
[not found] ` <CAKgNAkixXmtvQUbwyv=a8mU=gdf-x+w-ou_4N=cNaau+hVoy4Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-12-27 16:58 ` Eric W. Biederman
@ 2012-12-27 17:23 ` Eric W. Biederman
[not found] ` <87licjv276.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
1 sibling, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2012-12-27 17:23 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
Cc: Linux API, Serge E. Hallyn, Linux Containers
"Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
> Hi Eric,
>
> Thanks for this patch. I have one question and a revised version of the
> text that I'd like you to review.
In this patch, where I said 3.6, it should have been 3.5.
Eric
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 4/4] setns.2: Document the pid, user, and mount namespace support.
[not found] ` <CAKgNAkiaw5L_oNE8NENjmoBS8Hq_uj+iaEdhyXc1+hje4HdnNQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-12-27 17:40 ` Eric W. Biederman
[not found] ` <87bodftmv0.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2012-12-27 17:40 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w; +Cc: Linux API, Linux Containers
"Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
> Hi Eric,
>
> Some questions below.
A quick note. Getting the permission checks correct has been a little
more interesting than I would have preferred.
I had to add a nsown_capable(CAP_SYS_ADMIN) check to all of the setns()
install methods except the user namespace. Not a change in pre-3.8
behavior, but a change to my patch, and possibly a documentation change
below.
> On Tue, Nov 27, 2012 at 1:48 AM, Eric W. Biederman
> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>
>> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>> ---
>> man2/setns.2 | 41 +++++++++++++++++++++++++++++++++--------
>> 1 files changed, 33 insertions(+), 8 deletions(-)
>>
>> diff --git a/man2/setns.2 b/man2/setns.2
>> index 6aa01e1..63b04dc 100644
>> --- a/man2/setns.2
>> +++ b/man2/setns.2
>> @@ -48,6 +48,18 @@ must refer to a network namespace.
>> .BR CLONE_NEWUTS
>> .I fd
>> must refer to a UTS namespace.
>> +.TP
>> +.BR CLONE_NEWPID
>> +.I fd
>> +must refer to a PID namespace.
>> +.TP
>> +.BR CLONE_NEWUSER
>> +.I fd
>> +must refer to a user namespace.
>> +.TP
>> +.BR CLONE_NEWNS
>> +.I fd
>> +must refer to a mount namespace.
>> .PP
>> Specifying
>> .I nstype
>> @@ -63,6 +75,25 @@ and wants to ensure that the namespace is of a particular type.
>> .IR fd
>> if the file descriptor was opened by another process and, for example,
>> passed to the caller via a UNIX domain socket.)
>> +
>> +The pid namespace is a little different. Reassociating the calling
>> +thread with a pid namespace only changes the pid namespace that the
>> +child processes will be created in.
>> +
>> +Changing the pid namespace for child processes is only allowed if the
>> +pid namespace specified by
>> +.IR fd
>> +is a child pid namespace of the pid namespace of the current thread.
>
> I assume "current thread" above should be "calling thread", right?
What I mean is "current" from a kernel perspective.
It should be just "caller".
Threads must share a pid namespace, so mentioning threads seems wrong.
>> +
>> +A multi-threaded process may not change user namespace with setns. A
>> +process may not reassociate the thread with the current user
>> +namespace.
>
> What do you mean by "the current user namespace"?
fd = open("/proc/self/ns/user", O_RDONLY);
setns(fd, CLONE_NEWUSER) -> -EINVAL.
So from a userspace perspective I mean "the caller's user namespace".
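The user namespace rules in the quoted text can be modelled as a small decision function. This is a sketch and not the kernel code: the struct and names are hypothetical, the order of the checks is an assumption, and so are the errno values for the multi-threaded and missing-capability cases (only the -EINVAL for re-entering the caller's own namespace is stated above).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a caller; none of these are kernel types. */
struct caller {
    int  user_ns_id;                /* identity of the caller's user namespace */
    bool multi_threaded;            /* caller shares its mm with other threads */
    bool cap_sys_admin_in_target;   /* CAP_SYS_ADMIN in the target namespace */
};

/* Returns 0 if setns() into target_ns_id would be allowed, else -errno. */
static int userns_setns_check(const struct caller *c, int target_ns_id)
{
    assert(c != NULL);
    if (target_ns_id == c->user_ns_id)
        return -EINVAL;     /* re-entering the caller's own user namespace */
    if (c->multi_threaded)
        return -EINVAL;     /* assumed errno; the text only says "may not" */
    if (!c->cap_sys_admin_in_target)
        return -EPERM;      /* assumed errno for the missing capability */
    return 0;
}
```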
>> The process reassociating itself with a user namespace
>> +must have CAP_SYS_ADMIN privileges in the target user namespace.
>>
>> +A process may not be reassociated with a new mount namespace if it is
>> +multi-threaded
>
> I tried to verify the preceding two lines from the kernel source, but
> did not work out where this check is made. Where is it?
kernel/user_namespace.c:userns_install()
fs/namespace.c:mntns_install()
A couple of the security checks have been pushed down into a
per-namespace context, because the exact check that makes sense depends
on the namespace.
>> or it does not possess both CAP_SYS_CHROOT privileges
>> +and CAP_SYS_ADMIN rights over the target mount namespace.
>
> Could you please expand/clarify the preceding two lines. As they
> stand, I don't really understand them.
Ugh. The text is slightly wrong.
The code is:
	if (!ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) ||
	    !nsown_capable(CAP_SYS_CHROOT) ||
	    !nsown_capable(CAP_SYS_ADMIN))
		return -EPERM;
Basically you aren't allowed to change your mount namespace into
a mount namespace that doesn't see you as the all-powerful root
able to mount and unmount filesystems.
You also aren't allowed to change your mount namespace unless you possess
CAP_SYS_CHROOT and CAP_SYS_ADMIN in your own user namespace.
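Restated as a userspace model, the three conditions look like the sketch below. The struct and field names are made up for the example; each field stands for the result of one of the capability checks in the kernel snippet above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the caller's capabilities. */
struct mnt_caller {
    bool admin_over_target;     /* ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) */
    bool chroot_in_own_userns;  /* nsown_capable(CAP_SYS_CHROOT) */
    bool admin_in_own_userns;   /* nsown_capable(CAP_SYS_ADMIN) */
};

/* Returns 0 if all three capability checks pass, else -EPERM. */
static int mntns_setns_check(const struct mnt_caller *c)
{
    assert(c != NULL);
    if (!c->admin_over_target ||
        !c->chroot_in_own_userns ||
        !c->admin_in_own_userns)
        return -EPERM;
    return 0;
}
```

All three capabilities are required, so losing any one of them is enough to get -EPERM.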
>> .SH RETURN VALUE
>> On success,
>> .IR setns ()
>> @@ -94,7 +125,8 @@ for this operation.
>> The
>> .BR setns ()
>> system call first appeared in Linux in kernel 3.0;
>> -library support was added to glibc in version 2.14.
>> +library support was added to glibc in version 2.14;
>> +Support for PID, user and mount namespaces first appeard in Linux in kernel 3.8.
>> .SH CONFORMING TO
>> The
>> .BR setns ()
>> @@ -106,13 +138,6 @@ a new thread is created using
>> can be changed using
>> .BR setns ().
>> .SH BUGS
>> -The PID namespace and the mount namespace are not currently supported.
>> -(See the descriptions of
>> -.BR CLONE_NEWPID
>> -and
>> -.BR CLONE_NEWNS
>> -in
>> -.BR clone (2).)
>> .SH SEE ALSO
>> .BR clone (2),
>> .BR fork (2),
>
> Cheers,
>
> Michael
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/4] clone.2: Describe the user namespace
[not found] ` <CAKgNAkgXWp49wXKom9hMm9fajKVOAwOmFzPdKWBesbBhfZEssA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-12-27 17:20 ` Eric W. Biederman
@ 2012-12-27 17:47 ` Eric W. Biederman
[not found] ` <87sj6rs7zc.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
1 sibling, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2012-12-27 17:47 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w; +Cc: Linux API, Linux Containers
There is one other bit that needs to be documented in clone, although
I am not certain where/how.
The following sequences now fail:
	unshare(CLONE_NEWPID);
	clone(CLONE_VM);
and
	setns(fd, CLONE_NEWPID);
	clone(CLONE_VM);
Basically the rule is that all threads must be in the same pid namespace.
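A toy model of that rule, for illustration only: unshare(CLONE_NEWPID) and setns(fd, CLONE_NEWPID) change only the namespace for future children, and a later clone() that shares the address space is refused when that namespace differs from the caller's own. The struct, the function names, and the -EINVAL return value are assumptions; DEMO_CLONE_VM merely carries the same value as CLONE_VM in <sched.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_CLONE_VM 0x00000100UL  /* same value as CLONE_VM */

/* Hypothetical model of a task's pid namespace state. */
struct task_model {
    int own_pid_ns;     /* namespace the task itself lives in */
    int child_pid_ns;   /* namespace its children will be created in */
};

/* Models unshare(CLONE_NEWPID) or setns(fd, CLONE_NEWPID). */
static void model_set_child_pidns(struct task_model *t, int ns)
{
    assert(t != NULL);
    t->child_pid_ns = ns;   /* the task's own namespace is unchanged */
}

/* Models the check made by clone(); returns 0 or -EINVAL (assumed). */
static int model_clone(const struct task_model *t, unsigned long flags)
{
    assert(t != NULL);
    if ((flags & DEMO_CLONE_VM) && t->child_pid_ns != t->own_pid_ns)
        return -EINVAL;     /* would put a thread in another pid namespace */
    return 0;
}
```

In this model a plain fork-style clone (no CLONE_VM) still succeeds after the namespace change, and only the address-space-sharing clone is refused.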
The joy of reviews with good comments that come much later than hoped.
Eric
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map
[not found] ` <87licjv276.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-12-27 18:39 ` Michael Kerrisk (man-pages)
0 siblings, 0 replies; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2012-12-27 18:39 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Linux API, Serge E. Hallyn, Linux Containers
On Thu, Dec 27, 2012 at 6:23 PM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>
>> Hi Eric,
>>
>> Thanks for this patch. I have one question and a revised version of the
>> text that I'd like you to review.
>
> In this patch where I said 3.6 it should have been 3.5
Fixed!
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map
[not found] ` <87obhfxwhb.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2012-12-28 19:20 ` Michael Kerrisk (man-pages)
[not found] ` <CAKgNAkjs9T-s8SG-EgTT0O-Uj8S98Q_zfnMqnZ1ROrcYqh7Z5w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2012-12-28 19:20 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Linux API, Linux Containers
Hi Eric,
On Thu, Dec 27, 2012 at 5:58 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>
>> Hi Eric,
>>
>> Thanks for this patch. I have one question and a revised version of the
>> text that I'd like you to review.
>>
>> On Tue, Nov 27, 2012 at 1:46 AM, Eric W. Biederman
>> <ebiederm@xmission.com> wrote:
>>>
>>> Document the user namespace files that report the mapping of uids
>>> and gids between user namespaces.
>>>
>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>> ---
>>> man5/proc.5 | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>> 1 files changed, 50 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/man5/proc.5 b/man5/proc.5
>>> index fb70d2b..840480d 100644
>>> --- a/man5/proc.5
>>> +++ b/man5/proc.5
>>> @@ -317,6 +317,31 @@ The files in this directory are readable only by the owner of the process.
>>> .\" .TP
>>> .\" .IR /proc/[pid]/io " (since kernel 2.6.20)"
>>> .TP
>>> +.IR /proc/[pid]/gid_map " (since kernel 3.6)"
>>> +This file reports the mapping of gids from the user namespace of the process specified by
>>> +.IR pid
>>> +to the user namespace of the process that opened
>>> +.IR /proc/[pid]/gid_map .
>>> +
>>> +Each line specifies a 1 to 1 mapping of a range of contiguous gids from
>>> +the user namespace of the process specified by
>>> +.IR pid
>>> +to the user namespace of the process that opened
>>> +.IR /proc/[pid]/gid_map.
>>
>> I want to check the above point. What do you mean by "the process that
>> opened uid_map"? Does that mean the process that opened uid_map to do
>> the one-time write of the UID map? I had assumed that uid_map actually
>> provided a mapping between the namespace of 'pid' and the 'parent'
>> namespace, where the parent namespace is the namespace of the process
>> that created this namespace via clone(CLONE_NEWUSER).
>
> I mean the process that opens uid_map for read or write.
Thanks for the confirmation.
> For writing you are correct about the mapping to the parent (but that is
> not an exception that is a restriction on who can write to the file).
So, by the way, I added this sentence to the page:
In order to write to the /proc/[pid]/uid_map
(/proc/[pid]/gid_map) file, a process must have the
CAP_SETUID (CAP_SETGID) capability in the user namespace
of the process pid.
Is that correct?
But, there appear to be more rules than this governing whether a
process can write to the file (i.e., various other -EPERM cases). What
are the rules?
> The complete rule is for the user namespace of the second value is:
>
> - If the user namespace of the opener of the file and the user namespace
> of the process do not match. The user namespace of the opener of the
> file is used.
>
> - If the user namespace of the opener of the file and the user namespace
> of the process are the same. The parent user namespace of the process
> is used for the second value.
Could you give an example of the last case? (What I'm really seeking,
I think, is clarification of "parent user namespace". Does that mean
"user namespace of the process that created the user namespace of this
process"?)
> While very wordy I think the rule makes a lot of intuitive and practical
> sense. Especially since it is non-trivial to come up with the chain of
> user namespaces a process is in.
>
>>> +Each line contains three numbers. The start of the range of gids in
>>> +the user namespace of the process specifed by
>>> +.IR pid.
>>> +The start of the range of gids in the user namespace of the process that
>>> +opened
>>> +.IR /proc/[pid]/gid_map.
>>> +The number of gids in the range of numbers that is mapped between to two
>>> +user namespaces.
>>> +
>>> +After the creation of a new user namespace this file may be written to
>>> +exactly once to specify the mapping of gids in the new user namespace.
>>> +
>>> +.TP
>>> .IR /proc/[pid]/limits " (since kernel 2.6.24)"
>>> This file displays the soft limit, hard limit, and units of measurement
>>> for each of the process's resource limits (see
>>> @@ -1169,6 +1194,31 @@ directory are not available if the main thread has already terminated
>>> (typically by calling
>>> .BR pthread_exit (3)).
>>> .TP
>>> +.IR /proc/[pid]/uid_map " (since kernel 3.6)"
>>> +This file reports the mapping of uids from the user namespace of the process specified by
>>> +.IR pid
>>> +to the user namespace of the process that opened
>>> +.IR /proc/[pid]/uid_map .
>>> +
>>> +Each line specifies a 1 to 1 mapping of a range of contiguous uids from
>>> +the user namespace of the process specified by
>>> +.IR pid
>>> +to the user namespace of the process that opened
>>> +.IR /proc/[pid]/uid_map.
>>> +
>>> +Each line contains three numbers. The start of the range of uids in
>>> +the user namespace of the process specifed by
>>> +.IR pid.
>>> +The start of the range of uids in the user namespace of the process that
>>> +opened
>>> +.IR /proc/[pid]/uid_map.
>>> +The number of uids in the range of numbers that is mapped between to two
>>> +user namespaces.
>>> +
>>> +After the creation of a new user namespace this file may be written to
>>> +exactly once to specify the mapping of uids in the new user namespace.
>>> +
>>> +.TP
>>> .I /proc/apm
>>> Advanced power management version and battery information when
>>> .B CONFIG_APM
>>
>> I revised your text quite a bit, and added a piece on the format of
>> the uid_map files. Could you please read the following and let me know
>> of errors:
>>
>> [[
>> /proc/[pid]/uid_map, /proc/[pid]/gid_map (since Linux 3.6)
>> These files expose the mappings for user and group IDs
>> inside the user namespace for the process pid. The
>> description here explains the details for uid_map;
>> gid_map is exactly the same, but each instance of "user
>> ID" is replaced by "group ID".
>>
>> The uid_map file exposes the mapping of user IDs from
>> the user namespace of the process pid to the user names‐
>> pace of the process that opened uid_map.
>>
>> Each line in the file specifies a 1-to-1 mapping of a
>> range of contiguous user IDs from the user namespace of
>> the process pid to the user namespace of the process
>> that opened uid_map.
>>
>> Each line contains three numbers delimited by white
>> space:
>>
>> (1) The start of the range of user IDs in the user
>> namespace of the process pid.
>>
>> (2) The start of the range of user IDs in the user
>> namespace of the process that opened uid_map.
>>
>> (3) The length of the range of user IDs that is mapped
>> between the two user namespaces.
>>
>> After the creation of a new user namespace, this file
>> may be written to exactly once to specify the mapping of
>> user IDs in the new user namespace. (An attempt to
>> write more than once to the file fails with the error
>> EPERM.)
>>
>> The lines written to uid_map must conform to the follow‐
>> ing rules:
>>
>> * The three fields must be valid numbers, and the last
>> field must be greater than 0.
>>
>> * Lines are terminated by newline characters.
>>
>> * The file can contain a maximum of five lines.
>
> A maximum of 5 lines is important to Document but it is a current
> arbitrary limit that may be changed in the future. Right now 5 extents
> are more than enough for any conceivable use case, and fit nicely within
> a single cache line.
>
> It is probably better to say writes that exceed an arbitrary maximum
> length fail with -EINVAL. Currently the arbitrary maximum length is
> five lines.
Okay -- reworded.
>
>> * The values in both field 1 and field 2 of each line
>> must be in ascending numerical order.
>
> The rule is that the extents need to be non-overlapping. Ascending
> numerical order is how that is implemented but that is a misfeature,
> and there has already been one request to fix that. Removing the
> ascending numerical order limitation is on my todo list.
Okay -- I've reworded some text here.
Thanks,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map
[not found] ` <CAKgNAkjs9T-s8SG-EgTT0O-Uj8S98Q_zfnMqnZ1ROrcYqh7Z5w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2012-12-28 21:20 ` Eric W. Biederman
[not found] ` <87vcbldgbj.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2012-12-28 21:20 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w; +Cc: Linux API, Linux Containers
"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
> Hi Eric,
>
> On Thu, Dec 27, 2012 at 5:58 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>>
>>> Hi Eric,
>>>
>>> Thanks for this patch. I have one question and a revised version of the
>>> text that I'd like you to review.
>>>
>>> On Tue, Nov 27, 2012 at 1:46 AM, Eric W. Biederman
>>> <ebiederm@xmission.com> wrote:
>>>>
>>>> Document the user namespace files that report the mapping of uids
>>>> and gids between user namespaces.
>>>>
>>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>>> ---
>>>> man5/proc.5 | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> 1 files changed, 50 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/man5/proc.5 b/man5/proc.5
>>>> index fb70d2b..840480d 100644
>>>> --- a/man5/proc.5
>>>> +++ b/man5/proc.5
>>>> @@ -317,6 +317,31 @@ The files in this directory are readable only by the owner of the process.
>>>> .\" .TP
>>>> .\" .IR /proc/[pid]/io " (since kernel 2.6.20)"
>>>> .TP
>>>> +.IR /proc/[pid]/gid_map " (since kernel 3.6)"
>>>> +This file reports the mapping of gids from the user namespace of the process specified by
>>>> +.IR pid
>>>> +to the user namespace of the process that opened
>>>> +.IR /proc/[pid]/gid_map .
>>>> +
>>>> +Each line specifies a 1 to 1 mapping of a range of contiguous gids from
>>>> +the user namespace of the process specified by
>>>> +.IR pid
>>>> +to the user namespace of the process that opened
>>>> +.IR /proc/[pid]/gid_map.
>>>
>>> I want to check the above point. What do you mean by "the process that
>>> opened uid_map"? Does that mean the process that opened uid_map to do
>>> the one-time write of the UID map? I had assumed that uid_map actually
>>> provided a mapping between the namespace of 'pid' and the 'parent'
>>> namespace, where the parent namespace is the namespace of the process
>>> that created this namespace via clone(CLONE_NEWUSER).
>>
>> I mean the process that opens uid_map for read or write.
>
> Thanks for the confirmation.
>
>> For writing you are correct about the mapping to the parent (but that is
>> not an exception that is a restriction on who can write to the file).
>
> So, by the way, I added this sentence to the page:
>
> In order to write to the /proc/[pid]/uid_map
> (/proc/[pid]/gid_map) file, a process must have the
> CAP_SETUID (CAP_SETGID) capability in the user namespace
> of the process pid.
>
> Is that correct?
Yes.
> But, there appear to be more rules than this governing whether a
> process can write to the file (i.e., various other -EPERM cases). What
> are the rules?
In general you must also have CAP_SETUID (CAP_SETGID) in the parent user
namespace as well. The one exception to that is if you are mapping
your current uid and gid. A rose by any other name will smell as
sweet. In practice this means you must be root to map to uids or gids
other than your own, which preserves the current limits on setuid and
setgid.
Additionally the writer must see the map file with the lower user
namespace being the parent user namespace. Which means you must be
inside the user namespace itself or in the parent user namespace to
write to the user namespace's mapping file.
For /proc/[pid]/projid_map, which will be interesting once xfs
has kuid/kgid support, there are no capability checks because xfs lets
anyone have any projid.
This is one of the few cases where it almost matters to understand
how ns_capable works when you are not in the user namespace in question,
and that goes to what is a parent user namespace. If you would like
some more detail on that please ask.
>> The complete rule for the user namespace of the second value is:
>>
>> - If the user namespace of the opener of the file and the user namespace
>> of the process do not match, the user namespace of the opener of the
>> file is used.
>>
>> - If the user namespace of the opener of the file and the user namespace
>> of the process are the same, the parent user namespace of the process
>> is used for the second value.
>
> Could you give an example of the last case? (What I'm really seeking,
> I think, is clarification of "parent user namespace". Does that mean
> "user namespace of the process that created the user namespace of this
> process"?)
User namespaces form a tree. What you can do in one user namespace is a
subset of what you can do in the parent user namespace.
The parent user namespace is the user namespace of the process that
calls unshare or clone with CLONE_NEWUSER.
The last case is the common case of /proc/self/uid_map. And you see how
your uids map into the user namespace of the creator of your user
namespace.
With the default being just: 0 0 4294967295
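To make the rule for the second value concrete, here is a minimal Python sketch. It models user namespaces as a parent-pointer tree; the namespace names ("init", "ns_a", "ns_b") and the function name are hypothetical, and the fallback for a namespace that has no parent is an assumption chosen to match the default map shown above.

```python
# Hypothetical model: each user namespace points at its parent.
# The names are invented for illustration only.
PARENT = {"init": None, "ns_a": "init", "ns_b": "ns_a"}


def field_two_namespace(opener_ns, target_ns, parent=PARENT):
    """Return the namespace whose IDs appear as the second value in uid_map.

    opener_ns: user namespace of the process that opened the file.
    target_ns: user namespace of the process named by [pid].
    """
    if opener_ns != target_ns:
        # Namespaces differ: the opener's own namespace is used.
        return opener_ns
    # Same namespace (the /proc/self/uid_map case): the parent is used.
    # Assumption: a namespace with no parent is shown relative to itself.
    return parent[target_ns] or target_ns
```

With this model, a process in "ns_a" reading its own map gets IDs expressed in "init", while a process in "ns_b" reading the map of a process in "ns_a" gets IDs expressed in "ns_b".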
>> While very wordy I think the rule makes a lot of intuitive and practical
>> sense. Especially since it is non-trivial to come up with the chain of
>> user namespaces a process is in.
>>
>>>> +Each line contains three numbers. The start of the range of gids in
>>>> +the user namespace of the process specifed by
>>>> +.IR pid.
>>>> +The start of the range of gids in the user namespace of the process that
>>>> +opened
>>>> +.IR /proc/[pid]/gid_map.
>>>> +The number of gids in the range of numbers that is mapped between to two
>>>> +user namespaces.
>>>> +
>>>> +After the creation of a new user namespace this file may be written to
>>>> +exactly once to specify the mapping of gids in the new user namespace.
>>>> +
>>>> +.TP
>>>> .IR /proc/[pid]/limits " (since kernel 2.6.24)"
>>>> This file displays the soft limit, hard limit, and units of measurement
>>>> for each of the process's resource limits (see
>>>> @@ -1169,6 +1194,31 @@ directory are not available if the main thread has already terminated
>>>> (typically by calling
>>>> .BR pthread_exit (3)).
>>>> .TP
>>>> +.IR /proc/[pid]/uid_map " (since kernel 3.6)"
>>>> +This file reports the mapping of uids from the user namespace of the process specified by
>>>> +.IR pid
>>>> +to the user namespace of the process that opened
>>>> +.IR /proc/[pid]/uid_map .
>>>> +
>>>> +Each line specifies a 1 to 1 mapping of a range of contiguous uids from
>>>> +the user namespace of the process specified by
>>>> +.IR pid
>>>> +to the user namespace of the process that opened
>>>> +.IR /proc/[pid]/uid_map.
>>>> +
>>>> +Each line contains three numbers. The start of the range of uids in
>>>> +the user namespace of the process specifed by
>>>> +.IR pid.
>>>> +The start of the range of uids in the user namespace of the process that
>>>> +opened
>>>> +.IR /proc/[pid]/uid_map.
>>>> +The number of uids in the range of numbers that is mapped between to two
>>>> +user namespaces.
>>>> +
>>>> +After the creation of a new user namespace this file may be written to
>>>> +exactly once to specify the mapping of uids in the new user namespace.
>>>> +
>>>> +.TP
>>>> .I /proc/apm
>>>> Advanced power management version and battery information when
>>>> .B CONFIG_APM
>>>
>>> I revised your text quite a bit, and added a piece on the format of
>>> the uid_map files. Could you please read the following and let me know
>>> of errors:
>>>
>>> [[
>>> /proc/[pid]/uid_map, /proc/[pid]/gid_map (since Linux 3.6)
>>> These files expose the mappings for user and group IDs
>>> inside the user namespace for the process pid. The
>>> description here explains the details for uid_map;
>>> gid_map is exactly the same, but each instance of "user
>>> ID" is replaced by "group ID".
>>>
>>> The uid_map file exposes the mapping of user IDs from
>>> the user namespace of the process pid to the user names‐
>>> pace of the process that opened uid_map.
>>>
>>> Each line in the file specifies a 1-to-1 mapping of a
>>> range of contiguous user IDs from the user namespace of
>>> the process pid to the user namespace of the process
>>> that opened uid_map.
>>>
>>> Each line contains three numbers delimited by white
>>> space:
>>>
>>> (1) The start of the range of user IDs in the user
>>> namespace of the process pid.
>>>
>>> (2) The start of the range of user IDs in the user
>>> namespace of the process that opened uid_map.
>>>
>>> (3) The length of the range of user IDs that is mapped
>>> between the two user namespaces.
>>>
>>> After the creation of a new user namespace, this file
>>> may be written to exactly once to specify the mapping of
>>> user IDs in the new user namespace. (An attempt to
>>> write more than once to the file fails with the error
>>> EPERM.)
>>>
>>> The lines written to uid_map must conform to the follow‐
>>> ing rules:
>>>
>>> * The three fields must be valid numbers, and the last
>>> field must be greater than 0.
>>>
>>> * Lines are terminated by newline characters.
>>>
>>> * The file can contain a maximum of five lines.
>>
>> A maximum of 5 lines is important to document but it is a current
>> arbitrary limit that may be changed in the future. Right now 5 extents
>> are more than enough for any conceivable use case, and fit nicely within
>> a single cache line.
>>
>> It is probably better to say writes that exceed an arbitrary maximum
>> length fail with -EINVAL. Currently the arbitrary maximum length is
>> five lines.
>
> Okay -- reworded.
>
>>
>>> * The values in both field 1 and field 2 of each line
>>> must be in ascending numerical order.
>>
>> The rule is that the extents need to be non-overlapping. Ascending
>> numerical order is how that is implemented but that is a misfeature,
>> and there has already been one request to fix that. Removing the
>> ascending numerical order limitation is on my todo list.
>
> Okay -- I've reworded some text here.
Thank you very much for your time and patience in getting a good
description of the user namespace.
Eric
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/4] clone.2: Describe the user namespace
[not found] ` <87sj6rs7zc.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-01-01 9:29 ` Michael Kerrisk (man-pages)
[not found] ` <CAKgNAkgRQXn0-x6CXxvW94eeG19dOAOEx78iNC0+w08uX+Sg1w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-01-01 9:29 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Linux API, Linux Containers
Hi Eric,
On Thu, Dec 27, 2012 at 6:47 PM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>
> There is one other bit that needs to be documented in clone, although
> I am not certain where/how.
>
> The sequences:
>
> unshare(CLONE_NEWPID).
> clone(CLONE_VM)
>
> setns(fd, CLONE_NEWPID).
> clone(CLONE_VM).
>
> Now fail.
Can you define "now" please. Which kernel version?
> Basically the rule is all threads must be in the same pid namespace.
>
> The joy of reviews with good comments that come much later than hoped.
Thanks,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/4] clone.2: Describe the user namespace
[not found] ` <87r4mbv2c9.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-01-01 9:30 ` Michael Kerrisk (man-pages)
[not found] ` <CAKgNAkgPET9jex1DO=1Z3HRQqO_WVD8qmG-UaH1DQB6wDGqO5A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-01-01 9:30 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Linux API, Serge E. Hallyn, Linux Containers
Hi Eric,
On Thu, Dec 27, 2012 at 6:20 PM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>
>> Hi Eric,
>>
>> On Tue, Nov 27, 2012 at 1:46 AM, Eric W. Biederman
>> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>>
>>> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>>> ---
>>> man2/clone.2 | 39 +++++++++++++++++++++++++++++++++++++++
>>> 1 files changed, 39 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/man2/clone.2 b/man2/clone.2
>>> index 0582057..4566677 100644
>>> --- a/man2/clone.2
>>> +++ b/man2/clone.2
>>> @@ -366,6 +366,45 @@ in the same
>>> .BR clone ()
>>> call.
>>> .TP
>>> +.BR CLONE_NEWUSER " (since Linux 3.6)"
>>
>> Why "since Linux 3.6"? As far as I can see, CLONE_NEWUSER first gained
>> some meaning in 2.6.29.
>
> Looking at it where I have said 3.6 that is wrong. I meant 3.5.
Okay.
> I think I made the same mistake in one or two other manpages. Nothing
> was merged in 3.6 unfortunately.
I think the other cases have been fixed by now.
> My intent was these are the semantics of user namespaces since 3.5,
> when my rework/refocusing of them was merged.
>
> Since 3.5 all that has really happened with user namespaces is the
> uid/gid to kuid/kgid conversion, permission checks have been relaxed,
> and a few bugs have been fixed.
>
> 3.8 is huge from a usability standpoint. 3.8 is huge because setns(),
> and unshare() are now complete from a namespace perspective, and because
> enough permission checks have been relaxed in user namespaces that you
> can really start using them.
>
> But semantically from a user namespace perspective nothing really has
> changed in 3.8.
>
[...]
>> I reworked your text somewhat. Could you please review the following:
>>
>> [[
>> CLONE_NEWUSER
>> (This flag first became meaningful for clone() in Linux
>> 2.6.29, but the implementation of user namespaces was
>> only completed in Linux 3.8.)
>
> Long rant about 2.6.29 vs 3.8 above. I think what we need to say is:
>
> (This flag first became meaningful for clone() in Linux
> 2.6.29, the current semantics were merged in
> 3.5, and user namespaces only really became usable in 3.8.)
Yup. I've done something like that now.
>> If CLONE_NEWUSER is set,
>> then create the process in a new user namespace. If
>> this flag is not set, then (as with fork(2)) the process
>> is created in the same user namespace as the calling
>> process.
>>
>> A user namespace provides an isolated environment for
>> security related identifiers, in particular, user IDs,
>> group IDs, keys (see keyctl(2)), and capabilities.
>>
>> When a user namespace is created, it starts out without
>> a mapping of user IDs (group IDs) to the parent user
>> namespace. The desired mapping of user IDs (group IDs)
>> to the parent user namespace may be set by writing into
>> /proc/[pid]/uid_map (/proc/[pid]/gid_map); see proc(5).
>
> /proc/[pid]/projid_map deserves a mention. Not that
> I am a fan of project IDs or that xfs, where they are
> implemented, has been converted yet but....
Would you be able to send a patch documenting this in proc(5)?
>> The first process in a user namespace starts out with a
>> complete set of capabilities with respect to the new
>> user namespace.
>>
>> System calls that return user IDs (group IDs) will
>> return either the user ID (group ID) mapped into the
>> current user namespace if there is a mapping, or the
>> overflow user ID (group ID); the default value for the
>> overflow user ID (group ID) is 65534. See the descrip‐
>> tions of /proc/sys/kernel/overflowuid and /proc/sys/ker‐
>> nel/overflowgid in proc(5).
>>
>> Starting with Linux 3.8, no privileges are needed to
>> create a user namespace, and mount, PID, IPC, net, and
>> UTS namespaces can be created with just the
>> CAP_SYS_ADMIN capability in the caller's user namespace.
>>
>> Over the years, there have been a lot of features that
>> have been added to the Linux kernel that are only avail‐
>> able to privileged users because of their potential to
>> confuse set-user-ID-root applications. In general, it
>> becomes safe to allow the root user in a user namespace
>> to use those features because it is impossible, while in
>> a user namespace, to gain more privilege than the root
>> user of a user namespace has.
>
> I don't have any problems with this bit of text.
>
> It occurs to me that what is going on with capabilities and user
> namespaces needs to be documented better. There was a minor bug with
> them this release cycle and I realized that while the current definition
> makes sense and isn't hard to understand in general, in detail the
> interaction of capabilities and user namespaces is hard to describe.
>
> I think capabilities and user namespaces are the work of a future patch
> however.
Okay. So, below, a new iteration of the text. Could you please check
it over, and note any errors to be fixed or improvements to be made.
Thanks,
Michael
CLONE_NEWUSER
(This flag first became meaningful for clone() in Linux
2.6.23, the current clone() semantics were merged in
Linux 3.5, and the final pieces to make the user names‐
paces completely usable were merged in Linux 3.8.)
If CLONE_NEWUSER is set, then create the process in a
new user namespace. If this flag is not set, then (as
with fork(2)) the process is created in the same user
namespace as the calling process.
A user namespace provides an isolated environment for
security related identifiers, in particular, user IDs,
group IDs, keys (see keyctl(2)), and capabilities.
When a user namespace is created, it starts out without
a mapping of user IDs (group IDs) to the parent user
namespace. The desired mapping of user IDs (group IDs)
to the parent user namespace may be set by writing into
/proc/[pid]/uid_map (/proc/[pid]/gid_map); see proc(5).
The first process in a user namespace starts out with a
complete set of capabilities with respect to the new
user namespace.
System calls that return user IDs (group IDs) will
return either the user ID (group ID) mapped into the
current user namespace if there is a mapping, or the
overflow user ID (group ID); the default value for the
overflow user ID (group ID) is 65534. See the descrip‐
tions of /proc/sys/kernel/overflowuid and /proc/sys/ker‐
nel/overflowgid in proc(5).
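The fallback to the overflow ID can be sketched in a few lines of Python. This is only an illustration: the extent tuples follow the three-number format of uid_map in proc(5), the function name is hypothetical, and the value 65534 is the default quoted above, not a value read from /proc/sys/kernel/overflowuid.

```python
OVERFLOW_ID = 65534  # default quoted in the text; a real system may differ


def id_seen_in_namespace(extents, outer_id, overflow=OVERFLOW_ID):
    """Translate an ID from the parent namespace into the new namespace.

    extents: list of (start_in_namespace, start_in_parent, length) tuples.
    Returns the mapped ID, or the overflow ID when no extent covers outer_id.
    """
    for inside_start, outside_start, length in extents:
        if outside_start <= outer_id < outside_start + length:
            return inside_start + (outer_id - outside_start)
    return overflow
```

For example, with the single extent (0, 1000, 1), parent user ID 1000 is reported as 0 inside the namespace, and any other parent user ID is reported as the overflow ID.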
Use of this flag requires a kernel configured with the
CONFIG_USER_NS option. Before Linux 3.8, use of
CLONE_NEWUSER required that the caller have three capa‐
bilities: CAP_SYS_ADMIN, CAP_SETUID, and CAP_SETGID.
Starting with Linux 3.8, no privileges are needed to
create a user namespace, and mount, PID, IPC, net, and
UTS namespaces can be created with just the
CAP_SYS_ADMIN capability in the caller's user namespace.
Over the years, there have been a lot of features that
have been added to the Linux kernel that are only avail‐
able to privileged users because of their potential to
confuse set-user-ID-root applications. In general, it
becomes safe to allow the root user in a user namespace
to use those features because it is impossible, while in
a user namespace, to gain more privilege than the root
user of a user namespace has.
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 4/4] setns.2: Document the pid, user, and mount namespace support.
[not found] ` <87bodftmv0.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-01-01 9:30 ` Michael Kerrisk (man-pages)
[not found] ` <CAKgNAkjJR02rKOBh98n7HJwXqAwywHY=Ef35t9tW7wOuyo86NQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-01-01 9:30 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Linux API, Linux Containers
Hi Eric,
On Thu, Dec 27, 2012 at 6:40 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>
>> Hi Eric,
>>
>> Some questions below.
>
> A quick note. Getting the permission checks correct has been a little
> more interesting than I would have preferred.
>
> I had to add a nsown_capable(CAP_SYS_ADMIN) check to all of the setns()
> install methods except the user namespace. Not a change in pre 3.8
> behavior but a change to my patch, and possibly a documentation change
> below.
>
>> On Tue, Nov 27, 2012 at 1:48 AM, Eric W. Biederman
>> <ebiederm@xmission.com> wrote:
>>>
>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>> ---
>>> man2/setns.2 | 41 +++++++++++++++++++++++++++++++++--------
>>> 1 files changed, 33 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/man2/setns.2 b/man2/setns.2
>>> index 6aa01e1..63b04dc 100644
>>> --- a/man2/setns.2
>>> +++ b/man2/setns.2
>>> @@ -48,6 +48,18 @@ must refer to a network namespace.
>>> .BR CLONE_NEWUTS
>>> .I fd
>>> must refer to a UTS namespace.
>>> +.TP
>>> +.BR CLONE_NEWPID
>>> +.I fd
>>> +must refer to a PID namespace.
>>> +.TP
>>> +.BR CLONE_NEWUSER
>>> +.I fd
>>> +must refer to a user namespace.
>>> +.TP
>>> +.BR CLONE_NEWNS
>>> +.I fd
>>> +must refer to a mount namespace.
>>> .PP
>>> Specifying
>>> .I nstype
>>> @@ -63,6 +75,25 @@ and wants to ensure that the namespace is of a particular type.
>>> .IR fd
>>> if the file descriptor was opened by another process and, for example,
>>> passed to the caller via a UNIX domain socket.)
>>> +
>>> +The pid namespace is a little different. Reassociating the calling
>>> +thread with a pid namespace only changes the pid namespace that the
>>> +child processes will be created in.
>>> +
>>> +Changing the pid namespace for child processes is only allowed if the
>>> +pid namespace specified by
>>> +.IR fd
>>> +is a child pid namespace of the pid namespace of the current thread.
>>
>> I assume "current thread" above should be "calling thread", right?
>
> What I mean is "current" from a kernel perspective.
>
> It should be just "caller".
Okay. Changed.
> Threads must share a pid namespace so mentioning threads seems wrong.
>
>>> +
>>> +A multi-threaded process may not change user namespace with setns. A
>>> +process may not reassociate the thread with the current user
>>> +namespace.
>>
>> What do you mean by "the current user namespace"?
>
> fd = open("/proc/self/ns/user");
> setns(fd) -> -EINVAL.
>
> So from a userspace perspective I mean "the caller's user namespace".
>
>>> The process reassociating itself with a user namespace
>>> +must have CAP_SYS_ADMIN privileges in the target user namespace.
>>>
>>> +A process may not be reassociated with a new mount namespace if it is
>>> +multi-threaded
>>
>> I tried to verify the preceding two lines from the kernel source, but
>> did not work out where this check is made. Where is it?
>
> kernel/user_namespace.c:userns_install()
> fs/namespace.c:mntns_install()
Thanks.
> A couple of the security checks have been pushed down into a per
> namespace context, because the exact check that makes sense depends on
> the namespace.
>
>>> or it does not possess both CAP_SYS_CHROOT privileges
>>> +and CAP_SYS_ADMIN rights over the target mount namespace.
>>
>> Could you please expand/clarify the preceding two lines. As they
>> stand, I don't really understand them.
>
> Ugh. The text is slightly wrong.
>
> The code is:
> if (!ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) ||
> !nsown_capable(CAP_SYS_CHROOT) ||
> !nsown_capable(CAP_SYS_ADMIN))
> return -EPERM;
>
> Basically you aren't allowed to change your mount namespace into
> a mount namespace that doesn't see you as the all powerful root
> able to mount and unmount filesystems.
>
> You aren't allowed to change your mount namespace unless you possess
> CAP_SYS_CHROOT and CAP_SYS_ADMIN.
Okay -- reworded.
So, I've done some more reworking of the text, which now reads as
follows. Could you please check this (and see my questions below).
CLONE_NEWPID behaves somewhat differently from the other
nstype values: reassociating the calling thread with a
PID namespace only changes the PID namespace that child
processes of the caller will be created in; it does not
change the PID namespace of the caller itself.
I reworked the preceding piece a lot. Is it correct still?
Reassoci‐
ating with a PID namespace is only allowed if the PID
namespace specified by fd is a descendant (child, grand‐
child, etc.)
Is the preceding sentence correct? (You talked only of children in
your original patch, but I believe it's more general than that.)
PID namespace of the PID namespace of the
caller.
A multi-threaded process may not change user namespace
with setns(). A process may not reassociate the thread
with the caller's user namespace.
What does the last sentence above *mean*? I don't understand it.
A process reassociat‐
ing itself with a user namespace must have CAP_SYS_ADMIN
privileges in the target user namespace.
A process may not be reassociated with a new mount names‐
pace if it is multi-threaded. Changing the mount names‐
pace requires that the caller possess both CAP_SYS_CHROOT
and CAP_SYS_ADMIN capabilities.
Re the last sentence: are those capabilities required in (1) the
target namespace, or (2) the source namespace, or (3) both? I suspect
(1), but please confirm.
Thanks,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map
[not found] ` <87vcbldgbj.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-01-01 9:37 ` Michael Kerrisk (man-pages)
[not found] ` <CAKgNAkjf=KS5FnP0L-TPTCjQuTDAMs-N4cadAP89L4Mb3KubzQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-01-01 9:37 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Linux API, Linux Containers
Hi Eric,
On Fri, Dec 28, 2012 at 10:20 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
[...]
>>> For writing you are correct about the mapping to the parent (but that is
>>> not an exception that is a restriction on who can write to the file).
>>
>> So, by the way, I added this sentence to the page:
>>
>> In order to write to the /proc/[pid]/uid_map
>> (/proc/[pid]/gid_map) file, a process must have the
>> CAP_SETUID (CAP_SETGID) capability in the user namespace
>> of the process pid.
>>
>> Is that correct?
>
> Yes.
>
>> But, there appear to be more rules than this governing whether a
>> process can write to the file (i.e., various other -EPERM cases). What
>> are the rules?
>
> In general you must also have CAP_SETUID (CAP_SETGID) in the parent user
> namespace as well. The one exception to that is if you are mapping
> your current uid and gid.
Can you clarify what you mean by "mapping your own UID and GID" please
(i.e., who is "you" in that sentence).
> A rose by any other name will smell as
> sweet. In practice this means you must be root to map to uids or gids
> other than your own, which preserves the current limits on setuid and
> setgid.
>
> Additionally the writer must see the map file with the lower user
> namespace being the parent user namespace. Which means you must be
> inside the user namespace itself or in the parent user namespace to
> write to the user namespace's mapping file.
Okay -- I added some words on this point.
> For /proc/[pid]/projid_map, which will be interesting once xfs
> has kuid/kgid support, there are no capability checks because xfs lets
> anyone have any projid.
>
> This is one of the few cases where it almost matters to understand
> how ns_capable works when you are not in the user namespace in question,
> and that goes to what is a parent user namespace. If you would like
> some more detail on that please ask.
>
>>> The complete rule for the user namespace of the second value is:
>>>
>>> - If the user namespace of the opener of the file and the user namespace
>>> of the process do not match, the user namespace of the opener of the
>>> file is used.
>>>
>>> - If the user namespace of the opener of the file and the user namespace
>>> of the process are the same, the parent user namespace of the process
>>> is used for the second value.
>>
>> Could you give an example of the last case? (What I'm really seeking,
>> I think, is clarification of "parent user namespace". Does that mean
>> "user namespace of the process that created the user namespace of this
>> process"?)
>
> User namespaces form a tree. What you can do in one user namespace is a
> subset of what you can do in the parent user namespace.
>
> The parent user namespace is the user namespace of the process that
> calls unshare or clone with CLONE_NEWUSER.
Thanks.
> The last case is the common case of /proc/self/uid_map. And you see how
> your uids map into the user namespace of the creator of your user
> namespace.
Okay -- got it now.
> With the default being just: 0 0 4294967295
Right.
>>> While very wordy I think the rule makes a lot of intuitive and practical
>>> sense. Especially since it is non-trivial to come up with the chain of
>>> user namespaces a process is in.
Yes, I see what you mean.
[...]
> Thank you very much for your time and patience in getting a good
> description of the user namespace.
Well, we're not done yet, but we're getting there. Below, I've pasted
the current text from proc(5). Could you please take a look, and let
me know of any errors or improvements.
Cheers,
Michael
/proc/[pid]/uid_map, /proc/[pid]/gid_map (since Linux 3.5)
These files expose the mappings for user and group IDs
inside the user namespace for the process pid. The
description here explains the details for uid_map;
gid_map is exactly the same, but each instance of "user
ID" is replaced by "group ID".
The uid_map file exposes the mapping of user IDs from
the user namespace of the process pid to the user names‐
pace of the process that opened uid_map (but see a qual‐
ification to this point below). In other words, pro‐
cesses that are in different user namespaces will poten‐
tially see different values when reading from a particu‐
lar uid_map file, depending on the user ID mappings for
the user namespaces of the reading processes.
Each line in the file specifies a 1-to-1 mapping of a
range of contiguous user IDs between two user namespaces. The
specification in each line takes the form of three num‐
bers delimited by white space. The first two numbers
specify the starting user ID in each user namespace.
The third number specifies the length of the mapped
range. In detail, the fields are interpreted as fol‐
lows:
(1) The start of the range of user IDs in the user
namespace of the process pid.
(2) The start of the range of user IDs to which the user
IDs specified by field one map. How field two is
interpreted depends on whether the process that
opened uid_map and the process pid are in the same
user namespace, as follows:
a) If the two processes are in different user names‐
paces: field two is the start of a range of user
IDs in the user namespace of the process that
opened uid_map.
b) If the two processes are in the same user names‐
pace: field two is the start of the range of user
IDs in the parent user namespace of the process
pid. (The "parent user namespace" is the user
namespace of the process that created a user
namespace via a call to unshare(2) or clone(2)
with the CLONE_NEWUSER flag.) This case enables
the opener of uid_map (the common case here is
opening /proc/self/uid_map) to see the mapping of
user IDs into the user namespace of the process
that created this user namespace.
(3) The length of the range of user IDs that is mapped
between the two user namespaces.
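As an illustration of how the three fields combine, the following Python sketch parses text in the uid_map format and translates one ID from field-one space to field-two space. The function names and the sample map are hypothetical; real contents would come from reading /proc/[pid]/uid_map.

```python
def parse_map(text):
    """Parse uid_map-style text into (field_one, field_two, length) tuples."""
    extents = []
    for line in text.splitlines():
        if line.strip():
            first, second, length = (int(field) for field in line.split())
            extents.append((first, second, length))
    return extents


def map_id(extents, uid):
    """Translate uid through the extents; return None if it is unmapped."""
    for first, second, length in extents:
        if first <= uid < first + length:
            # The offset within the range is preserved (a 1-to-1 mapping).
            return second + (uid - first)
    return None


# Hypothetical two-line map: ID 0 maps to 1000, IDs 1..65534 to 100000...
SAMPLE = "0 1000 1\n1 100000 65534\n"
```

Here `map_id(parse_map(SAMPLE), 5)` falls in the second range at offset 4, so it yields 100004, while an ID outside both ranges yields `None`.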
After the creation of a new user namespace, the uid_map
file may be written to exactly once to specify the map‐
ping of user IDs in the new user namespace. (An attempt
to write more than once to the file fails with the error
EPERM.)
The lines written to uid_map must conform to the follow‐
ing rules:
* The three fields must be valid numbers, and the last
field must be greater than 0.
* Lines are terminated by newline characters.
* There is an (arbitrary) limit on the number of lines
in the file. As at Linux 3.8, the limit is five
lines.
* The range of user IDs specified in each line cannot
overlap with the ranges in any other lines. In the
current implementation (Linux 3.8), this requirement
is satisfied by a simplistic implementation that
imposes the further requirement that the values in
both field 1 and field 2 of successive lines must be
in ascending numerical order.
Writes that violate the above rules fail with the error
EINVAL.
In order for a process to write to the
/proc/[pid]/uid_map (/proc/[pid]/gid_map) file, the fol‐
lowing requirements must be met:
* The process must have the CAP_SETUID (CAP_SETGID)
capability in the user namespace of the process pid.
* The process must have the CAP_SETUID (CAP_SETGID)
capability in the parent user namespace.
* The process must be in either the user namespace of
the process pid or inside the parent user namespace
of the process pid.
==== end ====
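
The write rules listed in the text above (three numeric fields, nonzero length, newline-terminated lines, at most five lines, ascending non-overlapping ranges) could be pre-checked in user space roughly as follows. This is a sketch, not the kernel's parser: check_map_text() and MAX_MAP_LINES are hypothetical names, the 32-bit range check and the rejection of an empty buffer are assumptions, and the kernel may be stricter or more lenient in details such as trailing white space.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_MAP_LINES 5  /* limit stated for Linux 3.8 in the text above */

/*
 * Hypothetical pre-check (illustration only): returns 0 if `text`
 * follows the rules listed above, -EINVAL otherwise.
 */
int check_map_text(const char *text)
{
    uint64_t prev_end1 = 0, prev_end2 = 0;  /* end of previous line's ranges */
    int lines = 0;

    assert(text != NULL);
    while (*text != '\0') {
        const char *nl = strchr(text, '\n');
        unsigned long long f1, f2, len;
        int used = 0;

        if (nl == NULL)
            return -EINVAL;              /* line not newline-terminated */
        if (++lines > MAX_MAP_LINES)
            return -EINVAL;              /* too many lines */
        if (sscanf(text, "%llu %llu %llu%n", &f1, &f2, &len, &used) != 3 ||
            text + used != nl)
            return -EINVAL;              /* not exactly three numbers */
        if (f1 > UINT32_MAX || f2 > UINT32_MAX ||
            len == 0 || len > UINT32_MAX)
            return -EINVAL;              /* assumed 32-bit IDs; length > 0 */
        /* Fields 1 and 2 must ascend without overlapping the previous line. */
        if (lines > 1 && (f1 < prev_end1 || f2 < prev_end2))
            return -EINVAL;
        prev_end1 = f1 + len;
        prev_end2 = f2 + len;
        text = nl + 1;
    }
    /* Assumption: an empty write is treated as invalid. */
    return lines > 0 ? 0 : -EINVAL;
}
```

A caller would run this on the buffer it intends to write to uid_map; passing the check does not guarantee the write succeeds, since the capability requirements are enforced separately by the kernel.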
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/containers
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/4] clone.2: Describe the user namespace
[not found] ` <CAKgNAkgRQXn0-x6CXxvW94eeG19dOAOEx78iNC0+w08uX+Sg1w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-01-01 9:39 ` Eric W. Biederman
[not found] ` <87a9st5jj4.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2013-01-01 9:39 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
Cc: Linux API, Serge E. Hallyn, Linux Containers
"Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
> Hi Eric,
>
> On Thu, Dec 27, 2012 at 6:47 PM, Eric W. Biederman
> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>
>> There is one other bit that needs to be documented in clone, although
>> I am not certain where/how.
>>
>> The sequences:
>>
>> unshare(CLONE_NEWPID).
>> clone(CLONE_VM)
>>
>> setns(fd, CLONE_NEWPID).
>> clone(CLONE_VM).
>>
>> Now fail.
>
> Can you define "now" please. Which kernel version?
3.8
The sequence was impossible in 3.7.
I think that change that made that impossible happened in the 3.8-rc1 to
3.8-rc2 window.
Eric
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/4] clone.2: Describe the user namespace
[not found] ` <CAKgNAkgPET9jex1DO=1Z3HRQqO_WVD8qmG-UaH1DQB6wDGqO5A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-01-01 9:45 ` Eric W. Biederman
0 siblings, 0 replies; 30+ messages in thread
From: Eric W. Biederman @ 2013-01-01 9:45 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w; +Cc: Linux API, Linux Containers
"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
> Hi Eric,
>
> On Thu, Dec 27, 2012 at 6:20 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>>
>>> Hi Eric,
>>>
>>> On Tue, Nov 27, 2012 at 1:46 AM, Eric W. Biederman
>>> <ebiederm@xmission.com> wrote:
>>>>
>>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>>> ---
>>>> man2/clone.2 | 39 +++++++++++++++++++++++++++++++++++++++
>>>> 1 files changed, 39 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/man2/clone.2 b/man2/clone.2
>>>> index 0582057..4566677 100644
>>>> --- a/man2/clone.2
>>>> +++ b/man2/clone.2
>>>> @@ -366,6 +366,45 @@ in the same
>>>> .BR clone ()
>>>> call.
>>>> .TP
>>>> +.BR CLONE_NEWUSER " (since Linux 3.6)"
>>>
>>> Why "since Linux 3.6"? As far as I can see, CLONE_NEWUSER first gained
>>> some meaning in 2.6.29.
>>
>> Looking at it where I have said 3.6 that is wrong. I meant 3.5.
>
> Okay.
>
>> I think I made the same mistake in one or two other manpages. Nothing
>> was merged in 3.6 unfortunately.
>
> I think the other cases have been fixed by now.
>
>> My intent was these are the semantics of user namespaces since 3.5,
>> when my rework/refocusing of them was merged.
>>
>> Since 3.5 all that has really happened with user namespaces is the
>> uid/gid to kuid/kgid conversion, permission checks have been relaxed,
>> and a few bugs have been fixed.
>>
>> 3.8 is huge from a usability standpoint. 3.8 is huge because setns(),
>> and unshare() are now complete from a namespace perspective, and because
>> enough permission checks have been relaxed in user namespaces that you
>> can really start using them.
>>
>> But semantically from a user namespace perspective nothing really has
>> changed in 3.8.
>>
> [...]
>
>>> I reworked your text somewhat. Could you please review the following:
>>>
>>> [[
>>> CLONE_NEWUSER
>>> (This flag first became meaningful for clone() in Linux
>>> 2.6.29, but the implementation of user namespaces was
>>> only completed in Linux 3.8.)
>>
>> Long rant about 2.6.29 vs 3.8 above. I think what we need to say is:
>>
>> (This flag first became meaningful for clone() in Linux
>> 2.6.29, the current semantics were merged in
>> 3.5, and user namespaces only really became usable in 3.8.)
>
> Yup. I've done something like that now.
>
>>> If CLONE_NEWUSER is set,
>>> then create the process in a new user namespace. If
>>> this flag is not set, then (as with fork(2)) the process
>>> is created in the same user namespace as the calling
>>> process.
>>>
>>> A user namespace provides an isolated environment for
>>> security related identifiers, in particular, user IDs,
>>> group IDs, keys (see keyctl(2)), and capabilities.
>>>
>>> When a user namespace is created, it starts out without
>>> a mapping of user IDs (group IDs) to the parent user
>>> namespace. The desired mapping of user IDs (group IDs)
>>> to the parent user namespace may be set by writing into
>>> /proc/[pid]/uid_map (/proc/[pid]/gid_map); see proc(5).
>>
>> /proc/[pid]/projid_map deserves a mention. Not that
>> I am a fan of project IDs or that xfs, where they are
>> implemented, has been converted yet, but....
>
> Would you be able to send a patch documenting this in proc(5)?
Sure. I don't know why I didn't mention projid in my earlier patch.
Same story fewer permission checks. Silly me.
>>> The first process in a user namespace starts out with a
>>> complete set of capabilities with respect to the new
>>> user namespace.
>>>
>>> System calls that return user IDs (group IDs) will
>>> return either the user ID (group ID) mapped into the
>>> current user namespace if there is a mapping, or the
>>> overflow user ID (group ID); the default value for the
>>> overflow user ID (group ID) is 65534. See the descrip‐
>>> tions of /proc/sys/kernel/overflowuid and /proc/sys/ker‐
>>> nel/overflowgid in proc(5).
>>>
>>> Starting with Linux 3.8, no privileges are needed to
>>> create a user namespace, and mount, PID, IPC, net, and
>>> UTS namespaces can be created with just the
>>> CAP_SYS_ADMIN capability in the caller's user namespace.
>>>
>>> Over the years, there have been a lot of features that
>>> have been added to the Linux kernel that are only avail‐
>>> able to privileged users because of their potential to
>>> confuse set-user-ID-root applications. In general, it
>>> becomes safe to allow the root user in a user namespace
>>> to use those features because it is impossible, while in
>>> a user namespace, to gain more privilege than the root
>>> user of a user namespace has.
>>
>> I don't have any problems with this bit of text.
>>
>> It occurs to me that what is going on with capabilities and user
>> namespaces needs to be documented better. There was a minor bug with
>> them this release cycle and I realized that while the current definition
>> makes sense and isn't hard to understand in general, in detail the
>> interaction of capabilities and user namespaces is hard to describe.
>>
>> I think capabilities and user namespaces are the work of a future patch
>> however.
>
> Okay. So, below, a new iteration of the text. Could you please check
> it over, and note any errors to be fixed or improvements to be made.
>
> Thanks,
>
> Michael
>
> CLONE_NEWUSER
> (This flag first became meaningful for clone() in Linux
> 2.6.23, the current clone() semantics were merged in
> Linux 3.5, and the final pieces to make the user names‐
> paces completely usable were merged in Linux 3.8.)
>
> If CLONE_NEWUSER is set, then create the process in a
> new user namespace. If this flag is not set, then (as
> with fork(2)) the process is created in the same user
> namespace as the calling process.
>
> A user namespace provides an isolated environment for
> security related identifiers, in particular, user IDs,
> group IDs, keys (see keyctl(2)), and capabilities.
>
> When a user namespace is created, it starts out without
> a mapping of user IDs (group IDs) to the parent user
> namespace. The desired mapping of user IDs (group IDs)
> to the parent user namespace may be set by writing into
> /proc/[pid]/uid_map (/proc/[pid]/gid_map); see proc(5).
>
> The first process in a user namespace starts out with a
> complete set of capabilities with respect to the new
> user namespace.
>
> System calls that return user IDs (group IDs) will
> return either the user ID (group ID) mapped into the
> current user namespace if there is a mapping, or the
> overflow user ID (group ID); the default value for the
> overflow user ID (group ID) is 65534. See the descrip‐
> tions of /proc/sys/kernel/overflowuid and /proc/sys/ker‐
> nel/overflowgid in proc(5).
>
> Use of this flag requires a kernel configured with the
> CONFIG_USER_NS option. Before Linux 3.8, use of
> CLONE_NEWUSER required that the caller have three capa‐
> bilities: CAP_SYS_ADMIN, CAP_SETUID, and CAP_SETGID.
> Starting with Linux 3.8, no privileges are needed to
> create a user namespace, and mount, PID, IPC, net, and
> UTS namespaces can be created with just the
> CAP_SYS_ADMIN capability in the caller's user namespace.
>
> Over the years, there have been a lot of features that
> have been added to the Linux kernel that are only avail‐
> able to privileged users because of their potential to
> confuse set-user-ID-root applications. In general, it
> becomes safe to allow the root user in a user namespace
> to use those features because it is impossible, while in
> a user namespace, to gain more privilege than the root
> user of a user namespace has.
I don't see anything wrong with that text.
Happy New Year.
Eric
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 4/4] setns.2: Document the pid, user, and mount namespace support.
[not found] ` <CAKgNAkjJR02rKOBh98n7HJwXqAwywHY=Ef35t9tW7wOuyo86NQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-01-01 9:58 ` Eric W. Biederman
[not found] ` <87mwwt2pj8.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2013-01-01 9:58 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w; +Cc: Linux API, Linux Containers
"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
> Hi Eric,
>
> On Thu, Dec 27, 2012 at 6:40 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>>
>>> Hi Eric,
>>>
>>> Some questions below.
>>
>> A quick note. Getting the permission checks correct has been a little
>> more interesting than I would have preferred.
>>
>> I had to add a nsown_capable(CAP_SYS_ADMIN) check to all of the setns()
>> install methods except the user namespace. Not a change in pre 3.8
>> behavior but a change to my patch, and possibly a documentation change
>> below.
>>
>>> On Tue, Nov 27, 2012 at 1:48 AM, Eric W. Biederman
>>> <ebiederm@xmission.com> wrote:
>>>>
>>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>>>> ---
>>>> man2/setns.2 | 41 +++++++++++++++++++++++++++++++++--------
>>>> 1 files changed, 33 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/man2/setns.2 b/man2/setns.2
>>>> index 6aa01e1..63b04dc 100644
>>>> --- a/man2/setns.2
>>>> +++ b/man2/setns.2
>>>> @@ -48,6 +48,18 @@ must refer to a network namespace.
>>>> .BR CLONE_NEWUTS
>>>> .I fd
>>>> must refer to a UTS namespace.
>>>> +.TP
>>>> +.BR CLONE_NEWPID
>>>> +.I fd
>>>> +must refer to a PID namespace.
>>>> +.TP
>>>> +.BR CLONE_NEWUSER
>>>> +.I fd
>>>> +must refer to a user namespace.
>>>> +.TP
>>>> +.BR CLONE_NEWNS
>>>> +.I fd
>>>> +must refer to a mount namespace.
>>>> .PP
>>>> Specifying
>>>> .I nstype
>>>> @@ -63,6 +75,25 @@ and wants to ensure that the namespace is of a particular type.
>>>> .IR fd
>>>> if the file descriptor was opened by another process and, for example,
>>>> passed to the caller via a UNIX domain socket.)
>>>> +
>>>> +The pid namespace is a little different. Reassociating the calling
>>>> +thread with a pid namespace only changes the pid namespace that the
>>>> +child processes will be created in.
>>>> +
>>>> +Changing the pid namespace for child processes is only allowed if the
>>>> +pid namespace specified by
>>>> +.IR fd
>>>> +is a child pid namespace of the pid namespace of the current thread.
>>>
>>> I assume "current thread" above should be "calling thread", right?
>>
>> What I mean is "current" from a kernel perspective.
>>
>> It should be just "caller".
>
> Okay. Changed.
>
>> Threads must share a pid namespace so mentioning threads seems wrong.
>>
>>>> +
>>>> +A multi-threaded process may not change user namespace with setns. A
>>>> +process may not reassociate the thread with the current user
>>>> +namespace.
>>>
>>> What do you mean by "the current user namespace"?
>>
>> fd = open("/proc/self/ns/user");
>> setns(fd) -> -EINVAL.
>>
>> So from a userspace perspective I mean "the callers user namespace".
>>
>>>> The process reassociating itself with a user namespace
>>>> +must have CAP_SYS_ADMIN privileges in the target user namespace.
>>>>
>>>> +A process may not be reassociated with a new mount namespace if it is
>>>> +multi-threaded
>>>
>>> I tried to verify the preceding two lines from the kernel source, but
>>> did not work out where this check is made. Where is it?
>>
>> kernel/user_namespace.c:userns_install()
>> fs/namespace.c:mntns_install()
>
> Thanks.
>
>> A couple of the security checks have been pushed down into a per
>> namespace context, because the exact check that makes sense depends on
>> the namespace.
>>
>>>> or it does not possess both CAP_SYS_CHROOT privileges
>>>> +and CAP_SYS_ADMIN rights over the target mount namespace.
>>>
>>> Could you please expand/clarify the preceding two lines. As they
>>> stand, I don't really understand them.
>>
>> Ugh. The text is slightly wrong.
>>
>> The code is:
>> if (!ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) ||
>> !nsown_capable(CAP_SYS_CHROOT) ||
>> !nsown_capable(CAP_SYS_ADMIN))
>> return -EPERM;
>>
>> Basically you aren't allowed to change your mount namespace into
>> a mount namespace that doesn't see you as the all powerful root
>> able to mount and unmount filesystems.
>>
>> You aren't allowed to change your mount namespace unless you possess
>> CAP_SYS_CHROOT and CAP_SYS_ADMIN.
>
> Okay -- reworded.
>
> So, I've done some more reworking of the text, which now reads as
> follows. Could you please check this (and see my questions below).
>
> CLONE_NEWPID behaves somewhat differently from the other
> nstype values: reassociating the calling thread with a
> PID namespace only changes the PID namespace that child
> processes of the caller will be created in; it does not
> change the PID namespace of the caller itself.
> I reworked the preceding piece a lot. Is it correct still?
>
> Reassoci‐
> ating with a PID namespace is only allowed if the PID
> namespace specified by fd is a descendant (child, grand‐
> child, etc.)
>
> Is the preceding sentence correct? (You talked only of children in
> your original patch, but I believe it's more general than that.)
Yes. That is correct.
> PID namespace of the PID namespace of the
> caller.
>
> A multi-threaded process may not change user namespace
> with setns(). A process may not reassociate the thread
> with the caller's user namespace.
>
> What does the last sentence above *mean*? I don't understand it.
So the set of checks are:
/* Don't allow gaining capabilities by reentering
* the same user namespace.
*/
if (user_ns == current_user_ns())
return -EINVAL;
/* Threaded processes may not enter a different user namespace */
if (atomic_read(&current->mm->mm_users) > 1)
return -EINVAL;
if (!ns_capable(user_ns, CAP_SYS_ADMIN))
return -EPERM;
Rereading it looks like I was going fast and suffered from dropping
important words.
A multi-threaded process may not change its user namespace
with setns().
That is, if you have threads, setns() for a user namespace will fail.
A process may not change the user namespace to the caller's user
namespace via setns. This is important because changing to a
user namespace via setns implies gaining all caps, and you should
not be able to gain all caps over your current user namespace.
Hopefully that clears it up.
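
As a rough illustration of the order of the three checks quoted above, here is a small user-space model. It is a sketch only: userns_setns_result() and its parameters are hypothetical stand-ins for the kernel state those checks consult, and it makes no system calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model (illustration only) of the checks quoted above,
 * applied in the same order. Returns 0 or a negative errno value.
 */
int userns_setns_result(bool target_is_current_ns, int mm_users,
                        bool cap_sys_admin_in_target)
{
    assert(mm_users >= 1);

    /* Re-entering the caller's own user namespace is refused. */
    if (target_is_current_ns)
        return -EINVAL;

    /* Threaded processes may not enter a different user namespace. */
    if (mm_users > 1)
        return -EINVAL;

    /* The caller needs CAP_SYS_ADMIN in the target user namespace. */
    if (!cap_sys_admin_in_target)
        return -EPERM;

    return 0;
}
```

One consequence of the ordering is that a multi-threaded caller without CAP_SYS_ADMIN in the target namespace would see EINVAL rather than EPERM.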
> A process reassociat‐
> ing itself with a user namespace must have CAP_SYS_ADMIN
> privileges in the target user namespace.
>
> A process may not be reassociated with a new mount names‐
> pace if it is multi-threaded. Changing the mount names‐
> pace requires that the caller possess both CAP_SYS_CHROOT
> and CAP_SYS_ADMIN capabilities.
>
> Re the last sentence: are those capabilities required in (1) the
> target namespace, or (2) the source namespace, or (3) both? I suspect
> (1), but please confirm.
CAP_SYS_ADMIN is required in the current user namespace.
CAP_SYS_ADMIN is required over the target mount namespace.
CAP_SYS_CHROOT is required in the current user namespace.
Eric
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map
[not found] ` <CAKgNAkjf=KS5FnP0L-TPTCjQuTDAMs-N4cadAP89L4Mb3KubzQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-01-01 10:12 ` Eric W. Biederman
[not found] ` <87r4m51abp.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Eric W. Biederman @ 2013-01-01 10:12 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w; +Cc: Linux API, Linux Containers
"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
> Hi Eric,
>
> On Fri, Dec 28, 2012 at 10:20 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>
> [...]
>
>>>> For writing you are correct about the mapping to the parent (but that is
>>>> not an exception that is a restriction on who can write to the file).
>>>
>>> So, by the way, I added this sentence to the page:
>>>
>>> In order to write to the /proc/[pid]/uid_map
>>> (/proc/[pid]/gid_map) file, a process must have the
>>> CAP_SETUID (CAP_SETGID) capability in the user namespace
>>> of the process pid.
>>>
>>> Is that correct?
>>
>> Yes.
>>
>>> But, there appear to be more rules than this governing whether a
>>> process can write to the file (i.e., various other -EPERM cases). What
>>> are the rules?
>>
>> In general you must also have CAP_SETUID (CAP_SETGID) in the parent user
>> namespace as well. The one exception to that is if you are mapping
>> your current uid and gid.
>
> Can you clarify what you mean by "mapping your own UID and GID" please
> (i.e., who is "you" in that sentence).
At the time of clone() or unshare() that creates a new user namespace,
the kuid and the kgid of the process do not change.
setuid and setgid fail before any mappings are set up.
Therefore the caller is allowed to map any single uid to the uid of the
caller in the parent user namespace. Likewise the caller is allowed to
map any single gid to the gid of the caller in the parent user
namespace.
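
To make the single-ID case concrete, here is a minimal sketch that builds the one-line map such a caller might write. format_single_id_map() is a hypothetical helper invented for illustration; only the "inside outside length" line format comes from the proc(5) text under discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (illustration only): format a map with a single
 * entry, "<inside_id> <outside_id> 1\n", into buf.
 * Returns the number of characters written (excluding the terminating
 * null byte), or -1 if the buffer is too small.
 */
int format_single_id_map(char *buf, size_t size,
                         unsigned int inside_id, unsigned int outside_id)
{
    assert(buf != NULL && size > 0);

    int n = snprintf(buf, size, "%u %u 1\n", inside_id, outside_id);

    return (n > 0 && (size_t)n < size) ? n : -1;
}
```

For example, a process whose uid in the parent user namespace is 1000 could produce "0 1000 1" this way and write it to /proc/self/uid_map after creating the namespace, so that it appears as uid 0 inside. The outside uid would need to be recorded before the new namespace is created, since no mapping exists yet afterwards.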
>> A rose by any other name will smell as
>> sweet. In practice this means you must be root to map to uid or gids
>> other than your own, which preserves the current limits on setuid and
>> setgid.
>>
>> Additionally the writer must see the map file with the lower user
>> namespace being the parent user namespace. Which means you must be
>> inside the user namespace itself or in the parent user namespace to
>> write to the user namespaces mapping file.
>
> Okay -- I added some words on this point.
>
>> For /proc/[pid]/projid_map which will be interesting once xfs
>> has kuid/kgid support there are no capability checks because xfs lets
>> anyone have any projid.
>>
>> This is one of the few cases where it almost matters to understand
>> how ns_capable works when you are not in the user namespace in question,
>> and that goes to what is a parent user namespace. If you would like
>> some more detail on that please ask.
>>
>>>> The complete rule for the user namespace of the second value is:
>>>>
>>>> - If the user namespace of the opener of the file and the user namespace
>>>> of the process do not match, the user namespace of the opener of the
>>>> file is used.
>>>>
>>>> - If the user namespace of the opener of the file and the user namespace
>>>> of the process are the same, the parent user namespace of the process
>>>> is used for the second value.
>>>
>>> Could you give an example of the last case? (What I'm really seeking,
>>> I think, is clarification of "parent user namespace". Does that mean
>>> "user namespace of the process that created the user namespace of this
>>> process"?)
>>
>> User namespaces form a tree. What you can do in one user namespace is a
>> subset of what you can do in the parent user namespace.
>>
>> The parent user namespace is the user namespace of the process that
>> calls unshare or clone with CLONE_NEWUSER.
>
> Thanks.
>
>> The last case is the common case of /proc/self/uid_map. And you see how
>> your uids map into the user namespace of the creator of your user
>> namespace.
>
> Okay -- got it now.
>
>> With the default being just: 0 0 4294967295
>
> Right.
>
>>>> While very wordy I think the rule makes a lot of intuitive and practical
>>>> sense. Especially since it is non-trivial to come up with the chain of
>>>> user namespaces a process is in.
>
> Yes, I see what you mean.
>
> [...]
>
>> Thank you very much for your time and patience in getting a good
>> description of the user namespace.
>
> Well, we're not done yet, but we're getting there. Below, I've pasted
> the current text from proc(5). Could you please take a look, and let
> me know of any errors or improvements.
>
> Cheers,
>
> Michael
>
> /proc/[pid]/uid_map, /proc/[pid]/gid_map (since Linux 3.5)
> These files expose the mappings for user and group IDs
> inside the user namespace for the process pid. The
> description here explains the details for uid_map;
> gid_map is exactly the same, but each instance of "user
> ID" is replaced by "group ID".
>
> The uid_map file exposes the mapping of user IDs from
> the user namespace of the process pid to the user names‐
> pace of the process that opened uid_map (but see a qual‐
> ification to this point below). In other words, pro‐
> cesses that are in different user namespaces will poten‐
> tially see different values when reading from a particu‐
> lar uid_map file, depending on the user ID mappings for
> the user namespaces of the reading processes.
>
> Each line in the file specifies a 1-to-1 mapping of a
> range of contiguous user IDs between two user namespaces. The
> specification in each line takes the form of three num‐
> bers delimited by white space. The first two numbers
> specify the starting user ID in each user namespace.
> The third number specifies the length of the mapped
> range. In detail, the fields are interpreted as fol‐
> lows:
>
> (1) The start of the range of user IDs in the user
> namespace of the process pid.
>
> (2) The start of the range of user IDs to which the user
> IDs specified by field one map. How field two is
> interpreted depends on whether the process that
> opened uid_map and the process pid are in the same
> user namespace, as follows:
>
> a) If the two processes are in different user names‐
> paces: field two is the start of a range of user
> IDs in the user namespace of the process that
> opened uid_map.
>
> b) If the two processes are in the same user names‐
> pace: field two is the start of the range of user
> IDs in the parent user namespace of the process
> pid. (The "parent user namespace" is the user
> namespace of the process that created a user
> namespace via a call to unshare(2) or clone(2)
> with the CLONE_NEWUSER flag.) This case enables
> the opener of uid_map (the common case here is
> opening /proc/self/uid_map) to see the mapping of
> user IDs into the user namespace of the process
> that created this user namespace.
>
> (3) The length of the range of user IDs that is mapped
> between the two user namespaces.
>
> After the creation of a new user namespace, the uid_map
> file may be written to exactly once to specify the map‐
> ping of user IDs in the new user namespace. (An attempt
> to write more than once to the file fails with the error
> EPERM.)
>
> The lines written to uid_map must conform to the follow‐
> ing rules:
>
> * The three fields must be valid numbers, and the last
> field must be greater than 0.
>
> * Lines are terminated by newline characters.
>
> * There is an (arbitrary) limit on the number of lines
> in the file. As at Linux 3.8, the limit is five
> lines.
>
> * The range of user IDs specified in each line cannot
> overlap with the ranges in any other lines. In the
> current implementation (Linux 3.8), this requirement
> is satisfied by a simplistic implementation that
> imposes the further requirement that the values in
> both field 1 and field 2 of successive lines must be
> in ascending numerical order.
>
> Writes that violate the above rules fail with the error
> EINVAL.
>
> In order for a process to write to the
> /proc/[pid]/uid_map (/proc/[pid]/gid_map) file, the fol‐
> lowing requirements must be met:
>
> * The process must have the CAP_SETUID (CAP_SETGID)
> capability in the user namespace of the process pid.
>
> * The process must have the CAP_SETUID (CAP_SETGID)
> capability in the parent user namespace.
>
> * The process must be in either the user namespace of
> the process pid or inside the parent user namespace
> of the process pid.
That sounds right.
In addition /proc/[pid]/projid_map was added in 3.7, and obeys the same
rules except that there are no capabilities required to set the mapping.
I suspect it is probably easier to add a quick mention of projid_map
instead of repeating all of the text, but I could be wrong. In any event
I will leave off with projid_map until we get the uid_map and gid_map
text solid.
Eric
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/4] clone.2: Describe the user namespace
[not found] ` <87a9st5jj4.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-01-07 8:33 ` Michael Kerrisk (man-pages)
[not found] ` <CAKgNAkggMKib5v4ND9UR1jH=CrK-viM5hhfmc0Rw=mP5GbenSg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-01-07 8:33 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Linux API, Serge E. Hallyn, Linux Containers
Hi Eric,
On Tue, Jan 1, 2013 at 10:39 AM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>
>> Hi Eric,
>>
>> On Thu, Dec 27, 2012 at 6:47 PM, Eric W. Biederman
>> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>>
>>> There is one other bit that needs to be documented in clone, although
>>> I am not certain where/how.
>>>
>>> The sequences:
>>>
>>> unshare(CLONE_NEWPID).
>>> clone(CLONE_VM)
>>>
>>> setns(fd, CLONE_NEWPID).
>>> clone(CLONE_VM).
>>>
>>> Now fail.
>>
>> Can you define "now" please. Which kernel version?
>
> 3.8
>
> The sequence was impossible in 3.7.
>
> I think that change that made that impossible happened in the 3.8-rc1 to
> 3.8-rc2 window.
Adding something along these lines to the man page would be fine, but
we need some text to explain *why* these sequences fail. Could you
send me a sentence or two about that?
Thanks,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 2/4] clone.2: Describe the user namespace
[not found] ` <CAKgNAkggMKib5v4ND9UR1jH=CrK-viM5hhfmc0Rw=mP5GbenSg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-01-07 8:59 ` Eric W. Biederman
0 siblings, 0 replies; 30+ messages in thread
From: Eric W. Biederman @ 2013-01-07 8:59 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
Cc: Linux API, Serge E. Hallyn, Linux Containers
"Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
> Hi Eric,
>
> On Tue, Jan 1, 2013 at 10:39 AM, Eric W. Biederman
> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>>
>>> Hi Eric,
>>>
>>> On Thu, Dec 27, 2012 at 6:47 PM, Eric W. Biederman
>>> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>>>
>>>> There is one other bit that needs to be documented in clone, although
>>>> I am not certain where/how.
>>>>
>>>> The sequences:
>>>>
>>>> unshare(CLONE_NEWPID).
>>>> clone(CLONE_VM)
>>>>
>>>> setns(fd, CLONE_NEWPID).
>>>> clone(CLONE_VM).
>>>>
>>>> Now fail.
>>>
>>> Can you define "now" please. Which kernel version?
>>
>> 3.8
>>
>> The sequence was impossible in 3.7.
>>
>> I think that change that made that impossible happened in the 3.8-rc1 to
^^^^^^^^^ illegal 3.8-rc1 made the sequence possible.
>> 3.8-rc2 window.
>
> Adding something along these lines to the man page would be fine, but
> we need some text to explain *why* these sequences fail. Could you
> send me a sentence or two about that?
The basic principle is that every thread in a process must be in the same pid
namespace. As unshare(CLONE_NEWPID) and setns(fd, CLONE_NEWPID) only
change the pid namespace for created children, creating a child process
that is a thread would put that thread in a different pid namespace.
Creating a multithreaded application and then setns(fd, CLONE_NEWPID) or
clone(CLONE_NEWPID) was outlawed because it was too bizarre and no one
cared. Oleg noticed you could create the threads afterwards and get
into a bizarre state that no one wanted to support.
Eric
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 4/4] setns.2: Document the pid, user, and mount namespace support.
[not found] ` <87mwwt2pj8.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-01-07 9:51 ` Michael Kerrisk (man-pages)
[not found] ` <CAKgNAkggEOV0dXVzr4Zf3n_-it5SXfvjJ1ooYxiVNWaYzQgRLg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-01-07 9:51 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Linux API, Linux Containers
On Tue, Jan 1, 2013 at 10:58 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>
[...]
>> PID namespace of the PID namespace of the
>> caller.
>>
>> A multi-threaded process may not change user namespace
>> with setns(). A process may not reassociate the thread
>> with the caller's user namespace.
>>
>> What does the last sentence above *mean*? I don't understand it.
>
> So the set of checks are:
>
> /* Don't allow gaining capabilities by reentering
> * the same user namespace.
> */
> if (user_ns == current_user_ns())
> return -EINVAL;
>
> /* Threaded processes may not enter a different user namespace */
> if (atomic_read(&current->mm->mm_users) > 1)
> return -EINVAL;
>
> if (!ns_capable(user_ns, CAP_SYS_ADMIN))
> return -EPERM;
>
> Rereading it, it looks like I was going fast and dropped some
> important words.
>
> A multi-threaded process may not change its user namespace
> with setns().
>
> That is, if you have threads, setns for a user namespace will fail.
>
>
> A process may not change the user namespace to the caller's user
> namespace via setns. This is important because changing to a
> user namespace via setns implies gaining all caps, and you should
> not be able to gain all caps over your current user namespace.
>
> Hopefully that clears it up.
Well, I worded it rather differently, but I hope I got it right. See below.
>> A process reassociating
>> itself with a user namespace must have CAP_SYS_ADMIN
>> privileges in the target user namespace.
>>
>> A process may not be reassociated with a new mount namespace
>> if it is multi-threaded. Changing the mount namespace
>> requires that the caller possess both CAP_SYS_CHROOT
>> and CAP_SYS_ADMIN capabilities.
>>
>> Re the last sentence: are those capabilities required in (1) the
>> target namespace, or (2) the source namespace, or (3) both? I suspect
>> (1), but please confirm.
>
> CAP_SYS_ADMIN is required in the current user namespace.
> CAP_SYS_ADMIN is required over the target mount namespace.
>
> CAP_SYS_CHROOT is required in the current user namespace.
Okay. See below.
So, let's take one more pass. How does the following look:
A multi-threaded process may not change user namespace with
setns(). It is not permitted to use setns() to reenter the
caller's current user namespace. This prevents a caller that
has dropped capabilities from regaining those capabilities via
a call to setns(). A process reassociating itself with a user
namespace must have CAP_SYS_ADMIN privileges in the target user
namespace.
A process may not be reassociated with a new mount namespace if
it is multi-threaded. Changing the mount namespace requires
that the caller possess both CAP_SYS_CHROOT and CAP_SYS_ADMIN
capabilities in its own user namespace and CAP_SYS_ADMIN in the
target mount namespace.
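
For what it is worth, the two paragraphs above can be restated as a small
user-space model. This is a sketch only: the struct and function names are
hypothetical, namespaces are reduced to integer ids, and capabilities to
booleans. The order of the user namespace checks follows the kernel snippet
quoted earlier in this mail; the error codes used for the mount namespace
case are my assumption, not something stated in the text.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the caller's state; not a kernel structure. */
struct model_caller {
    bool multithreaded;
    int  user_ns;                   /* id of the caller's current user namespace */
    bool sys_admin_in_own_userns;   /* CAP_SYS_ADMIN in its own user namespace */
    bool sys_chroot_in_own_userns;  /* CAP_SYS_CHROOT in its own user namespace */
};

/* Models setns() into the user namespace with id target_ns. */
static int model_setns_user(const struct model_caller *c, int target_ns,
                            bool sys_admin_in_target)
{
    assert(c != NULL);
    if (target_ns == c->user_ns)
        return -EINVAL;   /* reentering the current user namespace */
    if (c->multithreaded)
        return -EINVAL;   /* threaded processes may not switch */
    if (!sys_admin_in_target)
        return -EPERM;    /* needs CAP_SYS_ADMIN in the target */
    return 0;
}

/* Models setns() into a mount namespace (error codes are assumed). */
static int model_setns_mnt(const struct model_caller *c,
                           bool sys_admin_over_target)
{
    assert(c != NULL);
    if (!sys_admin_over_target ||
        !c->sys_chroot_in_own_userns ||
        !c->sys_admin_in_own_userns)
        return -EPERM;
    if (c->multithreaded)
        return -EINVAL;
    return 0;
}
```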
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 4/4] setns.2: Document the pid, user, and mount namespace support.
[not found] ` <CAKgNAkggEOV0dXVzr4Zf3n_-it5SXfvjJ1ooYxiVNWaYzQgRLg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2013-01-07 23:58 ` Eric W. Biederman
0 siblings, 0 replies; 30+ messages in thread
From: Eric W. Biederman @ 2013-01-07 23:58 UTC (permalink / raw)
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
Cc: Linux API, Serge E. Hallyn, Linux Containers
"Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
> Okay. See below.
>
> So, let's take one more pass. How does the following look:
>
> A multi-threaded process may not change user namespace with
> setns(). It is not permitted to use setns() to reenter the
> caller's current user namespace. This prevents a caller that
> has dropped capabilities from regaining those capabilities via
> a call to setns(). A process reassociating itself with a user
> namespace must have CAP_SYS_ADMIN privileges in the target user
> namespace.
>
> A process may not be reassociated with a new mount namespace if
> it is multi-threaded. Changing the mount namespace requires
> that the caller possess both CAP_SYS_CHROOT and CAP_SYS_ADMIN
> capabilities in its own user namespace and CAP_SYS_ADMIN in the
> target mount namespace.
That wording looks correct.
Eric
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH 1/4] proc.5: Document /proc/[pid]/uid_map and /proc/[pid]/gid_map
[not found] ` <87r4m51abp.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2013-01-14 8:59 ` Michael Kerrisk (man-pages)
0 siblings, 0 replies; 30+ messages in thread
From: Michael Kerrisk (man-pages) @ 2013-01-14 8:59 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Linux API, Linux Containers
Hi Eric,
On Tue, Jan 1, 2013 at 11:12 AM, Eric W. Biederman
<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>
>> Hi Eric,
>>
>> On Fri, Dec 28, 2012 at 10:20 PM, Eric W. Biederman
>> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>>> "Michael Kerrisk (man-pages)" <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>>
>> [...]
>>
>>>>> For writing you are correct about the mapping to the parent (but that is
>>>>> not an exception that is a restriction on who can write to the file).
>>>>
>>>> So, by the way, I added this sentence to the page:
>>>>
>>>> In order to write to the /proc/[pid]/uid_map
>>>> (/proc/[pid]/gid_map) file, a process must have the
>>>> CAP_SETUID (CAP_SETGID) capability in the user namespace
>>>> of the process pid.
>>>>
>>>> Is that correct?
>>>
>>> Yes.
>>>
>>>> But, there appear to be more rules than this governing whether a
>>>> process can write to the file (i.e., various other -EPERM cases). What
>>>> are the rules?
>>>
>>> In general you must also have CAP_SETUID (CAP_SETGID) in the parent user
>>> namespace as well. The one exception to that is if you are mapping
>>> your current uid and gid.
>>
>> Can you clarify what you mean by "mapping your own UID and GID" please
>> (i.e., who is "you" in that sentence).
>
> At the time of clone() or unshare() that creates a new user namespace,
> the kuid and the kgid of the process do not change.
>
> setuid and setgid fail before any mappings are set up.
>
> Therefore the caller is allowed to map any single uid to the uid of the
> caller in the parent user namespace. Likewise the caller is allowed to
> map any single gid to the gid of the caller in the parent user
> namespace.
So, then is the following text now correct and complete:
In order for a process to write to the /proc/[pid]/uid_map
(/proc/[pid]/gid_map) file, the following requirements must be
met:
* The process must have the CAP_SETUID (CAP_SETGID) capability
in the user namespace of the process pid.
* The process must have the CAP_SETUID (CAP_SETGID) capability
in the parent user namespace. There is an exception to this
requirement: a process writing to uid_map (gid_map) is
allowed to map any single UID (GID) to the file system UID
(GID) of the caller in the parent user namespace.
* The process must be in either the user namespace of the
process pid or inside the parent user namespace of the
process pid.
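
As an illustration of the single-ID exception in the second item, here is a
sketch of how a process might build the one line it writes to uid_map or
gid_map. The helper names are hypothetical; the line format assumed here is
the usual "ID-inside-ns ID-outside-ns length" triple, with a length of 1 for
a single-ID mapping.

```c
#include <assert.h>
#include <stdio.h>

/* Format one map line: "<id inside ns> <id outside ns> <count>\n".
 * Returns the number of bytes written (excluding the NUL terminator),
 * or -1 if the buffer is too small. */
static int format_id_map_line(char *buf, size_t len, unsigned inside,
                              unsigned outside, unsigned count)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len, "%u %u %u\n", inside, outside, count);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* The single-ID case: map exactly one ID, so the count is 1. */
static int format_single_id_map(char *buf, size_t len, unsigned inside,
                                unsigned outside)
{
    return format_id_map_line(buf, len, inside, outside, 1);
}
```

For example, format_single_id_map(buf, sizeof(buf), 0, 1000) produces
"0 1000 1\n", which maps UID 0 inside the new namespace to UID 1000 in the
parent namespace. The caller would then write that buffer to
/proc/[pid]/uid_map.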
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
^ permalink raw reply [flat|nested] 30+ messages in thread