From: "Serge E. Hallyn" <serge@hallyn.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: akpm@osdl.org, segooon@gmail.com, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, containers@lists.linux-foundation.org,
dhowells@redhat.com, rdunlap@xenotime.net
Subject: Re: missing [PATCH 01/15]
Date: Sat, 3 Sep 2011 01:09:33 +0000
Message-ID: <20110903010933.GA14126@hallyn.com>
In-Reply-To: <m11uvyld2d.fsf_-_@fess.ebiederm.org>
Quoting Eric W. Biederman (ebiederm@xmission.com):
>
>
> Was this blank email supposed to be patch 01/15?
Nope, that was <grr> a git-send-email misfire. Sorry about that. The
patch #1 did go through, here: https://lkml.org/lkml/2011/9/2/314
I'm appending it here as well for easier feedback.
thanks,
-serge
Subject: [PATCH 01/15] add Documentation/namespaces/user_namespace.txt (v3)
Quoting David Howells (dhowells@redhat.com):
> Randy Dunlap <rdunlap@xenotime.net> wrote:
>
> > > +Any task in or resource belonging to the initial user namespace will, to this
> > > +new task, appear to belong to UID and GID -1 - which is usually known as
> >
> > that extra hyphen is confusing. how about:
> >
> > to UID and GID -1, which is
>
> 'which are'.
>
> David
This will hold some info about the design. Currently it contains
future todos, issues and questions.
Changelog:
jul 26: incorporate feedback from David Howells.
jul 29: incorporate feedback from Randy Dunlap.
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
---
Documentation/namespaces/user_namespace.txt | 107 +++++++++++++++++++++++++++
1 files changed, 107 insertions(+), 0 deletions(-)
create mode 100644 Documentation/namespaces/user_namespace.txt
diff --git a/Documentation/namespaces/user_namespace.txt b/Documentation/namespaces/user_namespace.txt
new file mode 100644
index 0000000..b0bc480
--- /dev/null
+++ b/Documentation/namespaces/user_namespace.txt
@@ -0,0 +1,107 @@
+Description
+===========
+
+Traditionally, each task is owned by a user ID (UID) and belongs to one or more
+groups (GID). Both are simple numeric IDs, though userspace usually translates
+them to names. The user namespace allows tasks to have different views of the
+UIDs and GIDs associated with tasks and other resources. (See 'UID mapping'
+below for more.)
+
+User namespaces form a simple hierarchy. The system starts with all
+tasks belonging to the initial user namespace. A task creates a new user
+namespace by passing the CLONE_NEWUSER flag to clone(2). This requires the
+creating task to have the CAP_SETUID, CAP_SETGID, and CAP_CHOWN capabilities,
+but it does not need to be running as root. The clone(2) call will result in a
+new task which to itself appears to be running as UID and GID 0, but to its
+creator seems to have the creator's credentials.
+
+To this new task, any resource belonging to the initial user namespace will
+appear to belong to user and group 'nobody', which are UID and GID -1.
+Permission to open such files will be granted according to world access
+permissions. UID comparisons and group membership checks will return false,
+and privilege will be denied.
+
+When a task belonging to (for example) UID 500 in the initial user namespace
+creates a new user namespace, even though the new task will see itself as
+belonging to UID 0, any task in the initial user namespace will see it as
+belonging to UID 500. Therefore, UID 500 in the initial user namespace will be
+able to kill the new task. Files created by the new user will (eventually) be
+seen by tasks in its own user namespace as belonging to UID 0, but to tasks in
+the initial user namespace as belonging to UID 500.
+
+Note that this UID mapping for the VFS is not yet implemented, though the
+lkml and containers mailing list archives will show several previous
+prototypes. In the end, those got hung up waiting for the concept of targeted
+capabilities to be developed, which, thanks to the insight of Eric Biederman,
+it finally was.
+
+Relationship between the User namespace and other namespaces
+============================================================
+
+Other namespaces, such as UTS and network, are owned by a user namespace. When
+such a namespace is created, it is assigned to the user namespace of the task
+by which it was created. Therefore, attempts to exercise privilege over
+resources in, for instance, a particular network namespace, can be properly
+validated by checking whether the caller has the needed privilege (e.g.
+CAP_NET_ADMIN) targeted to the user namespace which owns the network namespace.
+This is done using the ns_capable() function.
+
+As an example, if a new task is cloned with a private user namespace but
+no private network namespace, then the task's network namespace is owned
+by the parent user namespace. The new task has no privilege over the
+parent user namespace, so it will not be able to create or configure
+network devices. If, instead, the task were cloned with both private
+user and network namespaces, then the private network namespace is owned
+by the private user namespace, and so root in the new user namespace
+will have privilege targeted to the network namespace. It will be able
+to create and configure network devices.
+
+UID Mapping
+===========
+The current plan (see 'flexible UID mapping' at
+https://wiki.ubuntu.com/UserNamespace) is:
+
+The UID/GID stored on disk will be that in the init_user_ns. Most likely
+UID/GID in other namespaces will be stored in xattrs. But Eric was advocating
+(a few years ago) leaving the details up to filesystems while providing a lib/
+stock implementation. See the thread around here:
+http://www.mail-archive.com/devel@openvz.org/msg09331.html
+
+
+Working notes
+=============
+Capability checks for actions related to syslog must be against the
+init_user_ns until syslog is containerized.
+
+The same is true for reboot and power, control groups, devices, and time.
+
+Perf actions (kernel/events/core.c for instance) will always be constrained to
+init_user_ns.
+
+Q:
+Is accounting considered properly containerized with respect to pidns? (It
+appears to be.) If so, then we can change the capable() check in
+kernel/acct.c to 'ns_capable(current_pid_ns()->user_ns, CAP_SYS_PACCT)'.
+
+Q:
+For things like nice and scheduler affinity, we could allow root in a container
+to control those, and leave only cgroups to constrain the container. I'm not
+sure whether that is right, or whether it violates admin expectations.
+
+I deferred some of commoncap.c. I'm punting on the xattr stuff, as those hooks
+take dentries, not inodes.
+
+For drivers/tty/tty_io.c and drivers/tty/vt/vt.c, we'll want to (for some of
+them) target the capability checks at the user_ns owning the tty. That will
+have to wait until we get userns owning files straightened out.
+
+We need to figure out how to label devices. Should we just toss a user_ns
+right into struct device?
+
+capable(CAP_MAC_ADMIN) checks are always to be against init_user_ns, unless
+LSMs are some day containerized, which has a near-zero chance of happening.
+
+inode_owner_or_capable() should probably take an optional ns and cap parameter.
+If cap is 0, then CAP_FOWNER is checked. If ns is NULL, we derive the ns from
+inode. But if ns is provided, then callers who need to derive
+inode_userns(inode) anyway can save a few cycles.
--
1.7.5.4
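
For readers following the "Relationship between the User namespace and other
namespaces" section of the patch, here is a rough userspace model of a
targeted capability check. It is only a sketch, not kernel code. Every name in
it (model_user_ns, model_net_ns, model_task, model_ns_capable, model_example,
MODEL_CAP_NET_ADMIN) is made up for illustration, and the rule it implements
is an assumption drawn from the text above: a task's capabilities count
against a target user namespace only if the task lives in that namespace or in
one of its ancestors. Any special treatment of a namespace's creator is left
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a capability bit; not the kernel's value. */
#define MODEL_CAP_NET_ADMIN (1UL << 0)

/* A user namespace; parent is NULL for the initial user namespace. */
struct model_user_ns {
	const struct model_user_ns *parent;
};

/* A network namespace is owned by the user namespace of its creator. */
struct model_net_ns {
	const struct model_user_ns *owner;
};

/* A task lives in one user namespace and holds a set of capabilities. */
struct model_task {
	const struct model_user_ns *user_ns;
	unsigned long caps;
};

/*
 * Walk from the target namespace up through its ancestors. If the task's
 * own namespace is found on that path, the task's capability set decides.
 * Otherwise the task has no privilege targeted at that namespace.
 */
bool model_ns_capable(const struct model_task *task,
		      const struct model_user_ns *target,
		      unsigned long cap)
{
	const struct model_user_ns *ns;

	for (ns = target; ns != NULL; ns = ns->parent) {
		if (ns == task->user_ns)
			return (task->caps & cap) != 0;
	}
	return false;
}

/* The two clone scenarios from the documentation, expressed as checks. */
void model_example(void)
{
	struct model_user_ns init_ns = { NULL };
	struct model_user_ns child_ns = { &init_ns };
	struct model_net_ns shared_net = { &init_ns };   /* no private netns */
	struct model_net_ns private_net = { &child_ns }; /* private netns */
	struct model_task child_root = { &child_ns, MODEL_CAP_NET_ADMIN };
	struct model_task host_root = { &init_ns, MODEL_CAP_NET_ADMIN };

	/* Root in the child cannot configure the parent's network namespace. */
	assert(!model_ns_capable(&child_root, shared_net.owner,
				 MODEL_CAP_NET_ADMIN));
	/* Root in the child can configure its own private network namespace. */
	assert(model_ns_capable(&child_root, private_net.owner,
				MODEL_CAP_NET_ADMIN));
	/* Privilege in an ancestor namespace extends to the child's netns. */
	assert(model_ns_capable(&host_root, private_net.owner,
				MODEL_CAP_NET_ADMIN));
}
```

Calling model_example() from a small main() exercises all three cases. The
walk starts at the target and goes upward, so privilege never flows from a
child namespace to its parent.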