From: "Michael Kerrisk (man-pages)"
Subject: Re: For review: user_namespace(7) man page
Date: Mon, 01 Sep 2014 19:31:43 +0200
Message-ID: <5404AD7F.4070004@gmail.com>
References: <53F5310A.5080503@gmail.com> <87d2bhfxvc.fsf@x220.int.ebiederm.org>
In-Reply-To: <87d2bhfxvc.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
To: "Eric W. Biederman"
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, lkml, "linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org", containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org, Andy Lutomirski, richard.weinberger-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, "Serge E. Hallyn"
List-Id: linux-man@vger.kernel.org

On 08/30/2014 11:53 PM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" writes:
>
>> Hello Eric et al.,
>>
>> For various reasons, my work on the namespaces man pages
>> fell off the table a while back. Nevertheless, the pages have
>> been close to completion for a while now, and I recently restarted,
>> in an effort to finish them. As you also noted to me f2f, there have
>> recently been some small namespace changes that may affect
>> the content of the pages. Therefore, I'll take the opportunity to
>> send the namespace-related pages out for further (final?) review.
>>
>> So, here, I start with the user_namespaces(7) page, which is shown
>> in rendered form below, with source attached to this mail. I'll
>> send various other pages in follow-on mails.
>>
>> Review comments/suggestions for improvements / bug fixes welcome.
>>
>> Cheers,
>>
>> Michael
>>
>> ==
>>
>> NAME
>>        user_namespaces - overview of Linux user namespaces
>>
>> DESCRIPTION
>>        For an overview of namespaces, see namespaces(7).
>>
>>        User namespaces isolate security-related identifiers and
>>        attributes, in particular, user IDs and group IDs (see
>>        credentials(7)), the root directory, keys (see keyctl(2)), and
>>        capabilities (see capabilities(7)).  A process's user and group
>>        IDs can be different inside and outside a user namespace.  In
>>        particular, a process can have a normal unprivileged user ID
>>        outside a user namespace while at the same time having a user ID
>>        of 0 inside the namespace; in other words, the process has full
>>        privileges for operations inside the user namespace, but is
>>        unprivileged for operations outside the namespace.
>>
>>    Nested namespaces, namespace membership
>>        User namespaces can be nested; that is, each user namespace,
>>        except the initial ("root") namespace, has a parent user
>>        namespace, and can have zero or more child user namespaces.  The
>>        parent user namespace is the user namespace of the process that
>>        creates the user namespace via a call to unshare(2) or clone(2)
>>        with the CLONE_NEWUSER flag.
>>
>>        The kernel imposes (since version 3.11) a limit of 32 nested
>>        levels of user namespaces.  Calls to unshare(2) or clone(2) that
>>        would cause this limit to be exceeded fail with the error EUSERS.
>>
>>        Each process is a member of exactly one user namespace.  A
>>        process created via fork(2) or clone(2) without the CLONE_NEWUSER
>>        flag is a member of the same user namespace as its parent.  A
>                                                                     ^ single-threaded
>
> Because of chroot and other things, multi-threaded processes are not
> allowed to join a user namespace.  For the documentation, just saying
> single-threaded sounds like enough here.

Thanks. Fixed.

>>        process can join another user namespace with setns(2) if it has
>>        the CAP_SYS_ADMIN capability in that namespace; upon doing so,
>>        it gains a full set of capabilities in that namespace.
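
The nesting rule quoted above (each user namespace has one parent, and creation fails with EUSERS beyond 32 nested levels) can be illustrated with a small toy model. This is only a sketch and not part of the man page: the `UserNamespace` class and `create_user_namespace()` helper are hypothetical names, nothing here calls the kernel, and the way levels are counted (initial namespace at level 0, 32 levels allowed below it) is an assumption that may differ by one from the kernel's own accounting.

```python
import errno

# Assumption: the limit quoted above, counted as levels below the initial
# namespace (level 0). The kernel's exact counting may differ by one.
MAX_NESTED_LEVELS = 32


class UserNamespace:
    """Toy model of a user namespace; hypothetical, not a kernel interface."""

    def __init__(self, parent=None):
        self.parent = parent
        # The initial ("root") namespace has no parent and sits at level 0.
        self.level = 0 if parent is None else parent.level + 1
        self.children = []
        if parent is not None:
            parent.children.append(self)


def create_user_namespace(current):
    """Model unshare(2)/clone(2) with CLONE_NEWUSER called from `current`."""
    if current.level + 1 > MAX_NESTED_LEVELS:
        # Mirrors the documented failure mode: the call fails with EUSERS.
        raise OSError(errno.EUSERS, "too many nested user namespaces")
    return UserNamespace(parent=current)


if __name__ == "__main__":
    ns = UserNamespace()  # the initial namespace
    try:
        while True:
            ns = create_user_namespace(ns)
    except OSError as exc:
        print("deepest level reached:", ns.level)
        print("error:", errno.errorcode[exc.errno])
```

The parent link is fixed at creation time in this model, matching the statement that the parent is the user namespace of the creating process.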
>>
>>        A call to clone(2) or unshare(2) with the CLONE_NEWUSER flag
>>        makes the new child process (for clone(2)) or the caller (for
>>        unshare(2)) a member of the new user namespace created by the
>>        call.
>>
>>    Capabilities
>>        The child process created by clone(2) with the CLONE_NEWUSER flag
>>        starts out with a complete set of capabilities in the new user
>>        namespace.  Likewise, a process that creates a new user namespace
>>        using unshare(2) or joins an existing user namespace using
>>        setns(2) gains a full set of capabilities in that namespace.  On
>>        the other hand, that process has no capabilities in the parent
>>        (in the case of clone(2)) or previous (in the case of unshare(2)
>>        and setns(2)) user namespace, even if the new namespace is
>>        created or joined by the root user (i.e., a process with user ID
>>        0 in the root namespace).
>>
>>        Note that a call to execve(2) will cause a process to lose any
>>        capabilities that it has, unless it has a user ID of 0 within the
>>        namespace.  See the discussion of user and group ID mappings,
>>        below.
>>
>>        A call to clone(2), unshare(2), or setns(2) using the
>>        CLONE_NEWUSER flag sets the "securebits" flags (see
>>        capabilities(7)) to their default values (all flags disabled) in
>>        the child (for clone(2)) or caller (for unshare(2), or setns(2)).
>>        Note that because the caller no longer has capabilities in its
>>        original user namespace after a call to setns(2), it is not
>>        possible for a process to reset its "securebits" flags while
>>        retaining its user namespace membership by using a pair of
>>        setns(2) calls to move to another user namespace and then return
>>        to its original user namespace.
>>
>>        Having a capability inside a user namespace permits a process to
>>        perform operations (that require privilege) only on resources
>>        governed by that namespace.
>>        The rules for determining whether or not a process has a
>>        capability in a particular user namespace are as follows:
>>
>>        1. A process has a capability inside a user namespace if it is a
>>           member of that namespace and it has the capability in its
>>           effective capability set.  A process can gain capabilities in
>>           its effective capability set in various ways.  For example, it
>>           may execute a set-user-ID program or an executable with
>>           associated file capabilities.  In addition, a process may gain
>>           capabilities via the effect of clone(2), unshare(2), or
>>           setns(2), as already described.
>>
>>        2. If a process has a capability in a user namespace, then it has
>>           that capability in all child (and further removed descendant)
>>           namespaces as well.
>>
>>        3. When a user namespace is created, the kernel records the
>>           effective user ID of the creating process as being the "owner"
>>           of the namespace.  A process that resides in the parent of the
>>           user namespace and whose effective user ID matches the owner
>>           of the namespace has all capabilities in the namespace.  By
>>           virtue of the previous rule, this means that the process has
>>           all capabilities in all further removed descendant user
>>           namespaces as well.
>>
>>    Interaction of user namespaces and other types of namespaces
>>        Starting in Linux 3.8, unprivileged processes can create user
>>        namespaces, and mount, PID, IPC, network, and UTS namespaces can
>>        be created with just the CAP_SYS_ADMIN capability in the caller's
>>        user namespace.
>>
>>        If CLONE_NEWUSER is specified along with other CLONE_NEW* flags
>>        in a single clone(2) or unshare(2) call, the user namespace is
>>        guaranteed to be created first, giving the child (clone(2)) or
>>        caller (unshare(2)) privileges over the remaining namespaces
>>        created by the call.  Thus, it is possible for an unprivileged
>>        caller to specify this combination of flags.
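
The three numbered capability rules quoted earlier can be read as a walk from the target user namespace up toward the initial one. The sketch below is a toy model of that reading, not the kernel's implementation and not text from the man page; `UserNS`, `Process`, and `has_capability()` are hypothetical names, and capabilities are represented as plain strings for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class UserNS:
    """Toy user namespace (hypothetical model, compared by identity)."""
    name: str
    parent: Optional["UserNS"] = None
    owner_euid: Optional[int] = None  # effective UID of the creator (rule 3)


@dataclass
class Process:
    """Toy process: its user namespace, effective UID, effective caps."""
    userns: UserNS
    euid: int
    effective_caps: frozenset = frozenset()


def has_capability(proc: Process, cap: str, target: UserNS) -> bool:
    """Apply rules 1-3 by walking from `target` up through its ancestors."""
    ns = target
    while ns is not None:
        if ns is proc.userns:
            # Rule 1 (target is the process's own namespace) or rule 2
            # (target is a descendant): the effective set decides.
            return cap in proc.effective_caps
        if ns.parent is proc.userns and proc.euid == ns.owner_euid:
            # Rule 3: the process lives in the parent of `ns` and owns it;
            # combined with rule 2 this also covers descendants of `ns`.
            return True
        ns = ns.parent
    # The process's namespace is not an ancestor of (or equal to) the target.
    return False


if __name__ == "__main__":
    init = UserNS("init")
    child = UserNS("child", parent=init, owner_euid=1000)
    creator = Process(userns=init, euid=1000)  # unprivileged in init
    print(has_capability(creator, "CAP_SYS_ADMIN", child))
    print(has_capability(creator, "CAP_SYS_ADMIN", init))
```

In this model a process never gains anything in a namespace that is not its own or a descendant of its own, which matches the statement that capabilities apply only to resources governed by the namespace in question.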
>>
>>        When a new IPC, mount, network, PID, or UTS namespace is created
>>        via clone(2) or unshare(2), the kernel records the user namespace
>>        of the creating process against the new namespace.  (This
>>        association can't be changed.)  When a process in the new
>>        namespace subsequently performs privileged operations that
>>        operate on global resources isolated by the namespace, the
>>        permission checks are performed according to the process's
>>        capabilities in the user namespace that the kernel associated
>>        with the new namespace.
>
> Restrictions on mount namespaces.
>
> - A mount namespace has an owner user namespace.  A mount namespace whose
>   owner user namespace is different from the owner user namespace of
>   its parent mount namespace is considered a less privileged mount
>   namespace.
>
> - When creating a less privileged mount namespace, shared mounts are
>   reduced to slave mounts.  This ensures that mappings performed in less
>   privileged mount namespaces will not propagate to more privileged
>   mount namespaces.
>
> - Mounts that come as a single unit from a more privileged mount
>   namespace are locked together and may not be separated in a less
>   privileged mount namespace.
>
> - The mount flags readonly, nodev, nosuid, noexec, and the mount atime
>   settings, when propagated from a more privileged to a less privileged
>   mount namespace, become locked and may not be changed in the less
>   privileged mount namespace.
>
> - (As of 3.18-rc1 (in today's Al Viro's vfs.git#for-next tree)) A file or
>   directory that is a mount point in one namespace but is not a mount
>   point in another namespace may be renamed, unlinked, or rmdired in
>   the mount namespace in which it is not a mount point, if the
>   ordinary permission checks pass.
>
>   Previously, attempting to rmdir, unlink, or rename a file or directory
>   that was a mount point in another mount namespace would result in
>   -EBUSY.
>   This behavior had technical problems of enforcement (nfs)
>   and resulted in a nice denial-of-service attack against more
>   privileged users.  (Aka preventing individual files from being updated
>   by bind mounting on top of them.)

I need some help here. What is your intention for the above text? Do you
mean I should add it pretty much as is under a subheading "Restrictions
on mount namespaces"?

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/