From: "Michael Kerrisk (man-pages)"
Subject: Re: For review: user_namespace(7) man page
Date: Thu, 11 Sep 2014 07:46:46 -0700
Message-ID: <5411B5D6.9010201@gmail.com>
References: <53F5310A.5080503@gmail.com> <540F0810.7030408@gmail.com> <87ppf4n5ib.fsf@x220.int.ebiederm.org>
To: "Eric W. Biederman"
Cc: mtk.manpages, Andy Lutomirski, lkml, linux-man, Linux Containers, richard -rw- weinberger, "Serge E. Hallyn"
List-Id: linux-man@vger.kernel.org

Hi Eric,

On 09/09/2014 09:05 AM, Eric W. Biederman wrote:
> "Michael Kerrisk (man-pages)" writes:
>
>> Hi Andy, and Eric,
>>
>> On 09/01/2014 01:57 PM, Andy Lutomirski wrote:
>>> On Wed, Aug 20, 2014 at 4:36 PM, Michael Kerrisk (man-pages)
>>> wrote:
>>>> Hello Eric et al.,
>>>>
>>>> For various reasons, my work on the namespaces man pages
>>>> fell off the table a while back. Nevertheless, the pages have
>>>> been close to completion for a while now, and I recently restarted,
>>>> in an effort to finish them. As you also noted to me f2f, there have
>>>> recently been some small namespace changes that may affect
>>>> the content of the pages. Therefore, I'll take the opportunity to
>>>> send the namespace-related pages out for further (final?) review.
>>>>
>>>> So, here, I start with the user_namespaces(7) page, which is shown
>>>> in rendered form below, with source attached to this mail. I'll
>>>> send various other pages in follow-on mails.
>>>>
>>>> Review comments/suggestions for improvements / bug fixes welcome.
>>>>
>>>> Cheers,
>>>>
>>>> Michael
>>>>
>>>> ==
>>>>
>>>> NAME
>>>>        user_namespaces - overview of Linux user namespaces
>>>>
>>>> DESCRIPTION
>>>>        For an overview of namespaces, see namespaces(7).
>>>>
>>>>        User namespaces isolate security-related identifiers and
>>>>        attributes, in particular, user IDs and group IDs (see
>>>>        credentials(7)), the root directory, keys (see keyctl(2)),
>>>>        and capabilities
>>>
>>> Putting "root directory" here is odd -- that's really part of a
>>> different namespace. But user namespaces sort of isolate the other
>>> namespaces from each other.
>>
>> I'm trying to remember the details here. I think this piece originally
>> came after a discussion with Eric, but I am not sure. Eric?
>
> Probably.
>
> I am not certain what the best way to say it is, but we do need to
> document that an unprivileged user that creates a user namespace can
> now call chroot.
>
> We may also want to discuss the specific restrictions on chroot.
>
> The text about chroot at least gives people a strong hint that the
> chroot rules are affected by user namespaces.
>
> The restrictions that we have settled on to avoid chroot being a problem
> are: the creator of a user namespace must not be chrooted in their
> current mount namespace, and the creator of the user namespace must not
> be threaded.
>
> Andy, can you check me on this? It looks like unshare is currently buggy
> in that it will allow a threaded application to create a user namespace.

So, somewhere we should have some text such as:

[[
An unprivileged user who creates a user namespace can call chroot(2)
within that namespace, subject to the restriction that the creator
of a user namespace must not be chrooted in their current mount
namespace, and the creator of the user namespace must not be
threaded.
]]

Right?

>>> Also, ugh, keys. How did keyctl(2) ever make it through any kind of review?
>>>
>>>>        (see capabilities(7)).
>>>>        A process's user and group IDs can be different inside and
>>>>        outside a user namespace. In particular, a process can have a
>>>>        normal unprivileged user ID outside a user namespace while at
>>>>        the same time having a user ID of 0 inside the namespace; in
>>>>        other words, the process has full privileges for operations
>>>>        inside the user namespace, but is unprivileged for operations
>>>>        outside the namespace.
>>>>
>>>>    Nested namespaces, namespace membership
>>>>        User namespaces can be nested; that is, each user namespace,
>>>>        except the initial ("root") namespace, has a parent user
>>>>        namespace, and can have zero or more child user namespaces.
>>>>        The parent user namespace is the user namespace of the process
>>>>        that creates the user namespace via a call to unshare(2) or
>>>>        clone(2) with the CLONE_NEWUSER flag.
>>>>
>>>>        The kernel imposes (since version 3.11) a limit of 32 nested
>>>>        levels of user namespaces. Calls to unshare(2) or clone(2)
>>>>        that would cause this limit to be exceeded fail with the error
>>>>        EUSERS.
>>>>
>>>>        Each process is a member of exactly one user namespace. A
>>>>        process created via fork(2) or clone(2) without the
>>>>        CLONE_NEWUSER flag is a member of the same user namespace as
>>>>        its parent. A process can join another user namespace with
>>>>        setns(2) if it has the CAP_SYS_ADMIN capability in that
>>>>        namespace; upon doing so, it gains a full set of capabilities
>>>>        in that namespace.
>>>>
>>>>        A call to clone(2) or unshare(2) with the CLONE_NEWUSER flag
>>>>        makes the new child process (for clone(2)) or the caller (for
>>>>        unshare(2)) a member of the new user namespace created by the
>>>>        call.
>>>>
>>>>    Capabilities
>>>>        The child process created by clone(2) with the CLONE_NEWUSER
>>>>        flag starts out with a complete set of capabilities in the new
>>>>        user namespace.
>>>>        Likewise, a process that creates a new user namespace using
>>>>        unshare(2) or joins an existing user namespace using setns(2)
>>>>        gains a full set of capabilities in that namespace. On the
>>>>        other hand, that process has no capabilities in the parent (in
>>>>        the case of clone(2)) or previous (in the case of unshare(2)
>>>>        and setns(2)) user namespace, even if the new namespace is
>>>>        created or joined by the root user (i.e., a process with user
>>>>        ID 0 in the root namespace).
>>>>
>>>>        Note that a call to execve(2) will cause a process to lose any
>>>>        capabilities that it has, unless it has a user ID of 0 within
>>>>        the namespace.
>>>
>>> Or unless file capabilities have a non-empty inheritable mask.
>>>
>>> It may be worth mentioning that execve in a user namespace works
>>> exactly like execve outside a userns.
>>
>>
>> I've reworded that para to say:
>>
>>        Note that a call to execve(2) will cause a process's
>>        capabilities to be recalculated in the usual way (see
>>        capabilities(7)), so that usually, unless it has a user ID of 0
>>        within the namespace or the executable file has a nonempty
>>        inheritable capabilities mask, it will lose all capabilities.
>>        See the discussion of user and group ID mappings, below.
>>
>> Okay?
>
> That seems reasonable to me.
>
>>>>            $ cat /proc/$$/uid_map
>>>>                     0          0 4294967295
>>>>
>>>>        This mapping tells us that the range starting at user ID 0 in
>>>>        this namespace maps to a range starting at 0 in the
>>>>        (nonexistent) parent namespace, and the length of the range is
>>>>        the largest 32-bit unsigned integer.
>>>>
>>>>    Defining user and group ID mappings: writing to uid_map and gid_map
>>>>        After the creation of a new user namespace, the uid_map file of
>>>>        one of the processes in the namespace may be written to once to
>>>>        define the mapping of user IDs in the new user namespace.
>>>>        An attempt to write more than once to a uid_map file in a user
>>>>        namespace fails with the error EPERM. Similar rules apply for
>>>>        gid_map files.
>>>>
>>>>        The lines written to uid_map (gid_map) must conform to the
>>>>        following rules:
>>>>
>>>>        *  The three fields must be valid numbers, and the last field
>>>>           must be greater than 0.
>>>>
>>>>        *  Lines are terminated by newline characters.
>>>>
>>>>        *  There is an (arbitrary) limit on the number of lines in the
>>>>           file. As at Linux 3.8, the limit is five lines. In addition,
>>>>           the number of bytes written to the file must be less than
>>>>           the system page size, and the write must be performed at the
>>>>           start of the file (i.e., lseek(2) and pwrite(2) can't be
>>>>           used to write to nonzero offsets in the file).
>>>>
>>>>        *  The range of user IDs (group IDs) specified in each line
>>>>           cannot overlap with the ranges in any other lines. In the
>>>>           initial implementation (Linux 3.8), this requirement was
>>>>           satisfied by a simplistic implementation that imposed the
>>>>           further requirement that the values in both field 1 and
>>>>           field 2 of successive lines must be in ascending numerical
>>>>           order, which prevented some otherwise valid maps from being
>>>>           created. Linux 3.9 and later fix this limitation, allowing
>>>>           any valid set of nonoverlapping maps.
>>>>
>>>>        *  At least one line must be written to the file.
>>>>
>>>>        Writes that violate the above rules fail with the error EINVAL.
>>>>
>>>>        In order for a process to write to the /proc/[pid]/uid_map
>>>>        (/proc/[pid]/gid_map) file, all of the following requirements
>>>>        must be met:
>>>>
>>>>        1. The writing process must have the CAP_SETUID (CAP_SETGID)
>>>>           capability in the user namespace of the process pid.
>>>
>>> This is checked for the opening process (and I don't actually remember
>>> whether it's checked for the writing process).
>>
>> Eric, can you comment?
>
> We have to check for the opening processes, and that change was made
> after I implemented my interface. Pieces of the code appear to also
> examine the writing process and verify everything applies to it as well.
>
> I goofed when I designed the interface originally and had not realized
> what a classic design error it can be to not restrict by the opening
> process.

So, I still need some help here. Should the sentence above just read:

           1. The *opening* process must have the CAP_SETUID (CAP_SETGID)
              capability in the user namespace of the process pid.

or must something also be said about the writing process? (If so, I'd
appreciate a completely formed sentence or two that I can just drop into
the man page.)

Cheers,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
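For reference alongside the uid_map/gid_map rules discussed above, here is a minimal C sketch (not code from this thread, the man page, or the kernel) of a user-space pre-check for the listed format rules, plus a helper that opens and writes the map from the same process in a single write() at offset 0. The function names validate_id_map() and write_id_map() are hypothetical. The five-line limit is the Linux 3.8 value quoted in the draft, and the pre-check covers only the format rules, not the permission requirements, so the kernel remains the final arbiter.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_MAP_LINES 5   /* line limit quoted above (as at Linux 3.8) */

/* Parse one unsigned decimal field, skipping blanks but not newlines.
   IDs and counts are 32-bit values, so larger numbers are rejected. */
static int parse_field(const char **pp, unsigned long long *out)
{
    const char *p = *pp;
    unsigned long long v = 0;

    while (*p == ' ' || *p == '\t')
        p++;
    if (*p < '0' || *p > '9')
        return -1;
    while (*p >= '0' && *p <= '9') {
        v = v * 10 + (unsigned long long) (*p - '0');
        if (v > 0xFFFFFFFFULL)
            return -1;
        p++;
    }
    *out = v;
    *pp = p;
    return 0;
}

/* Nonzero if [a, a + alen) and [b, b + blen) share at least one ID. */
static int overlaps(unsigned long long a, unsigned long long alen,
                    unsigned long long b, unsigned long long blen)
{
    return a < b + blen && b < a + alen;
}

/* User-space pre-check of a uid_map/gid_map body against the format rules
   quoted above. Returns 0 if acceptable, else -1 with errno = EINVAL. */
int validate_id_map(const char *map)
{
    unsigned long long seen[MAX_MAP_LINES][3];
    const char *p = map;
    int n = 0;

    if (strlen(map) >= (size_t) sysconf(_SC_PAGESIZE))
        goto invalid;                   /* must be smaller than a page */
    while (*p != '\0') {
        unsigned long long f[3];        /* inside ID, outside ID, count */

        if (n == MAX_MAP_LINES)
            goto invalid;
        for (int i = 0; i < 3; i++)
            if (parse_field(&p, &f[i]) == -1)
                goto invalid;
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p != '\n' || f[2] == 0)    /* newline-terminated, count > 0 */
            goto invalid;
        p++;
        /* Reject overlap with earlier lines on either side of the mapping
           (assumption: both ranges are checked, as Linux 3.9+ does). */
        for (int i = 0; i < n; i++)
            if (overlaps(seen[i][0], seen[i][2], f[0], f[2]) ||
                overlaps(seen[i][1], seen[i][2], f[1], f[2]))
                goto invalid;
        memcpy(seen[n++], f, sizeof(f));
    }
    if (n == 0)
        goto invalid;                   /* at least one line is required */
    return 0;

invalid:
    errno = EINVAL;
    return -1;
}

/* Open /proc/<pid>/<file> ("uid_map" or "gid_map") and write the whole map
   with one write() at offset 0. Opener and writer are the same process, so
   whichever one the kernel checks, this process needs CAP_SETUID
   (CAP_SETGID) in pid's user namespace. */
int write_id_map(pid_t pid, const char *file, const char *map)
{
    char path[64];
    size_t len = strlen(map);
    ssize_t n;
    int fd, saved;

    if (validate_id_map(map) == -1)
        return -1;
    snprintf(path, sizeof(path), "/proc/%ld/%s", (long) pid, file);
    fd = open(path, O_WRONLY);
    if (fd == -1)
        return -1;
    n = write(fd, map, len);            /* a single write, never split */
    saved = (n == -1) ? errno : EIO;    /* treat a short write as an error */
    close(fd);
    if (n != (ssize_t) len) {
        errno = saved;
        return -1;
    }
    return 0;
}
```

For example, a parent that has just created a child with clone(2) and CLONE_NEWUSER might call write_id_map(child_pid, "uid_map", "0 1000 1\n") to map user ID 0 in the child's namespace to user ID 1000 (an illustrative value) in the parent's namespace. Doing the open and the write in one process sidesteps the opener-versus-writer question for this common pattern, though it does not answer it for the man page.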