On 2015-10-17 11:58, Tobias Markus wrote:
> Add capability CAP_SYS_USER_NS.
> Tasks having CAP_SYS_USER_NS are allowed to create a new user namespace
> when calling clone or unshare with CLONE_NEWUSER.
>
> Rationale:
>
> Linux 3.8 saw the introduction of unprivileged user namespaces,
> allowing unprivileged users (without CAP_SYS_ADMIN) to be a "fake" root
> inside a separate user namespace. Before that, any namespace creation
> required CAP_SYS_ADMIN (or, in practice, the user had to be root).
> Unfortunately, there have been some security-relevant bugs in the
> meantime. Because of the fairly complex nature of user namespaces, it is
> reasonable to say that future vulnerabilities cannot be excluded. Some
> distributions even wholly disable user namespaces because of this.
>
> Both options, user namespaces with and without CAP_SYS_ADMIN, can be
> said to represent the extreme ends of the spectrum. In practice, there is
> no reason for every process to have the ability to create user
> namespaces. Indeed, only very few and specialized programs require user
> namespaces. This seems to be a perfect fit for the (file) capability
> system: Privileged users could manually allow only a certain executable
> to be able to create user namespaces by setting a certain capability,
> I'd suggest the name CAP_SYS_USER_NS. Executables completely unrelated
> to user namespaces should not and cannot create them.
>
> The capability should only be required in the "root" user namespace (the
> user namespace with level 0) though, to allow nested user namespaces to
> work as intended. If a user namespace has a level greater than 0, the
> original process must have had CAP_SYS_USER_NS, so it is "trusted" anyway.
>
> One question remains though: Does this break userspace executables that
> expect being able to create user namespaces without privilege? Since
> creating user namespaces without CAP_SYS_ADMIN was not possible before
> Linux 3.8, programs should already expect a potential EPERM upon calling
> clone. Since creating a user namespace without CAP_SYS_USER_NS would
> also cause EPERM, we should be on the safe side.

Potentially stupid counter proposal:
Make it CAP_SYS_NS, have it grant access to all namespace types for
users without root/CAP_SYS_ADMIN, and teach the stuff that's using userns
just to get to mount/pid/net/ipc namespaces to use those directly when it
doesn't really need to think it's running as root. While this would
still add a new capability (which is arguably not a good thing), the
resulting capability would be significantly more useful for many of the
use cases.

Potentially more flame resistant counter proposal:
Write a simple LSM to allow selective usage of namespaces (IIRC, working
LSM stacking is in mainline now). While this is more complicated than
just adding a capability, it is also a lot more resilient from a
long-term perspective.
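
For reference, the EPERM argument in the quoted rationale can be sketched
from the userspace side roughly as below. This is only an illustration,
not part of any patch: the names userns_result, classify_userns_errno()
and try_new_userns() are made up for this example, and CAP_SYS_USER_NS
itself is only the proposed capability, not something that exists today.
The point is that a program written this way behaves the same whether
EPERM comes from a pre-3.8 kernel, a distribution that disables
unprivileged user namespaces, or a missing capability.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for unshare() and CLONE_NEWUSER in <sched.h> */
#endif
#include <errno.h>
#include <sched.h>

/* Hypothetical result type for this sketch only. */
enum userns_result {
    USERNS_OK,          /* a new user namespace was created */
    USERNS_DENIED,      /* EPERM: caller lacks the required privilege */
    USERNS_OTHER_ERROR  /* anything else (EINVAL, ENOMEM, ...) */
};

/* Map an errno value (0 meaning success) to a result the caller can act on. */
enum userns_result classify_userns_errno(int err)
{
    if (err == 0)
        return USERNS_OK;
    if (err == EPERM)
        return USERNS_DENIED;
    return USERNS_OTHER_ERROR;
}

/*
 * Try to move the calling process into a new user namespace.
 * On USERNS_OK the caller is already inside the new namespace;
 * on USERNS_DENIED it should fall back to running without one.
 */
enum userns_result try_new_userns(void)
{
    int err = (unshare(CLONE_NEWUSER) == 0) ? 0 : errno;
    return classify_userns_errno(err);
}
```

A caller would invoke try_new_userns() once at startup and take its
non-namespaced code path on USERNS_DENIED, which is the same fallback it
already needs for kernels and distributions that refuse the request today.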