From mboxrd@z Thu Jan  1 00:00:00 1970
From: ebiederm@xmission.com (Eric W. Biederman)
Subject: Re: [REVIEW][PATCH 0/43] Completing the user namespace
Date: Tue, 10 Apr 2012 18:01:16 -0700
Message-ID: <m162d7kroj.fsf@fess.ebiederm.org>
References: <m11unyn70b.fsf@fess.ebiederm.org> <4F84838B.8000408@mit.edu>
	<m14nsrxn6v.fsf@fess.ebiederm.org>
	<CAObL_7F2oHtOoDkvNM1io=dovKENNTxS4EDPkr4ns9AEdFqwaQ@mail.gmail.com>
	<m14nsrtady.fsf@fess.ebiederm.org>
	<CAObL_7GFkNfQggDNZ+MicdeTe7duJY7cJJELHcb2-vxHHJkS_g@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Markus Gutschke <markus@chromium.org>,
	Will Drewry <wad@chromium.org>,
	Cyrill Gorcunov <gorcunov@openvz.org>,
	linux-security-module@vger.kernel.org,
	Al Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
To: Andrew Lutomirski <luto@mit.edu>
Return-path: <linux-fsdevel-owner@vger.kernel.org>
Received: from out07.mta.xmission.com ([166.70.13.237]:38286 "EHLO
	out02.mta.xmission.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org
	with ESMTP id S1759654Ab2DKA53 convert rfc822-to-8bit (ORCPT
	<rfc822;linux-fsdevel@vger.kernel.org>);
	Tue, 10 Apr 2012 20:57:29 -0400
In-Reply-To: <CAObL_7GFkNfQggDNZ+MicdeTe7duJY7cJJELHcb2-vxHHJkS_g@mail.gmail.com>
	(Andrew Lutomirski's message of "Tue, 10 Apr 2012 16:56:54 -0700")
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

Andrew Lutomirski <luto@mit.edu> writes:

> On Tue, Apr 10, 2012 at 4:50 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> Andrew Lutomirski <luto@mit.edu> writes:
>>
>>> On Tue, Apr 10, 2012 at 2:59 PM, Eric W. Biederman
>>> <ebiederm@xmission.com> wrote:
>>>> Andy Lutomirski <luto@MIT.EDU> writes:
>>>>
>>>> My understanding of no_new_privs is that current_cred() including
>>>> the user, the user namespace and the security label will never change,
>>>> with the goal of making the security analysis simple.
>>>
>>> They can change but only if you already have the privilege to change
>>> them yourself and then you do so.  For example, PR_SET_NO_NEW_PRIVS,
>>> setuid, then drop caps is allowed and useful -- it's a race-free way
>>> to make sure that a given uid never executes without no_new_privs set.
>>> I've implemented this as a pam module.
>>
>> Careful.  There is the security_task_fix_setuid call that will raise
>> your capabilities from cap->effective to cap->permitted if you call
>> setuid(0).  Which in the general case means you can regain all of the
>> root privileges if you only have CAP_SETUID.
>>
>
> That's fine.  If you're running with CAP_SETUID and default
> securebits, then you effectively have all capabilities already and
> don't need to exploit setuid binaries to gain them.  no_new_privs
> doesn't change that.  If you don't want to be able to gain all privs,
> change securebits or drop CAP_SETUID.  seccomp reduces the kernel
> attack surface; no_new_privs reduces the userspace attack surface.
> But see below...
>
>
>>
>>>> I don't recall how seccomp filters are dealt with if you don't have
>>>> no_new_privs enabled.  If seccomp filters installed by root
>>>> are dropped when we change privilege levels it might be worth looking
>>>> at how to keep a seccomp filter installed as long as you stay in
>>>> a user namespace.
>>>>
>>>
>>> They're not dropped.  I think in the current implementation they can't
>>> be dropped at all.
>>
>> Which makes sense.  Is this why you need no_new_privs?  So you can't run
>> seccomp on higher privileged executables and confuse them into keeping
>> privileges when they should not?
>
> Exactly.  seccomp is flexible enough that it's probably possible to
> confuse many setuid executables with it.
>
>>
>>>> The emphasis is a bit different from no_new_privs as the user_namespace
>>>> does not need to guarantee that the lsm will not change security labels,
>>>> etc.
>>>
>>> Hmm.  Is this safe?  For example, if there's a program that LSM policy
>>> grants extra privileges that malfunctions when run inside a user
>>> namespace, can that be used to break out of LSM restrictions?
>>
>> I can't see how it would not be safe.
>>
>> Except for the user namespace pointer the state the LSM and the rest of
>> the kernel sees is the same state the kernel sees.  Aka userspace sees
>> uid 0, the LSM does not.  So I don't know why a LSM would get confused.
>>
>> Beyond that it is a bug for an LSM to grant permissions beyond the
>> core DAC model.  So the worst I can see is an LSM not grokking user
>> namespaces and getting confused and not restricting a process as
>> much as the designer of the LSM would like.
>
> Right.  Suppose you have some program that has extra restrictions
> applied by an LSM.  It executes a helper (e.g. Apache's suidexec
> thing, but I bet there are more examples) which is supposed to be very
> careful not to leak privileges.  The LSM is set to restrict that
> helper less than the parent process.  But that program was written
> before user namespaces existed, and it has a bug (or missing feature)
> that allows its parent to exploit it when run inside an unmapped user
> namespace.  The parent can now escape from the LSM restrictions.
>
> no_new_privs is designed to prevent exactly this issue.

Currently the suid exec will fail because the uids don't map.
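
To make the mapping failure concrete, here is a minimal userspace sketch
in C.  It is not the kernel code: the names id_extent, map_id_up and
suid_exec_allowed are hypothetical, and the only assumption is that a uid
map is a list of (first, lower_first, count) extents.  With an empty map
nothing translates, so a suid exec in an unmapped user namespace has no
uid to switch to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One contiguous range: ids [first, first + count) inside the namespace
 * correspond to [lower_first, lower_first + count) outside it. */
struct id_extent {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

#define ID_UNMAPPED ((uint32_t)-1)

/* Translate an outside id into the namespace's view.
 * Returns ID_UNMAPPED when no extent covers it. */
static uint32_t map_id_up(const struct id_extent *map, size_t n, uint32_t id)
{
	assert(map != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (id >= map[i].lower_first &&
		    id - map[i].lower_first < map[i].count)
			return map[i].first + (id - map[i].lower_first);
	}
	return ID_UNMAPPED;
}

/* Hypothetical policy modelling the current behaviour described above:
 * refuse the suid exec when the file owner has no mapping. */
static int suid_exec_allowed(const struct id_extent *map, size_t n,
			     uint32_t file_owner)
{
	return map_id_up(map, n, file_owner) != ID_UNMAPPED;
}
```

Calling suid_exec_allowed(NULL, 0, 0) models a root-owned suid binary
seen from a namespace with no mappings at all, and it is refused.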

I might switch that around to simply ignoring the change of uid
on suid exec.  I have a patch in my devel tree that plays with
that idea.  However, although I hit that case once in testing
(I think it was ping), I don't think running suid executables
is particularly interesting.

Certainly the application program won't care or break, because we are
still bounded by the usual DAC security.

I wonder a little if the lsm might change labels on exec of a
non suid binary.  That case is more interesting in the unmapped
unprivileged user namespace.

But I just can't seem to care.  The LSM is the line behind which we hide
the crazy.

The only real difference is that I can create namespaces, which are my
process local environment.  Unprivileged users setting up their own
mount namespace will likely allow all kinds of ways to sneak through the
path based protections of apparmor and tomoyo.  As for smack and selinux,
shrug.  I know selinux is at least a lot more path based than the
developers like to admit.  I know most of the /proc and /sys checks are
path based, although I don't think they depend on where you mount
things.  If you can somehow trigger a selinux labelling spree with a
different mount namespace selinux will likely do some very wrong things.
smack is simple so it will probably work as intended.

Shrug.  There is nothing special here with the unmapped uid case of
user namespaces.  These are all things that have to be dealt with in some
fashion, but I do believe that is for the LSM maintainers to worry
about.


>>> If a user namespace has no visible effect on processes that aren't
>>> descendents of whoever created it, then creating one in no_new_privs
>>> mode should be safe.  On the other hand, it could be somewhat useless.
>>
>> Creating a user namespace will allow a process access to more kernel
>> facilities.  Aka you can (or at least will be able to) create network
>> namespaces and mount namespaces and the like.  That increases the
>> surface of the kernel an attacker can hit.
>>
>> So in a perfect kernel there are no effects on others.  In a scenario
>> where you are limiting how much of the kernel a user can use I think
>> you would want that.
>>
>> Still given that you aren't doing the very restrictive current_cred()
>> must not change I don't know how it matters, and a bpf based seccomp can
>> pretty easily filter out new user namespace creation.  Shrug.
>
> I'm not worried about that.  I'm more interested in whether
> unprivileged user namespace creation should require nnp and/or whether
> someone might want a mode in which a task has nnp set but can
> create a user namespace that allows setuid execution inside the
> namespace in spite of the nnp setting.  The latter is probably rather
> complicated to get right and depends on nonexistent filesystem
> features.

Hmm.  If the goal is to avoid confusing lsms, I think when user
namespaces and no new privs meet it becomes sensible for no new privs
to deny user namespace fiddling.  No clone(CLONE_NEWUSER), no
unshare(CLONE_NEWUSER), no setns(CLONE_NEWUSER).  Otherwise it becomes
trivial to confuse path based lsms.

If the goal is to avoid confusing privileged executables with seccomp,
I don't think it matters.  The user namespace guarantees you can't get
additional privileges.

As for requiring no new privs for creating a user namespace, ick.  I
think that will just break things.  suid exec is otherwise safe in a
user namespace and it needs to be supported.  If the LSMs have problems
the LSMs need to figure out how to cope.

I do think no new privs makes sense inside a user namespace exactly
the same way it makes sense if you don't think about user namespaces.

So I expect a really tight security policy would use a user_namespace +
seccomp + no new privs.
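
As a rough userspace illustration of the last two pieces, here is a
sketch in C.  It assumes the prctl interfaces from the no_new_privs and
seccomp filter patches (PR_SET_NO_NEW_PRIVS, and PR_SET_SECCOMP with
SECCOMP_MODE_FILTER), and it assumes the process has already entered its
user namespace with unshare(CLONE_NEWUSER).  The function names are made
up for the example.  The filter is deliberately crude: it refuses every
unshare() and setns() call, does not look at clone() flags, and does not
check the syscall architecture, all of which a real policy would need.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* After this, exec can no longer grant privileges to this task. */
static int set_no_new_privs(void)
{
	return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/* Install a seccomp filter that fails unshare() and setns() with EPERM
 * and allows every other syscall. */
static int install_userns_filter(void)
{
	struct sock_filter insns[] = {
		/* Load the syscall number. */
		BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		/* unshare: skip the next test and go to the EPERM return. */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_unshare, 1, 0),
		/* setns: fall through to EPERM, otherwise skip to ALLOW. */
		BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_setns, 0, 1),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
		BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
	};
	struct sock_fprog prog = {
		.len = sizeof(insns) / sizeof(insns[0]),
		.filter = insns,
	};

	assert(prog.len <= BPF_MAXINSNS);
	return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

Calling set_no_new_privs() before install_userns_filter() is what lets an
unprivileged task install the filter at all.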

Eric
