All of lore.kernel.org
 help / color / mirror / Atom feed
From: ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman)
To: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
Cc: containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	Linus Torvalds
	<torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Linux Kernel Mailing List
	<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [GIT PULL] user namespace and namespace infrastructure changes for 3.8
Date: Thu, 13 Dec 2012 20:11:37 -0800	[thread overview]
Message-ID: <87mwxhff2e.fsf@xmission.com> (raw)
In-Reply-To: <CALCETrWxXZ1OzZeH_SGeg1E16rssxBwg+hjG09N5dkqweVKeRA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> (Andy Lutomirski's message of "Thu, 13 Dec 2012 15:02:58 -0800")

Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:

> On Thu, Dec 13, 2012 at 2:01 PM, Eric W. Biederman
> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>> Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:
>>
>>> On 12/11/2012 01:17 PM, Eric W. Biederman wrote:
>
>> But please also note the difference between capable and ns_capable.  Any
>> security check that is capable() still requires priviliges in the
>> initial user namespace.
>
> Huh?
>
> I'm talking about:
>
> clone with CLONE_NEWUSER
>  - child does unshare(CLONE_NEWPID)
>  - parent does setfd(child's pid namespace)
>
> Now the parent is running in the init userns with a different pid ns.
> Setuid binaries will work but will see the unexpected pid ns.  With
> mount namespaces, this would be Really Bad (tm).  With pid or ipc or
> net, it's less obviously dangerous, but I'm not convinced it's safe.

That isn't safe.  That is a sneaky bug in my tree that I overlooked. :(

> I sort of think that setns on a *non*-userns should require
> CAP_SYS_ADMIN in the current userns, at least if no_new_privs isn't
> set.

Yes.  CAP_SYS_ADMIN in your current user namespace should make
setns as safe as it currently is before my patches.

That is just a matter of adding a couple nsown_capable(CAP_SYS_ADMIN)
permission checks.

Right now I test for nsown_capable(CAP_SYS_CHROOT) for the mount
namespace, which is probably sufficient to prevent those kinds of
shenanigans but I am going to add a nsown_capable(CAP_SYS_ADMIN) for
good measure.

> A non-threaded process can have mm_users == 2 with CLONE_VM.  I'm not
> sure this is a problem, though.

No it isn't.  I said threads because they are the easy concept not
because that covers all possible corner cases.

>>> In any case, why are threads special here?
>>
>> You know I don't think I stopped to think about it.   The combination
>> of CLONE_NEWUSER and CLONE_THREAD have been denined since the first user
>> namespace support was merged in 2008.
>>
>> I do know that things can get really strange when you mix multiple
>> namespaces in a process.  tkill of your own threads will stop working.
>> Which access permissions should apply to files you mmap, file handles
>> you have open, the core dumper etc.
>>
>> We do allow setresuid per thread so we might be able to cope
>> with a process that mixes with user namespaces in different threads,
>> but I would want a close review of things before we allow that kind of
>> sharing.
>>
>
> Fair enough.
>
> I'd personally be more concerned about shared signal handlers than
> shared tgid, though.  The signal handler set has all kinds of weird
> things like session id.

CLONE_THREAD implies CLONE_VM and CLONE_SIGNAL,
and in practice mm_users > 1 protects against all of those cases.

So I was really thinking all of those cases.


>> (See the end.  A significant bug in cap_capable slipped in about
>>  3.5. cap_capable is only supposed to grant permissions to the owner
>>  of a user namespace if it is a child user namespace).
>
> [snipping lots of stuff]
>
> If the intended semantics of cap_capable are, indeed:
>
> If targ_ns is equals or is a descendent of cred->user_ns and the cap
> is effective, return true.  If targ_ns is owned by cred->euid and
> targ_ns is a descendent of cred->user_ns (but is not equal to
> cred->user_ns), then return true.  Else return false
>
> then I agree with you on almost everything you said.  I assumed that
> the actual semantics were intended.

Good.

>>> unshare has a bug.  This code:
>>
>> Interesting...
>>
>> Looking at it this is a very small misfeature.
>>
>> What is happening is that commit_creds is setting is making the task
>> undumpable because we changed the set of capabilities in struct cred.
>>
>> This in turn results in pid_revalidate setting the owner of
>> of /proc/self/uid_map to GLOBAL_ROOT_UID.
>>
>> From the outside the permissions on /proc/self/uid_map look like:
>> -rw-r--r-- 1 root root 0 2012-12-13 12:43 /proc/30530/uid_map
>>
>> Then since /proc/self/uid_map uses the default permission function
>> and the test program below is not run as root the read-write open
>> of uid_map fails.
>
> It's probably either worth fixing this or disabling unshare of userns.
>  This makes it hard to use.  IMO non-dumpable tasks should still be
> able to access the contents of /proc/self -- i.e. I'd call this a more
> general bug.
>
> But I'd also argue that unsharing userns shouldn't set non-dumpable --
> cap_permitted increased, but the new capabilities are still logically
> a subset of the old ones.

Agreed.  Setting dumpable is the bug.

I am going to sleep on it but the code in commit_creds should probably
read:

	/* dumpability changes */
	if (!uid_eq(old->euid, new->euid) ||
	    !gid_eq(old->egid, new->egid) ||
	    !uid_eq(old->fsuid, new->fsuid) ||
	    !gid_eq(old->fsgid, new->fsgid) ||
	    ((old->user_ns == new->user_ns) &&
	     !cap_issubset(new->cap_permitted, old->cap_permitted))) {
		if (task->mm)
			set_dumpable(task->mm, suid_dumpable);
		task->pdeath_signal = 0;
		smp_wmb();
	}

Which is correct assuming a user_ns change is always to a more nested
user namespace.

> One more issue: the requirement that both upper and lower uids (etc.)
> in the maps are in order is rather limiting.  I have no objection if
> you only require upper ids to be monotonic, but currently there's no
> way to may root outside to uid n (for n > 0) and some nonroot user
> outside to uid 0.

There is.  You may set up to 5 (extents).  You just have to use a second
extent for the non-contiguous bits.  My reader is lazy and you have to
set all of the extents with a single write, so you may have missed the
ability to set more than one extent.

> Also, the fact that you can remap onto the unmapped id is a little
> strange.  I haven't found any reason it would be a security bug, but
> it's still odd.

Yes the unmapped id is odd.  Including the fact it can be set with a
sysctl.

The world has such lovely round edges.

Eric

WARNING: multiple messages have this Message-ID (diff)
From: ebiederm@xmission.com (Eric W. Biederman)
To: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	containers@lists.linux-foundation.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [GIT PULL] user namespace and namespace infrastructure changes for 3.8
Date: Thu, 13 Dec 2012 20:11:37 -0800	[thread overview]
Message-ID: <87mwxhff2e.fsf@xmission.com> (raw)
In-Reply-To: <CALCETrWxXZ1OzZeH_SGeg1E16rssxBwg+hjG09N5dkqweVKeRA@mail.gmail.com> (Andy Lutomirski's message of "Thu, 13 Dec 2012 15:02:58 -0800")

Andy Lutomirski <luto@amacapital.net> writes:

> On Thu, Dec 13, 2012 at 2:01 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>> Andy Lutomirski <luto@amacapital.net> writes:
>>
>>> On 12/11/2012 01:17 PM, Eric W. Biederman wrote:
>
>> But please also note the difference between capable and ns_capable.  Any
>> security check that is capable() still requires priviliges in the
>> initial user namespace.
>
> Huh?
>
> I'm talking about:
>
> clone with CLONE_NEWUSER
>  - child does unshare(CLONE_NEWPID)
>  - parent does setfd(child's pid namespace)
>
> Now the parent is running in the init userns with a different pid ns.
> Setuid binaries will work but will see the unexpected pid ns.  With
> mount namespaces, this would be Really Bad (tm).  With pid or ipc or
> net, it's less obviously dangerous, but I'm not convinced it's safe.

That isn't safe.  That is a sneaky bug in my tree that I overlooked. :(

> I sort of think that setns on a *non*-userns should require
> CAP_SYS_ADMIN in the current userns, at least if no_new_privs isn't
> set.

Yes.  CAP_SYS_ADMIN in your current user namespace should make
setns as safe as it currently is before my patches.

That is just a matter of adding a couple nsown_capable(CAP_SYS_ADMIN)
permission checks.

Right now I test for nsown_capable(CAP_SYS_CHROOT) for the mount
namespace, which is probably sufficient to prevent those kinds of
shenanigans but I am going to add a nsown_capable(CAP_SYS_ADMIN) for
good measure.

> A non-threaded process can have mm_users == 2 with CLONE_VM.  I'm not
> sure this is a problem, though.

No it isn't.  I said threads because they are the easy concept not
because that covers all possible corner cases.

>>> In any case, why are threads special here?
>>
>> You know I don't think I stopped to think about it.   The combination
>> of CLONE_NEWUSER and CLONE_THREAD have been denined since the first user
>> namespace support was merged in 2008.
>>
>> I do know that things can get really strange when you mix multiple
>> namespaces in a process.  tkill of your own threads will stop working.
>> Which access permissions should apply to files you mmap, file handles
>> you have open, the core dumper etc.
>>
>> We do allow setresuid per thread so we might be able to cope
>> with a process that mixes with user namespaces in different threads,
>> but I would want a close review of things before we allow that kind of
>> sharing.
>>
>
> Fair enough.
>
> I'd personally be more concerned about shared signal handlers than
> shared tgid, though.  The signal handler set has all kinds of weird
> things like session id.

CLONE_THREAD implies CLONE_VM and CLONE_SIGNAL,
and in practice mm_users > 1 protects against all of those cases.

So I was really thinking all of those cases.


>> (See the end.  A significant bug in cap_capable slipped in about
>>  3.5. cap_capable is only supposed to grant permissions to the owner
>>  of a user namespace if it is a child user namespace).
>
> [snipping lots of stuff]
>
> If the intended semantics of cap_capable are, indeed:
>
> If targ_ns is equals or is a descendent of cred->user_ns and the cap
> is effective, return true.  If targ_ns is owned by cred->euid and
> targ_ns is a descendent of cred->user_ns (but is not equal to
> cred->user_ns), then return true.  Else return false
>
> then I agree with you on almost everything you said.  I assumed that
> the actual semantics were intended.

Good.

>>> unshare has a bug.  This code:
>>
>> Interesting...
>>
>> Looking at it this is a very small misfeature.
>>
>> What is happening is that commit_creds is setting is making the task
>> undumpable because we changed the set of capabilities in struct cred.
>>
>> This in turn results in pid_revalidate setting the owner of
>> of /proc/self/uid_map to GLOBAL_ROOT_UID.
>>
>> From the outside the permissions on /proc/self/uid_map look like:
>> -rw-r--r-- 1 root root 0 2012-12-13 12:43 /proc/30530/uid_map
>>
>> Then since /proc/self/uid_map uses the default permission function
>> and the test program below is not run as root the read-write open
>> of uid_map fails.
>
> It's probably either worth fixing this or disabling unshare of userns.
>  This makes it hard to use.  IMO non-dumpable tasks should still be
> able to access the contents of /proc/self -- i.e. I'd call this a more
> general bug.
>
> But I'd also argue that unsharing userns shouldn't set non-dumpable --
> cap_permitted increased, but the new capabilities are still logically
> a subset of the old ones.

Agreed.  Setting dumpable is the bug.

I am going to sleep on it but the code in commit_creds should probably
read:

	/* dumpability changes */
	if (!uid_eq(old->euid, new->euid) ||
	    !gid_eq(old->egid, new->egid) ||
	    !uid_eq(old->fsuid, new->fsuid) ||
	    !gid_eq(old->fsgid, new->fsgid) ||
	    ((old->user_ns == new->user_ns) &&
	     !cap_issubset(new->cap_permitted, old->cap_permitted))) {
		if (task->mm)
			set_dumpable(task->mm, suid_dumpable);
		task->pdeath_signal = 0;
		smp_wmb();
	}

Which is correct assuming a user_ns change is always to a more nested
user namespace.

> One more issue: the requirement that both upper and lower uids (etc.)
> in the maps are in order is rather limiting.  I have no objection if
> you only require upper ids to be monotonic, but currently there's no
> way to may root outside to uid n (for n > 0) and some nonroot user
> outside to uid 0.

There is.  You may set up to 5 (extents).  You just have to use a second
extent for the non-contiguous bits.  My reader is lazy and you have to
set all of the extents with a single write, so you may have missed the
ability to set more than one extent.

> Also, the fact that you can remap onto the unmapped id is a little
> strange.  I haven't found any reason it would be a security bug, but
> it's still odd.

Yes the unmapped id is odd.  Including the fact it can be set with a
sysctl.

The world has such lovely round edges.

Eric


  parent reply	other threads:[~2012-12-14  4:11 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-12-11 21:17 [GIT PULL] user namespace and namespace infrastructure changes for 3.8 Eric W. Biederman
2012-12-11 21:17 ` Eric W. Biederman
     [not found] ` <87ip88uw4n.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-12-13 19:24   ` Andy Lutomirski
2012-12-13 19:24     ` Andy Lutomirski
     [not found]     ` <50CA2B55.5070402-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>
2012-12-13 22:01       ` Eric W. Biederman
2012-12-13 22:01         ` Eric W. Biederman
2012-12-13 22:39         ` [RFC][PATCH] Fix cap_capable to only allow owners in the parent user namespace to have caps Eric W. Biederman
2012-12-13 23:21           ` Andy Lutomirski
     [not found]             ` <CALCETrXLLcUu8Rajjx7+3N_6j5E0T0CR1h=hD+gcc5_r4Umyqw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-12-14  2:33               ` Eric W. Biederman
2012-12-14  2:33             ` Eric W. Biederman
     [not found]               ` <876245jrbc.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-12-14  2:36                 ` Andy Lutomirski
2012-12-14  2:36               ` Andy Lutomirski
2012-12-14  3:20                 ` [PATCH] " Eric W. Biederman
     [not found]                 ` <CALCETrXRYOh2tkwB+U9ZjA5BNZwscWsq1WGzjP3wUiOXQUXOQg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-12-14  3:20                   ` Eric W. Biederman
     [not found]           ` <87zk1hshk7.fsf_-_-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-12-13 22:43             ` [RFC][PATCH] " Linus Torvalds
2012-12-13 22:43               ` Linus Torvalds
2012-12-13 22:55               ` Eric W. Biederman
     [not found]               ` <CA+55aFwXnFEFXbkwFPq9xt30xp2_6jfpBLd3E2bms79KKK=V1Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-12-13 22:55                 ` Eric W. Biederman
2012-12-13 23:21             ` Andy Lutomirski
2012-12-14  3:28             ` Serge E. Hallyn
2012-12-14  3:28           ` Serge E. Hallyn
     [not found]             ` <20121214032820.GA5115-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2012-12-14  3:32               ` Eric W. Biederman
2012-12-14  3:32                 ` Eric W. Biederman
     [not found]                 ` <87bodxi9zw.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-12-14 15:26                   ` Serge E. Hallyn
2012-12-14 15:26                     ` Serge E. Hallyn
     [not found]                     ` <20121214152607.GA9266-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2012-12-14 15:47                       ` Eric W. Biederman
2012-12-14 15:47                     ` Eric W. Biederman
     [not found]                       ` <87bodwd4aw.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-12-14 16:15                         ` Serge E. Hallyn
2012-12-14 16:15                           ` Serge E. Hallyn
     [not found]                           ` <20121214161514.GA9962-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2012-12-14 18:12                             ` Eric W. Biederman
2012-12-14 18:12                               ` Eric W. Biederman
     [not found]                               ` <87r4ms5wpm.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-12-14 18:43                                 ` Linus Torvalds
2012-12-14 18:43                                   ` Linus Torvalds
2012-12-14 18:47                                   ` Andy Lutomirski
2012-12-14 20:50                                   ` Serge E. Hallyn
2012-12-14 21:43                                   ` Eric W. Biederman
     [not found]                                   ` <CA+55aFw5CMf0-o=yDt2Rj-SYH4pfW1L9QbNb6uKHEdzAyYcvGQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-12-14 18:47                                     ` Andy Lutomirski
2012-12-14 20:50                                     ` Serge E. Hallyn
2012-12-14 21:43                                     ` Eric W. Biederman
2012-12-14 20:29                                 ` Serge E. Hallyn
2012-12-14 20:29                               ` Serge E. Hallyn
     [not found]                                 ` <20121214202921.GA11450-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2012-12-14 22:32                                   ` Eric W. Biederman
2012-12-14 22:32                                     ` Eric W. Biederman
     [not found]                                     ` <87bodww9hv.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-12-15  0:14                                       ` Serge E. Hallyn
2012-12-15  0:14                                     ` Serge E. Hallyn
     [not found]         ` <87mwxhtxve.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-12-13 22:39           ` Eric W. Biederman
2012-12-13 23:02           ` [GIT PULL] user namespace and namespace infrastructure changes for 3.8 Andy Lutomirski
2012-12-13 23:02             ` Andy Lutomirski
     [not found]             ` <CALCETrWxXZ1OzZeH_SGeg1E16rssxBwg+hjG09N5dkqweVKeRA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-12-14  4:11               ` Eric W. Biederman [this message]
2012-12-14  4:11                 ` Eric W. Biederman
     [not found]                 ` <87mwxhff2e.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-12-14  5:34                   ` Andy Lutomirski
2012-12-14  5:34                     ` Andy Lutomirski
     [not found]                     ` <CALCETrXagfjy4o0_JCZpMfdocYK-MpOp3eH-tPZhgazvJAy-EQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2012-12-14 17:48                       ` Eric W. Biederman
2012-12-14 17:48                         ` Eric W. Biederman
  -- strict thread matches above, loose matches on Subject: below --
2012-12-17 23:18 Eric W. Biederman
2012-12-17 23:18 Eric W. Biederman
     [not found] ` <87wqwggtcu.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
2012-12-18  7:47   ` Eric W. Biederman
2012-12-18  7:47     ` Eric W. Biederman
2012-12-21  7:05   ` Rob Landley
2012-12-21  7:05     ` Rob Landley
2012-12-21  7:47     ` Eric W. Biederman
2012-12-21  7:47       ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87mwxhff2e.fsf@xmission.com \
    --to=ebiederm-as9lmozglivwk0htik3j/w@public.gmane.org \
    --cc=containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
    --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org \
    --cc=torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.