[PATCH 0/2] capability controlled user-namespaces

linux-api.vger.kernel.org archive mirror

* [PATCH 0/2] capability controlled user-namespaces
@ 2017-09-29 23:09 Mahesh Bandewar
From: Mahesh Bandewar @ 2017-09-29 23:09 UTC (permalink / raw)
  To: LKML
  Cc: Netdev, Kernel-hardening, Linux API, Kees Cook, Serge Hallyn,
	Eric W . Biederman, Eric Dumazet, David Miller, Mahesh Bandewar,
	Mahesh Bandewar

From: Mahesh Bandewar <maheshb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>

[Same as the previous RFC series sent on 9/21]

TL;DR version
-------------
Creating a sandbox environment with namespaces is challenging
considering what sandboxed processes can get up to, e.g.
CVE-2017-6074, CVE-2017-7184 and CVE-2017-7308, to name just a few.
User-namespaces in their current form, however, can with a small
change allow us to create a sandbox environment without locking down
user-namespaces.

Detailed version
----------------

Problem
-------
User-namespaces in their current form have increased the attack surface,
as any process can acquire capabilities that are not available to it (by
default) by performing a combination of clone()/unshare()/setns() syscalls.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sched.h>          /* unshare(), CLONE_NEW* */
    #include <unistd.h>         /* close() */
    #include <sys/socket.h>     /* socket() */
    #include <netinet/in.h>     /* IPPROTO_RAW */

    int main(void)
    {
        int sock;

        /* As an unprivileged user this is expected to fail (no CAP_NET_RAW). */
        printf("Attempting to open RAW socket before unshare()...\n");
        sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
        if (sock < 0) {
            perror("socket() SOCK_RAW failed");
        } else {
            printf("Successfully opened RAW-Sock before unshare().\n");
            close(sock);
        }

        /* A new user-ns gives the caller a full capability set inside it. */
        if (unshare(CLONE_NEWUSER | CLONE_NEWNET) < 0) {
            perror("unshare() failed");
            return 1;
        }

        /* Same call again, now checked against the new namespaces. */
        printf("Attempting to open RAW socket after unshare()...\n");
        sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
        if (sock < 0) {
            perror("socket() SOCK_RAW failed");
        } else {
            printf("Successfully opened RAW-Sock after unshare().\n");
            close(sock);
        }

        return 0;
    }

The above example shows how easy it is to acquire the NET_RAW capability.
Once it is acquired, a process with malicious intent could exploit the
issues mentioned above, or similar ones, whether discovered or not. Note
that this is just an example; the problem and the solution are not
limited to the NET_RAW capability *only*.

The easiest fix one can apply here is to lock down user-namespaces, which
many distros do (i.e. they don't allow users to create user namespaces),
but unfortunately that prevents everyone from using them.

Approach
--------
Introduce a notion of 'controlled' user-namespaces. Every process on
the host is still allowed to create user-namespaces (subject to the limit
imposed by the per-ns sysctl); however, user-namespaces created by
sandboxed processes are marked as 'controlled'. This mark is used at
capability-check time in conjunction with a global capability whitelist:
if a capability is not whitelisted, processes that belong to controlled
user-namespaces are not allowed to use it.
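
The check described above can be sketched in plain userspace C. This is
only an illustration, not the code from the patches: 'struct userns',
'cap_whitelist' and 'cap_allowed()' are hypothetical names, and the
whitelist is assumed to be a 64-bit mask with one bit per capability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a user namespace; not the kernel's struct. */
struct userns {
    bool controlled;    /* set when created by a sandboxed process */
};

/*
 * Hypothetical global whitelist: bit N set means capability N may be
 * used inside controlled namespaces. All bits set mirrors the default,
 * where every capability is whitelisted.
 */
static uint64_t cap_whitelist = UINT64_MAX;

/* Returns true if a process in 'ns' that holds 'cap' may exercise it. */
static bool cap_allowed(const struct userns *ns, int cap)
{
    assert(cap >= 0 && cap < 64);
    if (!ns->controlled)
        return true;                        /* uncontrolled: unchanged */
    return (cap_whitelist >> cap) & 1u;     /* controlled: ask whitelist */
}
```

With the default mask every capability passes, so a controlled namespace
behaves like an uncontrolled one until an admin clears a bit.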

Once a user-ns is marked as 'controlled', all its child user-namespaces
are marked as 'controlled' too.
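
A rough sketch of that inheritance rule follows, again in userspace C
with hypothetical names ('struct userns', 'userns_init()'). How the
kernel decides that the creating process is sandboxed is not covered
here, so it is simply passed in as a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a user namespace; not the kernel's struct. */
struct userns {
    struct userns *parent;  /* NULL for the initial namespace */
    bool controlled;
};

/*
 * Initialise 'child' as a namespace created under 'parent'.
 * 'creator_sandboxed' is an assumed input saying whether the creating
 * process counts as sandboxed.
 */
static void userns_init(struct userns *child, struct userns *parent,
                        bool creator_sandboxed)
{
    assert(child != NULL);
    child->parent = parent;
    /* The mark is sticky: a controlled parent yields a controlled child. */
    child->controlled = creator_sandboxed ||
                        (parent != NULL && parent->controlled);
}
```

Because the flag is copied down at creation time, nesting further
user-namespaces inside a controlled one cannot shed the mark.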

The global whitelist is a list of capabilities governed by a sysctl.
Only a (privileged) user in init-ns can modify it, and it applies to
all controlled user-namespaces on the host.

Marking user-namespaces as controlled without modifying the whitelist is
equivalent to the current behavior. The default value of the whitelist
includes all capabilities, so compatibility is maintained. However, it gives
admins the fine-grained ability to control various capabilities system-wide
without locking down user-namespaces.

Please see individual patches in this series.

Mahesh Bandewar (2):
  capability: introduce sysctl for controlled user-ns capability
    whitelist
  userns: control capabilities of some user namespaces

 Documentation/sysctl/kernel.txt | 21 +++++++++++++++++
 include/linux/capability.h      |  4 ++++
 include/linux/user_namespace.h  | 20 ++++++++++++++++
 kernel/capability.c             | 52 +++++++++++++++++++++++++++++++++++++++++
 kernel/sysctl.c                 |  5 ++++
 kernel/user_namespace.c         |  3 +++
 security/commoncap.c            |  8 +++++++
 7 files changed, 113 insertions(+)

-- 
2.14.2.822.g60be5d43e6-goog


* Re: [kernel-hardening] [PATCH 0/2] capability controlled user-namespaces
@ 2017-10-02 17:14   ` Serge E. Hallyn
From: Serge E. Hallyn @ 2017-10-02 17:14 UTC (permalink / raw)
  To: Mahesh Bandewar
  Cc: LKML, Netdev, Kernel-hardening, Linux API, Kees Cook,
	Serge Hallyn, Eric W . Biederman, Eric Dumazet, David Miller,
	Mahesh Bandewar

Quoting Mahesh Bandewar (mahesh-bmGAjcP2qsnk1uMJSBkQmQ@public.gmane.org):
> From: Mahesh Bandewar <maheshb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
> 
> [Same as the previous RFC series sent on 9/21]
> 
> TL;DR version
> -------------
> Creating a sandbox environment with namespaces is challenging
> considering what these sandboxed processes can engage into. e.g.
> CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name few.
> Current form of user-namespaces, however, if changed a bit can allow
> us to create a sandbox environment without locking down user-
> namespaces.
> 
> Detailed version
> ----------------

Hi,

Still struggling with how I feel about the idea in general.

So is the intent mainly that if/when a 0-day comes along which allows
users with CAP_NET_ADMIN in any namespace to gain privilege on the host,
then this can be used as a stop-gap measure until there is a proper fix?

Otherwise, do you have any guidance for how people should use this?

IMO it should be heavily discouraged to use this tool as a regular
day to day configuration, as I'm not sure there is any "educated"
decision to be made, even by those who are in the know, about what
to put in this set.


* Re: [kernel-hardening] [PATCH 0/2] capability controlled user-namespaces
@ 2017-10-02 18:12       ` Mahesh Bandewar (महेश बंडेवार)
From: Mahesh Bandewar (महेश बंडेवार) @ 2017-10-02 18:12 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Mahesh Bandewar, LKML, Netdev, Kernel-hardening, Linux API,
	Kees Cook, Eric W . Biederman, Eric Dumazet, David Miller

On Mon, Oct 2, 2017 at 10:14 AM, Serge E. Hallyn <serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org> wrote:
> Quoting Mahesh Bandewar (mahesh-bmGAjcP2qsnk1uMJSBkQmQ@public.gmane.org):
>> From: Mahesh Bandewar <maheshb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
>>
>> [Same as the previous RFC series sent on 9/21]
>>
>> TL;DR version
>> -------------
>> Creating a sandbox environment with namespaces is challenging
>> considering what these sandboxed processes can engage into. e.g.
>> CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name few.
>> Current form of user-namespaces, however, if changed a bit can allow
>> us to create a sandbox environment without locking down user-
>> namespaces.
>>
>> Detailed version
>> ----------------
>
> Hi,
>
> still struggling with how I feel about the idea in general.
>
> So is the intent mainly that if/when there comes an 0-day which allows
> users with CAP_NET_ADMIN in any namespace to gain privilege on the host,
> then this can be used as a stop-gap measure until there is a proper fix?
>
Thanks for looking at this, Serge.

Yes, but at the same time it's not just limited to NET_ADMIN but could
be any of the current capabilities.

> Otherwise, do you have any guidance for how people should use this?
>
> IMO it should be heavily discouraged to use this tool as a regular
> day to day configuration, as I'm not sure there is any "educated"
> decision to be made, even by those who are in the know, about what
> to put in this set.
>
I think that really depends on the environment. E.g. in certain
sandboxes a third-party / semi-trusted workload is executed and no
network resource is used. In that environment I can easily take off
NET_ADMIN and NET_RAW without affecting anything there. At the same
time I won't have to worry about 0-days related to these two
capabilities. I would say the admins at these places are in the best
position to decide what they can take off safely and what they cannot.
Even if they decide not to take off anything, having a tool at hand to
gain control is important when the next 0-day strikes that can be
exploited using any of the capabilities currently in use.
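
To make the NET_ADMIN / NET_RAW case concrete, here is a small sketch of
how such a whitelist value could be derived. It assumes the whitelist is
a bitmask with one bit per capability number, which this thread does not
spell out; 'whitelist_drop()' and 'no_net_whitelist()' are hypothetical
helpers. The two numeric values are those of CAP_NET_ADMIN and
CAP_NET_RAW in <linux/capability.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as CAP_NET_ADMIN and CAP_NET_RAW in <linux/capability.h>. */
enum { EX_CAP_NET_ADMIN = 12, EX_CAP_NET_RAW = 13 };

/* Clear a single capability bit from a whitelist mask. */
static uint64_t whitelist_drop(uint64_t mask, int cap)
{
    assert(cap >= 0 && cap < 64);
    return mask & ~(UINT64_C(1) << cap);
}

/* Whitelist for a sandbox whose workload needs no network privileges. */
static uint64_t no_net_whitelist(uint64_t all_caps)
{
    uint64_t mask = whitelist_drop(all_caps, EX_CAP_NET_ADMIN);
    return whitelist_drop(mask, EX_CAP_NET_RAW);
}
```

Every other bit is left untouched, so capabilities the workload does
rely on keep working inside the sandbox.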

However, you are absolutely right about using it as a stop-gap
measure to protect the environment until a fix is available, when the
capability in question cannot safely be taken off permanently without
hampering operations.

thanks,
--mahesh..

[...]


* Re: [kernel-hardening] [PATCH 0/2] capability controlled user-namespaces
@ 2017-10-19 16:15           ` Mahesh Bandewar (महेश बंडेवार)
From: Mahesh Bandewar (महेश बंडेवार) @ 2017-10-19 16:15 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Mahesh Bandewar, LKML, Netdev, Kernel-hardening, Linux API,
	Kees Cook, Eric W . Biederman, Eric Dumazet, David Miller

friendly ping.

