* [PATCHv3 0/2] capability controlled user-namespaces
@ 2017-12-05 22:30 Mahesh Bandewar
[not found] ` <20171205223052.12687-1-mahesh-bmGAjcP2qsnk1uMJSBkQmQ@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Mahesh Bandewar @ 2017-12-05 22:30 UTC (permalink / raw)
To: LKML, Netdev
Cc: Kernel-hardening, Linux API, Kees Cook, Serge Hallyn,
Eric W . Biederman, Eric Dumazet, David Miller, Mahesh Bandewar,
Mahesh Bandewar
From: Mahesh Bandewar <maheshb@google.com>
TL;DR version
-------------
Creating a sandbox environment with namespaces is challenging
considering what these sandboxed processes can get up to, e.g.
CVE-2017-6074, CVE-2017-7184 and CVE-2017-7308, to name just a few.
User-namespaces in their current form, however, can with a small change
allow us to create a sandbox environment without locking down
user-namespaces.
Detailed version
----------------
Problem
-------
User-namespaces in their current form have increased the attack surface, as
any process can acquire capabilities that are not available to it by
default by performing a combination of clone()/unshare()/setns() syscalls.
  #define _GNU_SOURCE
  #include <stdio.h>
  #include <sched.h>
  #include <unistd.h>      /* close() */
  #include <sys/socket.h>  /* socket(), AF_INET6, SOCK_RAW */
  #include <netinet/in.h>  /* IPPROTO_RAW */

  int main(int ac, char **av)
  {
      int sock = -1;

      /* Without CAP_NET_RAW this attempt is expected to fail. */
      printf("Attempting to open RAW socket before unshare()...\n");
      sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
      if (sock < 0) {
          perror("socket() SOCK_RAW failed");
      } else {
          printf("Successfully opened RAW-Sock before unshare().\n");
          close(sock);
          sock = -1;
      }

      /* Move into a new user namespace and a new network namespace. */
      if (unshare(CLONE_NEWUSER | CLONE_NEWNET) < 0) {
          perror("unshare() failed");
          return 1;
      }

      /* Same attempt again, now from inside the new namespaces. */
      printf("Attempting to open RAW socket after unshare()...\n");
      sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
      if (sock < 0) {
          perror("socket() SOCK_RAW failed");
      } else {
          printf("Successfully opened RAW-Sock after unshare().\n");
          close(sock);
          sock = -1;
      }

      return 0;
  }
The above example shows how easy it is to acquire the CAP_NET_RAW
capability (inside the new user namespace). Once it is acquired, a process
with malicious intent can take advantage of the issues mentioned above, or
of similar issues, discovered or not yet discovered. Note that this is
just an example; the problem and the solution are not limited to the
CAP_NET_RAW capability *only*.
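
One way to observe the effect described above is to compare the effective
capability mask of a process before and after unshare(). The following is a
minimal sketch, not part of the patch series: it assumes the "CapEff:" line
format of /proc/<pid>/status and uses illustrative helper names
(parse_capeff_line, mask_has_cap, read_own_capeff) that do not exist in the
kernel or in any library.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Bit number of CAP_NET_RAW as defined in <linux/capability.h>. */
#define CAP_NET_RAW_BIT 13

/*
 * Parse one line of /proc/<pid>/status. If it is the "CapEff:" line,
 * store the hexadecimal mask in *mask and return 0; otherwise return -1.
 */
int parse_capeff_line(const char *line, uint64_t *mask)
{
    static const char prefix[] = "CapEff:";
    const char *value;
    char *end;

    if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
        return -1;

    value = line + sizeof(prefix) - 1;
    *mask = strtoull(value, &end, 16); /* skips the leading tab */
    return end == value ? -1 : 0;
}

/* Return 1 if capability number 'cap' is set in 'mask', else 0. */
int mask_has_cap(uint64_t mask, int cap)
{
    assert(cap >= 0 && cap < 64);
    return (int)((mask >> cap) & 1);
}

/* Read the effective capability mask of the calling process. */
int read_own_capeff(uint64_t *mask)
{
    char line[256];
    int ret = -1;
    FILE *f = fopen("/proc/self/status", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (parse_capeff_line(line, mask) == 0) {
            ret = 0;
            break;
        }
    }
    fclose(f);
    return ret;
}
```

Calling read_own_capeff() before and after the unshare() call in the example
and testing the result with mask_has_cap(mask, CAP_NET_RAW_BIT) would show
whether the bit appears once the process is inside the new user namespace.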
The easiest fix one can apply here is to lock down user-namespaces, which
many distros do (i.e. they don't allow users to create user namespaces),
but unfortunately that prevents everyone from using them.
Approach
--------
Introduce a notion of 'controlled' user-namespaces. Every process on
the host is still allowed to create user-namespaces (subject to the limit
imposed by the per-ns sysctl); however, user-namespaces created by
sandboxed processes are marked as 'controlled'. This mark is consulted at
capability-check time, in conjunction with a global capability whitelist.
If a capability is not whitelisted, processes that belong to controlled
user-namespaces are not allowed to use it.
Once a user-ns is marked as 'controlled', all its child user-namespaces
are marked as 'controlled' too.
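
The marking and the extra check described above can be modelled in user
space as follows. This is only a sketch under stated assumptions: the struct
and function names (model_userns, model_create_userns, model_cap_allowed) are
hypothetical and are not the ones used in the patches, and the model covers
only the additional whitelist test, not the capability checks the kernel
already performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model of a user namespace. */
struct model_userns {
    const struct model_userns *parent; /* NULL for the initial namespace */
    bool controlled;                   /* the 'controlled' mark */
};

/*
 * Model the creation of a child user namespace. The child is controlled
 * if its creator is a sandboxed process, or if the parent namespace is
 * already controlled (the mark is inherited by all descendants).
 */
struct model_userns model_create_userns(const struct model_userns *parent,
                                        bool creator_is_sandboxed)
{
    struct model_userns ns = { parent, false };

    assert(parent != NULL);
    ns.controlled = parent->controlled || creator_is_sandboxed;
    return ns;
}

/*
 * Model the additional test made at capability-check time: processes in
 * uncontrolled namespaces are unaffected, processes in controlled
 * namespaces may only use capabilities present in the global whitelist.
 */
bool model_cap_allowed(const struct model_userns *ns, uint64_t whitelist,
                       int cap)
{
    assert(cap >= 0 && cap < 64);
    if (!ns->controlled)
        return true;
    return (whitelist >> cap) & 1;
}
```

In this model a namespace nested inside a controlled one stays controlled
even when its creator is not itself flagged as sandboxed, which mirrors the
inheritance rule stated above.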
A global whitelist is a list of capabilities governed by the sysctl,
which a (privileged) user in the init-ns can modify, while it applies
to all controlled user-namespaces on the host.
Marking user-namespaces as controlled without modifying the whitelist is
equivalent to the current behavior. The default value of the whitelist
includes all capabilities, so compatibility is maintained. However, it
gives admins a fine-grained ability to control individual capabilities
system-wide without locking down user-namespaces.
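
To make the default and a restricted whitelist concrete, the sketch below
treats the whitelist as a 64-bit mask with one bit per capability. It is an
illustration only: the helper names are hypothetical, the value 37 for the
highest capability number is an assumption matching kernels of that period
(CAP_AUDIT_READ), and the textual format in which the sysctl exposes the mask
is deliberately not modelled here.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_CAP_NET_RAW  13 /* CAP_NET_RAW in <linux/capability.h> */
#define MODEL_CAP_LAST_CAP 37 /* assumption: CAP_AUDIT_READ is the last cap */

/* Default whitelist: every capability allowed, i.e. current behavior. */
uint64_t whitelist_all(void)
{
    return (UINT64_C(1) << (MODEL_CAP_LAST_CAP + 1)) - 1;
}

/* Return a copy of 'wl' with capability 'cap' removed from the whitelist. */
uint64_t whitelist_drop(uint64_t wl, int cap)
{
    assert(cap >= 0 && cap <= MODEL_CAP_LAST_CAP);
    return wl & ~(UINT64_C(1) << cap);
}
```

With the default mask nothing changes for controlled user-namespaces; after
whitelist_drop(whitelist_all(), MODEL_CAP_NET_RAW), only CAP_NET_RAW would be
refused to processes in controlled user-namespaces, while every other
capability stays available.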
Please see individual patches in this series.
Mahesh Bandewar (2):
  capability: introduce sysctl for controlled user-ns capability whitelist
  userns: control capabilities of some user namespaces

 Documentation/sysctl/kernel.txt | 21 +++++++++++++++++
 include/linux/capability.h      |  7 ++++++
 include/linux/user_namespace.h  | 25 ++++++++++++++++++++
 kernel/capability.c             | 52 +++++++++++++++++++++++++++++++++++++++++
 kernel/sysctl.c                 |  5 ++++
 kernel/user_namespace.c         |  4 ++++
 security/commoncap.c            |  8 +++++++
 7 files changed, 122 insertions(+)
--
2.15.0.531.g2ccb3012c9-goog
* Re: [PATCHv3 0/2] capability controlled user-namespaces
@ 2017-12-27 17:09 Mahesh Bandewar (महेश बंडेवार)
From: Mahesh Bandewar (महेश बंडेवार) @ 2017-12-27 17:09 UTC (permalink / raw)
To: james.l.morris-QHcLZuEGTsvQT0dZR+AlfA
Cc: LKML, Netdev, Kernel-hardening, Linux API, Kees Cook, Serge Hallyn,
Eric W . Biederman, Eric Dumazet, David Miller, Mahesh Bandewar

Hello James,

Seems like I missed adding your name to the review of this patch
series. Would you be willing to pull this into the security tree?
Serge Hallyn has already ACKed it.

Thanks,
--mahesh..

On Tue, Dec 5, 2017 at 2:30 PM, Mahesh Bandewar <mahesh-bmGAjcP2qsnk1uMJSBkQmQ@public.gmane.org> wrote:
> From: Mahesh Bandewar <maheshb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
>
> [...]

* Re: [PATCHv3 0/2] capability controlled user-namespaces
@ 2017-12-27 20:23 Michael Kerrisk (man-pages)
From: Michael Kerrisk (man-pages) @ 2017-12-27 20:23 UTC (permalink / raw)
To: Mahesh Bandewar (महेश बंडेवार)
Cc: James Morris, LKML, Netdev, Kernel-hardening, Linux API, Kees Cook,
Serge Hallyn, Eric W . Biederman, Eric Dumazet, David Miller,
Mahesh Bandewar

Hello Mahesh,

On 27 December 2017 at 18:09, Mahesh Bandewar (महेश बंडेवार)
<maheshb@google.com> wrote:
> Hello James,
>
> Seems like I missed your name to be added into the review of this
> patch series. Would you be willing to pull this into the security
> tree? Serge Hallyn has already ACKed it.

We seem to have no formal documentation/specification of this feature.
I think that should be written up before this patch goes into
mainline...

Cheers,

Michael

> On Tue, Dec 5, 2017 at 2:30 PM, Mahesh Bandewar <mahesh@bandewar.net> wrote:
>> From: Mahesh Bandewar <maheshb@google.com>
>>
>> [...]

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

* Re: [PATCHv3 0/2] capability controlled user-namespaces
@ 2017-12-28 0:45 Mahesh Bandewar (महेश बंडेवार)
From: Mahesh Bandewar (महेश बंडेवार) @ 2017-12-28 0:45 UTC (permalink / raw)
To: mtk.manpages
Cc: James Morris, LKML, Netdev, Kernel-hardening, Linux API, Kees Cook,
Serge Hallyn, Eric W . Biederman, Eric Dumazet, David Miller,
Mahesh Bandewar

On Wed, Dec 27, 2017 at 12:23 PM, Michael Kerrisk (man-pages)
<mtk.manpages@gmail.com> wrote:
> Hello Mahesh,
>
> On 27 December 2017 at 18:09, Mahesh Bandewar (महेश बंडेवार)
> <maheshb@google.com> wrote:
>> Hello James,
>>
>> Seems like I missed your name to be added into the review of this
>> patch series. Would you be willing to pull this into the security
>> tree? Serge Hallyn has already ACKed it.
>
> We seem to have no formal documentation/specification of this feature.
> I think that should be written up before this patch goes into
> mainline...
>
Absolutely. I have added enough information relevant to this feature
into the Documentation dir (please look at the individual patches) that
could be used. I could help if needed.

thanks,
--mahesh..

> Cheers,
>
> Michael
>
>> On Tue, Dec 5, 2017 at 2:30 PM, Mahesh Bandewar <mahesh@bandewar.net> wrote:
>>> From: Mahesh Bandewar <maheshb@google.com>
>>>
>>> [...]

* Re: [PATCHv3 0/2] capability controlled user-namespaces
@ 2017-12-30 8:50 Michael Kerrisk (man-pages)
From: Michael Kerrisk (man-pages) @ 2017-12-30 8:50 UTC (permalink / raw)
To: Mahesh Bandewar (महेश बंडेवार)
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w, James Morris, LKML, Netdev,
Kernel-hardening, Linux API, Kees Cook, Serge Hallyn, Eric W . Biederman,
Eric Dumazet, David Miller, Mahesh Bandewar

Hello Mahesh,

On 12/28/2017 01:45 AM, Mahesh Bandewar (महेश बंडेवार) wrote:
> On Wed, Dec 27, 2017 at 12:23 PM, Michael Kerrisk (man-pages)
> <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Hello Mahesh,
>>
>> On 27 December 2017 at 18:09, Mahesh Bandewar (महेश बंडेवार)
>> <maheshb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>>> Hello James,
>>>
>>> Seems like I missed your name to be added into the review of this
>>> patch series. Would you be willing to pull this into the security
>>> tree? Serge Hallyn has already ACKed it.
>>
>> We seem to have no formal documentation/specification of this feature.
>> I think that should be written up before this patch goes into
>> mainline...
>>
> absolutely. I have added enough information into the Documentation dir
> relevant to this feature (please look at the individual patches),
> that could be used. I could help if needed.

Yes, but I think that the documentation is rather incomplete.
I'll also reply to the relevant Documentation thread.

See also some comments below about this commit message, which
should make things *much* easier for the reader.

>>>> [...]
>>>>
>>>> The above example shows how easy it is to acquire NET_RAW capabilities
>>>> and once acquired, these processes could take benefit of above mentioned
>>>> or similar issues discovered/undiscovered with malicious intent.

But you do not actually describe what the problem is. I think
it's not sufficient to simply refer to some CVEs.
Your mail message/commit should clearly describe what the issue is,
rather than leave the reader to decipher a bunch of CVEs, and derive
your concerns from those CVEs.

>>>> Note
>>>> that this is just an example and the problem/solution is not limited
>>>> to NET_RAW capability *only*.
>>>>
>>>> The easiest fix one can apply here is to lock-down user-namespaces which
>>>> many of the distros do (i.e. don't allow users to create user namespaces),
>>>> but unfortunately that prevents everyone from using them.
>>>>
>>>> Approach
>>>> --------
>>>> Introduce a notion of 'controlled' user-namespaces. Every process on
>>>> the host is allowed to create user-namespaces (governed by the limit
>>>> imposed by per-ns sysctl) however, mark user-namespaces created by
>>>> sandboxed processes as 'controlled'. Use this 'mark' at the time of
>>>> capability check in conjunction with a global capability whitelist.
>>>> If the capability is not whitelisted, processes that belong to
>>>> controlled user-namespaces will not be allowed.
>>>>
>>>> Once a user-ns is marked as 'controlled'; all its child user-
>>>> namespaces are marked as 'controlled' too.

How is a user-ns marked as "controlled"? It is not clear at this point.
Please clarify this in your cover mail (and in the Documentation patch).

>>>> A global whitelist is list of capabilities governed by the
>>>> sysctl which is available to (privileged) user in init-ns to modify

What is "the sysctl"? Please name it at this point. (This may be purely a
language issue. Do you mean "...governed by *a* sysctl,
[sysctl-name-inserted-here]"?)

>>>> while it's applicable to all controlled user-namespaces on the host.
>>>>
>>>> Marking user-namespaces controlled without modifying the whitelist is
>>>> equivalent of the current behavior. The default value of whitelist includes
>>>> all capabilities so that the compatibility is maintained. However it gives
>>>> admins fine-grained ability to control various capabilities system wide
>>>> without locking down user-namespaces.

Is there a way that a process can see whether it is in a controlled
user-ns versus an uncontrolled user-ns? I think it would be good to
explain that in this cover mail, and perhaps also in the documentation
patch.

In general, it's not too obvious what you are trying to do, based on
this commit message.

Can I suggest including, as part of the commit messages, a walkthrough
shell session that demonstrates the use of these interfaces and how
they allow/disallow capabilities? I think such a walkthrough might also
be worth including in the Documentation patch.

Thanks,

Michael

>>>> Please see individual patches in this series.
>>>>
>>>> [...]

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

* Re: [PATCHv3 0/2] capability controlled user-namespaces 2017-12-30 8:50 ` Michael Kerrisk (man-pages) @ 2018-01-03 1:35 ` Mahesh Bandewar (महेश बंडेवार) 0 siblings, 0 replies; 19+ messages in thread From: Mahesh Bandewar (महेश बंडेवार) @ 2018-01-03 1:35 UTC (permalink / raw) To: Michael Kerrisk (man-pages) Cc: James Morris, LKML, Netdev, Kernel-hardening, Linux API, Kees Cook, Serge Hallyn, Eric W . Biederman, Eric Dumazet, David Miller, Mahesh Bandewar Hello Michael, I really don't want to turn this into how-to-hack guide but I do see few points in your argument to make the case clearer. Please see the comments inline. On Sat, Dec 30, 2017 at 12:50 AM, Michael Kerrisk (man-pages) <mtk.manpages@gmail.com> wrote: > Hello Mahesh, > > On 12/28/2017 01:45 AM, Mahesh Bandewar (महेश बंडेवार) wrote: >> On Wed, Dec 27, 2017 at 12:23 PM, Michael Kerrisk (man-pages) >> <mtk.manpages@gmail.com> wrote: >>> Hello Mahesh, >>> >>> On 27 December 2017 at 18:09, Mahesh Bandewar (महेश बंडेवार) >>> <maheshb@google.com> wrote: >>>> Hello James, >>>> >>>> Seems like I missed your name to be added into the review of this >>>> patch series. Would you be willing be pull this into the security >>>> tree? Serge Hallyn has already ACKed it. >>> >>> We seem to have no formal documentation/specification of this feature. >>> I think that should be written up before this patch goes into >>> mainline... >>> >> absolutely. I have added enough information into the Documentation dir >> relevant to this feature (please look at the individual patches), >> that could be used. I could help if needed. > > Yes, but I think that the documentation is rather incomplete. > I'll also reply to the relevant Documentation thread. > > See also some comments below about this commit message, which > should make things *much* easier for the reader. 
> >>>> On Tue, Dec 5, 2017 at 2:30 PM, Mahesh Bandewar <mahesh@bandewar.net> wrote: >>>>> From: Mahesh Bandewar <maheshb@google.com> >>>>> >>>>> TL;DR version >>>>> ------------- >>>>> Creating a sandbox environment with namespaces is challenging >>>>> considering what these sandboxed processes can engage into. e.g. >>>>> CVE-2017-6074, CVE-2017-7184, CVE-2017-7308 etc. just to name few. >>>>> Current form of user-namespaces, however, if changed a bit can allow >>>>> us to create a sandbox environment without locking down user- >>>>> namespaces. >>>>> >>>>> Detailed version >>>>> ---------------- >>>>> >>>>> Problem >>>>> ------- >>>>> User-namespaces in the current form have increased the attack surface as >>>>> any process can acquire capabilities which are not available to them (by >>>>> default) by performing combination of clone()/unshare()/setns() syscalls. >>>>> >>>>> #define _GNU_SOURCE >>>>> #include <stdio.h> >>>>> #include <sched.h> >>>>> #include <netinet/in.h> >>>>> >>>>> int main(int ac, char **av) >>>>> { >>>>> int sock = -1; >>>>> >>>>> printf("Attempting to open RAW socket before unshare()...\n"); >>>>> sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW); >>>>> if (sock < 0) { >>>>> perror("socket() SOCK_RAW failed: "); >>>>> } else { >>>>> printf("Successfully opened RAW-Sock before unshare().\n"); >>>>> close(sock); >>>>> sock = -1; >>>>> } >>>>> >>>>> if (unshare(CLONE_NEWUSER | CLONE_NEWNET) < 0) { >>>>> perror("unshare() failed: "); >>>>> return 1; >>>>> } >>>>> >>>>> printf("Attempting to open RAW socket after unshare()...\n"); >>>>> sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW); >>>>> if (sock < 0) { >>>>> perror("socket() SOCK_RAW failed: "); >>>>> } else { >>>>> printf("Successfully opened RAW-Sock after unshare().\n"); >>>>> close(sock); >>>>> sock = -1; >>>>> } >>>>> >>>>> return 0; >>>>> } >>>>> >>>>> The above example shows how easy it is to acquire NET_RAW capabilities >>>>> and once acquired, these processes could take benefit of above 
mentioned
>>>>> or similar issues discovered/undiscovered with malicious intent.
>
> But you do not actually describe what the problem is. I think
> it's not sufficient to simply refer to some CVEs.
> Your mail message/commit should clearly describe what the issue is,
> rather than leave the reader to decipher a bunch of CVEs, and derive
> your concerns from those CVEs.
>
In the 'Problem' section of this log I have described how easy it is to
acquire 'the capability', and the CVEs describe what can be exploited
once you hold 'the capability', so I'm not sure why any deciphering is
needed. Also, that is only a general example; this patch-set addresses
the issue for any capability, not just the one involved in the
above-mentioned CVEs. So rather than going deep into the CVEs, I will
try to demonstrate the modified behavior to give an idea.

>>>>> Note
>>>>> that this is just an example and the problem/solution is not limited
>>>>> to NET_RAW capability *only*.
>>>>>
>>>>> The easiest fix one can apply here is to lock-down user-namespaces which
>>>>> many of the distros do (i.e. don't allow users to create user namespaces),
>>>>> but unfortunately that prevents everyone from using them.
>>>>>
>>>>> Approach
>>>>> --------
>>>>> Introduce a notion of 'controlled' user-namespaces. Every process on
>>>>> the host is allowed to create user-namespaces (governed by the limit
>>>>> imposed by per-ns sysctl) however, mark user-namespaces created by
>>>>> sandboxed processes as 'controlled'. Use this 'mark' at the time of
>>>>> capability check in conjunction with a global capability whitelist.
>>>>> If the capability is not whitelisted, processes that belong to
>>>>> controlled user-namespaces will not be allowed.
>>>>>
>>>>> Once a user-ns is marked as 'controlled'; all its child user-
>>>>> namespaces are marked as 'controlled' too.
>
> How is a user-ns marked as "controlled"? It is not clear at this point.
> Please clarify this in your cover mail (and on the Documentation patch.)
>
Yes, I will add some more text describing how a user-ns gets marked as
controlled. It's actually part of the Documentation patch in this
series, but it does make sense to add some text here to describe it
clearly.

>>>>> A global whitelist is list of capabilities governed by the
>>>>> sysctl which is available to (privileged) user in init-ns to modify
>
> What "the sysctl? Please name it at this point. (This may be purely a
> language issue. Do you mean "...governed by *a* sysctl,
> [sysctl-name-inserted-here]"?)
>
Correct, '..governed by a sysctl var
kernel.controlled_userns_caps_whitelist'.

>>>>> while it's applicable to all controlled user-namespaces on the host.
>>>>>
>>>>> Marking user-namespaces controlled without modifying the whitelist is
>>>>> equivalent of the current behavior. The default value of whitelist includes
>>>>> all capabilities so that the compatibility is maintained. However it gives
>>>>> admins fine-grained ability to control various capabilities system wide
>>>>> without locking down user-namespaces.
>
> Is there a way that a process can see whether it is a controlled user-ns
> versus an uncontrolled user-ns? I think it would be good to explain that
> in this cover mail, and perhaps also in the documentation patch.
>
There is no direct way of doing this, and I'm not sure it's a good
idea/investment to add a new syscall just for that.

> In general, it's not too obvious what you are trying to do, based on
> this commit message.
>
> Can I suggest including as part of the commit messages a walk through
> shell session that demonstrates the use of these interfaces and how
> they allow/disallow capabilities. I think such a walkthrough might also
> be worth including in the Documentation patch.
>
OK, I'll add something to address these concerns.

Thanks,
--mahesh..

> Thanks,
>
> Michael
>
>
>>>>>
>>>>> Please see individual patches in this series.
>>>>>
>>>>> Mahesh Bandewar (2):
>>>>>   capability: introduce sysctl for controlled user-ns capability whitelist
>>>>>   userns: control capabilities of some user namespaces
>>>>>
>>>>>  Documentation/sysctl/kernel.txt | 21 +++++++++++++++++
>>>>>  include/linux/capability.h      |  7 ++++++
>>>>>  include/linux/user_namespace.h  | 25 ++++++++++++++++++++
>>>>>  kernel/capability.c             | 52 +++++++++++++++++++++++++++++++++++++++++
>>>>>  kernel/sysctl.c                 |  5 ++++
>>>>>  kernel/user_namespace.c         |  4 ++++
>>>>>  security/commoncap.c            |  8 +++++++
>>>>>  7 files changed, 122 insertions(+)
>>>>>
>>>>> --
>>>>> 2.15.0.531.g2ccb3012c9-goog
>>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-api" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>>
>>>
>>> --
>>> Michael Kerrisk
>>> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
>>> Linux/UNIX System Programming Training: http://man7.org/training/
>>
>
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply [flat|nested] 19+ messages in thread
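The check described under "Approach" in the message above can be modelled in user space. What follows is only a minimal sketch under stated assumptions, not the code from the patches: `struct userns_model`, `caps_whitelist` and `model_capable()` are hypothetical names, the whitelist is held as a plain 64-bit mask, and the two capability numbers are copied from <linux/capability.h>. It assumes the caller already holds the capability inside its namespace and models only the additional whitelist step.

```c
#include <stdbool.h>
#include <stdint.h>

/* Capability numbers as defined in <linux/capability.h>. */
#define MODEL_CAP_NET_ADMIN 12
#define MODEL_CAP_NET_RAW   13

/* Hypothetical stand-in for the per-namespace 'controlled' mark. */
struct userns_model {
	bool controlled;
};

/*
 * Hypothetical stand-in for the global whitelist: bit N set means that
 * capability N stays usable inside controlled user namespaces.
 * The default allows everything, i.e. the current behavior.
 */
static uint64_t caps_whitelist = ~UINT64_C(0);

/* Returns true if a process holding 'cap' in 'ns' may actually use it. */
static bool model_capable(const struct userns_model *ns, int cap)
{
	if (!ns->controlled)
		return true;	/* uncontrolled: the whitelist is not consulted */

	return (caps_whitelist >> cap) & 1;
}
```

With the default all-ones mask, `model_capable()` returns true for every capability, which corresponds to the "equivalent of the current behavior" default. Clearing bit 13 with `caps_whitelist &= ~(UINT64_C(1) << MODEL_CAP_NET_RAW)` makes it return false for NET_RAW in a controlled namespace, while an uncontrolled namespace is unaffected.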
* Re: [PATCHv3 0/2] capability controlled user-namespaces
2017-12-27 17:09 ` Mahesh Bandewar (महेश बंडेवार)
2017-12-27 20:23 ` Michael Kerrisk (man-pages)
@ 2017-12-30 8:31 ` James Morris
2018-01-03 1:30 ` Mahesh Bandewar (महेश बंडेवार)
1 sibling, 1 reply; 19+ messages in thread
From: James Morris @ 2017-12-30 8:31 UTC (permalink / raw)
To: Mahesh Bandewar (महेश बंडेवार)
Cc: LKML, Netdev, Kernel-hardening, Linux API, Kees Cook, Serge Hallyn,
Eric W . Biederman, Eric Dumazet, David Miller, Mahesh Bandewar

[-- Attachment #1: Type: text/plain, Size: 4949 bytes --]

On Wed, 27 Dec 2017, Mahesh Bandewar (महेश बंडेवार) wrote:

> Hello James,
>
> Seems like I missed your name to be added into the review of this
> patch series. Would you be willing be pull this into the security
> tree? Serge Hallyn has already ACKed it.

Sure!

>
> Thanks,
> --mahesh..
>
> On Tue, Dec 5, 2017 at 2:30 PM, Mahesh Bandewar <mahesh@bandewar.net> wrote:
> > From: Mahesh Bandewar <maheshb@google.com>
> >
> > [...]

--
James Morris
<james.l.morris@oracle.com>

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCHv3 0/2] capability controlled user-namespaces
2017-12-30 8:31 ` James Morris
@ 2018-01-03 1:30 ` Mahesh Bandewar (महेश बंडेवार)
2018-01-08 0:35 ` James Morris
0 siblings, 1 reply; 19+ messages in thread
From: Mahesh Bandewar (महेश बंडेवार) @ 2018-01-03 1:30 UTC (permalink / raw)
To: James Morris
Cc: LKML, Netdev, Kernel-hardening, Linux API, Kees Cook, Serge Hallyn,
Eric W . Biederman, Eric Dumazet, David Miller, Mahesh Bandewar

On Sat, Dec 30, 2017 at 12:31 AM, James Morris
<james.l.morris@oracle.com> wrote:
> On Wed, 27 Dec 2017, Mahesh Bandewar (महेश बंडेवार) wrote:
>
>> Hello James,
>>
>> Seems like I missed your name to be added into the review of this
>> patch series. Would you be willing be pull this into the security
>> tree? Serge Hallyn has already ACKed it.
>
> Sure!
>
Thank you James.

>
>>
>> Thanks,
>> --mahesh..
>>
>> On Tue, Dec 5, 2017 at 2:30 PM, Mahesh Bandewar <mahesh@bandewar.net> wrote:
>> > From: Mahesh Bandewar <maheshb@google.com>
>> >
>> > [...]
>
> --
> James Morris
> <james.l.morris@oracle.com>

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCHv3 0/2] capability controlled user-namespaces
2018-01-03 1:30 ` Mahesh Bandewar (महेश बंडेवार)
@ 2018-01-08 0:35 ` James Morris
2018-01-08 6:24 ` Serge E. Hallyn
0 siblings, 1 reply; 19+ messages in thread
From: James Morris @ 2018-01-08 0:35 UTC (permalink / raw)
To: Mahesh Bandewar (महेश बंडेवार)
Cc: LKML, Netdev, Kernel-hardening, Linux API, Kees Cook, Serge Hallyn,
Eric W . Biederman, Eric Dumazet, David Miller, Mahesh Bandewar

[-- Attachment #1: Type: text/plain, Size: 750 bytes --]

On Tue, 2 Jan 2018, Mahesh Bandewar (महेश बंडेवार) wrote:

> On Sat, Dec 30, 2017 at 12:31 AM, James Morris
> <james.l.morris@oracle.com> wrote:
> > On Wed, 27 Dec 2017, Mahesh Bandewar (महेश बंडेवार) wrote:
> >
> >> Hello James,
> >>
> >> Seems like I missed your name to be added into the review of this
> >> patch series. Would you be willing be pull this into the security
> >> tree? Serge Hallyn has already ACKed it.
> >
> > Sure!
> >
> Thank you James.

I'd like to see what Eric Biederman thinks of this.

Also, why do we need the concept of a controlled user-ns at all, if the
default whitelist maintains existing behavior?

--
James Morris
<james.l.morris@oracle.com>

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCHv3 0/2] capability controlled user-namespaces
2018-01-08 0:35 ` James Morris
@ 2018-01-08 6:24 ` Serge E. Hallyn
2018-01-08 9:51 ` James Morris
0 siblings, 1 reply; 19+ messages in thread
From: Serge E. Hallyn @ 2018-01-08 6:24 UTC (permalink / raw)
To: James Morris
Cc: Mahesh Bandewar (महेश बंडेवार), LKML, Netdev, Kernel-hardening,
Linux API, Kees Cook, Serge Hallyn, Eric W . Biederman, Eric Dumazet,
David Miller, Mahesh Bandewar

On Mon, Jan 08, 2018 at 11:35:26AM +1100, James Morris wrote:
> On Tue, 2 Jan 2018, Mahesh Bandewar (महेश बंडेवार) wrote:
>
> > On Sat, Dec 30, 2017 at 12:31 AM, James Morris
> > <james.l.morris@oracle.com> wrote:
> > > On Wed, 27 Dec 2017, Mahesh Bandewar (महेश बंडेवार) wrote:
> > >
> > >> Hello James,
> > >>
> > >> Seems like I missed your name to be added into the review of this
> > >> patch series. Would you be willing be pull this into the security
> > >> tree? Serge Hallyn has already ACKed it.
> > >
> > > Sure!
> > >
> > Thank you James.
>
> I'd like to see what Eric Biederman thinks of this.
>
> Also, why do we need the concept of a controlled user-ns at all, if the
> default whitelist maintains existing behavior?

In past discussions two uses have been brought up:

1. if an 0-day is discovered which is exacerbated by a specific
privilege in user namespaces, that privilege could be turned off until a
reboot with a fixed kernel is scheduled, without fully disabling all
containers.

2. some systems may be specifically designed to run software which
only requires a few capabilities in a userns. In that case all others
could be disabled.

^ permalink raw reply [flat|nested] 19+ messages in thread
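Both uses listed above amount to choosing a capability mask. A rough sketch of that bookkeeping follows, with hypothetical helper names (`whitelist_without()`, `whitelist_only()`) and capability numbers copied from <linux/capability.h>. How the resulting mask is written to the sysctl is defined by the Documentation patch and is deliberately not shown here.

```c
#include <stdint.h>

/* Capability numbers as defined in <linux/capability.h>. */
#define MODEL_CAP_SETGID   6
#define MODEL_CAP_SETUID   7
#define MODEL_CAP_NET_RAW 13

/*
 * Use 1: keep the current whitelist but switch off the one capability
 * that a newly published 0-day depends on.
 */
static uint64_t whitelist_without(uint64_t whitelist, int cap)
{
	return whitelist & ~(UINT64_C(1) << cap);
}

/*
 * Use 2: allow only the few capabilities the workload is known to need;
 * everything not listed ends up disabled.
 */
static uint64_t whitelist_only(const int *caps, int ncaps)
{
	uint64_t mask = 0;

	for (int i = 0; i < ncaps; i++)
		mask |= UINT64_C(1) << caps[i];
	return mask;
}
```

For example, `whitelist_without(mask, MODEL_CAP_NET_RAW)` would be the temporary mitigation for a raw-socket bug, and `whitelist_only()` with `{ MODEL_CAP_SETGID, MODEL_CAP_SETUID }` would describe a host whose sandboxes only ever need to switch IDs.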
* Re: [PATCHv3 0/2] capability controlled user-namespaces
2018-01-08 6:24 ` Serge E. Hallyn
@ 2018-01-08 9:51 ` James Morris
2018-01-08 15:47 ` Serge E. Hallyn
0 siblings, 1 reply; 19+ messages in thread
From: James Morris @ 2018-01-08 9:51 UTC (permalink / raw)
To: Serge E. Hallyn
Cc: Mahesh Bandewar (महेश बंडेवार), LKML, Netdev, Kernel-hardening,
Linux API, Kees Cook, Eric W . Biederman, Eric Dumazet, David Miller,
Mahesh Bandewar

On Mon, 8 Jan 2018, Serge E. Hallyn wrote:

> > Also, why do we need the concept of a controlled user-ns at all, if the
> > default whitelist maintains existing behavior?
>
> In past discussions two uses have been brought up:
>
> 1. if an 0-day is discovered which is exacerbated by a specific
> privilege in user namespaces, that privilege could be turned off until a
> reboot with a fixed kernel is scheduled, without fully disabling all
> containers.
>
> 2. some systems may be specifically designed to run software which
> only requires a few capabilities in a userns. In that case all others
> could be disabled.
>

I meant in terms of "marking" a user ns as "controlled" type -- it's
unnecessary jargon from an end user point of view.

This may happen internally but don't make it a special case with a
different name and don't bother users with internal concepts: simply
implement capability whitelists with the default having equivalent
behavior of everything allowed. Then, document the semantics of the
whitelist in terms of inheritance etc., as a feature of user namespaces,
not as a "type" of user namespace.

- James
--
James Morris
<james.l.morris@oracle.com>

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCHv3 0/2] capability controlled user-namespaces
2018-01-08 9:51 ` James Morris
@ 2018-01-08 15:47 ` Serge E. Hallyn
2018-01-08 17:21 ` Mahesh Bandewar (महेश बंडेवार)
0 siblings, 1 reply; 19+ messages in thread
From: Serge E. Hallyn @ 2018-01-08 15:47 UTC (permalink / raw)
To: James Morris
Cc: Serge E. Hallyn, Mahesh Bandewar (महेश बंडेवार), LKML, Netdev,
Kernel-hardening, Linux API, Kees Cook, Eric W . Biederman,
Eric Dumazet, David Miller, Mahesh Bandewar

Quoting James Morris (james.l.morris@oracle.com):
> On Mon, 8 Jan 2018, Serge E. Hallyn wrote:
>
> > > Also, why do we need the concept of a controlled user-ns at all, if the
> > > default whitelist maintains existing behavior?
> >
> > In past discussions two uses have been brought up:
> >
> > 1. if an 0-day is discovered which is exacerbated by a specific
> > privilege in user namespaces, that privilege could be turned off until a
> > reboot with a fixed kernel is scheduled, without fully disabling all
> > containers.
> >
> > 2. some systems may be specifically designed to run software which
> > only requires a few capabilities in a userns. In that case all others
> > could be disabled.
> >
>
> I meant in terms of "marking" a user ns as "controlled" type -- it's
> unnecessary jargon from an end user point of view.

Ah, yes, that was my point in

http://lkml.iu.edu/hypermail/linux/kernel/1711.1/01845.html
and
http://lkml.iu.edu/hypermail/linux/kernel/1711.1/02276.html

> This may happen internally but don't make it a special case with a
> different name and don't bother users with internal concepts: simply
> implement capability whitelists with the default having equivalent
> behavior of everything allowed. Then, document the semantics of the
> whitelist in terms of inheritance etc., as a feature of user namespaces,
> not as a "type" of user namespace.

The problem with making them inheritable is that an adversarial user
can just create a user namespace at boot that sits and waits for an
0day to be published, then log in and attach to that namespace later,
since it has already inherited the open whitelist.

It feels like there must be some other approach that doesn't feel as...
band-aid-y as this does, but I'm not sure what.

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCHv3 0/2] capability controlled user-namespaces
2018-01-08 15:47 ` Serge E. Hallyn
@ 2018-01-08 17:21 ` Mahesh Bandewar (महेश बंडेवार)
[not found] ` <CAF2d9jgVJpuAH+jgK0v7sQ9Pr75xy=GSnqKDdpeE7d97O0EbcQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Mahesh Bandewar (महेश बंडेवार) @ 2018-01-08 17:21 UTC (permalink / raw)
To: Serge E. Hallyn
Cc: James Morris, LKML, Netdev, Kernel-hardening, Linux API, Kees Cook,
Eric W . Biederman, Eric Dumazet, David Miller, Mahesh Bandewar

On Mon, Jan 8, 2018 at 7:47 AM, Serge E. Hallyn <serge@hallyn.com> wrote:
> Quoting James Morris (james.l.morris@oracle.com):
>> On Mon, 8 Jan 2018, Serge E. Hallyn wrote:
>>
>> > > Also, why do we need the concept of a controlled user-ns at all, if the
>> > > default whitelist maintains existing behavior?
>> >
>> > In past discussions two uses have been brought up:
>> >
>> > 1. if an 0-day is discovered which is exacerbated by a specific
>> > privilege in user namespaces, that privilege could be turned off until a
>> > reboot with a fixed kernel is scheduled, without fully disabling all
>> > containers.
>> >
>> > 2. some systems may be specifically designed to run software which
>> > only requires a few capabilities in a userns. In that case all others
>> > could be disabled.
>> >
>>
>> I meant in terms of "marking" a user ns as "controlled" type -- it's
>> unnecessary jargon from an end user point of view.
>
> Ah, yes, that was my point in
>
> http://lkml.iu.edu/hypermail/linux/kernel/1711.1/01845.html
> and
> http://lkml.iu.edu/hypermail/linux/kernel/1711.1/02276.html
>
>> This may happen internally but don't make it a special case with a
>> different name and don't bother users with internal concepts: simply
>> implement capability whitelists with the default having equivalent
>> behavior of everything allowed. Then, document the semantics of the
>> whitelist in terms of inheritance etc., as a feature of user namespaces,
>> not as a "type" of user namespace.
>
> The problem with making them inheritable is that an adversarial user
> can just create a user namespace at boot that sits and waits for an
> 0day to be published, then log in and attach to that namespace later,
> since it has already inherited the open whitelist.
>
> It feels like there must be some other approach that doesn't feel as...
> band-aid-y as this does, but I'm not sure what.

We had a long discussion thread about this approach in the past, and
many of these points were discussed there. Serge is right on both points
(the 0-day issue as well as the sandbox environment). At this moment we
are exposed to those threats, and apart from this patch-set I'm not
aware of anything that handles them. Of course there are alternatives
that block user-ns creation altogether, but blocking user-ns is not a
real solution that works in every use-case. I'm open to other ideas (if
any) that do not block creation of user-ns, but without them the 0-day
issue will keep lingering, especially in environments where user-ns
creation is used heavily.

The 'controlled user-ns' jargon stays within kernel space and is not
exposed to users (we don't have any API to do that); I used those terms
to explain to the kernel community what this patch-set is trying to do.
User applications do not need to change, nor do they need to know about
any of these changes. However, this additional knob gives the admin the
ability to control their behavior in those two circumstances. The
default behavior chosen in this patch-set is such that it doesn't cause
a regression for anyone, whatever their use case is, but the admin can
now set whatever default behavior they wish in the boot scripts to suit
their needs.

Thanks,
--mahesh..

^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <CAF2d9jgVJpuAH+jgK0v7sQ9Pr75xy=GSnqKDdpeE7d97O0EbcQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCHv3 0/2] capability controlled user-namespaces
[not found] ` <CAF2d9jgVJpuAH+jgK0v7sQ9Pr75xy=GSnqKDdpeE7d97O0EbcQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-01-08 18:11 ` Serge E. Hallyn
2018-01-08 18:24 ` Mahesh Bandewar (महेश बंडेवार)
0 siblings, 1 reply; 19+ messages in thread
From: Serge E. Hallyn @ 2018-01-08 18:11 UTC (permalink / raw)
To: Mahesh Bandewar (महेश बंडेवार)
Cc: Serge E. Hallyn, James Morris, LKML, Netdev, Kernel-hardening,
Linux API, Kees Cook, Eric W . Biederman, Eric Dumazet, David Miller,
Mahesh Bandewar

Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@google.com):
> On Mon, Jan 8, 2018 at 7:47 AM, Serge E. Hallyn <serge@hallyn.com> wrote:
> > Quoting James Morris (james.l.morris@oracle.com):
> >> On Mon, 8 Jan 2018, Serge E. Hallyn wrote:
> >> I meant in terms of "marking" a user ns as "controlled" type -- it's
> >> unnecessary jargon from an end user point of view.
> >
> > Ah, yes, that was my point in
> >
> > http://lkml.iu.edu/hypermail/linux/kernel/1711.1/01845.html
> > and
> > http://lkml.iu.edu/hypermail/linux/kernel/1711.1/02276.html
> >
> >> This may happen internally but don't make it a special case with a
> >> different name and don't bother users with internal concepts: simply
> >> implement capability whitelists with the default having equivalent

So the challenge is to have unprivileged users be contained, while
allowing trusted workloads in containers created by a root user to
bypass the restriction.

Now, the current proposal actually doesn't support a root user starting
an application that it doesn't quite trust in such a way that it *is*
subject to the whitelist. Which is unfortunate. But apart from using
ptags or a cgroup, I can't think of a good way to get us everything we
want:

1. unprivileged users always restricted
2. existing unprivileged containers become restricted when whitelist
is enabled
3. privileged users are able to create containers which are not restricted
4. privileged users are able to create containers which *are* restricted

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCHv3 0/2] capability controlled user-namespaces
2018-01-08 18:11 ` Serge E. Hallyn
@ 2018-01-08 18:24 ` Mahesh Bandewar (महेश बंडेवार)
2018-01-08 18:36 ` Serge E. Hallyn
0 siblings, 1 reply; 19+ messages in thread
From: Mahesh Bandewar (महेश बंडेवार) @ 2018-01-08 18:24 UTC (permalink / raw)
To: Serge E. Hallyn
Cc: James Morris, LKML, Netdev, Kernel-hardening, Linux API, Kees Cook,
Eric W . Biederman, Eric Dumazet, David Miller, Mahesh Bandewar

On Mon, Jan 8, 2018 at 10:11 AM, Serge E. Hallyn <serge@hallyn.com> wrote:
> Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@google.com):
>> On Mon, Jan 8, 2018 at 7:47 AM, Serge E. Hallyn <serge@hallyn.com> wrote:
>> > Quoting James Morris (james.l.morris@oracle.com):
>> >> On Mon, 8 Jan 2018, Serge E. Hallyn wrote:
>> >> I meant in terms of "marking" a user ns as "controlled" type -- it's
>> >> unnecessary jargon from an end user point of view.
>> >
>> > Ah, yes, that was my point in
>> >
>> > http://lkml.iu.edu/hypermail/linux/kernel/1711.1/01845.html
>> > and
>> > http://lkml.iu.edu/hypermail/linux/kernel/1711.1/02276.html
>> >
>> >> This may happen internally but don't make it a special case with a
>> >> different name and don't bother users with internal concepts: simply
>> >> implement capability whitelists with the default having equivalent
>
> So the challenge is to have unprivileged users be contained, while
> allowing trusted workloads in containers created by a root user to
> bypass the restriction.
>
> Now, the current proposal actually doesn't support a root user starting
> an application that it doesn't quite trust in such a way that it *is*
> subject to the whitelist.

Well, this is not hard, since a root process can spawn another process
and lose privileges before creating the user-ns, so that it is
controlled by the whitelist.

You need the ability to keep creating user-namespaces that exhibit 'the
uncontrolled behavior', and only a trusted/privileged (root) user should
have it, which is maintained here.

> Which is unfortunate. But apart from using
> ptags or a cgroup, I can't think of a good way to get us everything we
> want:
>
> 1. unprivileged users always restricted
> 2. existing unprivileged containers become restricted when whitelist
> is enabled
> 3. privileged users are able to create containers which are not restricted

All of this is achieved by the patch-set, without any changes to the
application, using the above knob.

> 4. privileged users are able to create containers which *are* restricted
>
With this patch-set, the root user process can fork another process
with fewer privileges before creating a user-ns if the exec-ed process
cannot be trusted. So there is a way, with a little modification, as
opposed to nothing being available at this moment for this scenario.

^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCHv3 0/2] capability controlled user-namespaces 2018-01-08 18:24 ` Mahesh Bandewar (महेश बंडेवार) @ 2018-01-08 18:36 ` Serge E. Hallyn 2018-01-08 18:55 ` Mahesh Bandewar (महेश बंडेवार) 0 siblings, 1 reply; 19+ messages in thread From: Serge E. Hallyn @ 2018-01-08 18:36 UTC (permalink / raw) To: Mahesh Bandewar (महेश बंडेवार) Cc: Serge E. Hallyn, James Morris, LKML, Netdev, Kernel-hardening, Linux API, Kees Cook, Eric W . Biederman, Eric Dumazet, David Miller, Mahesh Bandewar Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@google.com): > On Mon, Jan 8, 2018 at 10:11 AM, Serge E. Hallyn <serge@hallyn.com> wrote: > > Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@google.com): > >> On Mon, Jan 8, 2018 at 7:47 AM, Serge E. Hallyn <serge@hallyn.com> wrote: > >> > Quoting James Morris (james.l.morris@oracle.com): > >> >> On Mon, 8 Jan 2018, Serge E. Hallyn wrote: > >> >> I meant in terms of "marking" a user ns as "controlled" type -- it's > >> >> unnecessary jargon from an end user point of view. > >> > > >> > Ah, yes, that was my point in > >> > > >> > http://lkml.iu.edu/hypermail/linux/kernel/1711.1/01845.html > >> > and > >> > http://lkml.iu.edu/hypermail/linux/kernel/1711.1/02276.html > >> > > >> >> This may happen internally but don't make it a special case with a > >> >> different name and don't bother users with internal concepts: simply > >> >> implement capability whitelists with the default having equivalent > > > > So the challenge is to have unprivileged users be contained, while > > allowing trusted workloads in containers created by a root user to > > bypass the restriction. > > > > Now, the current proposal actually doesn't support a root user starting > > an application that it doesn't quite trust in such a way that it *is* > > subject to the whitelist. > > Well, this is not hard since root process can spawn another process > and loose privileges before creating user-ns to be controlled by the > whitelist. 
It would have to drop cap_sys_admin for the container to be marked as
"controlled", which may prevent the container runtime from properly starting
the container.

> You need an ability to preserve the creation of user-namespaces that
> exhibit 'the uncontrolled behavior' and only trusted/privileged (root)
> user should have it which is maintained here.
>
> > Which is unfortunate. But apart from using
> > ptags or a cgroup, I can't think of a good way to get us everything we
> > want:
> >
> > 1. unprivileged users always restricted
> > 2. existing unprivileged containers become restricted when whitelist
> > is enabled
> > 3. privileged users are able to create containers which are not restricted
>
> All this is achieved by the patch-set without any changes to the
> application with the above knob.
>
> > 4. privileged users are able to create containers which *are* restricted
> >
> With this patch-set, the root user process can fork another process
> with fewer privileges before creating a user-ns if the exec-ed process
> cannot be trusted. So there is a way with little modification as
> opposed to nothing available at this moment for this scenario.
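The "drop cap_sys_admin before creating the user-ns" step discussed above can be sketched as follows. This is a minimal illustration, assuming (as the thread describes) that a namespace is marked "controlled" when its creator lacks CAP_SYS_ADMIN. The helper names has_effective_cap(), drop_cap() and unshare_userns_without_sys_admin() are hypothetical and not part of the patch-set; only the capget/capset and unshare system calls are real interfaces.

```c
#define _GNU_SOURCE
#include <linux/capability.h>
#include <sched.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Read the calling thread's capability sets; returns 0 on success. */
static int read_caps(struct __user_cap_data_struct *data)
{
	struct __user_cap_header_struct hdr = {
		.version = _LINUX_CAPABILITY_VERSION_3,
		.pid = 0, /* 0 means the calling thread */
	};

	return (int)syscall(SYS_capget, &hdr, data);
}

/* Hypothetical helper: 1 if cap is effective, 0 if not, -1 on error. */
int has_effective_cap(int cap)
{
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

	if (read_caps(data) < 0)
		return -1;
	return (data[CAP_TO_INDEX(cap)].effective & CAP_TO_MASK(cap)) != 0;
}

/*
 * Hypothetical helper: clear cap from the effective, permitted and
 * inheritable sets of the calling thread. Shrinking one's own sets
 * needs no privilege, so this also succeeds for unprivileged callers.
 */
int drop_cap(int cap)
{
	struct __user_cap_header_struct hdr = {
		.version = _LINUX_CAPABILITY_VERSION_3,
		.pid = 0,
	};
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

	if (read_caps(data) < 0)
		return -1;
	data[CAP_TO_INDEX(cap)].effective &= ~CAP_TO_MASK(cap);
	data[CAP_TO_INDEX(cap)].permitted &= ~CAP_TO_MASK(cap);
	data[CAP_TO_INDEX(cap)].inheritable &= ~CAP_TO_MASK(cap);
	return (int)syscall(SYS_capset, &hdr, data);
}

/*
 * Hypothetical helper: the sequence a forked child of the runtime would
 * run. Under the stated assumption, the resulting user-ns would be
 * created by a task without CAP_SYS_ADMIN.
 */
int unshare_userns_without_sys_admin(void)
{
	if (drop_cap(CAP_SYS_ADMIN) < 0)
		return -1;
	return unshare(CLONE_NEWUSER);
}
```

The awkward part raised in the reply is visible here: any setup that still needs CAP_SYS_ADMIN has to finish before drop_cap() is called, or be done by a different process.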
* Re: [PATCHv3 0/2] capability controlled user-namespaces
2018-01-08 18:36 ` Serge E. Hallyn
@ 2018-01-08 18:55 ` Mahesh Bandewar (महेश बंडेवार)
2018-01-09 22:28 ` Serge E. Hallyn
0 siblings, 1 reply; 19+ messages in thread
From: Mahesh Bandewar (महेश बंडेवार) @ 2018-01-08 18:55 UTC (permalink / raw)
To: Serge E. Hallyn
Cc: James Morris, LKML, Netdev, Kernel-hardening, Linux API, Kees Cook,
Eric W . Biederman, Eric Dumazet, David Miller, Mahesh Bandewar

On Mon, Jan 8, 2018 at 10:36 AM, Serge E. Hallyn <serge@hallyn.com> wrote:
> Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@google.com):
>> On Mon, Jan 8, 2018 at 10:11 AM, Serge E. Hallyn <serge@hallyn.com> wrote:
>> > Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@google.com):
>> >> On Mon, Jan 8, 2018 at 7:47 AM, Serge E. Hallyn <serge@hallyn.com> wrote:
>> >> > Quoting James Morris (james.l.morris@oracle.com):
>> >> >> On Mon, 8 Jan 2018, Serge E. Hallyn wrote:
>> >> >> I meant in terms of "marking" a user ns as "controlled" type -- it's
>> >> >> unnecessary jargon from an end user point of view.
>> >> >
>> >> > Ah, yes, that was my point in
>> >> >
>> >> > http://lkml.iu.edu/hypermail/linux/kernel/1711.1/01845.html
>> >> > and
>> >> > http://lkml.iu.edu/hypermail/linux/kernel/1711.1/02276.html
>> >> >
>> >> >> This may happen internally but don't make it a special case with a
>> >> >> different name and don't bother users with internal concepts: simply
>> >> >> implement capability whitelists with the default having equivalent
>> >
>> > So the challenge is to have unprivileged users be contained, while
>> > allowing trusted workloads in containers created by a root user to
>> > bypass the restriction.
>> >
>> > Now, the current proposal actually doesn't support a root user starting
>> > an application that it doesn't quite trust in such a way that it *is*
>> > subject to the whitelist.
>>
>> Well, this is not hard since root process can spawn another process
>> and lose privileges before creating user-ns to be controlled by the
>> whitelist.
>
> It would have to drop cap_sys_admin for the container to be marked as
> "controlled", which may prevent the container runtime from properly starting
> the container.
>
Yes, but that's a conflict of trusted operations (that require
SYS_ADMIN) and untrusted processes it may spawn.

>> You need an ability to preserve the creation of user-namespaces that
>> exhibit 'the uncontrolled behavior' and only trusted/privileged (root)
>> user should have it which is maintained here.
>>
>> > Which is unfortunate. But apart from using
>> > ptags or a cgroup, I can't think of a good way to get us everything we
>> > want:
>> >
>> > 1. unprivileged users always restricted
>> > 2. existing unprivileged containers become restricted when whitelist
>> > is enabled
>> > 3. privileged users are able to create containers which are not restricted
>>
>> All this is achieved by the patch-set without any changes to the
>> application with the above knob.
>>
>> > 4. privileged users are able to create containers which *are* restricted
>> >
>> With this patch-set, the root user process can fork another process
>> with fewer privileges before creating a user-ns if the exec-ed process
>> cannot be trusted. So there is a way with little modification as
>> opposed to nothing available at this moment for this scenario.
* Re: [PATCHv3 0/2] capability controlled user-namespaces
2018-01-08 18:55 ` Mahesh Bandewar (महेश बंडेवार)
@ 2018-01-09 22:28 ` Serge E. Hallyn
[not found] ` <20180109222859.GA25956-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
0 siblings, 1 reply; 19+ messages in thread
From: Serge E. Hallyn @ 2018-01-09 22:28 UTC (permalink / raw)
To: Mahesh Bandewar (महेश बंडेवार)
Cc: Serge E. Hallyn, James Morris, LKML, Netdev, Kernel-hardening,
Linux API, Kees Cook, Eric W . Biederman, Eric Dumazet, David Miller,
Mahesh Bandewar

Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@google.com):
> On Mon, Jan 8, 2018 at 10:36 AM, Serge E. Hallyn <serge@hallyn.com> wrote:
> > Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@google.com):
> >> On Mon, Jan 8, 2018 at 10:11 AM, Serge E. Hallyn <serge@hallyn.com> wrote:
> >> > Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@google.com):
> >> >> On Mon, Jan 8, 2018 at 7:47 AM, Serge E. Hallyn <serge@hallyn.com> wrote:
> >> >> > Quoting James Morris (james.l.morris@oracle.com):
> >> >> >> On Mon, 8 Jan 2018, Serge E. Hallyn wrote:
> >> >> >> I meant in terms of "marking" a user ns as "controlled" type -- it's
> >> >> >> unnecessary jargon from an end user point of view.
> >> >> >
> >> >> > Ah, yes, that was my point in
> >> >> >
> >> >> > http://lkml.iu.edu/hypermail/linux/kernel/1711.1/01845.html
> >> >> > and
> >> >> > http://lkml.iu.edu/hypermail/linux/kernel/1711.1/02276.html
> >> >> >
> >> >> >> This may happen internally but don't make it a special case with a
> >> >> >> different name and don't bother users with internal concepts: simply
> >> >> >> implement capability whitelists with the default having equivalent
> >> >
> >> > So the challenge is to have unprivileged users be contained, while
> >> > allowing trusted workloads in containers created by a root user to
> >> > bypass the restriction.
> >> >
> >> > Now, the current proposal actually doesn't support a root user starting
> >> > an application that it doesn't quite trust in such a way that it *is*
> >> > subject to the whitelist.
> >>
> >> Well, this is not hard since root process can spawn another process
> >> and lose privileges before creating user-ns to be controlled by the
> >> whitelist.
> >
> > It would have to drop cap_sys_admin for the container to be marked as
> > "controlled", which may prevent the container runtime from properly starting
> > the container.
> >
> Yes, but that's a conflict of trusted operations (that require
> SYS_ADMIN) and untrusted processes it may spawn.

Not sure I understand what you're saying, but

I guess that in any case the task which is doing unshare(CLONE_NEWNS)
can drop cap_sys_admin first. Though that is harder if using clone,
and it is awkward because it's not the container manager, but the user,
who will judge whether the container workload should be restricted.
So the container driver will add a flag like "run-controlled", and
the driver will convert that to dropping a capability; which again
is weird. It would seem nicer to introduce a userns flag, 'caps-controlled'.
For an unprivileged userns, it is always set to 1, and root cannot
change it. For a root-created userns, it stays 0, but root can set it
to 1 (using /proc file?). In this way either a container runtime or just an
admin script can say "no wait I want this container to still be controlled".

Or we could instead add a second sysctl to decide whether all or only
'controlled' user namespaces should be controlled. That's not pretty though.
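The 'caps-controlled' flag semantics proposed above can be modelled in a few lines of plain C. This is only a userspace sketch of the stated rules, not kernel code: struct userns_model and both functions are hypothetical names, and treating the flag as one-way (root can set it to 1 but not clear it again) is an assumption, since the proposal only says that root can set it.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a user namespace carrying the proposed flag. */
struct userns_model {
	bool created_by_privileged; /* creator held CAP_SYS_ADMIN */
	bool caps_controlled;       /* the proposed 'caps-controlled' flag */
};

/* An unprivileged creator always yields 1; a root-created ns starts at 0. */
struct userns_model userns_model_create(bool creator_privileged)
{
	struct userns_model ns = {
		.created_by_privileged = creator_privileged,
		.caps_controlled = !creator_privileged,
	};

	return ns;
}

/*
 * Models a write to the flag (for example through a /proc file).
 * Returns 0 on success or -EPERM when the change is not allowed.
 */
int userns_model_set_controlled(struct userns_model *ns,
				bool writer_is_root, bool value)
{
	if (!writer_is_root)
		return -EPERM;
	/* For an unprivileged userns even root cannot change the flag. */
	if (!ns->created_by_privileged)
		return -EPERM;
	/* Assumption: once set, the flag cannot be cleared again. */
	if (ns->caps_controlled && !value)
		return -EPERM;
	ns->caps_controlled = value;
	return 0;
}
```

With this shape, a container runtime or an admin script would make one write after creating the namespace, instead of dropping a capability before creating it.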
[parent not found: <20180109222859.GA25956-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>]
* Re: [PATCHv3 0/2] capability controlled user-namespaces
[not found] ` <20180109222859.GA25956-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
@ 2018-01-10 2:08 ` Mahesh Bandewar (महेश बंडेवार)
0 siblings, 0 replies; 19+ messages in thread
From: Mahesh Bandewar (महेश बंडेवार) @ 2018-01-10 2:08 UTC (permalink / raw)
To: Serge E. Hallyn
Cc: James Morris, LKML, Netdev, Kernel-hardening, Linux API, Kees Cook,
Eric W . Biederman, Eric Dumazet, David Miller, Mahesh Bandewar

On Tue, Jan 9, 2018 at 2:28 PM, Serge E. Hallyn <serge@hallyn.com> wrote:
> Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@google.com):
>> On Mon, Jan 8, 2018 at 10:36 AM, Serge E. Hallyn <serge@hallyn.com> wrote:
>> > Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@google.com):
>> >> On Mon, Jan 8, 2018 at 10:11 AM, Serge E. Hallyn <serge@hallyn.com> wrote:
>> >> > Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@google.com):
>> >> >> On Mon, Jan 8, 2018 at 7:47 AM, Serge E. Hallyn <serge@hallyn.com> wrote:
>> >> >> > Quoting James Morris (james.l.morris@oracle.com):
>> >> >> >> On Mon, 8 Jan 2018, Serge E. Hallyn wrote:
>> >> >> >> I meant in terms of "marking" a user ns as "controlled" type -- it's
>> >> >> >> unnecessary jargon from an end user point of view.
>> >> >> >
>> >> >> > Ah, yes, that was my point in
>> >> >> >
>> >> >> > http://lkml.iu.edu/hypermail/linux/kernel/1711.1/01845.html
>> >> >> > and
>> >> >> > http://lkml.iu.edu/hypermail/linux/kernel/1711.1/02276.html
>> >> >> >
>> >> >> >> This may happen internally but don't make it a special case with a
>> >> >> >> different name and don't bother users with internal concepts: simply
>> >> >> >> implement capability whitelists with the default having equivalent
>> >> >
>> >> > So the challenge is to have unprivileged users be contained, while
>> >> > allowing trusted workloads in containers created by a root user to
>> >> > bypass the restriction.
>> >> >
>> >> > Now, the current proposal actually doesn't support a root user starting
>> >> > an application that it doesn't quite trust in such a way that it *is*
>> >> > subject to the whitelist.
>> >>
>> >> Well, this is not hard since root process can spawn another process
>> >> and lose privileges before creating user-ns to be controlled by the
>> >> whitelist.
>> >
>> > It would have to drop cap_sys_admin for the container to be marked as
>> > "controlled", which may prevent the container runtime from properly starting
>> > the container.
>> >
>> Yes, but that's a conflict of trusted operations (that require
>> SYS_ADMIN) and untrusted processes it may spawn.
>
> Not sure I understand what you're saying, but
>
> I guess that in any case the task which is doing unshare(CLONE_NEWNS)
> can drop cap_sys_admin first. Though that is harder if using clone,
> and it is awkward because it's not the container manager, but the user,
> who will judge whether the container workload should be restricted.
> So the container driver will add a flag like "run-controlled", and
> the driver will convert that to dropping a capability; which again
> is weird. It would seem nicer to introduce a userns flag, 'caps-controlled'.
> For an unprivileged userns, it is always set to 1, and root cannot
> change it.
> For a root-created userns, it stays 0, but root can set it
> to 1 (using /proc file?). In this way either a container runtime or just an
> admin script can say "no wait I want this container to still be controlled".
>
> Or we could instead add a second sysctl to decide whether all or only
> 'controlled' user namespaces should be controlled. That's not pretty though.
>
Yes, I like your idea of a flag to clone() which will force the
user-ns to be controlled. This will have an effect only for the root
user; for any other user, specifying it is effectively a NOP, since
those namespaces will be controlled with or without that flag. But
this is still an enhancement to the current patch-set and I don't mind
doing it as a follow-up after this patch-series.

At this moment James has asked for Eric's input, which I believe
hasn't been recorded.
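The effect of such a force flag, combined with the capability whitelist, can be sketched as a pure decision function. Everything here is an illustrative assumption: the function names are hypothetical, the flag is represented as a plain boolean because no actual clone() flag was defined in this thread, and the whitelist is modelled as a simple 64-bit mask applied only to controlled namespaces.

```c
#include <linux/capability.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per capability number, as in the kernel's capability masks. */
#define CAP_BIT(cap) (UINT64_C(1) << (cap))

/*
 * Hypothetical decision: a user-ns is controlled when its creator lacks
 * CAP_SYS_ADMIN, or when a privileged creator asks for it explicitly.
 * For an unprivileged creator the flag changes nothing (a NOP).
 */
bool userns_is_controlled(bool creator_has_sys_admin, bool force_controlled)
{
	return !creator_has_sys_admin || force_controlled;
}

/*
 * Hypothetical masking step: inside a controlled user-ns only the
 * whitelisted capabilities remain usable; otherwise nothing is masked.
 */
uint64_t userns_usable_caps(uint64_t requested, uint64_t whitelist,
			    bool controlled)
{
	return controlled ? (requested & whitelist) : requested;
}
```

For example, with a whitelist that omits CAP_NET_RAW, a controlled namespace asking for CAP_NET_RAW and CAP_NET_ADMIN would keep only CAP_NET_ADMIN, while a root-created namespace without the flag would keep both.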
end of thread, newest message 2018-01-10 2:08 UTC
Thread overview: 19+ messages
2017-12-05 22:30 [PATCHv3 0/2] capability controlled user-namespaces Mahesh Bandewar
[not found] ` <20171205223052.12687-1-mahesh-bmGAjcP2qsnk1uMJSBkQmQ@public.gmane.org>
2017-12-27 17:09 ` Mahesh Bandewar (महेश बंडेवार)
2017-12-27 20:23 ` Michael Kerrisk (man-pages)
2017-12-28 0:45 ` Mahesh Bandewar (महेश बंडेवार)
[not found] ` <CAF2d9jjCJxu+oiCCSa1zN8OxfdiCMQb4dx7Mc0YdNgJuMNkOzw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-30 8:50 ` Michael Kerrisk (man-pages)
2018-01-03 1:35 ` Mahesh Bandewar (महेश बंडेवार)
2017-12-30 8:31 ` James Morris
2018-01-03 1:30 ` Mahesh Bandewar (महेश बंडेवार)
2018-01-08 0:35 ` James Morris
2018-01-08 6:24 ` Serge E. Hallyn
2018-01-08 9:51 ` James Morris
2018-01-08 15:47 ` Serge E. Hallyn
2018-01-08 17:21 ` Mahesh Bandewar (महेश बंडेवार)
[not found] ` <CAF2d9jgVJpuAH+jgK0v7sQ9Pr75xy=GSnqKDdpeE7d97O0EbcQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-01-08 18:11 ` Serge E. Hallyn
2018-01-08 18:24 ` Mahesh Bandewar (महेश बंडेवार)
2018-01-08 18:36 ` Serge E. Hallyn
2018-01-08 18:55 ` Mahesh Bandewar (महेश बंडेवार)
2018-01-09 22:28 ` Serge E. Hallyn
[not found] ` <20180109222859.GA25956-7LNsyQBKDXoIagZqoN9o3w@public.gmane.org>
2018-01-10 2:08 ` Mahesh Bandewar (महेश बंडेवार)