From: "Serge E. Hallyn" <serge@hallyn.com>
To: Boris Lukashev <blukashev@sempervictus.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>,
"Daniel Micay" <danielmicay@gmail.com>,
"Mahesh Bandewar (महेश बंडेवार)" <maheshb@google.com>,
"Mahesh Bandewar" <mahesh@bandewar.net>,
LKML <linux-kernel@vger.kernel.org>,
Netdev <netdev@vger.kernel.org>,
Kernel-hardening <kernel-hardening@lists.openwall.com>,
"Linux API" <linux-api@vger.kernel.org>,
"Kees Cook" <keescook@chromium.org>,
"Eric W . Biederman" <ebiederm@xmission.com>,
"Eric Dumazet" <edumazet@google.com>,
"David Miller" <davem@davemloft.net>
Subject: Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
Date: Mon, 6 Nov 2017 21:28:02 -0600
Message-ID: <20171107032802.GA6669@mail.hallyn.com>
In-Reply-To: <CAFUG7CcW077LHcQEqk7qy7iVvmi-3J8psD1Kwj45XvHThiZC6w@mail.gmail.com>

On Mon, Nov 06, 2017 at 07:01:58PM -0500, Boris Lukashev wrote:
> On Mon, Nov 6, 2017 at 6:39 PM, Serge E. Hallyn <serge@hallyn.com> wrote:
> > Quoting Boris Lukashev (blukashev@sempervictus.com):
> >> On Mon, Nov 6, 2017 at 5:14 PM, Serge E. Hallyn <serge@hallyn.com> wrote:
> >> > Quoting Daniel Micay (danielmicay@gmail.com):
> >> >> Substantial added attack surface will never go away as a problem. There
> >> >> aren't a finite number of vulnerabilities to be found.
> >> >
> >> > There's varying levels of usefulness and quality. There is code which I
> >> > want to be able to use in a container, and code which I can't ever see a
> >> > reason for using there. The latter, especially if it's also in a
> >> > staging driver, would be nice to have a toggle to disable.
> >> >
> >> > You're not advocating dropping the added attack surface, only adding a
> >> > way of dealing with an 0day after the fact. Privilege raising 0days can
> >> > exist anywhere, not just in code which only root in a user namespace can
> >> > exercise. So from that point of view, ksplice seems a more complete
> >> > solution. Why not just actually fix the bad code block when we know
> >> > about it?
> >> >
> >> > Finally, it has been well argued that you can gain many new caps from
> >> > having only a few others. Given that, how could you ever be sure that,
> >> > if an 0day is found which allows root in a user ns to abuse
> >> > CAP_NET_ADMIN against the host, just keeping CAP_NET_ADMIN from them
> >> > would suffice? It seems to me that the existing control in
> >> > /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape
> >> > in that case.
> >> >
> >> > -serge
> >>
> >> This seems to be heading toward "we need full zones in Linux" with
> >> their own procfs and sysfs namespace and a stricter isolation model
> >> for resources and capabilities. So long as things can happen in a
> >> namespace which have a privileged relationship with host resources,
> >> this is going to be cat-and-mouse to one degree or another.
> >>
> >> Containers and namespaces don't have a one-to-one relationship, so I'm
> >> not sure that's the best term to use in the kernel security context
> >
> > Sorry - what's not the best term to use?
>
> Pardon, "containers," since they're namespaces+system construct.
>
> >
> >> since there's a bunch of userspace and implementation delta across the
> >> different systems (with their own security models and so forth).
> >> Without accounting for what a specific implementation may or may not
> >> do, and only looking at "how do we reduce privileged impact on parent
> >> context from unprivileged namespaces," this patch does seem to provide
> >> a logical way of reducing the privileges available in such a namespace
> >> and often needed to mount escapes/impact parent context.
> >
> > What different implementations do is irrelevant - as an unprivileged user
> > I can always, with no help, create a new user namespace mapping my current
> > uid to root, and exercise this code. So the security model implemented
> > by a particular userspace namespace-using driver doesn't matter, as it
> > only restricts me if I choose to use it.
> >
> > But, I guess you're actually saying that some program might know that it
> > should never use network code so want to drop CAP_NET_*? And you're
> > saying that a "global capability bounding set" might be useful?
> >
>
> The "global capability bounding set" with forced inheritance can be
> used to prevent the vector you describe wherein the capability of UID
> 0 in the child NS is restricted from the parent implicitly, so yes,
> that nomenclature seems appropriate.
>
> > Would it be better to actually implement it as a new bounding set that
> > is maintained across user namespace creations, but is per-task (inherited
> > by children of course)? Instead of a sysctl?
> >
> > -serge
>
> In line with the previous comment, the inheritance across subsequent
> invocations should be forced to prevent the context you described.
> Please pardon my ignorance, not sure what you mean in terms of
> "per-task" across namespace creation.

I meant that each task would have a perm_cap_bset alongside its cap_bset.
So task p1 (if it has privilege) can drop CAP_SYS_ADMIN from its
perm_cap_bset, and p2 (if it has privilege) can drop CAP_NET_ADMIN from
its own. When p1 creates a new user_ns, the init task of that namespace
has its cap_bset set to all caps but CAP_SYS_ADMIN.

I think for simplicity perm_cap_bset would *only* affect the filling
of cap_bset at user namespace creation. So if you wanted to drop a
capability from your own cap_bset as well, you'd have to do that
separately.
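
To make the proposed semantics concrete, here is a rough sketch as a toy
Python model. It is not kernel code and no such interface exists: only
the names perm_cap_bset and cap_bset come from the description above,
while Task, drop_perm, fork, create_user_ns, and the three-capability
ALL_CAPS set are hypothetical and chosen for illustration. Treating the
init task of the new namespace as privileged is also an assumption.

```python
from dataclasses import dataclass

# Illustrative subset only; the real capability list is much longer.
ALL_CAPS = frozenset({"CAP_SYS_ADMIN", "CAP_NET_ADMIN", "CAP_SYS_PTRACE"})


@dataclass
class Task:
    """Toy model of a task carrying two bounding sets (hypothetical)."""
    privileged: bool = True
    cap_bset: frozenset = ALL_CAPS        # ordinary bounding set
    perm_cap_bset: frozenset = ALL_CAPS   # proposed extra per-task set

    def drop_perm(self, cap: str) -> None:
        """Drop a capability from perm_cap_bset only; cap_bset is untouched."""
        if not self.privileged:
            raise PermissionError("dropping from perm_cap_bset needs privilege")
        self.perm_cap_bset = self.perm_cap_bset - {cap}

    def fork(self) -> "Task":
        """Children inherit both sets unchanged."""
        return Task(self.privileged, self.cap_bset, self.perm_cap_bset)

    def create_user_ns(self) -> "Task":
        """Return the init task of a new user namespace.

        Its cap_bset is filled from the creator's perm_cap_bset, and the
        perm_cap_bset carries over so nested namespaces stay bounded.
        Assumption: the new init task counts as privileged for further drops.
        """
        return Task(True, self.perm_cap_bset, self.perm_cap_bset)


# p1 and p2 drop different capabilities from their own perm_cap_bset.
p1, p2 = Task(), Task()
p1.drop_perm("CAP_SYS_ADMIN")
p2.drop_perm("CAP_NET_ADMIN")

# The init task of p1's new namespace lacks CAP_SYS_ADMIN in cap_bset,
# while p1's own cap_bset still contains it.
ns_init = p1.create_user_ns()
nested_init = ns_init.fork().create_user_ns()
```

In this model the restriction survives nesting because perm_cap_bset is
copied at every fork and namespace creation, which is the "maintained
across user namespace creations" property, without touching the
creator's own cap_bset.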