[kernel-hardening] [PATCH v2] sysctl: allow CLONE

kernel-hardening.lists.openwall.com archive mirror
 help / color / mirror / Atom feed

* [kernel-hardening] [PATCH v2] sysctl: allow CLONE_NEWUSER to be disabled
@ 2016-01-28 14:38 Kees Cook
  2016-01-28 15:28 ` Serge E. Hallyn
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Kees Cook @ 2016-01-28 14:38 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Al Viro, Serge E. Hallyn, Andy Lutomirski,
	Austin S. Hemmelgarn, Richard Weinberger,
	Robert Święcki, Dmitry Vyukov, David Howells,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet, Sasha Levin,
	linux-doc, linux-kernel, kernel-hardening

There continue to be unexpected security exposures when users have access
to CLONE_NEWUSER. For admins of systems that do not use user namespaces
and are running distro kernels with CONFIG_USER_NS enabled, there is
no way to disable CLONE_NEWUSER. This provides a way for sysadmins to
disable the feature to reduce their attack surface without needing to
rebuild their kernels.

This is inspired by a similar restriction in Grsecurity, but adds
a sysctl.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
This is the simplified version of the sysctl.
---
 Documentation/sysctl/kernel.txt | 14 ++++++++++++++
 kernel/sysctl.c                 | 14 ++++++++++++++
 kernel/user_namespace.c         |  6 ++++++
 3 files changed, 34 insertions(+)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index a93b414672a7..dcbd3f99efb3 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -85,6 +85,7 @@ show up in /proc/sys/kernel:
 - tainted
 - threads-max
 - unknown_nmi_panic
+- userns_restrict
 - watchdog
 - watchdog_thresh
 - version
@@ -930,6 +931,19 @@ example.  If a system hangs up, try pressing the NMI switch.
 
 ==============================================================
 
+userns_restrict:
+
+This toggle indicates whether CLONE_NEWUSER is available. As CLONE_NEWUSER
+has many unexpected side-effects and security exposures, this allows the
+sysadmin to disable the feature without needing to rebuild the kernel.
+
+When userns_restrict is set to (0), the default, there are no restrictions.
+
+When userns_restrict is set to (1), CLONE_NEWUSER is only available to
+processes that have CAP_SYS_ADMIN, CAP_SETUID, and CAP_SETGID.
+
+==============================================================
+
 watchdog:
 
 This parameter can be used to disable or enable the soft lockup detector
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 97715fd9e790..9f99c8d9e968 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -112,6 +112,9 @@ extern int sysctl_nr_open_min, sysctl_nr_open_max;
 #ifndef CONFIG_MMU
 extern int sysctl_nr_trim_pages;
 #endif
+#ifdef CONFIG_USER_NS
+extern int sysctl_userns_restrict;
+#endif
 
 /* Constants used for minimum and  maximum */
 #ifdef CONFIG_LOCKUP_DETECTOR
@@ -817,6 +820,17 @@ static struct ctl_table kern_table[] = {
 		.extra2		= &two,
 	},
 #endif
+#ifdef CONFIG_USER_NS
+	{
+		.procname	= "userns_restrict",
+		.data		= &sysctl_userns_restrict,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+#endif
 	{
 		.procname	= "ngroups_max",
 		.data		= &ngroups_max,
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 9bafc211930c..3cace8637144 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -25,6 +25,7 @@
 
 static struct kmem_cache *user_ns_cachep __read_mostly;
 static DEFINE_MUTEX(userns_state_mutex);
+int sysctl_userns_restrict __read_mostly;
 
 static bool new_idmap_permitted(const struct file *file,
 				struct user_namespace *ns, int cap_setid,
@@ -84,6 +85,11 @@ int create_user_ns(struct cred *new)
 	    !kgid_has_mapping(parent_ns, group))
 		return -EPERM;
 
+	if (sysctl_userns_restrict && !(capable(CAP_SYS_ADMIN) &&
+					capable(CAP_SETUID) &&
+					capable(CAP_SETGID)))
+		return -EPERM;
+
 	ns = kmem_cache_zalloc(user_ns_cachep, GFP_KERNEL);
 	if (!ns)
 		return -ENOMEM;
-- 
2.6.3


-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [kernel-hardening] [PATCH v2] sysctl: allow CLONE_NEWUSER to be disabled
  2016-01-28 14:38 [kernel-hardening] [PATCH v2] sysctl: allow CLONE_NEWUSER to be disabled Kees Cook
@ 2016-01-28 15:28 ` Serge E. Hallyn
  2016-01-28 17:41 ` [kernel-hardening] " Eric W. Biederman
  2016-01-28 17:48 ` Eric W. Biederman
  2 siblings, 0 replies; 7+ messages in thread
From: Serge E. Hallyn @ 2016-01-28 15:28 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Eric W. Biederman, Andrew Morton, Al Viro, Serge E. Hallyn,
	Andy Lutomirski, Austin S. Hemmelgarn, Richard Weinberger,
	Robert Święcki, Dmitry Vyukov, David Howells,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet, Sasha Levin,
	linux-doc, linux-kernel

On Thu, Jan 28, 2016 at 06:38:25AM -0800, Kees Cook wrote:
> There continue to be unexpected security exposures when users have access
> to CLONE_NEWUSER. For admins of systems that do not use user namespaces
> and are running distro kernels with CONFIG_USER_NS enabled, there is
> no way to disable CLONE_NEWUSER. This provides a way for sysadmins to
> disable the feature to reduce their attack surface without needing to
> rebuild their kernels.
> 
> This is inspired by a similar restriction in Grsecurity, but adds
> a sysctl.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>

> ---
> This is the simplified version of the sysctl.
> ---
>  Documentation/sysctl/kernel.txt | 14 ++++++++++++++
>  kernel/sysctl.c                 | 14 ++++++++++++++
>  kernel/user_namespace.c         |  6 ++++++
>  3 files changed, 34 insertions(+)
> 
> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
> index a93b414672a7..dcbd3f99efb3 100644
> --- a/Documentation/sysctl/kernel.txt
> +++ b/Documentation/sysctl/kernel.txt
> @@ -85,6 +85,7 @@ show up in /proc/sys/kernel:
>  - tainted
>  - threads-max
>  - unknown_nmi_panic
> +- userns_restrict
>  - watchdog
>  - watchdog_thresh
>  - version
> @@ -930,6 +931,19 @@ example.  If a system hangs up, try pressing the NMI switch.
>  
>  ==============================================================
>  
> +userns_restrict:
> +
> +This toggle indicates whether CLONE_NEWUSER is available. As CLONE_NEWUSER
> +has many unexpected side-effects and security exposures, this allows the
> +sysadmin to disable the feature without needing to rebuild the kernel.
> +
> +When userns_restrict is set to (0), the default, there are no restrictions.
> +
> +When userns_restrict is set to (1), CLONE_NEWUSER is only available to
> +processes that have CAP_SYS_ADMIN, CAP_SETUID, and CAP_SETGID.
> +
> +==============================================================
> +
>  watchdog:
>  
>  This parameter can be used to disable or enable the soft lockup detector
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 97715fd9e790..9f99c8d9e968 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -112,6 +112,9 @@ extern int sysctl_nr_open_min, sysctl_nr_open_max;
>  #ifndef CONFIG_MMU
>  extern int sysctl_nr_trim_pages;
>  #endif
> +#ifdef CONFIG_USER_NS
> +extern int sysctl_userns_restrict;
> +#endif
>  
>  /* Constants used for minimum and  maximum */
>  #ifdef CONFIG_LOCKUP_DETECTOR
> @@ -817,6 +820,17 @@ static struct ctl_table kern_table[] = {
>  		.extra2		= &two,
>  	},
>  #endif
> +#ifdef CONFIG_USER_NS
> +	{
> +		.procname	= "userns_restrict",
> +		.data		= &sysctl_userns_restrict,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= &zero,
> +		.extra2		= &one,
> +	},
> +#endif
>  	{
>  		.procname	= "ngroups_max",
>  		.data		= &ngroups_max,
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 9bafc211930c..3cace8637144 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -25,6 +25,7 @@
>  
>  static struct kmem_cache *user_ns_cachep __read_mostly;
>  static DEFINE_MUTEX(userns_state_mutex);
> +int sysctl_userns_restrict __read_mostly;
>  
>  static bool new_idmap_permitted(const struct file *file,
>  				struct user_namespace *ns, int cap_setid,
> @@ -84,6 +85,11 @@ int create_user_ns(struct cred *new)
>  	    !kgid_has_mapping(parent_ns, group))
>  		return -EPERM;
>  
> +	if (sysctl_userns_restrict && !(capable(CAP_SYS_ADMIN) &&
> +					capable(CAP_SETUID) &&
> +					capable(CAP_SETGID)))
> +		return -EPERM;
> +
>  	ns = kmem_cache_zalloc(user_ns_cachep, GFP_KERNEL);
>  	if (!ns)
>  		return -ENOMEM;
> -- 
> 2.6.3
> 
> 
> -- 
> Kees Cook
> Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [kernel-hardening] Re: [PATCH v2] sysctl: allow CLONE_NEWUSER to be disabled
  2016-01-28 14:38 [kernel-hardening] [PATCH v2] sysctl: allow CLONE_NEWUSER to be disabled Kees Cook
  2016-01-28 15:28 ` Serge E. Hallyn
@ 2016-01-28 17:41 ` Eric W. Biederman
  2016-01-28 20:08   ` Kees Cook
  2016-01-28 17:48 ` Eric W. Biederman
  2 siblings, 1 reply; 7+ messages in thread
From: Eric W. Biederman @ 2016-01-28 17:41 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Al Viro, Serge E. Hallyn, Andy Lutomirski,
	Austin S. Hemmelgarn, Richard Weinberger,
	Robert Święcki, Dmitry Vyukov, David Howells,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet, Sasha Levin,
	linux-doc, linux-kernel, kernel-hardening

Kees Cook <keescook@chromium.org> writes:

> There continue to be unexpected security exposures when users have access
> to CLONE_NEWUSER.

So how does this sucessfully address that issue?

> For admins of systems that do not use user namespaces
> and are running distro kernels with CONFIG_USER_NS enabled, there is
> no way to disable CLONE_NEWUSER. This provides a way for sysadmins to
> disable the feature to reduce their attack surface without needing to
> rebuild their kernels.
>
> This is inspired by a similar restriction in Grsecurity, but adds
> a sysctl.
>

I have already nacked this patch.   Thank you for removing the broken
capability in sysctl check.  But this does not address any of the other
issues I have raised.

Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>

Further as far as I can tell this is just about a witch hunt.  Isn't
that what you call a campaign against something when the complaining
party does not understand something persecutes it and does not bother to
try and understand?

I have already told you what kind of direction would be acceptable.  I
gave concrete suggests and here you are wasting our time with this patch
again.

Eric

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [kernel-hardening] Re: [PATCH v2] sysctl: allow CLONE_NEWUSER to be disabled
  2016-01-28 14:38 [kernel-hardening] [PATCH v2] sysctl: allow CLONE_NEWUSER to be disabled Kees Cook
  2016-01-28 15:28 ` Serge E. Hallyn
  2016-01-28 17:41 ` [kernel-hardening] " Eric W. Biederman
@ 2016-01-28 17:48 ` Eric W. Biederman
  2016-01-28 19:11   ` Robert Święcki
  2 siblings, 1 reply; 7+ messages in thread
From: Eric W. Biederman @ 2016-01-28 17:48 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Al Viro, Serge E. Hallyn, Andy Lutomirski,
	Austin S. Hemmelgarn, Richard Weinberger,
	Robert Święcki, Dmitry Vyukov, David Howells,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet, Sasha Levin,
	linux-doc, linux-kernel, kernel-hardening

Kees Cook <keescook@chromium.org> writes:

> +	if (sysctl_userns_restrict && !(capable(CAP_SYS_ADMIN) &&
> +					capable(CAP_SETUID) &&
> +					capable(CAP_SETGID)))
> +		return -EPERM;
> +

I will also note that the way I have seen containers used this check
adds no security and is not mentioned or justified in any way in your
patch description.

Furthermore this looks like blame shifting.  And quite frankly shifting
the responsibility to users if they get hacked is not an acceptable
attitude.

Eric

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [kernel-hardening] Re: [PATCH v2] sysctl: allow CLONE_NEWUSER to be disabled
  2016-01-28 17:48 ` Eric W. Biederman
@ 2016-01-28 19:11   ` Robert Święcki
  2016-01-28 20:17     ` Kees Cook
  0 siblings, 1 reply; 7+ messages in thread
From: Robert Święcki @ 2016-01-28 19:11 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Kees Cook, Andrew Morton, Al Viro, Serge E. Hallyn,
	Andy Lutomirski, Austin S. Hemmelgarn, Richard Weinberger,
	Dmitry Vyukov, David Howells, Kostya Serebryany,
	Alexander Potapenko, Eric Dumazet, Sasha Levin, linux-doc, LKML,
	kernel-hardening

2016-01-28 18:48 GMT+01:00 Eric W. Biederman <ebiederm@xmission.com>:
> Kees Cook <keescook@chromium.org> writes:
>
>> +     if (sysctl_userns_restrict && !(capable(CAP_SYS_ADMIN) &&
>> +                                     capable(CAP_SETUID) &&
>> +                                     capable(CAP_SETGID)))
>> +             return -EPERM;
>> +
>
> I will also note that the way I have seen containers used this check
> adds no security and is not mentioned or justified in any way in your
> patch description.
>
> Furthermore this looks like blame shifting.  And quite frankly shifting
> the responsibility to users if they get hacked is not an acceptable
> attitude.

I think I might start understanding your point. Which, if I'm not
mistaken, is that it's not user namespaces which are buggy, but rather
some pieces of the kernel which would otherwise not be reachable from
the typical low-priv level of regular users (e.g. bugs in SOCK_RAW
sockets or iptables or mounts)?

If so, I can agree with such wording, but the proposed sysctl might
still be needed in such case. I guess those bits of the kernel which
were not reachable previously from non-priv users historically got
much less attention in terms of time spent on security reviews and
security fuzzing. And as much as users of the kernel would like to see
those pieces of the kernel to be tested to a level that the attack
surface reachable from unprivileged users level were tested, it will
not happen tomorrow. And our best option now might be to have some
switchable setting to disable this attack surface for those users who
feel they need it. In the meantime, we can concentrate on sec
reviewing those newly reachable kernel APIs, so some day we could
remove this sysctl.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [kernel-hardening] Re: [PATCH v2] sysctl: allow CLONE_NEWUSER to be disabled
  2016-01-28 17:41 ` [kernel-hardening] " Eric W. Biederman
@ 2016-01-28 20:08   ` Kees Cook
  0 siblings, 0 replies; 7+ messages in thread
From: Kees Cook @ 2016-01-28 20:08 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrew Morton, Al Viro, Serge E. Hallyn, Andy Lutomirski,
	Austin S. Hemmelgarn, Richard Weinberger,
	Robert Święcki, Dmitry Vyukov, David Howells,
	Kostya Serebryany, Alexander Potapenko, Eric Dumazet, Sasha Levin,
	linux-doc@vger.kernel.org, LKML,
	kernel-hardening@lists.openwall.com

On Thu, Jan 28, 2016 at 9:41 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Kees Cook <keescook@chromium.org> writes:
>
>> There continue to be unexpected security exposures when users have access
>> to CLONE_NEWUSER.
>
> So how does this sucessfully address that issue?

I've explained this several times. It disables new user namespaces.
I've asked earlier but got no answer: is revocation a requirement for
this feature?

>> For admins of systems that do not use user namespaces
>> and are running distro kernels with CONFIG_USER_NS enabled, there is
>> no way to disable CLONE_NEWUSER. This provides a way for sysadmins to
>> disable the feature to reduce their attack surface without needing to
>> rebuild their kernels.
>>
>> This is inspired by a similar restriction in Grsecurity, but adds
>> a sysctl.
>>
>
> I have already nacked this patch.   Thank you for removing the broken
> capability in sysctl check.  But this does not address any of the other
> issues I have raised.

I also asked about implementation directions but got no reply: do you
want this as an rlimit, or something else?

> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
> Further as far as I can tell this is just about a witch hunt.  Isn't
> that what you call a campaign against something when the complaining
> party does not understand something persecutes it and does not bother to
> try and understand?
>
> I have already told you what kind of direction would be acceptable.  I
> gave concrete suggests and here you are wasting our time with this patch
> again.

I wanted to have a "clean" version of the sysctl patch and figured I'd
share it since it was the implementation that Andy and others had
showed interest in, even if it won't be the final version of this
feature.

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [kernel-hardening] Re: [PATCH v2] sysctl: allow CLONE_NEWUSER to be disabled
  2016-01-28 19:11   ` Robert Święcki
@ 2016-01-28 20:17     ` Kees Cook
  0 siblings, 0 replies; 7+ messages in thread
From: Kees Cook @ 2016-01-28 20:17 UTC (permalink / raw)
  To: Robert Święcki
  Cc: Eric W. Biederman, Andrew Morton, Al Viro, Serge E. Hallyn,
	Andy Lutomirski, Austin S. Hemmelgarn, Richard Weinberger,
	Dmitry Vyukov, David Howells, Kostya Serebryany,
	Alexander Potapenko, Eric Dumazet, Sasha Levin,
	linux-doc@vger.kernel.org, LKML,
	kernel-hardening@lists.openwall.com

On Thu, Jan 28, 2016 at 11:11 AM, Robert Święcki <robert@swiecki.net> wrote:
> 2016-01-28 18:48 GMT+01:00 Eric W. Biederman <ebiederm@xmission.com>:
>> Kees Cook <keescook@chromium.org> writes:
>>
>>> +     if (sysctl_userns_restrict && !(capable(CAP_SYS_ADMIN) &&
>>> +                                     capable(CAP_SETUID) &&
>>> +                                     capable(CAP_SETGID)))
>>> +             return -EPERM;
>>> +
>>
>> I will also note that the way I have seen containers used this check
>> adds no security and is not mentioned or justified in any way in your
>> patch description.
>>
>> Furthermore this looks like blame shifting.  And quite frankly shifting
>> the responsibility to users if they get hacked is not an acceptable
>> attitude.
>
> I think I might start understanding your point. Which, if I'm not
> mistaken, is that it's not user namespaces which are buggy, but rather
> some pieces of the kernel which would otherwise not be reachable from
> the typical low-priv level of regular users (e.g. bugs in SOCK_RAW
> sockets or iptables or mounts)?
>
> If so, I can agree with such wording, but the proposed sysctl might
> still be needed in such case. I guess those bits of the kernel which
> were not reachable previously from non-priv users historically got
> much less attention in terms of time spent on security reviews and
> security fuzzing. And as much as users of the kernel would like to see
> those pieces of the kernel to be tested to a level that the attack
> surface reachable from unprivileged users level were tested, it will
> not happen tomorrow. And our best option now might be to have some
> switchable setting to disable this attack surface for those users who
> feel they need it. In the meantime, we can concentrate on sec
> reviewing those newly reachable kernel APIs, so some day we could
> remove this sysctl.

Yes, exactly. I want to find a way to offer admins a way to reduce
attack surface. I have no interest in blame; this is a matter of
practicality. It exposes a less well tested set of APIs to
non-privileged users, so let's offer a way to remove that when
desired.

-Kees

-- 
Kees Cook
Chrome OS & Brillo Security

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-01-28 20:17 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-01-28 14:38 [kernel-hardening] [PATCH v2] sysctl: allow CLONE_NEWUSER to be disabled Kees Cook
2016-01-28 15:28 ` Serge E. Hallyn
2016-01-28 17:41 ` [kernel-hardening] " Eric W. Biederman
2016-01-28 20:08   ` Kees Cook
2016-01-28 17:48 ` Eric W. Biederman
2016-01-28 19:11   ` Robert Święcki
2016-01-28 20:17     ` Kees Cook

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).