From: "Serge E. Hallyn" <serge@hallyn.com>
To: Jonathan Calmels <jcalmels@3xx0.net>
Cc: brauner@kernel.org, ebiederm@xmission.com,
Luis Chamberlain <mcgrof@kernel.org>,
Kees Cook <keescook@chromium.org>,
Joel Granados <j.granados@samsung.com>,
Serge Hallyn <serge@hallyn.com>, Paul Moore <paul@paul-moore.com>,
James Morris <jmorris@namei.org>,
David Howells <dhowells@redhat.com>,
Jarkko Sakkinen <jarkko@kernel.org>,
containers@lists.linux.dev, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org,
linux-security-module@vger.kernel.org, keyrings@vger.kernel.org
Subject: Re: [PATCH 3/3] capabilities: add cap userns sysctl mask
Date: Sun, 19 May 2024 22:38:29 -0500
Message-ID: <20240520033829.GB1816262@mail.hallyn.com>
In-Reply-To: <20240516092213.6799-4-jcalmels@3xx0.net>
On Thu, May 16, 2024 at 02:22:05AM -0700, Jonathan Calmels wrote:
> This patch adds a new system-wide userns capability mask designed to mask
> off capabilities in user namespaces.
>
> This mask is controlled through a sysctl and can be set early in the boot
> process or on the kernel command line to exclude known capabilities from
> ever being gained in namespaces. Once set, it can be further restricted to
> exert dynamic policies on the system (e.g. ward off a potential exploit).
>
> Changing this mask requires privileges over CAP_SYS_ADMIN and CAP_SETPCAP
> in the initial user namespace.
>
> Example:
>
> # sysctl -qw kernel.cap_userns_mask=0x1fffffdffff && \
> unshare -r grep Cap /proc/self/status
> CapInh: 0000000000000000
> CapPrm: 000001fffffdffff
> CapEff: 000001fffffdffff
> CapBnd: 000001fffffdffff
> CapAmb: 0000000000000000
> CapUNs: 000001fffffdffff
>
> Signed-off-by: Jonathan Calmels <jcalmels@3xx0.net>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
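
For anyone decoding the example mask above, here is a minimal userspace sketch (not part of the patch) that lists which capability bits a given mask leaves cleared. The helper name cleared_caps() and the CAP_LAST_CAP_ASSUMED constant are made up for illustration. The sketch assumes capability numbers run from 0 to 40, which matches the 41 bits visible in the CapPrm line of the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: highest capability number is 40, as the example output suggests. */
#define CAP_LAST_CAP_ASSUMED 40

/*
 * Hypothetical helper: store in out[] every capability number whose bit
 * is cleared in mask, and return how many were stored. out[] must have
 * room for CAP_LAST_CAP_ASSUMED + 1 entries.
 */
int cleared_caps(uint64_t mask, int *out)
{
	int n = 0;

	assert(out != NULL);
	for (int cap = 0; cap <= CAP_LAST_CAP_ASSUMED; cap++) {
		if (!(mask & (UINT64_C(1) << cap)))
			out[n++] = cap;
	}
	return n;
}
```

Called with 0x1fffffdffff, this reports a single cleared bit, number 17, which is CAP_SYS_RAWIO in include/uapi/linux/capability.h.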
> ---
> include/linux/user_namespace.h | 7 ++++
> kernel/sysctl.c | 10 ++++++
> kernel/user_namespace.c | 66 ++++++++++++++++++++++++++++++++++
> 3 files changed, 83 insertions(+)
>
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index 6030a8235617..e3478bd54ee5 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -2,6 +2,7 @@
> #ifndef _LINUX_USER_NAMESPACE_H
> #define _LINUX_USER_NAMESPACE_H
>
> +#include <linux/capability.h>
> #include <linux/kref.h>
> #include <linux/nsproxy.h>
> #include <linux/ns_common.h>
> @@ -14,6 +15,12 @@
> #define UID_GID_MAP_MAX_BASE_EXTENTS 5
> #define UID_GID_MAP_MAX_EXTENTS 340
>
> +#ifdef CONFIG_SYSCTL
> +extern kernel_cap_t cap_userns_mask;
> +int proc_cap_userns_handler(struct ctl_table *table, int write,
> + void *buffer, size_t *lenp, loff_t *ppos);
> +#endif
> +
> struct uid_gid_extent {
> u32 first;
> u32 lower_first;
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 81cc974913bb..1546eebd6aea 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -62,6 +62,7 @@
> #include <linux/sched/sysctl.h>
> #include <linux/mount.h>
> #include <linux/userfaultfd_k.h>
> +#include <linux/user_namespace.h>
> #include <linux/pid.h>
>
> #include "../lib/kstrtox.h"
> @@ -1846,6 +1847,15 @@ static struct ctl_table kern_table[] = {
> .mode = 0444,
> .proc_handler = proc_dointvec,
> },
> +#ifdef CONFIG_USER_NS
> + {
> + .procname = "cap_userns_mask",
> + .data = &cap_userns_mask,
> + .maxlen = sizeof(kernel_cap_t),
> + .mode = 0644,
> + .proc_handler = proc_cap_userns_handler,
> + },
> +#endif
> #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86)
> {
> .procname = "unknown_nmi_panic",
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 53848e2b68cd..e0cf606e9140 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -26,6 +26,66 @@
> static struct kmem_cache *user_ns_cachep __ro_after_init;
> static DEFINE_MUTEX(userns_state_mutex);
>
> +#ifdef CONFIG_SYSCTL
> +static DEFINE_SPINLOCK(cap_userns_lock);
> +kernel_cap_t cap_userns_mask = CAP_FULL_SET;
> +
> +int proc_cap_userns_handler(struct ctl_table *table, int write,
> + void *buffer, size_t *lenp, loff_t *ppos)
> +{
> + struct ctl_table t;
> + unsigned long mask_array[2];
> + kernel_cap_t new_mask, *mask;
> + int err;
> +
> + if (write && (!capable(CAP_SETPCAP) ||
> + !capable(CAP_SYS_ADMIN)))
> + return -EPERM;
> +
> + /*
> + * convert from the global kernel_cap_t to the ulong array to print to
> + * userspace if this is a read.
> + *
> + * capabilities are exposed as one 64-bit value or two 32-bit values
> + * depending on the architecture
> + */
> + mask = table->data;
> + spin_lock(&cap_userns_lock);
> + mask_array[0] = (unsigned long) mask->val;
> +#if BITS_PER_LONG != 64
> + mask_array[1] = mask->val >> BITS_PER_LONG;
> +#endif
> + spin_unlock(&cap_userns_lock);
> +
> + t = *table;
> + t.data = &mask_array;
> +
> + /*
> + * actually read or write an array of ulongs from userspace. Remember
> + * these are least significant bits first
> + */
> + err = proc_doulongvec_minmax(&t, write, buffer, lenp, ppos);
> + if (err < 0)
> + return err;
> +
> + new_mask.val = mask_array[0];
> +#if BITS_PER_LONG != 64
> + new_mask.val += (u64)mask_array[1] << BITS_PER_LONG;
> +#endif
> +
> + /*
> + * Drop everything not in the new_mask (but don't add things)
> + */
> + if (write) {
> + spin_lock(&cap_userns_lock);
> + *mask = cap_intersect(*mask, new_mask);
> + spin_unlock(&cap_userns_lock);
> + }
> +
> + return 0;
> +}
> +#endif
> +
> static bool new_idmap_permitted(const struct file *file,
> struct user_namespace *ns, int cap_setid,
> struct uid_gid_map *map);
> @@ -46,6 +106,12 @@ static void set_cred_user_ns(struct cred *cred, struct user_namespace *user_ns)
> /* Limit userns capabilities to our parent's bounding set. */
> if (iscredsecure(cred, SECURE_USERNS_STRICT_CAPS))
> cred->cap_userns = cap_intersect(cred->cap_userns, cred->cap_bset);
> +#ifdef CONFIG_SYSCTL
> + /* Mask off userns capabilities that are not permitted by the system-wide mask. */
> + spin_lock(&cap_userns_lock);
> + cred->cap_userns = cap_intersect(cred->cap_userns, cap_userns_mask);
> + spin_unlock(&cap_userns_lock);
> +#endif
>
> /* Start with the capabilities defined in the userns set. */
> cred->cap_bset = cred->cap_userns;
> --
> 2.45.0
>
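
The two ideas in proc_cap_userns_handler() can be modelled in plain userspace C: the 64-bit mask is copied into an array of unsigned long (least significant word first) for proc_doulongvec_minmax(), and a write can only clear bits because the new value is intersected with the current one. The sketch below is an illustration under assumptions, not the kernel code: mask_to_words(), words_to_mask() and restrict_mask() are hypothetical names, a plain uint64_t stands in for kernel_cap_t, and the spinlock is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Bits per unsigned long, and how many of them hold 64 bits (1 or 2). */
#define WORD_BITS  (sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS ((64 + WORD_BITS - 1) / WORD_BITS)

/* Split a 64-bit mask into unsigned long words, least significant first. */
void mask_to_words(uint64_t mask, unsigned long words[MASK_WORDS])
{
	assert(words != NULL);
	for (size_t i = 0; i < MASK_WORDS; i++)
		words[i] = (unsigned long)(mask >> (i * WORD_BITS));
}

/* Recombine the words into a 64-bit mask. */
uint64_t words_to_mask(const unsigned long words[MASK_WORDS])
{
	uint64_t mask = 0;

	assert(words != NULL);
	for (size_t i = 0; i < MASK_WORDS; i++)
		mask |= (uint64_t)words[i] << (i * WORD_BITS);
	return mask;
}

/* Restrict-only update: bits absent from either operand stay cleared. */
uint64_t restrict_mask(uint64_t current, uint64_t requested)
{
	return current & requested;
}
```

Because the update is an intersection, writing back a wider value after a bit has been dropped leaves the mask unchanged, which is the "don't add things" behaviour described in the handler's comment.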