[PATCH v6 bpf-next 00/13] BPF token and BPF FS-based delegation

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v6 bpf-next 00/13] BPF token and BPF FS-based delegation
@ 2023-09-27 22:57 Andrii Nakryiko
  2023-09-27 22:57 ` [PATCH v6 bpf-next 01/13] bpf: align CAP_NET_ADMIN checks with bpf_capable() approach Andrii Nakryiko
                   ` (12 more replies)
  0 siblings, 13 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2023-09-27 22:57 UTC (permalink / raw)
  To: bpf, netdev
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

This patch set introduces an ability to delegate a subset of BPF subsystem
functionality from privileged system-wide daemon (e.g., systemd or any other
container manager) through special mount options for userns-bound BPF FS to
a *trusted* unprivileged application. Trust is the key here. This
functionality is not about allowing unconditional unprivileged BPF usage.
Establishing trust, though, is completely up to the discretion of respective
privileged application that would create and mount a BPF FS instance with
delegation enabled, as different production setups can and do achieve it
through a combination of different means (signing, LSM, code reviews, etc),
and it's undesirable and infeasible for kernel to enforce any particular way
of validating trustworthiness of particular process.

The main motivation for this work is a desire to enable containerized BPF
applications to be used together with user namespaces. This is currently
impossible, as CAP_BPF, required for BPF subsystem usage, cannot be namespaced
or sandboxed, as a general rule. E.g., tracing BPF programs, thanks to BPF
helpers like bpf_probe_read_kernel() and bpf_probe_read_user() can safely read
arbitrary memory, and it's impossible to ensure that they only read memory of
processes belonging to any given namespace. This means that it's impossible to
have a mechanically verifiable namespace-aware CAP_BPF capability, and as such
another mechanism to allow safe usage of BPF functionality is necessary.BPF FS
delegation mount options and BPF token derived from such BPF FS instance is
such a mechanism. Kernel makes no assumption about what "trusted" constitutes
in any particular case, and it's up to specific privileged applications and
their surrounding infrastructure to decide that. What kernel provides is a set
of APIs to setup and mount special BPF FS instanecs and derive BPF tokens from
it. BPF FS and BPF token are both bound to its owning userns and in such a way
are constrained inside intended container. Users can then pass BPF token FD to
privileged bpf() syscall commands, like BPF map creation and BPF program
loading, to perform such operations without having init userns privileged.

This v4 incorporates feedback and suggestions ([3]) received on v3 of this
patch set, and instead of allowing to create BPF tokens directly assuming
capable(CAP_SYS_ADMIN), we instead enhance BPF FS to accepts a few new
delegation mount options. If these options are used and BPF FS itself is
properly created, set up, and mounted inside the user namespaced container,
user application is able to derive a BPF token object from BPF FS instance,
and pass that token to bpf() syscall. As explained in patch #2, BPF token
itself doesn't grant access to BPF functionality, but instead allows kernel to
do namespaced capabilities checks (ns_capable() vs capable()) for CAP_BPF,
CAP_PERFMON, CAP_NET_ADMIN, and CAP_SYS_ADMIN, as applicable. So it forms one
half of a puzzle and allows container managers and sys admins to have safe and
flexible configuration options: determining which containers get delegation of
BPF functionality through BPF FS, and then which applications within such
containers are allowed to perform bpf() commands, based on namespaces
capabilities.

Previous attempt at addressing this very same problem ([0]) attempted to
utilize authoritative LSM approach, but was conclusively rejected by upstream
LSM maintainers. BPF token concept is not changing anything about LSM
approach, but can be combined with LSM hooks for very fine-grained security
policy. Some ideas about making BPF token more convenient to use with LSM (in
particular custom BPF LSM programs) was briefly described in recent LSF/MM/BPF
2023 presentation ([1]). E.g., an ability to specify user-provided data
(context), which in combination with BPF LSM would allow implementing a very
dynamic and fine-granular custom security policies on top of BPF token. In the
interest of minimizing API surface area and discussions this was relegated to
follow up patches, as it's not essential to the fundamental concept of
delegatable BPF token.

It should be noted that BPF token is conceptually quite similar to the idea of
/dev/bpf device file, proposed by Song a while ago ([2]). The biggest
difference is the idea of using virtual anon_inode file to hold BPF token and
allowing multiple independent instances of them, each (potentially) with its
own set of restrictions. And also, crucially, BPF token approach is not using
any special stateful task-scoped flags. Instead, bpf() syscall accepts
token_fd parameters explicitly for each relevant BPF command. This addresses
main concerns brought up during the /dev/bpf discussion, and fits better with
overall BPF subsystem design.

This patch set adds a basic minimum of functionality to make BPF token idea
useful and to discuss API and functionality. Currently only low-level libbpf
APIs support creating and passing BPF token around, allowing to test kernel
functionality, but for the most part is not sufficient for real-world
applications, which typically use high-level libbpf APIs based on `struct
bpf_object` type. This was done with the intent to limit the size of patch set
and concentrate on mostly kernel-side changes. All the necessary plumbing for
libbpf will be sent as a separate follow up patch set kernel support makes it
upstream.

Another part that should happen once kernel-side BPF token is established, is
a set of conventions between applications (e.g., systemd), tools (e.g.,
bpftool), and libraries (e.g., libbpf) on exposing delegatable BPF FS
instance(s) at well-defined locations to allow applications take advantage of
this in automatic fashion without explicit code changes on BPF application's
side. But I'd like to postpone this discussion to after BPF token concept
lands.

  [0] https://lore.kernel.org/bpf/20230412043300.360803-1-andrii@kernel.org/
  [1] http://vger.kernel.org/bpfconf2023_material/Trusted_unprivileged_BPF_LSFMM2023.pdf
  [2] https://lore.kernel.org/bpf/20190627201923.2589391-2-songliubraving@fb.com/
  [3] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/

v5->v6:
  - fix possible use of uninitialized variable in selftests (CI);
  - don't use anon_inode, instead create one from BPF FS instance (Christian);
  - don't store bpf_token inside struct bpf_map, instead pass it explicitly to
    map_check_btf(). We do store bpf_token inside prog->aux, because it's used
    during verification and even can be checked during attach time for some
    program types;
  - LSM hooks are left intact pending the conclusion of discussion with Paul
    Moore; I'd prefer to do LSM-related changes as a follow up patch set
    anyways;
v4->v5:
  - add pre-patch unifying CAP_NET_ADMIN handling inside kernel/bpf/syscall.c
    (Paul Moore);
  - fix build warnings and errors in selftests and kernel, detected by CI and
    kernel test robot;
v3->v4:
  - add delegation mount options to BPF FS;
  - BPF token is derived from the instance of BPF FS and associates itself
    with BPF FS' owning userns;
  - BPF token doesn't grant BPF functionality directly, it just turns
    capable() checks into ns_capable() checks within BPF FS' owning user;
  - BPF token cannot be pinned;
v2->v3:
  - make BPF_TOKEN_CREATE pin created BPF token in BPF FS, and disallow
    BPF_OBJ_PIN for BPF token;
v1->v2:
  - fix build failures on Kconfig with CONFIG_BPF_SYSCALL unset;
  - drop BPF_F_TOKEN_UNKNOWN_* flags and simplify UAPI (Stanislav).

Andrii Nakryiko (13):
  bpf: align CAP_NET_ADMIN checks with bpf_capable() approach
  bpf: add BPF token delegation mount options to BPF FS
  bpf: introduce BPF token object
  bpf: add BPF token support to BPF_MAP_CREATE command
  bpf: add BPF token support to BPF_BTF_LOAD command
  bpf: add BPF token support to BPF_PROG_LOAD command
  bpf: take into account BPF token when fetching helper protos
  bpf: consistenly use BPF token throughout BPF verifier logic
  libbpf: add bpf_token_create() API
  libbpf: add BPF token support to bpf_map_create() API
  libbpf: add BPF token support to bpf_btf_load() API
  libbpf: add BPF token support to bpf_prog_load() API
  selftests/bpf: add BPF token-enabled tests

 drivers/media/rc/bpf-lirc.c                   |   2 +-
 include/linux/bpf.h                           |  79 ++-
 include/linux/filter.h                        |   2 +-
 include/uapi/linux/bpf.h                      |  44 ++
 kernel/bpf/Makefile                           |   2 +-
 kernel/bpf/arraymap.c                         |   2 +-
 kernel/bpf/cgroup.c                           |   6 +-
 kernel/bpf/core.c                             |   3 +-
 kernel/bpf/helpers.c                          |   6 +-
 kernel/bpf/inode.c                            |  99 ++-
 kernel/bpf/syscall.c                          | 187 ++++--
 kernel/bpf/token.c                            | 237 +++++++
 kernel/bpf/verifier.c                         |  13 +-
 kernel/trace/bpf_trace.c                      |   2 +-
 net/core/filter.c                             |  36 +-
 net/ipv4/bpf_tcp_ca.c                         |   2 +-
 net/netfilter/nf_bpf_link.c                   |   2 +-
 tools/include/uapi/linux/bpf.h                |  44 ++
 tools/lib/bpf/bpf.c                           |  30 +-
 tools/lib/bpf/bpf.h                           |  39 +-
 tools/lib/bpf/libbpf.map                      |   1 +
 .../selftests/bpf/prog_tests/libbpf_probes.c  |   4 +
 .../selftests/bpf/prog_tests/libbpf_str.c     |   6 +
 .../testing/selftests/bpf/prog_tests/token.c  | 629 ++++++++++++++++++
 24 files changed, 1366 insertions(+), 111 deletions(-)
 create mode 100644 kernel/bpf/token.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/token.c

-- 
2.34.1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v6 bpf-next 01/13] bpf: align CAP_NET_ADMIN checks with bpf_capable() approach
  2023-09-27 22:57 [PATCH v6 bpf-next 00/13] BPF token and BPF FS-based delegation Andrii Nakryiko
@ 2023-09-27 22:57 ` Andrii Nakryiko
  2023-09-27 22:57 ` [PATCH v6 bpf-next 02/13] bpf: add BPF token delegation mount options to BPF FS Andrii Nakryiko
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2023-09-27 22:57 UTC (permalink / raw)
  To: bpf, netdev
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Within BPF syscall handling code CAP_NET_ADMIN checks stand out a bit
compared to CAP_BPF and CAP_PERFMON checks. For the latter, CAP_BPF or
CAP_PERFMON are checked first, but if they are not set, CAP_SYS_ADMIN
takes over and grants whatever part of BPF syscall is required.

Similar kind of checks that involve CAP_NET_ADMIN are not so consistent.
One out of four uses does follow CAP_BPF/CAP_PERFMON model: during
BPF_PROG_LOAD, if the type of BPF program is "network-related" either
CAP_NET_ADMIN or CAP_SYS_ADMIN is required to proceed.

But in three other cases CAP_NET_ADMIN is required even if CAP_SYS_ADMIN
is set:
  - when creating DEVMAP/XDKMAP/CPU_MAP maps;
  - when attaching CGROUP_SKB programs;
  - when handling BPF_PROG_QUERY command.

This patch is changing the latter three cases to follow BPF_PROG_LOAD
model, that is allowing to proceed under either CAP_NET_ADMIN or
CAP_SYS_ADMIN.

This also makes it cleaner in subsequent BPF token patches to switch
wholesomely to a generic bpf_token_capable(int cap) check, that always
falls back to CAP_SYS_ADMIN if requested capability is missing.

Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/syscall.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 6b5280f14a53..7445dad01fb3 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1097,6 +1097,11 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 	return ret;
 }
 
+static bool bpf_net_capable(void)
+{
+	return capable(CAP_NET_ADMIN) || capable(CAP_SYS_ADMIN);
+}
+
 #define BPF_MAP_CREATE_LAST_FIELD map_extra
 /* called via syscall */
 static int map_create(union bpf_attr *attr)
@@ -1200,7 +1205,7 @@ static int map_create(union bpf_attr *attr)
 	case BPF_MAP_TYPE_DEVMAP:
 	case BPF_MAP_TYPE_DEVMAP_HASH:
 	case BPF_MAP_TYPE_XSKMAP:
-		if (!capable(CAP_NET_ADMIN))
+		if (!bpf_net_capable())
 			return -EPERM;
 		break;
 	default:
@@ -2595,7 +2600,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	    !bpf_capable())
 		return -EPERM;
 
-	if (is_net_admin_prog_type(type) && !capable(CAP_NET_ADMIN) && !capable(CAP_SYS_ADMIN))
+	if (is_net_admin_prog_type(type) && !bpf_net_capable())
 		return -EPERM;
 	if (is_perfmon_prog_type(type) && !perfmon_capable())
 		return -EPERM;
@@ -3740,7 +3745,7 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
 	case BPF_PROG_TYPE_SK_LOOKUP:
 		return attach_type == prog->expected_attach_type ? 0 : -EINVAL;
 	case BPF_PROG_TYPE_CGROUP_SKB:
-		if (!capable(CAP_NET_ADMIN))
+		if (!bpf_net_capable())
 			/* cg-skb progs can be loaded by unpriv user.
 			 * check permissions at attach time.
 			 */
@@ -3924,7 +3929,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 static int bpf_prog_query(const union bpf_attr *attr,
 			  union bpf_attr __user *uattr)
 {
-	if (!capable(CAP_NET_ADMIN))
+	if (!bpf_net_capable())
 		return -EPERM;
 	if (CHECK_ATTR(BPF_PROG_QUERY))
 		return -EINVAL;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 bpf-next 02/13] bpf: add BPF token delegation mount options to BPF FS
  2023-09-27 22:57 [PATCH v6 bpf-next 00/13] BPF token and BPF FS-based delegation Andrii Nakryiko
  2023-09-27 22:57 ` [PATCH v6 bpf-next 01/13] bpf: align CAP_NET_ADMIN checks with bpf_capable() approach Andrii Nakryiko
@ 2023-09-27 22:57 ` Andrii Nakryiko
  2023-10-10  7:08   ` Hou Tao
  2023-09-27 22:57 ` [PATCH v6 bpf-next 03/13] bpf: introduce BPF token object Andrii Nakryiko
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2023-09-27 22:57 UTC (permalink / raw)
  To: bpf, netdev
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Add few new mount options to BPF FS that allow to specify that a given
BPF FS instance allows creation of BPF token (added in the next patch),
and what sort of operations are allowed under BPF token. As such, we get
4 new mount options, each is a bit mask
  - `delegate_cmds` allow to specify which bpf() syscall commands are
    allowed with BPF token derived from this BPF FS instance;
  - if BPF_MAP_CREATE command is allowed, `delegate_maps` specifies
    a set of allowable BPF map types that could be created with BPF token;
  - if BPF_PROG_LOAD command is allowed, `delegate_progs` specifies
    a set of allowable BPF program types that could be loaded with BPF token;
  - if BPF_PROG_LOAD command is allowed, `delegate_attachs` specifies
    a set of allowable BPF program attach types that could be loaded with
    BPF token; delegate_progs and delegate_attachs are meant to be used
    together, as full BPF program type is, in general, determined
    through both program type and program attach type.

Currently, these mount options accept the following forms of values:
  - a special value "any", that enables all possible values of a given
  bit set;
  - numeric value (decimal or hexadecimal, determined by kernel
  automatically) that specifies a bit mask value directly;
  - all the values for a given mount option are combined, if specified
  multiple times. E.g., `mount -t bpf nodev /path/to/mount -o
  delegate_maps=0x1 -o delegate_maps=0x2` will result in a combined 0x3
  mask.

Ideally, more convenient (for humans) symbolic form derived from
corresponding UAPI enums would be accepted (e.g., `-o
delegate_progs=kprobe|tracepoint`) and I intend to implement this, but
it requires a bunch of UAPI header churn, so I postponed it until this
feature lands upstream or at least there is a definite consensus that
this feature is acceptable and is going to make it, just to minimize
amount of wasted effort and not increase amount of non-essential code to
be reviewed.

Attentive reader will notice that BPF FS is now marked as
FS_USERNS_MOUNT, which theoretically makes it mountable inside non-init
user namespace as long as the process has sufficient *namespaced*
capabilities within that user namespace. But in reality we still
restrict BPF FS to be mountable only by processes with CAP_SYS_ADMIN *in
init userns* (extra check in bpf_fill_super()). FS_USERNS_MOUNT is added
to allow creating BPF FS context object (i.e., fsopen("bpf")) from
inside unprivileged process inside non-init userns, to capture that
userns as the owning userns. It will still be required to pass this
context object back to privileged process to instantiate and mount it.

This manipulation is important, because capturing non-init userns as the
owning userns of BPF FS instance (super block) allows to use that userns
to constraint BPF token to that userns later on (see next patch). So
creating BPF FS with delegation inside unprivileged userns will restrict
derived BPF token objects to only "work" inside that intended userns,
making it scoped to a intended "container".

There is a set of selftests at the end of the patch set that simulates
this sequence of steps and validates that everything works as intended.
But careful review is requested to make sure there are no missed gaps in
the implementation and testing.

All this is based on suggestions and discussions with Christian Brauner
([0]), to the best of my ability to follow all the implications.

  [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h | 10 ++++++
 kernel/bpf/inode.c  | 88 +++++++++++++++++++++++++++++++++++++++------
 2 files changed, 88 insertions(+), 10 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a82efd34b741..a5bd40f71fd0 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1562,6 +1562,16 @@ struct bpf_link_primer {
 	u32 id;
 };
 
+struct bpf_mount_opts {
+	umode_t mode;
+
+	/* BPF token-related delegation options */
+	u64 delegate_cmds;
+	u64 delegate_maps;
+	u64 delegate_progs;
+	u64 delegate_attachs;
+};
+
 struct bpf_struct_ops_value;
 struct btf_member;
 
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 99d0625b6c82..24b3faf901f4 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -20,6 +20,7 @@
 #include <linux/filter.h>
 #include <linux/bpf.h>
 #include <linux/bpf_trace.h>
+#include <linux/kstrtox.h>
 #include "preload/bpf_preload.h"
 
 enum bpf_type {
@@ -600,10 +601,31 @@ EXPORT_SYMBOL(bpf_prog_get_type_path);
  */
 static int bpf_show_options(struct seq_file *m, struct dentry *root)
 {
+	struct bpf_mount_opts *opts = root->d_sb->s_fs_info;
 	umode_t mode = d_inode(root)->i_mode & S_IALLUGO & ~S_ISVTX;
 
 	if (mode != S_IRWXUGO)
 		seq_printf(m, ",mode=%o", mode);
+
+	if (opts->delegate_cmds == ~0ULL)
+		seq_printf(m, ",delegate_cmds=any");
+	else if (opts->delegate_cmds)
+		seq_printf(m, ",delegate_cmds=0x%llx", opts->delegate_cmds);
+
+	if (opts->delegate_maps == ~0ULL)
+		seq_printf(m, ",delegate_maps=any");
+	else if (opts->delegate_maps)
+		seq_printf(m, ",delegate_maps=0x%llx", opts->delegate_maps);
+
+	if (opts->delegate_progs == ~0ULL)
+		seq_printf(m, ",delegate_progs=any");
+	else if (opts->delegate_progs)
+		seq_printf(m, ",delegate_progs=0x%llx", opts->delegate_progs);
+
+	if (opts->delegate_attachs == ~0ULL)
+		seq_printf(m, ",delegate_attachs=any");
+	else if (opts->delegate_attachs)
+		seq_printf(m, ",delegate_attachs=0x%llx", opts->delegate_attachs);
 	return 0;
 }
 
@@ -627,22 +649,27 @@ static const struct super_operations bpf_super_ops = {
 
 enum {
 	OPT_MODE,
+	OPT_DELEGATE_CMDS,
+	OPT_DELEGATE_MAPS,
+	OPT_DELEGATE_PROGS,
+	OPT_DELEGATE_ATTACHS,
 };
 
 static const struct fs_parameter_spec bpf_fs_parameters[] = {
 	fsparam_u32oct	("mode",			OPT_MODE),
+	fsparam_string	("delegate_cmds",		OPT_DELEGATE_CMDS),
+	fsparam_string	("delegate_maps",		OPT_DELEGATE_MAPS),
+	fsparam_string	("delegate_progs",		OPT_DELEGATE_PROGS),
+	fsparam_string	("delegate_attachs",		OPT_DELEGATE_ATTACHS),
 	{}
 };
 
-struct bpf_mount_opts {
-	umode_t mode;
-};
-
 static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
 {
-	struct bpf_mount_opts *opts = fc->fs_private;
+	struct bpf_mount_opts *opts = fc->s_fs_info;
 	struct fs_parse_result result;
-	int opt;
+	int opt, err;
+	u64 msk;
 
 	opt = fs_parse(fc, bpf_fs_parameters, param, &result);
 	if (opt < 0) {
@@ -666,6 +693,25 @@ static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param)
 	case OPT_MODE:
 		opts->mode = result.uint_32 & S_IALLUGO;
 		break;
+	case OPT_DELEGATE_CMDS:
+	case OPT_DELEGATE_MAPS:
+	case OPT_DELEGATE_PROGS:
+	case OPT_DELEGATE_ATTACHS:
+		if (strcmp(param->string, "any") == 0) {
+			msk = ~0ULL;
+		} else {
+			err = kstrtou64(param->string, 0, &msk);
+			if (err)
+				return err;
+		}
+		switch (opt) {
+		case OPT_DELEGATE_CMDS: opts->delegate_cmds |= msk; break;
+		case OPT_DELEGATE_MAPS: opts->delegate_maps |= msk; break;
+		case OPT_DELEGATE_PROGS: opts->delegate_progs |= msk; break;
+		case OPT_DELEGATE_ATTACHS: opts->delegate_attachs |= msk; break;
+		default: return -EINVAL;
+		}
+		break;
 	}
 
 	return 0;
@@ -740,10 +786,14 @@ static int populate_bpffs(struct dentry *parent)
 static int bpf_fill_super(struct super_block *sb, struct fs_context *fc)
 {
 	static const struct tree_descr bpf_rfiles[] = { { "" } };
-	struct bpf_mount_opts *opts = fc->fs_private;
+	struct bpf_mount_opts *opts = sb->s_fs_info;
 	struct inode *inode;
 	int ret;
 
+	/* Mounting an instance of BPF FS requires privileges */
+	if (fc->user_ns != &init_user_ns && !capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
 	ret = simple_fill_super(sb, BPF_FS_MAGIC, bpf_rfiles);
 	if (ret)
 		return ret;
@@ -765,7 +815,10 @@ static int bpf_get_tree(struct fs_context *fc)
 
 static void bpf_free_fc(struct fs_context *fc)
 {
-	kfree(fc->fs_private);
+	struct bpf_mount_opts *opts = fc->s_fs_info;
+
+	if (opts)
+		kfree(opts);
 }
 
 static const struct fs_context_operations bpf_context_ops = {
@@ -787,17 +840,32 @@ static int bpf_init_fs_context(struct fs_context *fc)
 
 	opts->mode = S_IRWXUGO;
 
-	fc->fs_private = opts;
+	/* start out with no BPF token delegation enabled */
+	opts->delegate_cmds = 0;
+	opts->delegate_maps = 0;
+	opts->delegate_progs = 0;
+	opts->delegate_attachs = 0;
+
+	fc->s_fs_info = opts;
 	fc->ops = &bpf_context_ops;
 	return 0;
 }
 
+static void bpf_kill_super(struct super_block *sb)
+{
+	struct bpf_mount_opts *opts = sb->s_fs_info;
+
+	kill_litter_super(sb);
+	kfree(opts);
+}
+
 static struct file_system_type bpf_fs_type = {
 	.owner		= THIS_MODULE,
 	.name		= "bpf",
 	.init_fs_context = bpf_init_fs_context,
 	.parameters	= bpf_fs_parameters,
-	.kill_sb	= kill_litter_super,
+	.kill_sb	= bpf_kill_super,
+	.fs_flags	= FS_USERNS_MOUNT,
 };
 
 static int __init bpf_init(void)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 bpf-next 03/13] bpf: introduce BPF token object
  2023-09-27 22:57 [PATCH v6 bpf-next 00/13] BPF token and BPF FS-based delegation Andrii Nakryiko
  2023-09-27 22:57 ` [PATCH v6 bpf-next 01/13] bpf: align CAP_NET_ADMIN checks with bpf_capable() approach Andrii Nakryiko
  2023-09-27 22:57 ` [PATCH v6 bpf-next 02/13] bpf: add BPF token delegation mount options to BPF FS Andrii Nakryiko
@ 2023-09-27 22:57 ` Andrii Nakryiko
  2023-10-11  1:17   ` [PATCH v6 3/13] " Paul Moore
  2023-10-11  2:35   ` [PATCH v6 bpf-next 03/13] " Hou Tao
  2023-09-27 22:58 ` [PATCH v6 bpf-next 04/13] bpf: add BPF token support to BPF_MAP_CREATE command Andrii Nakryiko
                   ` (9 subsequent siblings)
  12 siblings, 2 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2023-09-27 22:57 UTC (permalink / raw)
  To: bpf, netdev
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Add new kind of BPF kernel object, BPF token. BPF token is meant to
allow delegating privileged BPF functionality, like loading a BPF
program or creating a BPF map, from privileged process to a *trusted*
unprivileged process, all while have a good amount of control over which
privileged operations could be performed using provided BPF token.

This is achieved through mounting BPF FS instance with extra delegation
mount options, which determine what operations are delegatable, and also
constraining it to the owning user namespace (as mentioned in the
previous patch).

BPF token itself is just a derivative from BPF FS and can be created
through a new bpf() syscall command, BPF_TOKEN_CREAT, which accepts
a path specification (using the usual fd + string path combo) to a BPF
FS mount. Currently, BPF token "inherits" delegated command, map types,
prog type, and attach type bit sets from BPF FS as is. In the future,
having an BPF token as a separate object with its own FD, we can allow
to further restrict BPF token's allowable set of things either at the creation
time or after the fact, allowing the process to guard itself further
from, e.g., unintentionally trying to load undesired kind of BPF
programs. But for now we keep things simple and just copy bit sets as is.

When BPF token is created from BPF FS mount, we take reference to the
BPF super block's owning user namespace, and then use that namespace for
checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
capabilities that are normally only checked against init userns (using
capable()), but now we check them using ns_capable() instead (if BPF
token is provided). See bpf_token_capable() for details.

Such setup means that BPF token in itself is not sufficient to grant BPF
functionality. User namespaced process has to *also* have necessary
combination of capabilities inside that user namespace. So while
previously CAP_BPF was useless when granted within user namespace, now
it gains a meaning and allows container managers and sys admins to have
a flexible control over which processes can and need to use BPF
functionality within the user namespace (i.e., container in practice).
And BPF FS delegation mount options and derived BPF tokens serve as
a per-container "flag" to grant overall ability to use bpf() (plus further
restrict on which parts of bpf() syscalls are treated as namespaced).

The alternative to creating BPF token object was:
  a) not having any extra object and just pasing BPF FS path to each
     relevant bpf() command. This seems suboptimal as it's racy (mount
     under the same path might change in between checking it and using it
     for bpf() command). And also less flexible if we'd like to further
     restrict ourselves compared to all the delegated functionality
     allowed on BPF FS.
  b) use non-bpf() interface, e.g., ioctl(), but otherwise also create
     a dedicated FD that would represent a token-like functionality. This
     doesn't seem superior to having a proper bpf() command, so
     BPF_TOKEN_CREATE was chosen.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h            |  40 +++++++
 include/uapi/linux/bpf.h       |  39 +++++++
 kernel/bpf/Makefile            |   2 +-
 kernel/bpf/inode.c             |  10 +-
 kernel/bpf/syscall.c           |  17 +++
 kernel/bpf/token.c             | 197 +++++++++++++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  39 +++++++
 7 files changed, 339 insertions(+), 5 deletions(-)
 create mode 100644 kernel/bpf/token.c

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a5bd40f71fd0..c43131a24579 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -51,6 +51,10 @@ struct module;
 struct bpf_func_state;
 struct ftrace_ops;
 struct cgroup;
+struct bpf_token;
+struct user_namespace;
+struct super_block;
+struct inode;
 
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
@@ -1572,6 +1576,13 @@ struct bpf_mount_opts {
 	u64 delegate_attachs;
 };
 
+struct bpf_token {
+	struct work_struct work;
+	atomic64_t refcnt;
+	struct user_namespace *userns;
+	u64 allowed_cmds;
+};
+
 struct bpf_struct_ops_value;
 struct btf_member;
 
@@ -2162,6 +2173,8 @@ static inline void bpf_map_dec_elem_count(struct bpf_map *map)
 
 extern int sysctl_unprivileged_bpf_disabled;
 
+bool bpf_token_capable(const struct bpf_token *token, int cap);
+
 static inline bool bpf_allow_ptr_leaks(void)
 {
 	return perfmon_capable();
@@ -2196,8 +2209,17 @@ int bpf_link_new_fd(struct bpf_link *link);
 struct bpf_link *bpf_link_get_from_fd(u32 ufd);
 struct bpf_link *bpf_link_get_curr_or_next(u32 *id);
 
+void bpf_token_inc(struct bpf_token *token);
+void bpf_token_put(struct bpf_token *token);
+int bpf_token_create(union bpf_attr *attr);
+struct bpf_token *bpf_token_get_from_fd(u32 ufd);
+
+bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
+
 int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
 int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
+struct inode *bpf_get_inode(struct super_block *sb, const struct inode *dir,
+			    umode_t mode);
 
 #define BPF_ITER_FUNC_PREFIX "bpf_iter_"
 #define DEFINE_BPF_ITER_FUNC(target, args...)			\
@@ -2557,6 +2579,24 @@ static inline int bpf_obj_get_user(const char __user *pathname, int flags)
 	return -EOPNOTSUPP;
 }
 
+static inline bool bpf_token_capable(const struct bpf_token *token, int cap)
+{
+	return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
+}
+
+static inline void bpf_token_inc(struct bpf_token *token)
+{
+}
+
+static inline void bpf_token_put(struct bpf_token *token)
+{
+}
+
+static inline struct bpf_token *bpf_token_get_from_fd(u32 ufd)
+{
+	return ERR_PTR(-EOPNOTSUPP);
+}
+
 static inline void __dev_flush(void)
 {
 }
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 70bfa997e896..78692911f4a0 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -847,6 +847,37 @@ union bpf_iter_link_info {
  *		Returns zero on success. On error, -1 is returned and *errno*
  *		is set appropriately.
  *
+ * BPF_TOKEN_CREATE
+ *	Description
+ *		Create BPF token with embedded information about what
+ *		BPF-related functionality it allows:
+ *		- a set of allowed bpf() syscall commands;
+ *		- a set of allowed BPF map types to be created with
+ *		BPF_MAP_CREATE command, if BPF_MAP_CREATE itself is allowed;
+ *		- a set of allowed BPF program types and BPF program attach
+ *		types to be loaded with BPF_PROG_LOAD command, if
+ *		BPF_PROG_LOAD itself is allowed.
+ *
+ *		BPF token is created (derived) from an instance of BPF FS,
+ *		assuming it has necessary delegation mount options specified.
+ *		BPF FS mount is specified with openat()-style path FD + string.
+ *		This BPF token can be passed as an extra parameter to various
+ *		bpf() syscall commands to grant BPF subsystem functionality to
+ *		unprivileged processes.
+ *
+ *		When created, BPF token is "associated" with the owning
+ *		user namespace of BPF FS instance (super block) that it was
+ *		derived from, and subsequent BPF operations performed with
+ *		BPF token would be performing capabilities checks (i.e.,
+ *		CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN) within
+ *		that user namespace. Without BPF token, such capabilities
+ *		have to be granted in init user namespace, making bpf()
+ *		syscall incompatible with user namespace, for the most part.
+ *
+ *	Return
+ *		A new file descriptor (a nonnegative integer), or -1 if an
+ *		error occurred (in which case, *errno* is set appropriately).
+ *
  * NOTES
  *	eBPF objects (maps and programs) can be shared between processes.
  *
@@ -901,6 +932,8 @@ enum bpf_cmd {
 	BPF_ITER_CREATE,
 	BPF_LINK_DETACH,
 	BPF_PROG_BIND_MAP,
+	BPF_TOKEN_CREATE,
+	__MAX_BPF_CMD,
 };
 
 enum bpf_map_type {
@@ -1694,6 +1727,12 @@ union bpf_attr {
 		__u32		flags;		/* extra flags */
 	} prog_bind_map;
 
+	struct { /* struct used by BPF_TOKEN_CREATE command */
+		__u32		flags;
+		__u32		bpffs_path_fd;
+		__u64		bpffs_pathname;
+	} token_create;
+
 } __attribute__((aligned(8)));
 
 /* The description below is an attempt at providing documentation to eBPF
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index f526b7573e97..4ce95acfcaa7 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -6,7 +6,7 @@ cflags-nogcse-$(CONFIG_X86)$(CONFIG_CC_IS_GCC) := -fno-gcse
 endif
 CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
 
-obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o
+obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o token.o
 obj-$(CONFIG_BPF_SYSCALL) += bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
 obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 24b3faf901f4..de1fdf396521 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -99,9 +99,9 @@ static const struct inode_operations bpf_prog_iops = { };
 static const struct inode_operations bpf_map_iops  = { };
 static const struct inode_operations bpf_link_iops  = { };
 
-static struct inode *bpf_get_inode(struct super_block *sb,
-				   const struct inode *dir,
-				   umode_t mode)
+struct inode *bpf_get_inode(struct super_block *sb,
+			    const struct inode *dir,
+			    umode_t mode)
 {
 	struct inode *inode;
 
@@ -603,11 +603,13 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
 {
 	struct bpf_mount_opts *opts = root->d_sb->s_fs_info;
 	umode_t mode = d_inode(root)->i_mode & S_IALLUGO & ~S_ISVTX;
+	u64 mask;
 
 	if (mode != S_IRWXUGO)
 		seq_printf(m, ",mode=%o", mode);
 
-	if (opts->delegate_cmds == ~0ULL)
+	mask = (1ULL << __MAX_BPF_CMD) - 1;
+	if ((opts->delegate_cmds & mask) == mask)
 		seq_printf(m, ",delegate_cmds=any");
 	else if (opts->delegate_cmds)
 		seq_printf(m, ",delegate_cmds=0x%llx", opts->delegate_cmds);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7445dad01fb3..b47791a80930 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -5304,6 +5304,20 @@ static int bpf_prog_bind_map(union bpf_attr *attr)
 	return ret;
 }
 
+#define BPF_TOKEN_CREATE_LAST_FIELD token_create.bpffs_pathname
+
+static int token_create(union bpf_attr *attr)
+{
+	if (CHECK_ATTR(BPF_TOKEN_CREATE))
+		return -EINVAL;
+
+	/* no flags are supported yet */
+	if (attr->token_create.flags)
+		return -EINVAL;
+
+	return bpf_token_create(attr);
+}
+
 static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 {
 	union bpf_attr attr;
@@ -5437,6 +5451,9 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 	case BPF_PROG_BIND_MAP:
 		err = bpf_prog_bind_map(&attr);
 		break;
+	case BPF_TOKEN_CREATE:
+		err = token_create(&attr);
+		break;
 	default:
 		err = -EINVAL;
 		break;
diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
new file mode 100644
index 000000000000..779aad5007a3
--- /dev/null
+++ b/kernel/bpf/token.c
@@ -0,0 +1,197 @@
+#include <linux/bpf.h>
+#include <linux/vmalloc.h>
+#include <linux/anon_inodes.h>
+#include <linux/fdtable.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/kernel.h>
+#include <linux/idr.h>
+#include <linux/namei.h>
+#include <linux/user_namespace.h>
+
+bool bpf_token_capable(const struct bpf_token *token, int cap)
+{
+	/* BPF token allows ns_capable() level of capabilities */
+	if (token) {
+		if (ns_capable(token->userns, cap))
+			return true;
+		if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
+			return true;
+	}
+	/* otherwise fallback to capable() checks */
+	return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
+}
+
+void bpf_token_inc(struct bpf_token *token)
+{
+	atomic64_inc(&token->refcnt);
+}
+
+static void bpf_token_free(struct bpf_token *token)
+{
+	put_user_ns(token->userns);
+	kvfree(token);
+}
+
+static void bpf_token_put_deferred(struct work_struct *work)
+{
+	struct bpf_token *token = container_of(work, struct bpf_token, work);
+
+	bpf_token_free(token);
+}
+
+void bpf_token_put(struct bpf_token *token)
+{
+	if (!token)
+		return;
+
+	if (!atomic64_dec_and_test(&token->refcnt))
+		return;
+
+	INIT_WORK(&token->work, bpf_token_put_deferred);
+	schedule_work(&token->work);
+}
+
+static int bpf_token_release(struct inode *inode, struct file *filp)
+{
+	struct bpf_token *token = filp->private_data;
+
+	bpf_token_put(token);
+	return 0;
+}
+
+static void bpf_token_show_fdinfo(struct seq_file *m, struct file *filp)
+{
+	struct bpf_token *token = filp->private_data;
+	u64 mask;
+
+	mask = (1ULL << __MAX_BPF_CMD) - 1;
+	if ((token->allowed_cmds & mask) == mask)
+		seq_printf(m, "allowed_cmds:\tany\n");
+	else
+		seq_printf(m, "allowed_cmds:\t0x%llx\n", token->allowed_cmds);
+}
+
+static struct bpf_token *bpf_token_alloc(void)
+{
+	struct bpf_token *token;
+
+	token = kvzalloc(sizeof(*token), GFP_USER);
+	if (!token)
+		return NULL;
+
+	atomic64_set(&token->refcnt, 1);
+
+	return token;
+}
+
+#define BPF_TOKEN_INODE_NAME "bpf-token"
+
+static const struct inode_operations bpf_token_iops = { };
+
+static const struct file_operations bpf_token_fops = {
+	.release	= bpf_token_release,
+	.show_fdinfo	= bpf_token_show_fdinfo,
+};
+
+int bpf_token_create(union bpf_attr *attr)
+{
+	struct bpf_mount_opts *mnt_opts;
+	struct bpf_token *token = NULL;
+	struct inode *inode;
+	struct file *file;
+	struct path path;
+	umode_t mode;
+	int err, fd;
+
+	err = user_path_at(attr->token_create.bpffs_path_fd,
+			   u64_to_user_ptr(attr->token_create.bpffs_pathname),
+			   LOOKUP_FOLLOW | LOOKUP_EMPTY, &path);
+	if (err)
+		return err;
+
+	if (path.mnt->mnt_root != path.dentry) {
+		err = -EINVAL;
+		goto out_path;
+	}
+	err = path_permission(&path, MAY_ACCESS);
+	if (err)
+		goto out_path;
+
+	mode = S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask());
+	inode = bpf_get_inode(path.mnt->mnt_sb, NULL, mode);
+	if (IS_ERR(inode)) {
+		err = PTR_ERR(inode);
+		goto out_path;
+	}
+
+	inode->i_op = &bpf_token_iops;
+	inode->i_fop = &bpf_token_fops;
+	clear_nlink(inode); /* make sure it is unlinked */
+
+	file = alloc_file_pseudo(inode, path.mnt, BPF_TOKEN_INODE_NAME, O_RDWR, &bpf_token_fops);
+	if (IS_ERR(file)) {
+		iput(inode);
+		err = PTR_ERR(file);
+		goto out_file;
+	}
+
+	token = bpf_token_alloc();
+	if (!token) {
+		err = -ENOMEM;
+		goto out_file;
+	}
+
+	/* remember bpffs owning userns for future ns_capable() checks */
+	token->userns = get_user_ns(path.dentry->d_sb->s_user_ns);
+
+	mnt_opts = path.dentry->d_sb->s_fs_info;
+	token->allowed_cmds = mnt_opts->delegate_cmds;
+
+	fd = get_unused_fd_flags(O_CLOEXEC);
+	if (fd < 0) {
+		err = fd;
+		goto out_token;
+	}
+
+	file->private_data = token;
+	fd_install(fd, file);
+
+	path_put(&path);
+	return fd;
+
+out_token:
+	bpf_token_free(token);
+out_file:
+	fput(file);
+out_path:
+	path_put(&path);
+	return err;
+}
+
+struct bpf_token *bpf_token_get_from_fd(u32 ufd)
+{
+	struct fd f = fdget(ufd);
+	struct bpf_token *token;
+
+	if (!f.file)
+		return ERR_PTR(-EBADF);
+	if (f.file->f_op != &bpf_token_fops) {
+		fdput(f);
+		return ERR_PTR(-EINVAL);
+	}
+
+	token = f.file->private_data;
+	bpf_token_inc(token);
+	fdput(f);
+
+	return token;
+}
+
+bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
+{
+	if (!token)
+		return false;
+
+	return token->allowed_cmds & (1ULL << cmd);
+}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 70bfa997e896..78692911f4a0 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -847,6 +847,37 @@ union bpf_iter_link_info {
  *		Returns zero on success. On error, -1 is returned and *errno*
  *		is set appropriately.
  *
+ * BPF_TOKEN_CREATE
+ *	Description
+ *		Create BPF token with embedded information about what
+ *		BPF-related functionality it allows:
+ *		- a set of allowed bpf() syscall commands;
+ *		- a set of allowed BPF map types to be created with
+ *		BPF_MAP_CREATE command, if BPF_MAP_CREATE itself is allowed;
+ *		- a set of allowed BPF program types and BPF program attach
+ *		types to be loaded with BPF_PROG_LOAD command, if
+ *		BPF_PROG_LOAD itself is allowed.
+ *
+ *		BPF token is created (derived) from an instance of BPF FS,
+ *		assuming it has necessary delegation mount options specified.
+ *		BPF FS mount is specified with openat()-style path FD + string.
+ *		This BPF token can be passed as an extra parameter to various
+ *		bpf() syscall commands to grant BPF subsystem functionality to
+ *		unprivileged processes.
+ *
+ *		When created, BPF token is "associated" with the owning
+ *		user namespace of BPF FS instance (super block) that it was
+ *		derived from, and subsequent BPF operations performed with
+ *		BPF token would be performing capabilities checks (i.e.,
+ *		CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN) within
+ *		that user namespace. Without BPF token, such capabilities
+ *		have to be granted in init user namespace, making bpf()
+ *		syscall incompatible with user namespace, for the most part.
+ *
+ *	Return
+ *		A new file descriptor (a nonnegative integer), or -1 if an
+ *		error occurred (in which case, *errno* is set appropriately).
+ *
  * NOTES
  *	eBPF objects (maps and programs) can be shared between processes.
  *
@@ -901,6 +932,8 @@ enum bpf_cmd {
 	BPF_ITER_CREATE,
 	BPF_LINK_DETACH,
 	BPF_PROG_BIND_MAP,
+	BPF_TOKEN_CREATE,
+	__MAX_BPF_CMD,
 };
 
 enum bpf_map_type {
@@ -1694,6 +1727,12 @@ union bpf_attr {
 		__u32		flags;		/* extra flags */
 	} prog_bind_map;
 
+	struct { /* struct used by BPF_TOKEN_CREATE command */
+		__u32		flags;
+		__u32		bpffs_path_fd;
+		__u64		bpffs_pathname;
+	} token_create;
+
 } __attribute__((aligned(8)));
 
 /* The description below is an attempt at providing documentation to eBPF
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 bpf-next 04/13] bpf: add BPF token support to BPF_MAP_CREATE command
  2023-09-27 22:57 [PATCH v6 bpf-next 00/13] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (2 preceding siblings ...)
  2023-09-27 22:57 ` [PATCH v6 bpf-next 03/13] bpf: introduce BPF token object Andrii Nakryiko
@ 2023-09-27 22:58 ` Andrii Nakryiko
  2023-10-10  8:35   ` Jiri Olsa
  2023-09-27 22:58 ` [PATCH v6 bpf-next 05/13] bpf: add BPF token support to BPF_BTF_LOAD command Andrii Nakryiko
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2023-09-27 22:58 UTC (permalink / raw)
  To: bpf, netdev
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Allow providing token_fd for BPF_MAP_CREATE command to allow controlled
BPF map creation from unprivileged process through delegated BPF token.

Wire through a set of allowed BPF map types to BPF token, derived from
BPF FS at BPF token creation time. This, in combination with allowed_cmds
allows to create a narrowly-focused BPF token (controlled by privileged
agent) with a restrictive set of BPF maps that application can attempt
to create.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h                           |  2 +
 include/uapi/linux/bpf.h                      |  2 +
 kernel/bpf/inode.c                            |  3 +-
 kernel/bpf/syscall.c                          | 51 ++++++++++++++-----
 kernel/bpf/token.c                            | 15 ++++++
 tools/include/uapi/linux/bpf.h                |  2 +
 .../selftests/bpf/prog_tests/libbpf_probes.c  |  2 +
 .../selftests/bpf/prog_tests/libbpf_str.c     |  3 ++
 8 files changed, 65 insertions(+), 15 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c43131a24579..e6d040f3ab0e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1581,6 +1581,7 @@ struct bpf_token {
 	atomic64_t refcnt;
 	struct user_namespace *userns;
 	u64 allowed_cmds;
+	u64 allowed_maps;
 };
 
 struct bpf_struct_ops_value;
@@ -2215,6 +2216,7 @@ int bpf_token_create(union bpf_attr *attr);
 struct bpf_token *bpf_token_get_from_fd(u32 ufd);
 
 bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
+bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type type);
 
 int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
 int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 78692911f4a0..dcc429eb3c89 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -984,6 +984,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_BLOOM_FILTER,
 	BPF_MAP_TYPE_USER_RINGBUF,
 	BPF_MAP_TYPE_CGRP_STORAGE,
+	__MAX_BPF_MAP_TYPE
 };
 
 /* Note that tracing related programs such as
@@ -1423,6 +1424,7 @@ union bpf_attr {
 		 * to using 5 hash functions).
 		 */
 		__u64	map_extra;
+		__u32	map_token_fd;
 	};
 
 	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index de1fdf396521..3ef2982367cc 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -614,7 +614,8 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
 	else if (opts->delegate_cmds)
 		seq_printf(m, ",delegate_cmds=0x%llx", opts->delegate_cmds);
 
-	if (opts->delegate_maps == ~0ULL)
+	mask = (1ULL << __MAX_BPF_MAP_TYPE) - 1;
+	if ((opts->delegate_maps & mask) == mask)
 		seq_printf(m, ",delegate_maps=any");
 	else if (opts->delegate_maps)
 		seq_printf(m, ",delegate_maps=0x%llx", opts->delegate_maps);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b47791a80930..7a1a8495d490 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -985,8 +985,8 @@ int map_check_no_btf(const struct bpf_map *map,
 	return -ENOTSUPP;
 }
 
-static int map_check_btf(struct bpf_map *map, const struct btf *btf,
-			 u32 btf_key_id, u32 btf_value_id)
+static int map_check_btf(struct bpf_map *map, struct bpf_token *token,
+			 const struct btf *btf, u32 btf_key_id, u32 btf_value_id)
 {
 	const struct btf_type *key_type, *value_type;
 	u32 key_size, value_size;
@@ -1014,7 +1014,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 	if (!IS_ERR_OR_NULL(map->record)) {
 		int i;
 
-		if (!bpf_capable()) {
+		if (!bpf_token_capable(token, CAP_BPF)) {
 			ret = -EPERM;
 			goto free_map_tab;
 		}
@@ -1102,11 +1102,12 @@ static bool bpf_net_capable(void)
 	return capable(CAP_NET_ADMIN) || capable(CAP_SYS_ADMIN);
 }
 
-#define BPF_MAP_CREATE_LAST_FIELD map_extra
+#define BPF_MAP_CREATE_LAST_FIELD map_token_fd
 /* called via syscall */
 static int map_create(union bpf_attr *attr)
 {
 	const struct bpf_map_ops *ops;
+	struct bpf_token *token = NULL;
 	int numa_node = bpf_map_attr_numa_node(attr);
 	u32 map_type = attr->map_type;
 	struct bpf_map *map;
@@ -1157,14 +1158,32 @@ static int map_create(union bpf_attr *attr)
 	if (!ops->map_mem_usage)
 		return -EINVAL;
 
+	if (attr->map_token_fd) {
+		token = bpf_token_get_from_fd(attr->map_token_fd);
+		if (IS_ERR(token))
+			return PTR_ERR(token);
+
+		/* if current token doesn't grant map creation permissions,
+		 * then we can't use this token, so ignore it and rely on
+		 * system-wide capabilities checks
+		 */
+		if (!bpf_token_allow_cmd(token, BPF_MAP_CREATE) ||
+		    !bpf_token_allow_map_type(token, attr->map_type)) {
+			bpf_token_put(token);
+			token = NULL;
+		}
+	}
+
+	err = -EPERM;
+
 	/* Intent here is for unprivileged_bpf_disabled to block BPF map
 	 * creation for unprivileged users; other actions depend
 	 * on fd availability and access to bpffs, so are dependent on
 	 * object creation success. Even with unprivileged BPF disabled,
 	 * capability checks are still carried out.
 	 */
-	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
-		return -EPERM;
+	if (sysctl_unprivileged_bpf_disabled && !bpf_token_capable(token, CAP_BPF))
+		goto put_token;
 
 	/* check privileged map type permissions */
 	switch (map_type) {
@@ -1197,25 +1216,27 @@ static int map_create(union bpf_attr *attr)
 	case BPF_MAP_TYPE_LRU_PERCPU_HASH:
 	case BPF_MAP_TYPE_STRUCT_OPS:
 	case BPF_MAP_TYPE_CPUMAP:
-		if (!bpf_capable())
-			return -EPERM;
+		if (!bpf_token_capable(token, CAP_BPF))
+			goto put_token;
 		break;
 	case BPF_MAP_TYPE_SOCKMAP:
 	case BPF_MAP_TYPE_SOCKHASH:
 	case BPF_MAP_TYPE_DEVMAP:
 	case BPF_MAP_TYPE_DEVMAP_HASH:
 	case BPF_MAP_TYPE_XSKMAP:
-		if (!bpf_net_capable())
-			return -EPERM;
+		if (!bpf_token_capable(token, CAP_NET_ADMIN))
+			goto put_token;
 		break;
 	default:
 		WARN(1, "unsupported map type %d", map_type);
-		return -EPERM;
+		goto put_token;
 	}
 
 	map = ops->map_alloc(attr);
-	if (IS_ERR(map))
-		return PTR_ERR(map);
+	if (IS_ERR(map)) {
+		err = PTR_ERR(map);
+		goto put_token;
+	}
 	map->ops = ops;
 	map->map_type = map_type;
 
@@ -1252,7 +1273,7 @@ static int map_create(union bpf_attr *attr)
 		map->btf = btf;
 
 		if (attr->btf_value_type_id) {
-			err = map_check_btf(map, btf, attr->btf_key_type_id,
+			err = map_check_btf(map, token, btf, attr->btf_key_type_id,
 					    attr->btf_value_type_id);
 			if (err)
 				goto free_map;
@@ -1293,6 +1314,8 @@ static int map_create(union bpf_attr *attr)
 free_map:
 	btf_put(map->btf);
 	map->ops->map_free(map);
+put_token:
+	bpf_token_put(token);
 	return err;
 }
 
diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
index 779aad5007a3..a62f21077c63 100644
--- a/kernel/bpf/token.c
+++ b/kernel/bpf/token.c
@@ -70,6 +70,12 @@ static void bpf_token_show_fdinfo(struct seq_file *m, struct file *filp)
 		seq_printf(m, "allowed_cmds:\tany\n");
 	else
 		seq_printf(m, "allowed_cmds:\t0x%llx\n", token->allowed_cmds);
+
+	mask = (1ULL << __MAX_BPF_MAP_TYPE) - 1;
+	if ((token->allowed_maps & mask) == mask)
+		seq_printf(m, "allowed_maps:\tany\n");
+	else
+		seq_printf(m, "allowed_maps:\t0x%llx\n", token->allowed_maps);
 }
 
 static struct bpf_token *bpf_token_alloc(void)
@@ -147,6 +153,7 @@ int bpf_token_create(union bpf_attr *attr)
 
 	mnt_opts = path.dentry->d_sb->s_fs_info;
 	token->allowed_cmds = mnt_opts->delegate_cmds;
+	token->allowed_maps = mnt_opts->delegate_maps;
 
 	fd = get_unused_fd_flags(O_CLOEXEC);
 	if (fd < 0) {
@@ -195,3 +202,11 @@ bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
 
 	return token->allowed_cmds & (1ULL << cmd);
 }
+
+bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type type)
+{
+	if (!token || type >= __MAX_BPF_MAP_TYPE)
+		return false;
+
+	return token->allowed_maps & (1ULL << type);
+}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 78692911f4a0..dcc429eb3c89 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -984,6 +984,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_BLOOM_FILTER,
 	BPF_MAP_TYPE_USER_RINGBUF,
 	BPF_MAP_TYPE_CGRP_STORAGE,
+	__MAX_BPF_MAP_TYPE
 };
 
 /* Note that tracing related programs such as
@@ -1423,6 +1424,7 @@ union bpf_attr {
 		 * to using 5 hash functions).
 		 */
 		__u64	map_extra;
+		__u32	map_token_fd;
 	};
 
 	struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */
diff --git a/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c b/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c
index 9f766ddd946a..573249a2814d 100644
--- a/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c
+++ b/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c
@@ -68,6 +68,8 @@ void test_libbpf_probe_map_types(void)
 
 		if (map_type == BPF_MAP_TYPE_UNSPEC)
 			continue;
+		if (strcmp(map_type_name, "__MAX_BPF_MAP_TYPE") == 0)
+			continue;
 
 		if (!test__start_subtest(map_type_name))
 			continue;
diff --git a/tools/testing/selftests/bpf/prog_tests/libbpf_str.c b/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
index c440ea3311ed..2a0633f43c73 100644
--- a/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
+++ b/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
@@ -132,6 +132,9 @@ static void test_libbpf_bpf_map_type_str(void)
 		const char *map_type_str;
 		char buf[256];
 
+		if (map_type == __MAX_BPF_MAP_TYPE)
+			continue;
+
 		map_type_name = btf__str_by_offset(btf, e->name_off);
 		map_type_str = libbpf_bpf_map_type_str(map_type);
 		ASSERT_OK_PTR(map_type_str, map_type_name);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 bpf-next 05/13] bpf: add BPF token support to BPF_BTF_LOAD command
  2023-09-27 22:57 [PATCH v6 bpf-next 00/13] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (3 preceding siblings ...)
  2023-09-27 22:58 ` [PATCH v6 bpf-next 04/13] bpf: add BPF token support to BPF_MAP_CREATE command Andrii Nakryiko
@ 2023-09-27 22:58 ` Andrii Nakryiko
  2023-09-27 22:58 ` [PATCH v6 bpf-next 06/13] bpf: add BPF token support to BPF_PROG_LOAD command Andrii Nakryiko
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2023-09-27 22:58 UTC (permalink / raw)
  To: bpf, netdev
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Accept BPF token FD in BPF_BTF_LOAD command to allow BTF data loading
through delegated BPF token. BTF loading is a pretty straightforward
operation, so as long as BPF token is created with allow_cmds granting
BPF_BTF_LOAD command, kernel proceeds to parsing BTF data and creating
BTF object.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/uapi/linux/bpf.h       |  1 +
 kernel/bpf/syscall.c           | 20 ++++++++++++++++++--
 tools/include/uapi/linux/bpf.h |  1 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index dcc429eb3c89..6c4651176046 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1606,6 +1606,7 @@ union bpf_attr {
 		 * truncated), or smaller (if log buffer wasn't filled completely).
 		 */
 		__u32		btf_log_true_size;
+		__u32		btf_token_fd;
 	};
 
 	struct {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7a1a8495d490..5c5c2b6648b2 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4712,15 +4712,31 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
 	return err;
 }
 
-#define BPF_BTF_LOAD_LAST_FIELD btf_log_true_size
+#define BPF_BTF_LOAD_LAST_FIELD btf_token_fd
 
 static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size)
 {
+	struct bpf_token *token = NULL;
+
 	if (CHECK_ATTR(BPF_BTF_LOAD))
 		return -EINVAL;
 
-	if (!bpf_capable())
+	if (attr->btf_token_fd) {
+		token = bpf_token_get_from_fd(attr->btf_token_fd);
+		if (IS_ERR(token))
+			return PTR_ERR(token);
+		if (!bpf_token_allow_cmd(token, BPF_BTF_LOAD)) {
+			bpf_token_put(token);
+			token = NULL;
+		}
+	}
+
+	if (!bpf_token_capable(token, CAP_BPF)) {
+		bpf_token_put(token);
 		return -EPERM;
+	}
+
+	bpf_token_put(token);
 
 	return btf_new_fd(attr, uattr, uattr_size);
 }
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index dcc429eb3c89..6c4651176046 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1606,6 +1606,7 @@ union bpf_attr {
 		 * truncated), or smaller (if log buffer wasn't filled completely).
 		 */
 		__u32		btf_log_true_size;
+		__u32		btf_token_fd;
 	};
 
 	struct {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 bpf-next 06/13] bpf: add BPF token support to BPF_PROG_LOAD command
  2023-09-27 22:57 [PATCH v6 bpf-next 00/13] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (4 preceding siblings ...)
  2023-09-27 22:58 ` [PATCH v6 bpf-next 05/13] bpf: add BPF token support to BPF_BTF_LOAD command Andrii Nakryiko
@ 2023-09-27 22:58 ` Andrii Nakryiko
  2023-10-11  1:17   ` [PATCH v6 6/13] " Paul Moore
  2023-09-27 22:58 ` [PATCH v6 bpf-next 07/13] bpf: take into account BPF token when fetching helper protos Andrii Nakryiko
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2023-09-27 22:58 UTC (permalink / raw)
  To: bpf, netdev
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Add basic support of BPF token to BPF_PROG_LOAD. Wire through a set of
allowed BPF program types and attach types, derived from BPF FS at BPF
token creation time. Then make sure we perform bpf_token_capable()
checks everywhere where it's relevant.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h                           |  6 ++
 include/uapi/linux/bpf.h                      |  2 +
 kernel/bpf/core.c                             |  1 +
 kernel/bpf/inode.c                            |  6 +-
 kernel/bpf/syscall.c                          | 87 ++++++++++++++-----
 kernel/bpf/token.c                            | 25 ++++++
 tools/include/uapi/linux/bpf.h                |  2 +
 .../selftests/bpf/prog_tests/libbpf_probes.c  |  2 +
 .../selftests/bpf/prog_tests/libbpf_str.c     |  3 +
 9 files changed, 108 insertions(+), 26 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e6d040f3ab0e..a1c1604c64e7 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1442,6 +1442,7 @@ struct bpf_prog_aux {
 #ifdef CONFIG_SECURITY
 	void *security;
 #endif
+	struct bpf_token *token;
 	struct bpf_prog_offload *offload;
 	struct btf *btf;
 	struct bpf_func_info *func_info;
@@ -1582,6 +1583,8 @@ struct bpf_token {
 	struct user_namespace *userns;
 	u64 allowed_cmds;
 	u64 allowed_maps;
+	u64 allowed_progs;
+	u64 allowed_attachs;
 };
 
 struct bpf_struct_ops_value;
@@ -2217,6 +2220,9 @@ struct bpf_token *bpf_token_get_from_fd(u32 ufd);
 
 bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd);
 bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type type);
+bool bpf_token_allow_prog_type(const struct bpf_token *token,
+			       enum bpf_prog_type prog_type,
+			       enum bpf_attach_type attach_type);
 
 int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname);
 int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 6c4651176046..86fb59e6f47b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1029,6 +1029,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SK_LOOKUP,
 	BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
 	BPF_PROG_TYPE_NETFILTER,
+	__MAX_BPF_PROG_TYPE
 };
 
 enum bpf_attach_type {
@@ -1494,6 +1495,7 @@ union bpf_attr {
 		 * truncated), or smaller (if log buffer wasn't filled completely).
 		 */
 		__u32		log_true_size;
+		__u32		prog_token_fd;
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 08626b519ce2..fc8de25b7948 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2747,6 +2747,7 @@ void bpf_prog_free(struct bpf_prog *fp)
 
 	if (aux->dst_prog)
 		bpf_prog_put(aux->dst_prog);
+	bpf_token_put(aux->token);
 	INIT_WORK(&aux->work, bpf_prog_free_deferred);
 	schedule_work(&aux->work);
 }
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 3ef2982367cc..612904d44b15 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -620,12 +620,14 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
 	else if (opts->delegate_maps)
 		seq_printf(m, ",delegate_maps=0x%llx", opts->delegate_maps);
 
-	if (opts->delegate_progs == ~0ULL)
+	mask = (1ULL << __MAX_BPF_PROG_TYPE) - 1;
+	if ((opts->delegate_progs & mask) == mask)
 		seq_printf(m, ",delegate_progs=any");
 	else if (opts->delegate_progs)
 		seq_printf(m, ",delegate_progs=0x%llx", opts->delegate_progs);
 
-	if (opts->delegate_attachs == ~0ULL)
+	mask = (1ULL << __MAX_BPF_ATTACH_TYPE) - 1;
+	if ((opts->delegate_attachs & mask) == mask)
 		seq_printf(m, ",delegate_attachs=any");
 	else if (opts->delegate_attachs)
 		seq_printf(m, ",delegate_attachs=0x%llx", opts->delegate_attachs);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5c5c2b6648b2..d0b219f09bcc 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2578,13 +2578,15 @@ static bool is_perfmon_prog_type(enum bpf_prog_type prog_type)
 }
 
 /* last field in 'union bpf_attr' used by this command */
-#define	BPF_PROG_LOAD_LAST_FIELD log_true_size
+#define BPF_PROG_LOAD_LAST_FIELD prog_token_fd
 
 static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 {
 	enum bpf_prog_type type = attr->prog_type;
 	struct bpf_prog *prog, *dst_prog = NULL;
 	struct btf *attach_btf = NULL;
+	struct bpf_token *token = NULL;
+	bool bpf_cap;
 	int err;
 	char license[128];
 
@@ -2600,10 +2602,31 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 				 BPF_F_XDP_DEV_BOUND_ONLY))
 		return -EINVAL;
 
+	bpf_prog_load_fixup_attach_type(attr);
+
+	if (attr->prog_token_fd) {
+		token = bpf_token_get_from_fd(attr->prog_token_fd);
+		if (IS_ERR(token))
+			return PTR_ERR(token);
+		/* if current token doesn't grant prog loading permissions,
+		 * then we can't use this token, so ignore it and rely on
+		 * system-wide capabilities checks
+		 */
+		if (!bpf_token_allow_cmd(token, BPF_PROG_LOAD) ||
+		    !bpf_token_allow_prog_type(token, attr->prog_type,
+					       attr->expected_attach_type)) {
+			bpf_token_put(token);
+			token = NULL;
+		}
+	}
+
+	bpf_cap = bpf_token_capable(token, CAP_BPF);
+	err = -EPERM;
+
 	if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) &&
 	    (attr->prog_flags & BPF_F_ANY_ALIGNMENT) &&
-	    !bpf_capable())
-		return -EPERM;
+	    !bpf_cap)
+		goto put_token;
 
 	/* Intent here is for unprivileged_bpf_disabled to block BPF program
 	 * creation for unprivileged users; other actions depend
@@ -2612,21 +2635,23 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	 * capability checks are still carried out for these
 	 * and other operations.
 	 */
-	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
-		return -EPERM;
+	if (sysctl_unprivileged_bpf_disabled && !bpf_cap)
+		goto put_token;
 
 	if (attr->insn_cnt == 0 ||
-	    attr->insn_cnt > (bpf_capable() ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS))
-		return -E2BIG;
+	    attr->insn_cnt > (bpf_cap ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS)) {
+		err = -E2BIG;
+		goto put_token;
+	}
 	if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
 	    type != BPF_PROG_TYPE_CGROUP_SKB &&
-	    !bpf_capable())
-		return -EPERM;
+	    !bpf_cap)
+		goto put_token;
 
-	if (is_net_admin_prog_type(type) && !bpf_net_capable())
-		return -EPERM;
-	if (is_perfmon_prog_type(type) && !perfmon_capable())
-		return -EPERM;
+	if (is_net_admin_prog_type(type) && !bpf_token_capable(token, CAP_NET_ADMIN))
+		goto put_token;
+	if (is_perfmon_prog_type(type) && !bpf_token_capable(token, CAP_PERFMON))
+		goto put_token;
 
 	/* attach_prog_fd/attach_btf_obj_fd can specify fd of either bpf_prog
 	 * or btf, we need to check which one it is
@@ -2636,27 +2661,33 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 		if (IS_ERR(dst_prog)) {
 			dst_prog = NULL;
 			attach_btf = btf_get_by_fd(attr->attach_btf_obj_fd);
-			if (IS_ERR(attach_btf))
-				return -EINVAL;
+			if (IS_ERR(attach_btf)) {
+				err = -EINVAL;
+				goto put_token;
+			}
 			if (!btf_is_kernel(attach_btf)) {
 				/* attaching through specifying bpf_prog's BTF
 				 * objects directly might be supported eventually
 				 */
 				btf_put(attach_btf);
-				return -ENOTSUPP;
+				err = -ENOTSUPP;
+				goto put_token;
 			}
 		}
 	} else if (attr->attach_btf_id) {
 		/* fall back to vmlinux BTF, if BTF type ID is specified */
 		attach_btf = bpf_get_btf_vmlinux();
-		if (IS_ERR(attach_btf))
-			return PTR_ERR(attach_btf);
-		if (!attach_btf)
-			return -EINVAL;
+		if (IS_ERR(attach_btf)) {
+			err = PTR_ERR(attach_btf);
+			goto put_token;
+		}
+		if (!attach_btf) {
+			err = -EINVAL;
+			goto put_token;
+		}
 		btf_get(attach_btf);
 	}
 
-	bpf_prog_load_fixup_attach_type(attr);
 	if (bpf_prog_load_check_attach(type, attr->expected_attach_type,
 				       attach_btf, attr->attach_btf_id,
 				       dst_prog)) {
@@ -2664,7 +2695,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 			bpf_prog_put(dst_prog);
 		if (attach_btf)
 			btf_put(attach_btf);
-		return -EINVAL;
+		err = -EINVAL;
+		goto put_token;
 	}
 
 	/* plain bpf_prog allocation */
@@ -2674,7 +2706,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 			bpf_prog_put(dst_prog);
 		if (attach_btf)
 			btf_put(attach_btf);
-		return -ENOMEM;
+		err = -EINVAL;
+		goto put_token;
 	}
 
 	prog->expected_attach_type = attr->expected_attach_type;
@@ -2685,6 +2718,10 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE;
 	prog->aux->xdp_has_frags = attr->prog_flags & BPF_F_XDP_HAS_FRAGS;
 
+	/* move token into prog->aux, reuse taken refcnt */
+	prog->aux->token = token;
+	token = NULL;
+
 	err = security_bpf_prog_alloc(prog->aux);
 	if (err)
 		goto free_prog;
@@ -2786,6 +2823,8 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
 	if (prog->aux->attach_btf)
 		btf_put(prog->aux->attach_btf);
 	bpf_prog_free(prog);
+put_token:
+	bpf_token_put(token);
 	return err;
 }
 
@@ -3768,7 +3807,7 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
 	case BPF_PROG_TYPE_SK_LOOKUP:
 		return attach_type == prog->expected_attach_type ? 0 : -EINVAL;
 	case BPF_PROG_TYPE_CGROUP_SKB:
-		if (!bpf_net_capable())
+		if (!bpf_token_capable(prog->aux->token, CAP_NET_ADMIN))
 			/* cg-skb progs can be loaded by unpriv user.
 			 * check permissions at attach time.
 			 */
diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
index a62f21077c63..38ff82fc97c8 100644
--- a/kernel/bpf/token.c
+++ b/kernel/bpf/token.c
@@ -76,6 +76,18 @@ static void bpf_token_show_fdinfo(struct seq_file *m, struct file *filp)
 		seq_printf(m, "allowed_maps:\tany\n");
 	else
 		seq_printf(m, "allowed_maps:\t0x%llx\n", token->allowed_maps);
+
+	mask = (1ULL << __MAX_BPF_PROG_TYPE) - 1;
+	if ((token->allowed_progs & mask) == mask)
+		seq_printf(m, "allowed_progs:\tany\n");
+	else
+		seq_printf(m, "allowed_progs:\t0x%llx\n", token->allowed_progs);
+
+	mask = (1ULL << __MAX_BPF_ATTACH_TYPE) - 1;
+	if ((token->allowed_attachs & mask) == mask)
+		seq_printf(m, "allowed_attachs:\tany\n");
+	else
+		seq_printf(m, "allowed_attachs:\t0x%llx\n", token->allowed_attachs);
 }
 
 static struct bpf_token *bpf_token_alloc(void)
@@ -154,6 +166,8 @@ int bpf_token_create(union bpf_attr *attr)
 	mnt_opts = path.dentry->d_sb->s_fs_info;
 	token->allowed_cmds = mnt_opts->delegate_cmds;
 	token->allowed_maps = mnt_opts->delegate_maps;
+	token->allowed_progs = mnt_opts->delegate_progs;
+	token->allowed_attachs = mnt_opts->delegate_attachs;
 
 	fd = get_unused_fd_flags(O_CLOEXEC);
 	if (fd < 0) {
@@ -210,3 +224,14 @@ bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type t
 
 	return token->allowed_maps & (1ULL << type);
 }
+
+bool bpf_token_allow_prog_type(const struct bpf_token *token,
+			       enum bpf_prog_type prog_type,
+			       enum bpf_attach_type attach_type)
+{
+	if (!token || prog_type >= __MAX_BPF_PROG_TYPE || attach_type >= __MAX_BPF_ATTACH_TYPE)
+		return false;
+
+	return (token->allowed_progs & (1ULL << prog_type)) &&
+	       (token->allowed_attachs & (1ULL << attach_type));
+}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 6c4651176046..86fb59e6f47b 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1029,6 +1029,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SK_LOOKUP,
 	BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
 	BPF_PROG_TYPE_NETFILTER,
+	__MAX_BPF_PROG_TYPE
 };
 
 enum bpf_attach_type {
@@ -1494,6 +1495,7 @@ union bpf_attr {
 		 * truncated), or smaller (if log buffer wasn't filled completely).
 		 */
 		__u32		log_true_size;
+		__u32		prog_token_fd;
 	};
 
 	struct { /* anonymous struct used by BPF_OBJ_* commands */
diff --git a/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c b/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c
index 573249a2814d..4ed46ed58a7b 100644
--- a/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c
+++ b/tools/testing/selftests/bpf/prog_tests/libbpf_probes.c
@@ -30,6 +30,8 @@ void test_libbpf_probe_prog_types(void)
 
 		if (prog_type == BPF_PROG_TYPE_UNSPEC)
 			continue;
+		if (strcmp(prog_type_name, "__MAX_BPF_PROG_TYPE") == 0)
+			continue;
 
 		if (!test__start_subtest(prog_type_name))
 			continue;
diff --git a/tools/testing/selftests/bpf/prog_tests/libbpf_str.c b/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
index 2a0633f43c73..384bc1f7a65e 100644
--- a/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
+++ b/tools/testing/selftests/bpf/prog_tests/libbpf_str.c
@@ -189,6 +189,9 @@ static void test_libbpf_bpf_prog_type_str(void)
 		const char *prog_type_str;
 		char buf[256];
 
+		if (prog_type == __MAX_BPF_PROG_TYPE)
+			continue;
+
 		prog_type_name = btf__str_by_offset(btf, e->name_off);
 		prog_type_str = libbpf_bpf_prog_type_str(prog_type);
 		ASSERT_OK_PTR(prog_type_str, prog_type_name);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 bpf-next 07/13] bpf: take into account BPF token when fetching helper protos
  2023-09-27 22:57 [PATCH v6 bpf-next 00/13] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (5 preceding siblings ...)
  2023-09-27 22:58 ` [PATCH v6 bpf-next 06/13] bpf: add BPF token support to BPF_PROG_LOAD command Andrii Nakryiko
@ 2023-09-27 22:58 ` Andrii Nakryiko
  2023-09-27 22:58 ` [PATCH v6 bpf-next 08/13] bpf: consistenly use BPF token throughout BPF verifier logic Andrii Nakryiko
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2023-09-27 22:58 UTC (permalink / raw)
  To: bpf, netdev
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Instead of performing unconditional system-wide bpf_capable() and
perfmon_capable() calls inside bpf_base_func_proto() function (and other
similar ones) to determine eligibility of a given BPF helper for a given
program, use previously recorded BPF token during BPF_PROG_LOAD command
handling to inform the decision.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 drivers/media/rc/bpf-lirc.c |  2 +-
 include/linux/bpf.h         |  5 +++--
 kernel/bpf/cgroup.c         |  6 +++---
 kernel/bpf/helpers.c        |  6 +++---
 kernel/bpf/syscall.c        |  5 +++--
 kernel/trace/bpf_trace.c    |  2 +-
 net/core/filter.c           | 32 ++++++++++++++++----------------
 net/ipv4/bpf_tcp_ca.c       |  2 +-
 net/netfilter/nf_bpf_link.c |  2 +-
 9 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
index fe17c7f98e81..6d07693c6b9f 100644
--- a/drivers/media/rc/bpf-lirc.c
+++ b/drivers/media/rc/bpf-lirc.c
@@ -110,7 +110,7 @@ lirc_mode2_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_get_prandom_u32:
 		return &bpf_get_prandom_u32_proto;
 	case BPF_FUNC_trace_printk:
-		if (perfmon_capable())
+		if (bpf_token_capable(prog->aux->token, CAP_PERFMON))
 			return bpf_get_trace_printk_proto();
 		fallthrough;
 	default:
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a1c1604c64e7..1d5758e4db21 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2471,7 +2471,8 @@ const char *btf_find_decl_tag_value(const struct btf *btf, const struct btf_type
 struct bpf_prog *bpf_prog_by_id(u32 id);
 struct bpf_link *bpf_link_by_id(u32 id);
 
-const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id);
+const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id,
+						 const struct bpf_prog *prog);
 void bpf_task_storage_free(struct task_struct *task);
 void bpf_cgrp_storage_free(struct cgroup *cgroup);
 bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog);
@@ -2728,7 +2729,7 @@ static inline int btf_struct_access(struct bpf_verifier_log *log,
 }
 
 static inline const struct bpf_func_proto *
-bpf_base_func_proto(enum bpf_func_id func_id)
+bpf_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	return NULL;
 }
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 03b3d4492980..41fbcd67a111 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1616,7 +1616,7 @@ cgroup_dev_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_perf_event_output:
 		return &bpf_event_output_data_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
@@ -2174,7 +2174,7 @@ sysctl_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_perf_event_output:
 		return &bpf_event_output_data_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
@@ -2331,7 +2331,7 @@ cg_sockopt_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_perf_event_output:
 		return &bpf_event_output_data_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index dd1c69ee3375..d3e69141b9ec 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -1666,7 +1666,7 @@ const struct bpf_func_proto bpf_probe_read_kernel_str_proto __weak;
 const struct bpf_func_proto bpf_task_pt_regs_proto __weak;
 
 const struct bpf_func_proto *
-bpf_base_func_proto(enum bpf_func_id func_id)
+bpf_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_map_lookup_elem:
@@ -1717,7 +1717,7 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		break;
 	}
 
-	if (!bpf_capable())
+	if (!bpf_token_capable(prog->aux->token, CAP_BPF))
 		return NULL;
 
 	switch (func_id) {
@@ -1775,7 +1775,7 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		break;
 	}
 
-	if (!perfmon_capable())
+	if (!bpf_token_capable(prog->aux->token, CAP_PERFMON))
 		return NULL;
 
 	switch (func_id) {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index d0b219f09bcc..9900d6b6cf2b 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -5638,7 +5638,7 @@ static const struct bpf_func_proto bpf_sys_bpf_proto = {
 const struct bpf_func_proto * __weak
 tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
-	return bpf_base_func_proto(func_id);
+	return bpf_base_func_proto(func_id, prog);
 }
 
 BPF_CALL_1(bpf_sys_close, u32, fd)
@@ -5688,7 +5688,8 @@ syscall_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_sys_bpf:
-		return !perfmon_capable() ? NULL : &bpf_sys_bpf_proto;
+		return !bpf_token_capable(prog->aux->token, CAP_PERFMON)
+		       ? NULL : &bpf_sys_bpf_proto;
 	case BPF_FUNC_btf_find_by_name_kind:
 		return &bpf_btf_find_by_name_kind_proto;
 	case BPF_FUNC_sys_close:
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index df697c74d519..ef22a2e870b3 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1557,7 +1557,7 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_trace_vprintk:
 		return bpf_get_trace_vprintk_proto();
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
diff --git a/net/core/filter.c b/net/core/filter.c
index a094694899c9..6f0aa4095543 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -83,7 +83,7 @@
 #include <net/netfilter/nf_conntrack_bpf.h>
 
 static const struct bpf_func_proto *
-bpf_sk_base_func_proto(enum bpf_func_id func_id);
+bpf_sk_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog);
 
 int copy_bpf_fprog_from_user(struct sock_fprog *dst, sockptr_t src, int len)
 {
@@ -7806,7 +7806,7 @@ sock_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_ktime_get_coarse_ns:
 		return &bpf_ktime_get_coarse_ns_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
@@ -7889,7 +7889,7 @@ sock_addr_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 			return NULL;
 		}
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -7908,7 +7908,7 @@ sk_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_perf_event_output:
 		return &bpf_skb_event_output_proto;
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8095,7 +8095,7 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 #endif
 #endif
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8154,7 +8154,7 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 #endif
 #endif
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 
 #if IS_MODULE(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES)
@@ -8215,7 +8215,7 @@ sock_ops_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_tcp_sock_proto;
 #endif /* CONFIG_INET */
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8257,7 +8257,7 @@ sk_msg_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_get_cgroup_classid_curr_proto;
 #endif
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8301,7 +8301,7 @@ sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_skc_lookup_tcp_proto;
 #endif
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8312,7 +8312,7 @@ flow_dissector_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_skb_load_bytes:
 		return &bpf_flow_dissector_load_bytes_proto;
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -8339,7 +8339,7 @@ lwt_out_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_skb_under_cgroup:
 		return &bpf_skb_under_cgroup_proto;
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -11170,7 +11170,7 @@ sk_reuseport_func_proto(enum bpf_func_id func_id,
 	case BPF_FUNC_ktime_get_coarse_ns:
 		return &bpf_ktime_get_coarse_ns_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
@@ -11352,7 +11352,7 @@ sk_lookup_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 	case BPF_FUNC_sk_release:
 		return &bpf_sk_release_proto;
 	default:
-		return bpf_sk_base_func_proto(func_id);
+		return bpf_sk_base_func_proto(func_id, prog);
 	}
 }
 
@@ -11686,7 +11686,7 @@ const struct bpf_func_proto bpf_sock_from_file_proto = {
 };
 
 static const struct bpf_func_proto *
-bpf_sk_base_func_proto(enum bpf_func_id func_id)
+bpf_sk_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
 	const struct bpf_func_proto *func;
 
@@ -11715,10 +11715,10 @@ bpf_sk_base_func_proto(enum bpf_func_id func_id)
 	case BPF_FUNC_ktime_get_coarse_ns:
 		return &bpf_ktime_get_coarse_ns_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 
-	if (!perfmon_capable())
+	if (!bpf_token_capable(prog->aux->token, CAP_PERFMON))
 		return NULL;
 
 	return func;
diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c
index 39dcccf0f174..c7bbd8f3c708 100644
--- a/net/ipv4/bpf_tcp_ca.c
+++ b/net/ipv4/bpf_tcp_ca.c
@@ -191,7 +191,7 @@ bpf_tcp_ca_get_func_proto(enum bpf_func_id func_id,
 	case BPF_FUNC_ktime_get_coarse_ns:
 		return &bpf_ktime_get_coarse_ns_proto;
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }
 
diff --git a/net/netfilter/nf_bpf_link.c b/net/netfilter/nf_bpf_link.c
index e502ec00b2fe..1969facac91c 100644
--- a/net/netfilter/nf_bpf_link.c
+++ b/net/netfilter/nf_bpf_link.c
@@ -314,7 +314,7 @@ static bool nf_is_valid_access(int off, int size, enum bpf_access_type type,
 static const struct bpf_func_proto *
 bpf_nf_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
-	return bpf_base_func_proto(func_id);
+	return bpf_base_func_proto(func_id, prog);
 }
 
 const struct bpf_verifier_ops netfilter_verifier_ops = {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 bpf-next 08/13] bpf: consistenly use BPF token throughout BPF verifier logic
  2023-09-27 22:57 [PATCH v6 bpf-next 00/13] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (6 preceding siblings ...)
  2023-09-27 22:58 ` [PATCH v6 bpf-next 07/13] bpf: take into account BPF token when fetching helper protos Andrii Nakryiko
@ 2023-09-27 22:58 ` Andrii Nakryiko
  2023-09-27 22:58 ` [PATCH v6 bpf-next 09/13] libbpf: add bpf_token_create() API Andrii Nakryiko
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2023-09-27 22:58 UTC (permalink / raw)
  To: bpf, netdev
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Remove remaining direct queries to perfmon_capable() and bpf_capable()
in BPF verifier logic and instead use BPF token (if available) to make
decisions about privileges.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 include/linux/bpf.h    | 16 ++++++++--------
 include/linux/filter.h |  2 +-
 kernel/bpf/arraymap.c  |  2 +-
 kernel/bpf/core.c      |  2 +-
 kernel/bpf/verifier.c  | 13 ++++++-------
 net/core/filter.c      |  4 ++--
 6 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 1d5758e4db21..134791b6a06b 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2179,24 +2179,24 @@ extern int sysctl_unprivileged_bpf_disabled;
 
 bool bpf_token_capable(const struct bpf_token *token, int cap);
 
-static inline bool bpf_allow_ptr_leaks(void)
+static inline bool bpf_allow_ptr_leaks(const struct bpf_token *token)
 {
-	return perfmon_capable();
+	return bpf_token_capable(token, CAP_PERFMON);
 }
 
-static inline bool bpf_allow_uninit_stack(void)
+static inline bool bpf_allow_uninit_stack(const struct bpf_token *token)
 {
-	return perfmon_capable();
+	return bpf_token_capable(token, CAP_PERFMON);
 }
 
-static inline bool bpf_bypass_spec_v1(void)
+static inline bool bpf_bypass_spec_v1(const struct bpf_token *token)
 {
-	return perfmon_capable();
+	return bpf_token_capable(token, CAP_PERFMON);
 }
 
-static inline bool bpf_bypass_spec_v4(void)
+static inline bool bpf_bypass_spec_v4(const struct bpf_token *token)
 {
-	return perfmon_capable();
+	return bpf_token_capable(token, CAP_PERFMON);
 }
 
 int bpf_map_new_fd(struct bpf_map *map, int flags);
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 27406aee2d40..bab6d369677a 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1145,7 +1145,7 @@ static inline bool bpf_jit_blinding_enabled(struct bpf_prog *prog)
 		return false;
 	if (!bpf_jit_harden)
 		return false;
-	if (bpf_jit_harden == 1 && bpf_capable())
+	if (bpf_jit_harden == 1 && bpf_token_capable(prog->aux->token, CAP_BPF))
 		return false;
 
 	return true;
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 2058e89b5ddd..f0c64df6b6ff 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -82,7 +82,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
 	bool percpu = attr->map_type == BPF_MAP_TYPE_PERCPU_ARRAY;
 	int numa_node = bpf_map_attr_numa_node(attr);
 	u32 elem_size, index_mask, max_entries;
-	bool bypass_spec_v1 = bpf_bypass_spec_v1();
+	bool bypass_spec_v1 = bpf_bypass_spec_v1(NULL);
 	u64 array_size, mask64;
 	struct bpf_array *array;
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index fc8de25b7948..ce307440fa8d 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -675,7 +675,7 @@ static bool bpf_prog_kallsyms_candidate(const struct bpf_prog *fp)
 void bpf_prog_kallsyms_add(struct bpf_prog *fp)
 {
 	if (!bpf_prog_kallsyms_candidate(fp) ||
-	    !bpf_capable())
+	    !bpf_token_capable(fp->aux->token, CAP_BPF))
 		return;
 
 	bpf_prog_ksym_set_addr(fp);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index eed7350e15f4..3c4509738134 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -20135,7 +20135,12 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
 	env->prog = *prog;
 	env->ops = bpf_verifier_ops[env->prog->type];
 	env->fd_array = make_bpfptr(attr->fd_array, uattr.is_kernel);
-	is_priv = bpf_capable();
+
+	env->allow_ptr_leaks = bpf_allow_ptr_leaks(env->prog->aux->token);
+	env->allow_uninit_stack = bpf_allow_uninit_stack(env->prog->aux->token);
+	env->bypass_spec_v1 = bpf_bypass_spec_v1(env->prog->aux->token);
+	env->bypass_spec_v4 = bpf_bypass_spec_v4(env->prog->aux->token);
+	env->bpf_capable = is_priv = bpf_token_capable(env->prog->aux->token, CAP_BPF);
 
 	bpf_get_btf_vmlinux();
 
@@ -20167,12 +20172,6 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3
 	if (attr->prog_flags & BPF_F_ANY_ALIGNMENT)
 		env->strict_alignment = false;
 
-	env->allow_ptr_leaks = bpf_allow_ptr_leaks();
-	env->allow_uninit_stack = bpf_allow_uninit_stack();
-	env->bypass_spec_v1 = bpf_bypass_spec_v1();
-	env->bypass_spec_v4 = bpf_bypass_spec_v4();
-	env->bpf_capable = bpf_capable();
-
 	if (is_priv)
 		env->test_state_freq = attr->prog_flags & BPF_F_TEST_STATE_FREQ;
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 6f0aa4095543..b4f4041541c3 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8514,7 +8514,7 @@ static bool cg_skb_is_valid_access(int off, int size,
 		return false;
 	case bpf_ctx_range(struct __sk_buff, data):
 	case bpf_ctx_range(struct __sk_buff, data_end):
-		if (!bpf_capable())
+		if (!bpf_token_capable(prog->aux->token, CAP_BPF))
 			return false;
 		break;
 	}
@@ -8526,7 +8526,7 @@ static bool cg_skb_is_valid_access(int off, int size,
 		case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
 			break;
 		case bpf_ctx_range(struct __sk_buff, tstamp):
-			if (!bpf_capable())
+			if (!bpf_token_capable(prog->aux->token, CAP_BPF))
 				return false;
 			break;
 		default:
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 bpf-next 09/13] libbpf: add bpf_token_create() API
  2023-09-27 22:57 [PATCH v6 bpf-next 00/13] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (7 preceding siblings ...)
  2023-09-27 22:58 ` [PATCH v6 bpf-next 08/13] bpf: consistenly use BPF token throughout BPF verifier logic Andrii Nakryiko
@ 2023-09-27 22:58 ` Andrii Nakryiko
  2023-09-27 22:58 ` [PATCH v6 bpf-next 10/13] libbpf: add BPF token support to bpf_map_create() API Andrii Nakryiko
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2023-09-27 22:58 UTC (permalink / raw)
  To: bpf, netdev
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Add low-level wrapper API for BPF_TOKEN_CREATE command in bpf() syscall.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c      | 19 +++++++++++++++++++
 tools/lib/bpf/bpf.h      | 28 ++++++++++++++++++++++++++++
 tools/lib/bpf/libbpf.map |  1 +
 3 files changed, 48 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index b0f1913763a3..593ff9ea120d 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -1271,3 +1271,22 @@ int bpf_prog_bind_map(int prog_fd, int map_fd,
 	ret = sys_bpf(BPF_PROG_BIND_MAP, &attr, attr_sz);
 	return libbpf_err_errno(ret);
 }
+
+int bpf_token_create(int bpffs_path_fd, const char *bpffs_pathname,
+		     struct bpf_token_create_opts *opts)
+{
+	const size_t attr_sz = offsetofend(union bpf_attr, token_create);
+	union bpf_attr attr;
+	int fd;
+
+	if (!OPTS_VALID(opts, bpf_token_create_opts))
+		return libbpf_err(-EINVAL);
+
+	memset(&attr, 0, attr_sz);
+	attr.token_create.bpffs_path_fd = bpffs_path_fd;
+	attr.token_create.bpffs_pathname = ptr_to_u64(bpffs_pathname);
+	attr.token_create.flags = OPTS_GET(opts, flags, 0);
+
+	fd = sys_bpf_fd(BPF_TOKEN_CREATE, &attr, attr_sz);
+	return libbpf_err_errno(fd);
+}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 74c2887cfd24..a5ddb0393fee 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -635,6 +635,34 @@ struct bpf_test_run_opts {
 LIBBPF_API int bpf_prog_test_run_opts(int prog_fd,
 				      struct bpf_test_run_opts *opts);
 
+struct bpf_token_create_opts {
+	size_t sz; /* size of this struct for forward/backward compatibility */
+	__u32 flags;
+	size_t :0;
+};
+#define bpf_token_create_opts__last_field flags
+
+/**
+ * @brief **bpf_token_create()** creates a new instance of BPF token derived
+ * from specified BPF FS mount point.
+ *
+ * BPF token created with this API can be passed to bpf() syscall for
+ * commands like BPF_PROG_LOAD, BPF_MAP_CREATE, etc.
+ *
+ * @param bpffs_path_fd O_PATH FD (see man 2 openat() for semantics) specifying,
+ * in combination with *bpffs_pathname*, BPF FS mount from which to derive
+ * a BPF token instance.
+ * @param pin_pathname absolute or relative path specifying,in combination
+ * with *bpffs_pathname*, BPF FS mount from which to derive a BPF token
+ * instance.
+ * @param opts optional BPF token creation options, can be NULL
+ *
+ * @return BPF token FD > 0, on success; negative error code, otherwise (errno
+ * is also set to the error code)
+ */
+LIBBPF_API int bpf_token_create(int bpffs_path_fd, const char *bpffs_pathname,
+				struct bpf_token_create_opts *opts);
+
 #ifdef __cplusplus
 } /* extern "C" */
 #endif
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index cc973b678a39..9d05ebcd3a9e 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -400,6 +400,7 @@ LIBBPF_1.3.0 {
 		bpf_program__attach_netfilter;
 		bpf_program__attach_tcx;
 		bpf_program__attach_uprobe_multi;
+		bpf_token_create;
 		ring__avail_data_size;
 		ring__consume;
 		ring__consumer_pos;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 bpf-next 10/13] libbpf: add BPF token support to bpf_map_create() API
  2023-09-27 22:57 [PATCH v6 bpf-next 00/13] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (8 preceding siblings ...)
  2023-09-27 22:58 ` [PATCH v6 bpf-next 09/13] libbpf: add bpf_token_create() API Andrii Nakryiko
@ 2023-09-27 22:58 ` Andrii Nakryiko
  2023-09-27 22:58 ` [PATCH v6 bpf-next 11/13] libbpf: add BPF token support to bpf_btf_load() API Andrii Nakryiko
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2023-09-27 22:58 UTC (permalink / raw)
  To: bpf, netdev
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Add ability to provide token_fd for BPF_MAP_CREATE command through
bpf_map_create() API.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c | 4 +++-
 tools/lib/bpf/bpf.h | 5 ++++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 593ff9ea120d..f9ee7608a96a 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -169,7 +169,7 @@ int bpf_map_create(enum bpf_map_type map_type,
 		   __u32 max_entries,
 		   const struct bpf_map_create_opts *opts)
 {
-	const size_t attr_sz = offsetofend(union bpf_attr, map_extra);
+	const size_t attr_sz = offsetofend(union bpf_attr, map_token_fd);
 	union bpf_attr attr;
 	int fd;
 
@@ -198,6 +198,8 @@ int bpf_map_create(enum bpf_map_type map_type,
 	attr.numa_node = OPTS_GET(opts, numa_node, 0);
 	attr.map_ifindex = OPTS_GET(opts, map_ifindex, 0);
 
+	attr.map_token_fd = OPTS_GET(opts, token_fd, 0);
+
 	fd = sys_bpf_fd(BPF_MAP_CREATE, &attr, attr_sz);
 	return libbpf_err_errno(fd);
 }
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index a5ddb0393fee..156bc82c4895 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -51,8 +51,11 @@ struct bpf_map_create_opts {
 
 	__u32 numa_node;
 	__u32 map_ifindex;
+
+	__u32 token_fd;
+	size_t :0;
 };
-#define bpf_map_create_opts__last_field map_ifindex
+#define bpf_map_create_opts__last_field token_fd
 
 LIBBPF_API int bpf_map_create(enum bpf_map_type map_type,
 			      const char *map_name,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 bpf-next 11/13] libbpf: add BPF token support to bpf_btf_load() API
  2023-09-27 22:57 [PATCH v6 bpf-next 00/13] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (9 preceding siblings ...)
  2023-09-27 22:58 ` [PATCH v6 bpf-next 10/13] libbpf: add BPF token support to bpf_map_create() API Andrii Nakryiko
@ 2023-09-27 22:58 ` Andrii Nakryiko
  2023-09-27 22:58 ` [PATCH v6 bpf-next 12/13] libbpf: add BPF token support to bpf_prog_load() API Andrii Nakryiko
  2023-09-27 22:58 ` [PATCH v6 bpf-next 13/13] selftests/bpf: add BPF token-enabled tests Andrii Nakryiko
  12 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2023-09-27 22:58 UTC (permalink / raw)
  To: bpf, netdev
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Allow user to specify token_fd for bpf_btf_load() API that wraps
kernel's BPF_BTF_LOAD command. This allows loading BTF from unprivileged
process as long as it has BPF token allowing BPF_BTF_LOAD command, which
can be created and delegated by privileged process.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c | 4 +++-
 tools/lib/bpf/bpf.h | 3 ++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index f9ee7608a96a..4547ae1037af 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -1168,7 +1168,7 @@ int bpf_raw_tracepoint_open(const char *name, int prog_fd)
 
 int bpf_btf_load(const void *btf_data, size_t btf_size, struct bpf_btf_load_opts *opts)
 {
-	const size_t attr_sz = offsetofend(union bpf_attr, btf_log_true_size);
+	const size_t attr_sz = offsetofend(union bpf_attr, btf_token_fd);
 	union bpf_attr attr;
 	char *log_buf;
 	size_t log_size;
@@ -1193,6 +1193,8 @@ int bpf_btf_load(const void *btf_data, size_t btf_size, struct bpf_btf_load_opts
 
 	attr.btf = ptr_to_u64(btf_data);
 	attr.btf_size = btf_size;
+	attr.btf_token_fd = OPTS_GET(opts, token_fd, 0);
+
 	/* log_level == 0 and log_buf != NULL means "try loading without
 	 * log_buf, but retry with log_buf and log_level=1 on error", which is
 	 * consistent across low-level and high-level BTF and program loading
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 156bc82c4895..d7df5543f402 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -133,9 +133,10 @@ struct bpf_btf_load_opts {
 	 * If kernel doesn't support this feature, log_size is left unchanged.
 	 */
 	__u32 log_true_size;
+	__u32 token_fd;
 	size_t :0;
 };
-#define bpf_btf_load_opts__last_field log_true_size
+#define bpf_btf_load_opts__last_field token_fd
 
 LIBBPF_API int bpf_btf_load(const void *btf_data, size_t btf_size,
 			    struct bpf_btf_load_opts *opts);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 bpf-next 12/13] libbpf: add BPF token support to bpf_prog_load() API
  2023-09-27 22:57 [PATCH v6 bpf-next 00/13] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (10 preceding siblings ...)
  2023-09-27 22:58 ` [PATCH v6 bpf-next 11/13] libbpf: add BPF token support to bpf_btf_load() API Andrii Nakryiko
@ 2023-09-27 22:58 ` Andrii Nakryiko
  2023-09-27 22:58 ` [PATCH v6 bpf-next 13/13] selftests/bpf: add BPF token-enabled tests Andrii Nakryiko
  12 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2023-09-27 22:58 UTC (permalink / raw)
  To: bpf, netdev
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Wire through token_fd into bpf_prog_load().

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 tools/lib/bpf/bpf.c | 3 ++-
 tools/lib/bpf/bpf.h | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 4547ae1037af..5a238831b4ff 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -234,7 +234,7 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
 		  const struct bpf_insn *insns, size_t insn_cnt,
 		  struct bpf_prog_load_opts *opts)
 {
-	const size_t attr_sz = offsetofend(union bpf_attr, log_true_size);
+	const size_t attr_sz = offsetofend(union bpf_attr, prog_token_fd);
 	void *finfo = NULL, *linfo = NULL;
 	const char *func_info, *line_info;
 	__u32 log_size, log_level, attach_prog_fd, attach_btf_obj_fd;
@@ -263,6 +263,7 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
 	attr.prog_flags = OPTS_GET(opts, prog_flags, 0);
 	attr.prog_ifindex = OPTS_GET(opts, prog_ifindex, 0);
 	attr.kern_version = OPTS_GET(opts, kern_version, 0);
+	attr.prog_token_fd = OPTS_GET(opts, token_fd, 0);
 
 	if (prog_name && kernel_supports(NULL, FEAT_PROG_NAME))
 		libbpf_strlcpy(attr.prog_name, prog_name, sizeof(attr.prog_name));
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index d7df5543f402..edc0cab465d6 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -105,9 +105,10 @@ struct bpf_prog_load_opts {
 	 * If kernel doesn't support this feature, log_size is left unchanged.
 	 */
 	__u32 log_true_size;
+	__u32 token_fd;
 	size_t :0;
 };
-#define bpf_prog_load_opts__last_field log_true_size
+#define bpf_prog_load_opts__last_field token_fd
 
 LIBBPF_API int bpf_prog_load(enum bpf_prog_type prog_type,
 			     const char *prog_name, const char *license,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v6 bpf-next 13/13] selftests/bpf: add BPF token-enabled tests
  2023-09-27 22:57 [PATCH v6 bpf-next 00/13] BPF token and BPF FS-based delegation Andrii Nakryiko
                   ` (11 preceding siblings ...)
  2023-09-27 22:58 ` [PATCH v6 bpf-next 12/13] libbpf: add BPF token support to bpf_prog_load() API Andrii Nakryiko
@ 2023-09-27 22:58 ` Andrii Nakryiko
  12 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2023-09-27 22:58 UTC (permalink / raw)
  To: bpf, netdev
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Add a selftest that attempts to conceptually replicate intended BPF
token use cases inside user namespaced container.

Child process is forked. It is then put into its own userns and mountns.
Child creates BPF FS context object and sets it up as desired. This
ensures child userns is captures as owning userns for this instance of
BPF FS.

This context is passed back to privileged parent process through Unix
socket, where parent creates and mounts it as a detached mount. This
mount FD is passed back to the child to be used for BPF token creation,
which allows otherwise privileged BPF operations to succeed inside
userns.

We validate that all of token-enabled privileged commands (BPF_BTF_LOAD,
BPF_MAP_CREATE, and BPF_PROG_LOAD) work as intended. They should only
succeed inside the userns if a) BPF token is provided with proper
allowed sets of commands and types; and b) namespaces CAP_BPF and other
privileges are set. Lacking a) or b) should lead to -EPERM failures.

Based on suggested workflow by Christian Brauner ([0]).

  [0] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 .../testing/selftests/bpf/prog_tests/token.c  | 629 ++++++++++++++++++
 1 file changed, 629 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/token.c

diff --git a/tools/testing/selftests/bpf/prog_tests/token.c b/tools/testing/selftests/bpf/prog_tests/token.c
new file mode 100644
index 000000000000..41cee6b4731e
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/token.c
@@ -0,0 +1,629 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
+#define _GNU_SOURCE
+#include <test_progs.h>
+#include <bpf/btf.h>
+#include "cap_helpers.h"
+#include <fcntl.h>
+#include <sched.h>
+#include <signal.h>
+#include <unistd.h>
+#include <linux/filter.h>
+#include <linux/unistd.h>
+#include <sys/mount.h>
+#include <sys/socket.h>
+#include <sys/syscall.h>
+#include <sys/un.h>
+
+/* copied from include/uapi/linux/mount.h, as including it conflicts with
+ * sys/mount.h include
+ */
+enum fsconfig_command {
+	FSCONFIG_SET_FLAG       = 0,    /* Set parameter, supplying no value */
+	FSCONFIG_SET_STRING     = 1,    /* Set parameter, supplying a string value */
+	FSCONFIG_SET_BINARY     = 2,    /* Set parameter, supplying a binary blob value */
+	FSCONFIG_SET_PATH       = 3,    /* Set parameter, supplying an object by path */
+	FSCONFIG_SET_PATH_EMPTY = 4,    /* Set parameter, supplying an object by (empty) path */
+	FSCONFIG_SET_FD         = 5,    /* Set parameter, supplying an object by fd */
+	FSCONFIG_CMD_CREATE     = 6,    /* Invoke superblock creation */
+	FSCONFIG_CMD_RECONFIGURE = 7,   /* Invoke superblock reconfiguration */
+};
+
+static inline int sys_fsopen(const char *fsname, unsigned flags)
+{
+	return syscall(__NR_fsopen, fsname, flags);
+}
+
+static inline int sys_fsconfig(int fs_fd, unsigned cmd, const char *key, const void *val, int aux)
+{
+	return syscall(__NR_fsconfig, fs_fd, cmd, key, val, aux);
+}
+
+static inline int sys_fsmount(int fs_fd, unsigned flags, unsigned ms_flags)
+{
+	return syscall(__NR_fsmount, fs_fd, flags, ms_flags);
+}
+
+static int drop_priv_caps(__u64 *old_caps)
+{
+	return cap_disable_effective((1ULL << CAP_BPF) |
+				     (1ULL << CAP_PERFMON) |
+				     (1ULL << CAP_NET_ADMIN) |
+				     (1ULL << CAP_SYS_ADMIN), old_caps);
+}
+
+static int restore_priv_caps(__u64 old_caps)
+{
+	return cap_enable_effective(old_caps, NULL);
+}
+
+static int set_delegate_mask(int fs_fd, const char *key, __u64 mask)
+{
+	char buf[32];
+	int err;
+
+	snprintf(buf, sizeof(buf), "0x%llx", (unsigned long long)mask);
+	err = sys_fsconfig(fs_fd, FSCONFIG_SET_STRING, key,
+			   mask == ~0ULL ? "any" : buf, 0);
+	if (err < 0)
+		err = -errno;
+	return err;
+}
+
+#define zclose(fd) do { if (fd >= 0) close(fd); fd = -1; } while (0)
+
+struct bpffs_opts {
+	__u64 cmds;
+	__u64 maps;
+	__u64 progs;
+	__u64 attachs;
+};
+
+static int setup_bpffs_fd(struct bpffs_opts *opts)
+{
+	int fs_fd = -1, err;
+
+	/* create VFS context */
+	fs_fd = sys_fsopen("bpf", 0);
+	if (!ASSERT_GE(fs_fd, 0, "fs_fd"))
+		goto cleanup;
+
+	/* set up token delegation mount options */
+	err = set_delegate_mask(fs_fd, "delegate_cmds", opts->cmds);
+	if (!ASSERT_OK(err, "fs_cfg_cmds"))
+		goto cleanup;
+	err = set_delegate_mask(fs_fd, "delegate_maps", opts->maps);
+	if (!ASSERT_OK(err, "fs_cfg_maps"))
+		goto cleanup;
+	err = set_delegate_mask(fs_fd, "delegate_progs", opts->progs);
+	if (!ASSERT_OK(err, "fs_cfg_progs"))
+		goto cleanup;
+	err = set_delegate_mask(fs_fd, "delegate_attachs", opts->attachs);
+	if (!ASSERT_OK(err, "fs_cfg_attachs"))
+		goto cleanup;
+
+	return fs_fd;
+cleanup:
+	zclose(fs_fd);
+	return -1;
+}
+
+static int materialize_bpffs_fd(int fs_fd)
+{
+	int mnt_fd, err;
+
+	/* instantiate FS object */
+	err = sys_fsconfig(fs_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
+	if (err < 0)
+		return -errno;
+
+	/* create O_PATH fd for detached mount */
+	mnt_fd = sys_fsmount(fs_fd, 0, 0);
+	if (err < 0)
+		return -errno;
+
+	return mnt_fd;
+}
+
+/* send FD over Unix domain (AF_UNIX) socket */
+static int sendfd(int sockfd, int fd)
+{
+	struct msghdr msg = {};
+	struct cmsghdr *cmsg;
+	int fds[1] = { fd }, err;
+	char iobuf[1];
+	struct iovec io = {
+		.iov_base = iobuf,
+		.iov_len = sizeof(iobuf),
+	};
+	union {
+		char buf[CMSG_SPACE(sizeof(fds))];
+		struct cmsghdr align;
+	} u;
+
+	msg.msg_iov = &io;
+	msg.msg_iovlen = 1;
+	msg.msg_control = u.buf;
+	msg.msg_controllen = sizeof(u.buf);
+	cmsg = CMSG_FIRSTHDR(&msg);
+	cmsg->cmsg_level = SOL_SOCKET;
+	cmsg->cmsg_type = SCM_RIGHTS;
+	cmsg->cmsg_len = CMSG_LEN(sizeof(fds));
+	memcpy(CMSG_DATA(cmsg), fds, sizeof(fds));
+
+	err = sendmsg(sockfd, &msg, 0);
+	if (err < 0)
+		err = -errno;
+	if (!ASSERT_EQ(err, 1, "sendmsg"))
+		return -EINVAL;
+
+	return 0;
+}
+
+/* receive FD over Unix domain (AF_UNIX) socket */
+static int recvfd(int sockfd, int *fd)
+{
+	struct msghdr msg = {};
+	struct cmsghdr *cmsg;
+	int fds[1], err;
+	char iobuf[1];
+	struct iovec io = {
+		.iov_base = iobuf,
+		.iov_len = sizeof(iobuf),
+	};
+	union {
+		char buf[CMSG_SPACE(sizeof(fds))];
+		struct cmsghdr align;
+	} u;
+
+	msg.msg_iov = &io;
+	msg.msg_iovlen = 1;
+	msg.msg_control = u.buf;
+	msg.msg_controllen = sizeof(u.buf);
+
+	err = recvmsg(sockfd, &msg, 0);
+	if (err < 0)
+		err = -errno;
+	if (!ASSERT_EQ(err, 1, "recvmsg"))
+		return -EINVAL;
+
+	cmsg = CMSG_FIRSTHDR(&msg);
+	if (!ASSERT_OK_PTR(cmsg, "cmsg_null") ||
+	    !ASSERT_EQ(cmsg->cmsg_len, CMSG_LEN(sizeof(fds)), "cmsg_len") ||
+	    !ASSERT_EQ(cmsg->cmsg_level, SOL_SOCKET, "cmsg_level") ||
+	    !ASSERT_EQ(cmsg->cmsg_type, SCM_RIGHTS, "cmsg_type"))
+		return -EINVAL;
+
+	memcpy(fds, CMSG_DATA(cmsg), sizeof(fds));
+	*fd = fds[0];
+
+	return 0;
+}
+
+static ssize_t write_nointr(int fd, const void *buf, size_t count)
+{
+	ssize_t ret;
+
+	do {
+		ret = write(fd, buf, count);
+	} while (ret < 0 && errno == EINTR);
+
+	return ret;
+}
+
+static int write_file(const char *path, const void *buf, size_t count)
+{
+	int fd;
+	ssize_t ret;
+
+	fd = open(path, O_WRONLY | O_CLOEXEC | O_NOCTTY | O_NOFOLLOW);
+	if (fd < 0)
+		return -1;
+
+	ret = write_nointr(fd, buf, count);
+	close(fd);
+	if (ret < 0 || (size_t)ret != count)
+		return -1;
+
+	return 0;
+}
+
+static int create_and_enter_userns(void)
+{
+	uid_t uid;
+	gid_t gid;
+	char map[100];
+
+	uid = getuid();
+	gid = getgid();
+
+	if (unshare(CLONE_NEWUSER))
+		return -1;
+
+	if (write_file("/proc/self/setgroups", "deny", sizeof("deny") - 1) &&
+	    errno != ENOENT)
+		return -1;
+
+	snprintf(map, sizeof(map), "0 %d 1", uid);
+	if (write_file("/proc/self/uid_map", map, strlen(map)))
+		return -1;
+
+
+	snprintf(map, sizeof(map), "0 %d 1", gid);
+	if (write_file("/proc/self/gid_map", map, strlen(map)))
+		return -1;
+
+	if (setgid(0))
+		return -1;
+
+	if (setuid(0))
+		return -1;
+
+	return 0;
+}
+
+typedef int (*child_callback_fn)(int);
+
+static void child(int sock_fd, struct bpffs_opts *bpffs_opts, child_callback_fn callback)
+{
+	LIBBPF_OPTS(bpf_map_create_opts, map_opts);
+	int mnt_fd = -1, fs_fd = -1, err = 0;
+
+	/* setup userns with root mappings */
+	err = create_and_enter_userns();
+	if (!ASSERT_OK(err, "create_and_enter_userns"))
+		goto cleanup;
+
+	/* setup mountns to allow creating BPF FS (fsopen("bpf")) from unpriv process */
+	err = unshare(CLONE_NEWNS);
+	if (!ASSERT_OK(err, "create_mountns"))
+		goto cleanup;
+
+	err = mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, 0);
+	if (!ASSERT_OK(err, "remount_root"))
+		goto cleanup;
+
+	fs_fd = setup_bpffs_fd(bpffs_opts);
+	if (!ASSERT_GE(fs_fd, 0, "setup_bpffs")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* pass BPF FS context object to parent */
+	err = sendfd(sock_fd, fs_fd);
+	if (!ASSERT_OK(err, "send_fs_fd"))
+		goto cleanup;
+
+	/* avoid mucking around with mount namespaces and mounting at
+	 * well-known path, just get detach-mounted BPF FS fd back from parent
+	 */
+	err = recvfd(sock_fd, &mnt_fd);
+	if (!ASSERT_OK(err, "recv_mnt_fd"))
+		goto cleanup;
+
+	/* do custom test logic with customly set up BPF FS instance */
+	err = callback(mnt_fd);
+	if (!ASSERT_OK(err, "test_callback"))
+		goto cleanup;
+
+	err = 0;
+cleanup:
+	zclose(sock_fd);
+	zclose(mnt_fd);
+
+	exit(-err);
+}
+
+static int wait_for_pid(pid_t pid)
+{
+	int status, ret;
+
+again:
+	ret = waitpid(pid, &status, 0);
+	if (ret == -1) {
+		if (errno == EINTR)
+			goto again;
+
+		return -1;
+	}
+
+	if (!WIFEXITED(status))
+		return -1;
+
+	return WEXITSTATUS(status);
+}
+
+static void parent(int child_pid, int sock_fd)
+{
+	int fs_fd = -1, mnt_fd = -1, err;
+
+	err = recvfd(sock_fd, &fs_fd);
+	if (!ASSERT_OK(err, "recv_bpffs_fd"))
+		goto cleanup;
+
+	mnt_fd = materialize_bpffs_fd(fs_fd);
+	if (!ASSERT_GE(mnt_fd, 0, "materialize_bpffs_fd")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+	zclose(fs_fd);
+
+	/* pass BPF FS context object to parent */
+	err = sendfd(sock_fd, mnt_fd);
+	if (!ASSERT_OK(err, "send_mnt_fd"))
+		goto cleanup;
+	zclose(mnt_fd);
+
+	err = wait_for_pid(child_pid);
+	ASSERT_OK(err, "waitpid_child");
+
+cleanup:
+	zclose(sock_fd);
+	zclose(fs_fd);
+	zclose(mnt_fd);
+
+	if (child_pid > 0)
+		(void)kill(child_pid, SIGKILL);
+}
+
+static void subtest_userns(struct bpffs_opts *bpffs_opts, child_callback_fn cb)
+{
+	int sock_fds[2] = { -1, -1 };
+	int child_pid = 0, err;
+
+	err = socketpair(AF_UNIX, SOCK_STREAM, 0, sock_fds);
+	if (!ASSERT_OK(err, "socketpair"))
+		goto cleanup;
+
+	child_pid = fork();
+	if (!ASSERT_GE(child_pid, 0, "fork"))
+		goto cleanup;
+
+	if (child_pid == 0) {
+		zclose(sock_fds[0]);
+		return child(sock_fds[1], bpffs_opts, cb);
+
+	} else {
+		zclose(sock_fds[1]);
+		return parent(child_pid, sock_fds[0]);
+	}
+
+cleanup:
+	zclose(sock_fds[0]);
+	zclose(sock_fds[1]);
+	if (child_pid > 0)
+		(void)kill(child_pid, SIGKILL);
+}
+
+static int userns_map_create(int mnt_fd)
+{
+	LIBBPF_OPTS(bpf_map_create_opts, map_opts);
+	int err, token_fd = -1, map_fd = -1;
+	__u64 old_caps = 0;
+
+	/* create BPF token from BPF FS mount */
+	token_fd = bpf_token_create(mnt_fd, "", NULL);
+	if (!ASSERT_GT(token_fd, 0, "token_create")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* while inside non-init userns, we need both a BPF token *and*
+	 * CAP_BPF inside current userns to create privileged map; let's test
+	 * that neither BPF token alone nor namespaced CAP_BPF is sufficient
+	 */
+	err = drop_priv_caps(&old_caps);
+	if (!ASSERT_OK(err, "drop_caps"))
+		goto cleanup;
+
+	/* no token, no CAP_BPF -> fail */
+	map_opts.token_fd = 0;
+	map_fd = bpf_map_create(BPF_MAP_TYPE_STACK, "wo_token_wo_bpf", 0, 8, 1, &map_opts);
+	if (!ASSERT_LT(map_fd, 0, "stack_map_wo_token_wo_cap_bpf_should_fail")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* token without CAP_BPF -> fail */
+	map_opts.token_fd = token_fd;
+	map_fd = bpf_map_create(BPF_MAP_TYPE_STACK, "w_token_wo_bpf", 0, 8, 1, &map_opts);
+	if (!ASSERT_LT(map_fd, 0, "stack_map_w_token_wo_cap_bpf_should_fail")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* get back effective local CAP_BPF (and CAP_SYS_ADMIN) */
+	err = restore_priv_caps(old_caps);
+	if (!ASSERT_OK(err, "restore_caps"))
+		goto cleanup;
+
+	/* CAP_BPF without token -> fail */
+	map_opts.token_fd = 0;
+	map_fd = bpf_map_create(BPF_MAP_TYPE_STACK, "wo_token_w_bpf", 0, 8, 1, &map_opts);
+	if (!ASSERT_LT(map_fd, 0, "stack_map_wo_token_w_cap_bpf_should_fail")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* finally, namespaced CAP_BPF + token -> success */
+	map_opts.token_fd = token_fd;
+	map_fd = bpf_map_create(BPF_MAP_TYPE_STACK, "w_token_w_bpf", 0, 8, 1, &map_opts);
+	if (!ASSERT_GT(map_fd, 0, "stack_map_w_token_w_cap_bpf")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+cleanup:
+	zclose(token_fd);
+	zclose(map_fd);
+	return err;
+}
+
+static int userns_btf_load(int mnt_fd)
+{
+	LIBBPF_OPTS(bpf_btf_load_opts, btf_opts);
+	int err, token_fd = -1, btf_fd = -1;
+	const void *raw_btf_data;
+	struct btf *btf = NULL;
+	__u32 raw_btf_size;
+	__u64 old_caps = 0;
+
+	/* create BPF token from BPF FS mount */
+	token_fd = bpf_token_create(mnt_fd, "", NULL);
+	if (!ASSERT_GT(token_fd, 0, "token_create")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* while inside non-init userns, we need both a BPF token *and*
+	 * CAP_BPF inside current userns to create privileged map; let's test
+	 * that neither BPF token alone nor namespaced CAP_BPF is sufficient
+	 */
+	err = drop_priv_caps(&old_caps);
+	if (!ASSERT_OK(err, "drop_caps"))
+		goto cleanup;
+
+	/* setup a trivial BTF data to load to the kernel */
+	btf = btf__new_empty();
+	if (!ASSERT_OK_PTR(btf, "empty_btf"))
+		goto cleanup;
+
+	ASSERT_GT(btf__add_int(btf, "int", 4, 0), 0, "int_type");
+
+	raw_btf_data = btf__raw_data(btf, &raw_btf_size);
+	if (!ASSERT_OK_PTR(raw_btf_data, "raw_btf_data"))
+		goto cleanup;
+
+	/* no token + no CAP_BPF -> failure */
+	btf_opts.token_fd = 0;
+	btf_fd = bpf_btf_load(raw_btf_data, raw_btf_size, &btf_opts);
+	if (!ASSERT_LT(btf_fd, 0, "no_token_no_cap_should_fail"))
+		goto cleanup;
+
+	/* token + no CAP_BPF -> failure */
+	btf_opts.token_fd = token_fd;
+	btf_fd = bpf_btf_load(raw_btf_data, raw_btf_size, &btf_opts);
+	if (!ASSERT_LT(btf_fd, 0, "token_no_cap_should_fail"))
+		goto cleanup;
+
+	/* get back effective local CAP_BPF (and CAP_SYS_ADMIN) */
+	err = restore_priv_caps(old_caps);
+	if (!ASSERT_OK(err, "restore_caps"))
+		goto cleanup;
+
+	/* token + CAP_BPF -> success */
+	btf_opts.token_fd = token_fd;
+	btf_fd = bpf_btf_load(raw_btf_data, raw_btf_size, &btf_opts);
+	if (!ASSERT_GT(btf_fd, 0, "token_and_cap_success"))
+		goto cleanup;
+
+	err = 0;
+cleanup:
+	btf__free(btf);
+	zclose(btf_fd);
+	zclose(token_fd);
+	return err;
+}
+
+static int userns_prog_load(int mnt_fd)
+{
+	LIBBPF_OPTS(bpf_prog_load_opts, prog_opts);
+	int err, token_fd = -1, prog_fd = -1;
+	struct bpf_insn insns[] = {
+		/* bpf_jiffies64() requires CAP_BPF */
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_jiffies64),
+		/* bpf_get_current_task() requires CAP_PERFMON */
+		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_current_task),
+		/* r0 = 0; exit; */
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	};
+	size_t insn_cnt = ARRAY_SIZE(insns);
+	__u64 old_caps = 0;
+
+	/* create BPF token from BPF FS mount */
+	token_fd = bpf_token_create(mnt_fd, "", NULL);
+	if (!ASSERT_GT(token_fd, 0, "token_create")) {
+		err = -EINVAL;
+		goto cleanup;
+	}
+
+	/* validate we can successfully load BPF program with token; this
+	 * being XDP program (CAP_NET_ADMIN) using bpf_jiffies64() (CAP_BPF)
+	 * and bpf_get_current_task() (CAP_PERFMON) helpers validates we have
+	 * BPF token wired properly in a bunch of places in the kernel
+	 */
+	prog_opts.token_fd = token_fd;
+	prog_opts.expected_attach_type = BPF_XDP;
+	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, "token_prog", "GPL",
+				insns, insn_cnt, &prog_opts);
+	if (!ASSERT_GT(prog_fd, 0, "prog_fd")) {
+		err = -EPERM;
+		goto cleanup;
+	}
+
+	/* no token + caps -> failure */
+	prog_opts.token_fd = 0;
+	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, "token_prog", "GPL",
+				insns, insn_cnt, &prog_opts);
+	if (!ASSERT_EQ(prog_fd, -EPERM, "prog_fd_eperm")) {
+		err = -EPERM;
+		goto cleanup;
+	}
+
+	err = drop_priv_caps(&old_caps);
+	if (!ASSERT_OK(err, "drop_caps"))
+		goto cleanup;
+
+	/* no caps + token -> failure */
+	prog_opts.token_fd = token_fd;
+	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, "token_prog", "GPL",
+				insns, insn_cnt, &prog_opts);
+	if (!ASSERT_EQ(prog_fd, -EPERM, "prog_fd_eperm")) {
+		err = -EPERM;
+		goto cleanup;
+	}
+
+	/* no caps + no token -> definitely a failure */
+	prog_opts.token_fd = 0;
+	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, "token_prog", "GPL",
+				insns, insn_cnt, &prog_opts);
+	if (!ASSERT_EQ(prog_fd, -EPERM, "prog_fd_eperm")) {
+		err = -EPERM;
+		goto cleanup;
+	}
+
+	err = 0;
+cleanup:
+	zclose(prog_fd);
+	zclose(token_fd);
+	return err;
+}
+
+void test_token(void)
+{
+	if (test__start_subtest("map_token")) {
+		struct bpffs_opts opts = {
+			.cmds = 1ULL << BPF_MAP_CREATE,
+			.maps = 1ULL << BPF_MAP_TYPE_STACK,
+		};
+
+		subtest_userns(&opts, userns_map_create);
+	}
+	if (test__start_subtest("btf_token")) {
+		struct bpffs_opts opts = {
+			.cmds = 1ULL << BPF_BTF_LOAD,
+		};
+
+		subtest_userns(&opts, userns_btf_load);
+	}
+	if (test__start_subtest("prog_token")) {
+		struct bpffs_opts opts = {
+			.cmds = 1ULL << BPF_PROG_LOAD,
+			.progs = 1ULL << BPF_PROG_TYPE_XDP,
+			.attachs = 1ULL << BPF_XDP,
+		};
+
+		subtest_userns(&opts, userns_prog_load);
+	}
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 bpf-next 02/13] bpf: add BPF token delegation mount options to BPF FS
  2023-09-27 22:57 ` [PATCH v6 bpf-next 02/13] bpf: add BPF token delegation mount options to BPF FS Andrii Nakryiko
@ 2023-10-10  7:08   ` Hou Tao
  2023-10-12  0:30     ` Andrii Nakryiko
  0 siblings, 1 reply; 29+ messages in thread
From: Hou Tao @ 2023-10-10  7:08 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun



On 9/28/2023 6:57 AM, Andrii Nakryiko wrote:
> Add few new mount options to BPF FS that allow to specify that a given
> BPF FS instance allows creation of BPF token (added in the next patch),
> and what sort of operations are allowed under BPF token. As such, we get
> 4 new mount options, each is a bit mask
>   - `delegate_cmds` allow to specify which bpf() syscall commands are
>     allowed with BPF token derived from this BPF FS instance;
>   - if BPF_MAP_CREATE command is allowed, `delegate_maps` specifies
>     a set of allowable BPF map types that could be created with BPF token;
>   - if BPF_PROG_LOAD command is allowed, `delegate_progs` specifies
>     a set of allowable BPF program types that could be loaded with BPF token;
>   - if BPF_PROG_LOAD command is allowed, `delegate_attachs` specifies
>     a set of allowable BPF program attach types that could be loaded with
>     BPF token; delegate_progs and delegate_attachs are meant to be used
>     together, as full BPF program type is, in general, determined
>     through both program type and program attach type.
>
> Currently, these mount options accept the following forms of values:
>   - a special value "any", that enables all possible values of a given
>   bit set;
>   - numeric value (decimal or hexadecimal, determined by kernel
>   automatically) that specifies a bit mask value directly;
>   - all the values for a given mount option are combined, if specified
>   multiple times. E.g., `mount -t bpf nodev /path/to/mount -o
>   delegate_maps=0x1 -o delegate_maps=0x2` will result in a combined 0x3
>   mask.
>
SNIP
>  	return 0;
> @@ -740,10 +786,14 @@ static int populate_bpffs(struct dentry *parent)
>  static int bpf_fill_super(struct super_block *sb, struct fs_context *fc)
>  {
>  	static const struct tree_descr bpf_rfiles[] = { { "" } };
> -	struct bpf_mount_opts *opts = fc->fs_private;
> +	struct bpf_mount_opts *opts = sb->s_fs_info;
>  	struct inode *inode;
>  	int ret;
>  
> +	/* Mounting an instance of BPF FS requires privileges */
> +	if (fc->user_ns != &init_user_ns && !capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
>  	ret = simple_fill_super(sb, BPF_FS_MAGIC, bpf_rfiles);
>  	if (ret)
>  		return ret;
> @@ -765,7 +815,10 @@ static int bpf_get_tree(struct fs_context *fc)
>  
>  static void bpf_free_fc(struct fs_context *fc)
>  {
> -	kfree(fc->fs_private);
> +	struct bpf_mount_opts *opts = fc->s_fs_info;
> +
> +	if (opts)
> +		kfree(opts);
>  }
>  

The NULL check is not needed here, use kfree(fc->s_fs_info) will be enough.
>  static const struct fs_context_operations bpf_context_ops = {
> @@ -787,17 +840,32 @@ static int bpf_init_fs_context(struct fs_context *fc)
>  
>  	opts->mode = S_IRWXUGO;
>  
> -	fc->fs_private = opts;
> +	/* start out with no BPF token delegation enabled */
> +	opts->delegate_cmds = 0;
> +	opts->delegate_maps = 0;
> +	opts->delegate_progs = 0;
> +	opts->delegate_attachs = 0;
> +
> +	fc->s_fs_info = opts;
>  	fc->ops = &bpf_context_ops;
>  	return 0;
>  }
>  
> +static void bpf_kill_super(struct super_block *sb)
> +{
> +	struct bpf_mount_opts *opts = sb->s_fs_info;
> +
> +	kill_litter_super(sb);
> +	kfree(opts);
> +}
> +
>  static struct file_system_type bpf_fs_type = {
>  	.owner		= THIS_MODULE,
>  	.name		= "bpf",
>  	.init_fs_context = bpf_init_fs_context,
>  	.parameters	= bpf_fs_parameters,
> -	.kill_sb	= kill_litter_super,
> +	.kill_sb	= bpf_kill_super,
> +	.fs_flags	= FS_USERNS_MOUNT,
>  };
>  
>  static int __init bpf_init(void)


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 bpf-next 04/13] bpf: add BPF token support to BPF_MAP_CREATE command
  2023-09-27 22:58 ` [PATCH v6 bpf-next 04/13] bpf: add BPF token support to BPF_MAP_CREATE command Andrii Nakryiko
@ 2023-10-10  8:35   ` Jiri Olsa
  2023-10-12  0:30     ` Andrii Nakryiko
  0 siblings, 1 reply; 29+ messages in thread
From: Jiri Olsa @ 2023-10-10  8:35 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, netdev, linux-fsdevel, linux-security-module, keescook,
	brauner, lennart, kernel-team, sargun

On Wed, Sep 27, 2023 at 03:58:00PM -0700, Andrii Nakryiko wrote:

SNIP

> -#define BPF_MAP_CREATE_LAST_FIELD map_extra
> +#define BPF_MAP_CREATE_LAST_FIELD map_token_fd
>  /* called via syscall */
>  static int map_create(union bpf_attr *attr)
>  {
>  	const struct bpf_map_ops *ops;
> +	struct bpf_token *token = NULL;
>  	int numa_node = bpf_map_attr_numa_node(attr);
>  	u32 map_type = attr->map_type;
>  	struct bpf_map *map;
> @@ -1157,14 +1158,32 @@ static int map_create(union bpf_attr *attr)
>  	if (!ops->map_mem_usage)
>  		return -EINVAL;
>  
> +	if (attr->map_token_fd) {
> +		token = bpf_token_get_from_fd(attr->map_token_fd);
> +		if (IS_ERR(token))
> +			return PTR_ERR(token);
> +
> +		/* if current token doesn't grant map creation permissions,
> +		 * then we can't use this token, so ignore it and rely on
> +		 * system-wide capabilities checks
> +		 */
> +		if (!bpf_token_allow_cmd(token, BPF_MAP_CREATE) ||
> +		    !bpf_token_allow_map_type(token, attr->map_type)) {
> +			bpf_token_put(token);
> +			token = NULL;
> +		}
> +	}
> +
> +	err = -EPERM;
> +
>  	/* Intent here is for unprivileged_bpf_disabled to block BPF map
>  	 * creation for unprivileged users; other actions depend
>  	 * on fd availability and access to bpffs, so are dependent on
>  	 * object creation success. Even with unprivileged BPF disabled,
>  	 * capability checks are still carried out.
>  	 */
> -	if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
> -		return -EPERM;
> +	if (sysctl_unprivileged_bpf_disabled && !bpf_token_capable(token, CAP_BPF))
> +		goto put_token;
>  
>  	/* check privileged map type permissions */
>  	switch (map_type) {
> @@ -1197,25 +1216,27 @@ static int map_create(union bpf_attr *attr)
>  	case BPF_MAP_TYPE_LRU_PERCPU_HASH:
>  	case BPF_MAP_TYPE_STRUCT_OPS:
>  	case BPF_MAP_TYPE_CPUMAP:
> -		if (!bpf_capable())
> -			return -EPERM;
> +		if (!bpf_token_capable(token, CAP_BPF))
> +			goto put_token;
>  		break;
>  	case BPF_MAP_TYPE_SOCKMAP:
>  	case BPF_MAP_TYPE_SOCKHASH:
>  	case BPF_MAP_TYPE_DEVMAP:
>  	case BPF_MAP_TYPE_DEVMAP_HASH:
>  	case BPF_MAP_TYPE_XSKMAP:
> -		if (!bpf_net_capable())
> -			return -EPERM;
> +		if (!bpf_token_capable(token, CAP_NET_ADMIN))
> +			goto put_token;
>  		break;
>  	default:
>  		WARN(1, "unsupported map type %d", map_type);
> -		return -EPERM;
> +		goto put_token;
>  	}
>  
>  	map = ops->map_alloc(attr);
> -	if (IS_ERR(map))
> -		return PTR_ERR(map);
> +	if (IS_ERR(map)) {
> +		err = PTR_ERR(map);
> +		goto put_token;
> +	}
>  	map->ops = ops;
>  	map->map_type = map_type;
>  
> @@ -1252,7 +1273,7 @@ static int map_create(union bpf_attr *attr)
>  		map->btf = btf;
>  
>  		if (attr->btf_value_type_id) {
> -			err = map_check_btf(map, btf, attr->btf_key_type_id,
> +			err = map_check_btf(map, token, btf, attr->btf_key_type_id,
>  					    attr->btf_value_type_id);
>  			if (err)
>  				goto free_map;

I might be missing something, but should we call bpf_token_put(token)
on non-error path as well? probably after bpf_map_save_memcg call

jirka

> @@ -1293,6 +1314,8 @@ static int map_create(union bpf_attr *attr)
>  free_map:
>  	btf_put(map->btf);
>  	map->ops->map_free(map);
> +put_token:
> +	bpf_token_put(token);
>  	return err;
>  }
>  

SNIP

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 3/13] bpf: introduce BPF token object
  2023-09-27 22:57 ` [PATCH v6 bpf-next 03/13] bpf: introduce BPF token object Andrii Nakryiko
@ 2023-10-11  1:17   ` Paul Moore
  2023-10-12  0:31     ` Andrii Nakryiko
  2023-10-11  2:35   ` [PATCH v6 bpf-next 03/13] " Hou Tao
  1 sibling, 1 reply; 29+ messages in thread
From: Paul Moore @ 2023-10-11  1:17 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun, selinux

On Sep 27, 2023 Andrii Nakryiko <andrii@kernel.org> wrote:
> 
> Add new kind of BPF kernel object, BPF token. BPF token is meant to
> allow delegating privileged BPF functionality, like loading a BPF
> program or creating a BPF map, from privileged process to a *trusted*
> unprivileged process, all while have a good amount of control over which
> privileged operations could be performed using provided BPF token.
> 
> This is achieved through mounting BPF FS instance with extra delegation
> mount options, which determine what operations are delegatable, and also
> constraining it to the owning user namespace (as mentioned in the
> previous patch).
> 
> BPF token itself is just a derivative from BPF FS and can be created
> through a new bpf() syscall command, BPF_TOKEN_CREAT, which accepts
> a path specification (using the usual fd + string path combo) to a BPF
> FS mount. Currently, BPF token "inherits" delegated command, map types,
> prog type, and attach type bit sets from BPF FS as is. In the future,
> having an BPF token as a separate object with its own FD, we can allow
> to further restrict BPF token's allowable set of things either at the creation
> time or after the fact, allowing the process to guard itself further
> from, e.g., unintentionally trying to load undesired kind of BPF
> programs. But for now we keep things simple and just copy bit sets as is.
> 
> When BPF token is created from BPF FS mount, we take reference to the
> BPF super block's owning user namespace, and then use that namespace for
> checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
> capabilities that are normally only checked against init userns (using
> capable()), but now we check them using ns_capable() instead (if BPF
> token is provided). See bpf_token_capable() for details.
> 
> Such setup means that BPF token in itself is not sufficient to grant BPF
> functionality. User namespaced process has to *also* have necessary
> combination of capabilities inside that user namespace. So while
> previously CAP_BPF was useless when granted within user namespace, now
> it gains a meaning and allows container managers and sys admins to have
> a flexible control over which processes can and need to use BPF
> functionality within the user namespace (i.e., container in practice).
> And BPF FS delegation mount options and derived BPF tokens serve as
> a per-container "flag" to grant overall ability to use bpf() (plus further
> restrict on which parts of bpf() syscalls are treated as namespaced).
> 
> The alternative to creating BPF token object was:
>   a) not having any extra object and just pasing BPF FS path to each
>      relevant bpf() command. This seems suboptimal as it's racy (mount
>      under the same path might change in between checking it and using it
>      for bpf() command). And also less flexible if we'd like to further
>      restrict ourselves compared to all the delegated functionality
>      allowed on BPF FS.
>   b) use non-bpf() interface, e.g., ioctl(), but otherwise also create
>      a dedicated FD that would represent a token-like functionality. This
>      doesn't seem superior to having a proper bpf() command, so
>      BPF_TOKEN_CREATE was chosen.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  include/linux/bpf.h            |  40 +++++++
>  include/uapi/linux/bpf.h       |  39 +++++++
>  kernel/bpf/Makefile            |   2 +-
>  kernel/bpf/inode.c             |  10 +-
>  kernel/bpf/syscall.c           |  17 +++
>  kernel/bpf/token.c             | 197 +++++++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h |  39 +++++++
>  7 files changed, 339 insertions(+), 5 deletions(-)
>  create mode 100644 kernel/bpf/token.c
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index a5bd40f71fd0..c43131a24579 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1572,6 +1576,13 @@ struct bpf_mount_opts {
>  	u64 delegate_attachs;
>  };
>  
> +struct bpf_token {
> +	struct work_struct work;
> +	atomic64_t refcnt;
> +	struct user_namespace *userns;
> +	u64 allowed_cmds;

We'll also need a 'void *security' field to go along with the BPF token
allocation/creation/free hooks, see my comments below.  This is similar
to what we do for other kernel objects.

> +};
> +

...

> diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
> new file mode 100644
> index 000000000000..779aad5007a3
> --- /dev/null
> +++ b/kernel/bpf/token.c
> @@ -0,0 +1,197 @@
> +#include <linux/bpf.h>
> +#include <linux/vmalloc.h>
> +#include <linux/anon_inodes.h>

Probably don't need the anon_inode.h include anymore.

> +#include <linux/fdtable.h>
> +#include <linux/file.h>
> +#include <linux/fs.h>
> +#include <linux/kernel.h>
> +#include <linux/idr.h>
> +#include <linux/namei.h>
> +#include <linux/user_namespace.h>
> +
> +bool bpf_token_capable(const struct bpf_token *token, int cap)
> +{
> +	/* BPF token allows ns_capable() level of capabilities */
> +	if (token) {

I think we want a LSM hook here before the token is used in the
capability check.  The LSM will see the capability check, but it will
not be able to distinguish it from the process which created the
delegation token.  This is arguably the purpose of the delegation, but
with the LSM we want to be able to control who can use the delegated
privilege.  How about something like this:

  if (security_bpf_token_capable(token, cap))
     return false;

> +		if (ns_capable(token->userns, cap))
> +			return true;
> +		if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> +			return true;
> +	}
> +	/* otherwise fallback to capable() checks */
> +	return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> +}
> +
> +void bpf_token_inc(struct bpf_token *token)
> +{
> +	atomic64_inc(&token->refcnt);
> +}
> +
> +static void bpf_token_free(struct bpf_token *token)
> +{

We should have a LSM hook here to handle freeing the LSM state
associated with the token.

  security_bpf_token_free(token);

> +	put_user_ns(token->userns);
> +	kvfree(token);
> +}

...

> +static struct bpf_token *bpf_token_alloc(void)
> +{
> +	struct bpf_token *token;
> +
> +	token = kvzalloc(sizeof(*token), GFP_USER);
> +	if (!token)
> +		return NULL;
> +
> +	atomic64_set(&token->refcnt, 1);

We should have a LSM hook here to allocate the LSM state associated
with the token.

  if (security_bpf_token_alloc(token)) {
    kvfree(token);
    return NULL;
  }

> +	return token;
> +}

...

> +int bpf_token_create(union bpf_attr *attr)
> +{
> +	struct bpf_mount_opts *mnt_opts;
> +	struct bpf_token *token = NULL;
> +	struct inode *inode;
> +	struct file *file;
> +	struct path path;
> +	umode_t mode;
> +	int err, fd;
> +
> +	err = user_path_at(attr->token_create.bpffs_path_fd,
> +			   u64_to_user_ptr(attr->token_create.bpffs_pathname),
> +			   LOOKUP_FOLLOW | LOOKUP_EMPTY, &path);
> +	if (err)
> +		return err;
> +
> +	if (path.mnt->mnt_root != path.dentry) {
> +		err = -EINVAL;
> +		goto out_path;
> +	}
> +	err = path_permission(&path, MAY_ACCESS);
> +	if (err)
> +		goto out_path;
> +
> +	mode = S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask());
> +	inode = bpf_get_inode(path.mnt->mnt_sb, NULL, mode);
> +	if (IS_ERR(inode)) {
> +		err = PTR_ERR(inode);
> +		goto out_path;
> +	}
> +
> +	inode->i_op = &bpf_token_iops;
> +	inode->i_fop = &bpf_token_fops;
> +	clear_nlink(inode); /* make sure it is unlinked */
> +
> +	file = alloc_file_pseudo(inode, path.mnt, BPF_TOKEN_INODE_NAME, O_RDWR, &bpf_token_fops);
> +	if (IS_ERR(file)) {
> +		iput(inode);
> +		err = PTR_ERR(file);
> +		goto out_file;
> +	}
> +
> +	token = bpf_token_alloc();
> +	if (!token) {
> +		err = -ENOMEM;
> +		goto out_file;
> +	}
> +
> +	/* remember bpffs owning userns for future ns_capable() checks */
> +	token->userns = get_user_ns(path.dentry->d_sb->s_user_ns);
> +
> +	mnt_opts = path.dentry->d_sb->s_fs_info;
> +	token->allowed_cmds = mnt_opts->delegate_cmds;

I think we would want a LSM hook here, both to control the creation
of the token and mark it with the security attributes of the creating
process.  How about something like this:

  err = security_bpf_token_create(token);
  if (err)
    goto out_token;

> +	fd = get_unused_fd_flags(O_CLOEXEC);
> +	if (fd < 0) {
> +		err = fd;
> +		goto out_token;
> +	}
> +
> +	file->private_data = token;
> +	fd_install(fd, file);
> +
> +	path_put(&path);
> +	return fd;
> +
> +out_token:
> +	bpf_token_free(token);
> +out_file:
> +	fput(file);
> +out_path:
> +	path_put(&path);
> +	return err;
> +}

...

> +bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
> +{
> +	if (!token)
> +		return false;
> +
> +	return token->allowed_cmds & (1ULL << cmd);

Similar to bpf_token_capable(), I believe we want a LSM hook here to
control who is allowed to use the delegated privilege.

  bool bpf_token_allow_cmd(...)
  {
    if (token && (token->allowed_cmds & (1ULL << cmd))
      return security_bpf_token_cmd(token, cmd);
    return false;
  }

> +}

--
paul-moore.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 6/13] bpf: add BPF token support to BPF_PROG_LOAD  command
  2023-09-27 22:58 ` [PATCH v6 bpf-next 06/13] bpf: add BPF token support to BPF_PROG_LOAD command Andrii Nakryiko
@ 2023-10-11  1:17   ` Paul Moore
  2023-10-12  0:31     ` Andrii Nakryiko
  0 siblings, 1 reply; 29+ messages in thread
From: Paul Moore @ 2023-10-11  1:17 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun, selinux

On Sep 27, 2023 Andrii Nakryiko <andrii@kernel.org> wrote:
> 
> Add basic support of BPF token to BPF_PROG_LOAD. Wire through a set of
> allowed BPF program types and attach types, derived from BPF FS at BPF
> token creation time. Then make sure we perform bpf_token_capable()
> checks everywhere where it's relevant.
> 
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
>  include/linux/bpf.h                           |  6 ++
>  include/uapi/linux/bpf.h                      |  2 +
>  kernel/bpf/core.c                             |  1 +
>  kernel/bpf/inode.c                            |  6 +-
>  kernel/bpf/syscall.c                          | 87 ++++++++++++++-----
>  kernel/bpf/token.c                            | 25 ++++++
>  tools/include/uapi/linux/bpf.h                |  2 +
>  .../selftests/bpf/prog_tests/libbpf_probes.c  |  2 +
>  .../selftests/bpf/prog_tests/libbpf_str.c     |  3 +
>  9 files changed, 108 insertions(+), 26 deletions(-)

...

> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 5c5c2b6648b2..d0b219f09bcc 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2685,6 +2718,10 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
>  	prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE;
>  	prog->aux->xdp_has_frags = attr->prog_flags & BPF_F_XDP_HAS_FRAGS;
>  
> +	/* move token into prog->aux, reuse taken refcnt */
> +	prog->aux->token = token;
> +	token = NULL;
> +
>  	err = security_bpf_prog_alloc(prog->aux);
>  	if (err)
>  		goto free_prog;

As we discussed in the earlier thread, let's tweak/rename/move the
security_bpf_prog_alloc() call down to just before the bpf_check() call
so it looks something like this:

  err = security_bpf_prog_load(prog, &attr, token);
  if (err)
    goto proper_jump_label;
  
  err = bpf_check(...);

With the idea being that LSMs which implement the token hooks would
skip any BPF_PROG_LOAD access controls in security_bpf() and instead
implement them in security_bpf_prog_load().

We should also do something similar for map_create() and
security_bpf_map_alloc() in patch 4/13.

--
paul-moore.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 bpf-next 03/13] bpf: introduce BPF token object
  2023-09-27 22:57 ` [PATCH v6 bpf-next 03/13] bpf: introduce BPF token object Andrii Nakryiko
  2023-10-11  1:17   ` [PATCH v6 3/13] " Paul Moore
@ 2023-10-11  2:35   ` Hou Tao
  2023-10-12  0:31     ` Andrii Nakryiko
  1 sibling, 1 reply; 29+ messages in thread
From: Hou Tao @ 2023-10-11  2:35 UTC (permalink / raw)
  To: Andrii Nakryiko, bpf, netdev
  Cc: linux-fsdevel, linux-security-module, keescook, brauner, lennart,
	kernel-team, sargun

Hi,

On 9/28/2023 6:57 AM, Andrii Nakryiko wrote:
> Add new kind of BPF kernel object, BPF token. BPF token is meant to
> allow delegating privileged BPF functionality, like loading a BPF
> program or creating a BPF map, from privileged process to a *trusted*
> unprivileged process, all while have a good amount of control over which
> privileged operations could be performed using provided BPF token.
>
> This is achieved through mounting BPF FS instance with extra delegation
> mount options, which determine what operations are delegatable, and also
> constraining it to the owning user namespace (as mentioned in the
> previous patch).
SNIP
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 70bfa997e896..78692911f4a0 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -847,6 +847,37 @@ union bpf_iter_link_info {
>   *		Returns zero on success. On error, -1 is returned and *errno*
>   *		is set appropriately.
>   *
> + * BPF_TOKEN_CREATE
> + *	Description
> + *		Create BPF token with embedded information about what
> + *		BPF-related functionality it allows:
> + *		- a set of allowed bpf() syscall commands;
> + *		- a set of allowed BPF map types to be created with
> + *		BPF_MAP_CREATE command, if BPF_MAP_CREATE itself is allowed;
> + *		- a set of allowed BPF program types and BPF program attach
> + *		types to be loaded with BPF_PROG_LOAD command, if
> + *		BPF_PROG_LOAD itself is allowed.
> + *
> + *		BPF token is created (derived) from an instance of BPF FS,
> + *		assuming it has necessary delegation mount options specified.
> + *		BPF FS mount is specified with openat()-style path FD + string.
> + *		This BPF token can be passed as an extra parameter to various
> + *		bpf() syscall commands to grant BPF subsystem functionality to
> + *		unprivileged processes.
> + *
> + *		When created, BPF token is "associated" with the owning
> + *		user namespace of BPF FS instance (super block) that it was
> + *		derived from, and subsequent BPF operations performed with
> + *		BPF token would be performing capabilities checks (i.e.,
> + *		CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN) within
> + *		that user namespace. Without BPF token, such capabilities
> + *		have to be granted in init user namespace, making bpf()
> + *		syscall incompatible with user namespace, for the most part.
> + *
> + *	Return
> + *		A new file descriptor (a nonnegative integer), or -1 if an
> + *		error occurred (in which case, *errno* is set appropriately).
> + *
>   * NOTES
>   *	eBPF objects (maps and programs) can be shared between processes.
>   *
> @@ -901,6 +932,8 @@ enum bpf_cmd {
>  	BPF_ITER_CREATE,
>  	BPF_LINK_DETACH,
>  	BPF_PROG_BIND_MAP,
> +	BPF_TOKEN_CREATE,
> +	__MAX_BPF_CMD,
>  };
>  
>  enum bpf_map_type {
> @@ -1694,6 +1727,12 @@ union bpf_attr {
>  		__u32		flags;		/* extra flags */
>  	} prog_bind_map;
>  
> +	struct { /* struct used by BPF_TOKEN_CREATE command */
> +		__u32		flags;
> +		__u32		bpffs_path_fd;
> +		__u64		bpffs_pathname;

Because bppfs_pathname is a string pointer, so __aligned_u64 is preferred.
> +	} token_create;
> +
>  } __attribute__((aligned(8)));
>  
>  /* The description below is an attempt at providing documentation to eBPF
> diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
> index f526b7573e97..4ce95acfcaa7 100644
> --- a/kernel/bpf/Makefile
> +++ b/kernel/bpf/Makefile
> @@ -6,7 +6,7 @@ cflags-nogcse-$(CONFIG_X86)$(CONFIG_CC_IS_GCC) := -fno-gcse
>  endif
>  CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
>  
> -obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o
> +obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o token.o
>  obj-$(CONFIG_BPF_SYSCALL) += bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
>  obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
>  obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
> diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
> index 24b3faf901f4..de1fdf396521 100644
> --- a/kernel/bpf/inode.c
> +++ b/kernel/bpf/inode.c
> @@ -99,9 +99,9 @@ static const struct inode_operations bpf_prog_iops = { };
>  static const struct inode_operations bpf_map_iops  = { };
>  static const struct inode_operations bpf_link_iops  = { };
>  
> -static struct inode *bpf_get_inode(struct super_block *sb,
> -				   const struct inode *dir,
> -				   umode_t mode)
> +struct inode *bpf_get_inode(struct super_block *sb,
> +			    const struct inode *dir,
> +			    umode_t mode)
>  {
>  	struct inode *inode;
>  
> @@ -603,11 +603,13 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
>  {
>  	struct bpf_mount_opts *opts = root->d_sb->s_fs_info;
>  	umode_t mode = d_inode(root)->i_mode & S_IALLUGO & ~S_ISVTX;
> +	u64 mask;
>  
>  	if (mode != S_IRWXUGO)
>  		seq_printf(m, ",mode=%o", mode);
>  
> -	if (opts->delegate_cmds == ~0ULL)
> +	mask = (1ULL << __MAX_BPF_CMD) - 1;
> +	if ((opts->delegate_cmds & mask) == mask)
>  		seq_printf(m, ",delegate_cmds=any");

Should we add a BUILD_BUG_ON assertion to guarantee __MAX_BPF_CMD is
less than sizeof(u64) * 8 ?
>  	else if (opts->delegate_cmds)
>  		seq_printf(m, ",delegate_cmds=0x%llx", opts->delegate_cmds);
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 7445dad01fb3..b47791a80930 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -5304,6 +5304,20 @@ static int bpf_prog_bind_map(union bpf_attr *attr)
>  	return ret;
>  }
>  
> +#define BPF_TOKEN_CREATE_LAST_FIELD token_create.bpffs_pathname
> +
> +static int token_create(union bpf_attr *attr)
> +{
> +	if (CHECK_ATTR(BPF_TOKEN_CREATE))
> +		return -EINVAL;
> +
> +	/* no flags are supported yet */
> +	if (attr->token_create.flags)
> +		return -EINVAL;
> +
> +	return bpf_token_create(attr);
> +}
> +
>  static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
>  {
>  	union bpf_attr attr;
> @@ -5437,6 +5451,9 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
>  	case BPF_PROG_BIND_MAP:
>  		err = bpf_prog_bind_map(&attr);
>  		break;
> +	case BPF_TOKEN_CREATE:
> +		err = token_create(&attr);
> +		break;
>  	default:
>  		err = -EINVAL;
>  		break;
> diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
> new file mode 100644
> index 000000000000..779aad5007a3
> --- /dev/null
> +++ b/kernel/bpf/token.c
SNIP
> +#define BPF_TOKEN_INODE_NAME "bpf-token"
> +
> +static const struct inode_operations bpf_token_iops = { };
> +
> +static const struct file_operations bpf_token_fops = {
> +	.release	= bpf_token_release,
> +	.show_fdinfo	= bpf_token_show_fdinfo,
> +};
> +
> +int bpf_token_create(union bpf_attr *attr)
> +{
> +	struct bpf_mount_opts *mnt_opts;
> +	struct bpf_token *token = NULL;
> +	struct inode *inode;
> +	struct file *file;
> +	struct path path;
> +	umode_t mode;
> +	int err, fd;
> +
> +	err = user_path_at(attr->token_create.bpffs_path_fd,
> +			   u64_to_user_ptr(attr->token_create.bpffs_pathname),
> +			   LOOKUP_FOLLOW | LOOKUP_EMPTY, &path);
> +	if (err)
> +		return err;

Need to check the mount is a bpffs mount instead of other filesystem mount.
> +
> +	if (path.mnt->mnt_root != path.dentry) {
> +		err = -EINVAL;
> +		goto out_path;
> +	}
> +	err = path_permission(&path, MAY_ACCESS);
> +	if (err)
> +		goto out_path;
> +
> +	mode = S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask());
> +	inode = bpf_get_inode(path.mnt->mnt_sb, NULL, mode);
> +	if (IS_ERR(inode)) {
> +		err = PTR_ERR(inode);
> +		goto out_path;
> +	}
> +
> +	inode->i_op = &bpf_token_iops;
> +	inode->i_fop = &bpf_token_fops;
> +	clear_nlink(inode); /* make sure it is unlinked */
> +
> +	file = alloc_file_pseudo(inode, path.mnt, BPF_TOKEN_INODE_NAME, O_RDWR, &bpf_token_fops);
> +	if (IS_ERR(file)) {
> +		iput(inode);
> +		err = PTR_ERR(file);
> +		goto out_file;

goto out_path ?
> +	}
> +
> +	token = bpf_token_alloc();
> +	if (!token) {
> +		err = -ENOMEM;
> +		goto out_file;
> +	}
> +
> +	/* remember bpffs owning userns for future ns_capable() checks */
> +	token->userns = get_user_ns(path.dentry->d_sb->s_user_ns);
> +
> +	mnt_opts = path.dentry->d_sb->s_fs_info;
> +	token->allowed_cmds = mnt_opts->delegate_cmds;
> +
> +	fd = get_unused_fd_flags(O_CLOEXEC);
> +	if (fd < 0) {
> +		err = fd;
> +		goto out_token;
> +	}
> +
> +	file->private_data = token;
> +	fd_install(fd, file);
> +
> +	path_put(&path);
> +	return fd;
> +
> +out_token:
> +	bpf_token_free(token);
> +out_file:
> +	fput(file);
> +out_path:
> +	path_put(&path);
> +	return err;
> +}
> +
.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 bpf-next 02/13] bpf: add BPF token delegation mount options to BPF FS
  2023-10-10  7:08   ` Hou Tao
@ 2023-10-12  0:30     ` Andrii Nakryiko
  0 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2023-10-12  0:30 UTC (permalink / raw)
  To: Hou Tao
  Cc: Andrii Nakryiko, bpf, netdev, linux-fsdevel,
	linux-security-module, keescook, brauner, lennart, kernel-team,
	sargun

On Tue, Oct 10, 2023 at 12:09 AM Hou Tao <houtao@huaweicloud.com> wrote:
>
>
>
> On 9/28/2023 6:57 AM, Andrii Nakryiko wrote:
> > Add few new mount options to BPF FS that allow to specify that a given
> > BPF FS instance allows creation of BPF token (added in the next patch),
> > and what sort of operations are allowed under BPF token. As such, we get
> > 4 new mount options, each is a bit mask
> >   - `delegate_cmds` allow to specify which bpf() syscall commands are
> >     allowed with BPF token derived from this BPF FS instance;
> >   - if BPF_MAP_CREATE command is allowed, `delegate_maps` specifies
> >     a set of allowable BPF map types that could be created with BPF token;
> >   - if BPF_PROG_LOAD command is allowed, `delegate_progs` specifies
> >     a set of allowable BPF program types that could be loaded with BPF token;
> >   - if BPF_PROG_LOAD command is allowed, `delegate_attachs` specifies
> >     a set of allowable BPF program attach types that could be loaded with
> >     BPF token; delegate_progs and delegate_attachs are meant to be used
> >     together, as full BPF program type is, in general, determined
> >     through both program type and program attach type.
> >
> > Currently, these mount options accept the following forms of values:
> >   - a special value "any", that enables all possible values of a given
> >   bit set;
> >   - numeric value (decimal or hexadecimal, determined by kernel
> >   automatically) that specifies a bit mask value directly;
> >   - all the values for a given mount option are combined, if specified
> >   multiple times. E.g., `mount -t bpf nodev /path/to/mount -o
> >   delegate_maps=0x1 -o delegate_maps=0x2` will result in a combined 0x3
> >   mask.
> >
> SNIP
> >       return 0;
> > @@ -740,10 +786,14 @@ static int populate_bpffs(struct dentry *parent)
> >  static int bpf_fill_super(struct super_block *sb, struct fs_context *fc)
> >  {
> >       static const struct tree_descr bpf_rfiles[] = { { "" } };
> > -     struct bpf_mount_opts *opts = fc->fs_private;
> > +     struct bpf_mount_opts *opts = sb->s_fs_info;
> >       struct inode *inode;
> >       int ret;
> >
> > +     /* Mounting an instance of BPF FS requires privileges */
> > +     if (fc->user_ns != &init_user_ns && !capable(CAP_SYS_ADMIN))
> > +             return -EPERM;
> > +
> >       ret = simple_fill_super(sb, BPF_FS_MAGIC, bpf_rfiles);
> >       if (ret)
> >               return ret;
> > @@ -765,7 +815,10 @@ static int bpf_get_tree(struct fs_context *fc)
> >
> >  static void bpf_free_fc(struct fs_context *fc)
> >  {
> > -     kfree(fc->fs_private);
> > +     struct bpf_mount_opts *opts = fc->s_fs_info;
> > +
> > +     if (opts)
> > +             kfree(opts);
> >  }
> >
>
> The NULL check is not needed here, use kfree(fc->s_fs_info) will be enough.

yep, dropped the check


> >  static const struct fs_context_operations bpf_context_ops = {
> > @@ -787,17 +840,32 @@ static int bpf_init_fs_context(struct fs_context *fc)
> >
> >       opts->mode = S_IRWXUGO;
> >
> > -     fc->fs_private = opts;
> > +     /* start out with no BPF token delegation enabled */
> > +     opts->delegate_cmds = 0;
> > +     opts->delegate_maps = 0;
> > +     opts->delegate_progs = 0;
> > +     opts->delegate_attachs = 0;
> > +
> > +     fc->s_fs_info = opts;
> >       fc->ops = &bpf_context_ops;
> >       return 0;
> >  }
> >
> > +static void bpf_kill_super(struct super_block *sb)
> > +{
> > +     struct bpf_mount_opts *opts = sb->s_fs_info;
> > +
> > +     kill_litter_super(sb);
> > +     kfree(opts);
> > +}
> > +
> >  static struct file_system_type bpf_fs_type = {
> >       .owner          = THIS_MODULE,
> >       .name           = "bpf",
> >       .init_fs_context = bpf_init_fs_context,
> >       .parameters     = bpf_fs_parameters,
> > -     .kill_sb        = kill_litter_super,
> > +     .kill_sb        = bpf_kill_super,
> > +     .fs_flags       = FS_USERNS_MOUNT,
> >  };
> >
> >  static int __init bpf_init(void)
>
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 bpf-next 04/13] bpf: add BPF token support to BPF_MAP_CREATE command
  2023-10-10  8:35   ` Jiri Olsa
@ 2023-10-12  0:30     ` Andrii Nakryiko
  0 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2023-10-12  0:30 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Andrii Nakryiko, bpf, netdev, linux-fsdevel,
	linux-security-module, keescook, brauner, lennart, kernel-team,
	sargun

On Tue, Oct 10, 2023 at 1:35 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Wed, Sep 27, 2023 at 03:58:00PM -0700, Andrii Nakryiko wrote:
>
> SNIP
>
> > -#define BPF_MAP_CREATE_LAST_FIELD map_extra
> > +#define BPF_MAP_CREATE_LAST_FIELD map_token_fd
> >  /* called via syscall */
> >  static int map_create(union bpf_attr *attr)
> >  {
> >       const struct bpf_map_ops *ops;
> > +     struct bpf_token *token = NULL;
> >       int numa_node = bpf_map_attr_numa_node(attr);
> >       u32 map_type = attr->map_type;
> >       struct bpf_map *map;
> > @@ -1157,14 +1158,32 @@ static int map_create(union bpf_attr *attr)
> >       if (!ops->map_mem_usage)
> >               return -EINVAL;
> >
> > +     if (attr->map_token_fd) {
> > +             token = bpf_token_get_from_fd(attr->map_token_fd);
> > +             if (IS_ERR(token))
> > +                     return PTR_ERR(token);
> > +
> > +             /* if current token doesn't grant map creation permissions,
> > +              * then we can't use this token, so ignore it and rely on
> > +              * system-wide capabilities checks
> > +              */
> > +             if (!bpf_token_allow_cmd(token, BPF_MAP_CREATE) ||
> > +                 !bpf_token_allow_map_type(token, attr->map_type)) {
> > +                     bpf_token_put(token);
> > +                     token = NULL;
> > +             }
> > +     }
> > +
> > +     err = -EPERM;
> > +
> >       /* Intent here is for unprivileged_bpf_disabled to block BPF map
> >        * creation for unprivileged users; other actions depend
> >        * on fd availability and access to bpffs, so are dependent on
> >        * object creation success. Even with unprivileged BPF disabled,
> >        * capability checks are still carried out.
> >        */
> > -     if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
> > -             return -EPERM;
> > +     if (sysctl_unprivileged_bpf_disabled && !bpf_token_capable(token, CAP_BPF))
> > +             goto put_token;
> >
> >       /* check privileged map type permissions */
> >       switch (map_type) {
> > @@ -1197,25 +1216,27 @@ static int map_create(union bpf_attr *attr)
> >       case BPF_MAP_TYPE_LRU_PERCPU_HASH:
> >       case BPF_MAP_TYPE_STRUCT_OPS:
> >       case BPF_MAP_TYPE_CPUMAP:
> > -             if (!bpf_capable())
> > -                     return -EPERM;
> > +             if (!bpf_token_capable(token, CAP_BPF))
> > +                     goto put_token;
> >               break;
> >       case BPF_MAP_TYPE_SOCKMAP:
> >       case BPF_MAP_TYPE_SOCKHASH:
> >       case BPF_MAP_TYPE_DEVMAP:
> >       case BPF_MAP_TYPE_DEVMAP_HASH:
> >       case BPF_MAP_TYPE_XSKMAP:
> > -             if (!bpf_net_capable())
> > -                     return -EPERM;
> > +             if (!bpf_token_capable(token, CAP_NET_ADMIN))
> > +                     goto put_token;
> >               break;
> >       default:
> >               WARN(1, "unsupported map type %d", map_type);
> > -             return -EPERM;
> > +             goto put_token;
> >       }
> >
> >       map = ops->map_alloc(attr);
> > -     if (IS_ERR(map))
> > -             return PTR_ERR(map);
> > +     if (IS_ERR(map)) {
> > +             err = PTR_ERR(map);
> > +             goto put_token;
> > +     }
> >       map->ops = ops;
> >       map->map_type = map_type;
> >
> > @@ -1252,7 +1273,7 @@ static int map_create(union bpf_attr *attr)
> >               map->btf = btf;
> >
> >               if (attr->btf_value_type_id) {
> > -                     err = map_check_btf(map, btf, attr->btf_key_type_id,
> > +                     err = map_check_btf(map, token, btf, attr->btf_key_type_id,
> >                                           attr->btf_value_type_id);
> >                       if (err)
> >                               goto free_map;
>
> I might be missing something, but should we call bpf_token_put(token)
> on non-error path as well? probably after bpf_map_save_memcg call

Not missing anything. I used to keep token reference inside struct
bpf_map on success, but I ripped that out, so yes, token has to be put
properly even on success. Thanks for catching this!

And yes, right after bpf_map_save_memcg() seems like the best spot.


>
> jirka
>
> > @@ -1293,6 +1314,8 @@ static int map_create(union bpf_attr *attr)
> >  free_map:
> >       btf_put(map->btf);
> >       map->ops->map_free(map);
> > +put_token:
> > +     bpf_token_put(token);
> >       return err;
> >  }
> >
>
> SNIP

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 bpf-next 03/13] bpf: introduce BPF token object
  2023-10-11  2:35   ` [PATCH v6 bpf-next 03/13] " Hou Tao
@ 2023-10-12  0:31     ` Andrii Nakryiko
  0 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2023-10-12  0:31 UTC (permalink / raw)
  To: Hou Tao
  Cc: Andrii Nakryiko, bpf, netdev, linux-fsdevel,
	linux-security-module, keescook, brauner, lennart, kernel-team,
	sargun

On Tue, Oct 10, 2023 at 7:35 PM Hou Tao <houtao@huaweicloud.com> wrote:
>
> Hi,
>
> On 9/28/2023 6:57 AM, Andrii Nakryiko wrote:
> > Add new kind of BPF kernel object, BPF token. BPF token is meant to
> > allow delegating privileged BPF functionality, like loading a BPF
> > program or creating a BPF map, from privileged process to a *trusted*
> > unprivileged process, all while have a good amount of control over which
> > privileged operations could be performed using provided BPF token.
> >
> > This is achieved through mounting BPF FS instance with extra delegation
> > mount options, which determine what operations are delegatable, and also
> > constraining it to the owning user namespace (as mentioned in the
> > previous patch).
> SNIP
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 70bfa997e896..78692911f4a0 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -847,6 +847,37 @@ union bpf_iter_link_info {
> >   *           Returns zero on success. On error, -1 is returned and *errno*
> >   *           is set appropriately.
> >   *
> > + * BPF_TOKEN_CREATE
> > + *   Description
> > + *           Create BPF token with embedded information about what
> > + *           BPF-related functionality it allows:
> > + *           - a set of allowed bpf() syscall commands;
> > + *           - a set of allowed BPF map types to be created with
> > + *           BPF_MAP_CREATE command, if BPF_MAP_CREATE itself is allowed;
> > + *           - a set of allowed BPF program types and BPF program attach
> > + *           types to be loaded with BPF_PROG_LOAD command, if
> > + *           BPF_PROG_LOAD itself is allowed.
> > + *
> > + *           BPF token is created (derived) from an instance of BPF FS,
> > + *           assuming it has necessary delegation mount options specified.
> > + *           BPF FS mount is specified with openat()-style path FD + string.
> > + *           This BPF token can be passed as an extra parameter to various
> > + *           bpf() syscall commands to grant BPF subsystem functionality to
> > + *           unprivileged processes.
> > + *
> > + *           When created, BPF token is "associated" with the owning
> > + *           user namespace of BPF FS instance (super block) that it was
> > + *           derived from, and subsequent BPF operations performed with
> > + *           BPF token would be performing capabilities checks (i.e.,
> > + *           CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN) within
> > + *           that user namespace. Without BPF token, such capabilities
> > + *           have to be granted in init user namespace, making bpf()
> > + *           syscall incompatible with user namespace, for the most part.
> > + *
> > + *   Return
> > + *           A new file descriptor (a nonnegative integer), or -1 if an
> > + *           error occurred (in which case, *errno* is set appropriately).
> > + *
> >   * NOTES
> >   *   eBPF objects (maps and programs) can be shared between processes.
> >   *
> > @@ -901,6 +932,8 @@ enum bpf_cmd {
> >       BPF_ITER_CREATE,
> >       BPF_LINK_DETACH,
> >       BPF_PROG_BIND_MAP,
> > +     BPF_TOKEN_CREATE,
> > +     __MAX_BPF_CMD,
> >  };
> >
> >  enum bpf_map_type {
> > @@ -1694,6 +1727,12 @@ union bpf_attr {
> >               __u32           flags;          /* extra flags */
> >       } prog_bind_map;
> >
> > +     struct { /* struct used by BPF_TOKEN_CREATE command */
> > +             __u32           flags;
> > +             __u32           bpffs_path_fd;
> > +             __u64           bpffs_pathname;
>
> Because bppfs_pathname is a string pointer, so __aligned_u64 is preferred.

ok, I'll use __aligned_u64, even though it can never be unaligned in this case


> > +     } token_create;
> > +
> >  } __attribute__((aligned(8)));
> >
> >  /* The description below is an attempt at providing documentation to eBPF
> > diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
> > index f526b7573e97..4ce95acfcaa7 100644
> > --- a/kernel/bpf/Makefile
> > +++ b/kernel/bpf/Makefile
> > @@ -6,7 +6,7 @@ cflags-nogcse-$(CONFIG_X86)$(CONFIG_CC_IS_GCC) := -fno-gcse
> >  endif
> >  CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy)
> >
> > -obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o
> > +obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o token.o
> >  obj-$(CONFIG_BPF_SYSCALL) += bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o
> >  obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o
> >  obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
> > diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
> > index 24b3faf901f4..de1fdf396521 100644
> > --- a/kernel/bpf/inode.c
> > +++ b/kernel/bpf/inode.c
> > @@ -99,9 +99,9 @@ static const struct inode_operations bpf_prog_iops = { };
> >  static const struct inode_operations bpf_map_iops  = { };
> >  static const struct inode_operations bpf_link_iops  = { };
> >
> > -static struct inode *bpf_get_inode(struct super_block *sb,
> > -                                const struct inode *dir,
> > -                                umode_t mode)
> > +struct inode *bpf_get_inode(struct super_block *sb,
> > +                         const struct inode *dir,
> > +                         umode_t mode)
> >  {
> >       struct inode *inode;
> >
> > @@ -603,11 +603,13 @@ static int bpf_show_options(struct seq_file *m, struct dentry *root)
> >  {
> >       struct bpf_mount_opts *opts = root->d_sb->s_fs_info;
> >       umode_t mode = d_inode(root)->i_mode & S_IALLUGO & ~S_ISVTX;
> > +     u64 mask;
> >
> >       if (mode != S_IRWXUGO)
> >               seq_printf(m, ",mode=%o", mode);
> >
> > -     if (opts->delegate_cmds == ~0ULL)
> > +     mask = (1ULL << __MAX_BPF_CMD) - 1;
> > +     if ((opts->delegate_cmds & mask) == mask)
> >               seq_printf(m, ",delegate_cmds=any");
>
> Should we add a BUILD_BUG_ON assertion to guarantee __MAX_BPF_CMD is
> less than sizeof(u64) * 8 ?

yep, good idea, will add for CMD and all others

> >       else if (opts->delegate_cmds)
> >               seq_printf(m, ",delegate_cmds=0x%llx", opts->delegate_cmds);
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 7445dad01fb3..b47791a80930 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -5304,6 +5304,20 @@ static int bpf_prog_bind_map(union bpf_attr *attr)
> >       return ret;
> >  }
> >
> > +#define BPF_TOKEN_CREATE_LAST_FIELD token_create.bpffs_pathname
> > +
> > +static int token_create(union bpf_attr *attr)
> > +{
> > +     if (CHECK_ATTR(BPF_TOKEN_CREATE))
> > +             return -EINVAL;
> > +
> > +     /* no flags are supported yet */
> > +     if (attr->token_create.flags)
> > +             return -EINVAL;
> > +
> > +     return bpf_token_create(attr);
> > +}
> > +
> >  static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
> >  {
> >       union bpf_attr attr;
> > @@ -5437,6 +5451,9 @@ static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
> >       case BPF_PROG_BIND_MAP:
> >               err = bpf_prog_bind_map(&attr);
> >               break;
> > +     case BPF_TOKEN_CREATE:
> > +             err = token_create(&attr);
> > +             break;
> >       default:
> >               err = -EINVAL;
> >               break;
> > diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
> > new file mode 100644
> > index 000000000000..779aad5007a3
> > --- /dev/null
> > +++ b/kernel/bpf/token.c
> SNIP
> > +#define BPF_TOKEN_INODE_NAME "bpf-token"
> > +
> > +static const struct inode_operations bpf_token_iops = { };
> > +
> > +static const struct file_operations bpf_token_fops = {
> > +     .release        = bpf_token_release,
> > +     .show_fdinfo    = bpf_token_show_fdinfo,
> > +};
> > +
> > +int bpf_token_create(union bpf_attr *attr)
> > +{
> > +     struct bpf_mount_opts *mnt_opts;
> > +     struct bpf_token *token = NULL;
> > +     struct inode *inode;
> > +     struct file *file;
> > +     struct path path;
> > +     umode_t mode;
> > +     int err, fd;
> > +
> > +     err = user_path_at(attr->token_create.bpffs_path_fd,
> > +                        u64_to_user_ptr(attr->token_create.bpffs_pathname),
> > +                        LOOKUP_FOLLOW | LOOKUP_EMPTY, &path);
> > +     if (err)
> > +             return err;
>
> Need to check the mount is a bpffs mount instead of other filesystem mount.

yep, missed that. Fixed, will check `path.mnt->mnt_sb->s_op != &bpf_super_ops`.

> > +
> > +     if (path.mnt->mnt_root != path.dentry) {
> > +             err = -EINVAL;
> > +             goto out_path;
> > +     }
> > +     err = path_permission(&path, MAY_ACCESS);
> > +     if (err)
> > +             goto out_path;
> > +
> > +     mode = S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask());
> > +     inode = bpf_get_inode(path.mnt->mnt_sb, NULL, mode);
> > +     if (IS_ERR(inode)) {
> > +             err = PTR_ERR(inode);
> > +             goto out_path;
> > +     }
> > +
> > +     inode->i_op = &bpf_token_iops;
> > +     inode->i_fop = &bpf_token_fops;
> > +     clear_nlink(inode); /* make sure it is unlinked */
> > +
> > +     file = alloc_file_pseudo(inode, path.mnt, BPF_TOKEN_INODE_NAME, O_RDWR, &bpf_token_fops);
> > +     if (IS_ERR(file)) {
> > +             iput(inode);
> > +             err = PTR_ERR(file);
> > +             goto out_file;
>
> goto out_path ?

eagle eye, fixed, thanks!


> > +     }
> > +
> > +     token = bpf_token_alloc();
> > +     if (!token) {
> > +             err = -ENOMEM;
> > +             goto out_file;
> > +     }
> > +
> > +     /* remember bpffs owning userns for future ns_capable() checks */
> > +     token->userns = get_user_ns(path.dentry->d_sb->s_user_ns);
> > +
> > +     mnt_opts = path.dentry->d_sb->s_fs_info;
> > +     token->allowed_cmds = mnt_opts->delegate_cmds;
> > +
> > +     fd = get_unused_fd_flags(O_CLOEXEC);
> > +     if (fd < 0) {
> > +             err = fd;
> > +             goto out_token;
> > +     }
> > +
> > +     file->private_data = token;
> > +     fd_install(fd, file);
> > +
> > +     path_put(&path);
> > +     return fd;
> > +
> > +out_token:
> > +     bpf_token_free(token);
> > +out_file:
> > +     fput(file);
> > +out_path:
> > +     path_put(&path);
> > +     return err;
> > +}
> > +
> .
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 3/13] bpf: introduce BPF token object
  2023-10-11  1:17   ` [PATCH v6 3/13] " Paul Moore
@ 2023-10-12  0:31     ` Andrii Nakryiko
  2023-10-12 21:48       ` Andrii Nakryiko
  2023-10-12 23:18       ` Paul Moore
  0 siblings, 2 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2023-10-12  0:31 UTC (permalink / raw)
  To: Paul Moore
  Cc: Andrii Nakryiko, bpf, netdev, linux-fsdevel,
	linux-security-module, keescook, brauner, lennart, kernel-team,
	sargun, selinux

On Tue, Oct 10, 2023 at 6:17 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Sep 27, 2023 Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > Add new kind of BPF kernel object, BPF token. BPF token is meant to
> > allow delegating privileged BPF functionality, like loading a BPF
> > program or creating a BPF map, from privileged process to a *trusted*
> > unprivileged process, all while have a good amount of control over which
> > privileged operations could be performed using provided BPF token.
> >
> > This is achieved through mounting BPF FS instance with extra delegation
> > mount options, which determine what operations are delegatable, and also
> > constraining it to the owning user namespace (as mentioned in the
> > previous patch).
> >
> > BPF token itself is just a derivative from BPF FS and can be created
> > through a new bpf() syscall command, BPF_TOKEN_CREAT, which accepts
> > a path specification (using the usual fd + string path combo) to a BPF
> > FS mount. Currently, BPF token "inherits" delegated command, map types,
> > prog type, and attach type bit sets from BPF FS as is. In the future,
> > having an BPF token as a separate object with its own FD, we can allow
> > to further restrict BPF token's allowable set of things either at the creation
> > time or after the fact, allowing the process to guard itself further
> > from, e.g., unintentionally trying to load undesired kind of BPF
> > programs. But for now we keep things simple and just copy bit sets as is.
> >
> > When BPF token is created from BPF FS mount, we take reference to the
> > BPF super block's owning user namespace, and then use that namespace for
> > checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
> > capabilities that are normally only checked against init userns (using
> > capable()), but now we check them using ns_capable() instead (if BPF
> > token is provided). See bpf_token_capable() for details.
> >
> > Such setup means that BPF token in itself is not sufficient to grant BPF
> > functionality. User namespaced process has to *also* have necessary
> > combination of capabilities inside that user namespace. So while
> > previously CAP_BPF was useless when granted within user namespace, now
> > it gains a meaning and allows container managers and sys admins to have
> > a flexible control over which processes can and need to use BPF
> > functionality within the user namespace (i.e., container in practice).
> > And BPF FS delegation mount options and derived BPF tokens serve as
> > a per-container "flag" to grant overall ability to use bpf() (plus further
> > restrict on which parts of bpf() syscalls are treated as namespaced).
> >
> > The alternative to creating BPF token object was:
> >   a) not having any extra object and just pasing BPF FS path to each
> >      relevant bpf() command. This seems suboptimal as it's racy (mount
> >      under the same path might change in between checking it and using it
> >      for bpf() command). And also less flexible if we'd like to further
> >      restrict ourselves compared to all the delegated functionality
> >      allowed on BPF FS.
> >   b) use non-bpf() interface, e.g., ioctl(), but otherwise also create
> >      a dedicated FD that would represent a token-like functionality. This
> >      doesn't seem superior to having a proper bpf() command, so
> >      BPF_TOKEN_CREATE was chosen.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  include/linux/bpf.h            |  40 +++++++
> >  include/uapi/linux/bpf.h       |  39 +++++++
> >  kernel/bpf/Makefile            |   2 +-
> >  kernel/bpf/inode.c             |  10 +-
> >  kernel/bpf/syscall.c           |  17 +++
> >  kernel/bpf/token.c             | 197 +++++++++++++++++++++++++++++++++
> >  tools/include/uapi/linux/bpf.h |  39 +++++++
> >  7 files changed, 339 insertions(+), 5 deletions(-)
> >  create mode 100644 kernel/bpf/token.c
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index a5bd40f71fd0..c43131a24579 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1572,6 +1576,13 @@ struct bpf_mount_opts {
> >       u64 delegate_attachs;
> >  };
> >
> > +struct bpf_token {
> > +     struct work_struct work;
> > +     atomic64_t refcnt;
> > +     struct user_namespace *userns;
> > +     u64 allowed_cmds;
>
> We'll also need a 'void *security' field to go along with the BPF token
> allocation/creation/free hooks, see my comments below.  This is similar
> to what we do for other kernel objects.
>

ok, I'm thinking of adding a dedicated patch for all the
security-related stuff and refactoring of existing LSM hook(s).

> > +};
> > +
>
> ...
>
> > diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
> > new file mode 100644
> > index 000000000000..779aad5007a3
> > --- /dev/null
> > +++ b/kernel/bpf/token.c
> > @@ -0,0 +1,197 @@
> > +#include <linux/bpf.h>
> > +#include <linux/vmalloc.h>
> > +#include <linux/anon_inodes.h>
>
> Probably don't need the anon_inode.h include anymore.

yep, dropped

>
> > +#include <linux/fdtable.h>
> > +#include <linux/file.h>
> > +#include <linux/fs.h>
> > +#include <linux/kernel.h>
> > +#include <linux/idr.h>
> > +#include <linux/namei.h>
> > +#include <linux/user_namespace.h>
> > +
> > +bool bpf_token_capable(const struct bpf_token *token, int cap)
> > +{
> > +     /* BPF token allows ns_capable() level of capabilities */
> > +     if (token) {
>
> I think we want a LSM hook here before the token is used in the
> capability check.  The LSM will see the capability check, but it will
> not be able to distinguish it from the process which created the
> delegation token.  This is arguably the purpose of the delegation, but
> with the LSM we want to be able to control who can use the delegated
> privilege.  How about something like this:
>
>   if (security_bpf_token_capable(token, cap))
>      return false;

sounds good, I'll add this hook

btw, I'm thinking of guarding the BPF_TOKEN_CREATE command behind the
ns_capable(CAP_BPF) check, WDYT? This seems appropriate. You can get
BPF token only if you have CAP_BPF **within the userns**, so any
process not granted CAP_BPF within namespace ("container") is
guaranteed to not be able to do anything with BPF token.

>
> > +             if (ns_capable(token->userns, cap))
> > +                     return true;
> > +             if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> > +                     return true;
> > +     }
> > +     /* otherwise fallback to capable() checks */
> > +     return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > +}
> > +
> > +void bpf_token_inc(struct bpf_token *token)
> > +{
> > +     atomic64_inc(&token->refcnt);
> > +}
> > +
> > +static void bpf_token_free(struct bpf_token *token)
> > +{
>
> We should have a LSM hook here to handle freeing the LSM state
> associated with the token.
>
>   security_bpf_token_free(token);
>

yep

> > +     put_user_ns(token->userns);
> > +     kvfree(token);
> > +}
>
> ...
>
> > +static struct bpf_token *bpf_token_alloc(void)
> > +{
> > +     struct bpf_token *token;
> > +
> > +     token = kvzalloc(sizeof(*token), GFP_USER);
> > +     if (!token)
> > +             return NULL;
> > +
> > +     atomic64_set(&token->refcnt, 1);
>
> We should have a LSM hook here to allocate the LSM state associated
> with the token.
>
>   if (security_bpf_token_alloc(token)) {
>     kvfree(token);
>     return NULL;
>   }
>
> > +     return token;
> > +}
>
> ...
>

Would having userns and allowed_* masks filled out by that time inside
the token be useful (seems so if we treat bpf_token_alloc as generic
LSM hook). If yes, I'll add security_bpf_token_alloc() after all that
is filled out, right before we try to get unused fd. WDYT?


> > +int bpf_token_create(union bpf_attr *attr)
> > +{
> > +     struct bpf_mount_opts *mnt_opts;
> > +     struct bpf_token *token = NULL;
> > +     struct inode *inode;
> > +     struct file *file;
> > +     struct path path;
> > +     umode_t mode;
> > +     int err, fd;
> > +
> > +     err = user_path_at(attr->token_create.bpffs_path_fd,
> > +                        u64_to_user_ptr(attr->token_create.bpffs_pathname),
> > +                        LOOKUP_FOLLOW | LOOKUP_EMPTY, &path);
> > +     if (err)
> > +             return err;
> > +
> > +     if (path.mnt->mnt_root != path.dentry) {
> > +             err = -EINVAL;
> > +             goto out_path;
> > +     }
> > +     err = path_permission(&path, MAY_ACCESS);
> > +     if (err)
> > +             goto out_path;
> > +
> > +     mode = S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask());
> > +     inode = bpf_get_inode(path.mnt->mnt_sb, NULL, mode);
> > +     if (IS_ERR(inode)) {
> > +             err = PTR_ERR(inode);
> > +             goto out_path;
> > +     }
> > +
> > +     inode->i_op = &bpf_token_iops;
> > +     inode->i_fop = &bpf_token_fops;
> > +     clear_nlink(inode); /* make sure it is unlinked */
> > +
> > +     file = alloc_file_pseudo(inode, path.mnt, BPF_TOKEN_INODE_NAME, O_RDWR, &bpf_token_fops);
> > +     if (IS_ERR(file)) {
> > +             iput(inode);
> > +             err = PTR_ERR(file);
> > +             goto out_file;
> > +     }
> > +
> > +     token = bpf_token_alloc();
> > +     if (!token) {
> > +             err = -ENOMEM;
> > +             goto out_file;
> > +     }
> > +
> > +     /* remember bpffs owning userns for future ns_capable() checks */
> > +     token->userns = get_user_ns(path.dentry->d_sb->s_user_ns);
> > +
> > +     mnt_opts = path.dentry->d_sb->s_fs_info;
> > +     token->allowed_cmds = mnt_opts->delegate_cmds;
>
> I think we would want a LSM hook here, both to control the creation
> of the token and mark it with the security attributes of the creating
> process.  How about something like this:
>
>   err = security_bpf_token_create(token);
>   if (err)
>     goto out_token;

hmm... so you'd like both security_bpf_token_alloc() and
security_bpf_token_create()? They seem almost identical, do we need
two? Or is it that the security_bpf_token_alloc() is supposed to be
only used to create those `void *security` context pieces, while
security_bpf_token_create() is actually going to be used for
enforcement? For my own education, is there some explicit flag or some
other sort of mark between LSM hooks for setting up security vs
enforcement? Or is it mostly based on convention and implicitly
following the split?

>
> > +     fd = get_unused_fd_flags(O_CLOEXEC);
> > +     if (fd < 0) {
> > +             err = fd;
> > +             goto out_token;
> > +     }
> > +
> > +     file->private_data = token;
> > +     fd_install(fd, file);
> > +
> > +     path_put(&path);
> > +     return fd;
> > +
> > +out_token:
> > +     bpf_token_free(token);
> > +out_file:
> > +     fput(file);
> > +out_path:
> > +     path_put(&path);
> > +     return err;
> > +}
>
> ...
>
> > +bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
> > +{
> > +     if (!token)
> > +             return false;
> > +
> > +     return token->allowed_cmds & (1ULL << cmd);
>
> Similar to bpf_token_capable(), I believe we want a LSM hook here to
> control who is allowed to use the delegated privilege.
>
>   bool bpf_token_allow_cmd(...)
>   {
>     if (token && (token->allowed_cmds & (1ULL << cmd))
>       return security_bpf_token_cmd(token, cmd);

ok, so I guess I'll have to add all four variants:
security_bpf_token_{cmd,map_type,prog_type,attach_type}, right?



>     return false;
>   }
>
> > +}
>
> --
> paul-moore.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 6/13] bpf: add BPF token support to BPF_PROG_LOAD command
  2023-10-11  1:17   ` [PATCH v6 6/13] " Paul Moore
@ 2023-10-12  0:31     ` Andrii Nakryiko
  0 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2023-10-12  0:31 UTC (permalink / raw)
  To: Paul Moore
  Cc: Andrii Nakryiko, bpf, netdev, linux-fsdevel,
	linux-security-module, keescook, brauner, lennart, kernel-team,
	sargun, selinux

On Tue, Oct 10, 2023 at 6:17 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Sep 27, 2023 Andrii Nakryiko <andrii@kernel.org> wrote:
> >
> > Add basic support of BPF token to BPF_PROG_LOAD. Wire through a set of
> > allowed BPF program types and attach types, derived from BPF FS at BPF
> > token creation time. Then make sure we perform bpf_token_capable()
> > checks everywhere where it's relevant.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
> >  include/linux/bpf.h                           |  6 ++
> >  include/uapi/linux/bpf.h                      |  2 +
> >  kernel/bpf/core.c                             |  1 +
> >  kernel/bpf/inode.c                            |  6 +-
> >  kernel/bpf/syscall.c                          | 87 ++++++++++++++-----
> >  kernel/bpf/token.c                            | 25 ++++++
> >  tools/include/uapi/linux/bpf.h                |  2 +
> >  .../selftests/bpf/prog_tests/libbpf_probes.c  |  2 +
> >  .../selftests/bpf/prog_tests/libbpf_str.c     |  3 +
> >  9 files changed, 108 insertions(+), 26 deletions(-)
>
> ...
>
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index 5c5c2b6648b2..d0b219f09bcc 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -2685,6 +2718,10 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
> >       prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE;
> >       prog->aux->xdp_has_frags = attr->prog_flags & BPF_F_XDP_HAS_FRAGS;
> >
> > +     /* move token into prog->aux, reuse taken refcnt */
> > +     prog->aux->token = token;
> > +     token = NULL;
> > +
> >       err = security_bpf_prog_alloc(prog->aux);
> >       if (err)
> >               goto free_prog;
>
> As we discussed in the earlier thread, let's tweak/rename/move the
> security_bpf_prog_alloc() call down to just before the bpf_check() call
> so it looks something like this:
>
>   err = security_bpf_prog_load(prog, &attr, token);
>   if (err)
>     goto proper_jump_label;
>
>   err = bpf_check(...);
>
> With the idea being that LSMs which implement the token hooks would
> skip any BPF_PROG_LOAD access controls in security_bpf() and instead
> implement them in security_bpf_prog_load().
>
> We should also do something similar for map_create() and
> security_bpf_map_alloc() in patch 4/13.

Sounds good, will do!


>
> --
> paul-moore.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 3/13] bpf: introduce BPF token object
  2023-10-12  0:31     ` Andrii Nakryiko
@ 2023-10-12 21:48       ` Andrii Nakryiko
  2023-10-12 23:43         ` Paul Moore
  2023-10-12 23:18       ` Paul Moore
  1 sibling, 1 reply; 29+ messages in thread
From: Andrii Nakryiko @ 2023-10-12 21:48 UTC (permalink / raw)
  To: Paul Moore
  Cc: Andrii Nakryiko, bpf, netdev, linux-fsdevel,
	linux-security-module, keescook, brauner, lennart, kernel-team,
	sargun, selinux

On Wed, Oct 11, 2023 at 5:31 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Oct 10, 2023 at 6:17 PM Paul Moore <paul@paul-moore.com> wrote:
> >
> > On Sep 27, 2023 Andrii Nakryiko <andrii@kernel.org> wrote:
> > >
> > > Add new kind of BPF kernel object, BPF token. BPF token is meant to
> > > allow delegating privileged BPF functionality, like loading a BPF
> > > program or creating a BPF map, from privileged process to a *trusted*
> > > unprivileged process, all while have a good amount of control over which
> > > privileged operations could be performed using provided BPF token.
> > >
> > > This is achieved through mounting BPF FS instance with extra delegation
> > > mount options, which determine what operations are delegatable, and also
> > > constraining it to the owning user namespace (as mentioned in the
> > > previous patch).
> > >
> > > BPF token itself is just a derivative from BPF FS and can be created
> > > through a new bpf() syscall command, BPF_TOKEN_CREAT, which accepts
> > > a path specification (using the usual fd + string path combo) to a BPF
> > > FS mount. Currently, BPF token "inherits" delegated command, map types,
> > > prog type, and attach type bit sets from BPF FS as is. In the future,
> > > having an BPF token as a separate object with its own FD, we can allow
> > > to further restrict BPF token's allowable set of things either at the creation
> > > time or after the fact, allowing the process to guard itself further
> > > from, e.g., unintentionally trying to load undesired kind of BPF
> > > programs. But for now we keep things simple and just copy bit sets as is.
> > >
> > > When BPF token is created from BPF FS mount, we take reference to the
> > > BPF super block's owning user namespace, and then use that namespace for
> > > checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
> > > capabilities that are normally only checked against init userns (using
> > > capable()), but now we check them using ns_capable() instead (if BPF
> > > token is provided). See bpf_token_capable() for details.
> > >
> > > Such setup means that BPF token in itself is not sufficient to grant BPF
> > > functionality. User namespaced process has to *also* have necessary
> > > combination of capabilities inside that user namespace. So while
> > > previously CAP_BPF was useless when granted within user namespace, now
> > > it gains a meaning and allows container managers and sys admins to have
> > > a flexible control over which processes can and need to use BPF
> > > functionality within the user namespace (i.e., container in practice).
> > > And BPF FS delegation mount options and derived BPF tokens serve as
> > > a per-container "flag" to grant overall ability to use bpf() (plus further
> > > restrict on which parts of bpf() syscalls are treated as namespaced).
> > >
> > > The alternative to creating BPF token object was:
> > >   a) not having any extra object and just pasing BPF FS path to each
> > >      relevant bpf() command. This seems suboptimal as it's racy (mount
> > >      under the same path might change in between checking it and using it
> > >      for bpf() command). And also less flexible if we'd like to further
> > >      restrict ourselves compared to all the delegated functionality
> > >      allowed on BPF FS.
> > >   b) use non-bpf() interface, e.g., ioctl(), but otherwise also create
> > >      a dedicated FD that would represent a token-like functionality. This
> > >      doesn't seem superior to having a proper bpf() command, so
> > >      BPF_TOKEN_CREATE was chosen.
> > >
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > ---
> > >  include/linux/bpf.h            |  40 +++++++
> > >  include/uapi/linux/bpf.h       |  39 +++++++
> > >  kernel/bpf/Makefile            |   2 +-
> > >  kernel/bpf/inode.c             |  10 +-
> > >  kernel/bpf/syscall.c           |  17 +++
> > >  kernel/bpf/token.c             | 197 +++++++++++++++++++++++++++++++++
> > >  tools/include/uapi/linux/bpf.h |  39 +++++++
> > >  7 files changed, 339 insertions(+), 5 deletions(-)
> > >  create mode 100644 kernel/bpf/token.c
> > >
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index a5bd40f71fd0..c43131a24579 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -1572,6 +1576,13 @@ struct bpf_mount_opts {
> > >       u64 delegate_attachs;
> > >  };
> > >
> > > +struct bpf_token {
> > > +     struct work_struct work;
> > > +     atomic64_t refcnt;
> > > +     struct user_namespace *userns;
> > > +     u64 allowed_cmds;
> >
> > We'll also need a 'void *security' field to go along with the BPF token
> > allocation/creation/free hooks, see my comments below.  This is similar
> > to what we do for other kernel objects.
> >
>
> ok, I'm thinking of adding a dedicated patch for all the
> security-related stuff and refactoring of existing LSM hook(s).
>
> > > +};
> > > +
> >
> > ...
> >
> > > diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
> > > new file mode 100644
> > > index 000000000000..779aad5007a3
> > > --- /dev/null
> > > +++ b/kernel/bpf/token.c
> > > @@ -0,0 +1,197 @@
> > > +#include <linux/bpf.h>
> > > +#include <linux/vmalloc.h>
> > > +#include <linux/anon_inodes.h>
> >
> > Probably don't need the anon_inode.h include anymore.
>
> yep, dropped
>
> >
> > > +#include <linux/fdtable.h>
> > > +#include <linux/file.h>
> > > +#include <linux/fs.h>
> > > +#include <linux/kernel.h>
> > > +#include <linux/idr.h>
> > > +#include <linux/namei.h>
> > > +#include <linux/user_namespace.h>
> > > +
> > > +bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > +{
> > > +     /* BPF token allows ns_capable() level of capabilities */
> > > +     if (token) {
> >
> > I think we want a LSM hook here before the token is used in the
> > capability check.  The LSM will see the capability check, but it will
> > not be able to distinguish it from the process which created the
> > delegation token.  This is arguably the purpose of the delegation, but
> > with the LSM we want to be able to control who can use the delegated
> > privilege.  How about something like this:
> >
> >   if (security_bpf_token_capable(token, cap))
> >      return false;
>
> sounds good, I'll add this hook
>
> btw, I'm thinking of guarding the BPF_TOKEN_CREATE command behind the
> ns_capable(CAP_BPF) check, WDYT? This seems appropriate. You can get
> BPF token only if you have CAP_BPF **within the userns**, so any
> process not granted CAP_BPF within namespace ("container") is
> guaranteed to not be able to do anything with BPF token.
>
> >
> > > +             if (ns_capable(token->userns, cap))
> > > +                     return true;
> > > +             if (cap != CAP_SYS_ADMIN && ns_capable(token->userns, CAP_SYS_ADMIN))
> > > +                     return true;
> > > +     }
> > > +     /* otherwise fallback to capable() checks */
> > > +     return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN));
> > > +}
> > > +
> > > +void bpf_token_inc(struct bpf_token *token)
> > > +{
> > > +     atomic64_inc(&token->refcnt);
> > > +}
> > > +
> > > +static void bpf_token_free(struct bpf_token *token)
> > > +{
> >
> > We should have a LSM hook here to handle freeing the LSM state
> > associated with the token.
> >
> >   security_bpf_token_free(token);
> >
>
> yep
>
> > > +     put_user_ns(token->userns);
> > > +     kvfree(token);
> > > +}
> >
> > ...
> >
> > > +static struct bpf_token *bpf_token_alloc(void)
> > > +{
> > > +     struct bpf_token *token;
> > > +
> > > +     token = kvzalloc(sizeof(*token), GFP_USER);
> > > +     if (!token)
> > > +             return NULL;
> > > +
> > > +     atomic64_set(&token->refcnt, 1);
> >
> > We should have a LSM hook here to allocate the LSM state associated
> > with the token.
> >
> >   if (security_bpf_token_alloc(token)) {
> >     kvfree(token);
> >     return NULL;
> >   }
> >
> > > +     return token;
> > > +}
> >
> > ...
> >
>
> Would having userns and allowed_* masks filled out by that time inside
> the token be useful (seems so if we treat bpf_token_alloc as generic
> LSM hook). If yes, I'll add security_bpf_token_alloc() after all that
> is filled out, right before we try to get unused fd. WDYT?
>
>
> > > +int bpf_token_create(union bpf_attr *attr)
> > > +{
> > > +     struct bpf_mount_opts *mnt_opts;
> > > +     struct bpf_token *token = NULL;
> > > +     struct inode *inode;
> > > +     struct file *file;
> > > +     struct path path;
> > > +     umode_t mode;
> > > +     int err, fd;
> > > +
> > > +     err = user_path_at(attr->token_create.bpffs_path_fd,
> > > +                        u64_to_user_ptr(attr->token_create.bpffs_pathname),
> > > +                        LOOKUP_FOLLOW | LOOKUP_EMPTY, &path);
> > > +     if (err)
> > > +             return err;
> > > +
> > > +     if (path.mnt->mnt_root != path.dentry) {
> > > +             err = -EINVAL;
> > > +             goto out_path;
> > > +     }
> > > +     err = path_permission(&path, MAY_ACCESS);
> > > +     if (err)
> > > +             goto out_path;
> > > +
> > > +     mode = S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask());
> > > +     inode = bpf_get_inode(path.mnt->mnt_sb, NULL, mode);
> > > +     if (IS_ERR(inode)) {
> > > +             err = PTR_ERR(inode);
> > > +             goto out_path;
> > > +     }
> > > +
> > > +     inode->i_op = &bpf_token_iops;
> > > +     inode->i_fop = &bpf_token_fops;
> > > +     clear_nlink(inode); /* make sure it is unlinked */
> > > +
> > > +     file = alloc_file_pseudo(inode, path.mnt, BPF_TOKEN_INODE_NAME, O_RDWR, &bpf_token_fops);
> > > +     if (IS_ERR(file)) {
> > > +             iput(inode);
> > > +             err = PTR_ERR(file);
> > > +             goto out_file;
> > > +     }
> > > +
> > > +     token = bpf_token_alloc();
> > > +     if (!token) {
> > > +             err = -ENOMEM;
> > > +             goto out_file;
> > > +     }
> > > +
> > > +     /* remember bpffs owning userns for future ns_capable() checks */
> > > +     token->userns = get_user_ns(path.dentry->d_sb->s_user_ns);
> > > +
> > > +     mnt_opts = path.dentry->d_sb->s_fs_info;
> > > +     token->allowed_cmds = mnt_opts->delegate_cmds;
> >
> > I think we would want a LSM hook here, both to control the creation
> > of the token and mark it with the security attributes of the creating
> > process.  How about something like this:
> >
> >   err = security_bpf_token_create(token);
> >   if (err)
> >     goto out_token;
>
> hmm... so you'd like both security_bpf_token_alloc() and
> security_bpf_token_create()? They seem almost identical, do we need
> two? Or is it that the security_bpf_token_alloc() is supposed to be
> only used to create those `void *security` context pieces, while
> security_bpf_token_create() is actually going to be used for
> enforcement? For my own education, is there some explicit flag or some
> other sort of mark between LSM hooks for setting up security vs
> enforcement? Or is it mostly based on convention and implicitly
> following the split?
>
> >
> > > +     fd = get_unused_fd_flags(O_CLOEXEC);
> > > +     if (fd < 0) {
> > > +             err = fd;
> > > +             goto out_token;
> > > +     }
> > > +
> > > +     file->private_data = token;
> > > +     fd_install(fd, file);
> > > +
> > > +     path_put(&path);
> > > +     return fd;
> > > +
> > > +out_token:
> > > +     bpf_token_free(token);
> > > +out_file:
> > > +     fput(file);
> > > +out_path:
> > > +     path_put(&path);
> > > +     return err;
> > > +}
> >
> > ...
> >
> > > +bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
> > > +{
> > > +     if (!token)
> > > +             return false;
> > > +
> > > +     return token->allowed_cmds & (1ULL << cmd);
> >
> > Similar to bpf_token_capable(), I believe we want a LSM hook here to
> > control who is allowed to use the delegated privilege.
> >
> >   bool bpf_token_allow_cmd(...)
> >   {
> >     if (token && (token->allowed_cmds & (1ULL << cmd))
> >       return security_bpf_token_cmd(token, cmd);
>
> ok, so I guess I'll have to add all four variants:
> security_bpf_token_{cmd,map_type,prog_type,attach_type}, right?
>

Thinking a bit more about this, I think this is unnecessary. All these
allow checks to control other BPF commands (BPF map creation, BPF
program load, bpf() syscall command, etc). We have dedicated LSM hooks
for each such operation, most importantly security_bpf_prog_load() and
security_bpf_map_create(). I'm extending both of those to be
token-aware, and struct bpf_token is one of the input arguments, so if
LSM need to override BPF token allow_* checks, they can do in
respective more specialized hooks.

Adding so many token hooks, one for each different allow mask (or any
other sort of "allow something" parameter) seems to be excessive. It
will both add too many super-detailed LSM hooks and will unnecessarily
tie BPF token implementation details to LSM hook implementations, IMO.
I'll send v7 with just security_bpf_token_{create,free}(), please take
a look and let me know if you are still not convinced.

>
>
> >     return false;
> >   }
> >
> > > +}
> >
> > --
> > paul-moore.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 3/13] bpf: introduce BPF token object
  2023-10-12  0:31     ` Andrii Nakryiko
  2023-10-12 21:48       ` Andrii Nakryiko
@ 2023-10-12 23:18       ` Paul Moore
  2023-10-12 23:45         ` Andrii Nakryiko
  1 sibling, 1 reply; 29+ messages in thread
From: Paul Moore @ 2023-10-12 23:18 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, netdev, linux-fsdevel,
	linux-security-module, keescook, brauner, lennart, kernel-team,
	sargun, selinux

On Wed, Oct 11, 2023 at 8:31 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> On Tue, Oct 10, 2023 at 6:17 PM Paul Moore <paul@paul-moore.com> wrote:
> > On Sep 27, 2023 Andrii Nakryiko <andrii@kernel.org> wrote:
> > >
> > > Add new kind of BPF kernel object, BPF token. BPF token is meant to
> > > allow delegating privileged BPF functionality, like loading a BPF
> > > program or creating a BPF map, from privileged process to a *trusted*
> > > unprivileged process, all while have a good amount of control over which
> > > privileged operations could be performed using provided BPF token.
> > >
> > > This is achieved through mounting BPF FS instance with extra delegation
> > > mount options, which determine what operations are delegatable, and also
> > > constraining it to the owning user namespace (as mentioned in the
> > > previous patch).
> > >
> > > BPF token itself is just a derivative from BPF FS and can be created
> > > through a new bpf() syscall command, BPF_TOKEN_CREAT, which accepts
> > > a path specification (using the usual fd + string path combo) to a BPF
> > > FS mount. Currently, BPF token "inherits" delegated command, map types,
> > > prog type, and attach type bit sets from BPF FS as is. In the future,
> > > having an BPF token as a separate object with its own FD, we can allow
> > > to further restrict BPF token's allowable set of things either at the creation
> > > time or after the fact, allowing the process to guard itself further
> > > from, e.g., unintentionally trying to load undesired kind of BPF
> > > programs. But for now we keep things simple and just copy bit sets as is.
> > >
> > > When BPF token is created from BPF FS mount, we take reference to the
> > > BPF super block's owning user namespace, and then use that namespace for
> > > checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
> > > capabilities that are normally only checked against init userns (using
> > > capable()), but now we check them using ns_capable() instead (if BPF
> > > token is provided). See bpf_token_capable() for details.
> > >
> > > Such setup means that BPF token in itself is not sufficient to grant BPF
> > > functionality. User namespaced process has to *also* have necessary
> > > combination of capabilities inside that user namespace. So while
> > > previously CAP_BPF was useless when granted within user namespace, now
> > > it gains a meaning and allows container managers and sys admins to have
> > > a flexible control over which processes can and need to use BPF
> > > functionality within the user namespace (i.e., container in practice).
> > > And BPF FS delegation mount options and derived BPF tokens serve as
> > > a per-container "flag" to grant overall ability to use bpf() (plus further
> > > restrict on which parts of bpf() syscalls are treated as namespaced).
> > >
> > > The alternative to creating BPF token object was:
> > >   a) not having any extra object and just pasing BPF FS path to each
> > >      relevant bpf() command. This seems suboptimal as it's racy (mount
> > >      under the same path might change in between checking it and using it
> > >      for bpf() command). And also less flexible if we'd like to further
> > >      restrict ourselves compared to all the delegated functionality
> > >      allowed on BPF FS.
> > >   b) use non-bpf() interface, e.g., ioctl(), but otherwise also create
> > >      a dedicated FD that would represent a token-like functionality. This
> > >      doesn't seem superior to having a proper bpf() command, so
> > >      BPF_TOKEN_CREATE was chosen.
> > >
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > ---
> > >  include/linux/bpf.h            |  40 +++++++
> > >  include/uapi/linux/bpf.h       |  39 +++++++
> > >  kernel/bpf/Makefile            |   2 +-
> > >  kernel/bpf/inode.c             |  10 +-
> > >  kernel/bpf/syscall.c           |  17 +++
> > >  kernel/bpf/token.c             | 197 +++++++++++++++++++++++++++++++++
> > >  tools/include/uapi/linux/bpf.h |  39 +++++++
> > >  7 files changed, 339 insertions(+), 5 deletions(-)
> > >  create mode 100644 kernel/bpf/token.c
> > >
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index a5bd40f71fd0..c43131a24579 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -1572,6 +1576,13 @@ struct bpf_mount_opts {
> > >       u64 delegate_attachs;
> > >  };
> > >
> > > +struct bpf_token {
> > > +     struct work_struct work;
> > > +     atomic64_t refcnt;
> > > +     struct user_namespace *userns;
> > > +     u64 allowed_cmds;
> >
> > We'll also need a 'void *security' field to go along with the BPF token
> > allocation/creation/free hooks, see my comments below.  This is similar
> > to what we do for other kernel objects.
>
> ok, I'm thinking of adding a dedicated patch for all the
> security-related stuff and refactoring of existing LSM hook(s).

No objection here.  My main concern is that we get the LSM stuff in
the same patchset as the rest of the BPF token patches; if all of the
LSM bits are in a separate patch I'm not bothered.

Once we settle on the LSM hooks I'll draft a SELinux implementation of
the hooks which I'll hand off to you for inclusion in the patchset as
well.  I'd encourage the other LSMs that are interested to do the
same.

> > > +bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > +{
> > > +     /* BPF token allows ns_capable() level of capabilities */
> > > +     if (token) {
> >
> > I think we want a LSM hook here before the token is used in the
> > capability check.  The LSM will see the capability check, but it will
> > not be able to distinguish it from the process which created the
> > delegation token.  This is arguably the purpose of the delegation, but
> > with the LSM we want to be able to control who can use the delegated
> > privilege.  How about something like this:
> >
> >   if (security_bpf_token_capable(token, cap))
> >      return false;
>
> sounds good, I'll add this hook
>
> btw, I'm thinking of guarding the BPF_TOKEN_CREATE command behind the
> ns_capable(CAP_BPF) check, WDYT? This seems appropriate. You can get
> BPF token only if you have CAP_BPF **within the userns**, so any
> process not granted CAP_BPF within namespace ("container") is
> guaranteed to not be able to do anything with BPF token.

I don't recall seeing any capability checks guarding BPF_TOKEN_CREATE
in this revision so I don't think you're weakening the restrictions
any, and the logic above seems reasonable: if you don't have CAP_BPF
you shouldn't be creating a token.

> > > +static struct bpf_token *bpf_token_alloc(void)
> > > +{
> > > +     struct bpf_token *token;
> > > +
> > > +     token = kvzalloc(sizeof(*token), GFP_USER);
> > > +     if (!token)
> > > +             return NULL;
> > > +
> > > +     atomic64_set(&token->refcnt, 1);
> >
> > We should have a LSM hook here to allocate the LSM state associated
> > with the token.
> >
> >   if (security_bpf_token_alloc(token)) {
> >     kvfree(token);
> >     return NULL;
> >   }
> >
> > > +     return token;
> > > +}
> >
> > ...
> >
>
> Would having userns and allowed_* masks filled out by that time inside
> the token be useful (seems so if we treat bpf_token_alloc as generic
> LSM hook). If yes, I'll add security_bpf_token_alloc() after all that
> is filled out, right before we try to get unused fd. WDYT?

The security_bpf_token_alloc() hook isn't intended to do any access
control, it's just there so that the LSMs which need to allocate state
for the token object can do so at the same time the token is
allocated.  It has been my experience that allocating and releasing
the LSM state at the same time as the primary object's state is much
less fragile than disconnecting the two lifetimes and allocating the
LSM state later.

> > > +int bpf_token_create(union bpf_attr *attr)
> > > +{
> > > +     struct bpf_mount_opts *mnt_opts;
> > > +     struct bpf_token *token = NULL;
> > > +     struct inode *inode;
> > > +     struct file *file;
> > > +     struct path path;
> > > +     umode_t mode;
> > > +     int err, fd;
> > > +
> > > +     err = user_path_at(attr->token_create.bpffs_path_fd,
> > > +                        u64_to_user_ptr(attr->token_create.bpffs_pathname),
> > > +                        LOOKUP_FOLLOW | LOOKUP_EMPTY, &path);
> > > +     if (err)
> > > +             return err;
> > > +
> > > +     if (path.mnt->mnt_root != path.dentry) {
> > > +             err = -EINVAL;
> > > +             goto out_path;
> > > +     }
> > > +     err = path_permission(&path, MAY_ACCESS);
> > > +     if (err)
> > > +             goto out_path;
> > > +
> > > +     mode = S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask());
> > > +     inode = bpf_get_inode(path.mnt->mnt_sb, NULL, mode);
> > > +     if (IS_ERR(inode)) {
> > > +             err = PTR_ERR(inode);
> > > +             goto out_path;
> > > +     }
> > > +
> > > +     inode->i_op = &bpf_token_iops;
> > > +     inode->i_fop = &bpf_token_fops;
> > > +     clear_nlink(inode); /* make sure it is unlinked */
> > > +
> > > +     file = alloc_file_pseudo(inode, path.mnt, BPF_TOKEN_INODE_NAME, O_RDWR, &bpf_token_fops);
> > > +     if (IS_ERR(file)) {
> > > +             iput(inode);
> > > +             err = PTR_ERR(file);
> > > +             goto out_file;
> > > +     }
> > > +
> > > +     token = bpf_token_alloc();
> > > +     if (!token) {
> > > +             err = -ENOMEM;
> > > +             goto out_file;
> > > +     }
> > > +
> > > +     /* remember bpffs owning userns for future ns_capable() checks */
> > > +     token->userns = get_user_ns(path.dentry->d_sb->s_user_ns);
> > > +
> > > +     mnt_opts = path.dentry->d_sb->s_fs_info;
> > > +     token->allowed_cmds = mnt_opts->delegate_cmds;
> >
> > I think we would want a LSM hook here, both to control the creation
> > of the token and mark it with the security attributes of the creating
> > process.  How about something like this:
> >
> >   err = security_bpf_token_create(token);
> >   if (err)
> >     goto out_token;
>
> hmm... so you'd like both security_bpf_token_alloc() and
> security_bpf_token_create()? They seem almost identical, do we need
> two? Or is it that the security_bpf_token_alloc() is supposed to be
> only used to create those `void *security` context pieces, while
> security_bpf_token_create() is actually going to be used for
> enforcement?

I tried to explain a bit in my comment above, but the alloc hook
basically just does the LSM state allocation whereas the token_create
hook does the access control and setting of the LSM state associated
with the token.

If you want to get rid of the bpf_token_alloc() function and fold it
into the bpf_token_create() function then we can go down to one LSM
hook that covers state allocation, initialization, and access control.
However, if it remains possible to allocate a token object outside of
bpf_token_create() I think it is a good idea to keep the dedicated LSM
allocation hooks.

You can apply the same logic to the LSM token state destructor hook.

> For my own education, is there some explicit flag or some
> other sort of mark between LSM hooks for setting up security vs
> enforcement? Or is it mostly based on convention and implicitly
> following the split?

Generally convention based around trying to match the LSM state
lifetime with the lifetime of the associated object; as I mentioned
earlier, we've had problems in the past when the two differ.  If you
look at all of the LSM hooks you'll see a number of
"security_XXXX_alloc()" hooks for things like superblocks, inodes,
files, task structs, creds, and the like; we're just doing the same
things here with BPF tokens.  If you're still not convinced, it may be
worth noting that we currently have security_bpf_map_alloc() and
security_bpf_prog_alloc() hooks.

> > > +     fd = get_unused_fd_flags(O_CLOEXEC);
> > > +     if (fd < 0) {
> > > +             err = fd;
> > > +             goto out_token;
> > > +     }
> > > +
> > > +     file->private_data = token;
> > > +     fd_install(fd, file);
> > > +
> > > +     path_put(&path);
> > > +     return fd;
> > > +
> > > +out_token:
> > > +     bpf_token_free(token);
> > > +out_file:
> > > +     fput(file);
> > > +out_path:
> > > +     path_put(&path);
> > > +     return err;
> > > +}
> >
> > ...
> >
> > > +bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
> > > +{
> > > +     if (!token)
> > > +             return false;
> > > +
> > > +     return token->allowed_cmds & (1ULL << cmd);
> >
> > Similar to bpf_token_capable(), I believe we want a LSM hook here to
> > control who is allowed to use the delegated privilege.
> >
> >   bool bpf_token_allow_cmd(...)
> >   {
> >     if (token && (token->allowed_cmds & (1ULL << cmd))
> >       return security_bpf_token_cmd(token, cmd);
>
> ok, so I guess I'll have to add all four variants:
> security_bpf_token_{cmd,map_type,prog_type,attach_type}, right?

Not necessarily.  Currently only SELinux provides a set of LSM BPF
access controls, and it only concerns itself with BPF maps and
programs.  From a map perspective it comes down to controlling which
applications can create a map, or use a map created elsewhere (we
label BPF map objects just as we would any other kernel object).  From
a program perspective, it is about loading programs and executing
them.  While not BPF specific, SELinux also provides controls that
restrict the ability to exercise capabilities.

Keeping the above access controls in mind, I'm hopeful you can better
understand the reasoning behind the hooks I'm proposing.  The
security_bpf_token_cmd() hook is there so that we can control a
token's ability to authorize BPF_MAP_CREATE and BPF_PROG_LOAD.  The
security_bpf_token_capable() hook is there so that we can control a
token's ability to authorize the BPF-related capabilities.  While I
guess it's possible someone might develop a security model for BPF
that would require the other LSM hooks you've mentioned above, but I
see no need for that now, and to be honest I don't see any need on the
visible horizon either.

Does that make sense?

--
paul-moore.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 3/13] bpf: introduce BPF token object
  2023-10-12 21:48       ` Andrii Nakryiko
@ 2023-10-12 23:43         ` Paul Moore
  2023-10-12 23:51           ` Andrii Nakryiko
  0 siblings, 1 reply; 29+ messages in thread
From: Paul Moore @ 2023-10-12 23:43 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Andrii Nakryiko, bpf, netdev, linux-fsdevel,
	linux-security-module, keescook, brauner, lennart, kernel-team,
	sargun, selinux

On Thu, Oct 12, 2023 at 5:48 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> On Wed, Oct 11, 2023 at 5:31 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > ok, so I guess I'll have to add all four variants:
> > security_bpf_token_{cmd,map_type,prog_type,attach_type}, right?
> >
>
> Thinking a bit more about this, I think this is unnecessary. All these
> allow checks to control other BPF commands (BPF map creation, BPF
> program load, bpf() syscall command, etc). We have dedicated LSM hooks
> for each such operation, most importantly security_bpf_prog_load() and
> security_bpf_map_create(). I'm extending both of those to be
> token-aware, and struct bpf_token is one of the input arguments, so if
> LSM need to override BPF token allow_* checks, they can do in
> respective more specialized hooks.
>
> Adding so many token hooks, one for each different allow mask (or any
> other sort of "allow something" parameter) seems to be excessive. It
> will both add too many super-detailed LSM hooks and will unnecessarily
> tie BPF token implementation details to LSM hook implementations, IMO.
> I'll send v7 with just security_bpf_token_{create,free}(), please take
> a look and let me know if you are still not convinced.

I'm hoping my last email better explains why we only really need
security_bpf_token_cmd() and security_bpf_token_capable() as opposed
to the full list of security_bpf_token_XXX().  If not, please let me
know and I'll try to do a better job explaining my reasoning :)

One thing I didn't discuss in my last email was why there is value in
having both security_bpf_token_{cmd,capable}() as well as
security_bpf_prog_load(); I'll try to do that below.

As we talked about previously, the reason for having
security_bpf_prog_load() is to provide a token-aware version of
security_bpf().  Currently the LSMs enforce their access controls
around BPF commands using the security_bpf() hook which is
unfortunately well before we have access to the BPF token.  If we want
to be able to take the BPF token into account we need to have a hook
placed after the token is retrieved and validated, hence the
security_bpf_prog_load() hook.  In a kernel that provides BPF tokens,
I would expect that LSMs would use security_bpf() to control access to
BPF operations where a token is not a concern, and new token-aware
security_bpf_OPERATION() hooks when the LSM needs to consider the BPF
token.

With the understanding that security_bpf_prog_load() is essentially a
token-aware version of security_bpf(), I'm hopeful that you can begin
to understand that it  serves a different purpose than
security_bpf_token_{cmd,capable}().  The simple answer is that
security_bpf_token_cmd() applies to more than just BPF_PROG_LOAD, but
the better answer is that it has the ability to impact more than just
the LSM authorization decision.  Hooking the LSM into the
bpf_token_allow_cmd() function allows the LSM to authorize the
individual command overrides independent of the command specific LSM
hook, if one exists.  The security_bpf_token_cmd() hook can allow or
disallow the use of a token for all aspects of a specific BPF
operation including all of the token related logic outside of the LSM,
something the security_bpf_prog_load() hook could never do.

I'm hoping that makes sense :)

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 3/13] bpf: introduce BPF token object
  2023-10-12 23:18       ` Paul Moore
@ 2023-10-12 23:45         ` Andrii Nakryiko
  0 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2023-10-12 23:45 UTC (permalink / raw)
  To: Paul Moore
  Cc: Andrii Nakryiko, bpf, netdev, linux-fsdevel,
	linux-security-module, keescook, brauner, lennart, kernel-team,
	sargun, selinux

On Thu, Oct 12, 2023 at 4:19 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Wed, Oct 11, 2023 at 8:31 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> > On Tue, Oct 10, 2023 at 6:17 PM Paul Moore <paul@paul-moore.com> wrote:
> > > On Sep 27, 2023 Andrii Nakryiko <andrii@kernel.org> wrote:
> > > >
> > > > Add new kind of BPF kernel object, BPF token. BPF token is meant to
> > > > allow delegating privileged BPF functionality, like loading a BPF
> > > > program or creating a BPF map, from privileged process to a *trusted*
> > > > unprivileged process, all while have a good amount of control over which
> > > > privileged operations could be performed using provided BPF token.
> > > >
> > > > This is achieved through mounting BPF FS instance with extra delegation
> > > > mount options, which determine what operations are delegatable, and also
> > > > constraining it to the owning user namespace (as mentioned in the
> > > > previous patch).
> > > >
> > > > BPF token itself is just a derivative from BPF FS and can be created
> > > > through a new bpf() syscall command, BPF_TOKEN_CREAT, which accepts
> > > > a path specification (using the usual fd + string path combo) to a BPF
> > > > FS mount. Currently, BPF token "inherits" delegated command, map types,
> > > > prog type, and attach type bit sets from BPF FS as is. In the future,
> > > > having an BPF token as a separate object with its own FD, we can allow
> > > > to further restrict BPF token's allowable set of things either at the creation
> > > > time or after the fact, allowing the process to guard itself further
> > > > from, e.g., unintentionally trying to load undesired kind of BPF
> > > > programs. But for now we keep things simple and just copy bit sets as is.
> > > >
> > > > When BPF token is created from BPF FS mount, we take reference to the
> > > > BPF super block's owning user namespace, and then use that namespace for
> > > > checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
> > > > capabilities that are normally only checked against init userns (using
> > > > capable()), but now we check them using ns_capable() instead (if BPF
> > > > token is provided). See bpf_token_capable() for details.
> > > >
> > > > Such setup means that BPF token in itself is not sufficient to grant BPF
> > > > functionality. User namespaced process has to *also* have necessary
> > > > combination of capabilities inside that user namespace. So while
> > > > previously CAP_BPF was useless when granted within user namespace, now
> > > > it gains a meaning and allows container managers and sys admins to have
> > > > a flexible control over which processes can and need to use BPF
> > > > functionality within the user namespace (i.e., container in practice).
> > > > And BPF FS delegation mount options and derived BPF tokens serve as
> > > > a per-container "flag" to grant overall ability to use bpf() (plus further
> > > > restrict on which parts of bpf() syscalls are treated as namespaced).
> > > >
> > > > The alternative to creating BPF token object was:
> > > >   a) not having any extra object and just pasing BPF FS path to each
> > > >      relevant bpf() command. This seems suboptimal as it's racy (mount
> > > >      under the same path might change in between checking it and using it
> > > >      for bpf() command). And also less flexible if we'd like to further
> > > >      restrict ourselves compared to all the delegated functionality
> > > >      allowed on BPF FS.
> > > >   b) use non-bpf() interface, e.g., ioctl(), but otherwise also create
> > > >      a dedicated FD that would represent a token-like functionality. This
> > > >      doesn't seem superior to having a proper bpf() command, so
> > > >      BPF_TOKEN_CREATE was chosen.
> > > >
> > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > > ---
> > > >  include/linux/bpf.h            |  40 +++++++
> > > >  include/uapi/linux/bpf.h       |  39 +++++++
> > > >  kernel/bpf/Makefile            |   2 +-
> > > >  kernel/bpf/inode.c             |  10 +-
> > > >  kernel/bpf/syscall.c           |  17 +++
> > > >  kernel/bpf/token.c             | 197 +++++++++++++++++++++++++++++++++
> > > >  tools/include/uapi/linux/bpf.h |  39 +++++++
> > > >  7 files changed, 339 insertions(+), 5 deletions(-)
> > > >  create mode 100644 kernel/bpf/token.c
> > > >
> > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > index a5bd40f71fd0..c43131a24579 100644
> > > > --- a/include/linux/bpf.h
> > > > +++ b/include/linux/bpf.h
> > > > @@ -1572,6 +1576,13 @@ struct bpf_mount_opts {
> > > >       u64 delegate_attachs;
> > > >  };
> > > >
> > > > +struct bpf_token {
> > > > +     struct work_struct work;
> > > > +     atomic64_t refcnt;
> > > > +     struct user_namespace *userns;
> > > > +     u64 allowed_cmds;
> > >
> > > We'll also need a 'void *security' field to go along with the BPF token
> > > allocation/creation/free hooks, see my comments below.  This is similar
> > > to what we do for other kernel objects.
> >
> > ok, I'm thinking of adding a dedicated patch for all the
> > security-related stuff and refactoring of existing LSM hook(s).
>
> No objection here.  My main concern is that we get the LSM stuff in
> the same patchset as the rest of the BPF token patches; if all of the
> LSM bits are in a separate patch I'm not bothered.
>
> Once we settle on the LSM hooks I'll draft a SELinux implementation of
> the hooks which I'll hand off to you for inclusion in the patchset as
> well.  I'd encourage the other LSMs that are interested to do the
> same.

I posted v7 ([0]) a few minutes ago, please take a look when you get a
chance. All the LSM-related stuff is in a few separate patches.

  [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=792765&state=*

>
> > > > +bool bpf_token_capable(const struct bpf_token *token, int cap)
> > > > +{
> > > > +     /* BPF token allows ns_capable() level of capabilities */
> > > > +     if (token) {
> > >
> > > I think we want a LSM hook here before the token is used in the
> > > capability check.  The LSM will see the capability check, but it will
> > > not be able to distinguish it from the process which created the
> > > delegation token.  This is arguably the purpose of the delegation, but
> > > with the LSM we want to be able to control who can use the delegated
> > > privilege.  How about something like this:
> > >
> > >   if (security_bpf_token_capable(token, cap))
> > >      return false;
> >
> > sounds good, I'll add this hook
> >
> > btw, I'm thinking of guarding the BPF_TOKEN_CREATE command behind the
> > ns_capable(CAP_BPF) check, WDYT? This seems appropriate. You can get
> > BPF token only if you have CAP_BPF **within the userns**, so any
> > process not granted CAP_BPF within namespace ("container") is
> > guaranteed to not be able to do anything with BPF token.
>
> I don't recall seeing any capability checks guarding BPF_TOKEN_CREATE
> in this revision so I don't think you're weakening the restrictions
> any, and the logic above seems reasonable: if you don't have CAP_BPF
> you shouldn't be creating a token.

Right, there was none based on the assumption that you have to have an
access to BPF FS controlled through other means. But it does seem
right to use ns_capable(CAP_BPF) in addition to all that, so that's
what I did in v7.

>
> > > > +static struct bpf_token *bpf_token_alloc(void)
> > > > +{
> > > > +     struct bpf_token *token;
> > > > +
> > > > +     token = kvzalloc(sizeof(*token), GFP_USER);
> > > > +     if (!token)
> > > > +             return NULL;
> > > > +
> > > > +     atomic64_set(&token->refcnt, 1);
> > >
> > > We should have a LSM hook here to allocate the LSM state associated
> > > with the token.
> > >
> > >   if (security_bpf_token_alloc(token)) {
> > >     kvfree(token);
> > >     return NULL;
> > >   }
> > >
> > > > +     return token;
> > > > +}
> > >
> > > ...
> > >
> >
> > Would having userns and allowed_* masks filled out by that time inside
> > the token be useful (seems so if we treat bpf_token_alloc as generic
> > LSM hook). If yes, I'll add security_bpf_token_alloc() after all that
> > is filled out, right before we try to get unused fd. WDYT?
>
> The security_bpf_token_alloc() hook isn't intended to do any access
> control, it's just there so that the LSMs which need to allocate state
> for the token object can do so at the same time the token is
> allocated.  It has been my experience that allocating and releasing
> the LSM state at the same time as the primary object's state is much
> less fragile than disconnecting the two lifetimes and allocating the
> LSM state later.

You'll be able to see what I did in v7, but I don't think separate
lifetimes are a concern, because all these alloc/free hooks are called
just as bpf_{prog,map,token} are created or right when they are freed,
so in that regard everything is good.

But based on our discussion to rework bpf_prog_alloc/bpf_map_alloc
into bpf_prog_load/bpf_map_create, it would be inconsistent between
prog/map and token. Anyways, the code is out, take a look. If you hate
it, it's just a matter of adding one extra LSM hook for token, map,
and prog each.

>
> > > > +int bpf_token_create(union bpf_attr *attr)
> > > > +{
> > > > +     struct bpf_mount_opts *mnt_opts;
> > > > +     struct bpf_token *token = NULL;
> > > > +     struct inode *inode;
> > > > +     struct file *file;
> > > > +     struct path path;
> > > > +     umode_t mode;
> > > > +     int err, fd;
> > > > +
> > > > +     err = user_path_at(attr->token_create.bpffs_path_fd,
> > > > +                        u64_to_user_ptr(attr->token_create.bpffs_pathname),
> > > > +                        LOOKUP_FOLLOW | LOOKUP_EMPTY, &path);
> > > > +     if (err)
> > > > +             return err;
> > > > +
> > > > +     if (path.mnt->mnt_root != path.dentry) {
> > > > +             err = -EINVAL;
> > > > +             goto out_path;
> > > > +     }
> > > > +     err = path_permission(&path, MAY_ACCESS);
> > > > +     if (err)
> > > > +             goto out_path;
> > > > +
> > > > +     mode = S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask());
> > > > +     inode = bpf_get_inode(path.mnt->mnt_sb, NULL, mode);
> > > > +     if (IS_ERR(inode)) {
> > > > +             err = PTR_ERR(inode);
> > > > +             goto out_path;
> > > > +     }
> > > > +
> > > > +     inode->i_op = &bpf_token_iops;
> > > > +     inode->i_fop = &bpf_token_fops;
> > > > +     clear_nlink(inode); /* make sure it is unlinked */
> > > > +
> > > > +     file = alloc_file_pseudo(inode, path.mnt, BPF_TOKEN_INODE_NAME, O_RDWR, &bpf_token_fops);
> > > > +     if (IS_ERR(file)) {
> > > > +             iput(inode);
> > > > +             err = PTR_ERR(file);
> > > > +             goto out_file;
> > > > +     }
> > > > +
> > > > +     token = bpf_token_alloc();
> > > > +     if (!token) {
> > > > +             err = -ENOMEM;
> > > > +             goto out_file;
> > > > +     }
> > > > +
> > > > +     /* remember bpffs owning userns for future ns_capable() checks */
> > > > +     token->userns = get_user_ns(path.dentry->d_sb->s_user_ns);
> > > > +
> > > > +     mnt_opts = path.dentry->d_sb->s_fs_info;
> > > > +     token->allowed_cmds = mnt_opts->delegate_cmds;
> > >
> > > I think we would want a LSM hook here, both to control the creation
> > > of the token and mark it with the security attributes of the creating
> > > process.  How about something like this:
> > >
> > >   err = security_bpf_token_create(token);
> > >   if (err)
> > >     goto out_token;
> >
> > hmm... so you'd like both security_bpf_token_alloc() and
> > security_bpf_token_create()? They seem almost identical, do we need
> > two? Or is it that the security_bpf_token_alloc() is supposed to be
> > only used to create those `void *security` context pieces, while
> > security_bpf_token_create() is actually going to be used for
> > enforcement?
>
> I tried to explain a bit in my comment above, but the alloc hook
> basically just does the LSM state allocation whereas the token_create
> hook does the access control and setting of the LSM state associated
> with the token.
>
> If you want to get rid of the bpf_token_alloc() function and fold it
> into the bpf_token_create() function then we can go down to one LSM
> hook that covers state allocation, initialization, and access control.
> However, if it remains possible to allocate a token object outside of
> bpf_token_create() I think it is a good idea to keep the dedicated LSM
> allocation hooks.
>
> You can apply the same logic to the LSM token state destructor hook.

Yes, so that's what I went with: combining allocation and access
control. And no, there is no way to get a prog/map/token without
having a respective alloc/control hook.

I also called out the fact that we might be having a memory leak when
using multiple LSMs together and them disagreeing on alloc hook
return. You'll see in the v7 what I mean, I mention that in the commit
messages.

>
> > For my own education, is there some explicit flag or some
> > other sort of mark between LSM hooks for setting up security vs
> > enforcement? Or is it mostly based on convention and implicitly
> > following the split?
>
> Generally convention based around trying to match the LSM state
> lifetime with the lifetime of the associated object; as I mentioned
> earlier, we've had problems in the past when the two differ.  If you
> look at all of the LSM hooks you'll see a number of
> "security_XXXX_alloc()" hooks for things like superblocks, inodes,
> files, task structs, creds, and the like; we're just doing the same
> things here with BPF tokens.  If you're still not convinced, it may be
> worth noting that we currently have security_bpf_map_alloc() and
> security_bpf_prog_alloc() hooks.

It's not like I'm convinced or not, I'm trying to keep things
consistent without breaking anything or causing a mess, but also
hopefully not add too many unnecessary hooks. Let's continue on
respective patches in v7, I'm curious to hear your comments based on
the code I posted.

>
> > > > +     fd = get_unused_fd_flags(O_CLOEXEC);
> > > > +     if (fd < 0) {
> > > > +             err = fd;
> > > > +             goto out_token;
> > > > +     }
> > > > +
> > > > +     file->private_data = token;
> > > > +     fd_install(fd, file);
> > > > +
> > > > +     path_put(&path);
> > > > +     return fd;
> > > > +
> > > > +out_token:
> > > > +     bpf_token_free(token);
> > > > +out_file:
> > > > +     fput(file);
> > > > +out_path:
> > > > +     path_put(&path);
> > > > +     return err;
> > > > +}
> > >
> > > ...
> > >
> > > > +bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
> > > > +{
> > > > +     if (!token)
> > > > +             return false;
> > > > +
> > > > +     return token->allowed_cmds & (1ULL << cmd);
> > >
> > > Similar to bpf_token_capable(), I believe we want a LSM hook here to
> > > control who is allowed to use the delegated privilege.
> > >
> > >   bool bpf_token_allow_cmd(...)
> > >   {
> > >     if (token && (token->allowed_cmds & (1ULL << cmd))
> > >       return security_bpf_token_cmd(token, cmd);
> >
> > ok, so I guess I'll have to add all four variants:
> > security_bpf_token_{cmd,map_type,prog_type,attach_type}, right?
>
> Not necessarily.  Currently only SELinux provides a set of LSM BPF
> access controls, and it only concerns itself with BPF maps and
> programs.  From a map perspective it comes down to controlling which
> applications can create a map, or use a map created elsewhere (we
> label BPF map objects just as we would any other kernel object).  From
> a program perspective, it is about loading programs and executing
> them.  While not BPF specific, SELinux also provides controls that
> restrict the ability to exercise capabilities.
>
> Keeping the above access controls in mind, I'm hopeful you can better
> understand the reasoning behind the hooks I'm proposing.  The
> security_bpf_token_cmd() hook is there so that we can control a
> token's ability to authorize BPF_MAP_CREATE and BPF_PROG_LOAD.  The
> security_bpf_token_capable() hook is there so that we can control a
> token's ability to authorize the BPF-related capabilities.  While I
> guess it's possible someone might develop a security model for BPF
> that would require the other LSM hooks you've mentioned above, but I
> see no need for that now, and to be honest I don't see any need on the
> visible horizon either.
>
> Does that make sense?

I'll reply to your second email :)

>
> --
> paul-moore.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v6 3/13] bpf: introduce BPF token object
  2023-10-12 23:43         ` Paul Moore
@ 2023-10-12 23:51           ` Andrii Nakryiko
  0 siblings, 0 replies; 29+ messages in thread
From: Andrii Nakryiko @ 2023-10-12 23:51 UTC (permalink / raw)
  To: Paul Moore
  Cc: Andrii Nakryiko, bpf, netdev, linux-fsdevel,
	linux-security-module, keescook, brauner, lennart, kernel-team,
	sargun, selinux

On Thu, Oct 12, 2023 at 4:43 PM Paul Moore <paul@paul-moore.com> wrote:
>
> On Thu, Oct 12, 2023 at 5:48 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> > On Wed, Oct 11, 2023 at 5:31 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > ok, so I guess I'll have to add all four variants:
> > > security_bpf_token_{cmd,map_type,prog_type,attach_type}, right?
> > >
> >
> > Thinking a bit more about this, I think this is unnecessary. All these
> > allow checks to control other BPF commands (BPF map creation, BPF
> > program load, bpf() syscall command, etc). We have dedicated LSM hooks
> > for each such operation, most importantly security_bpf_prog_load() and
> > security_bpf_map_create(). I'm extending both of those to be
> > token-aware, and struct bpf_token is one of the input arguments, so if
> > LSM need to override BPF token allow_* checks, they can do in
> > respective more specialized hooks.
> >
> > Adding so many token hooks, one for each different allow mask (or any
> > other sort of "allow something" parameter) seems to be excessive. It
> > will both add too many super-detailed LSM hooks and will unnecessarily
> > tie BPF token implementation details to LSM hook implementations, IMO.
> > I'll send v7 with just security_bpf_token_{create,free}(), please take
> > a look and let me know if you are still not convinced.
>
> I'm hoping my last email better explains why we only really need
> security_bpf_token_cmd() and security_bpf_token_capable() as opposed
> to the full list of security_bpf_token_XXX().  If not, please let me
> know and I'll try to do a better job explaining my reasoning :)
>
> One thing I didn't discuss in my last email was why there is value in
> having both security_bpf_token_{cmd,capable}() as well as
> security_bpf_prog_load(); I'll try to do that below.
>
> As we talked about previously, the reason for having
> security_bpf_prog_load() is to provide a token-aware version of
> security_bpf().  Currently the LSMs enforce their access controls
> around BPF commands using the security_bpf() hook which is
> unfortunately well before we have access to the BPF token.  If we want
> to be able to take the BPF token into account we need to have a hook
> placed after the token is retrieved and validated, hence the
> security_bpf_prog_load() hook.  In a kernel that provides BPF tokens,
> I would expect that LSMs would use security_bpf() to control access to
> BPF operations where a token is not a concern, and new token-aware
> security_bpf_OPERATION() hooks when the LSM needs to consider the BPF
> token.
>
> With the understanding that security_bpf_prog_load() is essentially a
> token-aware version of security_bpf(), I'm hopeful that you can begin
> to understand that it  serves a different purpose than
> security_bpf_token_{cmd,capable}().  The simple answer is that
> security_bpf_token_cmd() applies to more than just BPF_PROG_LOAD, but
> the better answer is that it has the ability to impact more than just
> the LSM authorization decision.  Hooking the LSM into the
> bpf_token_allow_cmd() function allows the LSM to authorize the
> individual command overrides independent of the command specific LSM
> hook, if one exists.  The security_bpf_token_cmd() hook can allow or
> disallow the use of a token for all aspects of a specific BPF
> operation including all of the token related logic outside of the LSM,
> something the security_bpf_prog_load() hook could never do.
>
> I'm hoping that makes sense :)

Yes, I think I understand what you are trying to do, but I need to
clarify something about the bpf_token_allow_cmd() check. It's
meaningless for any command besides BPF_PROG_LOAD, BPF_MAP_CREATE, and
BPF_BTF_LOAD. For any other command you cannot even specify token_fd.
So even if you create a token allowing, say, BPF_MAP_LOOKUP_ELEM, it
has no effect, because BPF_MAP_LOOKUP_ELEM is doing its own checks
based on the provided BPF map FD.

So only if the command is token-aware itself, this allowed_cmd makes
any difference. And in such a case we'll most probably have and/or
want to have an LSM hook for that specific command that accepts struct
bpf_token as an argument. Which is what I did for
security_bpf_prog_load and security_bpf_map_create.

Granted, we don't have any LSM hooks for BPF_BTF_LOAD, mostly because
BTF is just a blob of type info data, and I guess no one bothered to
control the ability to load that. But we can add that easily, if you
think it's important.

So taking everything you said, I still think we don't want a
bpf_token_capable hook, and we'll just be getting targeted LSM hooks
if we need them for some new or existing BPF commands.

Basically, bpf_token's allow_cmd doesn't give you a bypass for
non-token checks we are already doing. So security_bpf() for
everything besides BPF_PROG_LOAD/BPF_MAP_CREATE/BPF_BTF_LOAD is a
completely valid way to restrict everything. You won't miss anything.

>
> --
> paul-moore.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2023-10-12 23:52 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-27 22:57 [PATCH v6 bpf-next 00/13] BPF token and BPF FS-based delegation Andrii Nakryiko
2023-09-27 22:57 ` [PATCH v6 bpf-next 01/13] bpf: align CAP_NET_ADMIN checks with bpf_capable() approach Andrii Nakryiko
2023-09-27 22:57 ` [PATCH v6 bpf-next 02/13] bpf: add BPF token delegation mount options to BPF FS Andrii Nakryiko
2023-10-10  7:08   ` Hou Tao
2023-10-12  0:30     ` Andrii Nakryiko
2023-09-27 22:57 ` [PATCH v6 bpf-next 03/13] bpf: introduce BPF token object Andrii Nakryiko
2023-10-11  1:17   ` [PATCH v6 3/13] " Paul Moore
2023-10-12  0:31     ` Andrii Nakryiko
2023-10-12 21:48       ` Andrii Nakryiko
2023-10-12 23:43         ` Paul Moore
2023-10-12 23:51           ` Andrii Nakryiko
2023-10-12 23:18       ` Paul Moore
2023-10-12 23:45         ` Andrii Nakryiko
2023-10-11  2:35   ` [PATCH v6 bpf-next 03/13] " Hou Tao
2023-10-12  0:31     ` Andrii Nakryiko
2023-09-27 22:58 ` [PATCH v6 bpf-next 04/13] bpf: add BPF token support to BPF_MAP_CREATE command Andrii Nakryiko
2023-10-10  8:35   ` Jiri Olsa
2023-10-12  0:30     ` Andrii Nakryiko
2023-09-27 22:58 ` [PATCH v6 bpf-next 05/13] bpf: add BPF token support to BPF_BTF_LOAD command Andrii Nakryiko
2023-09-27 22:58 ` [PATCH v6 bpf-next 06/13] bpf: add BPF token support to BPF_PROG_LOAD command Andrii Nakryiko
2023-10-11  1:17   ` [PATCH v6 6/13] " Paul Moore
2023-10-12  0:31     ` Andrii Nakryiko
2023-09-27 22:58 ` [PATCH v6 bpf-next 07/13] bpf: take into account BPF token when fetching helper protos Andrii Nakryiko
2023-09-27 22:58 ` [PATCH v6 bpf-next 08/13] bpf: consistenly use BPF token throughout BPF verifier logic Andrii Nakryiko
2023-09-27 22:58 ` [PATCH v6 bpf-next 09/13] libbpf: add bpf_token_create() API Andrii Nakryiko
2023-09-27 22:58 ` [PATCH v6 bpf-next 10/13] libbpf: add BPF token support to bpf_map_create() API Andrii Nakryiko
2023-09-27 22:58 ` [PATCH v6 bpf-next 11/13] libbpf: add BPF token support to bpf_btf_load() API Andrii Nakryiko
2023-09-27 22:58 ` [PATCH v6 bpf-next 12/13] libbpf: add BPF token support to bpf_prog_load() API Andrii Nakryiko
2023-09-27 22:58 ` [PATCH v6 bpf-next 13/13] selftests/bpf: add BPF token-enabled tests Andrii Nakryiko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).